[gate-users] condor

Fernando Rannou rannou at diinf.usach.cl
Thu Apr 28 11:55:13 CEST 2005


Hi Sadek, Manu,

Yes, we've been using Condor since we started running distributed jobs.
For our humble simulations, it does what we want and it is pretty
stable.

The main difficulty in running Condor jobs is the setup
of the nodes. The default setup does not fit, for instance,
our policy (UCLA cluster) for running distributed jobs. We want
each job never to be killed, never to be preempted, and never to
be suspended. Once a job is assigned to a CPU, we let it run there
until it finishes. If you use the default settings, your jobs may
start running, but if something else starts running on the same
node, Condor may suspend your job and restart it later.
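
As a rough sketch of what such a policy could look like (this is not
our actual UCLA configuration), the small Python script below prints a
fragment for a node's local Condor configuration file. The macro names
are the usual startd policy expressions; the values, the choice of
START = True, and the helper name never_interrupt_policy are
assumptions for illustration and should be checked against the manual
for your Condor version.

```python
# Sketch: emit a Condor startd policy in which a running job is never
# suspended, preempted, or killed. All values are illustrative assumptions.

def never_interrupt_policy():
    """Return config lines for a 'run to completion' node policy."""
    settings = [
        ("START", "True"),         # assumption: node always accepts a job
        ("SUSPEND", "False"),      # never suspend a running job
        ("CONTINUE", "True"),      # only relevant if a job were ever suspended
        ("PREEMPT", "False"),      # never preempt a running job
        ("KILL", "False"),         # never hard-kill a job
        ("WANT_SUSPEND", "False"), # do not ask for suspension
        ("WANT_VACATE", "False"),  # do not ask for a graceful vacate
    ]
    return "\n".join(f"{name} = {value}" for name, value in settings)

if __name__ == "__main__":
    print(never_interrupt_policy())
```

The printed lines would go into each node's local configuration file,
followed by a condor_reconfig. The central manager can have its own
preemption settings (for example PREEMPTION_REQUIREMENTS), which this
sketch does not touch.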

It can also happen that your jobs never get started; this too is
related to node configuration.

Tell me, did you install Condor with NFS, and do all users
have the same uid and gid on all the nodes?

Do all nodes see the same executable through NFS and the
same working directory?
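
One way to check both points is to collect, on each node, the uid and
gid of the submitting user and a checksum of the Gate executable as
seen from that node, and then compare them. A minimal sketch, assuming
the facts have already been gathered somehow (the node names, the field
names, and the function find_mismatches are hypothetical):

```python
# Sketch: compare per-node facts (uid, gid, executable checksum) and report
# any field whose value differs between nodes. How the facts are gathered
# (ssh, a small test job, ...) is left out; the sample data is made up.

def find_mismatches(records):
    """records: {node: {field: value}}.

    Returns {field: {value: [nodes]}} for every field that has more than
    one distinct value across the nodes.
    """
    by_field = {}
    for node, facts in sorted(records.items()):
        for field, value in facts.items():
            by_field.setdefault(field, {}).setdefault(value, []).append(node)
    return {f: vals for f, vals in by_field.items() if len(vals) > 1}

if __name__ == "__main__":
    sample = {
        "node01": {"uid": 1001, "gid": 100, "gate_md5": "abc"},
        "node02": {"uid": 1001, "gid": 100, "gate_md5": "abc"},
        "node03": {"uid": 1005, "gid": 100, "gate_md5": "abc"},
    }
    for field, values in find_mismatches(sample).items():
        print(field, values)
```

An empty result means every node reported the same values; any field
that shows up points at a node whose accounts or NFS mounts differ
from the rest.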

Tell me a little more about what your problem is, so I can
suggest what to do.


Fernando


On Thu, 2005-04-28 at 08:12 +0200, Manuel Bardiès wrote:
> Hi  Sadek,
> 
> We installed Condor, and after some upgrading (we were not satisfied
> with the first versions we tried) it seems to work fine.
> Early on, jobs used to get 'lost' after a certain amount of
> computation time, so we ran our jobs separately (with a shell script to
> split jobs/merge results).
> I'm actually not using Condor (nor Gate - am I really working? ;-), so
> I pass your question to Damien, who's far more experienced than me on
> that topic.
> He'll send you our set-up files.
> Mind, it's designed for our Mac cluster, so it needs to be adapted to
> your configuration.
> I think Fernando (?) also uses Condor, so his input should also be
> beneficial.
> A good thing about Condor is its very active mailing list, so you may
> want to send Linux-specific questions there...
> 
> Best regards,
> 
> Manu
> 
> On 27 Apr 05, at 00:39, Nehmeh, Sadek/Medical Physics wrote:
> 
>         
>         Hi Manuel, 
>         
>         I'm trying to run Gate on Condor; this is my first experience
>         running on a cluster. Can you please describe the steps
>         I should follow to do that? I compiled Geant4, ROOT, CLHEP,
>         and Gate as usual, but this is obviously not enough; when I
>         submit a job to the cluster, it sits there for a few minutes,
>         then it disappears, but it never runs. Thanks.
>         
>         
>                 Sadek 
>         
>           
>           
>         ---------------------------------------- 
>          Sadek A. Nehmeh, Ph.D. 
>          Assistant Attending Physicist 
>          Medical Physics Department 
>          Nuclear Medicine Service 
>          Memorial Sloan-Kettering Cancer Center 
>          Tel: (212)-639-2175 
>          Email: nehmehs at mskcc.org 
>         ------------------------------------------ 
>         
>         
>         
>         _______________________________________________
>         gate-users mailing list
>         gate-users at lphe1pet1.epfl.ch
>         http://lphe1pet1.epfl.ch/mailman/listinfo/gate-users
>         
> Manuel Bardiès
> INSERM UMR 601
> 9 Quai Moncousu
> 44093 Nantes cedex
> -----------------------------
> Tel:   02 40 41 28 21
> Fax:  02 40 35 66 97
> Sec:  02 40 08 47 47
> 



