[Gate-users] Condor_hold and condor_release GATE simulation on a cluster

Mathieu Dupont mdupont at cppm.in2p3.fr
Thu Apr 9 09:02:29 CEST 2020


Without Condor, my first idea would be to send a SIGTSTP signal to
your GATE processes run by Condor to pause them, and a SIGCONT signal
to resume them.
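
A minimal sketch of the signal idea in Python, assuming you already know
the PIDs of the GATE processes on the execute node (the PIDs in the usage
comment are hypothetical). Note that a program can catch or ignore
SIGTSTP, whereas SIGSTOP cannot be caught; I have not checked how GATE
reacts to either, so the signal is left as a parameter.

```python
import os
import signal


def pause(pids, sig=signal.SIGTSTP):
    """Send a stop signal to each PID.

    SIGTSTP is the signal suggested above; pass signal.SIGSTOP instead
    if the target process turns out to ignore SIGTSTP.
    """
    for pid in pids:
        os.kill(pid, sig)


def resume(pids):
    """Send SIGCONT to each PID so that stopped processes continue."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)


# Usage (hypothetical PIDs of two GATE processes):
# pause([12345, 12346])
# ... later, outside working hours ...
# resume([12345, 12346])
```

You need to own the processes (or be root) for os.kill to succeed, and
Condor itself will not know that the job has been stopped this way.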

Looking at the Condor documentation, I also found the commands
condor_suspend and condor_continue, which seem to do this. Maybe you
could try them?
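
A small sketch of how the two commands could be wrapped in Python, assuming
your Condor installation provides condor_suspend and condor_continue and
that they accept a job ID such as a cluster number or cluster.process (the
ID 1234 below is hypothetical). The helper names are my own; by default it
only prints the command so you can check it before running anything.

```python
import subprocess

# Map a short action name to the Condor command-line tool.
TOOLS = {"suspend": "condor_suspend", "continue": "condor_continue"}


def condor_command(action, job_id):
    """Build the argument list for suspending or continuing a job."""
    return [TOOLS[action], str(job_id)]


def run_condor(action, job_id, dry_run=True):
    """Print the command (dry run) or execute it, then return it."""
    cmd = condor_command(action, job_id)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if the tool exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


# Usage (hypothetical job ID):
# run_condor("suspend", 1234)                  # prints: condor_suspend 1234
# run_condor("continue", 1234, dry_run=False)  # actually resumes the job
```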

On Wed, 8 Apr 2020 15:46:52 -0700
Zhengzhi Liu <zliu36 at stanford.edu> wrote:

> Dear Gate users,
> For some GATE simulations, the runtime can be as long as a couple of
> days, even on a 56-core cluster. However, I can't let my GATE
> simulation occupy all the cores on the cluster during working hours,
> since other colleagues are also using the machine. So I tried to
> hold my GATE simulation during working hours and resume it
> later. The commands I found to achieve this goal
> are condor_hold
> <https://www.cl.cam.ac.uk/manuals/condor-V6_8_3-Manual/condor_hold.html>
> and condor_release
> <https://www.cl.cam.ac.uk/manuals/condor-V6_8_3-Manual/condor_release.html#man-condor-release>.
> Everything works in that condor_hold puts my GATE jobs on hold
> and condor_release resumes the GATE simulation, except that running
> condor_release wipes the existing data.
> I might have misunderstood the function of condor_hold. Honestly, I
> don't fully understand the description. It might have killed the GATE
> program. Are there any GATE experts who know whether it is possible to
> pause a GATE simulation and resume it at a later time?
> Thank you very much for any help.
> Sincere wishes,
> Zhengzhi

Mobilized against the pension reform and the LPPR
Mathieu Dupont - Research Engineer
UMR 7346 - Aix-Marseille Université - CNRS/IN2P3
163 avenue de Luminy, Case 902, F-13288 Marseille CEDEX 09
Tel.: +33 (0) 4 91 82 72 19
Website: cppm.in2p3.fr - Email: mdupont at cppm.in2p3.fr
