[Gate-users] job splitting takes too much time
Martin Sower
melkatib1 at gmail.com
Sun Oct 25 01:41:27 CEST 2020
Hi Ashok, thank you for your help.
I was finally able to split my simulation successfully: the issue came from
a very long comment line in my .mac file, and after deleting it the job
splitting completed in a few seconds.
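Because the slow split came down to a single very long comment line, it may be worth scanning macros for overlong lines before running gjs. The following is a minimal sketch, not a GATE tool. The function name find_long_lines and the default 200-character limit are hypothetical choices, and nothing in this thread establishes an actual line-length limit in gjs.

```shell
# Hypothetical helper: report lines longer than MAX_LEN characters
# (default 200, an arbitrary assumption) in the given macro files.
find_long_lines() {
  local limit="${MAX_LEN:-200}"
  # For each overlong line, print "file:line: N chars".
  awk -v limit="$limit" \
    'length($0) > limit { printf "%s:%d: %d chars\n", FILENAME, FNR, length($0) }' "$@"
}
```

For example, `find_long_lines main.mac geometry.mac source.mac`, or `MAX_LEN=120 find_long_lines *.mac` for a stricter limit.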
Now I have another issue. I submitted my .submit file with the command
condor_submit mysim.submit, and the jobs appear to be submitted successfully.
However, my CPU monitor shows no increase in CPU usage compared to normal PC
activity. After 2 h of running (the whole simulation takes 1 min without
splitting), I had just two .out files and two .root output files (the two
.root files are fine). From the stat.txt files (actors), I found that the
time between these two completed jobs was about 1 h (1 h between the Gate
acquisitions, with each acquisition lasting 20 s). I then forced the
remaining job to stop using condor_rm job_ID.
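The split-then-submit sequence discussed in this thread can be wrapped in one shell function. This is a hypothetical sketch: split_and_submit is not part of GATE, it uses only the gjs options quoted in this thread plus HTCondor's condor_submit, and it assumes (as reported here) that gjs writes <base>.submit into the current directory.

```shell
# Hypothetical wrapper: split a GATE macro with gjs for HTCondor, then submit.
# Assumes gjs writes <base>.submit into the current directory, as reported in
# this thread; success is judged by that file existing, not by gjs's exit code.
split_and_submit() {
  local mac="$1" nsplits="$2" condor_script="$3" base
  base=$(basename "$mac" .mac)
  gjs -numberofsplits "$nsplits" -clusterplatform condor \
      -condorscript "$condor_script" "$mac"
  if [ ! -f "$base.submit" ]; then
    echo "gjs did not create $base.submit" >&2
    return 1
  fi
  condor_submit "$base.submit"
}
```

For example: `split_and_submit mysim.mac 2 /home/..../Gate-8.2/cluster_tools/jobsplitter/script/condor.script`. After submission, HTCondor's condor_q shows whether each job is idle or running, and condor_status lists the slots the personal pool offers. If there are fewer slots than jobs, the extra jobs wait in the queue, which is one possible (unconfirmed) explanation for jobs completing one after another instead of in parallel.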
I would greatly appreciate your suggestions.
Martin
On Sat, 24 Oct 2020 at 04:49, Ashok Tiwari <tiwarias at yahoo.com> wrote:
> Hi Martin,
>
> I think linking multiple macros in one main file should yield the same
> result! But let me know if it works, and if not, try to work out the
> difference between how a personal HTCondor and a real Condor cluster work.
>
> Best,
> Ashok
>
> On Friday, 23 October 2020, 06:23:47 pm GMT-4, Martin Sower <
> melkatib1 at gmail.com> wrote:
>
>
> Thank you, Ashok, for your time and for sharing the script.
>
> Indeed, I get the same output as you when typing 'gjs' alone, so I think
> the gjs installation was successful.
>
> I should mention that I'm running this job-splitting command on my PC
> (6-core Intel i7 CPU), not on a real cluster, and that my macro is split
> into several .mac files (geometry, output, source, etc.), all linked from
> the main.mac that I pass to the command.
>
> I will try to merge all the .mac files into one and rerun the command.
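Merging the linked macros into one file can be automated. The sketch below replaces each /control/execute line (the Geant4 command GATE macros use to run another macro) with the contents of the referenced file, recursively. flatten_macro is a hypothetical helper; it assumes the included paths resolve relative to the directory where you run it and that no macro includes itself directly or indirectly.

```shell
# Hypothetical helper: print a macro with every "/control/execute <file>" line
# replaced by the (recursively flattened) contents of <file>.
# Assumptions: included paths resolve relative to the current directory, and
# there are no circular includes (those would recurse forever).
flatten_macro() {
  local mac="$1" line target
  [ -r "$mac" ] || { echo "cannot read $mac" >&2; return 1; }
  # The extra [ -n "$line" ] test keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Extract the file name if this line is "/control/execute <file>".
    target=$(printf '%s\n' "$line" |
      sed -n 's|^[[:space:]]*/control/execute[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*|\1|p')
    if [ -n "$target" ]; then
      flatten_macro "$target" || return 1
    else
      printf '%s\n' "$line"
    fi
  done < "$mac"
}
```

For example, `flatten_macro main.mac > main_flat.mac`, then pass main_flat.mac to gjs.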
>
> Martin
>
> On Fri, 23 Oct 2020 at 22:03, Ashok Tiwari <tiwarias at yahoo.com> wrote:
>
> Hi Martin,
>
> I am wondering whether your gjs works correctly. What do you see when you
> run the gjs command? Do you see the following usage message:
> +-------------------------------------------+
> | gjs -- The GATE cluster job macro spliter |
> +-------------------------------------------+
>
> Usage: gjs [-options] your_file.mac
>
> Options (in any order):
> -a value alias : use any alias
> -numberofsplits, -n n : the number of job splits; default=1
> -clusterplatform, -c name : the cluster platform, name is one of the
> following:
> openmosix - condor - openPBS - SGE - xgrid
> This executable is compiled with SGE as
> default
>
> -openPBSscript, -os script : template for an openPBS script
> see the example that comes with the source
> code (script/openPBS.script)
> overrules the environment variable below
>
> -SGEscript, -ss script : template for an SGE script
> see the example that comes with the source
> code (script/SGE.script)
> overrules the environment variable below
>
> -condorscript, -cs script : template for a condor submit file
> see the example that comes with the source
> code (script/condor.script)
> -v : verbosity 0 1 2 3 - 1 default
>
> ... (etc.)
>
> If you see this message, then in principle it should run; if not, there
> might be something wrong with the installation or the job submission. I am
> not familiar with HTCondor clusters, so I cannot give you specific
> information, but on our SGE cluster I normally submit jobs using the
> following script (copied verbatim):
>
> #!/bin/bash
> #
> # Queue to submit job
> #$ -q CCOM,UI
>
> # batch job stderr and stdout
> #$ -o GC_WORKDIR/GC_LOG
> #$ -e GC_WORKDIR/GC_ERR
>
> # Job name
> #$ -N GC_JOBNAME
> # Use current working directory
> #$ -cwd
>
> # print date and time
> date
>
> # I want SGE cluster to send me an email
> # when the job begins and when it ends
> #$ -M ashok-tiwari at uiowa.edu
> #$ -m be
>
> # -l h_vmem=20G
> # Set simulation time
> # -l h_rt=24:00:00
> #$ -pe smp 8
> ## -l mf=20G
>
> # executable
> GC_GATE
>
> To use the job splitter, I run the following command:
> $ gjs -numberofsplits 10 -clusterplatform SGE ../somedir/script/main.mac
>
> After I press Enter, it generates the main.submit file in the same
> directory within a few seconds. At the same time, 10 split macros are
> created in the GC_DOT_GATE directory. You then run the main.submit
> executable with ./main.submit to launch the simulation.
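Since an incomplete split (mysim1.mac without mysim2.mac) is one of the symptoms reported in this thread, a quick count of the generated macros can confirm whether gjs finished. check_split_macros is a hypothetical helper; it assumes the split macros are named <base>1.mac through <base>N.mac, following the mysim1.mac example, and that you pass the directory your GC_DOT_GATE setting refers to.

```shell
# Hypothetical helper: check that gjs wrote <base>1.mac ... <base>N.mac into
# the given directory (naming assumed from the mysim1.mac example).
# Prints each missing file plus a summary; returns non-zero if any is missing.
check_split_macros() {
  local dir="$1" base="$2" n="$3" i=1 missing=0
  while [ "$i" -le "$n" ]; do
    if [ ! -f "$dir/$base$i.mac" ]; then
      echo "missing: $dir/$base$i.mac"
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  echo "$((n - missing))/$n split macros found"
  [ "$missing" -eq 0 ]
}
```

For example, `check_split_macros /path/to/GC_DOT_GATE/dir main 10` after running the gjs command above.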
>
> Hope this is a little bit helpful.
>
> Best,
> Ashok
>
> On Friday, 23 October 2020, 02:46:04 pm GMT-4, Martin Sower <
> melkatib1 at gmail.com> wrote:
>
>
> Thank you, Ashok, for your response.
>
> Indeed, I had a .split file with only one of the expected .mac files
> (mysim1.mac, but no mysim2.mac!) in the DOT_GATE directory, and no .submit
> file in the current directory (where gjs is launched).
>
> I would appreciate it if you could share your scripts with me so that I
> can compare them with mine.
>
> Thank you for your help.
> Martin
>
> On Fri, 23 Oct 2020 at 19:17, Ashok Tiwari <tiwarias at yahoo.com> wrote:
>
> Hi Martin,
>
> I don't have experience with the HTCondor platform, but after running the
> gjs command you should have a .split file in the directory where you ran
> gjs, and the split macros in your .GC_DOT_GATE directory (depending on your
> installation). It should not take that long; in my experience it is a
> matter of seconds. I have no idea why it is taking so long, but I suggest
> checking your condor script against the scripts available in GATE. I am
> happy to send you my SGE cluster configuration files if you want to
> compare the scripts to find the culprit.
>
> Thanks,
> Ashok Tiwari
>
>
>
> On Friday, 23 October 2020, 06:41:19 am GMT-4, Martin Sower <
> melkatib1 at gmail.com> wrote:
>
>
> Hi,
>
> I installed gjs (the job splitter) and gjm (the job merger) as described in
> the GATE User's Guide, and installed a Personal HTCondor as described in
> https://htcondor.readthedocs.io/en/stable/cloud-computing/using-annex-first-time.html#install-a-personal-htcondor
> and its tests passed. Now I'm trying to split my simulation into 2 jobs
> with the command
> gjs -numberofsplits 2 -clusterplatform condor -condorscript
> /home/..../Gate-8.2/cluster_tools/jobsplitter/script/condor.script
> mysim.mac
> I get the .split file and mysim1.mac in the .GATE directory but nothing
> else, and the command takes far too long (1 h before I forced it to stop).
> So my questions are: is it normal for this gjs command to take so long,
> and what could be causing the problem?
>
> Thank you in advance for your help.
> Martin
> _______________________________________________
> Gate-users mailing list
> Gate-users at lists.opengatecollaboration.org
> http://lists.opengatecollaboration.org/mailman/listinfo/gate-users
>
>