[Gate-users] Some runs seem to hang in cluster mode

Marc Chamberland mchamber at connect.carleton.ca
Fri Feb 10 19:37:53 CET 2012


I've noticed that the runs that hang were running on nodes that (I think) ran out of memory, even though every node has 16 GB of RAM. I'm not sure how to check for a memory leak, or what in the simulation could be causing one.
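
One way to check would be to log the resident memory of a running Gate process over time and see whether it keeps growing. This is a minimal sketch, assuming Linux nodes where /proc/<pid>/status is readable. The function names, the sampling interval, and the growth threshold are all hypothetical choices for illustration and are not part of Gate or Geant4.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) found in /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def looks_like_leak(samples_kb, min_growth_kb=100_000):
    """Crude heuristic (an assumption, not a proof of a leak): memory never
    decreases between samples and grows by more than min_growth_kb overall."""
    if len(samples_kb) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return non_decreasing and (samples_kb[-1] - samples_kb[0]) > min_growth_kb


def monitor(pid, interval_s=60, n_samples=10):
    """Sample the RSS of a process every interval_s seconds and return the samples."""
    samples = []
    for _ in range(n_samples):
        try:
            rss_kb = read_rss_kb(pid)
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited (or was killed)
        if rss_kb is None:
            break  # no VmRSS line was found in the status file
        samples.append(rss_kb)
        print(f"pid {pid}: VmRSS = {rss_kb} kB")
        time.sleep(interval_s)
    return samples
```

Calling `monitor(pid)` with the process ID of one Gate job on a node, then passing the result to `looks_like_leak`, would show whether memory use climbs steadily toward the node's 16 GB. The kernel log on the node (for example via `dmesg`) is also worth a look, since Linux normally records there when its out-of-memory killer terminates a process.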

Could anyone offer pointers? Thanks!
 
Marc



__________________________


Marc Chamberland, MSc
PhD candidate
Department of Physics
Carleton University
Ottawa (ON)

On 2012-02-10 at 1:18 PM, Marc Chamberland wrote:

> Hi Gate Users!
> 
> I'm running into some problems when running my simulations on a cluster.
> 
> Some of the runs seem to hang for a long time. Their Root output files grow in size for a while, then stop growing, and are not closed properly by the end of the simulation. In addition, the terminal output of one of these runs shows that it hangs at an odd place and then simply terminates with the word "Killed". Here's an example where I've left out most of the output:
> 
> -----------
> 
> [G4] 
> [G4] *************************************************************
> [G4]  Geant4 version Name: geant4-09-04-patch-01    (18-February-2011)
> [G4]                       Copyright : Geant4 Collaboration
> [G4]                       Reference : NIM A 506 (2003), 250-303
> [G4]                             WWW : http://cern.ch/geant4
> [G4] *************************************************************
> [G4] 
> [Core-0] Initialization of geometry
> [Core-0] Initialization of physics
> [Core-0] Initialization of actors
> [Core-0] 
> [Core-0] **********************************************************************
> [Core-0]  GATE version name: gate_v6.1                                         
> [Core-0]                     Copyright : OpenGATE Collaboration                
> [Core-0]                     Reference : Phys. Med. Biol. 49 (2004) 4543-4561  
> [Core-0]                     Reference : Phys. Med. Biol. 56 (2011) 881-901    
> [Core-0]                     WWW : http://www.opengatecollaboration.org/       
> [Core-0] **********************************************************************
> [Core-0] 
> [Core-0] Starting macro ./.Gate/bgsrc3/bgsrc33.mac
> [G4] 
> [G4] GATE object:        'systems/cylindricalPET'
> [G4] Components:    
> [G4] 
> [G4] GATE object:        'systems/cylindricalPET/base'
> [G4] Attached to volume: cylindricalPET
> [G4] Nb of children:       1
> 
> 
> (Skipping a lot of uninteresting stuff...)
> 
>    Voxelisation: top memory users:
> [G4]     Percent     Memory      Heads    Nodes   Pointers    Total CPU    Volume
>    -------   --------     ------   ------   --------   ----------    ----------
> [G4]       60.43         27k        59      351        924         0.01    cylindricalPET_log
> [G4]       27.41         12k        37      146        406         0.00    module_log
> [G4]        4.62          2k         6       27         56         0.00    world_log
> [G4]        3.68          1k         1       29         34         0.00    Colair_log
> [G4]        3.07          1k         5       16         40         0.00    rsector_log
> [G4]        0.80       Killed
> 
> --------------------
> 
> All the runs that hang seem to stop at the same place in their terminal output, just before the word "Killed".
> 
> By contrast, here is a run that ran and terminated properly:
> 
> -----------
> 
>    Voxelisation: top memory users:
> [G4]     Percent     Memory      Heads    Nodes   Pointers    Total CPU    Volume
>    -------   --------     ------   ------   --------   ----------    ----------
> [G4]       60.43         27k        59      351        924         0.01    cylindricalPET_log
> [G4]       27.41         12k        37      146        406         0.00    module_log
> [G4]        4.62          2k         6       27         56         0.00    world_log
> [G4]        3.68          1k         1       29         34         0.00    Colair_log
> [G4]        3.07          1k         5       16         40         0.00    rsector_log
> [G4]        0.80          0k         1        5          8         0.00    Advcol_log
> [G4] Start Run processing.
> [G4] Run terminated.
> [G4] Run Summary
> [G4]   Run Aborted after 27051062 events processed.
> [G4]   User=4744.99s Real=6005.04s Sys=333.28s
> [Core-0] End of macro ./.Gate/bgsrc3/bgsrc31.mac
> [G4] UserDetectorConstruction deleted.
> [G4] UserPhysicsList deleted.
> [G4] UserRunAction deleted.
> [G4] UserPrimaryGenerator deleted.
> [G4] G4 kernel has come to Quit state.
> [G4] G4SDManager deleted.
> [G4] EventManager deleted.
> UImanager deleted.
> Units table cleared.
> StateManager deleted.
> RunManagerKernel is deleted.
> RunManager is deleting.
> 
> ----------------
> 
> 
> Does anyone have any idea what might be causing this? Thank you for your time!
> 
> I'm using Gate 6.1 with Root 5.30.p02 and Geant4.9.4.p01.
> 
> Marc
> 
> 
> 


