[Gate-users] Some runs seem to hang in cluster mode - due to memory leaks?

Marc Chamberland mchamber at connect.carleton.ca
Sat Feb 11 04:35:42 CET 2012


Okay, I've looked into this a bit more. Since I suspected a memory leak, I ran the simulation under Valgrind (http://valgrind.org/).

It reported several leaks, but I don't know which errors may be relevant to my situation (if any!).

Here are a few that it reports:

1) This one seems to be related to Root. I'm not sure whether it's relevant to this user list.

==15553== HEAP SUMMARY:
==15553==     in use at exit: 500,865,360 bytes in 8,133,737 blocks
==15553==   total heap usage: 269,055,865 allocs, 260,922,128 frees, 10,972,102,382 bytes allocated
==15553== 
==15553== 18 bytes in 1 blocks are definitely lost in loss record 15,813 of 64,959
==15553==    at 0xB936: malloc_zone_malloc (vg_replace_malloc.c:267)
==15553==    by 0x55438C6: malloc_set_zone_name (in /usr/lib/system/libsystem_c.dylib)
==15553==    by 0x5543DF2: _malloc_initialize (in /usr/lib/system/libsystem_c.dylib)
==15553==    by 0x5544201: malloc_create_zone (in /usr/lib/system/libsystem_c.dylib)
==15553==    by 0x54B2A34: putenv (in /usr/lib/system/libsystem_c.dylib)
==15553==    by 0x1C33453: DylibAdded(mach_header const*, long) (in /Applications/root_v5.30/lib/libCore.so)
==15553==    by 0x7FFF5FC02F0E: dyld::registerAddCallback(void (*)(mach_header const*, long)) (in /usr/lib/dyld)
==15553==    by 0x5460EBF: _dyld_register_func_for_add_image (in /usr/lib/system/libdyld.dylib)
==15553==    by 0x1C303E7: TUnixSystem::Init() (in /Applications/root_v5.30/lib/libCore.so)
==15553==    by 0x1B275C4: TROOT::InitSystem() (in /Applications/root_v5.30/lib/libCore.so)
==15553==    by 0x1B28F92: TROOT::TROOT(char const*, char const*, void (**)()) (in /Applications/root_v5.30/lib/libCore.so)
==15553==    by 0x1B2A2DB: ROOT::GetROOT() (in /Applications/root_v5.30/lib/libCore.so)
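This 18-byte block is allocated a single time while ROOT starts up (ROOT::GetROOT() -> TROOT::TROOT() -> TUnixSystem::Init() -> putenv), so its size should not depend on how many events are simulated. What can exhaust a 16 GB node is a leak whose total keeps growing with the event count. The toy model below uses hypothetical names unrelated to ROOT or Gate and only makes that distinction concrete.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of one simulation process (hypothetical, not ROOT/Gate code):
// tracks how many bytes would be leaked after a given number of events.
class LeakModel {
public:
    LeakModel(std::size_t oneTimeBytes, std::size_t perEventBytes)
        : oneTime_(oneTimeBytes), perEvent_(perEventBytes) {}

    // Called once per event. The one-time cost is paid on the first call only,
    // like a start-up allocation; the per-event cost accumulates on every call.
    void processEvent() {
        if (!initialized_) {
            leaked_ += oneTime_;
            initialized_ = true;
        }
        leaked_ += perEvent_;
    }

    std::size_t leakedBytes() const { return leaked_; }

private:
    std::size_t oneTime_;
    std::size_t perEvent_;
    std::size_t leaked_ = 0;
    bool initialized_ = false;
};
```

With LeakModel(18, 0) the total stays at 18 bytes however many events are processed; any nonzero per-event cost makes it grow linearly with the number of events.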


2) This one seems related to the Fast I-124 source type that I use in my simulations. However, it looks like this is only "possibly" a memory leak. 

==15553== 24 bytes in 1 blocks are possibly lost in loss record 18,221 of 64,959
==15553==    at 0xB823: malloc (vg_replace_malloc.c:266)
==15553==    by 0x530A68D: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x104B75: GateFastI124::InitializeFastI124() (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x27732D: GateVSource::GeneratePrimariesForFastI124Source(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x277801: GateVSource::GeneratePrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1F1D8A: GateSourceMgr::PrepareNextEvent(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1B5108: GatePrimaryGeneratorAction::GenerateSimulationPrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1B51E1: GatePrimaryGeneratorAction::GeneratePrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x312C72D: G4RunManager::GenerateEvent(int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x312BF66: G4RunManager::DoEventLoop(int, char const*, int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x312B5BE: G4RunManager::BeamOn(int, char const*, int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x96D6B: GateApplicationMgr::StartDAQCluster(CLHEP::Hep3Vector) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
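For reference, Valgrind's Memcheck calls a block "definitely lost" when no pointer to it remains anywhere, and "possibly lost" when the only remaining pointers point into the middle of the block rather than to its start. The sketch below deliberately leaks one block of each kind, plus one block that stays reachable, so the categories can be compared in a report; the function and variable names are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative leaks for comparing Valgrind categories (names are made up).
// Build without optimization (e.g. -O0 -g) so the compiler keeps the calls,
// then run the program under: valgrind --leak-check=full ./program

static char* g_interiorOnly = nullptr;  // points 8 bytes into its block
static void* g_reachable = nullptr;     // points to the start of its block

// Every pointer to the block is dropped: reported as "definitely lost".
void leakDefinitely() {
    void* block = std::malloc(18);
    (void)block;  // goes out of scope without free()
}

// Only a pointer into the middle of the block survives: "possibly lost".
void leakPossibly() {
    char* block = static_cast<char*>(std::malloc(24));
    g_interiorOnly = block ? block + 8 : nullptr;
}

// A start-of-block pointer survives until exit: counted as "still reachable".
void keepReachable() {
    g_reachable = std::malloc(16);
}
```

A "possibly lost" record, like the 24-byte one above, therefore means Valgrind found some pointer into the block but not to its start, which is often a false positive rather than a real leak.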


3) These next two show up very often.

==15553== 26 bytes in 1 blocks are possibly lost in loss record 18,322 of 64,959
==15553==    at 0xB823: malloc (vg_replace_malloc.c:266)
==15553==    by 0x530A68D: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x52F7809: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x52F8A3B: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x52F8B2C: std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x4EC869B: G4UIcontrolMessenger::G4UIcontrolMessenger() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EC9A47: G4UImanager::CreateMessenger() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4ECC2FC: G4UImanager::GetUIpointer() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EB5A13: G4UIcommand::G4UIcommandCommonConstructorCode(char const*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EB7844: G4UIcommand::G4UIcommand(char const*, G4UImessenger*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EC97C2: G4UIdirectory::G4UIdirectory(char const*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x8C5A6: GateActorManagerMessenger::BuildCommands(G4String) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)

==15553== 26 bytes in 1 blocks are possibly lost in loss record 18,325 of 64,959
==15553==    at 0xB823: malloc (vg_replace_malloc.c:266)
==15553==    by 0x530A68D: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x52F7809: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x667734: char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag) (in /Applications/root_v5.30/lib/libHist.so)
==15553==    by 0x6677CC: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<char*>(char*, char*, std::allocator<char> const&) (in /Applications/root_v5.30/lib/libHist.so)
==15553==    by 0x4ED18A4: G4UIparameter::SetDefaultValue(double) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EC7E53: G4UIcontrolMessenger::G4UIcontrolMessenger() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EC9A47: G4UImanager::CreateMessenger() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4ECC2FC: G4UImanager::GetUIpointer() (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EB5A13: G4UIcommand::G4UIcommandCommonConstructorCode(char const*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EB7844: G4UIcommand::G4UIcommand(char const*, G4UImessenger*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
==15553==    by 0x4EC97C2: G4UIdirectory::G4UIdirectory(char const*) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4intercoms.dylib)
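Both records come from std::string objects built inside G4UIcontrolMessenger when G4UImanager is first created, so they are one-off allocations. A plausible reason they are only "possibly" lost: the GCC libstdc++ std::string of that era is reference-counted, and the string object stores a pointer to its characters, which sit after a bookkeeping header (the _Rep seen in the trace) at the start of the heap block. Valgrind therefore sees only an interior pointer. Newer Valgrind releases (3.10 and later) can recognise this case with --leak-check-heuristics=stdstring. The sketch below mimics that header-then-characters layout; CowLikeString and RepHeader are hypothetical names, not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical mimic of a reference-counted string's memory layout.
// This is NOT the libstdc++ code; it only reproduces the shape
// "bookkeeping header, then characters" inside one heap block.
struct RepHeader {
    std::size_t length;
    std::size_t capacity;
    int refcount;
};

class CowLikeString {
public:
    explicit CowLikeString(const char* s) {
        const std::size_t n = std::strlen(s);
        void* block = std::malloc(sizeof(RepHeader) + n + 1);
        if (block == nullptr) throw std::bad_alloc();
        RepHeader* rep = static_cast<RepHeader*>(block);
        rep->length = n;
        rep->capacity = n;
        rep->refcount = 0;
        data_ = reinterpret_cast<char*>(rep + 1);  // first byte after the header
        std::memcpy(data_, s, n + 1);              // copy including the '\0'
    }
    ~CowLikeString() { std::free(header()); }
    CowLikeString(const CowLikeString&) = delete;
    CowLikeString& operator=(const CowLikeString&) = delete;

    const char* c_str() const { return data_; }
    std::size_t size() const { return header()->length; }

    // Distance from the start of the heap block to the pointer the object keeps.
    static constexpr std::size_t interiorOffset() { return sizeof(RepHeader); }

private:
    RepHeader* header() const { return reinterpret_cast<RepHeader*>(data_) - 1; }
    char* data_;  // the only pointer stored: it points inside the block
};
```

If an object like this lives in a long-lived singleton that is never destroyed, the only pointer Valgrind finds at exit is data_, which is why such blocks show up as "possibly lost".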


4) Later on, there are some "definite" memory leaks with the Fast I-124 source type:

==15553== 449,186,440 (3,278,808 direct, 445,907,632 indirect) bytes in 409,851 blocks are definitely lost in loss record 64,959 of 64,959
==15553==    at 0xB823: malloc (vg_replace_malloc.c:266)
==15553==    by 0x530A68D: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==15553==    by 0x104B68: GateFastI124::InitializeFastI124() (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x27732D: GateVSource::GeneratePrimariesForFastI124Source(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x277801: GateVSource::GeneratePrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1F1D8A: GateSourceMgr::PrepareNextEvent(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1B5108: GatePrimaryGeneratorAction::GenerateSimulationPrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x1B51E1: GatePrimaryGeneratorAction::GeneratePrimaries(G4Event*) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
==15553==    by 0x312C72D: G4RunManager::GenerateEvent(int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x312BF66: G4RunManager::DoEventLoop(int, char const*, int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x312B5BE: G4RunManager::BeamOn(int, char const*, int) (in /Applications/geant4.9.4.p01/lib/Darwin-g++/libG4run.dylib)
==15553==    by 0x96D6B: GateApplicationMgr::StartDAQCluster(CLHEP::Hep3Vector) (in /Applications/gate_v6.1/tmp/Darwin-g++/Gate/libGate.dylib)
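The numbers in this record are worth splitting up. Assuming the block count refers to the directly lost blocks (which the exact division below supports):

```latex
\frac{3\,278\,808\ \text{B}}{409\,851} = 8\ \text{B per direct block},\qquad
\frac{445\,907\,632\ \text{B}}{409\,851} \approx 1\,088\ \text{B indirect per block},\qquad
\frac{449\,186\,440\ \text{B}}{500\,865\,360\ \text{B}} \approx 0.90
```

Each directly lost block is 8 bytes, consistent with a single pointer on a 64-bit build, and on average keeps about 1 KB of further data alive. This one record accounts for roughly 90% of the heap still in use at exit, which makes it a far more likely cause of memory exhaustion than the small one-off records above.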


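Trace 4 shows InitializeFastI124() being reached from GeneratePrimariesForFastI124Source() inside the event loop (G4RunManager::DoEventLoop -> GenerateEvent). If that initialization runs for every event and allocates with new without releasing what the previous call built, memory grows with the event count. This is a hypothesis based on the trace alone, not on the Gate 6.1 source. The sketch below shows the usual remedy with hypothetical names: initialize lazily once, and hold the result in a std::unique_ptr so any re-initialization frees the old object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for data built during source initialization;
// the actual contents of GateFastI124 may be quite different.
struct DecayTable {
    std::vector<double> values;
    explicit DecayTable(std::size_t n) : values(n, 0.0) {}
};

// Hypothetical source class showing two safeguards: initialize lazily once,
// and own the table through std::unique_ptr so nothing can be orphaned.
class FastSourceSketch {
public:
    // Called for every event, like GeneratePrimaries in trace 4.
    void generatePrimaries() {
        if (!table_) {  // build the table on the first event only
            rebuildTable();
        }
        // ... sample the emitted particles from *table_ ...
    }

    // Explicit re-initialization: assigning a new unique_ptr deletes the old table.
    void rebuildTable() {
        table_ = std::make_unique<DecayTable>(128);
        ++builds_;
    }

    std::size_t builds() const { return builds_; }

private:
    std::unique_ptr<DecayTable> table_;
    std::size_t builds_ = 0;
};
```

Either safeguard alone would stop the per-event growth; together they also avoid rebuilding the table needlessly. Requires C++14 for std::make_unique.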


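To check whether memory really grows with the number of events on the cluster nodes themselves, where running under Valgrind is much slower, one option is to log the process's peak resident set size every few thousand events, for example from Geant4's G4UserEventAction::EndOfEventAction() or wherever per-event code can be added. A minimal sketch assuming a POSIX system; peakRssKiB and logMemoryEvery are hypothetical helper names. Because the peak only ever rises, steady growth across samples points to memory that is never released.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/resource.h>

// Returns the process's peak resident set size in KiB, or -1 on error.
// ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
long peakRssKiB() {
    rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
#ifdef __APPLE__
    return usage.ru_maxrss / 1024;
#else
    return usage.ru_maxrss;
#endif
}

// Hypothetical helper: call once per event; prints a line every `interval`
// events and returns true when it printed one.
bool logMemoryEvery(long eventId, long interval) {
    assert(interval > 0);
    if (eventId % interval != 0) {
        return false;
    }
    std::fprintf(stderr, "event %ld: peak RSS %ld KiB\n", eventId, peakRssKiB());
    return true;
}
```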
Again, I'd appreciate any help with this situation. Thanks in advance!

Cheers!
Marc



__________________________


Marc Chamberland, MSc
PhD candidate
Department of Physics
Carleton University
Ottawa (ON)

On 2012-02-10, at 1:37 PM, Marc Chamberland wrote:

> I've noticed that the runs that hang were running on nodes that (I think) ran out of memory, even though the nodes all have 16 GB of RAM. I'm not sure how to check for a memory leak, or what could be causing one in the simulation...
> 
> Could anyone offer pointers? Thanks!
> 
> Marc
> 
> On 2012-02-10, at 1:18 PM, Marc Chamberland wrote:
> 
>> Hi Gate Users!
>> 
>> I'm running into some problems when running my simulations on a cluster.
>> 
>> Some of the runs seem to hang for a long time. Their Root output files grow in size for a while, then stop growing, and are not closed properly at the end of the simulation. In addition, the terminal output of one of these runs shows that it hangs at an odd place and then simply terminates with the word "Killed". Here's an example where I've left out most of the output:
>> 
>> -----------
>> 
>> [G4] 
>> [G4] *************************************************************
>> [G4]  Geant4 version Name: geant4-09-04-patch-01    (18-February-2011)
>> [G4]                       Copyright : Geant4 Collaboration
>> [G4]                       Reference : NIM A 506 (2003), 250-303
>> [G4]                             WWW : http://cern.ch/geant4
>> [G4] *************************************************************
>> [G4] 
>> [Core-0] Initialization of geometry
>> [Core-0] Initialization of physics
>> [Core-0] Initialization of actors
>> [Core-0] 
>> [Core-0] **********************************************************************
>> [Core-0]  GATE version name: gate_v6.1                                         
>> [Core-0]                     Copyright : OpenGATE Collaboration                
>> [Core-0]                     Reference : Phys. Med. Biol. 49 (2004) 4543-4561  
>> [Core-0]                     Reference : Phys. Med. Biol. 56 (2011) 881-901    
>> [Core-0]                     WWW : http://www.opengatecollaboration.org/       
>> [Core-0] **********************************************************************
>> [Core-0] 
>> [Core-0] Starting macro ./.Gate/bgsrc3/bgsrc33.mac
>> [G4] 
>> [G4] GATE object:        'systems/cylindricalPET'
>> [G4] Components:    
>> [G4] 
>> [G4] GATE object:        'systems/cylindricalPET/base'
>> [G4] Attached to volume: cylindricalPET
>> [G4] Nb of children:       1
>> 
>> 
>> (Skipping a lot of uninteresting stuff...)
>> 
>>   Voxelisation: top memory users:
>> [G4]     Percent     Memory      Heads    Nodes   Pointers    Total CPU    Volume
>>   -------   --------     ------   ------   --------   ----------    ----------
>> [G4]       60.43         27k        59      351        924         0.01    cylindricalPET_log
>> [G4]       27.41         12k        37      146        406         0.00    module_log
>> [G4]        4.62          2k         6       27         56         0.00    world_log
>> [G4]        3.68          1k         1       29         34         0.00    Colair_log
>> [G4]        3.07          1k         5       16         40         0.00    rsector_log
>> [G4]        0.80       Killed
>> 
>> --------------------
>> 
>> All the runs that hang seem to hang at that same place in their terminal output, just before the "Killed" word. 
>> 
>> By contrast, here is a run that ran and terminated properly:
>> 
>> -----------
>> 
>>   Voxelisation: top memory users:
>> [G4]     Percent     Memory      Heads    Nodes   Pointers    Total CPU    Volume
>>   -------   --------     ------   ------   --------   ----------    ----------
>> [G4]       60.43         27k        59      351        924         0.01    cylindricalPET_log
>> [G4]       27.41         12k        37      146        406         0.00    module_log
>> [G4]        4.62          2k         6       27         56         0.00    world_log
>> [G4]        3.68          1k         1       29         34         0.00    Colair_log
>> [G4]        3.07          1k         5       16         40         0.00    rsector_log
>> [G4]        0.80          0k         1        5          8         0.00    Advcol_log
>> [G4] Start Run processing.
>> [G4] Run terminated.
>> [G4] Run Summary
>> [G4]   Run Aborted after 27051062 events processed.
>> [G4]   User=4744.99s Real=6005.04s Sys=333.28s
>> [Core-0] End of macro ./.Gate/bgsrc3/bgsrc31.mac
>> [G4] UserDetectorConstruction deleted.
>> [G4] UserPhysicsList deleted.
>> [G4] UserRunAction deleted.
>> [G4] UserPrimaryGenerator deleted.
>> [G4] G4 kernel has come to Quit state.
>> [G4] G4SDManager deleted.
>> [G4] EventManager deleted.
>> UImanager deleted.
>> Units table cleared.
>> StateManager deleted.
>> RunManagerKernel is deleted.
>> RunManager is deleting.
>> 
>> ----------------
>> 
>> 
>> Does anyone have any idea what might be causing this? Thank you for your time!
>> 
>> I'm using Gate 6.1 with Root 5.30.p02 and Geant4.9.4.p01.
>> 
>> Marc
>> 
>> _______________________________________________
>> Gate-users mailing list
>> Gate-users at lists.opengatecollaboration.org
>> http://lists.opengatecollaboration.org/mailman/listinfo/gate-users

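In the first quoted message, the hanging runs end with the single word "Killed". On Linux, shells such as bash print that word when a job is ended by SIGKILL, which is also the signal the kernel's out-of-memory killer sends (batch systems enforcing memory limits may send it too). That fits the memory-exhaustion hypothesis, and because SIGKILL cannot be caught, the simulation gets no chance to close its Root output file. The kernel log on the node (for example via dmesg) or the batch system's logs would be needed to confirm it. A minimal POSIX sketch of how a wrapper could tell a signal kill from a normal exit; terminatingSignal and the two stand-in functions are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` in a child process. Returns the number of the signal that
// terminated the child, 0 if it exited normally, or -1 on fork/wait failure.
int terminatingSignal(void (*work)()) {
    assert(work != nullptr);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {  // child process
        work();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Stand-ins for a simulation run: one finishes, one dies the way an
// out-of-memory kill would end it (SIGKILL cannot be caught or ignored).
void finishesNormally() {}
void killedBySigkill() { std::raise(SIGKILL); }
```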

