[Gate-users] Adding G4MTRunManager Support to GATE

Marc Verderi verderi at in2p3.fr
Wed Apr 1 09:49:26 CEST 2015


Hi Alex,

The part you wrote in the GateActionInitialization class is fine to me. 
I suspect that the problem (please remember I don't know the GATE code) may 
come from:

new GateUserActions( runManager, myRecords );

Since the runManager is passed to this class, it may set the event action 
internally, using the non-MT methods?

Cheers,
Marc

On 03/31/2015 08:55 PM, Alex Vergara Gil wrote:
> Dear Marc
>
>   I've managed to update a few things; however, the same message appears
> when running Gate, and I can't manage to get rid of it.
>          [G4-cerr]
> -------- EEEE ------- G4Exception-START -------- EEEE -------
> *** G4Exception : Run3011
>        issued by : G4MTRunManager::SetUserAction()
> For multi-threaded version, define G4UserEventAction in
> G4VUserActionInitialization.
> *** Fatal Exception *** core dump ***
> -------- EEEE -------- G4Exception-END --------- EEEE -------
>
>          [G4-cerr]
>          [G4-cerr] *** G4Exception: Aborting execution ***
> Aborted (core dumped)
>
> I suspect something is not initialized properly
> Regards
>
> Alex
>
> 2015-03-31 12:09 GMT-04:00, Alex Vergara Gil <alexvergaragil at gmail.com>:
>> Dear Marc
>>
>> Thanks for your support, I will study these recommendations and let
>> you know as soon as I get something new.
>>
>> Regards
>> Alex
>>
>> 2015-03-31 11:22 GMT-04:00, Marc Verderi <verderi at in2p3.fr>:
>>> Dear Alex,
>>>
>>>       Thanks for your work and message. I see you got the bulk of the MT.
>>> I put below several things to look at / consider. Please note I know
>>> almost nothing about the Gate code itself...
>>>
>>>       To summarize the issues, the "pCallbackMan" and "recorder"
>>> arguments in the action initialization need some design consideration as
>>> threads will very likely conflict on these objects. I would guess that
>>> the most significant issues will be here. Please see below for more
>>> details.
>>>
>>>       I'll be happy to help more if I can, or involve some of the G4
>>> experts on MT if needed !
>>>
>>> Cheers,
>>> Marc
>>>
>>>
>>> o The lines:
>>>       G4int nThreads = G4Threading::G4GetNumberOfCores();
>>>       runManager->SetNumberOfThreads(nThreads); // Is equal to 2 by default
>>>
>>>       are correct. Please note that you may use this number as a maximum
>>> number of threads, not as the default number.
>>>       For debugging purposes, I would suggest starting with 2 cores only
>>> and, once the case looks clean, increasing the number of cores.
>>>
>>> o The line:
>>>       runManager->SetUserInitialization( GatePhysicsList::GetInstance() );
>>>
>>>       looks correct to me.
>>>       One question is whether Gate has implemented "home-made" physics
>>> processes (G4VProcess) in this physics list. If so, they should comply
>>> with the new G4VProcess interface, which has methods for the MT case.
>>>
>>> o There are several lines with potential problems (I gather together
>>> lines concerned):
>>>       // Set the Basic ROOT Output
>>>       GateRecorderBase* myRecords = 0;
>>>
>>>       --> ** ROOT is not thread safe ! ** For this reason, Geant4
>>> provides, in the "analysis" package, many, but not all, of the ROOT
>>> functionalities to create histograms and trees. The histograms are
>>> filled in each thread, and their contents are merged at the end of the
>>> job. The trees are dumped individually by the threads (not merged) and
>>> should be analyzed using a chain.
>>>       As a first stage, I would recommend switching off the recording, to
>>> get the rest of the machinery right, and only then adding the output
>>> functionalities back.
>>>
>>>
>>>       // Set the users actions to handle callback for actors - before the
>>>       // initialisation
>>>       GateUserActions* myActions = new GateUserActions( runManager,
>>> myRecords );
>>>       runManager->SetUserInitialization( new GateActionInitialization(
>>> myActions, myRecords ) );
>>>       and the constructor:
>>>       GateActionInitialization(GateUserActions * cbm, GateRecorderBase *
>>> r);
>>>       with the lines in the GateActionInitialization class with arguments
>>> "pCallbackMan, recorder", especially in the Build() method.
>>>
>>>       --> very likely this will not work. 'pCallbackMan' and 'recorder'
>>> are the same objects, shared among the threads, given the way they are
>>> created and passed to the action initialization. What will happen is that
>>> they will be messaged at arbitrary times by the threads during the event
>>> loop: thread 1 calls method a() and, while a() is being processed,
>>> method b() is called by thread 2, and thread 3 calls a() again while
>>> thread 1 is still inside it; if data members are changed inside these
>>> methods, this will result in unpredictable behavior. The recorder, which
>>> I understand is the ROOT-based class, should be redesigned using
>>> "analysis" to avoid these conflicts, and one instance of it (a priori)
>>> should be made per thread to make the recording independent among the
>>> threads.
>>>       For pCallbackMan, I admit my ignorance. It looks like a
>>> configuration class that is only read at that time. Is this
>>> correct ? If so, this should not be too problematic. But certainly, some
>>> iteration is needed here.
>>>
>>> o At first sight, the rest looks fine to me. One comment is that the
>>> "action initialisation" mechanism also works for the usual G4RunManager,
>>> so that some #ifdef ... #endif could be removed. In the G4RunManager
>>> case, BuildForMaster() is ignored.
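
The "fill in each thread, merge at the end of the job" scheme described in the list above can be sketched in plain C++ as follows. This is only an illustration of the pattern, assuming a toy fixed-size histogram: the names Histogram, kBins, fillWorker and runAndMerge are hypothetical and come neither from Geant4's analysis package nor from GATE or ROOT.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

constexpr std::size_t kBins = 4;           // toy histogram size (assumption)
using Histogram = std::array<long, kBins>;

// Each worker fills only its own histogram, so no locking is needed.
void fillWorker(Histogram& local, int firstEvent, int lastEvent) {
    for (int e = firstEvent; e < lastEvent; ++e) {
        // Toy "measurement": the bin is derived from the event number.
        local[static_cast<std::size_t>(e) % kBins] += 1;
    }
}

// Splits nEvents over nThreads workers, then merges after all have joined.
Histogram runAndMerge(int nThreads, int nEvents) {
    std::vector<Histogram> perThread(static_cast<std::size_t>(nThreads),
                                     Histogram{});
    std::vector<std::thread> workers;
    const int chunk = nEvents / nThreads;
    for (int t = 0; t < nThreads; ++t) {
        const int first = t * chunk;
        // The last worker also takes the remainder of the division.
        const int last = (t == nThreads - 1) ? nEvents : first + chunk;
        workers.emplace_back(fillWorker, std::ref(perThread[t]), first, last);
    }
    for (auto& w : workers) w.join();

    // Merge step: runs in one thread only, after the workers are done.
    Histogram merged{};
    for (const auto& h : perThread)
        for (std::size_t b = 0; b < kBins; ++b) merged[b] += h[b];
    return merged;
}
```

Because the per-thread histograms are never shared during the event loop, the merged result does not depend on the number of threads: runAndMerge(1, 10) and runAndMerge(3, 10) give the same bin contents. A shared recorder written to by all threads would need locking instead, which is the conflict described for 'recorder' above.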
>>>
>>>
>>>
>>>
>>> On 03/31/2015 02:57 PM, Alex Vergara Gil wrote:
>>>> Dear All
>>>>
>>>> I have managed to create a patch that makes Gate use G4MTRunManager.
>>>> It compiles fine and runs, but it doesn't run in several threads.
>>>> I need somebody to guide me in the right direction.
>>>>
>>>> Dear Marc
>>>>
>>>> Thanks a lot for your suggestions; they helped me a lot in creating this
>>>> patch. Could you or some G4 member take a look at this and see what is
>>>> happening here?
>>>>
>>>> Regards
>>>> Alex
>>>>
>>>> PS: Dear Marc, sorry for mailing you twice; I forgot to tick the
>>>> "reply to all" box.
>>>>
>>>> 2015-03-26 9:31 GMT-04:00, Marc Verderi <verderi at in2p3.fr>:
>>>>> Dear All,
>>>>>
>>>>>        The interest of G4MTRunManager is that the geometry and the
>>>>> cross-section tables are shared among the threads. For big applications
>>>>> -and simulation of phantom irradiation is one example- this represents
>>>>> a large amount of memory. For machines with many cores, spawning N jobs
>>>>> of such an application may exhaust the memory, preventing the use of all
>>>>> the available cores. By sharing geometry and cross-section tables, the
>>>>> G4MTRunManager saves a large fraction of memory, allowing many more
>>>>> cores to be used. Some tests have been done by Geant4 on Xeon Phi, see
>>>>> for example
>>>>> https://twiki.cern.ch/twiki/bin/view/Geant4/MultiThreadingTaskForce#CPU_and_Memory_Performances
>>>>> and one single application of high energy physics type (simplified CMS
>>>>> simulation) could run smoothly with 240 threads, the maximum available
>>>>> (the machine has 60 user cores, up to 4 threads/core). Without MT, just
>>>>> spawning jobs, only ~30 jobs could have been run in parallel, leaving
>>>>> 30 cores unoccupied, because of lack of memory !
>>>>>
>>>>>        Moving to multi-threading has some constraints. Each thread
>>>>> processes a bunch of events. Events are hence generated and processed
>>>>> in parallel, independently. This means that the primary generator
>>>>> action, event action and stepping action must have independent
>>>>> instances in each thread. This is the very purpose of the new class
>>>>> G4VUserActionInitialization : the method "Build()" is called for each
>>>>> thread, to instantiate the above actions in each of them. For the run
>>>>> action it is a bit more complicated : a run action may be for the
>>>>> entire application, or may be for each thread. For an "all application"
>>>>> action, BuildForMaster() has to be used.
>>>>>        This independence of threads has a similar impact on sensitive
>>>>> detectors : for these, the G4VUserDetectorConstruction class has a new
>>>>> method : ConstructSDandField(). Again, this method is called for each
>>>>> thread, so that sensitive detectors and fields live independent lives
>>>>> in the various threads.
>>>>>        This looks like quite a lot of work, but is not that heavy in
>>>>> practice.
>>>>>
>>>>>        In practice also, what has to be taken care of in your code are
>>>>> "static" variables : at each occurrence of a static variable, you have
>>>>> to think whether this variable has to be common to the entire
>>>>> application -a "true" static-, or common to the thread only : a
>>>>> "thread local" static. In most cases, static variables are static to
>>>>> the thread. For the case of a true "static", be aware that this means
>>>>> that each thread may access the variable, at any time. If this variable
>>>>> is read & written during the processing, it will have quite
>>>>> unpredictable behavior, and this is a source of debugging headaches ;) .
>>>>> Any random crash -often non-reproducible between two runs- is a sign of
>>>>> this sort of conflict.
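
The distinction between a "true" static and a "thread local" static can be shown with standard C++11 alone. The sketch below is an illustration under that assumption, not Geant4 or GATE code; the names gEventsAllThreads, tEventsThisThread, processEvents and eachThreadCountsOnlyItself are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// "True" static: one instance for the whole application. Every thread
// touches it, so it has to be protected (here by making it atomic).
static std::atomic<int> gEventsAllThreads{0};

// "Thread local" static: each thread gets its own copy, so plain
// unsynchronized access is safe.
static thread_local int tEventsThisThread = 0;

// Processes nEvents in the calling thread; returns that thread's own count.
int processEvents(int nEvents) {
    for (int i = 0; i < nEvents; ++i) {
        ++tEventsThisThread;
        gEventsAllThreads.fetch_add(1, std::memory_order_relaxed);
    }
    return tEventsThisThread;
}

// Runs nThreads workers; true if each worker saw only its own events.
bool eachThreadCountsOnlyItself(int nThreads, int nEvents) {
    std::vector<int> seen(static_cast<std::size_t>(nThreads), 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        // Each worker writes to a distinct element, so there is no race.
        workers.emplace_back(
            [&seen, t, nEvents] { seen[t] = processEvents(nEvents); });
    }
    for (auto& w : workers) w.join();
    for (int v : seen) {
        if (v != nEvents) return false;
    }
    return true;
}
```

With 4 threads and 1000 events each, every thread-local counter ends at 1000 while the shared counter ends at 4000. If the shared counter were a plain int instead of std::atomic<int>, the concurrent read & write would be exactly the kind of conflict described above.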
>>>>>
>>>>>        Most of the G4 examples (basic, extended) are provided in MT
>>>>> mode, and are good starting points.
>>>>>
>>>>>        Hope this helps.
>>>>>
>>>>> Cheers,
>>>>> Marc (a G4 member)
>>>>>
>>>>>
>>>>> On 03/26/2015 01:02 PM, Alex Vergara Gil wrote:
>>>>>> Dear All
>>>>>>
>>>>>> I started this thread to bring together all the enthusiasts who want to
>>>>>> add G4MTRunManager support to GATE. The advantages of a multi-threaded
>>>>>> run manager are obvious, but I will explain them here anyway. I
>>>>>> will also send you my first patch and the problems I am facing.
>>>>>>
>>>>>> Advantages
>>>>>> 1. You will not depend on external cluster software to run on a
>>>>>> single multi-CPU PC.
>>>>>> 2. The time you need for simulation scales linearly with the number of
>>>>>> CPUs.
>>>>>> 3. You don't need to merge the outputs, since this is performed
>>>>>> automatically.
>>>>>> 4. Any others you may add.
>>>>>>
>>>>>> My first patch
>>>>>>
>>>>>> <start of the code>
>>>>>>
>>>>>> Author: Alex Vergara Gil <alexvergaragil at gmail.com>  2015-03-25
>>>>>> 17:13:45
>>>>>> Committer: Alex Vergara Gil <alexvergaragil at gmail.com>  2015-03-25
>>>>>> 17:13:45
>>>>>> Parent: db6875e64d60ad1e0f2d100c496843632acb23c8 (Merge
>>>>>> https://github.com/OpenGATE/Gate)
>>>>>> Child:  28c338cd3263108df3927db14c6975f4cdcc31b4 (Added the
>>>>>> UserActionInitialization)
>>>>>> Branch: partopc
>>>>>> Follows:
>>>>>> Precedes:
>>>>>>
>>>>>>        trying g4mtRunManager
>>>>>>
>>>>>> ------------------- source/general/include/GateRunManager.hh
>>>>>> -------------------
>>>>>> index c4164d9..b72327b 100644
>>>>>> @@ -28,12 +28,19 @@
>>>>>>     #define GateRunManager_h 1
>>>>>>
>>>>>>     #include "G4RunManager.hh"
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> + #include "G4MTRunManager.hh"
>>>>>> +#endif
>>>>>>     #include "GateHounsfieldToMaterialsBuilder.hh"
>>>>>>
>>>>>>     class GateRunManagerMessenger;
>>>>>>     class GateDetectorConstruction;
>>>>>>
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> +class GateRunManager : public G4MTRunManager
>>>>>> +#else
>>>>>>     class GateRunManager : public G4RunManager
>>>>>> +#endif
>>>>>>     {
>>>>>>     public:
>>>>>>       //! Constructor
>>>>>> @@ -60,8 +67,11 @@ public:
>>>>>>
>>>>>>       //! Return the instance of the run manager
>>>>>>       static GateRunManager* GetRunManager()
>>>>>> +  #ifdef G4MULTITHREADED
>>>>>> +  {	return dynamic_cast<GateRunManager*>(G4MTRunManager::GetRunManager()); }
>>>>>> +  #else
>>>>>>       {	return dynamic_cast<GateRunManager*>(G4RunManager::GetRunManager()); }
>>>>>> -
>>>>>> +  #endif
>>>>>>       bool GetGlobalOutputFlag() { return mGlobalOutputFlag; }
>>>>>>       void EnableGlobalOutput(bool b) { mGlobalOutputFlag = b; }
>>>>>>       void SetUserPhysicList(G4VUserPhysicsList * m) { mUserPhysicList = m; }
>>>>>>
>>>>>> --------------------- source/general/src/GateRunManager.cc
>>>>>> ---------------------
>>>>>> index 2604e47..75b3fb5 100644
>>>>>> @@ -8,6 +8,9 @@
>>>>>>
>>>>>>
>>>>>>     #include "GateRunManager.hh"
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> + #include "G4MTRunManager.hh"
>>>>>> +#endif
>>>>>>     #include "GateDetectorConstruction.hh"
>>>>>>     #include "GateRunManagerMessenger.hh"
>>>>>>     #include "GateHounsfieldToMaterialsBuilder.hh"
>>>>>> @@ -27,7 +30,11 @@
>>>>>>     #endif
>>>>>>
>>>>>>
>>>>>> //----------------------------------------------------------------------------------------
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> +GateRunManager::GateRunManager():G4MTRunManager()
>>>>>> +#else
>>>>>>     GateRunManager::GateRunManager():G4RunManager()
>>>>>> +#endif
>>>>>>     {
>>>>>>       pMessenger = new GateRunManagerMessenger(this);
>>>>>>       mHounsfieldToMaterialsBuilder = new
>>>>>> GateHounsfieldToMaterialsBuilder();
>>>>>> @@ -112,7 +119,11 @@ void GateRunManager::InitializeAll()
>>>>>>
>>>>>> G4ProductionCutsTable::GetProductionCutsTable()->GetHighEdgeEnergy());
>>>>>>
>>>>>>         // Initialization
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> +    G4MTRunManager::SetUserInitialization(mUserPhysicList);
>>>>>> +#else
>>>>>>         G4RunManager::SetUserInitialization(mUserPhysicList);
>>>>>> +#endif
>>>>>>
>>>>>>         //To take into account the user cuts (steplimiter and special
>>>>>> cuts)
>>>>>>     #if (G4VERSION_MAJOR > 9)
>>>>>> @@ -126,7 +137,11 @@ void GateRunManager::InitializeAll()
>>>>>>       } // End if (mUserPhysicListName != "")
>>>>>>
>>>>>>       // InitializePhysics
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> +  G4MTRunManager::InitializePhysics();
>>>>>> +#else
>>>>>>       G4RunManager::InitializePhysics();
>>>>>> +#endif
>>>>>>
>>>>>>       // Take into account the em option set by the user (dedx bin etc)
>>>>>>       GatePhysicsList::GetInstance()->SetEmProcessOptions();
>>>>>> @@ -169,7 +184,11 @@ void GateRunManager::InitGeometryOnly()
>>>>>>       if (!geometryInitialized)
>>>>>>         {
>>>>>>           GateMessage("Core", 1, "Initialization of geometry" <<
>>>>>> G4endl);
>>>>>> +#ifdef G4MULTITHREADED
>>>>>> +      G4MTRunManager::InitializeGeometry();
>>>>>> +#else
>>>>>>           G4RunManager::InitializeGeometry();
>>>>>> +#endif
>>>>>>         }
>>>>>>       else
>>>>>>         {
>>>>>> @@ -189,7 +208,11 @@ void GateRunManager::InitGeometryOnly()
>>>>>>
>>>>>> //----------------------------------------------------------------------------------------
>>>>>>     void GateRunManager::InitPhysics()
>>>>>>     {
>>>>>> +  #ifdef G4MULTITHREADED
>>>>>> +  G4MTRunManager::InitializePhysics();
>>>>>> +#else
>>>>>>       G4RunManager::InitializePhysics();
>>>>>> +#endif
>>>>>>     }
>>>>>>
>>>>>> //----------------------------------------------------------------------------------------
>>>>>>
>>>>>> @@ -205,7 +228,11 @@ void GateRunManager::RunInitialization()
>>>>>>
>>>>>>       // GateMessage("Core", 0, "Initialization of the run " <<
>>>>>> G4endl);
>>>>>>       // Perform a regular initialisation
>>>>>> +  #ifdef G4MULTITHREADED
>>>>>> +  G4MTRunManager::RunInitialization();
>>>>>> +#else
>>>>>>       G4RunManager::RunInitialization();
>>>>>> +#endif
>>>>>>
>>>>>>       // Initialization of the atom deexcitation processes
>>>>>>       // must be done after all other initialization
>>>>>>
>>>>>> </end of the code>
>>>>>>
>>>>>> This patch compiles without any special warnings; however, when I try
>>>>>> to run it, it crashes with the following message:
>>>>>>
>>>>>> <start of message>
>>>>>> [G4]
>>>>>> [G4] *************************************************************
>>>>>> [G4]  Geant4 version Name: geant4-10-01 [MT]   (5-December-2014)
>>>>>> [G4]   << in Multi-threaded mode >>
>>>>>> [G4]                       Copyright : Geant4 Collaboration
>>>>>> [G4]                       Reference : NIM A 506 (2003), 250-303
>>>>>> [G4]                             WWW : http://cern.ch/geant4
>>>>>> [G4] *************************************************************
>>>>>> [G4]
>>>>>>            [G4-cerr]
>>>>>> -------- EEEE ------- G4Exception-START -------- EEEE -------
>>>>>> *** G4Exception : Run3011
>>>>>>          issued by : G4MTRunManager::SetUserAction()
>>>>>> For multi-threaded version, define G4UserEventAction in
>>>>>> G4VUserActionInitialization.
>>>>>> *** Fatal Exception *** core dump ***
>>>>>> -------- EEEE -------- G4Exception-END --------- EEEE -------
>>>>>>
>>>>>>            [G4-cerr]
>>>>>>            [G4-cerr] *** G4Exception: Aborting execution ***
>>>>>> Aborted (core dumped)
>>>>>> </end of message>
>>>>>>
>>>>>> So I wonder whether any of you has ever faced this situation and could
>>>>>> help me.
>>>>>> Best Regards
>>>>>> Alex
>>>>>> _______________________________________________
>>>>>> Gate-users mailing list
>>>>>> Gate-users at lists.opengatecollaboration.org
>>>>>> http://lists.opengatecollaboration.org/mailman/listinfo/gate-users
>>>>>
>>>


