[Gate-users] GATE performances and ROOT output saving optimisation

Wed Jan 22 14:20:06 CET 2020

Hello Antoine,

Your findings mirror what I found in some benchmark work that I did in
2018, investigating whether SMT/hyper-threading increases performance and
also quantifying the impact of memory speed on Ryzen 2000-series CPUs and
of the Spectre/Meltdown mitigations on older Ivy Bridge Intel CPUs.

You can read the results at
https://1drv.ms/b/s!Al3r4ajL5Z4jkLs7LVyGSI_MmA-uzg, but the most relevant
part for your work is that I ran this study with ROOT output disabled, to
truly see whether SMT/hyper-threading could allow for performance
increases beyond the number of physical CPU cores in the system. In short,
the simulation time stayed relatively constant (i.e. no performance
improvement) when moving beyond the number of physical cores. I saw similar
performance when writing output to a RAM disk, but did not use one for my
testing since I was also measuring the effect of different RAM speeds.
Since the IO burden of a GATE simulation can be quite high (as you've
seen), writing output to RAM would have skewed my measurements of how RAM
(and thus Infinity Fabric) speed affects computation time.
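
To put a number on that IO burden, here is a minimal Python sketch that
recomputes the effective write rate from the timings and the 3690 MB output
size quoted in your message below. The function and label names are my own,
and the rate is only a rough estimate: it assumes that all of the extra time
in a run with ROOT output is spent writing.

```python
def to_seconds(minutes, seconds):
    """Convert a wall-clock time given as minutes and seconds to seconds."""
    return 60 * minutes + seconds


def write_rate_mb_s(size_mb, with_output_s, without_output_s):
    """Output size divided by the extra time spent when ROOT output is on."""
    return size_mb / (with_output_s - without_output_s)


SIZE_MB = 3690  # total size of the .root files, from the quoted message

# (time with ROOT output, time without output), from the quoted results
runs = {
    "MacBook Pro, 4 instances": (to_seconds(21, 37), to_seconds(8, 38)),
    "MacBook Pro, 8 instances": (to_seconds(14, 10), to_seconds(8, 30)),
    "Workstation, 4 instances": (to_seconds(47, 35), to_seconds(6, 33)),
    "Workstation, 8 instances": (to_seconds(28, 58), to_seconds(3, 20)),
}

rates = {name: write_rate_mb_s(SIZE_MB, w, wo) for name, (w, wo) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/s")
```

All four rates come out at a few MB/s, far below what either kind of drive
can sustain for sequential writes, so the cost is more likely in how the
output is written than in raw drive bandwidth.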

One finding that I did not include in the report was a plot of how
simulation time changed when using a mechanical hard drive (2 TB Seagate
Barracuda). I've attached a .PNG file showing the actual simulation times
plotted alongside the ideal simulation time (i.e. the single-thread
simulation time divided by the number of threads used), and in my case the
simulation time started INCREASING as soon as I moved beyond five threads.
This was on a six-core/twelve-thread Ryzen 5 2600X with DDR4-3200 memory.
You may also note that the single-thread simulation time was closer to
sixty minutes, versus the 38 minutes seen when using a RAM disk.
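
If you want to draw the same kind of ideal curve for your own runs, a
minimal Python sketch could look like the one below. The function names are
mine. The two single-thread times are the approximate values mentioned
above; no measured multi-thread times are included, so pass in your own.

```python
def ideal_time(single_thread_minutes, n_threads):
    """Ideal time under perfect scaling: single-thread time / thread count."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return single_thread_minutes / n_threads


def parallel_efficiency(single_thread_minutes, n_threads, measured_minutes):
    """Ideal time over measured time: 1.0 is perfect scaling, lower is worse."""
    return ideal_time(single_thread_minutes, n_threads) / measured_minutes


# Approximate single-thread times mentioned above, in minutes.
SINGLE_THREAD = {"HDD": 60.0, "RAM disk": 38.0}

for storage, t1 in SINGLE_THREAD.items():
    for n in (1, 2, 4, 6, 12):
        print(f"{storage}, {n:2d} threads: ideal {ideal_time(t1, n):5.1f} min")
```

Calling parallel_efficiency() with a measured time then shows how far a run
falls from the ideal; an efficiency that drops as threads are added is the
behaviour I saw on the hard drive.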

So, to summarize, you have two possible paths forward to decrease your
simulation times. I would highly recommend investing in a solid-state drive
for your workstation, such as a Samsung 860 EVO or even something faster.
That would dramatically decrease the simulation times: not only is the SSD
much faster than a mechanical hard drive, it can also read/write to
multiple sectors of the drive without needing to wait for the physical disk
platter to spin around again, which helps especially on massive workloads.
Anandtech's "Bench" review database shows this remarkably well; I've set up
a pairing of a Samsung 860 EVO versus a Western Digital Black 7200 RPM HDD
to show how gigantic the change can be (
https://www.anandtech.com/bench/product/2198?vs=2270).

The second, and more immediate, solution is to use a RAM disk. I followed
the directions here at
https://www.linuxbabe.com/command-line/create-ramdisk-linux; bear in mind
that this is a tiny bit dangerous, since if you lose power to your computer
or need to restart, you lose all of the data you have generated so far.
Also, the RAM disk will only use as much RAM as required at a given time; a
10 GB total RAM disk that only has 2 GB of files taking up space will only
use 2 GB of RAM.
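
One way to limit that risk is to move finished output files off the RAM
disk as soon as a run completes. The Python sketch below is only an
illustration: the function name and the "*.root" pattern are my own
assumptions, and it should only be run once the GATE instances writing to
the RAM disk have finished.

```python
import shutil
from pathlib import Path


def flush_finished_outputs(ramdisk_dir, persistent_dir, pattern="*.root"):
    """Copy matching files to persistent storage, then free the RAM disk.

    Returns the list of paths created in persistent_dir.
    """
    src = Path(ramdisk_dir)
    dst = Path(persistent_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(src.glob(pattern)):
        target = dst / f.name
        shutil.copy2(f, target)
        # Delete the RAM-disk copy only once the copied size matches.
        if target.stat().st_size == f.stat().st_size:
            f.unlink()
            moved.append(target)
    return moved


def free_space_gb(directory):
    """Free space, in GB, on the filesystem holding the given directory."""
    return shutil.disk_usage(directory).free / 1e9
```

For example, flush_finished_outputs("/mnt/ramdisk/output", "/data/gate")
would move the finished .root files; both paths are hypothetical
placeholders. free_space_gb() can be used to check how much room is left on
the RAM disk before starting the next batch.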

I hope this helps you move forward! If you (or other readers) have any
questions or need some clarification, please let me know.

Cheers,

-Bryan McIntosh

On Wed, Jan 22, 2020 at 5:38 AM <
gate-users-request at lists.opengatecollaboration.org> wrote:

> Message: 1
> Date: Tue, 21 Jan 2020 17:33:29 +0100
> From: Antoine Merlet <ant.merlet at gmail.com>
> To: gate-users at lists.opengatecollaboration.org
> Subject: [Gate-users] GATE performances and ROOT output saving
>         optimisation
>
> Dear GATE users,
>
> Once again, I am looking for your educated experiences regarding the
> optimization of simulations in GATE. I am also sharing my results in the
> hope that my findings could be useful to current and future users.
>
> I recently installed GATE on a powerful workstation. Despite the expected
> speed-up from going from 4 physical cores (MacBook Pro) to 36
> (workstation), I could only get worse global simulation times. Therefore I
> performed several tests, and while they allowed me to find the reason
> behind the slow simulations (spoiler, it is writing the ROOT output on the
> HDD), they also showed other "features" of GATE.
>
> In my attempt to keep my thought process clear, I will first present both
> machines' specs, then the method for the simulations, followed by the
> results, and ending with my remarks/questions.
>
> Machines Specifications
> *MacBook* *Pro*: Intel i7 2,8GHz, *4 physical cores*, 8 logical cores,
> *SSD*
> *Workstation*: Intel Xeon Gold 3,00GHz, *36 physical cores* (2 sockets, 18
> physical each), 72 logical cores, *HDD*
>
> Simulation method
> *The same 4 tests were performed on both machines.* They consist of a PET
> camera, a digitizer and a source. In order to *split the simulation across
> several cores and run the parts* *simultaneously*, I simply generate the
> corresponding number of main files by splitting the simulation time, and
> run one instance of GATE for each main file. In the case of *ROOT output,
> Hits, Singles and Coincidences are saved.* All reported data sizes are
> from the .root files generated by the simulation.
>
> *The tests used 4 and 8 cores, with and without ROOT output*
>
>
> I) MacBook Pro results
>
>    1. 4 cores, no output: 8m38s
>    2. 8 cores, no output: 8m30s
>    3. 4 cores, ROOT output: 21m37s, 3690 MB --> 2,8MB/s
>    4. 8 cores, ROOT output: 14m10s, 3690 MB --> 4,3MB/s
>    5. 4 cores, writing time only (test 3. - test 1. ): 12m59s for 3690 MB
>    --> 4,7MB/s
>    6. 8 cores, writing time only (test 4. - test 2. ):  5m40s for 3690 MB
>    --> 10,8MB/s
>
> II) Workstation results
>
>    1. 4 cores, no output: 6m33s
>    2. 8 cores, no output: 3m20s
>    3. 4 cores, ROOT output: 47m35s, 3690 MB --> 1,3MB/s
>    4. 8 cores, ROOT output: 28m58s, 3690 MB --> 2,1MB/s
>    5. 4 cores, writing time only (test 3. - test 1. ): 41m02s for 3690 MB
>    --> 1,5MB/s
>    6. 8 cores, writing time only (test 4. - test 2. ):  25m38s for 3690 MB
>    --> 2,4MB/s
>
> III) Extra results
> My initial simulation does not have enough slices to be split into 72
> processes, so I made a new one (4,5 times longer than the original
> 16-slice one) without output to check the Hyperthreading effect on the
> workstation:
>
>    1. 36 cores, no output: 4m50s
>    2. 72 cores, no output: 3m14s
>
> Remarks
>
>    - The workstation's cores are indeed better for simulation (compare
>    *I/1.* & *II/1.*)
>    - The bottleneck on the workstation is the HDD (compare *I/3.* &
>    *II/3.*): there is no way around it I suppose, an SSD is required
>    - If there are any free physical cores upon starting a simulation, GATE
>    will be assigned to them in priority (general note, but can be seen on
>    both machines using *1. & 2.*)
>    - As stated in this section of the GATE documentation
>    <https://opengate.readthedocs.io/en/latest/how_to_use_gate_on_a_cluster.html#id3>,
>    only physical cores are beneficial to the simulation of particle
>    interaction/transport when using standard CPU architecture (compare
>    *I/1.* & *I/2.*).
>    - However, *hyperthreaded cores can still be used to manage the
>    I/O* (compare *I/3.* & *I/4.*). I did not see this noted anywhere in
>    the documentation, and it might still be useful information, as it
>    divided my output saving time by about 2 compared to the original time
>    on the MacBook Pro (compare *I/5.* & *I/6.*)
>    - *The gain in writing speed is not linear when using more physical
>    cores* (compare *II/5.* & *II/6.*). Maybe induced by how ROOT manages
>    core usage in order to compress/save data?
>    - Keeping in mind all the previous remarks, we can see in *III/1. &
>    III/2.* that *on the workstation, ALL the cores (physical and
>    hyperthreaded) are beneficial to the simulation of particle
>    interaction/transport* (linear speed increase all the way from 8 cores
>    to 72). Is it due to the architecture of the processor (server-like)?
>
> In the end, my main questions are:
>
>    - Has anyone tried to optimise the output writing speed (by any
>    means other than getting a high-bandwidth MoBo/SSD)?
>    - Excluding HDD bandwidth, what is the bottleneck in writing the data
>    (maybe the number of processors assigned, or ROOT's working principle)?
>    - Is it possible to delay the output saving and keep the to-be-saved
>    data in the RAM for a while, and then write the output data in batch
>    (assuming you manage your RAM to not have overflow problems)?
>
>
> Any suggestion which might improve the global simulation time is welcome,
> and failed attempts at output saving optimisation would be equally
> appreciated (as they would save me the time of looking in those
> directions). If I made any error in my method/deductions, please let me
> know. I am still quite new to GATE, and could have made some obvious
> conceptual/practical mistakes.
>
> Best regards,
> Antoine
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HDD_testing.png
Type: image/png
Size: 22111 bytes
Desc: not available
URL: <http://lists.opengatecollaboration.org/pipermail/gate-users/attachments/20200122/6288c5ec/attachment-0001.png>