[Gate-users] Merging of many root files with gjm

Hannes Hofmann opengate at f00f.de
Wed Jan 16 11:49:11 CET 2008


>> Is divide-and-conquer merging possible? Which means to me: merge 8 times
>> 600 root files into 8 intermediate files and merge those intermediate
>> files again into the final root file. Has someone done that before? What
>> could go wrong with that? I am only interested in Singles anyway.
>
> You could, if you don't care about the correct eventID for each single.
> You can also easily merge everything with a chain in a simple root
> script then.
> I wasn't aware of the limitation though. We don't have a 1000+ cpu
> cluster ;-)
> It could be solved by opening only a few files at a time in the merger.
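
For the record, the chain approach would be something like this (a
minimal sketch: "Singles" is the GATE tree name, the file names are
made up, and the per-job eventIDs are copied unchanged, so they repeat
across jobs):

   // chain_merge.C -- run with: root -l -b -q chain_merge.C
   #include "TChain.h"

   void chain_merge()
   {
      TChain chain("Singles");
      chain.Add("output_*.root");          // all per-job files, name made up
      chain.Merge("singles_merged.root");  // one tree, eventIDs unchanged
   }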

As proposed, I will try merging 8 batches of 600 results and then
merging those 8 intermediate files into the final one. But to help find
a general solution that fits all merging problems, I have thought about
strategies for how merging could be improved.

Strategy #1: Recursive Merging

Partition the set of output files into N equally sized subsets of at
most (e.g.) 900 files each. Merge each subset separately, yielding N
intermediate files, and also store metadata such as last_event_ID in
each of them. It should then be possible to merge these intermediate
files again without losing information, but my knowledge of gjm is
limited and I don't know exactly how to do it.
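
I don't know what gjm does internally, so purely as a sketch of that
second-stage merge in plain ROOT: assume the first-stage merge also
wrote a TParameter<Int_t> called "last_event_ID" into each intermediate
file (my invention, not something gjm stores today). The eventIDs can
then be shifted into disjoint ranges while copying:

   // merge_with_offset.C -- second-stage merge for strategy #1.
   // Assumes each intermediate file carries a TParameter<Int_t> named
   // "last_event_ID" (hypothetical; the first-stage merge would have
   // to write it).
   #include "TFile.h"
   #include "TTree.h"
   #include "TParameter.h"

   void merge_with_offset(int nfiles, const char** inputs,
                          const char* outname = "final.root")
   {
      TFile out(outname, "RECREATE");
      TTree* merged = 0;
      Int_t eventID = 0, offset = 0;

      for (int i = 0; i < nfiles; ++i) {
         TFile in(inputs[i]);
         TTree* t = (TTree*)in.Get("Singles");
         t->SetBranchAddress("eventID", &eventID);

         if (!merged) {
            out.cd();
            merged = t->CloneTree(0);  // same structure, no entries yet
         }
         t->CopyAddresses(merged);     // output fills from this file's buffers

         for (Long64_t j = 0; j < t->GetEntries(); ++j) {
            t->GetEntry(j);
            eventID += offset;         // shift into a disjoint range
            merged->Fill();
         }

         // advance the offset past this file's highest original eventID
         TParameter<Int_t>* last =
            (TParameter<Int_t>*)in.Get("last_event_ID");
         offset += last->GetVal() + 1;
      }
      out.cd();
      merged->Write();
   }

The same idea should work for more than two levels, as long as every
level keeps the metadata up to date.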


Strategy #2: Incremental Merging

Take the first (e.g.) 900 output files and merge them into an
intermediate file. Then merge this intermediate file together with the
next 900 output files, and repeat until all files are consumed. I have
the feeling that this method makes it easier to get things right, but
it is probably less efficient.
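
Again only a sketch, this time with ROOT's TFileMerger (the class
behind hadd); the output_%d.root naming is made up and the eventID
problem is not handled at all. The point is that at most ~901 files are
open at any one time:

   // incremental_merge.C -- strategy #2, batch size is an assumption
   #include "TFileMerger.h"
   #include "TString.h"
   #include "TSystem.h"

   void incremental_merge(int ntotal, int batch = 900)
   {
      const char* acc = "accumulated.root";
      for (int start = 0; start < ntotal; start += batch) {
         TFileMerger m;
         m.OutputFile("tmp_merge.root");
         if (start > 0)
            m.AddFile(acc);                       // previous intermediate
         for (int i = start; i < start + batch && i < ntotal; ++i)
            m.AddFile(Form("output_%d.root", i)); // hypothetical naming
         m.Merge();
         gSystem->Rename("tmp_merge.root", acc);  // next round's input
      }
   }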


But unless someone else gets as insane as I am, no one will need it
anyway. Maybe it would be smarter to think about the job splitting
strategy instead, but hey, I am running 9600 jobs right now :)

Kind regards,
Hannes



