Multithreaded program to compare data files?

rive22

I'm trying to do a quick task for work (well, not so quick): comparing a handful of large data files and writing the differences out to new files. The program we use is UltraCompare Professional, and I think it's a great program, but I was wondering if anyone knows of something similar that takes advantage of multiple threads? :)

Right now, when I load a couple of files, it spends a good while sorting on the HDD through temp files, which is no problem; I can just leave it running or set up a little RAID. After that it switches to the CPU on a single thread, and that takes a while. That's still not really a problem, but I'd like to be more efficient and use more threads.
 
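What kind of data files?
I would suggest writing your own in Python if the files aren't too complex and aren't in a proprietary binary format. (Python can now make real use of multiple cores through the multiprocessing module; ordinary threads in standard CPython are still held back by the GIL for CPU-bound work.)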
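
To give an idea of how small a homegrown version can be, here is a minimal single-process sketch. It assumes the data is plain text with one record per line, and that line order and duplicate lines don't matter. The function name `diff_lines` and the output file names are made up for illustration; this is not how UltraCompare works internally.

```python
from pathlib import Path


def read_lines(path):
    """Return the set of distinct lines in a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def diff_lines(path_a, path_b, out_dir):
    """Write the lines unique to each file into two new files; return both counts."""
    # Assumption: one record per line; order and duplicates are ignored.
    lines_a = read_lines(path_a)
    lines_b = read_lines(path_b)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Set difference needs no sorting pass; sorted() only makes the output stable.
    only_a = sorted(lines_a - lines_b)
    only_b = sorted(lines_b - lines_a)

    # Hypothetical output names, chosen for this example.
    (out_dir / "only_in_a.txt").write_text(
        "".join(line + "\n" for line in only_a), encoding="utf-8")
    (out_dir / "only_in_b.txt").write_text(
        "".join(line + "\n" for line in only_b), encoding="utf-8")
    return len(only_a), len(only_b)
```

Both files are held in memory as sets, so this only suits files that fit comfortably in RAM.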
 
That's a great idea. Would that be difficult to do? They are Excel files, but I frequently convert them to .txt.

I just made a RAM drive, put my temp folders on it, and ran the program again. The sorting that took 12 hours on a single Samsung drive took under 2 minutes this morning :D I'm extremely excited.

So now I just have the single-thread bottleneck while the CPU runs through everything to create the new files. I'm not sure how long this will take; the way it looks right now, it could be days, maybe even a week, for each set of 2 files. Either way, it's pretty time-consuming. I could probably run a couple of instances at once, which isn't a bad idea. I'm definitely still interested in doing this multithreaded, though. :D
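
For what it's worth, the "couple of instances at once" idea could look roughly like this in Python: one worker process per file pair, so each pair gets its own core. This is only a sketch under the same assumption as before (plain text, one record per line, order ignored). The names `count_differences` and `compare_pairs` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def count_differences(pair):
    """Worker: return (path_a, path_b, lines only in a, lines only in b)."""
    path_a, path_b = pair
    with open(path_a, encoding="utf-8") as f:
        lines_a = {line.rstrip("\n") for line in f}
    with open(path_b, encoding="utf-8") as f:
        lines_b = {line.rstrip("\n") for line in f}
    return path_a, path_b, len(lines_a - lines_b), len(lines_b - lines_a)


def compare_pairs(pairs, workers=None):
    """Compare each (path_a, path_b) pair in a separate worker process."""
    # Processes rather than threads, because the set work is CPU-bound.
    # workers=None lets the executor choose a default from the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_differences, pairs))
```

A call would look like `compare_pairs([("a1.txt", "b1.txt"), ("a2.txt", "b2.txt")])`. On Windows, that call has to sit under an `if __name__ == "__main__":` guard, because the worker processes re-import the script. Each worker loads its own pair into memory, so total memory use grows with the number of workers.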
 
*Update

I broke the files down to 80MB and 40MB apiece instead of 250MB, and comparing a set of 2 files now takes about 20 seconds in total.
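
How the files were split isn't described here, so the following is only one possible way to do it in Python, not the method actually used. It is a sketch that assumes plain text with one record per line. Each line is routed to a part file by a checksum of its text, so a line that appears in both big files always lands in the same part number on both sides. Part 0 of one file then only needs to be compared against part 0 of the other, and so on. The name `split_by_hash` and the part file naming are made up for the example.

```python
import zlib
from pathlib import Path


def split_by_hash(path, out_dir, parts):
    """Split a text file into `parts` smaller files, routing each line by CRC32."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    # Hypothetical naming scheme: <name>.part0.txt, <name>.part1.txt, ...
    out_paths = [out_dir / f"{stem}.part{i}.txt" for i in range(parts)]
    outputs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                text = line.rstrip("\n")
                # zlib.crc32 gives the same value on every run, unlike the
                # built-in hash() for strings, which is randomized per process.
                index = zlib.crc32(text.encode("utf-8")) % parts
                outputs[index].write(text + "\n")
    finally:
        for out in outputs:
            out.close()
    return out_paths
```

Splitting by position instead (first 80MB, next 80MB, and so on) only gives the same result as a whole-file comparison if matching records are guaranteed to end up in matching pieces.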

I bumped RamDrive to 1.5GB for temp files, and the program uses 850MB of system memory to compare an 80MB & 40MB set of data files and output the difference. So there's my answer: I was/am memory bottlenecked. :D AWESOME

The thing that had me confused before was that when I started the program and it got going, its system memory usage would start at 250MB and very, very, very slowly creep its way up, but I still had plenty left over, so I dismissed it. On top of that, there's the much larger amount of calculation needed for 250MB vs 250MB. Obviously that would take more time, but I wasn't expecting 20 seconds vs a week's worth of time. :)
 