Comparing files in Java

Aeren

I would like to know whether two files (File source and File dest) have the
same content. What is the best way to do this?

Thanks

Aeren
 
The usual hack is to first compare the size of the files. If they are not the same, you can be assured that they are not the same file. But if they have the same size, then you can do a bit-for-bit comparison. It's as ugly a hack as it can get. Note that if the two files aren't on the same computer (say you're writing FTP software or something similar), you could simply compute a checksum of each file (MD5 or some other) and compare those instead.
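For reference, here is a minimal sketch of the size-then-content check described above. The class and method names (`FileCompare`, `sameContent`) are made up for illustration, and it assumes Java 7 or later for try-with-resources.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; the names are illustrative, not from any library.
final class FileCompare {

    // Returns true only if both files have the same length and the same bytes.
    static boolean sameContent(File source, File dest) throws IOException {
        // Cheap check first: files of different sizes cannot have equal content.
        if (source.length() != dest.length()) {
            return false;
        }
        try (InputStream a = new BufferedInputStream(new FileInputStream(source));
             InputStream b = new BufferedInputStream(new FileInputStream(dest))) {
            int x;
            while ((x = a.read()) != -1) {
                if (x != b.read()) {
                    return false; // first differing byte: stop early
                }
            }
            // a is exhausted, so b must be exhausted as well.
            return b.read() == -1;
        }
    }
}
```

If you are on Java 12 or later, `java.nio.file.Files.mismatch(Path, Path)` should do the same job; it returns -1 when the two files have identical content.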

Cheers!
 
Nemezer said:
The usual hack is to first compare the size of the files. If they are not the same, you can be assured that they are not the same file. But if they have the same size, then you can do a bit-for-bit comparison. It's as ugly a hack as it can get.

Why is comparing the content of the files ugly? Why is it a hack to test the file lengths first?
 
Because the files can be big. Very big. Doing a bit-for-bit comparison of a 4 gig file is not a fun party :p
 
Nemezer said:
Because the files can be big. Very big. Doing a bit-for-bit comparison of a 4 gig file is not a fun party :p
Is there a faster or more elegant approach?
 
Not to my knowledge. Which is why I call it a hack. Something not pretty that has to be done.
 
I see; thanks for the clarification. That's a curious definition of "hack".
 
I just don't think it's a correct use of the word, and it therefore diminishes the value of your advice by making the correct answer look less than appealing. With that definition, you could argue that anything is a hack.
 
Being a non-native English speaker, I am entitled to certain perks :cool:

Knowing that forums such as these are viewable all around the world, colloquialism is bound to arise. That cannot be helped. Best hope is to try and read the meaning, not the words. IMHO of course.
 
Doh: I'm writing a program and I would like to know how I can do this elegantly. I'm pretty sure you're the Überone here, but I don't know all the classes in Java. If you don't know what to search for, you can't search for it.

For others: Yes, bit-for-bit comparison is what I knew so far, but computing an MD5 over a byte[] could be really ugly (there's a thread about this). The reason I started this thread is to find a better solution.
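On the MD5 point: the digest does not have to be computed over one big byte[]. Here is a rough sketch that streams the file through `java.security.MessageDigest` in small chunks. The class name `FileDigest`, its methods, and the 8 KB buffer size are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are illustrative only.
final class FileDigest {

    // Computes the MD5 digest of a file without loading it fully into memory.
    static byte[] md5(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192]; // arbitrary chunk size
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // feed only the bytes actually read
            }
        }
        return digest.digest();
    }

    // True if both files produce the same MD5 digest.
    static boolean sameDigest(File a, File b)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(md5(a), md5(b));
    }
}
```

Keep in mind that equal digests are strong evidence of equal content, not proof, and that this still reads every byte of both files. For two local files it is not faster than comparing them directly; it pays off mainly when only the checksums have to travel between machines.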
 
If you're simply comparing two files, there's no better algorithm than to compare lengths, then compare bytes. If you're trying to figure out if one file has a duplicate among a set of other files, then you might have some appealing alternatives.
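One possible alternative for the set-of-files case, sketched under assumptions: group the files by length, then hash only the files that share a length with at least one other file. The class `DuplicateFinder`, its method names, and the choice of MD5 (borrowed from earlier in the thread) are illustrative, not a recommendation. It assumes Java 8 or later.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
final class DuplicateFinder {

    // Returns groups of files that share both length and MD5 digest.
    static List<List<File>> duplicateGroups(List<File> files)
            throws IOException, NoSuchAlgorithmException {
        // Step 1: bucket by length, which needs no file reads at all.
        Map<Long, List<File>> bySize = new HashMap<>();
        for (File f : files) {
            bySize.computeIfAbsent(f.length(), k -> new ArrayList<>()).add(f);
        }
        List<List<File>> groups = new ArrayList<>();
        for (List<File> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // unique length: cannot have a duplicate
            }
            // Step 2: within a bucket, group by digest.
            // ByteBuffer.wrap gives content-based equals/hashCode for the key.
            Map<ByteBuffer, List<File>> byDigest = new HashMap<>();
            for (File f : sameSize) {
                ByteBuffer key = ByteBuffer.wrap(md5(f));
                byDigest.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
            }
            for (List<File> group : byDigest.values()) {
                if (group.size() > 1) {
                    groups.add(group);
                }
            }
        }
        return groups;
    }

    // Streams the file through MessageDigest in small chunks.
    private static byte[] md5(File file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }
}
```

Each file is read at most once this way, instead of once per pairwise comparison. If you need certainty rather than a very likely match, you can still confirm each reported group with a byte comparison.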

Performance will come from taking care in the implementation. If you're not careful when reading two files at the same time, you can end up overwhelming your I/O work with seeking, as the disk head bounces back and forth between the two files. You'll want to minimize that, for example by reading a large block from each file in turn. Because some seeking is unavoidable, you'll also want to do non-blocking I/O to try to get work done while waiting for read requests to be satisfied.
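As a rough illustration of the seeking point, the sketch below reads a large block from each file in turn, so the disk switches between files once per block rather than once per small read. Note that it uses plain blocking streams, not non-blocking I/O, so it covers only half of the advice above. The class name `ChunkedCompare` and the 1 MiB block size are assumptions; a good block size depends on the hardware.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; the names are illustrative only.
final class ChunkedCompare {

    private static final int CHUNK = 1 << 20; // 1 MiB per read, an arbitrary choice

    // Compares two files block by block, alternating between them.
    static boolean sameContent(File source, File dest) throws IOException {
        if (source.length() != dest.length()) {
            return false; // different sizes: no need to read anything
        }
        byte[] bufA = new byte[CHUNK];
        byte[] bufB = new byte[CHUNK];
        try (InputStream a = new FileInputStream(source);
             InputStream b = new FileInputStream(dest)) {
            while (true) {
                int na = fill(a, bufA);
                int nb = fill(b, bufB);
                if (na != nb) {
                    return false; // one file ended before the other
                }
                if (na == 0) {
                    return true; // both exhausted without a difference
                }
                for (int i = 0; i < na; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

To overlap the reads with the comparison work, `java.nio.channels.AsynchronousFileChannel` (Java 7 and later) would be one place to look; whether it actually helps would need measuring on the disks in question.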
 