MD5 check a file

What code do I need to check the MD5 hash of a specified file?

http://documentation.xojo.com/index.php/MD5

But that doesn’t help me check a specified file. I’ve already read that.

Are you trying to check the contents of the file? If so then have a look here: http://documentation.xojo.com/index.php/TextInputStream
Once you have the contents in a variable you can then use the MD5 functions that you have already checked out.

Use a BinaryStream to read your file in chunks. Feed each chunk to an MD5Digest and get the result at the end.

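// f is the FolderItem of the file to hash
dim mdfive as new MD5Digest
dim bs as BinaryStream = BinaryStream.Open( f )
while not bs.EOF
  // Read about 1 MB at a time so the whole file never has to fit in memory
  dim chunk as string = bs.Read( 1000000 )
  mdfive.Process chunk
wend
bs.Close

// Value is the raw binary digest; EncodeHex turns it into readable hex
dim hash as string = EncodeHex( mdfive.Value )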
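
If the goal is to check a file against a known hash, the same loop can be wrapped in a method and the result compared with the expected value. This is only a sketch: `FileMD5` and `FileMatchesMD5` are made-up names, not part of Xojo, and there is no error handling (`BinaryStream.Open` raises an exception if the file can't be opened).

```xojo
// Hypothetical helper: returns the hex MD5 of the file, read in 1 MB chunks.
Function FileMD5(f As FolderItem) As String
  Dim digest As New MD5Digest
  Dim bs As BinaryStream = BinaryStream.Open( f ) // read-only by default
  While Not bs.EOF
    digest.Process bs.Read( 1000000 )
  Wend
  bs.Close
  Return EncodeHex( digest.Value )
End Function

// Hypothetical helper: True when the file's MD5 matches expectedHex.
// Both sides are lowercased so "F5E6..." and "f5e6..." compare equal.
Function FileMatchesMD5(f As FolderItem, expectedHex As String) As Boolean
  Return Lowercase( FileMD5( f ) ) = Lowercase( Trim( expectedHex ) )
End Function
```

You would call it with the hash you were given, for example `If FileMatchesMD5( f, expected ) Then ...`.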

The file is over 1GB.

That’s why you use MD5Digest. It only ever holds one chunk in memory, so the size of the file doesn’t matter.

Use @Kem Tekinay's code. It will do the job on files much larger than 1GB. As a quick test, I whipped up an application that calculates the MD5 hash of a 3.4GB dump of /dev/urandom on my desktop. It completes in about 9 seconds on my MBP.

Yikes, 9 seconds is a bit slow IMHO… What’s the speed difference if you load the whole file into memory and calculate from there?

I can tell you that varying the chunk size doesn’t really make a difference. On a 5.4 GB file, I tried 1,000,000, 2,000,000, 20,000,000, and 1,000,000,000, and all gave me the same result in about 11 to 13 seconds. (The smaller numbers were actually better!) When I tried to load the whole file into memory, it either hung or was taking far longer than I was willing to wait.

One idea to improve speed is to check only the first and last x bytes of the file. For example, if your chunk size is 10 MB and the file is greater than 20 MB, you can calculate on the first 10 MB and the last 10 MB (or something like that).
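
A rough sketch of that first-and-last-chunk idea is below. Keep in mind that the result is not the file's real MD5, so it can't be compared with a hash published elsewhere, and a change in the middle of the file would go unnoticed. `QuickFingerprint` and its default chunk size are made up for illustration.

```xojo
// Hypothetical helper: hashes only the first and last chunkSize bytes.
// Files no bigger than two chunks are hashed in full.
Function QuickFingerprint(f As FolderItem, chunkSize As Integer = 10000000) As String
  Dim digest As New MD5Digest
  Dim bs As BinaryStream = BinaryStream.Open( f )
  If bs.Length <= 2 * chunkSize Then
    // Small file: process everything, chunk by chunk
    While Not bs.EOF
      digest.Process bs.Read( chunkSize )
    Wend
  Else
    // Large file: first chunk, then jump to the last chunk
    digest.Process bs.Read( chunkSize )
    bs.Position = bs.Length - chunkSize
    digest.Process bs.Read( chunkSize )
  End If
  bs.Close
  Return EncodeHex( digest.Value )
End Function
```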

I’ve never tried to hash such a large file. Normally 30 to 50 MB is the largest I’ve tried.

What kind of file is it? Just wondering if you could code sign it instead (executables and installer packages only), this way you can verify the code signature to check that it came from your computer.

Just out of curiosity… What platform do you need this on? If it’s OS X or Linux, you could probably just use the command line.

Just like within Xojo, the first run with md5 on the command line (Mac OS X) took over a minute on that 5.4 GB file. Subsequent runs took about 11 seconds.

[~]: time md5 ~/Desktop/xxx.sql
MD5 (/Users/ktekinay/Desktop/xxx.sql) = f5e6add0b486aaf2aad8372b94094283

real	1m10.613s
user	0m11.211s
sys	0m2.155s
[~]: time md5 ~/Desktop/xxx.sql
MD5 (/Users/ktekinay/Desktop/xxx.sql) = f5e6add0b486aaf2aad8372b94094283

real	0m11.355s
user	0m11.024s
sys	0m2.815s
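
If you'd rather call the command line tool from inside Xojo, something like the following should work through the Shell class. It is a sketch that assumes Mac OS X, where `md5 -q` prints only the hash; on Linux the tool is `md5sum` and its output looks different, so the parsing would need to change. `ShellMD5` is a made-up name.

```xojo
// Hypothetical helper: asks the macOS md5 tool for the file's hash.
// Returns an empty string if the command fails.
Function ShellMD5(f As FolderItem) As String
  Dim sh As New Shell
  // ShellPath is already escaped for use on the command line
  sh.Execute( "md5 -q " + f.ShellPath )
  If sh.ErrorCode <> 0 Then
    Return ""
  End If
  // Strip the trailing line ending from the tool's output
  Return Trim( sh.Result )
End Function
```

Going by the timings above, don't expect this to be faster than MD5Digest. After the first run, both took about 11 seconds on the same file.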

Thanks very much for the code, @Kem Tekinay - works a treat.

Revisited in 2020, worked a charm. Thanks @Kem Tekinay