How to get the MD5 of a FolderItem

If you compute an MD5 over a folder/directory, the result can change even though nothing in the folder has changed, depending on the order in which you process the files. A few years back I tried to MD5 a directory to detect changes, and I got different sums even though the files themselves, and the set of files present, hadn't changed.

I know this wasn't the original question, but it's adjacent to it, and I wanted to share the pain point to make sure you weren't going down that route.
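One way around the ordering problem is to sort the file names before hashing, so the files are always processed in the same order. Here is a rough sketch using the classic framework. FolderMD5 is a made-up name, it skips subfolders, it reads each file fully into memory, and it has no error handling, so treat it as an illustration rather than a finished method:

[code]
Function FolderMD5(folder As FolderItem) As String
  ' Hypothetical helper: hashes the files directly inside folder
  ' in sorted name order so the result does not depend on the
  ' order the file system returns them in.
  Dim names() As String
  For i As Integer = 1 To folder.Count
    Dim item As FolderItem = folder.Item(i)
    If item <> Nil And Not item.Directory Then
      names.Append item.Name
    End If
  Next
  names.Sort

  ' Build one line per file ("name:hash"), then hash the whole list.
  Dim combined As String
  For Each fileName As String In names
    Dim bs As BinaryStream = BinaryStream.Open(folder.Child(fileName))
    Dim contents As String = bs.Read(bs.Length)
    bs.Close
    combined = combined + fileName + ":" + EncodeHex(MD5(contents)) + EndOfLine.UNIX
  Next

  Return EncodeHex(MD5(combined))
End Function
[/code]

Including the file name in each line means a rename also changes the folder hash. Drop it if you only care about contents.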

As an alternate method, consider using a shell:

Dim theShell As New Shell
theShell.Mode = 0
theShell.Execute "md5sum """ + f.NativePath + """ | cut -d ' ' -f 1"
theHash = theShell.ReadAll

On OS X, use md5 -q instead of md5sum. The -q flag prints only the hash, so the cut step can be dropped.

This method is consistent for every type of file, is very fast, and you’re not playing around with any file reading or memory blocks (unless that’s what you’re trying to do).
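If you go the shell route, a slightly more defensive version might look like the sketch below. ShellMD5 is a hypothetical name. It assumes the classic framework, md5 -q on macOS, and md5sum elsewhere (so Linux only; Windows is not handled). It uses ShellPath instead of quoting NativePath by hand:

[code]
Function ShellMD5(f As FolderItem) As String
  ' Hypothetical helper: returns the hex MD5 of f, or "" on failure.
  Dim cmd As String
  #If TargetMacOS Then
    ' -q prints only the hash
    cmd = "md5 -q " + f.ShellPath
  #Else
    ' md5sum prints "<hash>  <path>"
    cmd = "md5sum " + f.ShellPath
  #EndIf

  Dim sh As New Shell
  sh.Mode = 0 ' synchronous
  sh.Execute cmd
  If sh.ErrorCode <> 0 Then Return ""

  ' Keep only the first field and drop the trailing line ending.
  Return NthField(Trim(sh.ReadAll), " ", 1)
End Function
[/code]

Splitting the output in Xojo rather than piping through cut means ErrorCode reflects the hash command itself, not the last command in the pipe.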

[quote=320490:@Norman Palardy]iterate each uint8, use toHex on each one & concatenate them together

[code]
result = Hash(data, xojo.Crypto.HashAlgorithms.MD5) ' <-- Convert to Hex???

dim strResult as text

For i As Integer = 0 To result.size()-1
  strResult = strResult + if(result.UInt8Value(i)<10, "0" + result.UInt8Value(i).toHex, result.UInt8Value(i).toHex)
Next

return strResult
[/code][/quote]

Can't you do:

.ToHex( StrResult.Length )

?

Thanks to @Norman Palardy here’s a working version using only the new framework:

[code]Public Function ToMD5(extends file as Xojo.IO.FolderItem) as Text
  ' Returns the MD5 hash value for this file.
  ' Returns "" if there's an error.

  using xojo.Core
  using xojo.Crypto

  dim data, hashMB as xojo.Core.MemoryBlock
  dim bs as xojo.IO.BinaryStream
  dim hash as Text

  ' Sanity check
  if file is Nil or file.IsFolder or not file.IsReadable then
    return ""
  end if

  ' Open the file and read it into a MemoryBlock
  bs = Xojo.IO.BinaryStream.Open(file, xojo.IO.BinaryStream.LockModes.Read)
  data = bs.Read(bs.Length)
  bs.Close()

  hashMB = Hash(data, xojo.Crypto.HashAlgorithms.MD5)

  ' Convert the result to a hex string
  for i as Integer = 0 to hashMB.Size-1
    hash = hash + if(hashMB.UInt8Value(i) < 10, "0" + hashMB.UInt8Value(i).ToHex, hashMB.UInt8Value(i).ToHex)
  next i

  return hash

exception err as RuntimeException
  return ""
End Function
[/code]

Unsurprisingly, it’s about 50% slower than using the classic framework.

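The code has an error: you have to prefix "0" if the value is less than 16 (&h10), not 10. Values 10 through 15 convert to a single hex digit (A through F), so they need the leading zero too.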

What happens if you use an array instead of concatenating?

dim joiner() as text
for i as integer = 0 to hashMB.Size - 1
  dim b as UInt8 = hashMB.UInt8Value( i )
  if b < &h10 then
    joiner.Append "0"
  end if
  joiner.Append b.ToHex
next
dim hash as text = join( joiner, "")

Oh, we don’t need the If statement at all.

dim joiner() as text
for i as integer = 0 to hashMB.Size - 1
  dim b as UInt8 = hashMB.UInt8Value( i )
  joiner.Append b.ToHex( 2 )
next
dim hash as text = join( joiner, "")

I like the use of an array, @Kem Tekinay. Here’s the revised method. It gets the MD5 of a 5GB movie file in 900 ms. If I use the classic framework example you provided for Kaju, it takes 15000 ms, so I’m pretty pleased :)

[code]Public Function ToMD5(extends file as Xojo.IO.FolderItem) as Text
  ' Returns the MD5 hash value for this file.
  ' Returns "" if an error occurs.

  using xojo.Core
  using xojo.Crypto

  dim data, hashMB as xojo.Core.MemoryBlock
  dim bs as xojo.IO.BinaryStream
  dim b as UInt8
  dim hexValues() as Text

  ' Open the file and read it into a MemoryBlock
  bs = Xojo.IO.BinaryStream.Open(file, xojo.IO.BinaryStream.LockModes.Read)
  data = bs.Read(bs.Length)
  bs.Close()

  hashMB = Hash(data, xojo.Crypto.HashAlgorithms.MD5)

  ' Convert the result to a hex string
  for i as Integer = 0 to hashMB.Size-1
    b = hashMB.UInt8Value(i)
    hexValues.Append(b.ToHex(2))
  next i

  return Text.Join(hexValues, "")

exception err as RuntimeException
  return ""
End Function
[/code]