Find Marker String in Binary File

I would like to open a binary file and look for a marker string that is embedded within. I am very new to Xojo so any guidance is appreciated. So far I am able to get the open dialog to behave properly (only opens specific binary), and I can get the size, etc. Now I need to look inside.

Thanks for the assist.

Xojo strings can be treated as a bunch of bytes, and there are bytewise versions of the usual string methods, such as String.IndexOfBytes. Depending on the size of the file, you can read the contents into a string variable and search for the marker.

The first thing you have to do is open the file as binary; see:

Accessing binary files — Xojo documentation

As you can see, you need to know the encoding of the file to read it properly. If your file is short, you can read it completely into a MemoryBlock. Say the string begins with B. Look for the first B, then, starting from that position, read x bytes, where x is the length of the encoded string. If it matches, you have found it; if not, search for the next B and compare again, until you either find the string or reach the end of the data.

If the file is large, load it into the MemoryBlock in chunks of, say, 64 bytes, until you find the string.

I don’t remember the name of the algorithm, but the fastest way is to look at the last character and work backwards.

Let’s say you are looking for a ten-character string that ends with “X”. Look at the 10th character. If it’s “X”, work backwards.

But let’s say it’s “A” and “A” doesn’t appear in your string. You can skip right to the 20th character and check again.

Or let’s say it’s “Y” and “Y” appears in the 5th position of your string. You can then skip ahead 5 characters to line them up and check again.
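
For anyone who wants to see the skipping idea in code, here is a minimal sketch. It is closer to the Horspool simplification of Boyer-Moore than to the full algorithm, and the function name FindBytesSkipping and its parameters are made up for illustration. In the IDE you would paste the body into a method with this signature. It assumes both blocks have a known Size.

```xojo
// Hypothetical helper: returns the zero-based byte offset of needle
// inside haystack, or -1 if it is not present.
Function FindBytesSkipping(haystack As MemoryBlock, needle As MemoryBlock) As Integer
  Var n As Integer = haystack.Size
  Var m As Integer = needle.Size
  If m = 0 Or n < m Then Return -1

  // For each possible byte value: how far the window may slide when that
  // byte sits under the window's last position. Default is the full length.
  Var shift(255) As Integer
  For b As Integer = 0 To 255
    shift(b) = m
  Next
  // Bytes that occur in the needle (last position excluded) slide less,
  // just far enough to line up their rightmost occurrence.
  For i As Integer = 0 To m - 2
    shift(needle.Byte(i)) = m - 1 - i
  Next

  Var pos As Integer = 0
  While pos <= n - m
    // Compare backwards, starting at the last byte of the window.
    Var k As Integer = m - 1
    While k >= 0
      If haystack.Byte(pos + k) <> needle.Byte(k) Then Exit While
      k = k - 1
    Wend
    If k < 0 Then Return pos // every byte matched

    // Slide according to the byte under the window's last position.
    pos = pos + shift(haystack.Byte(pos + m - 1))
  Wend

  Return -1
End Function
```

With a ten-byte needle, a byte that does not occur in the needle gets a shift of 10, and a byte whose rightmost occurrence is the 5th position gets a shift of 5, which matches the two cases described above.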

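However, in this case, if the binary is long, I’d load it in chunks and use InStrB. If the string is not found, lop off all but the last x-1 characters (x being the length of the marker, so a match that straddles a chunk boundary is not missed), tack them onto the front of the next chunk, and try again.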
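
A rough sketch of that chunk-and-carry approach might look like the following. It uses String.IndexOfBytes (zero-based) rather than InStrB. The function name FindMarkerOffset and the 64 KB chunk size are assumptions for illustration, not anything from the posts above. BinaryStream.Open can raise an IOException, which this sketch does not handle.

```xojo
// Hypothetical helper: returns the zero-based byte offset of marker
// within the file, or -1 if it is not found.
Function FindMarkerOffset(f As FolderItem, marker As String) As Int64
  Const kChunkSize = 65536 // assumed chunk size; tune as needed

  If f = Nil Then Return -1
  If Not f.Exists Or marker.Bytes = 0 Then Return -1

  Var stream As BinaryStream = BinaryStream.Open(f)
  Var carry As String         // tail kept from the previous chunk
  Var carryStart As Int64 = 0 // file offset of the first byte of carry

  While Not stream.EndOfFile
    Var buffer As String = carry + stream.Read(kChunkSize)
    Var hit As Integer = buffer.IndexOfBytes(marker)
    If hit >= 0 Then
      stream.Close
      Return carryStart + hit
    End If

    // Keep only the last (marker length - 1) bytes so that a marker
    // split across two chunks is still found on the next pass.
    Var keep As Integer = marker.Bytes - 1
    If keep > buffer.Bytes Then keep = buffer.Bytes
    carryStart = carryStart + buffer.Bytes - keep
    carry = buffer.RightBytes(keep)
  Wend

  stream.Close
  Return -1
End Function
```

Keeping fewer than marker.Bytes bytes in the carry means the carry alone can never contain a full match, so the same occurrence is not reported twice.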


That is crazy cool clever!

It’s the Boyer-Moore algorithm.

It was a lot of hair pulling (again, I’m new to Xojo and come from microcontrollers where memory is treated like a simple array), but I managed to get it to work. In the compiled program there is a text marker “VersionIsHere” that precedes the version string (several characters, followed by a NULL).

This is the function code that finds and returns the version string for the selected file. I verified the result returned by the function by opening the binary file in a hex editor and manually locating the marker and version.

Var fileStream As BinaryStream
Var fileData As MemoryBlock 
Var fdSize As Integer
Var marker As MemoryBlock
Var mSize As Integer
Var mbCheck As MemoryBlock
Var result As String = ""

If f <> Nil And f.Exists Then
  fileStream = BinaryStream.Open(f)
  fileData = fileStream.Read(f.Length)
  fileStream.Close
  fdSize = fileData.Size
  marker = "VersionIsHere"
  mSize = marker.Size
  
  For i As Integer = 0 To fdSize-mSize
    ' get chunk from file
    mbCheck = fileData.StringValue(i, mSize)
    If mbCheck = marker Then 
      ' move past marker
      Var j As Integer = i + mSize
      ' scrape the version out (uses NULL termination)
      Do
        ' stop at the end of the data if no NULL terminator is found
        If j >= fdSize Then Exit
        Var c As Integer = fileData.Byte(j)
        If c = 0 Then
          Exit
        Else
          result = result + Chr(c)
          j = j + 1
        End If
      Loop
      Return result
    End If
  Next
End If

Return "v?.?.?"

Comments vis-a-vis simplification and improvement are appreciated.

Yes, that could be simplified as:

Var fileStream as BinaryStream
Var fileData as String
Var marker as String
Var mStart as Integer
Var result as String

if f <> Nil and f.Exists then
   fileStream = BinaryStream.Open(f)
   fileData = fileStream.Read(f.Length)
   fileStream.Close   // not technically needed, it will be closed automatically at the end of the method
   marker = "VersionIsHere"
   mStart = fileData.IndexOfBytes(marker)
   if mStart >= 0 then
      Var j as Integer = mStart+marker.Bytes 
      Do
         Var c as String = fileData.MiddleBytes(j, 1)
         if c.AscByte = 0 then
            Exit
         else
            result = result + c
            j = j + 1
         End If
      Loop
      Return result
   End If
End If

Thanks. I’ll look it over and give it a try.

Or this:

Var fileStream as BinaryStream
Var fileData as String
Var marker as String
Var mStart as Integer
Var result as String

if f <> Nil and f.Exists then
   fileStream = BinaryStream.Open(f)
   fileData = fileStream.Read(f.Length)
   fileStream.Close   // not technically needed, it will be closed automatically at the end of the method

   marker = "VersionIsHere"
   mStart = fileData.IndexOfBytes(marker)
   if mStart >= 0 then
      Var fieldStart As Integer = mStart + marker.Bytes
      Var nullPos As Integer = fileData.IndexOfBytes(fieldStart, String.ChrByte(0))
      if nullPos >= 0 then
          result = fileData.MiddleBytes(fieldStart, nullPos - fieldStart)
          Return result
      end if
   End If
End If

And, of course, using a regular expression:

Var fileStream as BinaryStream
Var fileData as String
Var marker as String
Var result as String

if f <> Nil and f.Exists then
   fileStream = BinaryStream.Open(f)
   fileData = fileStream.Read(f.Length)
   fileStream.Close   // not technically needed, it will be closed automatically at the end of the method

   marker = "VersionIsHere"

   var rx as new RegEx
   rx.Options.CaseSensitive = true
   rx.SearchPattern = marker + "([^\x0]+)"

   var match as RegExMatch = rx.Search(fileData)
   if match isa RegExMatch then
      result = match.SubExpressionString(1)
      return result
   end if
end if

Thank you. That works, too, and is a little tidier.

I nod in your general direction. Or was that bow down? Nice.