Find Marker String in Binary File (Again!)

Hi all.

There’s some clever stuff here, and I’ve been testing @Kem Tekinay’s very efficient routines on binary files, here :
https://forum.xojo.com/t/find-marker-string-in-binary-file/69960/10

I am trying to adapt them in order to extract an embedded png file buried in the midst of a text file. The png header and end markers contain unprintable characters (return and end of line) along with others (above ASCII 128) that seem to throw the routines completely! Is it possible to do this at all, I wonder?
The png header in HEX is 89 50 4E 47 0D 0A 1A 0A (âPNG are the only visible characters) and the end marker is 49 45 4E 44 AE 42 60 82 (which are actually all visible - IENDÆB`Ç). My idea was to find the header and start saving the code into another string until hitting the footer, but I can’t even recognise either the header or end marker with Kem’s routines. Is there maybe a way to search in pure HEX or something? I’ve searched for a good while (a few days now, actually!) and not had any joy. I thought RegExMatch should work, but it doesn’t. I also looked at SED and AWK and things like that using shell scripts, but have found no solution there either.
Could any kind soul give me any insights to point me in the right direction please ?
Thanks in advance.

You are mixing (in your mind) Text file and Binary file.
RegEx is for dealing with text (and not with Binary Values, IMHO).

Forget Text and concentrate on BinaryData (and use BinaryStream to load data). In what you describe, $0D is 13 and $0A is 10: inside binary data these are just byte values, not line endings to be interpreted.

Then, search for the binary bytes (as Hex) 50 4E 47 to get the beginning of your png file. Do the same for the end of the png file.

en.wikipedia.org (or elsewhere) may have the png definition file (start and end bytes).

When you have these (the start and end positions of the png file), load these bytes and place them in a MemoryBlock.

Then, instructions exist to save the MemoryBlock contents to a png file; look up MemoryBlock in the documentation…
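
The MemoryBlock step described above could be sketched roughly as follows. This is only a sketch under assumptions: ReadByteRange, startPos and byteCount are hypothetical names, and the two numbers would come from your own search for the start and end markers. It uses the older API 1 style Position property; I believe newer API 2 versions call it BytePosition, so check the documentation for your Xojo version.

```xojo
' Sketch only: ReadByteRange, startPos and byteCount are hypothetical.
' startPos is the zero-based offset of the first PNG byte,
' byteCount is the number of bytes up to and including the end marker.
Function ReadByteRange(f As FolderItem, startPos As UInt64, byteCount As Integer) As MemoryBlock
  Dim stream As BinaryStream = BinaryStream.Open(f, False) ' False = read-only
  stream.Position = startPos                 ' jump to the first PNG byte
  Dim raw As String = stream.Read(byteCount) ' no encoding given, so raw bytes
  stream.Close
  Dim mb As MemoryBlock = raw ' a String converts to a MemoryBlock
  Return mb
End Function
```

The returned MemoryBlock can then be handed to Picture.FromData, or written out unchanged with a BinaryStream to get a .png file.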

Actually, I do not know (or remember) what SED and AWK are…

Read the Portable Network Graphics article carefully; you really need to do that to go ahead (unless someone can share code with you).

Well, you can read the whole file into memory with BinaryStream and read it into a string.
You build the search string with ChrB(&hXX) using the hex values.
Then you use IndexOfBytes to find the positions, copy the bytes with the MiddleBytes function, and pass them to Picture.FromData to get the picture decoded.
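
Put together, those steps might look something like the sketch below. ExtractFirstPNG is a hypothetical helper name, not something from the Xojo framework, and the code assumes the file is small enough to read into memory in one go. It returns Nil when either marker is missing.

```xojo
' Sketch only: ExtractFirstPNG is a hypothetical helper name.
Function ExtractFirstPNG(f As FolderItem) As Picture
  ' Read the whole file as raw bytes (no encoding is applied).
  Dim stream As BinaryStream = BinaryStream.Open(f, False)
  Dim fileData As String = stream.Read(stream.Length)
  stream.Close

  ' Build the two 8-byte markers, one byte per ChrB call.
  Dim header As String = ChrB(&h89) + ChrB(&h50) + ChrB(&h4E) + ChrB(&h47) + _
    ChrB(&h0D) + ChrB(&h0A) + ChrB(&h1A) + ChrB(&h0A)
  Dim trailer As String = ChrB(&h49) + ChrB(&h45) + ChrB(&h4E) + ChrB(&h44) + _
    ChrB(&hAE) + ChrB(&h42) + ChrB(&h60) + ChrB(&h82)

  ' IndexOfBytes is zero-based and returns -1 when nothing is found.
  Dim startPos As Integer = fileData.IndexOfBytes(header)
  If startPos < 0 Then Return Nil
  Dim endPos As Integer = fileData.IndexOfBytes(startPos, trailer)
  If endPos < 0 Then Return Nil

  ' MiddleBytes takes a start index and a length, so the length has to
  ' include the trailer itself.
  Dim pngBytes As String = fileData.MiddleBytes(startPos, endPos - startPos + trailer.Bytes)
  Return Picture.FromData(pngBytes)
End Function
```

The second argument of MiddleBytes is a byte count rather than an end position, which is why the trailer length is added to the difference of the two positions.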

Thanks for your reply. I am loading the file as a binary file but how do you search in HEX ?
The png Header tag and end marker ARE the correct png start (89 50 4E 47 0D 0A 1A 0A - eight bytes) and end (49 45 4E 44 AE 42 60 82 - also eight bytes) HEX tags. If I open the file in BBEdit, copy the code in between and including the tags, and place it in a new document, saving it with the png suffix rather than txt, it opens fine as a png image, so I have all the information I need regarding the png. I just need to figure out how to extract it from the file!

Thanks for your reply.
Yes. I think that is what I need. So to code my search string for say the start tag, would I have to code each character separately Chrb(&h89) + Chrb(&h50) + Chrb(&h4E) … etc. or can it be done in one string Chrb(&h89504E…). Well. I’ll try both anyway and see what it gives.
I’ll be back, as someone once said…

Use DecodeHex to decode multiple hex characters at once:


Dim marker As String = DecodeHex("89504E470D0A1A0A")
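
On the earlier question of one ChrB call versus several: ChrB produces a single byte per call, so the bytes have to be joined one by one, or decoded in one go with DecodeHex. A small sketch for checking that both routes give the same eight bytes (the variable names are only illustrative):

```xojo
' Sketch only: variable names are illustrative.
' Note the digits are zeros (0D, 0A), not the letter O.
Dim fromHex As String = DecodeHex("89504E470D0A1A0A")
Dim fromChrB As String = ChrB(&h89) + ChrB(&h50) + ChrB(&h4E) + ChrB(&h47) + _
  ChrB(&h0D) + ChrB(&h0A) + ChrB(&h1A) + ChrB(&h0A)

' Comparing the hex dumps avoids any text-encoding surprises.
Dim sameBytes As Boolean = (EncodeHex(fromHex) = EncodeHex(fromChrB))

' Log the byte count and a readable hex dump for debugging.
System.DebugLog("Bytes: " + Str(LenB(fromHex)) + ", identical: " + Str(sameBytes))
System.DebugLog(EncodeHex(fromHex, True)) ' True inserts spaces between bytes
```

EncodeHex is also a convenient way to look at any extracted chunk without pushing it through a text encoding.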

Thank you Andrew. Very handy and also allowed me to (re)discover EncodeHex, which may get me out of a new problem I’ve encountered in loading my file as binary. I’ll report back and post my final code as soon as I have something that holds water !

OK. With your help, I’ve put together a method that works, but the results are a total headache !!

I’ll post a link to my xojo file, plus a “trial” document and the resulting embedded wallpaper but I give more details about these below.

The documents I’m trying to parse and extract a png file from are “nicnt” files, which are basically xml files specifically tailored for Kontakt instrument libraries. Some, but not all of these files have “wallpaper.png” files embedded in them, that show up in the Kontakt app as banners illustrating the library. I actually want to display the banner in a library utility that I’m writing. When the banner is an external wallpaper.png, it’s simple to import it into a canvas. It’s when it’s embedded that things start getting complicated.

The first thing, obviously, is to extract the png file if it’s embedded and in my enclosed little test app, this now works.

The snag is that the png data that I extract is not at all in the correct encoding, and if I copy and paste it into a file and save it as a file.png, it is not recognised as a png file. It’s clear that the encoding is all wrong and is corrupting the file, because if I do this manually it works just fine.

There’s another snag in all this. Although Kontakt is both for Mac and Windows, the nicnt file is saved in Western (MacOS Roman) rather than UTF8 format. If I change the format to UTF8 or anything else, it doesn’t work anymore, which I guess is normal.

Anyway, for a start, I just need to extract the png from the nicnt file in the same format as it appears in the nicnt file. As you may see in my test program, that is not what’s happening !!

If anyone has any ideas as to how I can get there, I’d be most grateful.

My little app looks for the nicnt file on the desktop by default. The nicnt file is from a real library, and the resulting wallpaper.png that I extracted manually from the nicnt is indeed the one that appears in the Kontakt app. You can see in the nicnt file that the png header tag is at the end of line 29 and the end marker tag is in line 523, if you open the nicnt file in a text editor.
The link to the files is here :
png search.zip

A PNG file is binary data; it doesn’t have a text encoding. Converting it to use an encoding or displaying it in a TextArea will necessarily corrupt binary data.

Instead, write the raw bytes out to a new PNG file or load it as a Xojo Picture object:

' in extractPNG
result = fileData.MiddleBytes(fieldStart, fieldEnd)

Dim p As Picture = Picture.FromData(result) ' p will be Nil if not valid PNG data
'or
fileStream = BinaryStream.Create(SpecialFolder.Desktop.Child("wallpaper.png"))
fileStream.Write(result)
fileStream.Close()

If the nicnt file is actual xml, you should load it into an XMLDocument and just extract the correct tag. No need for byte twiddling.

But given the way you’re doing it,

Ignore encodings. Nil encoding is treated as binary data.
Make sure both the leading marker and trailing marker are included in the string you extract.

This seems to work with your .nicnt file:

'Uses old API1 syntax

Public Sub GrabPNG()
  'Get the .nicnt input file
  Dim readFile As FolderItem = GetOpenFolderItem("")
  If readFile <> Nil And readFile.Exists Then
    Dim ReadStream As BinaryStream = BinaryStream.Open(readFile, False)
    dim rawInput As string = ReadStream.Read(ReadStream.Length,encodings.ASCII)
    readStream.Close
    'Header:   89 50 4E 47 0D 0A 1A 0A
    dim pngHeader As String = chrb(&h89)+chrb(&h50)+chrb(&h4E)+chrb(&h47)+chrb(&h0D)+chrb(&h0A)+chrb(&h1A)+chrb(&h0A)
    'Trailer:  49 45 4E 44 AE 42 60 82
    dim pngTrailer As String = chrb(&h49)+chrb(&h45)+chrb(&h4E)+chrb(&h44)+chrb(&hAE)+chrb(&h42)+chrb(&h60)+chrb(&h82)
    dim rawPNG() As String = parseTxt(rawInput,pngHeader,pngTrailer)
    if rawPNG.Ubound<0 then
      MsgBox "Unable to locate PNG image in the source file"
    else
      'found at least one embedded png
      MsgBox "Found "+str(rawPNG.Ubound+1)+" PNG images in the source file"
      'Save all found png files
      for i as Integer = 0 to rawPNG.Ubound
        Dim WriteFile as FolderItem = GetSaveFolderItem("","ImageFile"+right("000"+str(i+1),3)+".png")
        If writeFile <> Nil Then
          Dim writeStream As BinaryStream = BinaryStream.Create(writeFile, True)
          'writeStream.LittleEndian = True
          writeStream.Write(pngHeader+rawPNG(i)+pngTrailer)
          writeStream.Close
        End If
      next
    End If
  End If
End Sub

Public Function parseTxt(s As String, delimA As String, delimB As String) as string()
  'Returns a string array of every substring of s,
  'that falls between left delimiter text delimA and right delimiter text delimB.
  dim outList() As String = Split(s,delimA)
  dim hitCount As Integer = UBound(outList)
  if hitCount>-1 then outList.Remove(0)
  hitCount=hitCount-1
  for i as integer = 0 to hitCount
    dim ss() As String = split(outList(i),delimB)
    if UBound(ss)<0 then
      outList(i)=""
    else
      outList(i)=ss(0)
    end if
  next
  'return array of matching strings
  return outList
End Function

Edited to add: The .nicnt file doesn’t appear to be xml. It simply has a short chunk of data at the beginning that is xml. All the rest appears to be proprietary format.


Thanks Tim. I thought of that, but as the png hasn’t got its own tag, I couldn’t see how to extract it with the usual XML tools.
I did include the start and end markers in the extracted string in the method I posted.

Thanks Andrew and Robert for your replies. I’ll try all those things later but I’m at work right now and can’t get to it.

Oh how embarrassing. I’d like to mark 2 answers as solutions, but it seems I can’t do that !! Both Andrew’s and Robert’s solutions work just fine so I consider them both correct solutions. Huge thanks to both of you :+1: :+1: :+1:
I’ve marked Robert’s answer as the solution, as it goes above and beyond what I was asking. As it happens, I supplied a nicnt file that has several png files embedded in it, not just the wallpaper, but all the interface graphics as well ! This is the exception rather than the rule, and the wallpaper is always the first in the nicnt file. It may however come in handy to be able to extract the others, you never know :smiley:
Sincere thanks to all who replied. I always appreciate your insights as a Xojo beginner.
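
Since the wallpaper is always the first embedded image, picking it out of the parseTxt result for display could be sketched like this. FirstWallpaper is a hypothetical helper name; parseTxt, pngHeader and pngTrailer are the ones from the code posted above, and rawInput is the file content read as raw bytes.

```xojo
' Sketch only: FirstWallpaper is a hypothetical helper name.
' parseTxt is the function posted earlier in this thread.
Function FirstWallpaper(rawInput As String, pngHeader As String, pngTrailer As String) As Picture
  Dim rawPNG() As String = parseTxt(rawInput, pngHeader, pngTrailer)
  If rawPNG.Ubound < 0 Then Return Nil ' no embedded PNG found

  ' parseTxt strips both markers, so put them back before decoding.
  Dim data As MemoryBlock = pngHeader + rawPNG(0) + pngTrailer
  Return Picture.FromData(data) ' Nil if the bytes are not a valid image
End Function
```

The returned Picture can then be drawn in a canvas the same way as an external wallpaper.png.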

For what it’s worth, I’m rarely able to deal with binary files and come out completely unscathed. So, don’t feel bad. There are a lot of pitfalls when using strings to handle binary data. The code that I posted is some that I wrote several years ago, to scan html documents for specific tags, and return the contents of those tags. So, it was simple enough to modify the header and trailer search strings to look for the image data.