Reading a binary stream

I’m trying to read subtitle files (EBU STL files; the spec is linked in the code below).
The file is opened as a binary stream, and the information is copied to a string and output to a text area.
But I’m obviously missing something, because I can’t see all the information from the file in the output area.

This is what I have in a push button’s Action event:

[code] //Specs for STL files: https://tech.ebu.ch/docs/tech/tech3264.pdf

Dim f As FolderItem
Dim bs As BinaryStream
Dim s As String

//Get a FolderItem
f = GetOpenFolderItem("stl")

//make sure it exists before we try to read it
If f <> NIL And f.Exists Then

//open the folderitem as a binary file without write privileges
// To open with write privileges, use True instead of False
bs = BinaryStream.Open(f, False)

//make sure we have a binary stream to read from
If bs <> Nil Then
  
  //  Get the General Subtitle Information (GSI) block
  s = "Code Page Number: " + bs.Read(3, encodings.UTF8) + EndOfLine
  s = s + "Disk Format Code: " + bs.Read(8) + EndOfLine
  s = s + "Display Standard Code: " + bs.Read(1) + EndOfLine
  s = s + "Character Code Table Number: " + bs.Read(2) + EndOfLine
  s = s + "Language Code: " + bs.Read(2) + EndOfLine
  s = s + "Original Programme Title: " + bs.Read(32) + EndOfLine
  s = s + "Original Episode Title: " + bs.Read(32) + EndOfLine
  s = s + "Translated Programme Title: " + bs.Read(32) + EndOfLine
  s = s + "Translated Episode Title: " + bs.Read(32) + EndOfLine
  s = s + "Translator's Name: " + bs.Read(32) + EndOfLine
  s = s + "Translator's Contact Details: " + bs.Read(32) + EndOfLine
  s = s + "Subtitle List Reference Code: " + bs.Read(16) + EndOfLine
  s = s + "Creation Date: " + bs.Read(6) + EndOfLine
  s = s + "Revision Date: " + bs.Read(6) + EndOfLine
  s = s + "Revision Number: " + bs.Read(2) + EndOfLine
  s = s + "Total Number of Text and Timing Information (TTI) blocks: " + bs.Read(5) + EndOfLine
  s = s + "Total Number of Subtitles: " + bs.Read(5) + EndOfLine
  s = s + "Total Number of Subtitle Groups: " + bs.Read(3) + EndOfLine
  s = s + "Maximum Number of Displayable Characters in any text row: " + bs.Read(2) + EndOfLine
  s = s + "Maximum Number of Displayable Rows: " + bs.Read(2) + EndOfLine
  s = s + "Time Code: Status : " + bs.Read(1) + EndOfLine
  s = s + "Time Code: Start-of-Programme: " + bs.Read(8) + EndOfLine
  s = s + "Time Code: First In-Cue: " + bs.Read(8) + EndOfLine
  s = s + "Total Number of Disks: " + bs.Read(1) + EndOfLine
  s = s + "Disk Sequence Number: " + bs.Read(1) + EndOfLine
  s = s + "Country of Origin: " + bs.Read(3) + EndOfLine
  s = s + "Publisher: " + bs.Read(32) + EndOfLine
  s = s + "Editor's Name: " + bs.Read(32) + EndOfLine
  s = s + "Editor's Contact Details: " + bs.Read(32) + EndOfLine
  s = s + "Spare Bytes: " + bs.Read(75) + EndOfLine
  s = s + "User-Defined Area: " + bs.Read(576) + EndOfLine + EndOfLine
  
  //Get the first Text and Timing Information (TTI) block
  s = s + "Sub one:" + EndOfLine
  s = s + "Subtitle Group Number: " + bs.Read(1) + EndOfLine
  s = s + "Subtitle Number: " + bs.Read(2) + EndOfLine
  s = s + "Extension Block Number: " + bs.Read(1) + EndOfLine
  s = s + "Cumulative Status: " + bs.Read(1) + EndOfLine
  s = s + "Time Code In: " + bs.Read(4) + EndOfLine
  s = s + "Time Code Out: " + bs.Read(4) + EndOfLine
  s = s + "Vertical Position: " + bs.Read(1) + EndOfLine
  s = s + "Justification Code: " + bs.Read(1) + EndOfLine
  s = s + "Comment Flag: " + bs.Read(1) + EndOfLine
  s = s + "Text Field : " + bs.Read(112)
  
  OutputArea.Text = s
  
  //close the binaryStream
  bs.Close
End If

End If[/code]

And this is the output in the textarea:

[code]Code Page Number: 865
Disk Format Code: STL25.01
Display Standard Code: 0
Character Code Table Number: 00
Language Code: 07
Original Programme Title: original_programme_title
Original Episode Title: original_episode_title
Translated Programme Title: translated_programme_title
Translated Episode Title: translated_episode_title
Translator's Name: The Translator
Translator's Contact Details:
Subtitle List Reference Code: 0
Creation Date: 150214
Revision Date: 150214
Revision Number: 01
Total Number of Text and Timing Information (TTI) blocks: 00004
Total Number of Subtitles: 00004
Total Number of Subtitle Groups: 001
Maximum Number of Displayable Characters in any text row: 40
Maximum Number of Displayable Rows: 23
Time Code: Status : 1
Time Code: Start-of-Programme: 00000000
Time Code: First In-Cue: 00000100
Total Number of Disks: 1
Disk Sequence Number: 1
Country of Origin: DNK
Publisher:
Editor's Name:
Editor's Contact Details:
Spare Bytes:
User-Defined Area:

Sub one:
Subtitle Group Number: [/code]
And nothing more. I don’t get all the information from the TTI block.
If I view the string as binary in the debugger, I can see all the content from the TTI block, but it doesn’t show in the output area.

Any ideas?

What is missing in your data? I would also recommend NOT using ReadLine. Depending on the size of your files, it’s much faster to do a ReadAll and then parse your data after reading.

Most of the TTI block (the first subtitle) is missing.

Several of the fields you read are not strings, yet you’re appending their data as strings. So maybe at the line where it stops,

s = s + "Subtitle Group Number: " + bs.Read(1) + EndOfLine

the byte being read corrupts or terminates the string. You could try the optional encoding parameter:

s = s + "Subtitle Group Number: " + bs.Read(1, Encodings.UTF8) + EndOfLine

But if the field is supposed to be a numeric value, you should interpret it as such instead:

s = s + "Subtitle Group Number: " + Str(bs.ReadUInt8) + EndOfLine

The Subtitle Group Number contains a binary value, not a string. Use

s = s + "Subtitle Group Number: " + Str(bs.ReadUInt8) + EndOfLine

Same with several other values in the TTI block.
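To illustrate treating each TTI field according to its type, here is a minimal sketch. It is written in Python rather than Xojo so that it runs on its own, and the function name `parse_tti` is hypothetical. The field sizes are the ones from the code in the question. The little-endian byte order of the 2-byte Subtitle Number and the hours/minutes/seconds/frames layout of the time codes are assumptions that should be checked against the spec.

```python
import struct

# Layout of one 128-byte TTI block, using the field sizes from the question.
# "<" (little-endian) for the 2-byte Subtitle Number is an assumption.
TTI_FORMAT = "<BHBB4s4sBBB112s"


def parse_tti(block: bytes) -> dict:
    """Decode one TTI block into numbers, time codes, and raw text bytes."""
    if len(block) != struct.calcsize(TTI_FORMAT):
        raise ValueError("a TTI block must be exactly 128 bytes")
    sgn, sn, ebn, cs, tci, tco, vp, jc, cf, tf = struct.unpack(TTI_FORMAT, block)
    return {
        "subtitle_group_number": sgn,
        "subtitle_number": sn,
        "extension_block_number": ebn,
        "cumulative_status": cs,
        # Assumed: one binary byte each for hours, minutes, seconds, frames.
        "time_code_in": tuple(tci),
        "time_code_out": tuple(tco),
        "vertical_position": vp,
        "justification_code": jc,
        "comment_flag": cf,
        # Left as bytes; decoding depends on the character code table.
        "text_field": tf,
    }
```

In Xojo the same idea would use the numeric read methods such as ReadUInt8 for the one-byte fields instead of Read(1).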

Thank you very much

I didn’t realize that reading binary data as a string would corrupt the entire string. I thought there would just be no information.

Is it possible to check whether data is binary before you read it into a string?

At the moment it doesn’t matter if the data is binary or string. In the future it will. This is one reason why reading and parsing data should be 2 steps.
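The two-step approach could look roughly like the sketch below, in Python rather than Xojo so that it runs on its own. The names `split_stl` and `read_stl` are hypothetical. The block sizes are simply the sums of the field sizes read in the question's code (1024 bytes for the GSI block, 128 bytes per TTI block).

```python
from typing import List, Tuple

GSI_SIZE = 1024  # sum of the GSI field sizes read in the question's code
TTI_SIZE = 128   # sum of the TTI field sizes read in the question's code


def split_stl(data: bytes) -> Tuple[bytes, List[bytes]]:
    """Split raw file bytes into the GSI block and a list of TTI blocks."""
    if len(data) < GSI_SIZE:
        raise ValueError("file is too short to contain a GSI block")
    gsi = data[:GSI_SIZE]
    body = data[GSI_SIZE:]
    # Ignore a trailing partial block rather than guessing at its contents.
    count = len(body) // TTI_SIZE
    ttis = [body[i * TTI_SIZE:(i + 1) * TTI_SIZE] for i in range(count)]
    return gsi, ttis


def read_stl(path: str) -> Tuple[bytes, List[bytes]]:
    """Step 1: read the whole file. Parsing (step 2) works on the returned bytes."""
    with open(path, "rb") as f:
        return split_stl(f.read())
```

Keeping the raw bytes separate from their interpretation means a field can later be decoded as a number or as text without touching the reading code.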

The format specification tells you what the bytes of each field represent. You need to handle each one accordingly.

I do not think so. But you have the specs, and this information is (usually) in the specs.

The purpose of a BinaryStream is to read binary data: strings as well as integers, etc.

Look at the part of the PDF on the structure of the data file for the kind of data in each field (string, integer, and so on).

I did not spend long reading the PDF, but each time a “Code” appears, it is probably a number, thus an integer (though it may also be a float).

BTW: a numeric value can be encoded in one or more bytes (up to ?), so you HAVE to know the number of bytes to read to get the number you want. Look carefully at the entry named Total Number of TTI Blocks (TNB).

WARNING: I saw that dates are involved. Do you know how a date is stored in your file(s)?
See the entries at and below Creation Date.
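Judging by the output posted above, the GSI counters and dates come through as ASCII digits ("00004", "150214") rather than binary values. A minimal sketch of converting them follows, in Python rather than Xojo so that it runs on its own. The helper names `gsi_int` and `gsi_date` are hypothetical, and the YYMMDD order of the date is an assumption to verify against the spec.

```python
from datetime import date, datetime


def gsi_int(field: bytes) -> int:
    """Convert an ASCII-digit GSI field such as b'00004' to an integer."""
    return int(field.decode("ascii").strip())


def gsi_date(field: bytes) -> date:
    """Convert a 6-digit GSI date field; the YYMMDD order is an assumption."""
    return datetime.strptime(field.decode("ascii"), "%y%m%d").date()


# Values taken from the output posted in the question.
print(gsi_int(b"00004"))    # the TNB field
print(gsi_date(b"150214"))  # the Creation Date field
```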

Thank you all again. I’m getting a lot more help than I hoped for.

Just to be clear, it doesn’t corrupt the string per se, just the display of the string.
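If the raw bytes do need to be shown for debugging, one option is to escape anything non-printable before it reaches the text control. The sketch below is in Python rather than Xojo so that it runs on its own. The name `printable` is hypothetical, and it assumes the control's display stops or misbehaves at bytes such as zero.

```python
def printable(raw: bytes) -> str:
    """Return display-safe text: bytes outside printable ASCII become \\xNN."""
    out = []
    for b in raw:
        if 32 <= b < 127:
            out.append(chr(b))
        else:
            # Show the byte value instead of passing it to the control.
            out.append("\\x%02X" % b)
    return "".join(out)
```

With this, a Subtitle Group Number byte of zero would appear as the four visible characters `\x00` instead of cutting the display short.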