Binary Run-Lengths...

I’m trying to create an array of binary run-lengths from a string of binary. For example:

strBinary = “00110110000”

would convert into an array such as:

arrRL(0) = “00”
arrRL(1) = “11”
arrRL(2) = “0”
arrRL(3) = "11
arrRL(4) = “0000”

Anybody have any ideas or feedback on the most efficient way to do this without having to crawl 1-digit at a time through the entire binary string… perhaps using ‘replaceall’ to replace ‘01’ and ‘10’ with something followed by the ‘split’ functions?

Nevermind… actually got something to work using:

strBinary = replaceall(strBinary,“01”,“0!1”)
strBinary = replaceall(strBinary,“10”,“1!0”)

arrRL = split(strBinary, “!”)

To know whats faster use the Profiler. I’d write the ‘crawl 1 digit at a time’ code then call both methods 1000 times and see what the Profiler says. Things are faster in a built app and it can vary what speeds up but I find Profiler results are a good relative valuation.

Also test with ReplaceAllB, the B versions are usually much faster, especially the longer the string.

Appears the ReplaceAll method is a LOT faster than one-at-a-time crawling. Now, I need to find the quickest way to get the binary value of a complete file into a memoryblock or string without also using the slower crawling (reading a byte/long, converting that into binary, and appending the binary ‘1’ & ‘0’ digits to a string).

Dim bs As BinaryStream = BinaryStream.Open(MyFolderItem) Dim mb As MemoryBlock = bs.Read(bs.Length) bs.Close

You can replace the MemoryBlock with a string if you want.

I have done that, but how do I reference the binary digits in the memoryblock itself without having to clarify a position (again, facing the crawl issue)? Would be awesome if I could do something like mb.binarystring (which would treat the entire memoryblock as one long binary value).

An alternative is to use a regular expression:

  dim arrRL() as string
  dim rx as new RegEx
  rx.SearchPattern = "0+|1+"
  dim match as RegExMatch = rx.Search( binaryString )
  while match <> nil
    arrRL.Append match.SubExpressionString( 0 )
    match = rx.Search
  wend

But when I did a test on this vs. using the solution Eric came up with, the difference was 753 ms for the RegEx vs. 16.5 ms for the ReplaceAllB method. If I use RegExMBS instead, the results are roughly the same as for the ReplaceAllB method:

  dim arrRL() as string
  dim rx as new RegExMBS
  rx.CompileOptionUngreedy = false
  if rx.Compile( "0+|1+" ) and rx.Study then
    dim cnt as integer = rx.Execute( binaryString, 0 )
    while cnt <> 0 
      arrRL.Append rx.Substring( 0 )
      cnt = rx.Execute( rx.Offset( 1 ) )
    wend
  end if

MemoryBlocks and strings are in many cases synonymous, you can use MemoryBlocks with the ReplaceAll method you already have:

  Dim bs As BinaryStream = BinaryStream.Open(MyFolderItem)
  Dim strBinary As MemoryBlock = bs.Read(bs.Length)
  strBinary= ReplaceAll(strBinary, "01","0!1")
  strBinary = replaceall(strBinary,"10","1!0")
  arrRL = split(strBinary, "!")

But the file doesn’t contain zeros and ones, it contains bytes, which need to be converted into strings of zeros and ones. Then you can use the replaceall trick.

You’re going to have to crawl the memoryblock and build a string. But it shouldn’t take long. Go byte by byte and append each 8 char string to an array (you’ll have to zero-fill the result of bin()). Then join the array when you’re done.

dim mb as memoryblock = bs.read(bs.length) dim arr() as string dim s, result as string for i as integer = 0 to mb.size-1 s = right("00000000" + bin(mb.byte(i))) arr.append s next result = join(arr, "")

A slight adjustment to Tims code will make it 10-17x faster, basically precompute the byte strings.

dim map(255) As String for i As integer = 0 to 255 map(i) = Right("0000000" + Bin(i), 8) next

Then scan from mb directly into an output memoryblock,

dim mbOut As new MemoryBlock(mb.Size * 8) dim last As integer = mb.Size - 1 for i As integer = 0 to last mbOut.StringValue(i * 8, 8) = map(mb.UInt8Value(i)) next

Also, what is it you’re doing with this ‘binary’ data? Maybe a boolean array or even leaving the bits packed would be better for your processing algorithms. . . . or not.

And it’s 2x faster still by mapping 2 bytes at a time and a little more faster with all the checking pragmas false. : )

[quote=15723:@doofus]A slight adjustment to Tims code will make it 10-17x faster, basically precompute the byte strings.
[/quote]

If this code needs to be run multiple times, store the map:

static map() as string
if map.Ubound = ?1 then
    redim map( 255 )
    for i As integer = 0 to 255
        map(i) = Right("0000000" + Bin(i), 8)
    next
end if

I’ve been playing with the Profiler and this problem was a good learning test. Here’s the fastest version I’ve come up with, twice as fast as a single byte version, but the map uses more memory (1000k vs 2k) and more time to build, still just part of a second.

[code]Function binaryStringRepresentation(mb As MemoryBlock) As MemoryBlock
//pass in MemoryBlock or String (or nil to just initialize the map)
//returns MemoryBlock or String of the 0s and 1s
//============================================================
#pragma BackgroundTasks false
#pragma BoundsChecking false
#pragma StackOverflowChecking false
#pragma NilObjectChecking false

static map16() As String
if map16.Ubound = -1 then //need to build map
dim pre(16), s As String //shaves 10% off map building time
for i As integer = 15 DownTo 1
pre(i) = pre(i+1) + “0”
next
redim map16(65535)
for i As integer = 0 to 65535 //build
s = Bin(i)
map16(i) = pre(s.LenB) + s
next
end

if mb = nil or mb.Size = -1 then return “” //no input, just build map

dim origEndian As Boolean = mb.LittleEndian //change endian
mb.LittleEndian = false

dim mbOut As new MemoryBlock(mb.Size * 8) //create output memory
dim last As integer = mb.Size - 2
for i As integer = 0 to last step 2 //scan over 2 bytes at a time
mbOut.StringValue(i * 8, 16) = map16(mb.UInt16Value(i))
next

if mb.Size mod 2 = 1 then //if odd number of bytes then write last one
dim pos As integer = mb.Size - 1
mbOut.StringValue(pos * 8, 8) = RightB(map16(mb.UInt8Value(pos)), 8)
end

mb.LittleEndian = origEndian //restore endianness

return mbOut

End Function[/code]

I timed your map-making code at about 450 ms. This is a little more code, but comes in at under 30 ms:

  static map16() as string
  if map16.Ubound = -1 then
    dim bDigit() as string = Array( "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111" )
    dim bByte(255 ) as string
    dim byteIndex as integer
    for i1 as integer = 0 to 15
      dim dig1 as string = bDigit( i1 )
      for i2 as integer = 0 to 15
        dim dig2 as string = bDigit( i2 )
        bByte( byteIndex ) = dig1 + dig2
        byteIndex = byteIndex + 1
      next
    next

    redim map16( 65535 ) 
    dim mapIndex as integer
    for i1 as integer = 0 to 255
      dim dig1 as string = bByte( i1 )
      for i2 as integer = 0 to 255
        dim dig2 as string = bByte( i2 )
        map16( mapIndex ) = dig1 + dig2
        mapIndex = mapIndex + 1
      next
    next
  end if

All of this is awesome, guys! Appreciate the assistance… :slight_smile:

Is the bit stream continuous? or is it set to 8/16/32 bits… If so then a lookup table would work real fast.

0000 0000 = “0000000”
0000 0001 = “000000”, “1”
0000 0010 = “000000”, “1”, “0”

[code] dim str as string=“01011100”
dim i, l as integer
dim cc,cp as string
dim o(-1) as string
dim acc as string=""

l = str.len
cp = str.Mid(1)
acc = cp
for i=2 to l
cc = str.Mid(i, 1)
if cc <> cp then
o.Append acc
acc = cc
else
acc = acc + cc
end if
cp=cc
next i
o.Append acc
[/code]