Anybody have any ideas or feedback on the most efficient way to do this without having to crawl 1-digit at a time through the entire binary string… perhaps using ‘replaceall’ to replace ‘01’ and ‘10’ with something followed by the ‘split’ functions?
To know whats faster use the Profiler. I’d write the ‘crawl 1 digit at a time’ code then call both methods 1000 times and see what the Profiler says. Things are faster in a built app and it can vary what speeds up but I find Profiler results are a good relative valuation.
Also test with ReplaceAllB, the B versions are usually much faster, especially the longer the string.
Appears the ReplaceAll method is a LOT faster than one-at-a-time crawling. Now, I need to find the quickest way to get the binary value of a complete file into a memoryblock or string without also using the slower crawling (reading a byte/long, converting that into binary, and appending the binary ‘1’ & ‘0’ digits to a string).
I have done that, but how do I reference the binary digits in the memoryblock itself without having to clarify a position (again, facing the crawl issue)? Would be awesome if I could do something like mb.binarystring (which would treat the entire memoryblock as one long binary value).
dim arrRL() as string
dim rx as new RegEx
rx.SearchPattern = "0+|1+"
dim match as RegExMatch = rx.Search( binaryString )
while match <> nil
arrRL.Append match.SubExpressionString( 0 )
match = rx.Search
wend
But when I did a test on this vs. using the solution Eric came up with, the difference was 753 ms for the RegEx vs. 16.5 ms for the ReplaceAllB method. If I use RegExMBS instead, the results are roughly the same as for the ReplaceAllB method:
dim arrRL() as string
dim rx as new RegExMBS
rx.CompileOptionUngreedy = false
if rx.Compile( "0+|1+" ) and rx.Study then
dim cnt as integer = rx.Execute( binaryString, 0 )
while cnt <> 0
arrRL.Append rx.Substring( 0 )
cnt = rx.Execute( rx.Offset( 1 ) )
wend
end if
But the file doesn’t contain zeros and ones, it contains bytes, which need to be converted into strings of zeros and ones. Then you can use the replaceall trick.
You’re going to have to crawl the memoryblock and build a string. But it shouldn’t take long. Go byte by byte and append each 8 char string to an array (you’ll have to zero-fill the result of bin()). Then join the array when you’re done.
dim mb as memoryblock = bs.read(bs.length)
dim arr() as string
dim s, result as string
for i as integer = 0 to mb.size-1
s = right("00000000" + bin(mb.byte(i)))
arr.append s
next
result = join(arr, "")
A slight adjustment to Tims code will make it 10-17x faster, basically precompute the byte strings.
dim map(255) As String
for i As integer = 0 to 255
map(i) = Right("0000000" + Bin(i), 8)
next
Then scan from mb directly into an output memoryblock,
dim mbOut As new MemoryBlock(mb.Size * 8)
dim last As integer = mb.Size - 1
for i As integer = 0 to last
mbOut.StringValue(i * 8, 8) = map(mb.UInt8Value(i))
next
Also, what is it you’re doing with this ‘binary’ data? Maybe a boolean array or even leaving the bits packed would be better for your processing algorithms. . . . or not.
I’ve been playing with the Profiler and this problem was a good learning test. Here’s the fastest version I’ve come up with, twice as fast as a single byte version, but the map uses more memory (1000k vs 2k) and more time to build, still just part of a second.
[code]Function binaryStringRepresentation(mb As MemoryBlock) As MemoryBlock
//pass in MemoryBlock or String (or nil to just initialize the map)
//returns MemoryBlock or String of the 0s and 1s
//============================================================ #pragma BackgroundTasks false #pragma BoundsChecking false #pragma StackOverflowChecking false #pragma NilObjectChecking false
static map16() As String
if map16.Ubound = -1 then //need to build map
dim pre(16), s As String //shaves 10% off map building time
for i As integer = 15 DownTo 1
pre(i) = pre(i+1) + “0”
next
redim map16(65535)
for i As integer = 0 to 65535 //build
s = Bin(i)
map16(i) = pre(s.LenB) + s
next
end
if mb = nil or mb.Size = -1 then return “” //no input, just build map
dim origEndian As Boolean = mb.LittleEndian //change endian
mb.LittleEndian = false
dim mbOut As new MemoryBlock(mb.Size * 8) //create output memory
dim last As integer = mb.Size - 2
for i As integer = 0 to last step 2 //scan over 2 bytes at a time
mbOut.StringValue(i * 8, 16) = map16(mb.UInt16Value(i))
next
if mb.Size mod 2 = 1 then //if odd number of bytes then write last one
dim pos As integer = mb.Size - 1
mbOut.StringValue(pos * 8, 8) = RightB(map16(mb.UInt8Value(pos)), 8)
end
I timed your map-making code at about 450 ms. This is a little more code, but comes in at under 30 ms:
static map16() as string
if map16.Ubound = -1 then
dim bDigit() as string = Array( "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111" )
dim bByte(255 ) as string
dim byteIndex as integer
for i1 as integer = 0 to 15
dim dig1 as string = bDigit( i1 )
for i2 as integer = 0 to 15
dim dig2 as string = bDigit( i2 )
bByte( byteIndex ) = dig1 + dig2
byteIndex = byteIndex + 1
next
next
redim map16( 65535 )
dim mapIndex as integer
for i1 as integer = 0 to 255
dim dig1 as string = bByte( i1 )
for i2 as integer = 0 to 255
dim dig2 as string = bByte( i2 )
map16( mapIndex ) = dig1 + dig2
mapIndex = mapIndex + 1
next
next
end if
[code] dim str as string=“01011100”
dim i, l as integer
dim cc,cp as string
dim o(-1) as string
dim acc as string=""
l = str.len
cp = str.Mid(1)
acc = cp
for i=2 to l
cc = str.Mid(i, 1)
if cc <> cp then
o.Append acc
acc = cc
else
acc = acc + cc
end if
cp=cc
next i
o.Append acc
[/code]