text.mid() over 10x slower than mid()

Eric_Brown2 · December 10, 2015, 4:32am

I’m trying to stick with the new framework as much as possible; however, I’ve noticed that using text.mid(0, loopIndex) on a large text (about 8000 characters of hexadecimal representations) is about 10x SLOWER than using mid(text, 0, loopIndex) when parsing the information.

Anybody have any explanation on why there would be such a drastic difference? Shouldn’t any new framework functions be AT LEAST as fast as the older stuff?

I made a console application and put this in the Run event handler:

[code]
dim myByte() as byte
dim myText as text
dim startTicks as uInt32
dim textLength as uInt32
dim iLoop as uInt32
dim oLoop as uInt32

myText = "FF01C32FFFCA399400FC499923FF01C32FFFCA399400FC499923FF01C32FFFCA399400FC499923"
myText = myText + myText + myText + myText + myText + myText + myText + myText + myText ' make er big
textLength = myText.length

print "text length: " + textLength.toText

startTicks = ticks

' classic framework: mid() is 1-based
for oLoop = 1 to 10000
  for iLoop = 1 to textLength step 2
    myByte.append(val("&h" + mid(myText, iLoop, 2)))
  next iLoop
next oLoop

print "using mid(): " + format((ticks - startTicks) / 60, "0.0000") + " seconds..."

startTicks = ticks

' new framework: text.mid() is 0-based, so start at 0 to stay on byte-pair boundaries
for oLoop = 1 to 10000
  for iLoop = 0 to (textLength - 2) step 2
    myByte.append(val("&h" + myText.mid(iLoop, 2)))
  next iLoop
next oLoop

print "using text.mid(): " + format((ticks - startTicks) / 60, "0.0000") + " seconds..."

' keep the console app alive
do
  app.doEvents()
loop
[/code]

Be prepared to go grab some coffee during the second routine…

Greg_O_Lone · December 10, 2015, 4:43am

Just out of curiosity, why aren’t you accessing this using a memoryblock?

Travis_Hill · December 10, 2015, 4:47am

No. The Text functions should be more accurate than String, not faster. They now understand user-perceived characters made up of different numbers of bytes across encodings.

If you want to do parsing at a per-byte level (where you don’t care about user-perceived characters or encodings, but you do care about speed), the appropriate way in the new framework is to use things like Mid on a MemoryBlock instead like Greg mentioned.
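To make the character-versus-byte distinction concrete, here is a minimal sketch for a console app. It assumes the new-framework calls Text.FromUnicodeCodepoint, Text.Length, and TextEncoding.UTF8.ConvertTextToData behave as documented; the variable names are only illustrative.

[code]
' "e" followed by a combining acute accent (U+0301):
' two code points that a reader perceives as one character
dim t as text = "e" + text.fromUnicodeCodepoint(&h301)

' the same text encoded as raw UTF-8 bytes
dim utf8 as xojo.core.memoryBlock = xojo.core.textEncoding.UTF8.convertTextToData(t)

print "text.length: " + t.length.toText
print "UTF-8 bytes: " + utf8.size.toText
[/code]

Comparing the two printed numbers shows why Text has to inspect the data to find character boundaries, while a MemoryBlock can index bytes directly.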

Eric_Brown2 · December 10, 2015, 4:52am

[quote=234878:@Travis Hill]No. The Text functions should be more accurate than String, not faster. They now understand user-perceived characters made up of different numbers of bytes across encodings.

If you want to do parsing at a per-byte level (where you don’t care about user-perceived characters or encodings, but you do care about speed), the appropriate way in the new framework is to use things like Mid on a MemoryBlock instead like Greg mentioned.[/quote]

Gotcha. That makes sense.

“the appropriate way in the new framework is to use things like Mid on a MemoryBlock instead like Greg mentioned.”

EDIT: Never mind, I found the documentation.

Eric_Brown2 · December 10, 2015, 4:56am

Alright, Greg… could you elaborate on the code above where you’re talking about using a memoryblock? I have a method that converts a text of hexadecimal to a memoryblock. It’s SLOW as hell. I’ve tried a dozen different ways I can think of to make this faster…

Travis_Hill · December 10, 2015, 4:58am

Why not keep it in binary data in a memoryblock from the get-go? Is the source itself a textual hex representation?

Eric_Brown2 · December 10, 2015, 5:05am

Here’s what I’m trying to do:

1. Put a binary file into a xojo.core.memoryBlock.
2. Convert the xojo.core.memoryBlock to a text of hex representation.
3. Pass that text in a UTF-8 JSON query.
4. Pull the passed text out of the UTF-8 JSON query and put it into a xojo.core.memoryBlock.
5. Write that xojo.core.memoryBlock back to a file using the binaryStream.

I’ve touched on this prior (using encodeHex on a xojo.core.memoryBlock), and now I’m trying to do the reverse:
https://forum.xojo.com/22057-encodehex-memoryblock/0

I’ve created methods to do the xojo.core.memoryBlock <> text of hex conversions. Now I just need to optimize them.

Kem_Tekinay · December 10, 2015, 5:13am

About 1.5 milliseconds. But in fairness, this was in the IDE

[code]
dim hexText as text = "0123456789ABCDE"
hexText = M_String.Repeat( hexText, 1000 ).ToText

dim msg as string
dim sw as new Stopwatch_MTC
sw.Start

dim bytes() as byte
redim bytes( ( hexText.Length \ 2 ) - 1 ) ' redim takes the upper bound, not the count

' convert once to raw ASCII bytes, then work on the bytes directly
dim mb as Xojo.Core.MemoryBlock = _
  Xojo.Core.TextEncoding.ASCII.ConvertTextToData( hexText.Uppercase )
dim byteIndex as integer = -1
dim lastIndex as integer = mb.Size - 1
for mbIndex as integer = 0 to lastIndex step 2
  dim byte1 as byte = mb.Data.Byte( mbIndex )
  dim byte2 as byte = mb.Data.Byte( mbIndex + 1 )
  
  ' "A".."F" are ASCII 65..70 (subtract 55); "0".."9" are 48..57 (subtract 48)
  byte1 = if( byte1 >= 65, byte1 - 55, byte1 - 48)
  byte2 = if( byte2 >= 65, byte2 - 55, byte2 - 48)
  
  byteIndex = byteIndex + 1
  bytes( byteIndex ) = ( byte1 * 16 ) + byte2
  
next

sw.Stop
msg = format( sw.ElapsedMicroseconds, "#," ) + " microsecs"
AddToResult msg
[/code]

Eric_Brown2 · December 10, 2015, 5:32am

[quote=234886:@Kem Tekinay]About 1.5 milliseconds. But in fairness, this was in the IDE

[code]
dim hexText as text = "0123456789ABCDE"
hexText = M_String.Repeat( hexText, 1000 ).ToText

dim msg as string
dim sw as new Stopwatch_MTC
sw.Start

dim bytes() as byte
redim bytes( ( hexText.Length \ 2 ) - 1 ) ' redim takes the upper bound, not the count

dim mb as Xojo.Core.MemoryBlock = _
  Xojo.Core.TextEncoding.ASCII.ConvertTextToData( hexText.Uppercase )
dim byteIndex as integer = -1
dim lastIndex as integer = mb.Size - 1
for mbIndex as integer = 0 to lastIndex step 2
  dim byte1 as byte = mb.Data.Byte( mbIndex )
  dim byte2 as byte = mb.Data.Byte( mbIndex + 1 )
  
  byte1 = if( byte1 >= 65, byte1 - 55, byte1 - 48)
  byte2 = if( byte2 >= 65, byte2 - 55, byte2 - 48)
  
  byteIndex = byteIndex + 1
  bytes( byteIndex ) = ( byte1 * 16 ) + byte2
  
next

sw.Stop
msg = format( sw.ElapsedMicroseconds, "#," ) + " microsecs"
AddToResult msg
[/code][/quote]

Man, Kem… this is great. The HEX text entered doesn’t match the result, but with a bit of looking into it I think I can find the problem. Where do I send the beer? If I can get this to work, it’s WAY better than anything I’ve come up with.

Eric_Brown2 · December 10, 2015, 5:45am

Found the issue… I wasn’t converting the text to uppercase. That threw it off. Much appreciated!
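One way to avoid depending on the case of the input is to map each ASCII code to its value explicitly. The following is only a sketch: HexNibble is a hypothetical helper name, not part of Xojo or of Kem’s code, and returning -1 for invalid input is an assumption about how errors should be signalled.

[code]
Function HexNibble(code As Integer) As Integer
  ' code is the ASCII value of one hex digit, e.g. mb.Data.Byte( mbIndex )
  Select Case code
  Case 48 To 57    ' "0" to "9"
    Return code - 48
  Case 65 To 70    ' "A" to "F"
    Return code - 55
  Case 97 To 102   ' "a" to "f"
    Return code - 87
  Case Else
    Return -1      ' not a hex digit; the caller decides how to handle it
  End Select
End Function
[/code]

Inside the loop, the two nibbles would then combine as HexNibble( byte1 ) * 16 + HexNibble( byte2 ), after checking neither is -1. The extra method call may cost some speed compared with the inline arithmetic, so it is worth timing both.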

kevin_g · December 10, 2015, 7:59am

[quote=234878:@Travis Hill]No. The Text functions should be more accurate than String, not faster. They now understand user-perceived characters made up of different numbers of bytes across encodings.

If you want to do parsing at a per-byte level (where you don’t care about user-perceived characters or encodings, but you do care about speed), the appropriate way in the new framework is to use things like Mid on a MemoryBlock instead like Greg mentioned.[/quote]

Hi Travis

I’m not sure your explanation is correct.

String functions in the classic framework don’t work at the byte level. If you perform a Mid on a UTF8 character which is 3 bytes long you will get all 3 bytes. Whereas if you used MidB you would only get 1 byte.

A better description might be to say that it works at the Unicode code point or byte sequence level.

The ‘more accurate’ statement is also debatable, as it depends on what you are trying to achieve. However, it still doesn’t justify the performance, as that added ‘accuracy’ shouldn’t come with such a huge penalty.
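The Mid versus MidB behaviour described above can be checked with a few lines in a console app. This is a minimal sketch using only classic framework functions (Mid, MidB, Len, LenB, Str); the sample string is an arbitrary choice, picked because the euro sign is 3 bytes long in UTF-8.

[code]
dim s as string = "€10" ' string literals are UTF-8; the euro sign takes 3 bytes

dim firstChar as string = mid(s, 1, 1)  ' character-based: the whole euro sign
dim firstByte as string = midB(s, 1, 1) ' byte-based: only its first byte

print "len: " + str(len(s)) + ", lenB: " + str(lenB(s))
print "bytes returned by mid(): " + str(lenB(firstChar))
print "bytes returned by midB(): " + str(lenB(firstByte))
[/code]

If the description above holds, Mid hands back all the bytes of the character while MidB hands back a single byte, which on its own is not valid UTF-8.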