My method is so slow. Please help me out.

system · August 4, 2013, 11:51pm

Hello,

I need to decode large text files or strings (each about 1 MB in size) and I am using the following method for it.
It works, but it is so slowwwwww.

I am pulling my hair out trying to make this much faster, but I really can’t think of any solution.
Please help me out, guys, or point me in the right direction. I am sure there are much better and faster alternatives to my method.

Thanks in a million.

–

The method is:

'This method converts all the characters from a text file. The file itself is loaded in a different method, which calls this method for the conversion.
'The string variable 'inputdata' is passed to this method as a parameter.

Dim char As String
Dim decoded As String
Dim chars() As String

chars = inputdata.Split("")

For i As Integer = 0 To chars.Ubound
  char = chars(i)
  If char = "=" Then
    decoded = decoded + Chr((Asc(char) - 64) - 42)
    i = i + 1
  Else
    decoded = decoded + Chr(Asc(char) - 42)
  End If
Next

chars = nil

Return decoded

Kem_Tekinay · August 5, 2013, 12:00am

Try this:

Dim char As String
Dim decoded As String
Dim chars() As String
dim newChars() as string

chars=inputdata.Split("")

For i As Integer = 0 To chars.Ubound
  char = chars(i)
  if char = "=" Then
    newChars.Append Chr( ( char.Asc - 64  ) - 42 )
    i = i +1 
  Else
    newChars.Append Chr( char.Asc - 42 )
  end if
Next

chars = nil // You don't need this

decoded = join( newChars, "" )
return decoded

Kem_Tekinay · August 5, 2013, 12:02am

[quote=25084:@Kem Tekinay]Try this:

[code]
Dim char As String
Dim decoded As String
Dim chars() As String
dim newChars() as string

chars=inputdata.Split("")

For i As Integer = 0 To chars.Ubound
  char = chars(i)
  if char = "=" Then
    newChars.Append Chr( ( char.Asc - 64 ) - 42 ) // Are you sure about the math here? Asc( "=" ) is 61.
    // And if it's always the same value, you don't need to calculate it each time anyway
    i = i +1
  Else
    newChars.Append Chr( char.Asc - 42 )
  end if
Next

chars = nil // You don't need this

decoded = join( newChars, "" )
return decoded
[/code][/quote]

Marc_Zeedar · August 5, 2013, 12:03am

It’s a little hard to tell without knowing what, exactly, you’re doing with this data, but I get the idea and I have a couple of suggestions.

You might replace the chars.Ubound call with a variable containing that value. The way you’re doing it, Ubound (which is a function) is called each time through the loop.

Another suggestion is to use an array for decoded instead of a string (string additions are slower than array appends). Just return it with a join(decoded) at the end so you’re still returning a string.

The chr/asc stuff looks ugly, but I can’t off the top of my head think of a better way to do that. You could build a table of pre-computed values in a dictionary, do lookups, and return the looked-up value instead of computing it every time, but you’d have to test that and see if it would be faster (I think it would, but I’m not sure it would be enough faster to make it worth the trouble). It also depends on what your input data is like. If it’s always within a certain range, such as the 128 ASCII values, that could be workable. (If it could be any UTF8 character, that would be too complicated.)

You might also look at using chrB and ascB if your data isn’t UTF8.
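A minimal sketch of the lookup-table idea, assuming the input is single-byte data so that 256 entries cover every possible value. It swaps the dictionary for a plain array indexed by byte value, which is a substitution and not what was suggested above. `BuildDecodeTable` and `decodedChars` are hypothetical names, and only the plain "subtract 42" case is covered.

```xojo
// Hypothetical helper: builds the 256-entry decode table once.
Function BuildDecodeTable() As String()
  Dim table(255) As String
  For code As Integer = 0 To 255
    // Add 256 before Mod so the result never goes negative.
    table(code) = ChrB((code - 42 + 256) Mod 256)
  Next
  Return table
End Function

// Hypothetical usage inside the decoding loop:
//   decodedChars.Append table(AscB(chars(i)))
```

Whether this beats computing the value each time would need to be measured, since the arithmetic being replaced is already cheap.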

Kem_Tekinay · August 5, 2013, 12:03am

OK, I don’t know why it wouldn’t let me edit my reply, but see the one above, even though it appears as a quote.

system · August 5, 2013, 12:41am

Thank you for the fast answers. I hope they speed up the process.

The method is needed to decode yEnc files (or chunks of data encoded in yEnc).
There are many examples in C, and I used some code from them, but unfortunately there aren’t any for Xojo.

I hoped to decode some bigger chunks of data in one go instead of each single character, but that is beyond my knowledge. Maybe regular expressions can do something here.

Anyway, thank you for your help so far.

Tim_Hare · August 5, 2013, 12:42am

If this is byte level manipulation, put the string in a MemoryBlock and use .Byte to manipulate the values as numbers.
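A rough sketch of the MemoryBlock approach, assuming the yEnc header and trailer lines and the CR/LF line breaks have already been stripped from the input. In the yEnc format the byte after "=" is the escaped one, so this sketch reads that following byte and subtracts 64 before the usual 42. `DecodeYEnc` is a hypothetical name, and none of this has been tested against real yEnc files.

```xojo
// Hypothetical method: decodes a yEnc payload held in a String.
Function DecodeYEnc(inputData As String) As String
  If inputData.LenB = 0 Then Return ""

  Dim src As MemoryBlock = inputData   // String converts to MemoryBlock
  Dim dst As New MemoryBlock(src.Size) // output is never larger than input
  Dim last As Integer = src.Size - 1
  Dim i As Integer = 0                 // read position in src
  Dim j As Integer = 0                 // number of bytes written to dst

  While i <= last
    Dim b As Integer = src.Byte(i)
    If b = 61 And i < last Then        // 61 is ASCII for "="
      i = i + 1                        // the escaped value is the NEXT byte
      b = src.Byte(i) - 64
    End If
    dst.Byte(j) = (b - 42 + 256) Mod 256 // keep the result in 0..255
    j = j + 1
    i = i + 1
  Wend

  Return dst.StringValue(0, j)         // second argument is a byte count
End Function
```

MemoryBlock.Byte is bounds-checked, so a mistake in the index arithmetic raises an exception instead of corrupting memory.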

Kem_Tekinay · August 5, 2013, 1:33am

To clarify why your code was slow, anytime you do string1 = string1 + string2, a new string has to be created and the old one destroyed (or at least marked as free) on every pass. As String1 gets bigger, this takes more and more time.

Using the array provides for better memory management as appending an element to an array doesn’t have to recreate the entire array each time, and each character already has a memory location thanks to Split.

Even faster (probably) would be to preallocate the array at the start with redim newChars( chars.Ubound ), then truncate it at the end with redim newChars( actualSize ). You’d need to keep a running counter in your loop for actualSize.
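The preallocation idea might look something like the sketch below. It assumes actualSize counts the elements written, so the final Redim uses actualSize - 1 because Redim takes an upper bound and not a count. It also assumes single-byte input and reads the byte after "=" as the escaped one, as yEnc defines it. `DecodeWithPreallocatedArray` is a hypothetical name.

```xojo
// Hypothetical method: array-based decode with a preallocated output array.
Function DecodeWithPreallocatedArray(inputdata As String) As String
  Dim chars() As String = inputdata.Split("")
  Dim last As Integer = chars.Ubound   // read Ubound once, not per iteration
  If last < 0 Then Return ""

  Dim newChars() As String
  Redim newChars(last)                 // upper bound, so last + 1 slots
  Dim actualSize As Integer = 0        // running count of elements written

  Dim i As Integer = 0
  While i <= last
    Dim code As Integer = AscB(chars(i))
    If code = 61 And i < last Then     // "=" marks an escaped byte
      i = i + 1
      code = AscB(chars(i)) - 64
    End If
    newChars(actualSize) = ChrB((code - 42 + 256) Mod 256)
    actualSize = actualSize + 1
    i = i + 1
  Wend

  Redim newChars(actualSize - 1)       // drop the unused tail
  Return Join(newChars, "")
End Function
```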

Mike_D · August 5, 2013, 3:45pm

Even faster than that, define a Pointer variable (Ptr) and use Ptr.Byte.

Here’s my try at a heavily-optimized version:

#pragma DisableBackgroundTasks
Dim char As integer  // use integer, rather than string for maximum speed
dim data as memoryBlock = inputData // convert string to memoryblock for maximum speed
Dim decoded As new memoryBlock(data.size) // we know it will be no larger than the input, this may be shortened later
dim u as integer = data.size -1
dim p as Ptr = data // for ultimate speed, use a Ptr to the memoryblock
dim q as Ptr=decoded

dim j as integer // index into decoded MemoryBlock
For i As Integer = 0 To u
  char = p.byte(i) // read through the variable p; Ptr is the type name
  if char = 61 then // 61 is ASCII for "="
    q.byte(j) = char -64 - 42
    j = j + 1
    i = i +1 // skip next byte in input
  Else
    q.byte(j) = char-42
    j=j+1
  end if
Next

// trim decoded memoryblock to actual size (j bytes were written)
decoded.size = j
return decoded

Kem_Tekinay · August 5, 2013, 3:54pm

If you’re looking for ultimate speed, you should use a constant for replacing the “=” character instead of doing the calculation (that results in a negative number anyway).

Also, for the last line, it’s probably faster to do return decoded.StringValue( 0, j ) than to resize the MemoryBlock. (The second parameter is a byte count, and j bytes were written.)

I’d love to know what the input looks like to see if there are faster ways to do this entirely.

Joe_Ranieri · August 5, 2013, 4:24pm

With great power comes great responsibility. Ptr has no bounds checking and can very easily crash your application if you use it incorrectly. As with all performance tuning, the code should be profiled and optimizations (and safety tradeoffs) only done as needed.
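One simple way to check whether an optimization actually pays off is to time the call directly. This is only a sketch: `DecodeYEnc` is a hypothetical stand-in for whichever version of the method is being measured, and `inputData` is assumed to already hold the loaded data.

```xojo
// DecodeYEnc is a hypothetical name for the method under test.
Dim startTime As Double = Microseconds
Dim result As String = DecodeYEnc(inputData)
Dim elapsedMs As Double = (Microseconds - startTime) / 1000.0

System.DebugLog("Decode took " + Format(elapsedMs, "0.00") + " ms for " + Str(result.LenB) + " bytes")
```

Running each candidate several times on the same input, in a built app, gives a fairer comparison than a single run in the debugger.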

ChristopheDV · August 5, 2013, 5:06pm

About #pragma DisableBackgroundTasks

I am not sure if this speeds up things. I did many tests and I never got any speed advantages using #pragma DisableBackgroundTasks

Or am I wrong?

Kem_Tekinay · August 5, 2013, 5:09pm

First, you have to test in the compiled app, not the debug version.

DisableBackgroundTasks will keep Xojo from giving time to other threads or tasks during loops and other points where it normally would. (AFAIK, those points are undefined and subject to change.) So if your code is looping a lot, yes, it can make a huge difference.

Joe_Ranieri · August 5, 2013, 5:28pm

[quote=25289:@Kem Tekinay]First, you have to test in the compiled app, not the debug version.

DisableBackgroundTasks will keep Xojo from giving time to other threads or tasks during loops and other points where it normally would. (AFAIK, those points are undefined and subject to change.) So if your code is looping a lot, yes, it can make a huge difference.[/quote]

They’re well-defined actually:

- loop boundaries, unless DisableBackgroundTasks is in effect
- explicit thread yield calls
- internal framework code (okay, this one isn’t well-defined)

As with the other pragmas, DisableBackgroundTasks only affects code in that specific scope. It does not affect code called by that scope.

system · August 5, 2013, 8:52pm

[quote=25242:@Michael Diehr]Even faster than that, define a Pointer variable (Ptr) and use Ptr.Byte.

Here’s my try at a heavily-optimized version:

[code]
#pragma DisableBackgroundTasks
Dim char As integer // use integer, rather than string for maximum speed
dim data as memoryBlock = inputData // convert string to memoryblock for maximum speed
Dim decoded As new memoryBlock(data.size) // we know it will be no larger than the input, this may be shortened later
dim u as integer = data.size -1
dim p as Ptr = data // for ultimate speed, use a Ptr to the memoryblock
dim q as Ptr=decoded

dim j as integer // index into decoded MemoryBlock
For i As Integer = 0 To u
  char = p.byte(i) // read through the variable p; Ptr is the type name
  if char = 61 then // 61 is ASCII for "="
    q.byte(j) = char -64 - 42
    j = j + 1
    i = i +1 // skip next byte in input
  Else
    q.byte(j) = char-42
    j=j+1
  end if
Next

// trim decoded memoryblock to actual size (j bytes were written)
decoded.size = j
return decoded

[/code][/quote]

OMG, this is blazing fast.
Superb !!

Thanks so much Michael and everyone else.
What an awesome community we have here.

Mike_D · August 6, 2013, 3:01am

Glad you like it - please do test it out thoroughly, as I may have mangled the logic in translation. Also, as Joe points out, using Ptr.Byte() bypasses valuable error checking.

John_Hansen · August 6, 2013, 6:15am

You sometimes have to refresh your page before it allows you to update the page.

Alwyn_Bester · August 6, 2013, 6:42am

Man, did I learn a lot from this discussion when it comes to writing faster code.

system · September 26, 2013, 10:10pm

I wonder whether I could speed up the process by writing a plugin in C++, or are plugins no faster than native Xojo code?