Speed Up String Strip Method

Björn_Eiríksson · July 9, 2022, 7:33pm

I have not looked at the solutions yet, but looking at your original code, you could skip Left(temp,1), which would give you some speed improvement.

Asc always takes just the first character of the string, so the Left(temp,1) is redundant.

And also:

if temp = "" then Exit // Deal with s consisting of only spaces.

Checking whether the length is 0 is probably faster than comparing strings.
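A minimal sketch of both points, assuming a made-up sample line (the original Strip code is not reproduced here, so temp and its contents are hypothetical):

```xojo
Dim temp As String = Chr(9) + "  <key>Name</key>"  // hypothetical sample line

// Asc reads only the first character, so both calls give the same code (9 here).
Dim viaLeft As Integer = Asc(Left(temp, 1))
Dim direct As Integer = Asc(temp)

// Length test in place of a comparison against "".
If Len(temp) = 0 Then
  // nothing left to strip
End If
```

Whether the length test is measurably faster than the string comparison would need to be checked in the profiler.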

Jeff_Tullin · July 9, 2022, 8:02pm

Posts like these tend to go on a long time.

How slow is it now?
How fast does it need to be?
How many times a minute does it need to run?

E.g. if it runs once per selected text file, the user will not notice the difference between 1 ms and half a second to complete.

Here is a possible solution using a MemoryBlock.
I don't know how it will perform on actual data.



dim s as string
//for testing
for k as integer = 1 to 32
  s = s + chr(k)
next
s = s + "any text at all"

//now we have a sample string
dim pos as integer =0
dim l as integer 

dim fnd as integer = 1
dim t1 as integer = ticks //for timing






//=========the work===========
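l = lenb(s) // byte count, since the MemoryBlock is sized in bytes
dim mb as New MemoryBlock(l)
mb.StringValue(0, l) = s
// overwrite each leading byte at or below space (32) with a space
while pos < l and fnd < 33
  fnd = mb.byte(pos)
  if fnd < 33 then
    mb.byte(pos) = 32
  end if
  pos = pos + 1
wend
s = trim(mb) // note: trim also removes trailing whitespace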
//=========the work===========


dim t2 as integer = ticks  //for timing
msgbox cstr(t2-t1)

Julia_Truchsess · July 9, 2022, 10:03pm

Is that a bad thing?

6-20 seconds for half a million calls during a run of a batch by the app depending on which of the methods discussed is used. So far the only one that takes 20 seconds is TrimLeft; all the others apart from regex, which I haven’t tried yet, are pretty close to each other at 6-10s.

As fast as possible!

Yes, I know that.

KarenA · July 9, 2022, 10:54pm

BTW When you find the fastest for you, please post about it!

-Karen

Douglas_Handy · July 9, 2022, 10:57pm

And to speed this up, create a RegEx object once ahead of the loop so the pattern is only parsed once.
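A rough sketch of that idea, assuming a hypothetical array of sample lines and a pattern that matches a leading run of bytes at or below space (the pattern is my assumption, not something posted in the thread):

```xojo
// Hypothetical sample input; in the real app these would be the Plist lines.
Dim lines() As String = Array(Chr(9) + "<key>Name</key>", "  <string>x</string>", "plain")

// Build the RegEx once, outside the loop, so the pattern is only parsed once.
Dim re As New RegEx
re.SearchPattern = "^[\x00-\x20]+"  // leading characters at or below space
re.ReplacementPattern = ""

// Reuse the same object for every line.
For i As Integer = 0 To UBound(lines)
  lines(i) = re.Replace(lines(i))
Next
```

The per-call cost of Replace itself remains, so this only removes the repeated setup, not the regex overhead as a whole.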

Lawrence_Johnson · July 9, 2022, 11:29pm

@Julia_Truchsess Having recently written an HTML Parser, I can safely say the fastest way to check a large amount of string data byte-by-byte was using a MemoryBlock.

In my case, however, I parsed a single chunk of data, whereas it looks like you’ve got a large list of short strings, and I’m assuming you call your strip method on each one. It’s highly likely that the overhead of the function call and object creation each time is negating some of the benefit of using a MemoryBlock in this situation.
I don’t suppose there’s any way you can collate your list of strings into a single block of data?

Julia_Truchsess · July 10, 2022, 1:31am

Thanks for your interest, @KarenA, I will re-run all the benchmarks in a more methodical manner. I wish the IDE would let you rename profiler runs!

Julia_Truchsess · July 10, 2022, 1:40am

Thanks, @Lawrence_Johnson. This method is from Plist Class for RealBasic, written in Olden Tymes by Mac Crafters. I adopted plists as my file format for all my apps twelve years ago, when I think Xojo’s support for XML and JSON may have been embryonic. In any case, I found Mac Crafters’ Plist class very convenient to use and I have far too much technical debt invested in it to switch now. Speed has never been a big issue until this latest app, which needs to parse thousands of Plists and their sub-dictionaries.

The Strip method is called for every line of every dictionary in every Plist. I suppose I could run a regex on each entire file and then do away with the line-by-line stripping altogether, but it’s a bit more than I’m able to tackle right now just to save a few seconds.

Kem_Tekinay · July 10, 2022, 3:10am

Just to confirm, you’re testing the compiled versions? Running in the IDE won’t necessarily give you accurate results (unless your final product will run in the IDE).
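One way to check this is a small timing loop that can be run both in the debugger and in a built app. This is only a sketch: the sample string and the call count are made up, and TrimLeft stands in for whichever strip variant is being measured.

```xojo
Dim sample As String = Chr(9) + "    <key>Name</key>"  // hypothetical test line
Dim r As String

#If DebugBuild Then
  System.DebugLog("Debug build: timings may not match the compiled app")
#EndIf

Dim t0 As Double = System.Microseconds
For i As Integer = 1 To 100000
  r = sample.TrimLeft  // swap in whichever strip variant is being measured
Next
Dim elapsedMs As Double = (System.Microseconds - t0) / 1000.0

System.DebugLog("100,000 calls: " + Format(elapsedMs, "0.0") + " ms")
```

Running the same loop in a debug run and in a built app gives a direct comparison for one method at a time.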

Jeff_Tullin · July 10, 2022, 7:32am

100,000 iterations of this on a 2015 Intel machine, in the compiled build, took 67 ticks (a tick is 1/60 of a second, so a little over one second).


//=========the work===========
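l = lenb(s) // byte count, since the MemoryBlock is sized in bytes
dim mb as New MemoryBlock(l)
mb.StringValue(0, l) = s
// skip leading bytes at or below space (32); the bounds test comes first
// so mb.byte is not read past the end of the block
while pos < l and mb.byte(pos) < 33
  pos = pos + 1
wend

s = mb.StringValue(pos, l - pos)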
//msgbox s
//=========the work===========

kevin_g · July 10, 2022, 9:51am

If the string is UTF-8, try using the ‘B’ (byte) functions.
Alternatively, try splitting the string into an array and cycling through it until you find the first valid character, then join the array from that index onwards.
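As a rough sketch of the byte-function route, assuming the classic LenB, MidB and AscB functions (1-based, and deprecated under API 2.0) and a made-up sample string:

```xojo
Dim s As String = Chr(9) + Chr(13) + "  <key>Name</key>"  // hypothetical sample line

Dim n As Integer = LenB(s)  // length in bytes, not characters
Dim i As Integer = 1        // the classic byte functions are 1-based

// Advance past every leading byte at or below space (32).
While i <= n And AscB(MidB(s, i, 1)) <= 32
  i = i + 1
Wend

s = MidB(s, i)  // everything from the first kept byte onward
```

Because the loop only looks at single bytes below 33, it never lands in the middle of a multi-byte UTF-8 character.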

Julia_Truchsess · July 10, 2022, 11:22am

No, I’m looking at profiler results in the IDE for now. I’ve never noticed any performance difference in this app between compiled and debug. More interested at this point in a relative speed gain than accurate quantification but good to know that there can be a difference, thanks.

Arnaud_N · July 10, 2022, 12:43pm

Don't you need to remove ASCII 127 (the delete character) as well?

Julia_Truchsess · July 10, 2022, 12:52pm

I don’t think Plist XML will ever contain a delete character at the start of a line, but good catch.

Julia_Truchsess · July 10, 2022, 1:27pm

Fastest so far is this, combining Jim’s suggestion of not manipulating the source inside the loop, byte string functions as recommended by a few people, and Kem’s suggestion of eliminating Asc():

slen = s.Bytes
For c = 0 to slen
  If s.MiddleBytes(c,1) > " " then
    Exit
  end
next

If c > slen then // s is all spaces
  return ""
Else
  Return s.MiddleBytes(c)
End

With my test batch, this beats the original by a second, array and memory block by a couple of seconds, and TrimLeft by a looong time. Not the 5-second-ish improvement I was hoping for; I guess the overhead is what’s really taking the time. Still haven’t gotten to regex.

Greg_O · July 10, 2022, 3:26pm

@Julia_Truchsess
Try compiling your app using the Aggressive optimization level. It’ll take a while but it may make a significant difference.

KarenA · July 10, 2022, 3:45pm

Two more suggestions BUT they depend on you not wanting your code to look elegant…

IIRC function calls are expensive, so try putting your trim code inline where you need it rather than calling the function itself 100,000 times.
IIRC FOR NEXT loops are also expensive (not sure about DO loops)… In any case, partially unrolling your loop using a DO LOOP can speed things up. How much it helps and how many times you should unroll depend on what you expect the data to be like. See the code below.

Dim S as String

For j as integer = 1 to 100
  S = S + Encodings.ASCII.Chr(j Mod 32)
Next
S = S+"Some Text"

Dim SLen as Integer = s.Bytes, C as Integer

Do
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
  If s.MiddleBytes(c,1) > " " then Exit
  c = c + 1 
  if c > SLen Then Exit
  
Loop

S = s.MiddleBytes(c)

Julia_Truchsess · July 10, 2022, 3:45pm

@Greg_O compiled at default mode is about 2 seconds faster than debug; aggressive saves another second vs default.

Julia_Truchsess · July 10, 2022, 3:57pm

BRILLIANT! I’d assumed, due to the total number of calls, that it was being called from a bunch of locations, but in fact it’s only called from one, so minimal elegance is sacrificed, and I gained 3 seconds! Thanks!

EDIT: A code entry error made the improvement seem like more than it is. Picked up maybe a second, max.

KarenA · July 10, 2022, 4:04pm

Try unrolling 10 or 20 times as well… You might shave off another couple of seconds!!! (Unrolling is only copying and pasting, after all!)

karen