I haven’t looked at the solutions yet, but looking at your original code, you could skip Left(temp,1), which would give you some speed improvement.
Asc always just takes first character from string so the Left(temp,1) is redundant.
And also:
if temp = "" then Exit // Deal with s consisting of only spaces.
Checking if Length is 0 is probably faster than comparing string.
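A minimal sketch of both suggestions together, using the classic Asc and LenB functions. FirstByteCode is a made-up helper name for illustration only, not something from the original Strip method:

```xojo
Function FirstByteCode(temp As String) As Integer
  // Hypothetical helper, for illustration only.
  // A length test avoids a string comparison against "".
  If LenB(temp) = 0 Then
    Return -1 // nothing to inspect
  End If
  // Asc looks only at the first character, so Left(temp, 1) is not needed.
  Return Asc(temp)
End Function
```

Whether the length test is measurably faster than comparing against "" would need to be checked in the profiler.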
How slow is it now?
How fast does it need to be?
How many times a minute does it need to run?
E.g. if it runs once per selected text file, the user will not notice the difference between 1 ms and half a second to complete.
Here is a possible solution using a memoryblock.
I don’t know how it will perform on your actual data.
dim s as string
//for testing
for k as integer = 1 to 32
s = s + chr(k)
next
s = s + "any text at all"
//now we have a sample string
dim pos as integer =0
dim l as integer
dim fnd as integer = 1
dim t1 as integer = ticks //for timing
//=========the work===========
l = len(s)
dim mb as New MemoryBlock(l)
mb.StringValue(0, l) =s
while pos < l and fnd < 32
fnd = mb.byte(pos)
if fnd < 33 then
mb.byte(pos) = 32
end if
pos = pos +1
wend
s = trim(mb)
//=========the work===========
dim t2 as integer = ticks //for timing
msgbox cstr(t2-t1)
6-20 seconds for half a million calls during a run of a batch by the app depending on which of the methods discussed is used. So far the only one that takes 20 seconds is TrimLeft; all the others apart from regex, which I haven’t tried yet, are pretty close to each other at 6-10s.
@Julia_Truchsess Having recently written an HTML Parser, I can safely say the fastest way to check a large amount of string data byte-by-byte was using a MemoryBlock.
In my case, however, I parsed a single chunk of data, whereas it looks like you’ve got a large list of short strings and, I’m assuming, you’re calling your strip method on each one. It’s highly likely that the overhead of the function call and object creation each time is negating some of the benefit of using a MemoryBlock in this situation.
I don’t suppose there’s any way you can collate your list of strings into a single block of data?
Thanks, @Lawrence_Johnson. This method is from Plist Class for RealBasic, written in Olden Tymes by Mac Crafters. I adopted plists as my file format for all my apps twelve years ago, when I think Xojo’s support for XML and JSON may have been embryonic. In any case, I found Mac Crafters’ Plist class very convenient to use and I have far too much technical debt invested in it to switch now. Speed has never been a big issue until this latest app, which needs to parse thousands of Plists and their sub-dictionaries.
The Strip method is called for every line of every dictionary in every Plist. I suppose I could run a regex on each entire file and then do away with the line-by-line stripping altogether, but it’s a bit more than I’m able to tackle right now just to save a few seconds.
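For what it’s worth, a rough and untested sketch of the whole-file regex idea, using Xojo’s RegEx class. StripLeadingPerLine and fileText are made-up names, and the pattern assumes that "leading junk" means any byte from 1 to 32 other than the CR and LF line endings:

```xojo
Function StripLeadingPerLine(fileText As String) As String
  // Hypothetical helper: strips leading control characters and spaces
  // from every line of the text in a single pass.
  Dim re As New RegEx
  // (?m) makes ^ match at the start of each line. CR (\x0D) and LF (\x0A)
  // are left out of the class so that blank lines are not merged.
  re.SearchPattern = "(?m)^[\x01-\x09\x0B\x0C\x0E-\x20]+"
  re.ReplacementPattern = ""
  re.Options.ReplaceAllMatches = True
  Return re.Replace(fileText)
End Function
```

This only pays off if the per-line Strip calls can then be dropped, and it would need timing against the loop versions before drawing any conclusion.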
Just to confirm, you’re testing the compiled versions? Running in the IDE won’t necessarily give you accurate results (unless your final product will run in the IDE).
100,000 iterations of this, on a 2015 Intel machine, in the compiled build = 67 ticks
//=========the work===========
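l = len(s) //assumes one byte per character; use LenB if s may hold multi-byte text
dim mb as New MemoryBlock(l)
mb.StringValue(0, l) =s
//test pos < l before every read so mb.byte never reads past the end
while pos < l
fnd = mb.byte(pos)
if fnd > 32 then exit //first byte above a space (32), same cutoff as the < 33 test earlier
pos = pos +1
wend
s = mb.StringValue(pos,l-pos)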
//msgbox s
//=========the work===========
If the string is utf-8 try using the ‘B’ functions.
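A sketch of what that could look like with the classic byte functions LenB, MidB and AscB (these are 1-based), assuming s is the string to strip and that everything at or below a space counts as junk:

```xojo
// s is assumed to already hold the string to strip.
Dim n As Integer = LenB(s) // length in bytes, not characters
Dim p As Integer = 1 // the classic B functions are 1-based
While p <= n
  // Stop at the first byte above a space (32).
  If AscB(MidB(s, p, 1)) > 32 Then Exit
  p = p + 1
Wend
s = MidB(s, p) // everything from the first kept byte onwards
```

Because every byte of a multi-byte UTF-8 character is above 127, the scan cannot stop in the middle of one.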
Try splitting the string into an array and cycle through that until you find the first valid character and then join the array from that index onwards.
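Roughly like this, as a sketch: Split with an empty delimiter gives one array element per character, and blanking the leading elements before Join has the same effect as joining from the first valid index onwards. It assumes s is the string to strip:

```xojo
// s is assumed to already hold the string to strip.
Dim chars() As String = Split(s, "") // one element per character
Dim first As Integer = 0
While first <= chars.Ubound
  If chars(first) > " " Then Exit // first valid character found
  first = first + 1
Wend
// Blank out everything before the first valid character.
For i As Integer = 0 To first - 1
  chars(i) = ""
Next
s = Join(chars, "")
```

The Split call allocates a string per character, so this may well be slower than the byte-based loops on short lines.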
No, I’m looking at profiler results in the IDE for now. I’ve never noticed any performance difference in this app between compiled and debug. More interested at this point in a relative speed gain than accurate quantification but good to know that there can be a difference, thanks.
Fastest so far is this, combining Jim’s suggestion of not manipulating the source inside the loop, byte string functions as recommended by a few people, and Kem’s suggestion of eliminating Asc():
slen = s.Bytes
For c = 0 to slen
If s.MiddleBytes(c,1) > " " then
Exit
end
next
If c > slen then // s is all spaces
return ""
Else
Return s.MiddleBytes(c)
End
With my test batch, this beats the original by a second, array and memory block by a couple of seconds, and TrimLeft by a looong time. It’s not the 5-second-ish improvement I was hoping for; I guess the overhead is what’s really taking the time. Still haven’t gotten to regex.
Two more suggestions BUT they depend on you not wanting your code to look elegant…
IIRC function calls are expensive, so try putting your trim code inline where you need it rather than calling the function itself 100,000 times.
IIRC FOR NEXT loops are also expensive (not sure about DO loops). In any case, partially unrolling your loop using a DO LOOP can speed things up. How much, and how many times, you should unroll depends on what you expect the data to be like. See the code below.
Dim S as String
For j as integer = 1 to 100
S = S + Encodings.ASCII.Chr(j Mod 32)
Next
S = S+"Some Text"
Dim SLen as Integer = s.Bytes, C as Integer
Do
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
If s.MiddleBytes(c,1) > " " then Exit
c = c + 1
if c > SLen Then Exit
Loop
S = s.MiddleBytes(c)
BRILLIANT! I’d assumed, due to the total number of calls, that it was being called from a bunch of locations, but in fact it’s only called from one, so minimal elegance is sacrificed, and I gained 3 seconds! Thanks!
EDIT: A code entry error made the improvement seem like more than it is. Picked up maybe a second max.