Speed Up String Strip Method

Average string length is 22 chars, so I unrolled 28 times and picked up a little more time - it all adds up!

Out of curiosity, can you let us know the time you started with and the time you ended up with? Did you get it as fast as you hoped?

(BTW I assume you used the Pragmas too)

-karen

It’s hard to say - when it was a separate method it was easy to measure because I could see the calls in the Profiler results. Now that it’s inline, it’s all mixed up with the parent method. I think overall my batch processing time has been reduced by about 10 percent.

I’m not seeing any speed improvement from the Pragmas.

Of course I’d like it to be faster, but I think I’m at (or past, lol) the point of diminishing returns with this piece of code. Added to some gains I achieved in other parts of the program, though, it’s overall much better than it was a week ago.

Thanks to all for your time, insights, and suggestions!

I haven’t gone through all the code examples, but the initial code example, if you are trying to eliminate all characters < 33, assumes that once you hit one that is >= 33 there won’t be any more < 33, and exits.

I don’t know your data, so maybe you know that once you hit a character >= 33 there won’t be any more to remove.

That is correct. The goal is to eliminate all characters at the start of the string whose ASCII value is less than 33. Once we get a char 33 or greater, we are in the string proper.

Then, and I’m sorry if this was covered in one of the 43 previous posts, consider that rather than loop and erase, erase, erase, one character at a time, you just need to find the first char 33 or greater and, with one command, trim everything to the left of that position. So it’s one search and one trim.
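A minimal sketch of the one-search, one-trim idea, assuming classic Xojo byte functions (LenB, MidB, AscB) and single-byte data. The function name StripLeadingControls is hypothetical, not something from the earlier posts:

```xojo
// Hypothetical helper: strip leading bytes below 33 with one scan and one copy.
Function StripLeadingControls(s As String) As String
  Dim n As Integer = LenB(s)
  Dim i As Integer = 1 // MidB positions are 1-based

  // One search: advance to the first byte that is 33 or greater.
  While i <= n And AscB(MidB(s, i, 1)) < 33
    i = i + 1
  Wend

  // Nothing but control characters and spaces.
  If i > n Then Return ""

  // One trim: keep everything from that position onward.
  Return MidB(s, i)
End Function
```

The string is only rebuilt once, at the end, instead of once per removed character. Whether that beats the unrolled version would have to be measured on the real data.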

What the compiler does behind the scenes, and whether some “native” action or regular functionality results in faster execution, is out of my realm.

Remember, every year computers get faster, but humans may have to look at the code days/weeks/months/years later so there is something to be said for a little inefficiency traded off for more readability and understand-ability later on.

This is a great idea, but Xojo needs to search for every character 33 or greater to find which one is leftmost. In practice, that means many searches.

The Trim function removes whitespaces. Wouldn’t there be room for a new Xojo function to remove all unprintable characters? (I know “unprintable” might be a varying set depending on its definition, but I’d say characters 0→31 and 127).

And how do you propose to do that? My initial suggestion was RegEx. Create the RegEx object and search pattern ahead of the loop, then execute it on each iteration.
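For the "remove all unprintable characters" idea, a rough sketch with RegEx might look like this. It assumes "unprintable" means characters 0-31 and 127 as proposed above; the variable names and sample data are made up for illustration:

```xojo
// Build the RegEx once, ahead of the loop.
Dim cleaner As New RegEx
cleaner.SearchPattern = "[\x00-\x1F\x7F]" // characters 0-31 and 127
cleaner.ReplacementPattern = ""
cleaner.Options.ReplaceAllMatches = True // otherwise only the first match is replaced

// Hypothetical sample data standing in for the real records.
Dim records() As String = Array(Chr(9) + "abc" + Chr(127), Chr(0) + "de" + Chr(10) + "f")

// Reuse the same object on every iteration.
For i As Integer = 0 To records.Ubound
  records(i) = cleaner.Replace(records(i))
Next
```

Unlike the left-trim case, this removes matching characters anywhere in the string, so it answers a slightly different question than the original post.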

TrimLeft accepts a ParamArray of characters to be removed. I tried it with an array of all characters 0-32 but it was slower than any other method by a factor of 3-5x.

I know I’m late to the chat but I was wondering if you tried something like this: (swiped some of KarenA’s code) and if so, how did it compare?

Dim S, S1, S2, S3, S4  as String, i as Integer
Dim FirstLegalChar as String = Encodings.ASCII.Chr(33)

For j as integer = 1 to 100
  S = S + Encodings.ASCII.Chr(j Mod 32)
Next
S = S + "Some Text"

Dim CharArr() as String = S.SplitB("")
Dim ub as Integer = CharArr.Ubound

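// Remove leading bytes below "!" (ASCII 33); the Ubound check stops the loop if every byte gets removed
While CharArr.Ubound >= 0 And CharArr(0) < FirstLegalChar
  CharArr.RemoveAt 0
Wend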

S=Join(CharArr, "")

Yes, I tried the array method. Its speed is similar to that of the memory block suggestion, but slower than the original using string functions, modified per Jim’s suggestion.

I’m still curious how this would compare to using RegEx instead, especially if you pre-create the RegEx object ahead of your loop and call it within your loop and not abstract it to a called function.

We all have our experiences with different languages and “native” commands. I understand this is Xojo and admit that I don’t have a strong command of its (or third-party) string tools. But I’ve seen that most languages have somewhat the same feature set - though the syntax and function/statement names might be different.

Also, though one might eliminate an explicit loop by calling a single function, it might be that “under the hood” that function triggers its own loop to do the job. The faith is - the compiler code will be more efficient - like the difference between raising a number to a power in Assembler rather than BASIC.

Let’s say we have an action - call it trim(here) just to give it a name - that lops off all characters to the left of a position in a string. So now we just have to find that position - a position, starting from the left, where the character is >= 33.

Now 33 is a lower bound. I’m guessing the user can guarantee an upper bound - like nothing greater than 126.

I’ve used languages that let me specify a domain of values (33-126) and, searching from left to right, return the first position where the character is within that domain. That value, or that value minus 1, would be passed to the trim.

Now most likely “behind the scenes” that search function generates a loop. But it should be a very efficient, compiler-level loop.

I’m thinking of something in the form of:

here = search ("! - ~",mystring)

It would look from left to right, and return the first position where a character in the domain of ! to ~ (that’s 33 to 126) appears. That location is passed to the trim function.
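In Xojo, one way to approximate that search-then-trim might be RegEx.Search plus the match’s byte offset. This is only a sketch, assuming single-byte data and that SubExpressionStartB returns a zero-based byte offset; the variable names are hypothetical:

```xojo
// Build the finder once, ahead of any loop.
Dim finder As New RegEx
finder.SearchPattern = "[\x21-\x7E]" // the domain ! to ~ (33 to 126)

// Hypothetical sample string with leading tab and line feed.
Dim myString As String = Chr(9) + Chr(10) + "Some Text"

// The search: first position holding a character in the domain.
Dim m As RegExMatch = finder.Search(myString)

If m <> Nil Then
  // The trim: MidB is 1-based, the match offset is 0-based.
  myString = MidB(myString, m.SubExpressionStartB(0) + 1)
Else
  // No character in the domain at all.
  myString = ""
End If
```

The loop still exists, it has just moved inside the RegEx engine, so only timing it against the other approaches would show whether it helps.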

What Xojo has, or what any third-party plug-in has, or what RegEx has, that does that - I don’t know. But I’ve done it before (maybe in the procedure language of the Mac database Panorama) so I’m guessing the same functionality is available in the Xojo world.

My suggestion was to approach the problem that way, if possible, rather than explicitly looping from left to right at the user coding level.

Which is what virtually everyone has said, in one way or another.
I’m not sure another solution remains to be found…

So am I :)

All you need to do is insert something like this in your code:

// Create RegEx object ahead of loop
var re as new RegEx
re.SearchPattern = "^[\x00-\x20]+"
re.ReplacementPattern = ""

// Iterate over source data; 
' user code here for setting up variable "data" to trim

// Strip off leading data which is hex 00 to hex 20 (up to and including ASCII space)
var s as string = re.Replace(data)

Just instantiate the RegEx object and set the options ahead of your half million iterations. For each one, just use re.Replace({source string}) to left trim the values below ASCII 33.

In the search expression “^[\x00-\x20]+” the parts simply mean:

  • ^ = must occur at the start of the string; that is, this performs a left trim only
  • [\x00-\x20] = match any character in the range hex 00 to hex 20 (i.e. null to ASCII 32)
  • + = this pattern must occur at least once or nothing happens

And if you have MBS plugins, try the same thing with RegExMBS…

Many thanks, @Douglas_Handy, I’ll give it a shot as soon as time permits.

The Trim function accepts an array of characters to trim (thanks to Julia for having taught me that); it surely is exactly as you suggest (and has been tried, and isn’t the fastest).