Speed Up String Strip Method

How can I make this faster?

Its purpose is to remove all characters from the beginning of a string whose ASCII value is < 33.

function Strip(s as String) as String

dim temp As string

If s<>"" Then
  temp=s
  while asc(left(temp,1))<33 
    temp=right(temp,len(temp)-1)
    if temp = "" then Exit // Deal with s consisting of only spaces.
  wend
end
return temp

I’m guessing a Regex replacement would be faster, but I have not timed it.

Try this one; it reuses s, which may improve things:

function Strip(s as String) as String
#Pragma BackgroundTasks False
#Pragma StackOverflowChecking False
#Pragma NilObjectChecking False

If s <> "" Then
  while asc(left(s,1)) < 33 
    s = right(s, len(s) - 1)
    if s = "" then Exit // Deal with s consisting of only spaces.
  wend
end
return s

Alternatively, ask @Christian_Schmitz for a C-optimized version, or have @Kem_Tekinay look at this if he’s not near a sunny spot on the beach.

I would not modify the string in the “while”… instead just look for the first character you want to keep and count where it is located in the string… then use .mid or .right to remove all the unwanted characters at once.

I have an old Regex that I use to strip invisible characters (I think):

dim theString as String
dim searchpattern as String = "[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+"
dim rx as RegExMBS = New RegExMBS
if rx.Compile(searchPattern) and rx.Study() then
  theString = rx.ReplaceAll(theXML.ToString, "")
end if

Guessing, but Jim’s recommendation is probably faster.

Somewhat surprisingly, this is not significantly faster on average:

If s<>"" Then
  
  Dim slen as Integer = s.Length
  Dim c As Integer
  
  For c = 1 to slen
    If Asc(mid(s,c,1)) > 32 then
      Exit
    end
  next
  
  If c > slen then
    return ""
  Else
    Return mid(s,c)
  End
Else
  Return ""
End

I guess the time is being spent in Asc() and Mid().

If you are sure there are no double spaces in the right-hand side of the string, you could split it on a double space. The last element of the resulting array will then be the string without its leading spaces.

var s as string = "                                hello, world"
var spl() as string = s.split("  ")
msgbox spl(spl.ubound) // the last element holds the text after the leading spaces

I need to strip any character whose ASCII value is < 33 from the start of the string, not just spaces.

Yes, very surprising.

Here are 4 different ways. One of them may work better for your type of data. Of course, use the pragmas to speed things up.

This is API 1 code:

Dim S, S1, S2, S3, S4  as String, i as Integer

For j as integer = 1 to 100
  S = S + Encodings.ASCII.Chr(j Mod 32)
Next
S = S+"Some Text"


' Using Char Comparison 
Dim SLen as Integer = S.Len
Dim FirstLegalChar as String = Encodings.ASCII.Chr(33)

For i = 1 to SLen
  If S.Mid(i,1) >= FirstLegalChar Then
    S1 = S.Mid(i)
    Exit
  End if
Next

' Using Char Binary Comparison 
Dim SLenB As Integer = S.LenB

For i = 1 to SLenB
  If S.MidB(i,1) >= FirstLegalChar Then
    S2 = S.MidB(i)
    Exit
  End if
Next

' Using Split Char Binary Comparison 

Dim CharArr() as String = S.SplitB("")
Dim ub as Integer = CharArr.Ubound

For i = 0 to ub
  If CharArr(i) >= FirstLegalChar Then
    S3 = S.MidB(i + 1) ' CharArr is 0-based, MidB is 1-based
    Exit
  End if
Next

' using a MemoryBlock
Dim MB as MemoryBlock = S

For i = 0 to MB.Size - 1
  If MB.Byte(i) > 32 then 
    S4 = S.MidB(i + 1) ' MB.Byte is 0-based, MidB is 1-based
    Exit
  End if
Next

Break

BTW, I’m pretty sure that at one time in the stone age, in some other version of BASIC (or in some other language I used), Trim would have taken care of that…

You know back in the days when one used ASCII control codes and ASCII was all there was!!!

-Karen

Have you looked at TrimLeft with its optional parameters?

https://documentation.xojo.com/api/data_types/string.html#string-trimleft
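
A minimal sketch of that idea, assuming API 2's String.TrimLeft takes the characters to remove as a ParamArray of strings (check the linked page for the exact signature). The sample string and variable names are made up for illustration, and this has not been timed against the loops above.

```xojo
// Sketch only: assumes String.TrimLeft(ParamArray chars() As String).
Var s As String = Chr(9) + Chr(13) + Chr(10) + "   Some Text"

// With no arguments, TrimLeft removes leading whitespace.
Var a As String = s.TrimLeft

// With arguments, only the listed characters are removed,
// so every unwanted character has to be passed explicitly.
Var b As String = s.TrimLeft(Chr(9), Chr(10), Chr(13), " ")
```

If the unwanted characters really are limited to tabs, spaces, CRs and LFs, either call should cover them; stripping every code below 33 would mean passing each one.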

How about something like this?

Public Function Strip(s As String) As String
  Var mb As MemoryBlock = s
  Var lim As Integer = mb.Size - 1
  Var count As Integer = 0
  
  For i As Integer = 0 To lim
    If mb.Byte(i) < 33 Then
      count = count + 1
    Else
      Exit For i
    End If
  Next i
  
  Return s.Middle(count, s.Length - count)
End Function

I’m not a pro, but just counting bytes and using string manipulating only once should be quite fast.

EDIT: I just gave it a try and in my testing scenario my algorithm is just 10 to 20 percent faster than yours. So it’s not worthwhile…

I like the elegance, but it seems way slower than Asc() / Mid() and MB, by a factor of 3-4x. I haven’t tried regex yet…

Sorry if I wasn’t clear, @KarenA - I need to remove all offending characters from the start of the string, not just the first one, and I don’t know how many of them there are at the start of the string.

The set of possible unwanted characters is probably limited to tabs, spaces, and maybe CRs and LFs, but checking the ASCII value against 32 covers all of those.

I’d forgotten that you can split a string into characters using “” as the delimiter, thanks. Maybe iterating through an array will improve speed vs Mid().

You don’t need Asc though, just compare to a space. That should save some cycles.

It does eliminate all the chars < 33 at the beginning… Each Method results in a string that is simply “Some Text”

Why do you think that code only removes the first one?

That is code I copied from the IDE and it works…

If you leave the final parameter of Mid off, it returns a string from the index onward to the end

Try it!
Each one works!

Paste the code into the IDE and run it and look at S1, S2, S3,S4 in the debugger.

-Karen

Sorry, doing too many things at once :)

Your suggestion

For i = 1 to SLen
  If S.Mid(i,1) >= FirstLegalChar Then
    S1 = S.Mid(i)
    Exit
  End if
Next

is basically my implementation of Jim’s suggestion

For c = 1 to slen
  If Asc(mid(s,c,1)) > 32 then
    Exit
  end
next

but eliminating the Asc() conversion by direct string comparison, which as Kem mentions may indeed be faster. I’ll give it a try.

If the MemoryBlock solution is not fastest (overhead of object creation) I think one of the StringB (binary) methods should be.

-Karen