Case insensitive string replacement

Since I last did a lot of programming, the string functions I’m trying to use have become case insensitive. (It seems to me that they used to be case sensitive, but maybe I’m just used to VBA and don’t remember.)

Anyway, does anyone have a rather simple solution for case-sensitive string replacement?
Given a variable called myParagraph, I want in case 1 to replace the first A and B with blanks. In case 2, I want to replace the first a and b with blanks. My code was:

If Case = 1 Then
  myParagraph = myParagraph.Replace("A", "").Replace("B", "")
Else
  myParagraph = myParagraph.Replace("a", "").Replace("b", "")
End If

But the two conversion statements are really the same because they both replace the first A/a and first B/b with blanks. I’m thinking that I can loop through each character in the string and test it individually to see if it’s an A or an a and make the appropriate change. But it seems that there has to be a more elegant solution.

You can do this with RegEx by making the search case sensitive, which means turning off the i flag in the search pattern (the -i in the (?m-Usi) prefix):

dim sourceText as string = "AAaaBBbb"

dim rx as new RegEx
rx.SearchPattern = "(?m-Usi)B"
rx.ReplacementPattern = "x"

dim rxOptions as RegExOptions = rx.Options
rxOptions.LineEndType = 4
rxOptions.ReplaceAllMatches = true

dim replacedText as string = rx.Replace( sourceText )

// replacedText = "AAaaxxbb"

You just have to be careful if you start searching for special characters if you aren’t comfortable with RegEx syntax since it might break the search pattern.

Check out @Kem_Tekinay’s awesome RegExRX app if you want to start dabbling in RegEx.

For those less familiar with RegEx, the code can be made more self-documenting by using the RegExOptions.CaseSensitive setting instead of the inline flags.


Good call Tim. I’m just so used to taking whatever RegExRX spits out :upside_down_face:

Simplified…

dim sourceText as string = "AAaaBBbb"

dim rx as new RegEx
rx.SearchPattern = "B"
rx.ReplacementPattern = "x"

dim rxOptions as RegExOptions = rx.Options
rxOptions.ReplaceAllMatches = true
rxOptions.CaseSensitive = true

dim replacedText as string = rx.Replace( sourceText )

// replacedText = "AAaaxxbb"

Thanks for the feedback.

I ended up creating a function that I can call that loops through the string and compares each character. To me, the RegEx is even more convoluted and difficult to read than looping through the characters.

But am I crazy about remembering that myString = “k” used to be case sensitive? I don’t remember using Replace in the past, so I don’t know if it has changed over time or not.


Replace has always been case insensitive in Xojo for as long as I remember.
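
On the myString = "k" question: as far as I know the = operator on strings is case insensitive as well, and String.Compare with ComparisonOptions.CaseSensitive is one way to get an exact comparison. A minimal sketch, meant to be pasted into any event handler; the variable name and sample value are only illustrative:

```xojo
Var myString As String = "K"

// The = operator compares text case-insensitively, so this branch runs.
If myString = "k" Then
  System.DebugLog("= treats K and k as equal")
End If

// Compare returns 0 only when the strings match under the given options.
If myString.Compare("k", ComparisonOptions.CaseSensitive) = 0 Then
  System.DebugLog("exact match")
Else
  System.DebugLog("differs in case")
End If
```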

You could use String.ReplaceBytes which is case sensitive.

From the docs

ReplaceBytes is case-sensitive; it treats sourceString as a series of raw bytes. It should be used instead of Replace when the String represents a series of bytes or when your application will run in a one-byte character set (such as the US system) and you want case-sensitivity.


That line in the documentation really should stop right there. ReplaceBytes really should only be used when you are treating a string as data, not human-readable text. It can damage a string that is encoded in any multi-byte encoding, such as the virtually ubiquitous UTF-8. Please do not use it for this particular situation.


Make a note to come back to this once you have your overall system going and you are considering where to optimize your code. Using a RegEx will be many, many times faster than looping over the characters.


I thought of that; thanks.
The good thing is that the source string will never be more than about 5 or 6 characters.
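
For strings that short, a middle ground between a character loop and RegEx may be String.IndexOf with ComparisonOptions.CaseSensitive, which works on characters rather than bytes. This is only a sketch: ReplaceFirstCaseSensitive and its parameter names are hypothetical, and the method is assumed to live in a Module so that Extends works.

```xojo
Public Function ReplaceFirstCaseSensitive(Extends source As String, findText As String, replacement As String) As String
  // Hypothetical helper: replaces only the first exact-case occurrence of findText.
  If findText = "" Then Return source
  
  // IndexOf returns a zero-based character position, or -1 when nothing matches.
  Var pos As Integer = source.IndexOf(findText, ComparisonOptions.CaseSensitive)
  If pos < 0 Then Return source
  
  // Rebuild the string around the match using character-based functions.
  Return source.Left(pos) + replacement + source.Middle(pos + findText.Length)
End Function
```

With that in place, the original If/Else could chain calls such as myParagraph.ReplaceFirstCaseSensitive("A", "").ReplaceFirstCaseSensitive("B", "") for case 1 and the lowercase equivalents for case 2.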

I have a String extensions module I constantly maintain (use in every project) that handles this. It will take your inputs, escape any regex metacharacters, and do case-sensitive replacements using Regex.

Add the following method to a Module and you can use it like:

Var original As String = "Hello World, hello universe!"
Var result As String = original.ReplaceCaseSensitive("hello", "hi")

Public Function ReplaceCaseSensitive(extends originalString As String, matchString As String, replaceString As String, Optional maxReplacements As Integer = -1) As String
  // Case-sensitive string replacement
  //
  // Input Parameters:
  //   originalString
  //   matchString - Substring to find and replace
  //   replaceString - String to replace matches with
  //   maxReplacements - Maximum number of replacements to make:
  //                    -1 (default) = replace all occurrences
  //                     0 = replace none (returns original string)
  //                    >0 = replace up to this many occurrences
  //
  // Output:
  //   Returns a new string with the specified replacements made
  
  If matchString = "" or originalString = "" or maxReplacements = 0 Then Return originalString
  
  
  Try
    // Escape regex metacharacters 
    Var escapedMatch As String = matchString.ReplaceAll("\", "\\") // backslashes have to be first
    escapedMatch = escapedMatch.ReplaceAll(".", "\.")
    escapedMatch = escapedMatch.ReplaceAll("^", "\^")
    escapedMatch = escapedMatch.ReplaceAll("$", "\$")
    escapedMatch = escapedMatch.ReplaceAll("*", "\*")
    escapedMatch = escapedMatch.ReplaceAll("+", "\+")
    escapedMatch = escapedMatch.ReplaceAll("?", "\?")
    escapedMatch = escapedMatch.ReplaceAll("{", "\{")
    escapedMatch = escapedMatch.ReplaceAll("}", "\}")
    escapedMatch = escapedMatch.ReplaceAll("[", "\[")
    escapedMatch = escapedMatch.ReplaceAll("]", "\]")
    escapedMatch = escapedMatch.ReplaceAll("|", "\|")
    escapedMatch = escapedMatch.ReplaceAll("(", "\(")
    escapedMatch = escapedMatch.ReplaceAll(")", "\)")
    
    Var regex As New RegEx
    regex.SearchPattern = escapedMatch
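    // Caution: the regex engine interprets backreference syntax (such as \1 or $1)
    // in replaceString here, whereas the loop branch below inserts replaceString literally.
    regex.ReplacementPattern = replaceString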
    regex.Options.CaseSensitive = True
    
    If maxReplacements = -1 Then
      // Replace all
      regex.Options.ReplaceAllMatches = True
      Return regex.Replace(originalString)
    Else
      // Loop to maxReplacements
      regex.Options.ReplaceAllMatches = False
      
      Var result As String = originalString
      Var replacementCount, searchOffset As Integer = 0
      
      While replacementCount < maxReplacements
        
        Var match As RegExMatch = regex.Search(result, searchOffset) //start from current offset
        
        If match = Nil Then Exit While //we're done
        
        // Convert the byte index to a character index in case of UTF-8 multibyte characters
        Var matchStartByte As Integer = match.SubExpressionStartB(0)
        Var matchedText As String = match.SubExpressionString(0)
        Var beforeMatch As String = result.LeftBytes(matchStartByte)
        Var matchStartChar As Integer = beforeMatch.Length
        Var matchLengthChar As Integer = matchedText.Length
        
        // Replace this match
        Var beforeText As String = result.Left(matchStartChar)
        Var afterText As String = result.Middle(matchStartChar + matchLengthChar)
        result = beforeText + replaceString + afterText
        
        replacementCount = replacementCount + 1
        
        // Update search offset to pick up after the replacement
        Var newPrefix As String = beforeText + replaceString
        searchOffset = newPrefix.Bytes
        
        // Holy infinite loops, Batman! Double check our bounds.
        If searchOffset >= result.Bytes Then Exit While
      Wend
      
      Return result
    End If
    
  Catch e As RegExException
    
    //This returns the original string if there was a problem.
    //Maybe raise an exception here instead?
    Return originalString
    
  End Try
  
End Function
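
For the original question (only the first A and the first B), the maxReplacements parameter can be set to 1. A short usage sketch, assuming the ReplaceCaseSensitive method above has been added to a Module; myParagraph and its sample value are illustrative only:

```xojo
Var myParagraph As String = "AAaaBBbb"

// Remove only the first uppercase A, then only the first uppercase B.
// Lowercase a and b are left alone because the search is case sensitive.
myParagraph = myParagraph.ReplaceCaseSensitive("A", "", 1)
myParagraph = myParagraph.ReplaceCaseSensitive("B", "", 1)

System.DebugLog(myParagraph)
```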

Jeremie,
thanks for pointing me to ReplaceBytes. This is what I was looking for.

Are you converting your string to a 1-byte text encoding before doing the replace? If not, you run the risk of damaging the string by manipulating its bytes directly.

I’m only working with ASCII characters 48-122. So I think the answer to your question is: yes.

Unless you are using ConvertEncoding or DefineEncoding, the answer is no. :wink:

I’ve read the documentation now for ConvertEncoding and DefineEncoding.
This is over my head, but I don’t see how they will help my project.

You need to read about text encodings to really understand what the issues are here, because you’re setting yourself up for weird problems with what you’re doing.

Start here:

Great Video to help understand encodings
Encodings
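
For anyone else who finds the two functions confusing, here is a rough sketch of the difference as I understand it: ConvertEncoding rewrites the bytes so the same characters are stored in another encoding, while DefineEncoding only changes the label on bytes that are already there. The variable names are illustrative.

```xojo
// String literals in Xojo source code are UTF-8.
Var utf8Text As String = "é"
System.DebugLog(utf8Text.Length.ToString) // number of characters
System.DebugLog(utf8Text.Bytes.ToString)  // number of bytes (é takes two in UTF-8)

// ConvertEncoding re-encodes: same character, different bytes underneath.
Var latin1Text As String = utf8Text.ConvertEncoding(Encodings.ISOLatin1)
System.DebugLog(latin1Text.Bytes.ToString) // é takes one byte in ISO Latin 1

// DefineEncoding does not touch the bytes; it only declares what they mean.
// Relabeling Latin 1 bytes as UTF-8 leaves a string whose bytes no longer
// match its label, which is the kind of damage byte-level functions can cause.
Var mislabeled As String = latin1Text.DefineEncoding(Encodings.UTF8)
```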

That’s a bit extreme. Chars 48-122 are the same in every ASCII-compatible encoding (ASCII, UTF-8, the ISO-8859 and Windows code pages), or even in no encoding. A string can be just a bag-o-bytes and treated as such.


Yes, but he’s specifically dealing with text - not bytes. And we don’t know that all of his text is within the ASCII range; in fact, it is very unwise to make that assumption when dealing with text.

The easiest rule to express to a programmer who doesn’t understand text encodings is to ignore all the string functions that deal with bytes directly; they can and will cause problems and limitations down the line. They should treat strings as opaque data types that store text using any style of arbitrary byte patterns that can’t be manipulated in a byte-by-byte fashion without risking damage to the data.

Of course, the better approach is for them to understand text encodings so they can write better code that does what they want.


You may have missed the note where I listed the ASCII codes that I’m working with.
I only have a subset of the uppercase and lowercase characters that I have to convert.