Since I last did a lot of programming, the string functions I’m trying to use have become case insensitive. (It seems to me that they used to be case sensitive, but maybe I’m just used to VBA and don’t remember.)
Anyway, does anyone have a reasonably simple solution for case-sensitive string replacement?
Given a variable called myParagraph, I want in case 1 to replace the first A and B with blanks. In case 2, I want to replace the first a and b with blanks. My code was:
If Case = 1 Then
  myParagraph = myParagraph.Replace("A", "").Replace("B", "")
Else
  myParagraph = myParagraph.Replace("a", "").Replace("b", "")
End If
But the two conversion statements are really the same because they both replace the first A/a and first B/b with blanks. I’m thinking that I can loop through each character in the string and test it individually to see if it’s an A or an a and make the appropriate change. But it seems that there has to be a more elegant solution.
You can do this with RegEx by making the search case sensitive (the -i flag in the search pattern turns case insensitivity off):
dim sourceText as string = "AAaaBBbb"
dim rx as new RegEx
rx.SearchPattern = "(?m-Usi)B"
rx.ReplacementPattern = "x"
dim rxOptions as RegExOptions = rx.Options
rxOptions.LineEndType = 4
rxOptions.ReplaceAllMatches = true
dim replacedText as string = rx.Replace( sourceText )
// replacedText = "AAaaxxbb"
Just be careful when the text you search for contains special characters. If you aren't comfortable with RegEx syntax, an unescaped metacharacter can break the search pattern.
Check out @Kem_Tekinay’s awesome RegExRX app if you want to start dabbling in RegEx.
Good call, Tim. I'm just so used to taking whatever RegExRX spits out.
Simplified…
dim sourceText as string = "AAaaBBbb"
dim rx as new RegEx
rx.SearchPattern = "B"
rx.ReplacementPattern = "x"
dim rxOptions as RegExOptions = rx.Options
rxOptions.ReplaceAllMatches = true
rxOptions.CaseSensitive = true
dim replacedText as string = rx.Replace( sourceText )
// replacedText = "AAaaxxbb"
I ended up creating a function that I can call that loops through the string and compares each character. To me, the RegEx is even more convoluted and difficult to read than looping through the string.
But am I crazy in remembering that myString = "k" used to be case sensitive? I don't remember using Replace in the past, so I don't know if it has changed over time or not.
Replace has always been case insensitive in Xojo for as long as I remember.
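To illustrate, here is a small sketch of how the comparison behaves. It assumes API 2.0 and its String.Compare method with ComparisonOptions, so adjust it if you are on an older version:

```xojo
Var s As String = "k"

// The = operator ignores case when comparing strings
If s = "K" Then
  System.DebugLog("= treats k and K as equal")
End If

// Compare returns 0 only when the strings match under the given options
If s.Compare("K", ComparisonOptions.CaseSensitive) <> 0 Then
  System.DebugLog("Compare with CaseSensitive tells them apart")
End If
```

So a plain = test on strings is not a reliable way to tell "k" from "K".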
You could use String.ReplaceBytes which is case sensitive.
From the docs
ReplaceBytes is case-sensitive; it treats sourceString as a series of raw bytes. It should be used instead of Replace when the String represents a series of bytes or when your application will run in a one-byte character set (such as the US system) and you want case-sensitivity.
That line in the documentation really should stop right there. ReplaceBytes should only be used when you are treating a string as data, not as human-readable text. It can damage a string that is encoded in any multi-byte encoding, such as the virtually ubiquitous UTF-8. Please do not use it for this particular situation.
Make a note to come back to this once you have your overall system going and you are considering where to optimize your code. Using a RegEx will be many, many times faster than looping over the characters.
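If you want something between ReplaceBytes and a full RegEx, here is a minimal sketch that works on characters rather than bytes. It assumes API 2.0, where (as far as I know) String.IndexOf accepts a ComparisonOptions argument; ReplaceFirstCaseSensitive is a made-up name for illustration, not a built-in:

```xojo
// Hypothetical helper: replaces only the first case-sensitive occurrence of find
Public Function ReplaceFirstCaseSensitive(source As String, find As String, replacement As String) As String
  If find = "" Then Return source
  
  // IndexOf returns a zero-based character position, or -1 when nothing matches
  Var pos As Integer = source.IndexOf(find, ComparisonOptions.CaseSensitive)
  If pos < 0 Then Return source
  
  // Left and Middle count characters, so multi-byte text is not split mid-character
  Return source.Left(pos) + replacement + source.Middle(pos + find.Length)
End Function
```

You would call it like myParagraph = ReplaceFirstCaseSensitive(myParagraph, "A", ""). It only handles the first occurrence, which matches the original question; for replace-all or heavier workloads the RegEx route is probably the better fit.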
I have a String extensions module I constantly maintain (use in every project) that handles this. It will take your inputs, escape any regex metacharacters, and do case-sensitive replacements using Regex.
Add the following method to a Module and you can use it like:
Var original As String = "Hello World, hello universe!"
Var result As String = original.ReplaceCaseSensitive("hello", "hi")
Public Function ReplaceCaseSensitive(extends originalString As String, matchString As String, replaceString As String, Optional maxReplacements As Integer = -1) As String
  // Case-sensitive string replacement
  //
  // Input Parameters:
  //   originalString
  //   matchString - Substring to find and replace
  //   replaceString - String to replace matches with
  //   maxReplacements - Maximum number of replacements to make:
  //     -1 (default) = replace all occurrences
  //     0 = replace none (returns original string)
  //     >0 = replace up to this many occurrences
  //
  // Output:
  //   Returns a new string with the specified replacements made
  
  If matchString = "" Or originalString = "" Or maxReplacements = 0 Then Return originalString
  
  Try
    // Escape regex metacharacters
    Var escapedMatch As String = matchString.ReplaceAll("\", "\\") // backslashes have to be first
    escapedMatch = escapedMatch.ReplaceAll(".", "\.")
    escapedMatch = escapedMatch.ReplaceAll("^", "\^")
    escapedMatch = escapedMatch.ReplaceAll("$", "\$")
    escapedMatch = escapedMatch.ReplaceAll("*", "\*")
    escapedMatch = escapedMatch.ReplaceAll("+", "\+")
    escapedMatch = escapedMatch.ReplaceAll("?", "\?")
    escapedMatch = escapedMatch.ReplaceAll("{", "\{")
    escapedMatch = escapedMatch.ReplaceAll("}", "\}")
    escapedMatch = escapedMatch.ReplaceAll("[", "\[")
    escapedMatch = escapedMatch.ReplaceAll("]", "\]")
    escapedMatch = escapedMatch.ReplaceAll("|", "\|")
    escapedMatch = escapedMatch.ReplaceAll("(", "\(")
    escapedMatch = escapedMatch.ReplaceAll(")", "\)")
    
    Var regex As New RegEx
    regex.SearchPattern = escapedMatch
    // Note: replaceString is not escaped, so in the replace-all branch any
    // backreference-style sequences in it may be interpreted by the RegEx engine
    regex.ReplacementPattern = replaceString
    regex.Options.CaseSensitive = True
    
    If maxReplacements = -1 Then
      // Replace all
      regex.Options.ReplaceAllMatches = True
      Return regex.Replace(originalString)
    Else
      // Loop to maxReplacements
      regex.Options.ReplaceAllMatches = False
      Var result As String = originalString
      Var replacementCount, searchOffset As Integer = 0
      
      While replacementCount < maxReplacements
        Var match As RegExMatch = regex.Search(result, searchOffset) // start from current offset
        If match = Nil Then Exit While // we're done
        
        // Convert byte index to character index in case of UTF-8 multibyte characters
        Var matchStartByte As Integer = match.SubExpressionStartB(0)
        Var matchedText As String = match.SubExpressionString(0)
        Var beforeMatch As String = result.LeftBytes(matchStartByte)
        Var matchStartChar As Integer = beforeMatch.Length
        Var matchLengthChar As Integer = matchedText.Length
        
        // Replace this match
        Var beforeText As String = result.Left(matchStartChar)
        Var afterText As String = result.Middle(matchStartChar + matchLengthChar)
        result = beforeText + replaceString + afterText
        replacementCount = replacementCount + 1
        
        // Update search offset to pick up after the replacement
        Var newPrefix As String = beforeText + replaceString
        searchOffset = newPrefix.Bytes
        
        // Holy infinite loops, Batman! Double check our bounds.
        If searchOffset >= result.Bytes Then Exit While
      Wend
      
      Return result
    End If
    
  Catch e As RegExException
    // This returns the original string if there was a problem.
    // Maybe raise an exception here instead?
    Return originalString
  End Try
End Function
Are you converting your string to a 1-byte text encoding before doing the replace? If not, you run the risk of damaging the string by manipulating its bytes directly.
You need to read about text encodings to really understand what the issues are here, because you’re setting yourself up for weird problems with what you’re doing.
Yes, but he’s specifically dealing with text - not bytes. And we don’t know that all of his text is within the ASCII range; in fact, it is very unwise to make that assumption when dealing with text.
The easiest rule to express to a programmer who doesn’t understand text encodings is to ignore all the string functions that deal with bytes directly; they can and will cause problems and limitations down the line. They should treat strings as opaque data types that store text using any style of arbitrary byte patterns that can’t be manipulated in a byte-by-byte fashion without risking damage to the data.
Of course, the better approach is for them to understand text encodings so they can write better code that does what they want.
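As a quick illustration of why bytes and characters are not interchangeable, here is a small sketch. It assumes API 2.0 and that the string literal is stored as UTF-8, which is my understanding of how Xojo handles literals in source code:

```xojo
Var s As String = "é" // assumed to be UTF-8, like other literals in the IDE

System.DebugLog(s.Length.ToString) // number of characters
System.DebugLog(s.Bytes.ToString)  // number of bytes, larger than Length for non-ASCII text
```

Any function that counts or cuts in bytes can land in the middle of a character like that, which is where the damage comes from.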
You may have missed the note where I listed the ASCII codes that I’m working with.
I only have a subset of the uppercase and lowercase characters that I have to convert.