ReplaceAll acts as ReplaceAllBytes when search string's encoding is Nil

Does anyone else think this is a bug?

https://tracker.xojo.com/xojoinc/xojo/-/issues/77262

In the following code:

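```xojo
// U+202F as raw bytes: no encoding is defined, so Encoding = Nil
Dim NBSPAsBytes As String = ChrB(&hE2) + ChrB(&h80) + ChrB(&hAF)

// The same three bytes, explicitly labeled as UTF-8 text
Dim UTF8NBSP As String = DefineEncoding(ChrB(&hE2) + ChrB(&h80) + ChrB(&hAF), Encodings.UTF8)

// Encoding-aware ReplaceAll, called with a Nil-encoded search string
UTF8NBSP = UTF8NBSP.ReplaceAll(NBSPAsBytes, "@")
// As reported, UTF8NBSP is now "@"
```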

…the properly encoded UTF-8 non-breaking space character (strictly, U+202F NARROW NO-BREAK SPACE) is replaced with the “@” character even though the search string is that character expressed as raw bytes. The raw-bytes string has a Nil encoding.

I don’t think that’s a bug. Xojo has to normalize both strings to the lowest common denominator, which is Nil. From your description, it sounds like it is finding the 3-byte sequence and replacing it with “@”.


I very much doubt that Xojo is downgrading that UTF-8 string to Nil to perform the replacement in ReplaceAll. Comparing bytes instead of characters is exactly what ReplaceAllBytes is supposed to do, while ReplaceAll is explicitly supposed to be encoding-aware.
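Whether or not this is eventually classed as a bug, a caller can avoid depending on how ReplaceAll treats a Nil-encoded search string by stating the intent explicitly. The sketch below assumes Xojo's API 2.0 String methods `ReplaceAll` and `ReplaceAllBytes` together with the global `DefineEncoding` used above. The variable names are only illustrative. It shows one call for byte-level matching and one for character-level matching.

```xojo
// Same bytes as in the report: U+202F with no encoding (Encoding = Nil)
Dim bytesOnly As String = ChrB(&hE2) + ChrB(&h80) + ChrB(&hAF)
Dim utf8Source As String = DefineEncoding(bytesOnly, Encodings.UTF8)

// Byte-level intent: use ReplaceAllBytes, which compares raw bytes
Dim byByte As String = utf8Source.ReplaceAllBytes(bytesOnly, "@")

// Character-level intent: give the search string an encoding first,
// so ReplaceAll compares like-encoded characters
Dim searchUTF8 As String = DefineEncoding(bytesOnly, Encodings.UTF8)
Dim byChar As String = utf8Source.ReplaceAll(searchUTF8, "@")

// Both byByte and byChar now hold "@"
```

Written this way, the result no longer depends on whether ReplaceAll's handling of Nil-encoded search strings changes in a later release.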