String.Left fails with nil encoding on a Korean-language macOS system

I’m finding a case where some string commands, such as String.Left() or String.InStr(), fail, but only when the string has Nil encoding and the system language is set to Korean. There are a couple of workarounds: fix the string encoding (in this case to UTF8), or use the binary versions, String.LeftB() or String.InStrB().

But I’m wondering if this is a known bug? I’m seeing this with Xojo 2019r1.1.

Failing how?

The code looks something like this:

dim S as string = ... // (from a memoryblock, so it has Nil encoding)
dim x as integer = S.instr("<tag1>")
if x > 0 then
    S = S.left(x-1) + "</tag2>"
end if

When run on macOS with Korean set as the primary language, the resulting string is malformed - I haven’t yet figured out if it’s the Left() or InStr() that’s failing, or perhaps the String + (concatenation) operation that’s to blame.
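For reference, here is a minimal sketch of the byte-wise workaround applied to that snippet. The sample string is made up for illustration (the real one comes from a MemoryBlock), and I have not confirmed that this sidesteps every variant of the problem:

```xojo
// Hypothetical sample data standing in for the MemoryBlock contents.
Dim S As String = DefineEncoding("<tag2>some text<tag1>more text", Nil)

// InStrB and LeftB count bytes, not characters, so the result
// should not depend on how the characters are interpreted.
Dim x As Integer = S.InStrB("<tag1>")
If x > 0 Then
  S = S.LeftB(x - 1) + "</tag2>"
End If
```

This only makes sense when the search string is plain ASCII, as it is here, so that its bytes match the bytes in the source data.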

Ugh, I’m guessing it’s the bug reported 2+ years ago: <https://xojo.com/issue/54638>
54638 - Mid function may return unexpected length string on systems using an multibyte language

I’ve verified the failure - it’s basically what’s seen in <https://xojo.com/issue/54638>, however it affects String.Left() as well as String.Mid(): with a nil-encoded string, on a Korean macOS system, the wrong number of bytes is returned.

Edit: I just added my points to this bug, and it’s now ranked #49. Please vote this up if you want this fixed!

The reason this bug is so nasty is that it also depends on the contents (possibly the length?) of the string as well. Very subtle and hard to catch. I only found out about this because some Korean students trying to use my software found the bug and emailed me.

These kinds of bugs, where the very basic data types are not working, should be a priority and should not require points. It was reported 2 YEARS ago.

Absolutely agree, though I can understand how it didn’t get fixed, given limited resources: it only affects Asian systems on macOS and only some strings trigger the behavior.

Just out of curiosity, since Left relies on the string being properly encoded, why aren’t you just using LeftB or LeftBytes instead?

Actually this is a good question. More generally, how can any of the methods which are counting characters work on a string without an encoding?

The crux is that the behavior is ill-defined when the encoding is Nil, and the behavior is not documented in any way I can find (which is probably worth a documentation update feedback request too).

My assumption (shared by others) is that Nil encoding is treated as a bag of bytes, but clearly that’s not true in certain situations. I wonder if this is a new bug or goes back all the way to REALbasic?

The docs clearly state that Left works with characters and LeftB and LeftBytes work with individual bytes. The fact that Left does something undefined when no encoding is applied, and that you expected it to work anyway, seems to point to a general misunderstanding of how important encodings are when it comes to strings of printable characters. You should never try to work on a string of text that you’ll be showing to a user without an encoding.
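A minimal sketch of that advice: define the encoding once, at the point where the bytes enter the program, before calling any character-based method. EnsureEncoding is a hypothetical helper name, and the choice of UTF8 is an assumption that must match what the bytes actually contain:

```xojo
// Hypothetical helper: returns s unchanged if it already has an encoding,
// otherwise returns the same bytes tagged as UTF-8.
// Assumption: strings that arrive without an encoding hold UTF-8 data.
Function EnsureEncoding(s As String) As String
  If s.Encoding Is Nil Then
    // DefineEncoding only labels the bytes; it does not convert them.
    Return s.DefineEncoding(Encodings.UTF8)
  End If
  Return s
End Function
```

With this in place, something like S = EnsureEncoding(S) right after reading from the MemoryBlock would let Left and InStr count characters against a known encoding.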

I did more testing, and the plot thickens:
I have a simple app which runs this code:

  Dim s as string = "Audio and Video © by their respective owners"
  
  s = s.DefineEncoding(nil)
  
  dim r as string
  dim s2 as string = s.left(17)
  dim n2 as integer = lenB(s2)
  r = "Encoding: nil Left(17).LenB = " + str(n2) + " '" + s2 + "'"
  
  label1.Text = r

I then test the app on English and Korean versions of macOS.

  • Built with Xojo 2019R1.1, on Korean systems we see Left(17).LenB = 18.
  • Built with REALstudio 2011R3, on Korean systems we see Left(17).LenB = 17.
  • On English systems, 2011R3 and 2019R1.1 always return Left(17).LenB = 17

So, somewhere between the 2011 and 2019 versions of Xojo, the behavior changed. The older behavior (in which nil encoding was treated as a bag of bytes regardless of OS settings) was preferable. The newer behavior is more dangerous.
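One way to read the 18: the 17th character of the test string is ©, which takes two bytes in UTF-8 (string literals in Xojo source are UTF-8). The sketch below puts the nil-encoded result next to two reference values, to show which one it matches on a given system. It only compares byte counts; it does not prove what the framework does internally.

```xojo
Dim s As String = "Audio and Video © by their respective owners"

Dim utf8 As String = s.DefineEncoding(Encodings.UTF8)
Dim raw As String = s.DefineEncoding(Nil)

// Character-based with a known encoding:
// 16 one-byte characters plus the two-byte © gives 18 bytes.
System.DebugLog("UTF8 Left(17).LenB = " + Str(LenB(utf8.Left(17))))

// Byte-based: always 17 bytes, which cuts © in half.
System.DebugLog("nil LeftB(17).LenB = " + Str(LenB(raw.LeftB(17))))

// The case under test: this is the value that differed
// between the English and Korean systems above.
System.DebugLog("nil Left(17).LenB = " + Str(LenB(raw.Left(17))))
```

If the last line matches the first, the nil-encoded string is being counted in multibyte characters; if it matches the second, it is being treated as a bag of bytes.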

Was this change intentional?
