Encoding problem with Sonoma?

I’m having an encoding problem with Sonoma, at least that’s where I think the problem is.

I have time strings, e.g. “5/12/24 11/54/27 AM” coded as “String”. To save them to a file, I first write them to a MemoryBlock using StringValue:

myblock.StringValue(offset,mytime.Length) = mytime

This has worked fine up until I run it under Sonoma. For some reason, the last 3 characters are now encoded differently. I’ll get “5/28/24 10/46/57���”, with " AM" corresponding to hex “E2 80 AF” instead of “20 41 4D”.

What’s going on here and how can I fix it?

I think I read somewhere that now the space before ‘AM’ is a non-breaking space (or something like that).

Maybe you need to define UTF-8 as the encoding?


Unicode (CLDR) specifies that the separator before “AM/PM” is not the usual space (U+0020) but a narrow no-break space (U+202F). In Xojo, that is &u202F.

11:22:33 PM // normal space, breaks lines
11:22:33 PM // narrow non-breaking one, a little bit closer, it acts more like glue than a divider.

As Alberto said, your encoding seems improper.


From the very useful Unicode/UTF-8 reference at:


we have this entry:


So, nothing to do with Sonoma.

Are you sure?

use this code:

Var dd As DateTime = DateTime.Now
Var test As String = dd.ToString


macOS 13.6.7 (Intel), Xojo 2024r1.1

And this is the same code with macOS 14.5 (M2), Xojo 2024r1.1

Edit: looks like Sonoma now adheres to the unicode standard (what Rick posted above) while previous versions use a normal space.


Thanks for the comments. I’ll fix it with encoding.

What’s strange to me is that this has been working for years with all macOS versions up through Ventura. The only other difference in my testing is that the Sonoma machine is an M2, while the other test machines are M1 or Intel, but I can’t see how that would cause a change like this.


Nailed it.


Depends on your PoV, I suppose (he said, splitting hairs). I’d call it an encoding problem which Sonoma happens to highlight.


Technically there’s no PoV involved.

There are 2 things in this report:

There’s a bug in John’s code, which he found because a badly handled encoding received a non-ASCII UTF-8 code point.

Sonoma (or Xojo 2024r1.1+ under Sonoma, I’m not sure how this pair handles ICU) updated its ICU/CLDR data, which changed the behavior.

If you parse such a string, don’t assume the separator is &u0020 or that it is &u202F; accept that it may be either one.
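One way to follow that advice is to fold both separators into a plain space before parsing. This is only a minimal sketch: the method name NormalizeSpaces is hypothetical, and also handling U+00A0 (the ordinary no-break space) is my own assumption, not something reported in this thread.

```xojo
// Hypothetical helper: turn the "special" spaces a locale-aware formatter
// may emit into a plain U+0020 space, so later parsing only has to deal
// with one kind of separator.
Function NormalizeSpaces(source As String) As String
  Var result As String = source
  // Narrow no-break space: the AM/PM separator seen under Sonoma.
  result = result.ReplaceAllBytes(&u202F, " ")
  // Ordinary no-break space: included as a precaution (assumption).
  result = result.ReplaceAllBytes(&u00A0, " ")
  Return result
End Function
```

Calling it on the output of DateTime.ToString before splitting on spaces should give the same result whether or not the OS emitted U+202F.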

I was mistaken about the origin of this problem (writing to memoryblock). This unicode character (3 bytes) is output by the DateTime.ToString function. The reason I’m not seeing the AM or PM (just the unknown characters) is that I’m using String.Length for how many bytes to write to the memoryblock and it’s now 2 bytes longer (I guess I should have used String.Bytes). But I want to save it as simple string anyway.
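For anyone hitting the same thing, here is a minimal sketch of the Length versus Bytes difference. The sample string is made up to match the shape of the output above, and it assumes the string is UTF-8 encoded (which is what Xojo string literals are).

```xojo
// Sample value shaped like the Sonoma output described above (not real data).
Var mytime As String = "5/28/24 10/46/57" + &u202F + "AM"

Var charCount As Integer = mytime.Length // counts characters: U+202F counts as 1
Var byteCount As Integer = mytime.Bytes  // counts bytes: U+202F takes 3 in UTF-8

// Size the write by bytes, not characters, so the trailing "AM" is not cut off.
Var myblock As New MemoryBlock(byteCount)
myblock.StringValue(0, byteCount) = mytime

// When reading back, tell Xojo the bytes are UTF-8 so the string is interpreted correctly.
Var roundTrip As String = myblock.StringValue(0, byteCount).DefineEncoding(Encodings.UTF8)
```

With one U+202F in the string, byteCount is 2 larger than charCount, which is exactly the two bytes ("AM") that went missing.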

I couldn’t figure out how to change those bytes back to a space character with Encodings. When I tried to use ConvertEncoding, I got a “?”. I ended up using ReplaceAllBytes to do it:

mytime = mytime.ReplaceAllBytes(&u202f, " ")

Maybe there’s a more elegant way to do it, but this works for now.

I don’t see that you could accomplish that with ConvertEncoding, unless you tried to convert to ASCII. But then it would depend how clever ConvertEncoding is.


Fixing this seemed to me to be a big waste of time, but I’m sure all those people who lost sleep over the huge amount of wasted space before “AM” or “PM” will be happy now that Sonoma has come along. :smiley:

Do you really need memoryblocks to do whatever you are doing? Are you just appending strings? Sometimes users use shotguns to kill flies, that’s why I ask.

Sometimes more direct things like these are enough:

Var someDateTime As DateTime = DateTime.Now

Var tf As TextOutputStream = TextOutputStream.Open(SpecialFolder.Desktop.Child("dates_test.txt"))

For offsetDays As Integer = 1 To 10
  tf.WriteLine someDateTime.AddInterval(0,0,offsetDays) _       // next day
  .ToString(new Locale("en-us"), DateTime.FormatStyles.Short) _ // m/d/y h:m:s am/pm
  .ReplaceAll(&u202f, " ")                                      // user prefer normal space
Next

tf.Close // flush pending text and release the file



Ha ha. I’m saving dozens or even hundreds of “songs”, which are MIDI performances containing possibly thousands of 3-byte MIDI events including timestamps. I found it way too slow to save it all by writing to a stream, so I build my file in a memory block. What took 25 seconds before now takes about 1 second.

Thanks for your more direct example. I try to do that but often forget.

25x slower is weird. A BinaryStream is supposed to be fast and to use buffers (a memoryblock) internally to speed things up, flushing its contents over time, so it should not be that slow. Something seems poorly tuned in the Xojo framework.

Part of my optimization was achieved by using a memoryblock in each song to hold the MIDI event data instead of an array of thousands of MIDI event classes. When those events are needed for playback, the time to parse it into an array is negligible. That might explain much of the speed difference. The 25x difference was extreme when testing with hundreds of thousands of songs resulting in a file of several hundred MBs. To use it as an example may have been a bit dramatic. Maybe it took 2 seconds. :smiley:
