Text encoding of PopupMenu

Richard_Francis · March 23, 2023, 6:16pm

I have a problem with handling accented characters (or other characters such as the Scandinavian o with a slash through it: ø) in PopupMenus. They work fine throughout the rest of the app, including reading and writing TextFields, drawing text in a Graphics, writing in a TextArea, and saving to and reading from an SQLite database. But in a PopupMenu they do not work (I am filling the menu using AddRow).

The one I am having trouble with at the moment is the lower case o with slash, which is given as hex C3B8 (2 bytes!) in the debugger.

If I try this:

enc = Stations(i).Name.Encoding
stn = Stations(i).Name.ConvertEncoding(Encodings.UTF8)
enc = stn.Encoding
PopupMenuStations.AddRow(stn)
enc = PopupMenuStations.SelectedRowValue.Encoding

Stations(i).Name contains the name “Vardø”

The first time enc is set, it is Nil
The second time it is Nil as well (and it is here we can see the character showing up as C3B8)
The third time it has a value, with the InternetName being US-ASCII

If I change the second line so that the .ConvertEncoding part is removed, the result (including the C3B8 representation) is identical.

What is going on and how can I get PopupMenu to behave like the rest of the app?

Thanks,
Richard

kevin_g · March 23, 2023, 6:22pm

If the encoding is Nil then ConvertEncoding won’t know how to convert it to UTF-8.

If the data is valid UTF-8 but for some reason doesn’t have an encoding, try using DefineEncoding instead of ConvertEncoding. That should really be done when the data is read into Xojo, though, rather than when adding it to the popup menu.
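
A minimal sketch of that suggestion, assuming the bytes really are UTF-8. The helper name EnsureUTF8 is hypothetical and not part of Xojo; it only uses String.Encoding, DefineEncoding, ConvertEncoding and TextEncoding.IsValidData:

```xojo
// Hypothetical helper: return a string that is tagged as UTF-8.
Function EnsureUTF8(s As String) As String
  If s.Encoding Is Nil Then
    // No encoding attached. DefineEncoding leaves the bytes untouched
    // and only labels them, so check first that they are valid UTF-8.
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
    End If
    // Assumption: bytes that are not valid UTF-8 are returned unchanged,
    // because the real source encoding is not known here.
    Return s
  End If

  // The encoding is known, so a real conversion is possible.
  Return s.ConvertEncoding(Encodings.UTF8)
End Function
```

With the names from the first post, it would be used as PopupMenuStations.AddRow(EnsureUTF8(Stations(i).Name)), although it is better to fix the encoding at the point where the string first loses it.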

Richard_Francis · March 23, 2023, 7:15pm

Thanks for the response, I’ll try to work with that.

But please note (I probably wasn’t sufficiently clear before) that these data are not read into Xojo, they are entered as text into a Textfield and then used. They go around the houses a bit, but it’s all internal.

I notice that there was a now-closed thread on this same topic back in 2019 which never had a satisfactory resolution.

I wonder if there is something weird with PopupMenus?

Richard_Francis · March 23, 2023, 7:23pm

Anyway, thanks for the tip – it did work

Why it was necessary, I don’t know!

AlbertoD · March 23, 2023, 7:24pm

If you can create a sample project, open an Issue with it.
Or share here.
That way others can easily reproduce the problem and help you.

Richard_Francis · March 23, 2023, 8:03pm

I did make a very simple one (defining 2 strings and loading them into a PopupMenu), and that one worked OK …

In any case, though I don’t understand it, the tip from kevin g was a solution for me.

Cheers,
Richard

TimStreater · March 23, 2023, 10:37pm

This is UTF-8. From Unicode/UTF-8-character table (the Latin-1 supplement page):

U+00F8 ø c3b8 LATIN SMALL LETTER O WITH STROKE

Richard_Francis · March 23, 2023, 10:50pm

Thanks. I did wonder about that, but as it has a single-byte value in various ASCII tables (though not either of those), I wasn’t sure.

Arnaud_N · March 24, 2023, 7:40am

Keep in mind that if you just make a simple assignment anywhere in your app (such as MyProperlyEncodedString=MyProperlyEncodedString+AWeirdString), the encoding can easily be lost. It’s easy to encounter this situation.

kevin_g · March 24, 2023, 7:43am

I would check the code that manipulates those strings as it looks like you are doing something that trashes the encoding.

Could you be using the B / Byte functions or possibly concatenating a string that uses a different encoding?

Emile_Schwarz · March 24, 2023, 7:52am

Look up ASCII on Wikipedia: it defines 7-bit characters, essentially a-z, A-Z, 0-9 and some other one-byte characters. Vowels with diacritics, and characters such as ø, are not (and never were) ASCII.

And, contrary to what some have written on the Internet, “Extended ASCII” never existed either.

TimStreater · March 24, 2023, 10:28am

ASCII runs from 0 to 127 (hex 00 to 7F). Anything between hex 80 and FF is NOT ASCII; it’s someone’s attempt at an extended character set containing their idea of other common characters, with each character still occupying only one byte. Sensible people ignore these and use only UTF-8. ASCII forms the first 128 characters of UTF-8, and these are the only one-byte characters in UTF-8. Other characters are 2, 3, or 4 bytes long.
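
A small sketch to check this in Xojo, assuming the string literals are UTF-8 (as they normally are in Xojo source) and that the output is read in the debug log. The variable names are made up for illustration:

```xojo
Var plainO As String = "o"   // ASCII, U+006F
Var strokeO As String = "ø"  // U+00F8, outside ASCII

// Bytes counts storage bytes; Length counts characters.
System.DebugLog(plainO.Bytes.ToString)    // 1 byte
System.DebugLog(strokeO.Bytes.ToString)   // 2 bytes
System.DebugLog(strokeO.Length.ToString)  // still 1 character

// EncodeHex shows the raw bytes, which should match the two bytes
// the debugger displayed for this character in the first post.
System.DebugLog(EncodeHex(strokeO))
```

Seeing two bytes for ø is therefore expected for UTF-8 and is not by itself a sign of a damaged string.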

Emile_Schwarz · March 24, 2023, 10:34am

etc.

Isn’t that what I wrote above?

Ian_Kennedy · March 24, 2023, 11:49am

If you read strings from anywhere and get a Nil encoding, you should use DefineEncoding to tell Xojo which encoding is in use. Only then, if you want UTF-8 and haven’t got it, can you use ConvertEncoding to change the string as desired. For example, if you read a string that is in WindowsLatin1 but the source fails to identify it as such, and you want UTF-8 output, you would do:

Var myUTF8String as string = SourceString.DefineEncoding( Encodings.WindowsLatin1 ).ConvertEncoding( Encodings.UTF8 )

Emile_Schwarz · March 24, 2023, 11:55am

The OP stated earlier:

But please note (I probably wasn’t sufficiently clear before) that these data are not read into Xojo, they are entered as text into a Textfield and then used. They go around the houses a bit, but it’s all internal.

He does not read the characters from a file or internet or siri/cortana/vanessa/whoever.

Ian_Kennedy · March 24, 2023, 12:05pm

Fairy nuf (sorry, I should say “fair enough”)

Monzer_El-Dakkak · December 8, 2023, 1:16am

I struggled for a long time to get the Swedish characters ÄÅÖ from a CSV file into a Listbox. This solved my problem:
st = split(s.DefineEncoding( Encodings.WindowsLatin1 ).ConvertEncoding( Encodings.UTF8 ), EndOfLine)

Thanks

Robert_Livingston · December 8, 2023, 6:54pm

What works for me is just to have constants for the “out of ASCII” characters that I might need. I just store them in some module so they are global. I can use a name that resonates with me rather than the “official name”.

Const SWEDISH_LOWER_O As String = "ø"
Const SWEDISH_UPPER_O As String = "Ø"

Then I just construct the strings that I need for items in the IDE or elsewhere.

It is all UTF8 and I do not have to convert anything or worry about how many bytes etc.