Locale rules: where are they?

Mike_Linacre1 · October 23, 2024, 3:18am

Japanese end-users are having trouble with string.uppercase instructions in my code. “a…z” are not converting to “A…Z”. I suspect a Locale rule for Japanese is being applied to mixed Japanese + Western text. Am I right? Where can I find the Locale rules? (Yes. Read the documentation! But where?)

My solution so far is to write my own uppercase routine, which loops through the string as bytes and changes relevant ones.

Beatrix_Willius · October 23, 2024, 5:41am

What do you mean by locale rules? Xojo uses the ICU library for text.

Mike_Linacre1 · October 23, 2024, 6:17am

Thanks, Beatrix.

Xojo documentation says:

"Uppercase(Optional locale As Locale = Nil) As Text

Returns a new Text value that has its characters uppercased. If the locale parameter is non-Nil, it will use that locale’s rules when performing the operation."

The ICU (unicode) library controls the characters. Does it also control the uppercase-lowercase pairing?

Sascha_S · October 23, 2024, 7:30am

If you follow the link to the “Locale” documentation, there’s a link to the Unicode organization, and there you will find the rules: ICU Demonstration - Locale Explorer

Beatrix_Willius · October 23, 2024, 8:29am

Sorry, I have used Xojo too long. Never seen that before.

Jeremie_L · October 23, 2024, 8:38am

You could try this:

If Locale.Current.Identifier.BeginsWith("ja") Then
  value = value.Uppercase(Locale.Raw)
Else
  value = value.Uppercase
End If

Mike_Linacre1 · October 23, 2024, 8:43am

Thanks, Sascha. Yes, those are Japanese characters, and some indications of Western alphabets, such as “UTC” and “E”.

So do we infer that for languages without upper/lower case, string.uppercase will be inactive for any included Latin text?

Does anyone have experience of this situation?

kevin_g · October 23, 2024, 9:23am

Do you have some example text you can share?

Mike_Linacre1 · October 23, 2024, 10:42am

Hi kevin:

I used a generic English sentence in a plain text file. string.uppercase works fine in my Xojo app on my English-US Windows Desktop computer. “c” changed to “C”.

An end-user in Japan installed the Xojo app on his Windows Desktop computer and launched it with my generic sentence. “c” stayed “c”.

I rewrote the app omitting string.uppercase and looped down the string instead, conceptually: asc(midb(…))-asc(“a”)+asc(“A”). Then “c” changed to “C” for me and in Japan.
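
For readers who want to try the same kind of workaround, here is a minimal sketch of such a byte loop. It is not the code from the app described above: the method name AsciiUpper is hypothetical, and it assumes Xojo API 2 and that only the ASCII letters a-z need converting.

```xojo
// Hypothetical helper: uppercases only ASCII a-z, byte by byte,
// and leaves every other byte untouched.
Function AsciiUpper(source As String) As String
  Var mb As MemoryBlock = source          // copy of the string's raw bytes
  For i As Integer = 0 To mb.Size - 1
    Var b As Integer = mb.Byte(i)
    If b >= 97 And b <= 122 Then          // "a" (97) through "z" (122)
      mb.Byte(i) = b - 32                 // same as b - Asc("a") + Asc("A")
    End If
  Next

  Var result As String = mb               // a String built from bytes has no encoding
  If source.Encoding Is Nil Then Return result
  Return result.DefineEncoding(source.Encoding)  // restore the original encoding label
End Function
```

One caveat with any byte loop: in UTF-8 it is safe, because every byte of a multi-byte character is 128 or above. In Shift JIS the second byte of a two-byte character can fall in the a-z range (for example, ツ is 83 63 in hex), so there the loop can alter Japanese characters as well.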

kevin_g · October 23, 2024, 1:38pm

I must admit, I’ve not been able to repeat the problem on Windows 10 or macOS when using a string defined in code (not tried from a file).

Have you tried the v1 API Uppercase function to see if that operates differently?

Mike_Linacre1 · October 24, 2024, 1:16am

Thanks kevin. Same here. I cannot replicate the problem on my computers. The Japanese end-user is in a production environment so just wants the app to work (which it does with my kluge).

Asian Xojo developers: have you experienced any problems with string.uppercase?

kevin_g · October 24, 2024, 8:32am

When you read the file are you setting the encoding to UTF-8?

Mike_Linacre1 · October 27, 2024, 2:23am

Hi kevin g:

Great idea! But the Latin letters are in a string with local text (in this case Japanese), so I cannot assert a particular encoding. This app has end users with many different local scripts.

Yes, it would be great if the world standardized to UTF-8 (and also to one replacement 18 volt rechargeable battery for all battery operated tools and devices), but …

kevin_g · October 27, 2024, 10:24am

You can assert UTF-8 (we do and we support Chinese, Japanese, Korean & Thai amongst other languages).

If the text is being typed into a Xojo edit field then it is probably UTF-8 (since that is Xojo’s default encoding).

If the text is coming from somewhere else then it might be a different encoding such as Shift JIS, which Xojo might not be handling correctly.
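
A minimal sketch of stating the encoding when reading a file, assuming Xojo API 2. The file name and location are hypothetical, and whether a given end-user file is really UTF-8 or Shift JIS is something only the end user's environment can confirm.

```xojo
// Sketch only: the file name and its location are assumptions.
Var f As FolderItem = SpecialFolder.Documents.Child("sample.txt")

Var stream As TextInputStream = TextInputStream.Open(f)  // may raise an IOException
stream.Encoding = Encodings.UTF8     // state the encoding rather than leaving it undefined
Var s As String = stream.ReadAll
stream.Close

// If the file is known to be Shift JIS instead, read it as such and
// convert to UTF-8 before any further string handling:
//   stream.Encoding = Encodings.ShiftJIS
//   s = stream.ReadAll.ConvertEncoding(Encodings.UTF8)

Var upper As String = s.Uppercase
```

The point of the design is that the string carries a known encoding before Uppercase is called, so the case conversion is not working on bytes it cannot interpret.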

Mike_Linacre1 · October 27, 2024, 10:46am

Thanks, k.g.,

The text is coming from a file which is probably the local language + Latin letters + Arabic numbers. I am not specifying the encoding. If Xojo is assuming UTF-8 then surely uppercase should work correctly for the Latin words. Perhaps Xojo is guessing the encoding (from the Windows settings?), getting it correct for the local language, but wrong for the Latin letters.

TimStreater · October 27, 2024, 11:07am

I would have thought uppercase and encoding are two different issues. UTF-8 can perfectly well handle most local languages + Latin letters + Arabic numbers (especially since ASCII is included).

But if you can only specify one locale to .UpperCase, then I would expect only the characters for that locale to be uppercased.

kevin_g · October 27, 2024, 2:39pm

If you aren’t doing anything with the encoding then that could be the cause of the problem as Xojo might be guessing it incorrectly.

I would get your software to dump out the encoding of the string after reading the file and also dump out the value of each string byte.
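
A minimal sketch of such a dump, assuming Xojo API 2. The method name DumpString is hypothetical, and System.DebugLog is used only for brevity; in a production install the same lines could be written to a log file instead.

```xojo
// Hypothetical diagnostic: logs the encoding attached to the string
// and the hex value of every byte in it.
Sub DumpString(s As String)
  If s.Encoding Is Nil Then
    System.DebugLog("Encoding: Nil (undefined)")
  Else
    System.DebugLog("Encoding: " + s.Encoding.InternetName)
  End If

  Var mb As MemoryBlock = s             // raw bytes of the string
  Var parts() As String
  For i As Integer = 0 To mb.Size - 1
    parts.Add(Hex(mb.Byte(i)))          // one hex value per byte
  Next
  System.DebugLog("Bytes: " + String.FromArray(parts, " "))
End Sub
```

When reading the output, plain ASCII letters should appear as single bytes 61 to 7A (lowercase) or 41 to 5A (uppercase). If each letter is accompanied by a 00 byte, the file is probably UTF-16 rather than UTF-8 or Shift JIS.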

kevin_g · October 27, 2024, 2:41pm

That’s what I initially thought but when I ran some tests it worked correctly.

Tim_Hare · October 28, 2024, 7:24pm

Xojo does not attempt to guess the encoding. If you read it from a file and do not specify an encoding, the encoding will be Nil. That probably renders Uppercase useless.

Mike_Linacre1 · October 29, 2024, 2:11am

Hi Tim. I have realized through problems with .uppercase and .trimright that VB6 rules do not apply to Xojo. So now I write my own methods.