Locale rules: where are they?

Japanese end-users are having trouble with string.uppercase instructions in my code. “a…z” are not converting to “A…Z”. I suspect a Locale rule for Japanese is being applied to mixed Japanese + Western text. Am I right? Where can I find the Locale rules? (Yes. Read the documentation! But where?)

My solution so far is to write my own uppercase routine, which loops through the string as bytes and changes relevant ones.

What do you mean by locale rules? Xojo uses the ICU library for text.

Thanks, Beatrix.

Xojo documentation says:

" Uppercase(Optional locale As Locale = Nil) As Text

Returns a new Text value that has its characters uppercased. If the locale parameter is non-Nil, it will use that locale’s rules when performing the operation."

The ICU (Unicode) library controls the characters. Does it also control the uppercase-lowercase pairing?

If you follow the link to the “Locale” documentation, there’s a link to the Unicode organization, and there you will find the rules: ICU Demonstration - Locale Explorer


Sorry, I have used Xojo too long. Never seen that before.

You could try this:

If Locale.Current.Identifier.BeginsWith("ja") Then
  value = value.Uppercase(Locale.Raw)
Else
  value = value.Uppercase
End If

Thanks, Sascha. Yes, those are Japanese characters, and some indications of Western alphabets, such as “UTC” and “E”.

So do we infer that for languages without upper/lower case, string.uppercase will be inactive for any included Latin text?

Does anyone have experience of this situation?

Do you have some example text you can share?

Hi kevin:

I used a generic English sentence in a plain text file. string.uppercase works fine in my Xojo app on my English-US Windows Desktop computer. “c” changed to “C”.

An end-user in Japan installed the Xojo app in his Windows Desktop computer. Launched the app with my generic sentence. “c” stayed “c”.

I rewrote the app omitting string.uppercase and looped down the string instead, conceptually: asc(midb(…))-asc(“a”)+asc(“A”). Then “c” changed to “C”, both for me and in Japan.
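A minimal sketch of the kind of byte loop described above, for readers who want to see its shape. AsciiUppercase is a hypothetical name and this is not the actual routine from the app; it assumes API 2 (Var, DefineEncoding). It only touches bytes in the a–z range, which is safe for UTF-8 because those byte values never occur inside a multi-byte character. In Shift JIS, however, the second byte of a two-byte character can fall in that range, so a loop like this could corrupt Japanese text in that encoding.

```xojo
// Hypothetical helper, illustrative only: uppercases ASCII a-z byte by byte.
Function AsciiUppercase(s As String) As String
  Var enc As TextEncoding = s.Encoding   // may be Nil if never defined
  Var mb As MemoryBlock = s              // copy of the raw bytes

  For i As Integer = 0 To mb.Size - 1
    Var b As Integer = mb.Byte(i)
    If b >= &h61 And b <= &h7A Then      // byte is "a".."z"
      mb.Byte(i) = b - &h20              // shift to "A".."Z"
    End If
  Next

  Var result As String = mb              // back to a String, encoding undefined
  If enc <> Nil Then result = result.DefineEncoding(enc)
  Return result
End Function
```

Called as AsciiUppercase(value), it leaves every byte outside a–z untouched, which is why it behaves the same regardless of locale.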

I must admit, I’ve not been able to repeat the problem on Windows 10 or macOS when using a string defined in code (not tried from a file).

Have you tried the v1 API Uppercase function to see if that operates differently?


Thanks kevin. Same here. I cannot replicate the problem on my computers. The Japanese end-user is in a production environment so just wants the app to work (which it does with my kluge).

Asian Xojo developers: have you experienced any problems with string.uppercase?

When you read the file are you setting the encoding to UTF-8?

Hi kevin g:

Great idea! But the Latin letters are in a string with local text (in this case Japanese), so I cannot assert a particular encoding. This app has end users with many different local scripts.

Yes, it would be great if the world standardized to UTF-8 (and also to one replacement 18 volt rechargeable battery for all battery operated tools and devices), but …

You can assert UTF-8 (we do and we support Chinese, Japanese, Korean & Thai amongst other languages).

If the text is being typed into a Xojo edit field then it is probably UTF-8 (since that is Xojo’s default encoding).

If the text is coming from somewhere else then it might be in a different encoding, such as Shift JIS, which Xojo might not be handling correctly.

Thanks, k.g.,

The text is coming from a file which is probably the local language + Latin letters + Arabic numbers. I am not specifying the encoding. If Xojo is assuming UTF-8 then surely uppercase should work correctly for the Latin words. Perhaps Xojo is guessing the encoding (from the Windows settings?), getting it correct for the local language but wrong for the Latin letters.

I would have thought uppercase and encoding are two different issues. UTF-8 can perfectly well handle most local languages + Latin letters + Arabic numbers (especially since ASCII is included).

But if you can only specify one locale to .UpperCase, then I would expect only the characters belonging to that locale to be uppercased.

If you aren’t doing anything with the encoding then that could be the cause of the problem as Xojo might be guessing it incorrectly.

I would get your software to dump out the encoding of the string after reading the file and also dump out the value of each string byte.
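Something along these lines would do for the dump. It is only a sketch: DumpString is a made-up name, and it assumes the output goes to the system debug log, which the end-user would then have to send back.

```xojo
// Hypothetical diagnostic helper: logs the encoding and the raw bytes of a string.
Sub DumpString(s As String)
  If s.Encoding Is Nil Then
    System.DebugLog("Encoding: Nil (undefined)")
  Else
    System.DebugLog("Encoding: " + s.Encoding.InternetName)
  End If

  // EncodeHex with True inserts a space between each byte's hex value.
  System.DebugLog("Bytes: " + EncodeHex(s, True))
End Sub
```

If the Latin letters show up as single bytes in the 41–5A / 61–7A range, the problem is more likely the encoding definition than the case mapping itself.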

That’s what I initially thought but when I ran some tests it worked correctly.

Xojo does not attempt to guess the encoding. If you read it from a file and do not specify an encoding, the encoding will be Nil. That probably renders Uppercase useless.
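One way to avoid depending on whatever the default is would be to define the encoding explicitly after reading. This is a rough sketch, not a tested solution: ReadTextFile is a hypothetical name, and the Shift JIS fallback is purely an assumption for this one Japanese user. An app with users in many scripts would need a better way to choose the fallback.

```xojo
// Hypothetical helper: reads a whole file and returns it as a UTF-8 string.
Function ReadTextFile(f As FolderItem) As String
  Var stream As TextInputStream = TextInputStream.Open(f)
  Var raw As String = stream.ReadAll
  stream.Close

  // If the bytes form valid UTF-8, just label them as UTF-8.
  If Encodings.UTF8.IsValidData(raw) Then
    Return raw.DefineEncoding(Encodings.UTF8)
  End If

  // Assumption: anything else from this user is Shift JIS.
  // Label it as such, then convert so the rest of the app sees UTF-8.
  Return raw.DefineEncoding(Encodings.ShiftJIS).ConvertEncoding(Encodings.UTF8)
End Function
```

With the encoding defined, ReadTextFile(f).Uppercase has a known encoding to work from instead of an undefined one.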

Hi Tim. I have realized through problems with .uppercase and .trimright that VB6 rules do not apply to Xojo. So now I write my own methods.