Locale identifiers supported

According to the documentation, when instantiating a new Locale you can pass a string holding the localeIdentifier.

The documentation points to the following ISO Language Code table for looking up a localeIdentifier:

My WebSession LanguageCode is set to ‘en’, which is also listed in the ISO Language Code table above.

However, the following:

Var locale As New Locale("en")

raises an InvalidArgumentException with the Reason “en is not a valid locale”.

It seems only a subset of the table mentioned in the documentation is supported.

Do I need to create my own lookup table, or is there another method?

First of all, remember that the locale is based on the server, not the browser, because that’s where the binary runs.

In a web app, I seem to remember that we had to use the language and region codes, because that’s what the browsers do, like en-US.

IIRC, you can get the user’s code from Session.LanguageCode.

Note: the docs for that property say this:

You can either use raw IETF language codes or the constants from the Localization module.

The browser passes the language code in the Accept-Language header. The language setting in my browser is the following list:

English
German
English (United Kingdom)
English (United States)
Dutch

Resulting in the following Header:

accept-language
en,de;q=0.9,en-GB;q=0.8,en-US;q=0.7,nl;q=0.6

This results in a WebSession with Session.LanguageCode = ‘en’.

And that languagecode is not accepted by the Locale class constructor.

Or file an Issue.
Looks like the implementation is wrong.
Reading IETF RFC 5646, Tags for Identifying Languages, it says:

2.1. Syntax A language tag is composed from a sequence of one or more “subtags”, each of which refines or narrows the range of language identified by the overall tag.

Seems like Xojo is limited to <language>-<region>

Appendix A.
Examples of Language Tags (Informative)
Simple language subtag:
de (German)
fr (French)
ja (Japanese)

Note: this also fails in Desktop apps.

It should work with “en” and other languages that just report 2 letters.


Filed an issue:

https://tracker.xojo.com/xojoinc/xojo/-/issues/79725

The reason Xojo uses <language>-<region> has more to do with localized constants and what the desktop operating systems require than anything else. That was around long before the web framework or the Locale class.

“en” by itself can’t work. If you just look at the differences between en-US and en-GB you’ll understand why. For example…

  • Dates: United States uses m/d/yyyy, Great Britain uses d/m/yyyy
  • Currency: US uses $, GB uses £

So you must use the region when getting a Locale. This is not a bug.
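To see the difference described above for yourself, a quick check like the one below formats the same date with en-US and en-GB and prints each locale's currency symbol. It is a sketch that assumes the API 2.0 `DateTime.ToString(locale, dateStyle, timeStyle)` overload and the `Locale.CurrencySymbol` property behave as documented; the exact strings printed depend on the platform's locale data, so no specific output is claimed here.

```xojo
// Compare how two English locales format the same date and currency
Var us As New Locale("en-US")
Var gb As New Locale("en-GB")
Var d As New DateTime(2024, 3, 4) // 4 March 2024, chosen so day and month differ

// Short date style, no time part
System.DebugLog("en-US date: " + d.ToString(us, DateTime.FormatStyles.Short, DateTime.FormatStyles.None))
System.DebugLog("en-GB date: " + d.ToString(gb, DateTime.FormatStyles.Short, DateTime.FormatStyles.None))

// Currency symbols differ even though the language is the same
System.DebugLog("en-US currency: " + us.CurrencySymbol)
System.DebugLog("en-GB currency: " + gb.CurrencySymbol)
```

With only “en” there is no way to decide which of these two results you want, which is the point being made.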


We also spell things differently: colour vs. color, for example. I know that doesn’t change the locale, but it does show that ‘en’ isn’t qualified enough.


That’s a good point, though. You may find that some words in long-form dates are spelled differently too.

IIRC, we had to jump through some hoops to get the WebDatePicker control to localize for this very reason. Some languages could work because they have only one region, and others could not.


If your web app runs on a Linux server, you could run

locale -a

to print the list of installed locales.

If your locale doesn’t include the region, you are going to have to get it from somewhere.

Idea:

  1. For some languages, the region code will be the same as the language code but uppercase.
    e.g.:
    de-DE
    fr-FR

  2. Parse the list of installed locales and find one that starts with the language code and also includes a region code.
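The two ideas above could be combined into a helper along these lines. This is a minimal sketch, not a tested implementation: `LocaleForLanguage` is a hypothetical name, it expects a bare code such as "de", it assumes a Linux server where `locale -a` prints one name per line (e.g. "de_AT.utf8"), and it assumes `New Locale` raises InvalidArgumentException for identifiers it rejects, as reported at the top of the thread. Given the reports later in the thread that Xojo accepts combinations like ca-US, idea 1 may also "succeed" with a nonexistent region such as en-EN, so for a language like English you may prefer to try idea 2 first.

```xojo
// Hypothetical helper: turns a bare language code such as "de" into a Locale
// by guessing a region. Returns Nil if no guess is accepted.
Function LocaleForLanguage(code As String) As Locale
  Var lang As String = code.Lowercase

  // Idea 1: reuse the language code as the region ("de" -> "de-DE", "fr" -> "fr-FR")
  Try
    Return New Locale(lang + "-" + lang.Uppercase)
  Catch err1 As InvalidArgumentException
    // Rejected; fall through to idea 2
  End Try

  // Idea 2: scan the server's installed locales for one with this language and a region
  Var sh As New Shell
  sh.Execute("locale -a")
  For Each entry As String In Split(sh.Result, Chr(10))
    Var name As String = entry.Trim
    // Match "de_AT", "de_AT.utf8", ... (at least two characters after the underscore)
    If name.Left(lang.Length + 1) = lang + "_" And name.Length >= lang.Length + 3 Then
      // Take the two characters after the underscore as the region code
      Var region As String = name.Middle(lang.Length + 1, 2)
      Try
        Return New Locale(lang + "-" + region)
      Catch err2 As InvalidArgumentException
        // Not accepted by Xojo; keep looking
      End Try
    End If
  Next

  Return Nil
End Function
```

You would then call it with something like LocaleForLanguage(Session.LanguageCode) and fall back to Locale.Current (or a fixed default) when it returns Nil.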

I could not write a better answer.

en-US and en-GB are not alone; the Portugal and Brazil locales are different too. I am sure there are others.

I often use en-DE: English language, but German-formatted units.

The documentation links to at least 2 documents that mention language alone as a possible locale.

The IETF language tag spec mentions the use of one or more “subtags”, where ‘en’ is perfectly valid.

There are at least 2 languages that offer no region:

  • eo for Esperanto
  • ts for Tsonga

(I know the languages may not be used by anybody).

Xojo forces Catalan to use Spain as a region (the only region available) when they are trying to separate from Spain.

It also forces other languages with only 1 region to use that region, which makes no sense. For example:

  • id must use id-ID
  • is must use is-IS

The original RFC for language tags, from 1995, mentions the use of en, fr, de, etc.:
RFC 1766 - Tags for the Identification of Languages

At the very least, the bug is that the Xojo documentation links to official documents that list language alone (without a region) as valid, while Xojo can’t handle that, and there is no warning that it will not work.

After doing some tests with Catalan, which should only be available as ‘ca’ and ‘ca-ES’, I found that Xojo is happy to accept ‘ca_US’ and ‘ca-US’.

I guess that, as a workaround, if you get only 2 letters from the browser you can append ‘_US’ (or ‘-US’) to make it work.

Note: eo-US and ts-US are accepted by Xojo. I have not tested them with string localization, but I remember that didn’t work with Portuguese (Brazil) recently.

Edit: tested with Desktop, not sure if Web will be different.

If that’s the case, then it’s still not a Xojo bug. You would need to look at the library they’re using for locales and tell its developers about your problem. I believe it’s libicu.

Sorry that we keep going around in this ‘it’s not a Xojo bug’ cycle, when the docs here say:

Creates a Locale with the given localeIdentifier.

As an example, “en-US” is used as a localeIdentifier for English in the United States.

Links to look up codes:

A RuntimeException is raised when an invalid localeIdentifier is used.

I visit the first link and I see:

so it looks like en and eo can be used, because nothing on that page says a region is needed.

Besides that, if I use this code:

Var MyLocale as New Locale("en-MX")

what should I expect, given that ‘en-MX’ is not on the list?
Reading the documentation, following the links, and looking at the tables, I would expect a RuntimeException. No?

Edit: I know Xojo will not change this (it is as designed), but maybe they could at least add a note to the documentation?

There’s a simple solution to this in a web app. You could do something like this:

Dim languageCode As String = Session.LanguageCode
// The raw header, e.g. "en,de;q=0.9,en-GB;q=0.8,en-US;q=0.7,nl;q=0.6"
Dim languages As String = Session.Header("Accept-Language")
Dim rx As New RegEx
// First tag of the form language-REGION, e.g. "en-GB"
rx.SearchPattern = "([a-z]+-[A-Z]+)"
Dim rm As RegExMatch = rx.Search(languages)
If rm <> Nil Then
  languageCode = rm.SubExpressionString(1)
End If

What I don’t know is whether this will interfere with Xojo’s dynamic constants, so it might be better to expose it as a computed property on Session that you only look up once.

While I haven’t handled them here, the q values in that header are “quality” weights that express the user’s relative preference. Entries without a q value default to q=1, so in your case the browser is saying you prefer English (1.0) first, then German (0.9), en-GB (0.8), en-US (0.7) and finally nl (Dutch/Flemish, 0.6).
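If you do want to respect those weights instead of taking the first region-qualified tag, something like the following could replace the RegEx lookup. It is a sketch under stated assumptions: `PreferredRegionalTag` is a hypothetical name, it only considers tags that contain a hyphen (i.e. carry a region), and it treats an entry without a q parameter as q=1. For the header quoted earlier in the thread it should return "en-GB".

```xojo
// Hypothetical helper: picks the Accept-Language entry with the highest q value
// among entries that include a region (e.g. "en-GB"). Returns "" if there is none.
Function PreferredRegionalTag(header As String) As String
  Var bestTag As String
  Var bestQ As Double = -1

  For Each entry As String In Split(header, ",")
    If entry.Trim = "" Then Continue // skip empty entries, e.g. from an empty header

    Var parts() As String = Split(entry, ";")
    Var tag As String = parts(0).Trim
    Var q As Double = 1.0 // entries without a q parameter default to 1

    // Look for a "q=0.8"-style parameter after the tag
    For Each param As String In parts
      Var p As String = param.Trim
      If p.Left(2) = "q=" Then
        q = Val(p.Middle(2))
      End If
    Next

    // Only consider tags that carry a region; keep the first one with the highest q
    If tag.IndexOf("-") > 0 And q > bestQ Then
      bestTag = tag
      bestQ = q
    End If
  Next

  Return bestTag
End Function
```

Usage would be along the lines of calling PreferredRegionalTag(Session.Header("Accept-Language")) and only constructing a Locale when the result is not empty. Skipping bare codes like "en" is deliberate, since those are exactly the identifiers the Locale constructor rejects.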

Thanks for all the suggestions and insights. It is now clear why the two-letter language code will not work without a region code.

However, I do think the documentation should be clear about this: it should state directly that you need a region code along with the language code, and it should drop the incorrect information suggesting that you can use Session.LanguageCode directly to construct a new Locale.
