I’m getting a lot of text files from all over the world, in different languages and different encodings.
I detect the encoding from the BOM where one is present. When no BOM is found, I run the file through ‘Encodings.xyz.IsValidData(s)’ for all possible encodings and present the user with a list of the ones that pass, so they can pick the right one.
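For reference, my detection step currently looks roughly like this. It's a minimal sketch, assuming the classic framework's `Encodings.Count` / `Encodings.Item` loop and `TextEncoding.IsValidData`:

```xojo
' Sketch of my current detection: BOM sniff first, then try every
' encoding Xojo knows about and keep the ones that accept the bytes.
Function CandidateEncodings(raw As String) As TextEncoding()
  Dim result() As TextEncoding

  ' BOM check (UTF-8 / UTF-16 LE / UTF-16 BE)
  If raw.LeftB(3) = ChrB(&hEF) + ChrB(&hBB) + ChrB(&hBF) Then
    result.Append Encodings.UTF8
    Return result
  ElseIf raw.LeftB(2) = ChrB(&hFF) + ChrB(&hFE) Then
    result.Append Encodings.UTF16LE
    Return result
  ElseIf raw.LeftB(2) = ChrB(&hFE) + ChrB(&hFF) Then
    result.Append Encodings.UTF16BE
    Return result
  End If

  ' No BOM: collect every encoding that considers the bytes valid
  For i As Integer = 0 To Encodings.Count - 1
    Dim enc As TextEncoding = Encodings.Item(i)
    If enc.IsValidData(raw) Then result.Append enc
  Next

  Return result
End Function
```

With no BOM this usually returns a long list, which is exactly the problem the questions below are about.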
I have a few questions:
- If UTF-8 comes out of IsValidData (because no BOM was set), I present that as the default. If it’s not valid UTF-8, what are the most common other encodings I can present as the default? (I don’t know the language of the text file up front.)
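My current fallback order is just a hard-coded guess (UTF-8 first, then the common single-byte Western encodings); the choice of Windows-1252 / ISO Latin 1 here is my own assumption, not something Xojo recommends:

```xojo
' Example bytes: "café" in Windows-1252 (0xE9 is not valid UTF-8 on its own)
Dim raw As String = "caf" + ChrB(&hE9)

' My guessed default order when there's no BOM (my assumption,
' not an official recommendation).
Dim defaults() As TextEncoding
defaults.Append Encodings.UTF8
defaults.Append Encodings.WindowsANSI   ' Windows-1252
defaults.Append Encodings.ISOLatin1     ' ISO 8859-1

Dim picked As TextEncoding
For Each enc As TextEncoding In defaults
  If enc.IsValidData(raw) Then
    picked = enc   ' for these bytes, UTF-8 is rejected and WindowsANSI wins
    Exit
  End If
Next
```

I'd like to replace that guessed order with whatever the genuinely most common encodings are in practice.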
- The Xojo list of possible encodings is pretty large. Are there encodings that will almost certainly never show up, so I can exclude them? Maybe because they’re very old or only used for very specific purposes?
- Other tools show nicely formatted names. Is there an easy way to get names like ‘Western (ISO Latin 1)’, ‘Central European (Windows Latin 2)’ or ‘Western (Mac OS Roman)’? I could map them myself, but I’m not sure Latin 1 is always Western or that Latin 5 is always Turkish. It would be nice if I could get that info from Xojo somehow.
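Right now the best I’ve come up with is a hand-made lookup keyed on `TextEncoding.InternetName`. The pretty labels below are my own (copied from how other apps word them), not anything Xojo provides:

```xojo
Function DisplayName(enc As TextEncoding) As String
  ' Hand-rolled display names keyed on TextEncoding.InternetName.
  ' The labels are my own mapping (assumption), not supplied by Xojo.
  Static names As New Dictionary
  If names.Count = 0 Then
    names.Value("ISO-8859-1") = "Western (ISO Latin 1)"
    names.Value("windows-1250") = "Central European (Windows Latin 2)"
    names.Value("macintosh") = "Western (Mac OS Roman)"
    names.Value("ISO-8859-9") = "Turkish (ISO Latin 5)"
  End If
  ' Fall back to the raw internet name when there's no pretty label
  Return names.Lookup(enc.InternetName, enc.InternetName)
End Function
```

Maintaining this table by hand for every encoding is what I’d like to avoid.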
- Do I need to keep anything in mind for cross-platform use? Do these encodings come from the system, or is everything handled by Xojo itself?