Trouble reading files (incorrect encoding)

If your text is plain English, UTF-16 is really overkill, since you do not need Chinese or Korean characters in there. Your text string is probably fine. I would try a simple DefineEncoding when saving the file, and load without extra encoding intervention. DefineEncoding simply tells your app to treat as UTF-8 that particular string it lost its marbles about.

http://documentation.xojo.com/index.php/DefineEncoding

Dim t As TextOutputStream = TextOutputStream.Create(f)
t.Write(DefineEncoding(text, Encodings.UTF8))
t.Close

Then use the code in your OP to read the file. Chances are it will work just fine.

If it does not, UTF-16 should not be a problem speed-wise on the three platforms. All it does is inflate the size of the file (more bytes per code point). But I suppose we are not talking about War and Peace, so it probably does not matter.

Michel, you’re wrong on that one:

Dim t As TextOutputStream = TextOutputStream.Create(f)
t.Write(DefineEncoding(text, Encodings.UTF8)) // Wrong: use ConvertEncoding here
t.Close
When writing text to a file, a database, or a TCP socket, you use ConvertEncoding, not DefineEncoding.

DefineEncoding is used to tell Xojo what encoding the incoming data is in - for example, if you know a file contains UTF-16 text:

Dim t As TextInputStream = TextInputStream.Open(f)
Dim text As String = t.ReadAll.DefineEncoding(Encodings.UTF16).ConvertEncoding(Encodings.UTF8)
// now text is in UTF-8

Yep, applying DefineEncoding before saving serves no useful purpose: your definition may be right or wrong, but at this point it makes no difference either way. Use ConvertEncoding before saving and DefineEncoding after reading the text back in.
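Putting the two rules together, a minimal save/load round trip might look like this (a sketch, assuming `f` is a valid FolderItem and `text` is an ordinary String):

Dim outStream As TextOutputStream = TextOutputStream.Create(f)
// Writing: actually convert the bytes to UTF-8 before saving
outStream.Write(ConvertEncoding(text, Encodings.UTF8))
outStream.Close

Dim inStream As TextInputStream = TextInputStream.Open(f)
// Reading: the bytes come back encoding-less, so declare what they are
Dim result As String = inStream.ReadAll.DefineEncoding(Encodings.UTF8)
inStream.Close

The point of the split is that ConvertEncoding changes the bytes while DefineEncoding only sets the label, so converting belongs on the way out and defining on the way back in.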

By the way, both UTF-8 and UTF-16 can encode all Unicode characters, so there is no reason to prefer one over the other based on the characters required.
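For example, a string with characters well beyond ASCII survives conversion to either encoding unchanged as text; only the byte sizes differ (a sketch using Xojo's LenB, which counts bytes rather than characters):

Dim s As String = "héllo 漢字"
Dim asUTF8 As String = ConvertEncoding(s, Encodings.UTF8)
Dim asUTF16 As String = ConvertEncoding(s, Encodings.UTF16)

// The two representations still compare equal as text...
If asUTF8 = asUTF16 Then
  // ...but their byte counts differ: UTF-16 uses 2 bytes even for ASCII,
  // while UTF-8 uses 1 byte for ASCII and up to 3 for the CJK characters
  Dim bytes8 As Integer = LenB(asUTF8)
  Dim bytes16 As Integer = LenB(asUTF16)
End If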

[quote=122105:@Eli Ott]Michel, you’re wrong on that one:
[/quote]

[quote=122170:@Michael Hußmann]Yep, applying DefineEncoding before saving serves no useful purpose – your definition may be right or wrong but at this point it would make no difference either way. Use ConvertEncoding before saving and DefineEncoding after reading the text back in.
[/quote]

No issue about being wrong. But... why is it that Oliver's code posted in his OP specifically uses ConvertEncoding(text, Encodings.UTF8), and yet the resulting file ends up as UCS-2?

I have no idea, as it works for me. When I convert to UTF-8 before saving and define the text read back in as UTF-8, what I get is UTF-8, not UCS-2.

No question about that. I was just trying to understand what was going on on the OP's PC. Oliver mentioned that "the text file it gives reads the file very incorrectly". He then explained that when he opened the resulting file in Notepad++, that program reported it was encoded in UCS-2. I have to trust he reported correctly what was happening.

After checking, his text string was in UTF-16. He solved his problem by setting the encoding to UTF-16, so the OP issue is resolved now. I was simply wondering how applying ConvertEncoding to UTF-8 on a string that was in UTF-16 could have produced UCS-2.

[quote=122269:@Michel Bujardet]No question about that. I was just trying to understand what was going on on the OP's PC. Oliver mentioned that "the text file it gives reads the file very incorrectly". He then explained that when he opened the resulting file in Notepad++, that program reported it was encoded in UCS-2. I have to trust he reported correctly what was happening.

After checking, his text string was in UTF-16. He solved his problem by setting the encoding to UTF-16, so the OP issue is resolved now. I was simply wondering how applying ConvertEncoding to UTF-8 on a string that was in UTF-16 could have produced UCS-2.[/quote]
Please note that the ConvertEncoding was written in the wrong place, so I realised that I did not actually convert the encoding. I got the text from the CustomEditField; that might have something to do with it, but I doubt it.

As long as you solved your issue, it’s cool :slight_smile:

Thanks