TextEncoding fails with Russian text

My app loads RTF documents into a TextArea. Everything works properly until I try to open a file with Russian text. In that case, a line like “??? ???, ? ??? ???” suddenly looks like “ , ”.

After I load the file into the TextArea, its encoding is reported as “2”, which equals UTF8. (TextArea1.Value.Encoding.Format.ToString)

Strangely, if I force-save the RTF document as UTF8 in Apple TextEdit, the text suddenly appears correctly in my TextArea. It must be some mismatch of encodings? How can I find out what causes this? Thanks for any advice.

What encoding is that file set to?
(I am asking for the RTF source.)

Do you really read the text from the file as UTF8?

t = TextInputStream.Open(f)
t.Encoding = Encodings.UTF8
TextArea1.Value = t.ReadAll

Or do you read it without an encoding and then try to set the encoding in the TextArea?

Insert “óíèâåðñèòåò çàêîí÷èëà, ñ îòëè÷èåì òîæå” in your text and redo the process :frowning:

[quote=478098:@Sascha S]Do you really read the text from the file as UTF8?

t = TextInputStream.Open(f)
t.Encoding = Encodings.UTF8
TextArea1.Value = t.ReadAll

or do you read it without encoding and then try to set the encoding in the TextArea?[/quote]

I’ve tried both.

Maybe the text file is not UTF8 encoded but something like Windows-1251?
That would explain why it works after you have saved the file with a UTF8 encoding.
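
A rough, untested sketch of how that guess could be checked for a plain text file: read the bytes as Windows-1251 and convert the result to UTF8 before showing it. Here f is assumed to be a FolderItem for the file; an RTF file declares its own code page in its header, so this only illustrates the idea.

[code]
// Sketch only: f is assumed to be a FolderItem pointing at a plain text file.
Var t As TextInputStream = TextInputStream.Open(f)
t.Encoding = Encodings.WindowsCyrillic // interpret the bytes as Windows-1251
Var s As String = t.ReadAll
t.Close

// Convert to UTF8 for display; the characters stay the same, only the bytes change.
TextArea1.Value = s.ConvertEncoding(Encodings.UTF8)
[/code]

If the Cyrillic text shows up correctly this way, the file was not UTF8 to begin with.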

[quote=478108:@Sascha S]Maybe the textfile is not UTF8 encoded but something like Windows-1251?
That would explain why it works after you have saved the file with an UTF8 encoding.[/quote]
Why would Xojo then tell me it is UTF8? Is there a trial and error procedure, to find out if that is the case?

Many apps fail to detect the correct encoding, for various reasons I’ve read about somewhere in this forum. Give it a try :slight_smile:

No matter which encoding I set, TextArea1.Value.Encoding.Format.ToString always returns “2”:

t.Encoding = Encodings.UTF8
t.Encoding = Encodings.WindowsCyrillic
t.Encoding = Encodings.DOSRussian
t.Encoding = Encodings.MacCyrillic
t.Encoding = Encodings.ISOLatinCyrillic

Do you know what an RTF file is? How it is built? Does it have an encoding declared in its header?

Somehow I missed the info in the first post saying that it’s an RTF (Rich Text Format?, Rich Text?, Enriched Text?) file. :slight_smile:

Can’t test it right now:

t = TextInputStream.Open(f)
t.Encoding = Encodings.UTF8 // (Or any other fitting encoding)
TextArea1.StyledText.RTFData = t.ReadAll

Maybe it’ll work?

[quote=478126:@Sascha S]Can’t test it right now:

t = TextInputStream.Open(f)
t.Encoding = Encodings.UTF8 // (Or any other fitting encoding)
TextArea1.StyledText.RTFData = t.ReadAll

Maybe it’ll work?[/quote]
Thanks, Sascha. That is the code I am using.

@Emile: I checked the headers of the documents:
{\rtf1\adeflang1025\ansi\ansicpg1251\uc1\adeff31507\deff0\… // <— this one is not working
{\rtf1\ansi\ansicpg1252\cocoartf1561\cocoasubrtf600 // <— this one is working
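
For anyone who wants to do the same check in code instead of a text editor, here is a minimal, untested sketch. RTFCodePage is a made-up helper name (not part of Xojo), and it assumes the \ansicpg keyword sits within the first 512 bytes of the file.

[code]
// Hypothetical helper: returns the code page declared via \ansicpg
// in an RTF header, or 0 if no such keyword is found.
Function RTFCodePage(f As FolderItem) As Integer
  Var t As TextInputStream = TextInputStream.Open(f)
  t.Encoding = Encodings.ASCII // the RTF header itself is plain ASCII
  Var header As String = t.Read(512)
  t.Close

  Var re As New RegEx
  re.SearchPattern = "\\ansicpg(\d+)" // literal backslash, keyword, then digits
  Var m As RegExMatch = re.Search(header)
  If m Is Nil Then Return 0

  Return m.SubExpressionString(1).ToInteger
End Function
[/code]

For the two headers above it should return 1251 (Windows-1251, Encodings.WindowsCyrillic in Xojo) for the first file and 1252 for the second.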

Sascha was right! Your file declares Windows-1251 as its encoding (\ansicpg1251).

Did you try to set the TextArea text to the UTF-8 encoding?
Something like:

TextArea1.Value = TextArea1.Value.DefineEncoding(Encodings.UTF8) // <-- untested, written from memory

Maybe Xojo really is misinterpreting the RTF source. I’d report it via Feedback, and in the meantime maybe use an external shell tool to convert the RTF files to plain text before loading them in your app?

On macOS you could run textutil -convert txt test.rtf. But I do not know if this still works on Catalina, for example… :slight_smile:
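
Calling it from Xojo could look roughly like this. It is an untested sketch: f is assumed to be a FolderItem for the RTF file, and the -stdout and -encoding options are taken from the textutil man page, so please verify them on your macOS version.

[code]
// Sketch only (macOS): let textutil turn the RTF into UTF-8 plain text
// and print it to standard output, then show that text in the TextArea.
Var sh As New Shell
sh.Execute("textutil -convert txt -stdout -encoding UTF-8 " + f.ShellPath)

If sh.ExitCode = 0 Then
  // Shell output does not reliably carry an encoding, so define it explicitly.
  TextArea1.Value = sh.Result.DefineEncoding(Encodings.UTF8)
Else
  System.DebugLog("textutil failed: " + sh.Result)
End If
[/code]

Note that this throws away all styling; it only rescues the text.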

Thanks Sascha. It is really discouraging though.

Give the forum a bit more time to help you solve it. The devs are mostly (if not all) located in the US and will wake up and join the forum soon. I am sure there are far more skilled people here who may be able to help you even better than Emile and I can. :slight_smile:

If you share an RTF sample, someone can test with that file and may come up with a solution.

The RTF below is displayed correctly using the old declare / Xojo 2015r1.

I built it using TextEdit (opening the RTF as text) and pasting in the data Dodo shared.

[code]{\rtf1\adeflang1025\ansi\ansicpg1251\uc1\adeff31507\deff0
{\fonttbl\f0\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue0;}
\paperw11900\paperh16840\margl1440\margr1440\vieww15840\viewh19200\viewkind0
\deftab720
\pard\pardeftab720\sl280\partightenfactor0

\f0\fs24 \cf2 \expnd0\expndtw0\kerning0
\outl0\strokewidth0 \strokec2 \uc0\u1091 \u1085 \u1080 \u1074 \u1077 \u1088 \u1089 \u1080 \u1090 \u1077 \u1090 \u1079 \u1072 \u1082 \u1086 \u1085 \u1095 \u1080 \u1083 \u1072 , \u1089 \u1086 \u1090 \u1083 \u1080 \u1095 \u1080 \u1077 \u1084 \u1090 \u1086 \u1078 \u1077 }[/code]

And the traditional approach (see the code below) works fine with the RTF above:

[code]Dim TIS As TextInputStream

// Open the RTF file and hand its raw contents to the StyledText RTF parser
TIS = TextInputStream.Open(GetFolderItem("Data in Russian.rtf"))

TA_Notes.StyledText.RTFData = TIS.ReadAll
TIS.Close[/code]

Now I am asking myself what the problem was.