textEncoding fails with Russian text

Paul_Lefebvre · March 6, 2020, 8:30pm

[quote=478127:@Dodo Hunziker]{\rtf1\adeflang1025\ansi\ansicpg1251\uc1\adeff31507\deff0\… // <— this one is not working
[/quote]
cp1251 is Encodings.WindowsCyrillic.

I would expect this to work for a file with that encoding:

t = TextInputStream.Open(f)
t.Encoding = Encodings.WindowsCyrillic
TextArea1.StyledText.RTFData = t.ReadAll

I think you should check TextArea1.Value.Encoding.InternetName, not Format.

Dodo_Hunziker · March 7, 2020, 5:58am

This is a sample of a failing document:

{\rtf1\ansi{\fonttbl{\f0\fnil Arial;}}
{\colortbl\red0\green0\blue0;}
\paperw11906\paperh16838\margl1417\margr1417\margt1417\margb1134
\jexpand
{\f0 \fs24 \ul0 \b0 \i0 \cf0 {\f0 \fs24 \ul0 \b0 \i0 \cf0 #20:52:49-5# \u1048\'3f\u1074\'3f\u1072\'3f\u1085\'3f: \u1085\'3f\u1091\'3f, \u1074\'3f\u1089\'3f\u1077\'3f? \u1053\'3f\u1091\'3f, \u1079\'3f\u1072\'3f\u1083\'3f\u1072\'3f\u1079\'3f\u1100\'3f. \u1047\'3f\u1072\'3f\u1083\'3f\u1072\'3f\u1079\'3f\u1100\'3f. \u1058\'3f\u1099\'3f \u1074\'3f\u1086\'3f\u1090\'3f \u1090\'3f\u1072\'3f\u1082\'3f \u1087\'3f\u1086\'3f\u1077\'3f\u1076\'3f\u1077\'3f\u1096\'3f\u1100\'3f, \u1074\'3f \u1089\'3f\u1072\'3f\u1087\'3f\u1086\'3f\u1075\'3f\u1072\'3f\u1093\'3f? }\par}
{\f0 \fs24 \ul0 \b0 \i0 \cf0 \par}
}

@Paul Lefebvre: t.Encoding = Encodings.WindowsCyrillic doesn’t work either. And thanks for the hint about .InternetName!

Emile_Schwarz · March 7, 2020, 11:08am

According to the internet, ansi is ISO-8859-1.

Does anyone know which Encodings.<Name> that corresponds to?

Read:
https://www.oreilly.com/library/view/rtf-pocket-guide/9781449302047/ch04.html

Probably a shortcoming in Xojo.

The RTF Language Reference talks about the ANSI encoding:

rtf15_spec

kevin_g · March 7, 2020, 11:58am

\u means Unicode, so the charset / codepage commands are not relevant for those characters.
Maybe the Xojo RTF parser doesn't understand \u.

Emile_Schwarz · March 7, 2020, 12:54pm

In my generated RTF file, \u is used and the file is displayed correctly.

So that's not it.

kevin_g · March 9, 2020, 11:49am

I suggest you read the RTF file using a BinaryStream as you want to pass the data to the RTF parser exactly as it is in the file.
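
A minimal sketch of that suggestion, assuming f is a FolderItem pointing at the RTF file and TextArea1 is the target control (both names taken from the snippet earlier in the thread). It has not been tested against the failing document:

```xojo
// f is assumed to be a valid FolderItem for the RTF file.
Var bs As BinaryStream = BinaryStream.Open(f, False) // False = open read-only
Var rtf As String = bs.Read(bs.Length)               // raw bytes, no encoding applied
bs.Close

// Hand the bytes to the RTF parser exactly as they are in the file.
TextArea1.StyledText.RTFData = rtf
```

Because no encoding is passed to Read, the string is not converted on the way in, so the parser sees the same bytes that are on disk.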

However, I think I see another problem (Xojo bug).

If I assign Dodo’s RTF to a text field this is what I get:

[quote]#20:52:49-5# И?в?а?н?: н?у?, в?с?е?? Н?у?, з?а?л?а?з?ь?. З?а?л?а?з?ь?. Т?ы? в?о?т? т?а?к? п?о?е?д?е?ш?ь?, в? с?а?п?о?г?а?х??
[/quote]

I assume this is the correct output:

[quote]#20:52:49-5# Иван: ну, все? Ну, залазь. Залазь. Ты вот так поедешь, в сапогах?
[/quote]

It looks like Xojo is not processing \u correctly.

\u was designed to allow a number of ANSI fallback characters to follow the command, so that RTF readers without Unicode support could still display something. Unicode-compliant readers skip those characters.

The number of characters to skip is controlled by the \uc command. If \uc isn’t present, then 1 should be assumed.

This means that all of the \'3f characters following the \u commands should be skipped, but they aren’t.

@Emile Schwarz - Your example works because there are no characters to skip (\uc0).

So once you have solved the problem of reading the data from the file, you might have to solve this one next.
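
One possible workaround for the \u problem, sketched under several assumptions: the fallback characters always appear in the \'hh form (as in Dodo's sample), the file starts with {\rtf1, and Xojo handles \uc0 files correctly (as Emile's example suggests). StripUnicodeFallbacks is a hypothetical helper name, not something Xojo provides, and the sketch is untested against Xojo's parser:

```xojo
// Hypothetical helper: removes the \'hh fallback after each \uN and declares
// \uc0 so that no reader expects fallback characters any more.
Function StripUnicodeFallbacks(rtf As String) As String
  Var re As New RegEx
  // \uN (optionally negative) immediately followed by one hex-escaped fallback.
  re.SearchPattern = "(\\u-?\d+)\\'[0-9a-fA-F]{2}"
  // Keep \uN and add a space as the control-word delimiter, so that a
  // following literal space or digit is not swallowed by the control word.
  re.ReplacementPattern = "$1 "
  re.Options.ReplaceAllMatches = True

  Var cleaned As String = re.Replace(rtf)

  // Assumption: the header is exactly "{\rtf1"; only the first match is changed.
  Return cleaned.Replace("{\rtf1", "{\rtf1\uc0")
End Function
```

It could then be used as TextArea1.StyledText.RTFData = StripUnicodeFallbacks(rtf), where rtf holds the unmodified file contents. Files that use a plain character such as ? as the fallback would need a different pattern.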