Writing French accented text using "TextOutputStream" creates odd entry

Hi Folks,

Given a text string like:

Côte D’azur

I end up with a literal “?” in the text:

C?te D’azur

The text is correct in the debugger and I use the following code to create the entry:

    curPath = ConvertEncoding(lbBackupSelected.List(counter), Encodings.SystemDefault)
    // curPath is correct here in the debugger as 'Côte D'azur'
    tof.Write curPath + EndOfLine.UNIX

Is there a setting for the TextOutputStream / FolderItem that I’m missing?

BTW - I’ve also tried using a BinaryStream instead of a TextOutputStream with the same result.

I tried the second example for TextOutputStream with all sorts of accents, and all of them show up in the resulting file.

How are you reading the file?

Just been through this.
The text is fine in the program and in the debugger.
Put it in a file, then read it back, and you have a problem.

Reading back is when you need to state the encoding, as Michel is suggesting.

Read it back, and use DefineEncoding at that point to tell Xojo that it is UTF8 data.

See
https://forum.xojo.com/conversation/edit/26395

I’m using a TextInputStream or BinaryStream (tried both). However, reading doesn’t matter in this case since the file on disk contains the literal “?” character instead of the Á or è, and I can see that in an external editor.

However, if instead of using the write command, I use a shell and echo the text, the encoding is retained and the contents are written properly. And then can be read properly.

[quote=218244:@Tim Jones]I’m using a TextInputStream or BinaryStream (tried both). However, reading doesn’t matter in this case since the file on disk contains the literal “?” character instead of the Á or è, and I can see that in an external editor.

However, if instead of using the write command, I use a shell and echo the text, the encoding is retained and the contents are written properly. And then can be read properly.[/quote]

Tim, I repeat. Here is what I used, and it writes the characters perfectly fine into the file, which I am then able to open in the standard system tools and common apps:

    Dim t As TextOutputStream
    Dim f As FolderItem
    f = GetSaveFolderItem("", "CreateExample.txt")
    If f <> Nil Then
        t = TextOutputStream.Create(f)
        t.WriteLine("Ceci est un test : éèçàùâêîôûäëïöü")
        t.Close
    End If

That is all there is to my test, and it works on both Mac and Windows. I wonder what you are doing that is writing question marks into your file. Are you doing some text encoding conversion on the string?

The difference is that I’m using “Write” not “WriteLine” because I need to force the Unix EOL.

@Jeff Tullin - are you also using WriteLine()?

What is Encodings.SystemDefault? Does that support the characters you’re using? I suspect your ConvertEncoding is mangling the string.

Write vs. WriteLine is a red herring.

So - What are the byte values of the string before ConvertEncoding? What are they afterward? And what should the encoding be (what is SystemDefault set to)?

In the end, bytes is bytes. ConvertEncoding changes bytes. DefineEncoding does not change bytes. Write/WriteLine does not change bytes either.
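
A minimal sketch of that byte check, assuming the classic framework functions EncodeHex, ConvertEncoding and DefineEncoding. The variable names and the sample string are made up for illustration; in the original code the string would come from the listbox.

```xojo
// Hypothetical sample string standing in for lbBackupSelected.List(counter).
Dim original As String = "Côte D'azur"

// Dump the bytes before any conversion.
System.DebugLog("before: " + EncodeHex(original, True))

// ConvertEncoding re-encodes the text, so the bytes can change.
Dim converted As String = ConvertEncoding(original, Encodings.SystemDefault)
System.DebugLog("after ConvertEncoding: " + EncodeHex(converted, True))

// DefineEncoding only relabels the string; the bytes stay the same.
Dim relabeled As String = DefineEncoding(original, Encodings.SystemDefault)
System.DebugLog("after DefineEncoding: " + EncodeHex(relabeled, True))
```

If a 3F byte (character 63, the question mark) appears in one of the dumps where the accented character should be, that step is where the character was lost.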

And why are you using Encodings.SystemDefault on the one hand, and EndOfLine.Unix on the other? What’s the goal?

Try with “ISOLatin1”:

    curPath = ConvertEncoding(lbBackupSelected.List(counter), Encodings.ISOLatin1)
    // curPath is correct here in the debugger as 'Côte D'azur'
    tof.Write curPath + EndOfLine.UNIX

I use something similar to write to SQL, and when I read the data back I use “Encodings.UTF8”. That works fine for me when writing other languages.

Well, it’s something in the writing since the resulting text file contains a physical, literal question mark character, not the multi-byte character that is expected. As I mentioned, I see this completely outside of Xojo in a hex editor and in ‘od’.

As for the SystemDefault, it apparently does support the French accented characters, since the ConvertEncoding call generates a proper string for all other purposes in the code, including when the resulting string is passed into a shell and written using:

    theShell.Execute "echo """ + curLine + """ >> """ + f.UnixPathMBS + """"

The result on disk then contains the properly formed string.

[quote=218249:@Tim Jones]The difference is that I’m using “Write” not “WriteLine” because I need to force the Unix EOL.

@Jeff Tullin - are you also using WriteLine()?[/quote]

I tried Write as well. No problem either.

Humor me. Take the example from the LR and try it without any ConvertEncoding before the Write.

I work in UTF8.
I save using Write or WriteLine.
And I use DefineEncoding(text, Encodings.UTF8) when I get the text back.
I don’t convert encodings on the way out… all text in the program is UTF8.

[quote=218277:@Jeff Tullin]I work in UTF8.
I save using Write or WriteLine.
And I use DefineEncoding(text, Encodings.UTF8) when I get the text back.
I don’t convert encodings on the way out… all text in the program is UTF8.[/quote]

That is the way it should be done, IMHO.
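
For reference, a rough sketch of that workflow, assuming the classic TextOutputStream and TextInputStream classes. The file name and Desktop location are hypothetical, and error handling is left out.

```xojo
// Hypothetical file name and location, for illustration only.
Dim f As FolderItem = SpecialFolder.Desktop.Child("EncodingTest.txt")

// Write the string as-is (no ConvertEncoding), using Write to force a Unix EOL.
Dim tos As TextOutputStream = TextOutputStream.Create(f)
tos.Write("Côte D'azur" + EndOfLine.UNIX)
tos.Close

// Read the bytes back, then label them as UTF-8 without changing them.
Dim tis As TextInputStream = TextInputStream.Open(f)
Dim contents As String = DefineEncoding(tis.ReadAll, Encodings.UTF8)
tis.Close

System.DebugLog(contents)
```

The point of the sketch is that nothing converts the text on the way out; the encoding is only stated on the way back in.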

Just for fun, what’s the Encodings.SystemDefault name on whatever system you’re using? For example:

    MsgBox Encodings.SystemDefault.InternetName

(or System.DebugLog it)

And are you viewing the string in that encoding or UTF-8 in the debugger?

@Norman Palardy - it returns simply “macintosh”. This is on OS X 10.10.5.

As mentioned before, it doesn’t matter what I set the encoding to on the read end since there is a literal “?” - character 63 - written into the file. Chr(63) is a “?” regardless of the encoding :S

I even went so far as to remove the ConvertEncoding call. Same results.

I think I’ll rip the code out, save the project, exit, reload and then enter those few lines again to see what happens. This is a project that’s come forward from back in the RS 2006 time frame.

It was something hiding in the old RS code layout for that module. I deleted that method, saved, exited, reloaded, and recreated that same exact code, leaving out the ConvertEncoding since it’s really not needed, and it’s now working properly.

Very odd