BUG on TextOutputStream when using Encodings and its workaround

Ivan_Tellez · April 21, 2021, 2:39am

The Language Reference states: If you need to write the file using a particular encoding, use the ConvertEncoding function to first convert the encoding of the text to the desired encoding before passing the text to the Write or WriteLine methods.

BUT if you need this documented functionality to save a file with a “particular encoding”, let’s say WindowsANSI, copy and paste the example to test it, changing only the desired encoding:

Var documents As FolderItem = SpecialFolder.Documents
If documents <> Nil Then
  Var file As FolderItem = Documents.Child("Sample.txt")
  If file <> Nil Then
    Try
      // TextOutputStream.Create raises an IOException if it can't open the file for some reason.
      Var output As TextOutputStream = TextOutputStream.Create(file)
      output.Write(ConvertEncoding(TextField1.Value, Encodings.WindowsANSI)) '<- The particular encoding (only change)
      output.Close
    Catch e As IOException
      // handle
    End Try
  End If
End If

Your file is created and, SURPRISE, it is not a WindowsANSI file; it is a UTF-8 file. WTF?

The Write method of the TextOutputStream converts the text to UTF-8, IGNORING the encoding of the string passed to it.

If you want to save a file with a particular encoding that is not UTF-8, you have to use a workaround. Fortunately, I found one on the forum in a post from 2016: How to create a file of ANSI type - #2 by Jeff_Tullin

Use a BinaryStream to write a text file

After a little more reading, it turns out that Xojo unintentionally kind of “fixed” this bug (more likely, added another workaround) by adding the new Encoding property to TextOutputStream in 2019r2+. Now Xojo can just say it is a documentation bug and change the examples in TextOutputStream.

Just the usual mess: ancient bugs, feedback cases ignored, incorrect documentation, half baked features, silent behavior changes…

Workarounds:
If you need a file with a particular encoding, use a BinaryStream to save the text.

If you have 2019r2+, the new Encoding property may work fine.
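
A minimal sketch of the BinaryStream workaround, assuming the same TextField1 and Sample.txt as in the example above. The idea is that BinaryStream.Write writes the bytes of the string as they are, so text already converted with ConvertEncoding is not re-encoded on the way to disk. The variable names ansiText and stream are only illustrative.

```xojo
Var documents As FolderItem = SpecialFolder.Documents
If documents <> Nil Then
  Var file As FolderItem = documents.Child("Sample.txt")
  If file <> Nil Then
    Try
      // Convert the text to the target encoding first.
      Var ansiText As String = ConvertEncoding(TextField1.Value, Encodings.WindowsANSI)
      // Passing True as the second argument overwrites an existing file.
      Var stream As BinaryStream = BinaryStream.Create(file, True)
      // BinaryStream writes the string's bytes unchanged.
      stream.Write(ansiText)
      stream.Close
    Catch e As IOException
      // handle
    End Try
  End If
End If
```

Unlike TextOutputStream, this does not add line endings or do any conversion for you, so every string written has to be converted to the target encoding beforehand.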

Roger_van_Leeuwen · April 21, 2021, 4:35am

Hi Ivan,

There is nothing wrong with the code.
Try this:

Var documents As FolderItem = SpecialFolder.Documents
If documents <> Nil Then
  Var file As FolderItem = Documents.Child("Sample.txt")
  If file <> Nil Then
    Try
      // TextOutputStream.Create raises an IOException if it can't open the file for some reason.
      Var output As TextOutputStream = TextOutputStream.Create(file)
      output.Write(ConvertEncoding(TextField1.Value, Encodings.WindowsHebrew)) '<- The particular encoding (only change)
      output.Close
    Catch e As IOException
      // handle
    End Try
  End If
End If

Now copy and paste this word, please: שָׁלוֹם
It means Shalom in Hebrew.
After doing this, check the text file; it is written in Hebrew.
Now leave your code as it is and enter your name.
Your name is not converted to Hebrew for you.

Ivan_Tellez · April 21, 2021, 4:48am

I’m not sure if you understand what the encoding of a FILE is. Using your code, the file is saved as UTF-8, just as I said in the post.

MarkusR · April 21, 2021, 6:52am

There is one line missing so that the stream knows the encoding.

Var documents As FolderItem = SpecialFolder.Desktop

Var t As String = "HelloÄÖÜß"

Var file As FolderItem = documents.Child("Test.txt")

Var output As TextOutputStream = TextOutputStream.Create(file)
output.Encoding = Encodings.WindowsANSI

output.Write(ConvertEncoding(t, Encodings.WindowsANSI))

output.Close

I filed a feedback case: 64515.
The documentation is wrong because it says the property is read-only.
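
To confirm which encoding actually ended up on disk, the raw bytes of the file can be inspected. This is only a sketch, assuming the Test.txt written by the code above; the variable names stream and raw are illustrative.

```xojo
Var file As FolderItem = SpecialFolder.Desktop.Child("Test.txt")
If file <> Nil And file.Exists Then
  Try
    // Open read-only and read the whole file as raw bytes.
    Var stream As BinaryStream = BinaryStream.Open(file, False)
    Var raw As String = stream.Read(stream.Length)
    stream.Close
    // Log the bytes as hex, separated by spaces, for inspection.
    System.DebugLog(EncodeHex(raw, True))
  Catch e As IOException
    // handle
  End Try
End If
```

In a WindowsANSI file the Ä of "HelloÄÖÜß" is the single byte C4, while in a UTF-8 file it is the two bytes C3 84, so the two cases are easy to tell apart in the hex dump.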

Emile_Schwarz · April 21, 2021, 9:00am

The docs say “Read Only” for a write function. ;- or

DerkJ · April 21, 2021, 12:43pm

The ts is correct, the docs say:

If you need to write the file using a particular encoding, use the ConvertEncoding function to first convert the encoding of the text to the desired encoding before passing the text to the Write or WriteLine methods.

Ivan_Tellez · April 21, 2021, 1:22pm

My bad, I was working on a project with 2019r2 and I assumed that I had screwed up when writing my code by not caring about the encodings. But no, I tested a prior version of my software and it was behaving correctly…

This is a mistake by Xojo. This is WORSE than I thought:

It turns out that the documentation was RIGHT for Xojo versions prior to 2019r2. In my code I only defined the encoding of the text and the file was created with the correct encoding; my software was writing the files with the correct ANSI encoding…

BUT Xojo made a silent BEHAVIOR CHANGE and broke all the previously working programs.

When I upgraded to 2019r2, with no code changes, my software suddenly started making UTF-8 files instead of ANSI files.

Ivan_Tellez · April 21, 2021, 1:24pm

By the way, there is a FC: <https://xojo.com/issue/59351>

Roger_van_Leeuwen · April 21, 2021, 5:47pm

Hi Ivan,
The code you showed at the top of this thread really works.
I have a screenshot for you that shows the content of 2 files with different encodings.
It writes what you see. Enter this: Γεια. It means hello in Greek.
Instead of WindowsGreek you can also use UTF-8.
The file will be saved as UTF-8, because UTF-8 will become the standard.

MarkusR · April 21, 2021, 5:58pm

It seems the manual has been fixed…

Why the stream can’t just write the data it is given, I don’t know.

Ivan_Tellez · April 21, 2021, 6:42pm

So, if your client has a system that accepts ANSI FILES, are you going to tell them that they have to change all their internal software and processes because UTF-8 will become the standard?

What xojo version are you testing in?

Ivan_Tellez · April 21, 2021, 6:48pm

HALF fixed

Sure, now it kind of says that you have to use the Encoding property, but it still says that you need to use ConvertEncoding, and it doesn’t say that 2019r2+ makes a behavior change that can break user code.

MarkusR · April 21, 2021, 6:57pm

You could/should discuss this in the online meetings with the Xojo team.

Emile_Schwarz · April 22, 2021, 1:20am

A bit OT, but not so much:

I noticed the presence of:

TextOutputStream.Open

First idea: it’s plain wrong.

Then: Oh ! I understand: read TextOutputStream.Append (maybe).

Michael_Hußmann · April 22, 2021, 1:58pm

The problem appears to be that the behaviour in cases where the Encoding property is not set isn’t documented and the chosen default value (UTF-8) was an unfortunate choice.

Setting the encoding with TextOutputStream.Encoding is an improvement as you can throw anything at a TextOutputStream and it will get converted to the desired encoding automagically. But if you have previously played by the rules and converted the encoding yourself your code will break unless the encoding is UTF-8. Had they chosen to make nil the default and implemented TextOutputStream.Write so that its argument would be written as is when the Encoding property was nil, this issue wouldn’t exist. Old code wouldn’t break and new code could be more elegant by using the Encoding property rather than sprinkling your code with calls to ConvertEncoding.

But now the main issue is that if you follow the documentation you will write code that doesn’t do what it’s supposed to do, unless you stick to UTF-8 for everything.

Emile_Schwarz · April 22, 2021, 2:52pm

the chosen default value (UTF-8) was an unfortunate choice

Do you love imperial units?
(inch, foot, mile, pound, pence, etc.)

No, no. Forcing may not be a good idea, but not doing so will lead to people still talking in Reich Mark (old Francs from before 1959), etc. Ask my sister about bread prices… she will give you the price in Francs from 1999 (the year Euros were introduced to the general population).

Who, in 2021, still uses an 8-bit computer for desktop work?

Some people even talk about “high ASCII values”, although the ASCII range is 0-127, when they mean character values in the 128 to 255 range (and that set is different on macOS vs Windows).

Back to the core of the op question.

This is what version numbers were created for. At some point in time, the new software switches to UTF-8. At load time, before reading, it checks the file’s creation date: if that date is before the “UTF8-date”, it reads the file as it was done in the “old times”; otherwise, it reads it as UTF-8.

At save time, the encoding is always UTF-8.

Of course, we do not always do that (me included).

Reference:

This is not a new technology.

Ivan_Tellez · April 22, 2021, 3:16pm

lol, there are currently a BILLION and a half computers using Windows, and most of them DON’T USE UTF-8. Neither does the software on them.

These files are tab-separated text files, intended to be opened with Excel. Excel can’t directly open the UTF-8 encoded ones.

Michael_Hußmann · April 22, 2021, 3:21pm

That’s not the point at all. I am loving UTF-8 as much as the next guy and I’m all in favour of using it. But making UTF-8 the default when the Encoding property isn’t set in code did break a lot of existing code. Code that painstakingly stuck to the documentation, mind you. Which is bad in my book.

And of course there are situations where a specific encoding is required and we don’t really have a choice.

Roger_van_Leeuwen · April 22, 2021, 3:45pm

You are more than someone who writes code for your client.
Never tell your client anything; you advise your client. That is part of the job.
This website also uses UTF-8, for example.
But back to your problem: have you got a solution yet?

Ivan_Tellez · April 22, 2021, 4:00pm

If you “advise” your client that they have to take extra steps to convert the file outside the app just so it can be opened in standard software like Excel, they will probably find other software that can make a file that doesn’t need extra steps.

I know, I use UTF-8 for everything web related, I use UTF-8 for all my databases, I use UTF-8 internally for all the text in the software, but if your client needs an ANSI file, you have to deliver an ANSI file.

Well, the “solution” is to set the Encoding property before writing the file.

Xojo should send a mail to all users acknowledging that every project moved to 2019r2 or later has to update its code to use the new behavior to prevent wrongly encoded files.