Application fails to read its own file (encoding problem)

Hello All,

I have a strange encoding problem that I have failed to solve. Can you help me with the following code:

[code]// This method fills file "Employees.exo" with standard values.

Using Xojo.Core
Using Xojo.IO

Dim tosEmployees As Xojo.IO.TextOutputStream
Dim intListIndex As Integer
Dim textList As Text
Dim intListCount As Integer
Dim i As Integer
Dim textRecord As Text

// Filling textList with standard values
textList = "1;Person A name;Person A Firstname;???;6;2#" + _
  "2;Person B name;Person B Firstname;CGA;4;3#" + _
  "10;Person C name;Person C Firstname;???;4;3#"

// Writing the list to the actual file
// if the file is empty
intListCount = CountFields(textList, "#")
If f_List_Employees.Length = 0 Then
  tosEmployees = TextOutputStream.Append(f_List_Employees, TextEncoding.UTF16)
  For i = 1 To intListCount
    tosEmployees.WriteLine NthField(textList, "#", i).ToText
  Next i
  tosEmployees.Close
End If

// This is just a test section
// Fails to read the file back in;
// Some strange characters are inserted.
Dim tisTemp As Xojo.IO.TextInputStream
tisTemp = TextInputStream.Open(f_List_Employees, TextEncoding.UTF16)
For i = 1 To intListCount
  textRecord = tisTemp.ReadLine
  MsgBox textRecord
Next i

tisTemp.Close[/code]

The MsgBox command shows the text with some strange diamond characters. The text file is created as UTF16.

When I open the file in WordPad, a space is added after the single-digit numbers at the start of each line, but numbers greater than 9 appear as:
1 0 for 10
1 1 for 11

and so on.
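
The spacing described above is consistent with UTF-16 bytes being shown by a viewer that assumes one byte per character. Here is a minimal sketch in Python (not Xojo, used only to make the bytes visible). It assumes little-endian UTF-16 and uses Latin-1 as a stand-in for whatever single-byte code page the viewer picks; the sample string comes from the code above.

```python
# Encode the start of one record as little-endian UTF-16.
text = "10;Person C name"
raw = text.encode("utf-16-le")

# Each ASCII character becomes two bytes: the character, then a zero byte.
print(raw[:4])  # b'1\x000\x00'

# A viewer that assumes a single-byte encoding treats every zero byte as a
# separate character, which is typically drawn as a blank.
as_single_byte = raw.decode("latin-1").replace("\x00", " ")
print(repr(as_single_byte[:4]))  # '1 0 '
```

So "10" turns into "1 0", and a single digit such as "1" is followed by one blank, which matches what WordPad displays.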

Can you tell me what is wrong? I need UTF16 because I need accented characters.

I am on Windows 7 using the English version but the application works in Dutch.

Your help will be very much appreciated. Thank you in advance.

Chris

UTF-8 suffices for these characters.

Hello Michel,

Thank you very much for your fast reply.

Unfortunately, I already tried UTF8, but then I got a strange “A” instead of the accented character.

When I check the file in TopStyle, it tells me it is UTF16. In WordPad I can read the lines without any problem; however, a space appears after the single-digit numbers (1 to 9), and from 10 upwards the space sits between the two digits.

You are correct that the problem disappears when I use UTF8; however, as mentioned above, that strange “A” appears where I use the accented character.

I am using Xojo 2015 r2.3

I appreciate your help very much.

Chris

This looks like a bug with Xojo.IO.TextOutputStream and UTF-16. If you have control over the file format, you should be able to use UTF-8 without any problem.
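
For comparison, a UTF-8 round trip in which the writer and the reader name the same encoding looks like this. It is a Python sketch, not Xojo, so it says nothing about the suspected Xojo bug. The scratch file name and the second record are made up for illustration.

```python
import os
import tempfile

# Illustrative records; the second one carries an accented character.
records = ["1;Person A name;Person A Firstname", "2;Mari\u00ebn;Person B Firstname"]

# Hypothetical scratch file, standing in for the real employee file.
path = os.path.join(tempfile.mkdtemp(), "employees_demo.txt")

# Write every record with an explicit encoding...
with open(path, "w", encoding="utf-8", newline="\n") as out:
    for record in records:
        out.write(record + "\n")

# ...and read the file back with the same explicit encoding.
with open(path, "r", encoding="utf-8") as src:
    read_back = [line.rstrip("\n") for line in src]

print(read_back == records)  # True
```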

Well, I just tried the posted code with UTF8, and all accented characters show up just fine in the MsgBox, as well as in NotePad, and no question mark lozenge…

When I set the encoding to UTF16, I do see the <?>.

[code] // This method fills file "Employees.exo" with standard values.

Using Xojo.Core
Using Xojo.IO

Dim tosEmployees As Xojo.IO.TextOutputStream
Dim intListIndex As Integer
Dim textList As Text
Dim intListCount As Integer
Dim i As Integer
Dim textRecord As Text

Dim f_List_Employees As New FolderItem("c:\Users\Mitch\Downloads\zut.txt")

// Filling textList with standard values
textList = "1;Person A name ;Person A Firstname;???;6;2#" + _
  "2;Person B name;Person B Firstname;CGA;4;3#" + _
  "10;Person C name;Person C Firstname;???;4;3#"

// Writing the list to the actual file
// (the empty-file check is commented out for this test)
intListCount = CountFields(textList, "#")
'If f_List_Employees.Length = 0 Then
tosEmployees = TextOutputStream.Append(f_List_Employees, TextEncoding.UTF8)
For i = 1 To intListCount
  tosEmployees.WriteLine NthField(textList, "#", i).ToText
Next i
tosEmployees.WriteLine("")
tosEmployees.Close
'End If

// This is just a test section:
// read the file back in with the same encoding.
Dim tisTemp As Xojo.IO.TextInputStream
tisTemp = TextInputStream.Open(f_List_Employees, TextEncoding.UTF8)
For i = 1 To intListCount
  textRecord = tisTemp.ReadLine
  MsgBox textRecord
Next i

tisTemp.Close[/code]

Hello Joe and Michel,

Thank you both very much for your help.

Michel, you are correct: when I use UTF8, my application can read its own files without any problems. When I open the file in TopStyle, everything is also fine.

When I open it in WordPad, I still get that strange character in “Mariën”. But this is no problem, because those files only have to be opened by my own application.
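
A strange “A” like this is the usual symptom of UTF-8 bytes being displayed by a viewer that assumes a single-byte Windows code page. Here is a small Python sketch of the effect. It assumes the code page is Windows-1252, which the thread does not confirm.

```python
# "Mariën", written with an escape so the example does not depend on the
# encoding of this source file.
word = "Mari\u00ebn"

# UTF-8 stores the accented letter as two bytes.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)  # b'Mari\xc3\xabn'

# A viewer that assumes Windows-1252 maps each of those two bytes to its own
# character, so one accented letter turns into "A with tilde" plus a guillemet.
misread = utf8_bytes.decode("cp1252")
print(misread == "Mari\u00c3\u00abn")  # True: displayed as "MariÃ«n"
```

The file itself is fine: decoding the same bytes as UTF-8 gives the original word back.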

I appreciate both replies very much. You both really made my day.

Chris

A bug in WordPad.

In Paint (also from Microsoft), if you press Ctrl-W to close the front window, another window is opened… go figure.

99% of the time, once you write out a file, it “loses” its encoding. There’s nothing in the file itself that tells you what the encoding should be, so it becomes a wild guess, and every app is on its own to try and deal with the data. You are at a distinct advantage in that your app can make a well-informed guess and be right most of the time. But in the end, all you can do is make an assumption based on the belief that it was your app that wrote the file in the first place.
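
To make the "wild guess" concrete, here is a rough Python sketch of the kind of heuristic an application might use. `guess_encoding` is a hypothetical helper, not something from Xojo or WordPad: it looks for a byte order mark, then tries strict UTF-8, then falls back to Windows-1252 as an assumed default.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess a Python codec name for raw file bytes."""
    # A byte order mark at the start is the only hint stored in the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No mark: check whether the bytes happen to be valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback; any single-byte code page would decode without error.
        return "cp1252"


word = "Mari\u00ebn"  # "Mariën"
print(guess_encoding(word.encode("utf-8-sig")))  # utf-8-sig
print(guess_encoding(word.encode("utf-16")))     # utf-16
print(guess_encoding(word.encode("utf-8")))      # utf-8
print(guess_encoding(word.encode("cp1252")))     # cp1252
```

The sketch ignores UTF-32 and cannot tell one single-byte code page from another, which is why the application that wrote the file is in a better position than any viewer.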

In Windows, a common encoding is WindowsANSI. You may want to try it. I bet it will work in Wordpad.
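
Here is a quick Python sketch of the trade-off, assuming WindowsANSI corresponds to code page 1252 on a Western-language system (an assumption; other Windows locales use other code pages). Accented Western European letters fit in one byte, but characters outside the code page cannot be written at all.

```python
word = "Mari\u00ebn"  # "Mariën"

# In Windows-1252 the accented letter is a single byte.
ansi = word.encode("cp1252")
print(ansi)                                   # b'Mari\xebn'
print(len(ansi), len(word.encode("utf-8")))   # 6 7

# A letter outside the code page (here U+0141, Polish L with stroke)
# cannot be encoded.
try:
    "\u0141".encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False
```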

I do not want to be rude, AND I AM NOT: shouldn’t the common encoding be UTF-8?

I think our job is to provide a good (if not the best) solution for our users and, in this internet world, one they can use as simply as possible; hence my answer above.

Years ago, the common graphic format seemed to be GIF, then JPG (both formats were lossy)… now, I do not know.

But who am I to write that? People will do as they wish, whatever they feel is good for their users…

The common encoding will most likely be UTF-8. Eventually. But it isn’t yet. Each OS has its own standard.