Xojo 2019R3 Encoding Issues

Greetings,

Honestly, I have no idea what happened, but since I started porting the project to R3 I have had nothing but issues, and I am starting to regret doing it.

I run a query on the MariaDB database and get the data, which I know is UTF8. I open an RTF file and write the data into it, and when I open the document my data is messed up.

In the database, in the Xojo IDE, in the debugger, everywhere, I have [quote]Stéphanie[/quote] but in the document I have [quote]St?phanie[/quote] no matter what I do. I tried converting it to UTF8 and to ISOLatin1. The interface is perfect and the DB is perfect, but when it gets written to the file it is messed up.

Any ideas what happened and where to look? Previous versions work perfectly with identical code; R3 is messed up.

Is there a way to detect the actual encoding? Apparently the data that comes from the DB never has an encoding and I have to define it. Do I have to do the same when I write it to the file?

Thanks in advance.

Did you define the encoding in the RTF file?

Example (ansi):

{\rtf1\ansi\ansicpg1252

Are you using TextOutputStream?

output.Write(ConvertEncoding(StringValue, Encodings.UTF8))

[quote=471246:@Emile Schwarz]Did you define the encoding in the RTF file?

Example (ansi):

{\rtf1\ansi\ansicpg1252[/quote]

Yes, the first line has that, so I believe the encoding is there: {\rtf1\adeflang1025\ansi\ansicpg1252\uc1

[quote=471247:@Sascha S]Are you using TextOutputStream?

output.Write(ConvertEncoding(StringValue, Encodings.UTF8))[/quote]

I already tried that: same result, no change at all in the document. I'm starting to think that ConvertEncoding is broken or something.

Can you show us some code please? :slight_smile:

While I cannot share all the code, I extracted the bits that are relevant here and added comments on what I found.

[code]Var TIStream As TextInputStream
Var t As TextOutputStream
Var fContent As String
Var rows As RowSet

If file <> Nil Then
Try
TIStream = TextInputStream.Open(file)
TIStream.Encoding = encodings.windowsLatin1 ' The template comes from a Windows-generated RTF file; this has always worked until now.
fContent = TIStream.readAll

Catch e As IOException

End Try
TIStream.close

End If

Try
rows = app.sqlBase.SelectSQL("SELECT * FROM patients WHERE id_patient=?", superWin.patientID)

If rows <> Nil Then
For Each row As DatabaseRow In rows

  Var lName As String
  Var fName As String
  
  ' For test purposes: apparently, if I define ISOLatin1, the name shows properly in the Xojo debugger.
  
  lName = row.Column("lName").StringValue.DefineEncoding(Encodings.ISOLatin1) 
  lName = lName.ConvertEncoding(Encodings.UTF8)
  
  fName = row.Column("fName").StringValue.DefineEncoding(Encodings.ISOLatin1)
  fName = fName.ConvertEncoding(Encodings.UTF8)
  
  fContent=fContent.ReplaceAll("<#PGender#>", row.Column("gender").StringValue.DefineEncoding(Encodings.UTF8))
  fContent=fContent.ReplaceAll("<#PlName#>", lName)
  fContent=fContent.ReplaceAll("<#PfName#>", fName)
  
  
Next

rows.Close

End If
Catch e As DatabaseException

End Try

If file <> Nil Then
Try
file.Remove

Try
  t = TextOutputStream.Create(file)
  't.Write(fContent) ' Worked in R2
  t.Write(ConvertEncoding(fContent, Encodings.UTF8)) ' Same result no matter what I put here.
  t.Close
  
Catch e As IOException
  ....
  
  
End Try

Catch error As IOException

End Try

End If[/code]

I also found this commented out in the code. I guess it was replaced along the way, but so far it does not seem related:

[code]'Dim t as TextOutputStream

'file.delete()
''t = file.CreateTextFile()
't = TextOutputStream.Create(file)

't.Write ConvertEncoding(fContent, Encodings.macRoman)
't.close()

'We had to remove this part as it was not working anymore and I did not find any replacement for it.
'file.macCreator = "MSWD"
'file.macType = "RTF "
[/code]

So far it looks to me as if ConvertEncoding is not working at all: no matter what I put there, I always get the same result.

It seems you are putting UTF8 into a document declared as ANSI / Windows code page 1252.
You can either convert your string to ANSI, or you need a UTF8 template file.
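
For the first option, one way to take the output stream's encoding out of the picture is to keep the file pure ASCII and write each non-ASCII character as an RTF \'hh hex escape in code page 1252. This is only a sketch under assumptions: EscapeForRTF is a hypothetical helper (not from the project), the template header really declares \ansicpg1252, and the string passed in already has a correct encoding defined.

[code]Function EscapeForRTF(source As String) As String
  ' Hypothetical helper, not part of the original project.
  ' Convert to Windows-1252 so that every character is exactly one byte.
  Var ansi As String = source.ConvertEncoding(Encodings.WindowsANSI)
  Var mb As MemoryBlock = ansi
  Var result As String
  
  For i As Integer = 0 To mb.Size - 1
    Var b As Integer = mb.Byte(i)
    Select Case b
    Case 92, 123, 125
      ' Backslash and braces are RTF control characters: prefix them with a backslash.
      result = result + "\" + Chr(b)
    Case Is < 128
      ' Plain ASCII goes through unchanged.
      result = result + Chr(b)
    Else
      ' Anything above ASCII becomes a two-digit hex escape, e.g. \'e9 for é.
      result = result + "\'" + Hex(b).Lowercase
    End Select
  Next
  
  Return result
End Function[/code]

It would be used at the point where the placeholders are filled, for example fContent = fContent.ReplaceAll("<#PfName#>", EscapeForRTF(fName)). Since the result contains only ASCII, it should come out the same whichever encoding the file is finally written in.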

I can’t see an issue in your code.

But I see you are reading data from the database as ISOLatin1 and as UTF8. Then you combine them using replace statements.

What happens if you convert fName and lName to UTF8 before you combine them into a UTF8 string?

Well, I believe that is what the [quote]TIStream.Encoding = encodings.windowsLatin1[/quote] part is for, if I'm not mistaken, and it always worked until R3, so either something was broken before or something is broken now. I'll create a fresh document and try again to see what result I get.

This is usually due to the string’s encoding not being properly defined. ConvertEncoding can only convert from some known encoding into another. The general rules are actually quite simple: Whenever you fetch some text (like from a database) you define its encoding. Whenever you export some text you convert its encoding (if necessary) to whatever encoding is expected. Everything else is taken care of automagically.
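
As a rough illustration of that rule, using the column from the code posted above (a sketch only; it assumes row is the DatabaseRow from that loop, that the column really holds UTF8 bytes, and that the destination expects Windows-1252):

[code]' Input: the bytes coming from the database carry no encoding, so state what they are.
' DefineEncoding does not change the bytes, it only labels them.
Var fName As String = row.Column("fName").StringValue.DefineEncoding(Encodings.UTF8)

' Output: convert to whatever the destination expects, for example
' Windows-1252 for an RTF file whose header says \ansicpg1252.
' ConvertEncoding does change the bytes.
Var forFile As String = fName.ConvertEncoding(Encodings.WindowsANSI)[/code]

Calling ConvertEncoding on a string that is already in the target encoding should leave it as it is, which would explain why converting UTF8 to UTF8 appears to do nothing.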

[quote=471258:@Sascha S]I can’t see an issue in your code.

But i see you are reading data from the database as ISOLatin1 and UTF8. Then you combine them using replace statements.

What happens if you convert f+lName to UTF8 before you combine them in an UTF8 string?[/quote]
Well, the data is supposed to be UTF8 and it always was. I put Latin1 there to see whether that was the issue, as we had that in the past when importing from another app. When I define the encoding as Latin1 I do not get the weird characters in the IDE debugger, while if I replace ISOLatin1 with UTF8 I get [quote]St?phanie[/quote] in the debug window. So it seems that some data is Latin1, as I suspected, and only when we get it from another app. Still, ConvertEncoding is supposed to convert it properly once the data is defined correctly, or at least looks correct. My understanding was: I get the data as Latin1, I know it is Latin1, I convert it to UTF8 and I work with it. Maybe I am doing something wrong in the process. While Define works well, Convert does not seem to have any effect.

Well, that was the purpose of the tests I did earlier. As I mentioned in the previous post, some fields were ISOLatin1, so I defined them that way and they showed up fine in the interface and the debugger. Then I converted them to UTF8, which should do the job, but apparently it does not. Honestly, I have no idea where else to look. I checked all the code and all the documentation and everything seems fine, but it still does not work.

Can you please try the following?

[quote]  lName = row.Column("lName").StringValue.DefineEncoding(Encodings.ISOLatin1)
  lName = lName.ConvertEncoding(Encodings.UTF8)

  fName = row.Column("fName").StringValue.DefineEncoding(Encodings.ISOLatin1)
  fName = fName.ConvertEncoding(Encodings.UTF8)
  
  fContent=fContent.ReplaceAll("<#PGender#>", row.Column("gender").StringValue.DefineEncoding(Encodings.UTF8))
  fContent=fContent.ReplaceAll("<#PlName#>", lName.ConvertEncoding(Encodings.UTF8))
  fContent=fContent.ReplaceAll("<#PfName#>", fName.ConvertEncoding(Encodings.UTF8))[/quote]

This “other App” is writing ISOLatin1 (or similar) into a UTF8 defined db field. Which is totally possible with SQL.
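
If the column really does hold a mix of both, one possible way to cope on the reading side is to test each value before labelling it. This is a heuristic sketch, not a guaranteed fix: DefineBestGuess is a hypothetical helper, and the fallback to Windows-1252 is an assumption about what the other app writes.

[code]Function DefineBestGuess(raw As String) As String
  ' Hypothetical helper for a column that mixes UTF8 rows and Latin-1 rows.
  If Encodings.UTF8.IsValidData(raw) Then
    ' Valid UTF8 (plain ASCII also passes this test).
    Return raw.DefineEncoding(Encodings.UTF8)
  Else
    ' Assumption: anything that is not valid UTF8 was written as Windows-1252.
    Return raw.DefineEncoding(Encodings.WindowsANSI)
  End If
End Function[/code]

It would replace the fixed DefineEncoding calls, for example fName = DefineBestGuess(row.Column("fName").StringValue). Cleaning up the data in the database is still the better long-term fix.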

Apparently yes, and it was supposed to be fixed so that everything is UTF8, so I'll have to check that side as well.

[quote=471263:@Sascha S]fContent=fContent.ReplaceAll("<#PlName#>", lName.ConvertEncoding(Encodings.UTF8))
fContent=fContent.ReplaceAll("<#PfName#>", fName.ConvertEncoding(Encodings.UTF8))[/quote]

When I do that, the whole word disappears: with the double conversion, nothing shows up in the document anymore.

I guess fContent goes into an unknown encoding state if you combine it with different encodings.

Well, I tried this:

TIStream.Encoding = encodings.windowsLatin1 ' Imported from a Windows-generated RTF file and always worked this way until now.
fContent = TIStream.readAll.ConvertEncoding(Encodings.UTF8)

but with the same result.