I have a text file coming in from an external source…
If I load this into a commercial text editor… it says it is a UTF-8 file and displays the contents exactly as expected.
HOWEVER, when I read it (TextInputStream) into a Xojo app and split things up (based on SPACES and QUOTES), the resulting file still loads into the same text editor as UTF-8, but now things look different.
For example:
BEFORE I would see “±”
AFTER I see “±”
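The usual cause of this kind of change is UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1 (Latin-1). Here is a minimal Python sketch (Python rather than Xojo, purely to show the byte-level mechanics) of what happens to “±”:

```python
# "±" is U+00B1; UTF-8 stores it as the two bytes 0xC2 0xB1.
original = "\u00b1"
utf8_bytes = original.encode("utf-8")  # b'\xc2\xb1'

# Reading those bytes as ISO-8859-1 (Latin-1) maps each byte to its own
# character: 0xC2 -> "Â" (U+00C2) and 0xB1 -> "±" (U+00B1).
misread = utf8_bytes.decode("latin-1")  # "Â±", two characters

# Decoding with the encoding the file actually uses recovers the original.
recovered = utf8_bytes.decode("utf-8")  # "±"

print(ascii(misread), ascii(recovered))  # '\xc2\xb1' '\xb1'
```

A stray “Â” in front of the symbol is the telltale sign of this pattern.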
Now, I do have code that loops over a line of text one character at a time.
This is just an example; the real “t=t+c” step has more decision logic involved:
for i=1 to len(s)
c=mid(s,i,1)
t=t+c
next i
BUT, “c” should be a CHARACTER… not a “BYTE”… (and yes, I’m using String, not Text).
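As I understand Xojo's String behavior, Len and Mid count characters only when the string has a known encoding; on a string whose encoding is nil they effectively work on bytes, so c can end up holding half of a multi-byte character. The Python sketch below (an illustration of the mechanics, not Xojo code; the sample line is hypothetical) shows the difference between stepping through bytes and stepping through characters:

```python
# Hypothetical input line containing one non-ASCII character.
line = "x \u00b1 y"  # "x ± y"
raw = line.encode("utf-8")  # the bytes actually stored in the file

# Byte-wise loop (analogous, as I understand it, to Mid on a nil-encoded
# string): "±" is split into two one-byte pieces, 0xC2 and 0xB1.
byte_pieces = [raw[i:i + 1] for i in range(len(raw))]

# Character-wise loop (what you get once the encoding is known):
# "±" stays a single piece.
char_pieces = [line[i] for i in range(len(line))]

print(len(byte_pieces), len(char_pieces))  # 6 5
```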
I tried ConvertEncoding(s, Encodings.UTF8) immediately AFTER reading it (s=txt.ReadAll), but it didn’t change the results.
I also tried DefineEncoding… but it did nothing to affect the result either.
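One distinction that may explain why neither call seemed to help: DefineEncoding only labels the existing bytes, while ConvertEncoding rewrites the bytes so the same characters are represented in a different encoding. Converting text that already carries the wrong label keeps the wrong characters. A minimal Python analogy (decode plays the role of defining, encode the role of converting; this mapping is my own illustration, not a Xojo API):

```python
raw = "\u00b1".encode("utf-8")  # bytes as read from the file: b'\xc2\xb1'

# Wrong label: treating the UTF-8 bytes as Latin-1 yields two characters.
mislabeled = raw.decode("latin-1")  # "Â±"

# "Converting" the mislabeled text to UTF-8 faithfully re-encodes both wrong
# characters, so the damage is preserved (now as four bytes).
converted = mislabeled.encode("utf-8")  # b'\xc3\x82\xc2\xb1'

# Applying the correct label to the ORIGINAL bytes is what fixes it.
defined = raw.decode("utf-8")  # "±"
```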
textIN=TextInputStream.open(filePath)
s=textIN.ReadAll
textIN.close
//
s=ReplaceAll(s,EndOfLine,EndOfLine.UNIX)
s=ReplaceAll(s,chr(&h0b)," ")
s=ReplaceAll(s,chr(&h0c)," ")
//s=ReplaceAll(s,chr(&h00)," ")
s=ReplaceAll(s,chr(&h09),"   ") // 3 spaces
//s=ConvertEncoding(s,Encodings.UTF8) // with or without this line (or as DefineEncoding), it makes no difference
v=Split(s,EndOfLine.UNIX)
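For comparison, here is a minimal Python sketch of the same read, normalize, and split steps with the encoding stated once, up front. It assumes the file really is UTF-8 (as the external editor reports), and `load_lines` is a hypothetical name. It also normalizes both CRLF and a lone CR to LF, which is slightly broader than replacing the platform's EndOfLine. In Xojo, I believe the equivalent is passing Encodings.UTF8 to ReadAll, or setting the stream's Encoding property before reading, rather than calling ConvertEncoding afterwards.

```python
from typing import List


def load_lines(raw: bytes) -> List[str]:
    """Hypothetical helper: decode file bytes as UTF-8, normalize, split."""
    s = raw.decode("utf-8")  # label the bytes once, before any other work
    # Normalize CRLF and lone CR to LF (the EndOfLine -> EndOfLine.UNIX step).
    s = s.replace("\r\n", "\n").replace("\r", "\n")
    s = s.replace("\x0b", " ")   # vertical tab -> space
    s = s.replace("\x0c", " ")   # form feed -> space
    s = s.replace("\t", "   ")   # tab -> 3 spaces
    return s.split("\n")


# Hypothetical sample input: a tab, a CRLF line break, and a "±".
sample = "a\tb\r\nc \u00b1 d".encode("utf-8")
lines = load_lines(sample)  # ["a   b", "c \u00b1 d"]
```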
Thanks… but that had no effect either. One thing I just noticed, though:
The INCOMING data (per the 3rd-party text editor) is UTF-8.
HOWEVER, the OUTPUT file is NOT… it is ISO-something.
The output file is written with a TextOutputStream using Write commands.
Isn’t everything supposed to be UTF-8 unless specifically told otherwise?
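Unless it starts with a byte-order mark, a plain text file carries no encoding label of its own, so an editor has to guess from the bytes. A minimal Python sketch of writing with an explicit encoding and checking the bytes that actually land on disk (`write_utf8` and the temp path are hypothetical; in Xojo I believe TextOutputStream has a comparable Encoding property, but check the docs for your version):

```python
import os
import tempfile
from typing import List


def write_utf8(path: str, lines: List[str]) -> None:
    """Hypothetical helper: write lines as UTF-8 with LF line endings."""
    # An explicit encoding avoids depending on the platform default,
    # which is not guaranteed to be UTF-8.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines))


# Write a line containing "±" to a temporary file, then read back the raw bytes.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_utf8(path, ["c \u00b1 d"])
with open(path, "rb") as f:
    written = f.read()

print(written)  # b'c \xc2\xb1 d'
```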
[quote=272831:@Norman Palardy]Not when it comes from an outside source like a TCP socket, database, file, etc.
This isn't new.[/quote]
Sorry… that I knew… I mean that once it becomes UTF-8, it stays UTF-8 unless told otherwise.
And a new string inherits its encoding (or lack thereof) from its source, right?
dim t as string=aUTF16string // t is also going to be UTF16?
[quote=272832:@Dave S]Sorry… that I knew… I mean that once it becomes UTF-8, it stays UTF-8 unless told otherwise.
And a new string inherits its encoding (or lack thereof) from its source, right?
dim t as string=aUTF16string // t is also going to be UTF16?
[/quote]
You can break that.
A string with an encoding + a string with nil encoding -> a nil-encoded result.
Fundamentally, these sorts of issues are what prompted the Text type.
No. UTF-8 is not the default on any OS. It is more correct to say that everything (external to Xojo) is not UTF-8 unless specifically told so. As you saw, the UTF-8 that Xojo wrote out was interpreted as something else (the OS default) until you explicitly included the meta tag.