Hi guys - I was wondering if anyone could identify the correct encoding to read this from a text file? I’m rather stumped, though I thought it was Unicode. It’s just the name of an author, by the way.
I’m pretty sure [code]\u[/code] denotes Unicode. Googling those characters suggests they are combining or modifier characters. You need to provide some more detail for us to help you.
Thanks guys - I suspected it was Unicode (or possibly Big-5?) encoding, but when I selected UTF-16, things got a little strange.
Those two examples were with MacRoman, which is obviously wrong.
Using UTF-16, I get a spinning beachball, which indicates something is wrong. This machine doesn’t beachball very much.
What I believe is wrong is that ReadLine is not recognizing the end of line in the file and is just reading in around 79K characters.
Perhaps I should be reading the line in MacRoman and then converting the input string to Unicode in some way?
Here is the plain simple code that it is using:
[code]
if fp <> Nil then
  textInput.text = "Opening file " + fp.Name + "…"
  try
    tp = TextInputStream.Open(fp)
    tp.Encoding = Encodings.UTF16LE
    inRecord = tp.ReadLine
    MsgBox("Input record size = " + Str(Len(inRecord)))
    'MsgBox(inRecord)
    tp.Close
  catch e as IOException
    ' Open may have failed before tp was assigned, so check for Nil first
    if tp <> Nil then tp.Close
    MsgBox("Error accessing the requested file")
  end try
end if
[/code]
That’s not a text encoding, it’s a notation within the string. Most likely the encoding is either ASCII or UTF-8 and the non-ASCII Unicode characters are marked with the “\uHHHH” notation.
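To make that concrete, here is a minimal sketch of expanding the notation by hand. It assumes the text was read as UTF-8 and that every escape is exactly four hex digits. DecodeUnicodeEscapes is a made-up helper name, not something built into Xojo, and it does not handle surrogate pairs or malformed escapes.

```xojo
' Hypothetical helper: expands \uHHHH escapes into real characters.
' Assumes well-formed escapes; surrogate pairs are not combined.
Function DecodeUnicodeEscapes(source As String) As String
  Dim result As String
  Dim i As Integer = 1
  Dim n As Integer = Len(source)
  While i <= n
    ' Xojo string comparison is case-insensitive, so "\U" matches too
    If Mid(source, i, 2) = "\u" And i + 5 <= n Then
      Dim digits As String = Mid(source, i + 2, 4)
      ' Val understands the &h prefix for hexadecimal values
      Dim codePoint As Integer = Val("&h" + digits)
      result = result + Encodings.UTF8.Chr(codePoint)
      i = i + 6
    Else
      result = result + Mid(source, i, 1)
      i = i + 1
    End If
  Wend
  Return result
End Function
```

With the stream's Encoding set to Encodings.UTF8 instead of UTF16LE, something like DecodeUnicodeEscapes(inRecord) should then give the readable name.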
Is it possible that you are reading a JSON file? If so, you can use either JSONItem or Xojo.Data.ParseJSON to interpret it for you so you can extract the values correctly. For example, one of your posted strings returns this when I run it through JSONItem:
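A rough sketch of the JSONItem route, for input that is only a bare string rather than a full JSON document. The sample value and the "name" key are invented for illustration, and it assumes the raw text contains no unescaped double quotes or stray backslashes (otherwise JSONItem will raise a JSONException).

```xojo
' Hypothetical sample value; substitute a string read from your file.
' Xojo string literals do not treat the backslash as an escape.
Dim raw As String = "J\u00f3zsef"

' Wrap the raw text as the value of a one-key JSON object so that
' JSONItem decodes the \uHHHH escapes while parsing.
Dim wrapper As New JSONItem("{""name"":""" + raw + """}")
Dim decoded As String = wrapper.Value("name")

MsgBox(decoded)
```

Letting the JSON parser do the work also covers the other JSON escapes, which a hand-rolled loop would have to handle one by one.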
Hi Normal - Yep, that is exactly as it was delivered to me. It’s an RDF file with foreign names in it. I just tried reading the string in MacRoman then converting it to UTF-16. (grin) That had the unexpected result of turning it all into Chinese. Boy, do I feel dumb. I guess I could parse out the exact strings and try just converting those specific strings?
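One possible explanation for the Chinese text, offered as a guess since the conversion code isn't shown: relabeling single-byte text as UTF-16 with DefineEncoding keeps the bytes as they are but reads them two at a time, and pairs of ASCII bytes tend to land in the CJK range. ConvertEncoding rewrites the bytes so the characters stay the same. The sample string below is made up.

```xojo
' Hypothetical sample text, marked as plain ASCII
Dim ascii As String = DefineEncoding("Author", Encodings.ASCII)

' Relabel only: same bytes, now interpreted as UTF-16 code units
Dim relabeled As String = DefineEncoding(ascii, Encodings.UTF16LE)

' Transcode: bytes are rewritten, the characters are preserved
Dim converted As String = ConvertEncoding(ascii, Encodings.UTF16LE)

MsgBox(relabeled)
MsgBox(converted)
```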
Wow! Totally missed the JSON stuff in the books. While not precisely JSON, the input is close enough I can use this to translate those notations into something useful. Brilliant!