How to identify the right encoding?

Hi guys - I was wondering if anyone could identify the correct encoding to read this text from a file? I’m rather stumped, though I thought it was Unicode. It’s just the name of an author, by the way.

Thanks - Paul

Firstname: "Kha\u0304lid"
Lastname: "Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj",

I’m pretty sure [code]\u[/code] denotes Unicode. Googling those code points suggests they are combining or modifier characters. You need to provide some more detail for us to help you.

Start with the basics:

  1. What did you expect to happen?
  2. What went wrong?
  3. What code have you tried?

Unicode 16-bit (UTF-16), it seems.

Thanks guys - I suspected it was a Unicode (or possibly Big-5?) encoding, but when I selected UTF-16, things got a little strange.
Those two examples were read with MacRoman, which is obviously wrong.

Using UTF-16, I get a spinning beachball, which indicates something is wrong. This machine doesn’t beachball very much. ;)

What I believe is wrong is that ReadLine is not recognizing the end of line in the file and just reading in around 79K characters.

Perhaps I should be reading the line in MacRoman and then converting the input string to Unicode in some way?

Here is the simple code I’m using:

[code]
If fp <> Nil Then
  textInput.Text = "Opening file " + fp.Name + "…"
  Try
    tp = TextInputStream.Open(fp)
    ' The encoding being tried here; UTF-16 is what triggers the beachball
    tp.Encoding = Encodings.UTF16LE

    inRecord = tp.ReadLine
    MsgBox("Input record size = " + Str(Len(inRecord)))
    'MsgBox(inRecord)

    tp.Close
  Catch e As IOException
    ' Open may have failed, so tp can still be Nil here
    If tp <> Nil Then tp.Close
    MsgBox("Error accessing the requested file")
  End Try
End If
[/code]

That’s the actual text IN the file?
With the \uXXXX in the entry?

That’s not a text encoding, it’s an escape notation within the string. Most likely the file itself is encoded as ASCII or UTF-8, and the non-ASCII Unicode characters are written with the “\uHHHH” notation, where HHHH is the code point in hexadecimal.

Is it possible that you are reading a JSON file? If so, you can use either JSONItem or Xojo.Data.ParseJSON to interpret it for you so you can extract the values correctly. For example, one of your posted strings returns this when I run it through JSONItem:

Muḥammad ʻAlī al-Ḥājj
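
If the file turns out not to be strict JSON, one possible approach is to wrap each extracted value in a tiny JSON object and let JSONItem decode the \uHHHH escapes. The following is only a rough sketch: DecodeUnicodeEscapes is a made-up helper name, and it assumes the raw text contains no unescaped double quotes or control characters.

```xojo
' Hypothetical helper (not from the posts above): decodes \uHHHH escapes
' by wrapping the raw text in a one-entry JSON object and parsing it.
' Assumption: raw holds no unescaped double quotes or control characters.
Function DecodeUnicodeEscapes(raw As String) As String
  Try
    ' Builds {"v":"<raw>"} so the JSON parser interprets the escapes
    Dim j As New JSONItem("{""v"":""" + raw + """}")
    Return j.Value("v").StringValue
  Catch e As JSONException
    ' The text was not valid JSON string content; return it unchanged
    Return raw
  End Try
End Function
```

Usage would look something like this, passing the text exactly as it appears in the file:

```xojo
Dim lastName As String
lastName = DecodeUnicodeEscapes("Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj")
MsgBox(lastName)
```

Xojo string literals do not interpret backslashes, so the escapes reach JSONItem untouched. Falling back to the raw text on a JSONException keeps a malformed value from stopping the whole import.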

Hi Normal - Yep, that is exactly as it was delivered to me. It’s an RDF file with foreign names in it. I just tried reading the string in MacRoman and then converting it to UTF-16. (grin) That had the unexpected result of turning it all into Chinese. Boy, do I feel dumb. I guess I could parse out the exact strings and try converting just those specific strings?

Wow! Totally missed the JSON stuff in the books. While not precisely JSON, the input is close enough I can use this to translate those notations into something useful. Brilliant!

Thank you! -Paul