Shell encoding problem

I was struggling with a problem similar to this old thread https://forum.xojo.com/13176-shell-output-encoding-problem/0 and found a nice working solution. Instead of bumping the old thread back to life, I’m just posting my findings in a new one, in case it can help.

So. I’m trying to read the metadata of a music file (m4a) using the excellent command-line tool mediainfo. Here’s what I run directly in a terminal (sorry for all the escaping, I’m just copying and pasting; the command syntax is mediainfo --Output=XML filepath):

[code]
bruno$ mediainfo --Output=XML /Volumes/Music/iTunes/iTunes\ Music/Zara\ Larsson/So\ Good/09\ Don_t\ Let\ Me\ Be\ Yours.m4a
[/code]

Which generates the following (I’m pasting only the relevant line)

... <Title_Sort>Don’t Let Me Be Yours</Title_Sort> ...

In Xojo, I’m trying to get the same info using:

[code]
dim s as new shell
s.Mode = 0
dim cmd as string = "/usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath
s.Execute cmd
dim res as string = s.ReadAll.DefineEncoding(Encodings.UTF8)
[/code]

But, when looking at the “res” variable, the curly quote is replaced with a “?”.

<Title_Sort>Don?t Let Me Be Yours</Title_Sort>

Looking at the hex in the debugger, I see that Xojo receives a real question mark from the shell. So encoding is not the problem here.

I tried a suggested workaround to redirect the output of mediainfo into a temporary file, then opening the file for reading like:

[code]
dim tempFile as FolderItem = SpecialFolder.Temporary.Child("mediainfodata")
dim s as new shell
s.Mode = 0
// Redirect mediainfo's output into the temporary file
dim cmd as string = "/usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath + " > " + tempFile.ShellPath
s.Execute cmd
// Read the file back as UTF-8
dim bs as BinaryStream = BinaryStream.Open(tempFile)
dim res as string = bs.Read(bs.Length, Encodings.UTF8)
bs.Close
tempFile.Delete
[/code]

But I still get the same result! Even if I open the temporary file in an external hex editor, I see that the curly quote got replaced by ? right in the file. So again, not Xojo's fault at all, since the file is generated by shell commands without Xojo taking any part in the process.

But the strange thing is: if I run the exact same command from a terminal, redirecting the output to a file, then the file is generated correctly! No more ? symbols when opening the file in an external editor.

The solution lies in the LANG environment variable. In a standard terminal, if I type
[code]
locale
LANG="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_CTYPE="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_ALL=
[/code]

I can see LANG=“en_CA.UTF-8” but if I run the following in Xojo:

[code]
dim s as new shell
s.Mode = 0
s.Execute "locale"
dim res as string = s.ReadAll
[/code]
Then “res” looks like:

[code]
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
[/code]

So there you have the difference. When you open a terminal, the locale is set from your system settings and the shell behaves accordingly, whereas in a Xojo subshell (and, I suspect, in many other situations where a program spawns a subshell) nothing is defined, so everything falls back to the "C" locale. The command output then gets squeezed into what I assume is plain ASCII, and anything that doesn’t fit comes out as a “?”. So no matter how you try to get to your data, it’s already pre-digested…
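
For anyone who wants to see the difference outside Xojo, here is a minimal sketch that mimics a raw subshell from a normal terminal. It assumes a POSIX-style system with env and /bin/sh; show_lang is just a hypothetical helper name, and env -i only approximates what Xojo does by starting the child with an empty environment.

```shell
#!/bin/sh
# show_lang (hypothetical helper): print the LANG value a child process sees.
# Any VAR=value arguments are handed to env and become the child's environment.
show_lang() {
    env -i "$@" /bin/sh -c 'printf "LANG=[%s]\n" "$LANG"'
}

bare=$(show_lang)                      # empty environment, like a raw subshell
fixed=$(show_lang LANG=en_CA.UTF-8)    # same child, LANG supplied explicitly

echo "$bare"     # prints LANG=[]
echo "$fixed"    # prints LANG=[en_CA.UTF-8]
```

The child only knows about LANG when the parent hands it over, which is the situation the Xojo shell is in.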

Solution:

[code]
dim cmd as string = "export LANG=en_CA.UTF-8 ; /usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath
s.Execute cmd
if s.ErrorCode <> 0 then return
dim res as string = s.ReadAll.DefineEncoding(Encodings.UTF8)
[/code]

And… voilà. Got the correctly encoded string back in Xojo. Of course, replace the LANG value with whatever locale you want to use.
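
A possible variant, as a sketch only: in sh-style shells the variable can also be set for a single command by putting the assignment in front of it, without export. The snippet below compares both forms. It assumes /bin/sh is available, uses env -i to start from an empty environment, and lets a small printf stand in for mediainfo so it runs anywhere.

```shell
#!/bin/sh
# Stand-in for mediainfo: a child process that prints the LANG value it received.
child='printf "%s" "$LANG"'

# Form used in the post: export LANG, then run the command.
with_export=$(env -i /bin/sh -c "export LANG=en_CA.UTF-8 ; /bin/sh -c '$child'")

# One-shot form: the assignment applies only to that single command.
with_prefix=$(env -i /bin/sh -c "LANG=en_CA.UTF-8 /bin/sh -c '$child'")

echo "$with_export"   # prints en_CA.UTF-8
echo "$with_prefix"   # prints en_CA.UTF-8
```

If the Xojo shell runs commands through an sh-compatible shell (an assumption I have not verified), the command string could start with LANG=en_CA.UTF-8 followed by a space instead of the export statement.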

Hope it helps prevent some heavy hair-pulling, head-banging sessions for others :slight_smile:

Nice detective work!

More precisely, the Xojo shell uses CP-1252 aka Windows ANSI.

Not surprising, since Xojo Shell ≠ Terminal.
Terminal runs a pile of scripts when you open it, including your local bash/ksh/tcsh setup scripts.
Xojo shell does not; it’s about as raw a shell as you can get.

I learned this a while back through the school of trial and (mostly) error - the shell is sort of like a remote, unestablished campsite - if you didn’t bring it with you, don’t expect it to be there :slight_smile: .