I was struggling with a problem similar to this old thread https://forum.xojo.com/13176-shell-output-encoding-problem/0 and found a nice working solution. Instead of bumping the old thread back to life, I’m just posting my findings in a new one, in case it can help.
So. I’m trying to read the metadata of an m4a music file, using the excellent shell tool mediainfo. Here’s what I do directly in a terminal (sorry for all the escaping, I’m just copying and pasting; the command syntax is mediainfo --Output=XML filepath):
bruno$ mediainfo --Output=XML /Volumes/Music/iTunes/iTunes\ Music/Zara\ Larsson/So\ Good/09\ Don_t\ Let\ Me\ Be\ Yours.m4a
Which generates the following (I’m pasting only the relevant line)
...
<Title_Sort>Don’t Let Me Be Yours</Title_Sort>
...
In Xojo, I’m trying to get the same info using:
dim s as new shell
s.Mode = 0
dim cmd as string = "/usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath
s.Execute cmd
dim res as string = s.ReadAll.DefineEncoding(Encodings.UTF8)
But, when looking at the “res” variable, the curly quote is replaced with a “?”.
<Title_Sort>Don?t Let Me Be Yours</Title_Sort>
Looking at the hex in the debugger, I see that Xojo receives a real question mark from the shell. So encoding is not the problem here.
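If you want to do the same byte-level check outside the debugger, here is a minimal sketch in plain shell. The hexbytes helper is a name I made up for illustration; it only wraps od, which is a standard tool. A curly quote that survived as UTF-8 shows up as the three bytes e2 80 99, while a real question mark is the single byte 3f.

```shell
# Hypothetical helper: print the bytes of a string as one run of hex digits.
hexbytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# UTF-8 right single quotation mark (U+2019), written as octal escapes.
curly=$(printf '\342\200\231')

intact=$(hexbytes "Don${curly}t")   # curly quote still present
mangled=$(hexbytes "Don?t")         # curly quote already replaced by "?"

echo "intact:  $intact"
echo "mangled: $mangled"
```

If the dump of your real data contains 3f where the quote should be, the damage happened before the data reached you, and no DefineEncoding call can bring it back.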
I tried a suggested workaround to redirect the output of mediainfo into a temporary file, then opening the file for reading like:
dim tempFile as FolderItem = SpecialFolder.Temporary.Child("mediainfodata")
dim s as new shell
s.Mode = 0
dim cmd as string = "/usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath + " >" + tempFile.ShellPath
// Run the command so the temporary file actually gets written
s.Execute cmd
dim bs as BinaryStream = BinaryStream.Open(tempFile)
dim res as string = bs.Read(bs.Length,Encodings.UTF8)
bs.Close
tempFile.Delete
But I still get the same result! Even if I open the temporary file in an external hex editor, I see that the curly quote got replaced by ? right in the file... so again, not Xojo's fault at all, since the file is generated by shell commands without Xojo taking any part in the process.
But the strange thing is: if I run the exact same command from a terminal, redirecting the output to a file, then the file is generated correctly! No more ? symbols when opening the file in an external editor.
The solution lies in the LANG environment variable. In a standard terminal, if I type
locale
LANG="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_CTYPE="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_ALL=
I can see LANG=“en_CA.UTF-8” but if I run the following in Xojo:
dim s as new shell
s.Mode = 0
s.Execute "locale"
dim res as string = s.ReadAll
Then “res” looks like:
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
So there you have the difference. When you open a terminal, the locale is set from your system settings and the shell behaves accordingly. In a Xojo Shell (and, I suspect, in many other situations where a program spawns a sub-shell), LANG is not defined and everything falls back to the C locale, so the command’s output gets squeezed into what I assume is plain ASCII and anything that doesn’t fit becomes a “?”. No matter how you then try to get at your data, it’s already pre-digested…
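You can get a feel for this from a normal terminal too. The sketch below is only an approximation of what a spawned sub-shell sees, not a description of what Xojo does internally: env -i starts a child process with an empty environment (PATH is handed back in only so that sh can be found), and the child reports whether LANG made it through.

```shell
# Child started with an empty environment: LANG is simply not there.
bare=$(env -i PATH="$PATH" sh -c 'printf "%s" "${LANG:-unset}"')

# Same child, but with LANG handed over explicitly.
with_lang=$(env -i PATH="$PATH" LANG=en_CA.UTF-8 sh -c 'printf "%s" "${LANG:-unset}"')

echo "bare environment: LANG=$bare"
echo "LANG supplied:    LANG=$with_lang"
```

The point is that a child process only knows the locale it was given. Your terminal sets it up for you; a process launched some other way may get nothing.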
Solution:
dim cmd as string = "export LANG=en_CA.UTF-8 ; /usr/local/bin/mediainfo --Output=XML " + TheFile.ShellPath
s.Execute cmd
if s.ErrorCode <> 0 then return
dim res as string = s.ReadAll.DefineEncoding(Encodings.UTF8)
And… voilà. Got the correctly encoded string back in Xojo. Of course, replace the LANG value with whatever locale you want to use.
Hope it helps prevent some heavy hair-pulling, head-banging sessions for others.
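A side note on the export part, as a sketch in plain POSIX shell rather than anything Xojo-specific. There are two ways to hand LANG to a command: an inline assignment in front of the command, which applies to that one command only, and export, which sticks for everything that runs afterwards in the same shell. I have only used the export form from Xojo; that the inline form would work equally well in the cmd string is my assumption, based on the command being interpreted by a regular shell.

```shell
unset LANG   # start from a known state for the demonstration

# Inline assignment: applies to this one command only.
inline=$(LANG=en_CA.UTF-8 sh -c 'printf "%s" "$LANG"')
after_inline=${LANG:-unset}   # the calling shell is untouched

# export: applies to every later command in the same shell.
export LANG=en_CA.UTF-8
after_export=$(sh -c 'printf "%s" "$LANG"')

echo "inline child:        $inline"
echo "shell after inline:  $after_inline"
echo "child after export:  $after_export"
```

Running locale -a in a terminal lists the locale names installed on the machine, which is worth checking before hard-coding one.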