Shell output encoding problem

When I run ls -a in the terminal, accented characters (if any) are displayed correctly.
When I do the same with the Xojo Shell, I get question marks in place of the accented characters.
How can I set the Xojo Shell to output accented characters? I don’t see any place where I can set that.

Thanks, Stan

[quote=102723:@Stanley Roche Busk]When I run ls -a in the terminal, accented characters (if any) are displayed correctly.
When I do the same with the Xojo Shell, I get question marks in place of the accented characters.
How can I set the Xojo Shell to output accented characters? I don’t see any place where I can set that.[/quote]

I could not reproduce the problem with the code below. How do you display the shell output? The encoding must be wrong.

Sub Action()
  Dim s As New Shell
  s.Execute("ls -a /Users/Mitch/Desktop")
  TextArea1.Text = s.Result
End Sub

I use:

s.execute("cd /Users/Mitch/Desktop ; ls -a")

Sorry, it is:

s.execute( "cd /Users/Mitch/Desktop ; ls -al" )

How are you dealing with the s.ReadAll results? Have you tried setting the encoding to UTF-8?

theString = ConvertEncoding(s.ReadAll, Encodings.UTF8)

This would work if the s.ReadAll result were UTF-8, but it is not (I don’t get those weird UTF-8 characters, just plain and simple question marks). I thought you could set the shell or sh output encoding. Right now the shell output is simply broken.

The ? is the character that is used to display non-printable characters. I use shells all the time and always get the properly formatted text back.

My recommendation is to see what happens if you do force the encoding. You might just be surprised :)

The ? may appear when you define the wrong encoding for a string. I do nothing of the kind here: I am looking at the raw output, and the raw ASCII output already contains ? before any encoding is defined or set. For example, if the text were ‘Phénomène’ and the raw output were UTF-8 with no encoding defined, I would get something like ‘PhÃ©nomÃ¨ne’, and I could simply define the encoding as UTF-8 to restore the text. That is not the case: I get ‘Ph?nom?ne’. That is a useless string, and there is no magic for converting ? back into the right accented characters :) Just my opinion…
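
The difference between the two failure modes can be checked outside Xojo. The Python sketch below is only an illustration (Python is not part of the original setup, and Latin-1 is an assumed wrong label): UTF-8 bytes read with the wrong encoding give mojibake that can be undone, whereas text already reduced to question marks cannot be restored.

```python
original = "Ph\u00e9nom\u00e8ne"  # "Phénomène" with precomposed accents
utf8_bytes = original.encode("utf-8")

# Case 1: UTF-8 bytes labelled with the wrong encoding (Latin-1 here).
# The bytes are intact, so relabelling them as UTF-8 recovers the text.
mojibake = utf8_bytes.decode("latin-1")
recovered = mojibake.encode("latin-1").decode("utf-8")

# Case 2: the accented characters were replaced by "?" somewhere upstream.
# The information is gone; no encoding setting can bring it back.
lossy = original.encode("ascii", errors="replace").decode("ascii")

print(mojibake)   # PhÃ©nomÃ¨ne
print(recovered)  # Phénomène
print(lossy)      # Ph?nom?ne
```

Case 1 is what a missing or wrong encoding definition looks like; case 2 matches the output reported here.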

I just tried this:

[code]Dim theShell As New Shell
theShell.Execute "ls -la /Users/tjones"
TextArea1.SelText = theShell.ReadAll[/code]

The result looks like this:

I get accented characters correctly as well.

Please post the complete Xojo code that generates the problem.

I finally found out that the problem only appears when shell.mode = 2
No problem with mode 0 and 1.

Can you also try it?

[quote=103296:@Stanley Roche Busk]I finally found out that the problem only appears when shell.mode = 2
No problem with mode 0 and 1.

Can you also try it?[/quote]

Please post your code (the entirety of the method). Or better yet, a test project which exhibits the problem you report.

@Stanley Roche Busk - you are correct that the mode makes a difference. I’ve just examined what’s going on and it is true going all the way back to Real Studio 2010r1.

I have created a feedback report on this <https://xojo.com/issue/34149>.

I have tested further using my own code and DataAvailable, with both s.Result and s.ReadAll. Indeed, each accented character appears as three bytes: the base character followed by two question marks.

Then I did a bit of research on the Internet and found several contributions about the phenomenon, reported for several different platforms such as Python and Perl. No solution, though.

Finally, I sent the result of the shell command to a text file with

s.execute( "cd /Users/Mitch/Desktop ; ls -al > result.txt" )

Now comes the interesting part: a text-only editor such as TextWrangler shows the same as Xojo, a letter followed by question marks. That seems logical, but the same file opened in TextEdit shows the correct accented characters. Remember that the file was generated by the Unix command line and Xojo had no part in it.

I tried applying various ConvertEncoding calls to it, without success.

Conclusion: somehow, in the shell output, each accented character is coded as three bytes, and the output does contain the correct information. The question marks in fact stand for different byte values. Viewing the text output in a hex editor shows that, for instance, é is represented by 65 CC 81 and ç by 63 CC A7, that is, the base letter followed by a two-byte combining accent (decomposed UTF-8).

So an output to file contains the proper accents. But not so for shell.result or shell.ReadAll, where the question marks are real question marks.

The workaround is therefore simple:

  • Send the output to a text file like I posted above
  • Open the text file and load it with a series of replacements such as:

t = t.ReplaceAll(chr(&h65)+chr(&hCC)+chr(&h81),"é")

You will need to identify the byte sequence of each accented character in a hex editor, but this fixes the problem.
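
The byte sequences found in the hex editor are the UTF-8 encoding of decomposed characters: a base letter followed by a combining mark (U+0301 for the acute accent, U+0327 for the cedilla). A general alternative to listing one replacement per character is Unicode normalization to the composed form (NFC). The sketch below uses Python purely as an illustration; it is not Xojo code, and whether a given Xojo version offers an equivalent call is not something this thread establishes.

```python
import unicodedata

# Bytes as seen in the hex editor: 65 CC 81 and 63 CC A7.
raw = bytes.fromhex("65 CC 81") + bytes.fromhex("63 CC A7")

# Decoded as UTF-8: e, U+0301 (combining acute), c, U+0327 (combining cedilla).
decomposed = raw.decode("utf-8")

# NFC merges each base letter with its combining mark into one code point.
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301', 'U+0063', 'U+0327']
print([f"U+{ord(ch):04X}" for ch in composed])    # ['U+00E9', 'U+00E7']
```

One normalization pass handles every accented letter at once, so no per-character table has to be built from the hex dump.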

What if you use a TextInputStream to read the written file and apply the encoding there?

// Knowing FolderItem "f"
Dim TIS As TextInputStream
TIS = TextInputStream.Open(f) // assume it's ok
Dim theString As String = ConvertEncoding(TIS.ReadAll, Encodings.UTF16)
TIS.Close
TextArea1.SelText = theString

[quote=103659:@Tim Jones]What if you use a TextInputStream to read the written file and apply the encoding there?

// Knowing FolderItem "f"
Dim TIS As TextInputStream
TIS = TextInputStream.Open(f) // assume it's ok
Dim theString As String = ConvertEncoding(TIS.ReadAll, Encodings.UTF16)
TIS.Close
TextArea1.SelText = theString[/quote]

It is a lot more vicious than that :(

Not only does ConvertEncoding not work when loading the file content, but &hCC and &h81 get converted to &u00C3 and &u00C5, so the ReplaceAll I posted above does not work. On top of that, Xojo does not understand “é” correctly and makes it eÍ. So much for generalized internal UTF-8. The proper code is:

te = te.ReplaceAll("e"+&u00C3+&u00C5,&u00E9)

I will continue with a binary stream, but for the time being this approach works, if a bit cumbersome.
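
The code points &u00C3 and &u00C5 are what the bytes CC and 81 become when they are read as MacRoman instead of UTF-8. That the file content was interpreted as MacRoman is an inference from those two values, not something stated in this thread; the Python sketch below (illustrative only) shows the mapping.

```python
raw = bytes.fromhex("65 CC 81")  # decomposed "é" as UTF-8 bytes

as_mac_roman = raw.decode("mac_roman")  # wrong label: one character per byte
as_utf8 = raw.decode("utf-8")           # right label: "e" + combining acute

print([f"U+{ord(ch):04X}" for ch in as_mac_roman])  # ['U+0065', 'U+00C3', 'U+00C5']
print([f"U+{ord(ch):04X}" for ch in as_utf8])       # ['U+0065', 'U+0301']
```

If that inference holds, reading the file as UTF-8 from the start avoids the replacement step entirely.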

Got it :)

I regret the OP did not post his code, but I have worked with DataAvailable, and used a shell file as outlined just above.

Here is what to do: call the shell twice, first to create the file, then to trigger DataAvailable. Then use this:

[code]Sub DataAvailable()
  // Read and discard the shell output, just to empty the buffer
  Dim rien As String
  rien = Me.Result

  // Load the file written by the first shell call as raw bytes
  Dim readFile As FolderItem = GetFolderItem("/Users/Mitch/Desktop/text.txt", FolderItem.PathTypeShell)
  If readFile <> Nil Then
    Dim readStream As BinaryStream = BinaryStream.Open(readFile, False)
    readStream.LittleEndian = True
    // Read the whole file and define its content as UTF-8
    TextArea1.Text = readStream.Read(readStream.Length, Encodings.UTF8)
    readStream.Close
  End If
End Sub
[/code]

All accented characters show :)

ConvertEncoding is the wrong command. It will further mangle the string. Use DefineEncoding instead.

As you can see, it is no longer needed.

That’s what I get for typing from memory … DefineEncoding is what I meant.
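
The distinction in Python terms, as a rough analogy rather than a description of Xojo internals: defining an encoding only states how existing bytes should be read, while converting rewrites the bytes. Converting bytes that carry the wrong label bakes the misreading in. The MacRoman label below is an assumption chosen for the example.

```python
raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9', the UTF-8 bytes for a precomposed "é"

# "Define": attach the correct label to the bytes; nothing is rewritten.
defined = raw.decode("utf-8")

# "Convert" from a wrong label: read as MacRoman, then re-encode as UTF-8.
converted = raw.decode("mac_roman").encode("utf-8")

print(defined)         # é
print(len(raw))        # 2
print(len(converted))  # 5, the bytes were rewritten and the text is mangled
```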