String.ToText Encodings Problem

Hi,

I am using a command line tool via a Shell instance to return (as XML) the metadata of a medical imaging file. Pseudo-code:

[code]dim s as new Shell
dim result as String

' ToolFolderItem and myFile are defined elsewhere and represent the location of the
' enclosing directory of the command line tool and the file that the tool takes as input.
' The tool returns an XML dump.

s.Execute( ToolFolderItem.NativePath + "/theTool " + myFile.NativePath )

result = s.Result[/code]

At this point, the String result correctly contains the XML I want. However, I am trying to convert result to the Text datatype with the following:

dim resultText as Text
resultText = result.ToText

but this throws a RuntimeException saying that “The data could not be converted to text with this encoding”. However, the debugger is saying that the Encoding type of result is UTF-8.

I could just manipulate result as a String but I’m trying to use the new Text datatype wherever possible because it has nice convenience methods. On a related note, it’ll be nice when Xojo add the Shell class to the new Xojo framework…

What do you get from Encodings.UTF8.IsValid( result ) ?

Using Encodings.UTF8.IsValid( result ) I get False.

I’ve tried converting result to UTF-8 but that doesn’t seem to work either…

What’s annoying me is I can display result (which is a String) in a TextField fine with seemingly no funny characters but I just can’t convert it into the Text datatype.

False means, no matter what else the debugger says, it’s not really valid UTF-8 and that’s why it can’t be converted to Text.

Try:

result = result.DefineEncoding( Encodings.SystemDefault )
t = result.ToText

Hmmm. That causes a BadDataException with the message: “The string’s encoding is not supported in this conversion”

Is the output something you can share? If so, save it to a file and let’s take a look.

A binary file, btw.
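For anyone following along, a minimal sketch of saving the output as a binary file. BinaryStream writes the String's bytes as they are, with no encoding conversion on the way out. The file name and the Desktop location are placeholders, and result is the String from the first post.

[code]
// Assumption: "shell-output.bin" on the Desktop is only a placeholder location
Dim f As FolderItem = SpecialFolder.Desktop.Child("shell-output.bin")

// True = overwrite the file if it already exists
Dim bs As BinaryStream = BinaryStream.Create(f, True)

// Writes the bytes exactly as the Shell returned them
bs.Write(result)
bs.Close
[/code]

A hex editor can then show whatever bytes are in there that a TextField quietly hides.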

String doesn’t actually do any validity checking when you call DefineEncoding :(.
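A small sketch of what that means in practice, using two made-up bytes (&hFF and &hFE) that can never appear in UTF-8. DefineEncoding only changes the label on the String, so the debugger reports UTF-8 even though the bytes are not.

[code]
// Two illustrative bytes that are never legal in UTF-8
Dim s As String = ChrB(&hFF) + ChrB(&hFE)

// Only relabels the bytes; nothing is checked or converted
s = DefineEncoding(s, Encodings.UTF8)

// The label is what the debugger displays
System.DebugLog("Label: " + s.Encoding.InternetName)

// The bytes themselves still fail validation
If Not Encodings.UTF8.IsValidData(s) Then
  System.DebugLog("Labelled UTF-8, but the bytes are not valid UTF-8")
End If
[/code]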

That’s very kind. I’ll have to do it in a bit - I’ll need to dig out an anonymised image with all the patient data, etc stripped.

I had similar problems, but only when building for 64-bit and not in debug mode.

I was able to solve it (because my shell output is simple) by calling DefineEncoding with ASCII and then DefineEncoding with UTF-8 right at the start. That seemed to solve it.

When I kept it as a String, I had issues with Split, which somehow doesn’t work well (or is at least very unreliable) with strings in 64-bit. It seems OK at first, but I get weird results when trying to convert it to Text afterwards.

I’m having more issues with encoding, but only in 64-bit.

Huh. I was having similar issues today converting a String to Text. Will have to revisit tomorrow morning.

I don’t know if this is important, but Kem wrote:

[quote=230962:@Kem Tekinay]False means, no matter what else the debugger says, it’s not really valid UTF-8 and that’s why it can’t be converted to Text.

Try:

result = result.DefineEncoding( Encodings.SystemDefault )
t = result.ToText
[/quote]
… meaning to use DefineEncoding.

But it seems you tried it with ConvertEncoding, hence the error message containing “conversion”.

So maybe this would work:

Dim t As Text = result.DefineEncoding(Encodings.UTF8).ToText

[code]
dim s as String // some bytes, obtained elsewhere

if s.Encoding = nil then
  if Encodings.UTF8.IsValidData(s) then
    s = DefineEncoding(s, Encodings.UTF8)
  else
    // some fallback
    s = DefineEncoding(s, Encodings.WindowsLatin1)
  end if
end if
[/code]

I would do like this and use a fallback encoding for 8bit. Could be WindowsLatin1, MacRoman or ISOLatin or something else.
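In the original poster's case the String already carries a UTF-8 label, so a check on s.Encoding = nil would never fire. A hedged variant that validates the bytes regardless of the label might look like the sketch below. The function name is made up for illustration, WindowsLatin1 is only one possible fallback, and I have not verified how ConvertEncoding treats bytes that WindowsLatin1 leaves undefined.

[code]
// Hypothetical helper: the name and the fallback choice are assumptions
Function StringToTextWithFallback(s As String) As Text
  // Check the actual bytes instead of trusting the encoding label
  If Encodings.UTF8.IsValidData(s) Then
    Return DefineEncoding(s, Encodings.UTF8).ToText
  End If

  // Fallback: treat the bytes as 8-bit WindowsLatin1, then convert
  // them to real UTF-8 before asking for Text
  Dim latin As String = DefineEncoding(s, Encodings.WindowsLatin1)
  Return ConvertEncoding(latin, Encodings.UTF8).ToText
End Function
[/code]

If the bytes are really binary data rather than text in some 8-bit encoding, the fallback will at best produce garbage characters, so it only helps when the source is genuinely text.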

Thanks for your help everybody.

Turns out that the contents of some of the XML tags contained raw data bytes (i.e. image data, etc.) that had no encoding at all. I guess that’s why the Text datatype was failing. The workaround was to use the command line tool to exclude any tags that are non-text.

By design. Did you look at the exception message you were getting and was it helpful at all?

Yeah Joe, something like: “The data could not be converted to text with this encoding”. In hindsight, it’s quite clear.

I guess I’d just been spoiled by the flexibility (but inherent danger) of the String class…

Garry - Just a thought here:

I’d think that the tags with binary data would be easily identifiable and could just be skipped when parsing the XML in Xojo? Don’t know your particulars so not sure if it would streamline your process or not. Figured it was at least worth mentioning.

ps - These aren’t DICOM files by any chance, are they? (shudders with bad memories)

You’re right about DICOM files Anthony. After 16 years of medical training, I have never come across something so painful to deal with as that standard. The phrase: “A camel is a horse designed by committee” comes to mind…