other languages.

These two references might be of use:
http://tools.ietf.org/html/rfc1557
and http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT

Nice! Thanks.

So because this isn’t strictly Unicode, am I obliged to know the escape sequences of all ISO character sets? Or is this something that is used in Unicode as well?

Unicode is a whole set of encodings.
Specific encodings may or may not use escapes like this.
When you know the encoding of the incoming data, and that encoding IS one that Xojo already knows, you can just use it as is.
However, the Classic Xojo framework doesn’t support every encoding there is.

The new framework supports more, but even then this data doesn’t seem to match the encoding listed as ISO-2022-KR at http://www.iana.org/assignments/character-sets/character-sets.xhtml
I tried some code with the new framework and it doesn’t decode properly either:

  dim myMemoryBlock as new xojo.Core.MutableMemoryBlock(13)
  
  myMemoryBlock.UInt8Value(0) = &h1B   // ESC
  myMemoryBlock.UInt8Value(1) = &h24   // $
  myMemoryBlock.UInt8Value(2) = &h29   // )
  myMemoryBlock.UInt8Value(3) = &h43   // C
  myMemoryBlock.UInt8Value(4) = &hB1   // DBCS lead marker
  myMemoryBlock.UInt8Value(5) = &hE8
  myMemoryBlock.UInt8Value(6) = &hC8
  myMemoryBlock.UInt8Value(7) = &hF1
  myMemoryBlock.UInt8Value(8) = &hC1
  myMemoryBlock.UInt8Value(9) = &hDF
  myMemoryBlock.UInt8Value(10) = &h1B
  myMemoryBlock.UInt8Value(11) = &h28  // (
  myMemoryBlock.UInt8Value(12) = &h42  // B
  
  dim te as xojo.Core.TextEncoding
  
  te = xojo.Core.TextEncoding.FromIANAName("ISO-2022-KR")
  
  me.text = te.ConvertDataToText( myMemoryBlock )

You get very wrong-looking data.

This one seems VERY … out in left field?
I don’t know how else to describe it.

Thanks for the input Norman… very intuitive.

I really don’t know Korean or DICOM, so I’m guessing here.
I don’t know whether the lead-in bytes (the first 4) would be the same in many encodings, or whether DICOM uses them to tell you what encoding the data is in.
Nor do I know whether the trailing bytes are some kind of ending sequence.

BUT

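  dim myMemoryBlock as new MemoryBlock(13)
  
  // Only the six Korean bytes are set; the lead-in (offsets 0-3) and trailing (offsets 10-12) bytes are left as zero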
  
  myMemoryBlock.UInt8Value(4) = &hB1
  myMemoryBlock.UInt8Value(5) = &hE8
  myMemoryBlock.UInt8Value(6) = &hC8
  myMemoryBlock.UInt8Value(7) = &hF1
  myMemoryBlock.UInt8Value(8) = &hC1
  myMemoryBlock.UInt8Value(9) = &hDF
  
  Label1.text = DefineEncoding(myMemoryBlock.StringValue(0, 13), Encodings.DOSKorean)

does give the right character sequence, so working out how to handle the lead-in and trailing bytes might make this approach suitable.
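To make the "handle the lead-in and trailing bytes" idea concrete, here is a minimal sketch in Python rather than Xojo, purely for illustration. It assumes the bytes between the two escape sequences are KS X 1001 in its 8-bit form, which Python's `euc_kr` codec covers; that assumption comes from the sample above, not from the DICOM spec. `decode_dicom_korean` is a made-up helper name.

```python
# The two escape sequences seen in the sample data above.
ESC_KOREAN = b"\x1b$)C"  # ESC $ ) C : lead-in before the Korean bytes
ESC_ASCII = b"\x1b(B"    # ESC ( B   : trailing sequence after them


def decode_dicom_korean(raw: bytes) -> str:
    """Strip the two known escape sequences and decode the rest as EUC-KR.

    Assumption: everything left after stripping is either plain ASCII or
    8-bit KS X 1001 byte pairs, so a single EUC-KR decode is enough.
    """
    stripped = raw.replace(ESC_KOREAN, b"").replace(ESC_ASCII, b"")
    return stripped.decode("euc_kr")


# Same 13 bytes as in the MutableMemoryBlock example.
sample = bytes([0x1B, 0x24, 0x29, 0x43,
                0xB1, 0xE8, 0xC8, 0xF1, 0xC1, 0xDF,
                0x1B, 0x28, 0x42])
name = decode_dicom_korean(sample)
print(len(name), name)
```

This only knows about these two escapes. Data that designates other character sets would need its own handling.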

I know there are others working with DICOM images (search for DICOM in the forums search field at the top right and you’ll find other posts)

In DICOM you get many types of character sets … Java seems to know them, but I’m having trouble with Xojo.
But then I’m an old-school ASCII guy… 8 bits is ‘extreme’ text for me. Unicode sounded to me like the solution, but it probably came along after DICOM did.

Yeah, DICOM has largely avoided using Unicode.

I presume you use this spec http://medical.nema.org/dicom/2007/07_05pu.pdf

Yes… this is the spec I’m working from and some sample data I have from dicom.nema.org

Just a wild stab in the dark: 4 bytes could be a UInt32 or Int32, so could it be an OEM code page specifier? I only mention it because I recently had to deal with OEM code pages. Depressing thought.

It turns out this particular encoding switches between 8-bit and 16-bit characters. The escape sequences tell us whether the bytes that follow are 16-bit or 8-bit encoded.
Norman nailed it.
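For anyone landing here later, that switching can be sketched as a small scanner. This is Python for illustration only, not Xojo, and it is not a full ISO 2022 implementation. The escape table is hypothetical and holds only the two sequences from this thread; mapping `ESC $ ) C` to Python's `euc_kr` codec is an assumption based on the sample bytes. `split_runs` and `decode_runs` are made-up names.

```python
ESC = 0x1B

# Hypothetical table: only the two escape sequences seen in this thread.
KNOWN_ESCAPES = {
    b"\x1b$)C": "euc_kr",  # 16-bit Korean characters follow (assumed 8-bit KS X 1001)
    b"\x1b(B": "ascii",    # back to 8-bit ASCII
}


def split_runs(raw: bytes):
    """Split raw bytes into (codec, chunk) runs at each known escape sequence."""
    runs = []
    codec = "ascii"  # assume ASCII until an escape says otherwise
    start = 0
    i = 0
    while i < len(raw):
        if raw[i] == ESC:
            for seq, name in KNOWN_ESCAPES.items():
                if raw.startswith(seq, i):
                    if i > start:
                        runs.append((codec, raw[start:i]))
                    codec = name
                    i += len(seq)
                    start = i
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
        else:
            i += 1
    if start < len(raw):
        runs.append((codec, raw[start:]))
    return runs


def decode_runs(raw: bytes) -> str:
    """Decode each run with the codec selected by the preceding escape."""
    return "".join(chunk.decode(codec) for codec, chunk in split_runs(raw))


sample = bytes([0x1B, 0x24, 0x29, 0x43,
                0xB1, 0xE8, 0xC8, 0xF1, 0xC1, 0xDF,
                0x1B, 0x28, 0x42])
print(split_runs(sample))
print(decode_runs(sample))
```

Raising on an unknown escape is deliberate: silently skipping it would decode the following bytes with the wrong character set.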