encoding 32-bit vs. 64-bit

(This is all on El Capitan in case this is a Mac-specific issue.)

I tried building an old app in 64-bit and immediately ran into issues which I eventually narrowed down to some bad characters being read in from a text file. So I came up with a little test to see what’s going on. I have a text file containing the following:

I’d
I’ll
I’m
I’ve

BBEdit tells me this is a UTF-8 file. I read the contents of the file, split it on EndOfLine, and then write each member of the array out to another text file. My code:

[code] dim f as FolderItem
dim t as TextInputStream
dim o as TextOutputStream
dim s, x() as string
dim i as integer

f = SpecialFolder.Desktop.Child("test_input.txt")
t = TextInputStream.Open(f)
t.Encoding = Encodings.utf8
s = t.ReadAll
t.Close

s = ReplaceLineEndings(s, endofline)
s = trim(s)

f = SpecialFolder.Desktop.Child("test_output.txt")

if not f.Exists then
o = TextOutputStream.Create(f)
else
o = TextOutputStream.Append(f)
end if

x = s.Split(EndOfLine)

for i = 0 to x.Ubound

#If Target32Bit then
  
  o.Write("32-bit: " + x(i) + EndOfLine)
  
#elseif Target64Bit then
  
  o.Write("64-bit: " + x(i) + EndOfLine)
  
#endif

next

o.Close()
[/code]

If I run this in 32-bit, I get a file (which BBEdit says is UTF-8) with the following as expected:

32-bit: I’d
32-bit: I’ll
32-bit: I’m
32-bit: I’ve

If I switch to 64-bit, build, and run the app, I get this in the file, which BBEdit now reports as Western (Mac OS Roman):

32-bit: I‚Äôd 32-bit: I‚Äôll 32-bit: I‚Äôm 32-bit: I‚Äôve 64-bit: I‚Ä 64-bit: d I‚ 64-bit: ôll 64-bit: I‚Äôm I‚Äôve

I tried changing the 64-bit write line to:

o.Write(ConvertEncoding("64-bit: " + x(i) + EndOfLine, Encodings.UTF8))

But I get the same results. Is this a known issue, a new bug, or am I handling the encoding wrong?
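
For what it's worth, the garbage in the 64-bit output is consistent with UTF-8 bytes being displayed as Mac OS Roman, which matches what BBEdit reports. Here is a minimal sketch in Python (not Xojo, just a quick way to look at the bytes). The variable names are mine, and the assumption is that the file on disk holds UTF-8 bytes:

```python
# Assumption: the output file holds UTF-8 bytes that the editor
# decoded as Mac OS Roman instead of UTF-8.
original = "I\u2019d"  # I’d, using U+2019 RIGHT SINGLE QUOTATION MARK

utf8_bytes = original.encode("utf-8")     # the apostrophe becomes 3 bytes: E2 80 99
garbled = utf8_bytes.decode("mac_roman")  # each byte is shown as one Mac OS Roman character

print(garbled)  # I‚Äôd

# Reversing the two steps recovers the original text.
recovered = garbled.encode("mac_roman").decode("utf-8")
print(recovered)  # I’d
```

The three characters that replace the apostrophe are the three bytes of its UTF-8 encoding. The shorter fragments in the 64-bit lines look like what is left when that three-byte sequence is cut in the middle, which would also explain why BBEdit no longer recognizes the file as valid UTF-8.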

This looks like a problem with Split in 64-bit. Switching to Text.Split seems to work, though:

  Dim f As FolderItem
  Dim t As TextInputStream
  Dim o As TextOutputStream
  Dim s As String
  Dim i As Integer
  
  f = SpecialFolder.Desktop.Child("test_input.txt")
  t = TextInputStream.Open(f)
  t.Encoding = Encodings.utf8
  s = t.ReadAll
  t.Close
  
  s = ReplaceLineEndings(s, EndOfLine)
  s = Trim(s)
  
  f = SpecialFolder.Desktop.Child("test_output.txt")
  
  If Not f.Exists Then
    o = TextOutputStream.Create(f)
  Else
    o = TextOutputStream.Append(f)
  End If
  
  // New code below uses Text type
  Dim txt As Text = s.ToText
  Dim tx() As Text
  tx = txt.Split(EndOfLine.Unix.ToText)
  
  For i = 0 To tx.Ubound
    #If Target32Bit Then
      Const kBit = "32-bit"
    #ElseIf Target64Bit Then
      Const kBit = "64-bit"
    #EndIf
    
    o.Write(kBit + ": " + tx(i) + EndOfLine)
    
  Next
  
  o.Close

Yes, that seems to work very well. Thank you.

Can we hope that such issues will all get fixed sooner rather than later? I don’t want to have to search for and work around such issues in a huge project, so if there’s a good chance a bug-fix update will fix them soon, I’ll hold out.

I also have a user who has a problem with DrawString in 64-bit with German letters. It’s not clear, though, whether the text is coming from a database where the encoding info is missing. My first guess is that 64-bit does a worse job of guessing the encoding, but I don’t know for sure.

Have you tried using SplitB?
In another post I pointed out that I solved this kind of problem by switching from Split to SplitB.
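
On why a byte-wise split can be safe for this kind of file: in UTF-8, every byte of a multi-byte character is 0x80 or higher, so an ASCII line ending can never appear inside one. A minimal Python sketch of the idea follows. It is an analogy only: `bytes.split` stands in for SplitB, and I am assuming the file is UTF-8 and the line ending is a single LF.

```python
# Assumption: the file is UTF-8 and lines end with a single LF byte (0x0A).
data = "I\u2019d\nI\u2019ll\nI\u2019m\nI\u2019ve".encode("utf-8")

# Split on the raw delimiter byte, then decode each piece as UTF-8.
pieces = [chunk.decode("utf-8") for chunk in data.split(b"\n")]

for piece in pieces:
    print("64-bit: " + piece)

# Every byte of the encoded apostrophe is >= 0x80, so it cannot
# collide with the ASCII delimiter.
apostrophe_bytes = "\u2019".encode("utf-8")
print(all(b >= 0x80 for b in apostrophe_bytes))
```

Each piece decodes cleanly because the split only ever lands on a line-ending byte, never in the middle of a character.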