Is WString broken?

Kem_Tekinay · September 10, 2014, 4:29pm

I never use WString, but based on another conversation here, I had occasion to try it. And I don’t get it.

Based on my understanding, this should work:

  dim w as WString = "this is a string"
  dim s as string = w

But what I get in both w and s is just trash with only a mild resemblance to my original string. Where have I gone wrong?

Randy_Baskin · September 10, 2014, 5:51pm

Executing the code above using RS2012r1.2, I get 2 strings with the exact same value. Both values are stored as null-terminated UTF-16 strings. According to the docs, that is what I would expect.

For strings, the doc says:

So your first statement converts and stores a UTF-8 string into a UTF-16 WString. The second statements assigns the UTF-16 WString to a String datatype which can handle either UTF-8 or UTF-16. Since the source was UTF-16, the target (in this case) is UTF-16.

At least that is my understanding.

Charles_Yeomans · September 10, 2014, 6:20pm

As far as I can tell, WString is broken; I tried it using RS 2012 r1.2, 2.1, and Xojo 2014r2 running on Mac OS 10.9.4.

Kem_Tekinay · September 10, 2014, 6:34pm

Thanks for the confirmation. I’ll file a FR.

Randy_Baskin · September 10, 2014, 6:43pm

I don’t think it’s broken. I think it’s working as it was designed. Were you expecting the wstring to be converted to UTF-8 in the second statement? My assumption, which appears to be correct, is that the second statement merely moves the UTF-16 string byte-for-byte into the “s” variable. The encoding of the “s” variable after the assignment is UTF-16 in the debugger. That appears to be correct.

Kem_Tekinay · September 10, 2014, 6:48pm

There is no reading of the values I get back that could be considered correct or by design. However, it turns out that a FR is… shall we say, unnecessary at this time.

Randy_Baskin · September 10, 2014, 7:09pm

I made one slight error in my first post, and that was saying that the 2 vars have the “exact” same value. They have the exact same value except the “s” var (String) is not null-terminated like the “w” var (WString). Other than that, the 2 variables are the same. The null-terminator of the wstring was stripped when assigned to the string.

Charles_Yeomans · September 10, 2014, 7:20pm

According to the WString documentation, when a String is assigned to a WString variable, the String should be converted to UTF-16 and a terminator added. This is clearly not happening.

Tim_Hare · September 10, 2014, 7:21pm

When would you ever use code like this? Or is it just an academic exercise?

Michel_Bujardet · September 10, 2014, 7:24pm

Why unnecessary?

Should it not be done?

Randy_Baskin · September 10, 2014, 7:24pm

It clearly did happen. If you stop the debugger after the string literal is assigned to the wstring, you will see a null terminator added (if you look at its binary representation). The length of the wstring is 17 bytes. The length of the string literal and the “s” var is 16 bytes. What are you looking at?

Tim_Hare · September 10, 2014, 7:27pm

It should never be necessary.

Tim_Hare · September 10, 2014, 7:29pm

That is the problem. It should be 18 bytes with 2 zero bytes at the end. One zero is invalid. You’ve just been lucky that the next byte in memory is also zero. If it weren’t, you’d get garbage.
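The point about a single zero byte can be sketched outside Xojo. This is a minimal Python illustration, not Xojo's actual implementation: the hypothetical reader `read_utf16_until_terminator` walks a buffer two bytes at a time and stops only at a full 0x0000 unit. The buffers, including the "unrelated memory" bytes, are made up for the example.

```python
def read_utf16_until_terminator(memory: bytes) -> str:
    """Collect 2-byte little-endian code units until a 0x0000 unit is found.

    Hypothetical reader for illustration only (BMP characters only).
    """
    units = []
    for offset in range(0, len(memory) - 1, 2):
        unit = int.from_bytes(memory[offset:offset + 2], "little")
        if unit == 0:
            break
        units.append(unit)
    return "".join(chr(u) for u in units)


text = "hi".encode("utf-16-le")  # 4 bytes: 68 00 69 00

# Proper 2-byte terminator, followed by made-up unrelated memory.
good = read_utf16_until_terminator(text + b"\x00\x00" + b"\x41\x00")

# Only one zero byte; the next byte happens to be zero too (the "lucky" case).
lucky = read_utf16_until_terminator(text + b"\x00" + b"\x00")

# Only one zero byte; the next byte is non-zero, so the reader runs on.
bad = read_utf16_until_terminator(text + b"\x00" + b"\x41\x00\x00\x00")

print(repr(good), repr(lucky), repr(bad))
```

In the last case the lone zero byte pairs with the following 0x41 to form the unit 0x4100, so one stray character is appended before a real terminator is found.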

Randy_Baskin · September 10, 2014, 7:31pm

Ah, I see now. Thanks Tim.

Tim_Hare · September 10, 2014, 7:34pm

That said, it is working correctly on Windows 7 64-bit, Xojo 2014r2.1. I get the correct 2-byte terminator.

Michel_Bujardet · September 10, 2014, 7:37pm

What happens when someone enters Chinese UTF-16 in a textfield? Does the Text property automatically switch from UTF-8 encoding to UTF-16?

This here in RS2012 1.2 gives kind of a strange result:

  dim w as WString
  w = "Hello World"
  dim s as string = w
  msgbox w+" "+s+" "+ConvertEncoding(w,encodings.utf8)

Result : HloWrd??? HloWrd??? HloWrd???

I am not surprised by the first two (w and s), but was expecting the last one to be converted.

With Xojo 2014 R2.1 : HloWrd ?? HloWrd ?? HloWrd ??

I even tried GetTextConverter with the same result. So it seems the string gets messed up upon the assignment of the string to w.
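"HloWrd" is exactly every second character of "Hello World". The following Python sketch reproduces that pattern under an assumption that nothing in this thread confirms: that correctly encoded 2-byte UTF-16 data is read back in 4-byte steps, keeping only the low 16 bits of each step. The function name and the reading strategy are hypothetical and are not taken from Xojo.

```python
import struct


def read_in_4_byte_steps(data: bytes) -> str:
    """Hypothetical reader: steps 4 bytes at a time and keeps the low 16 bits.

    This is a guess at a mechanism, not Xojo's actual code.
    """
    chars = []
    for offset in range(0, len(data) - 3, 4):
        (unit,) = struct.unpack_from("<I", data, offset)
        if unit == 0:  # would only stop on four zero bytes in a row
            break
        chars.append(chr(unit & 0xFFFF))
    return "".join(chars)


# "Hello World" as UTF-16LE with a proper 2-byte terminator (24 bytes).
utf16 = "Hello World".encode("utf-16-le") + b"\x00\x00"

result = read_in_4_byte_steps(utf16)
every_second = "Hello World"[::2]
print(result, every_second)
```

Under this assumption the 2-byte terminator is swallowed as the upper half of the last 4-byte step, so a real reader would carry on into whatever memory follows, which would be consistent with the trailing "???".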

Randy_Baskin · September 10, 2014, 7:40pm

You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.

It is working correctly in Win 7 rs2012r2.1 and Xojo2014r2.1.

Randy_Baskin · September 10, 2014, 7:42pm

[quote=128434:@Randy Baskin]You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.

It is working correctly in Win 7 rs2012r1.2 and Xojo2014r2.1.[/quote]

Randy_Baskin · September 10, 2014, 8:06pm

Just to be perfectly clear, w.len should not equal 18. It should be 17, as I said. It should be the number of characters in the original literal string (16) + 1 additional null-terminator (0x0000) character. Yes, the null-terminator is 2 bytes, but it is only interpreted as one UTF-16 character. s.len is, and should be, 16. That is because the null-terminator is stripped when assigning the wstring to the string.

There was no luck needed or involved.
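The character and byte counts being argued over can be checked independently of Xojo. A small Python sketch, assuming little-endian UTF-16 and a 2-byte 0x0000 terminator; it says nothing about what a given Xojo version actually stores:

```python
literal = "this is a string"

utf8 = literal.encode("utf-8")
utf16 = literal.encode("utf-16-le")   # no byte-order mark
terminated = utf16 + b"\x00\x00"      # assumed 2-byte terminator

print(len(literal))           # characters in the literal: 16
print(len(utf8))              # bytes as UTF-8: 16
print(len(utf16))             # bytes as UTF-16: 32
print(len(terminated))        # bytes as UTF-16 plus terminator: 34
print(len(terminated) // 2)   # 2-byte units including the terminator: 17
```

So 16 and 17 are counts of characters (2-byte units), while the byte counts for the same data are 32 and 34.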

Kem_Tekinay · September 10, 2014, 8:35pm

For me academic. But if I (or anyone) needed to call an external function that expected or returned such a value, this would come into play. That’s the conversation that sparked my curiosity.