Is WString broken?

I never use WString, but based on another conversation here, I had occasion to try it. And I don’t get it.

Based on my understanding, this should work:

  dim w as WString = "this is a string"
  dim s as string = w

But what I get in both w and s is just trash with only a mild resemblance to my original string. Where have I gone wrong?

Executing the code above using RS2012r1.2, I get 2 strings with the exact same value. Both values are stored as null-terminated UTF-16 strings. According to the docs, that is what I would expect.

For strings, the doc says:

So your first statement converts and stores a UTF-8 string into a UTF-16 WString. The second statement assigns the UTF-16 WString to a String datatype which can handle either UTF-8 or UTF-16. Since the source was UTF-16, the target (in this case) is UTF-16.

At least that is my understanding.

As far as I can tell, WString is broken; I tried it using RS 2012 r1.2, 2.1, and Xojo 2014r2 running on Mac OS 10.9.4.

Thanks for the confirmation. I’ll file a FR.

I don’t think it’s broken. I think it’s working as it was designed. Were you expecting the wstring to be converted to UTF-8 in the second statement? My assumption, which appears to be correct, is that the second statement merely moves the UTF-16 string byte-for-byte into the “s” variable. The encoding of the “s” variable after the assignment is UTF-16 in the debugger. That appears to be correct.

There is no interpretation of the values I get back that could be considered correct or by design. However, it turns out that a FR is… shall we say, unnecessary at this time.

I made one slight error in my first post: I said the 2 vars have the “exact” same value. They have the exact same value except that the “s” var (String) is not null-terminated like the “w” var (WString). Other than that, the 2 variables are the same. The null terminator of the wstring was stripped when it was assigned to the string.

According to the WString documentation, when a String is assigned to a WString variable, the String should be converted to UTF-16 and a terminator added. This is clearly not happening.

When would you ever use code like this? Or is it just an academic exercise?

Why unnecessary?

Should it not be done?

It clearly did happen. If you stop the debugger after the string literal is assigned to the wstring, you will see a null terminator added (if you look at its binary representation). The length of the wstring is 17 bytes. The length of the string literal and the “s” var is 16 bytes. What are you looking at?

It should never be necessary.

That is the problem. It should be 18 bytes with 2 zero bytes at the end. One zero is invalid. You’ve just been lucky that the next byte in memory is also zero. If it weren’t, you’d get garbage.
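To make the point about the terminator concrete, here is a small Python sketch (Python is used only to poke at raw bytes; it is not Xojo code, and the helper name `read_utf16z` is made up for this example). It scans a buffer two bytes at a time for a 0x0000 code unit, which is roughly what any consumer of a null-terminated UTF-16 string has to do. The "leftover" bytes in the last case are invented sample data.

```python
def read_utf16z(buf: bytes) -> str:
    """Decode UTF-16LE text up to the first 0x0000 code unit (hypothetical helper)."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return buf[:i].decode("utf-16-le")
    raise ValueError("no 2-byte terminator found in buffer")

text = "this is a string".encode("utf-16-le")  # 32 bytes, no terminator yet

# Proper 2-byte terminator.
good = text + b"\x00\x00"
# 1-byte terminator, and the next byte in memory happens to be zero.
lucky = text + b"\x00" + b"\x00"
# 1-byte terminator followed by made-up leftover data.
unlucky = text + b"\x00" + b"\x41\x42\x00\x00\x00"

print(read_utf16z(good))
print(read_utf16z(lucky))
print(repr(read_utf16z(unlucky)))
```

The first two buffers are byte-for-byte identical, so they read back cleanly. In the third, the lone zero byte gets paired with the following byte and the scan runs on into the leftover data until it finds a real 0x0000.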

Ah, I see now. Thanks Tim.

That said, it is working correctly on Windows 7 64-bit, Xojo 2014r2.1. I get the correct 2-byte terminator.

What happens when someone enters Chinese UTF-16 in a textfield? Does the Text property automatically switch from UTF-8 encoding to UTF-16?

This here in RS2012 1.2 gives kind of a strange result:

  dim w as WString
  w = "Hello World"
  dim s as string = w
  msgbox w+" "+s+" "+ConvertEncoding(w,encodings.utf8)

Result : HloWrd??? HloWrd??? HloWrd???

I am not surprised by the first two (w and s), but was expecting the last one to be converted.

With Xojo 2014 R2.1 : HloWrd ?? HloWrd ?? HloWrd ??

I even tried GetTextConverter with the same result. So it seems the string gets messed up when it is assigned to w.
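One observation about that output, sketched in Python purely for illustration: the readable part of the result, "HloWrd", is exactly every second character of "Hello World". That is the pattern you would get if the 8-bit bytes were grouped in pairs somewhere along the way, but that is only a guess about the cause and not something confirmed here.

```python
original = "Hello World"

# Take characters 0, 2, 4, ... of the original literal.
every_second = original[::2]
print(every_second)

# One (unconfirmed) way to get that pattern: treat the 8-bit bytes as 16-bit
# little-endian units, then keep only the low byte of each unit.
raw = original.encode("ascii")
if len(raw) % 2:
    raw += b"\x00"  # pad to a whole number of 2-byte units
low_bytes = bytes(raw[i] for i in range(0, len(raw), 2))
print(low_bytes.decode("ascii"))
```

Both lines print the same six letters as the reported result, minus the trailing question marks.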

You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.

It is working correctly in Win 7 rs2012r2.1 and Xojo2014r2.1.

[quote=128434:@Randy Baskin]You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.

It is working correctly in Win 7 rs2012r1.2 and Xojo2014r2.1.[/quote]

Just to be perfectly clear, w.len should not equal 18. It should be 17, as I said. It should be the number of characters in the original literal string (16) + 1 additional null-terminator (0x0000) character. Yes, the null terminator is 2 bytes, but it is only interpreted as one UTF-16 character. s.len is, and should be, 16. That is because the null terminator is stripped when assigning the wstring to the string.
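The numbers in this discussion are easier to keep apart when characters (UTF-16 code units) and bytes are counted separately. A quick Python sketch, used here only as a calculator and not as a statement about what the Xojo debugger reports:

```python
literal = "this is a string"

units = len(literal)                    # characters in the literal
encoded = literal.encode("utf-16-le")   # 2 bytes per character for this text
terminated = encoded + b"\x00\x00"      # one 0x0000 terminator = 2 bytes

print(units)                 # 16
print(len(encoded))          # 32
print(len(terminated) // 2)  # 17 code units including the terminator
print(len(terminated))       # 34 bytes including the terminator
```

So 17 only makes sense as a count of 2-byte units, and a correctly terminated buffer ends in two zero bytes either way.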

There was no luck needed or involved.

For me academic. But if I (or anyone) needed to call an external function that expected or returned such a value, this would come into play. That’s the conversation that sparked my curiosity.
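For the external-function case, the caller has to hand over a buffer that ends in a full wide-character terminator. As a rough illustration in Python (not Xojo), the standard `ctypes` module builds such a buffer. Note that the width of `wchar_t` is platform-dependent (typically 2 bytes on Windows and 4 on macOS and Linux); I mention that as general background, not as an explanation of the behavior reported above.

```python
import ctypes

# Wide-character buffer holding the text plus its terminator.
buf = ctypes.create_unicode_buffer("this is a string")

print(len(buf))                        # 17: 16 characters plus the terminator
print(ctypes.sizeof(ctypes.c_wchar))   # width of wchar_t on this platform
print(ctypes.sizeof(buf))              # 17 * width of wchar_t
print(buf.value)                       # text up to the terminator
```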