Executing the code above using RS2012r1.2, I get 2 strings with the exact same value. Both values are stored as null-terminated UTF-16 strings. According to the docs, that is what I would expect.
For strings, the doc says:
So your first statement converts a UTF-8 string and stores it in a UTF-16 WString. The second statement assigns the UTF-16 WString to a String datatype, which can handle either UTF-8 or UTF-16. Since the source was UTF-16, the target (in this case) is UTF-16.
I don’t think it’s broken. I think it’s working as it was designed. Were you expecting the wstring to be converted to UTF-8 in the second statement? My assumption, which appears to be correct, is that the second statement merely moves the UTF-16 string byte-for-byte into the “s” variable. The encoding of the “s” variable after the assignment is UTF-16 in the debugger. That appears to be correct.
There is no interpretation of the values I get back that could be considered correct or by design. However, it turns out that a FR is… shall we say, unnecessary at this time.
I made one slight error in my first post: I said that the 2 vars have the “exact” same value. They have the exact same value except that the “s” var (String) is not null-terminated like the “w” var (WString). Other than that, the 2 variables are the same. The null-terminator of the wstring was stripped when it was assigned to the string.
According to the WString documentation, when a String is assigned to a WString variable, the String should be converted to UTF-16 and a terminator added. This is clearly not happening.
It clearly did happen. If you stop the debugger after the string literal is assigned to the wstring, you will see a null terminator added (if you look at its binary representation). The length of the wstring is 17 bytes. The length of the string literal and the “s” var is 16 bytes. What are you looking at?
That is the problem. It should be 18 bytes with 2 zero bytes at the end. One zero is invalid. You’ve just been lucky that the next byte in memory is also zero. If it weren’t, you’d get garbage.
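To illustrate the point about terminators, here is a minimal Python sketch (not Xojo) of how a consumer of a null-terminated UTF-16 buffer would typically look for the end of the string: it scans 16-bit code units for 0x0000, so a single zero byte does not terminate anything. The function name and the sample buffers are made up for illustration, and little-endian byte order is assumed.

```python
import struct

def utf16le_terminated_length(buf: bytes) -> int:
    """Return the number of 16-bit code units before the first 0x0000 unit, or -1 if none is found."""
    for i in range(0, len(buf) - 1, 2):
        (unit,) = struct.unpack_from("<H", buf, i)  # read one little-endian 16-bit code unit
        if unit == 0:
            return i // 2
    return -1

text = "abc".encode("utf-16-le")          # hypothetical sample text, 6 bytes

# Proper terminator: two zero bytes, followed by unrelated memory contents.
good = text + b"\x00\x00" + b"\xff\xff"

# Only one zero byte, followed by non-zero bytes: no 0x0000 code unit exists.
bad = text + b"\x00" + b"\xff\xff\xff"

good_len = utf16le_terminated_length(good)
bad_len = utf16le_terminated_length(bad)
print(good_len, bad_len)
```

With the two-byte terminator the scan stops after the three characters; with a single zero byte it runs past the intended end of the string, which is the "garbage" case described above.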
You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.
It is working correctly in Win 7 rs2012r1.2 and Xojo2014r2.1.
[quote=128434:@Randy Baskin]You actually had me doubting myself for a moment. The length of the string IS 17 bytes. In UTF-16 it takes 2 bytes to make one character. I was taking the length from the debugger. What I said was and is still correct.
It is working correctly in Win 7 rs2012r1.2 and Xojo2014r2.1.[/quote]
Just to be perfectly clear, w.len should not equal 18. It should be 17, as I said. It should be the number of characters in the original literal string (16) + 1 additional null-terminator (0x0000) character. Yes the null-terminator is 2 bytes, but it is only interpreted as one UTF-16 character. S.len is, and should be 16. That is because the null-terminator is stripped when assigning the wstring to the string.
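The character and byte counts in this discussion can be checked with a short Python sketch (not Xojo). The 16-character literal below is a stand-in, since the original string is not shown in the thread, and UTF-16 little-endian without a BOM is assumed.

```python
# Hypothetical 16-character literal; the actual string from the thread is not shown.
literal = "0123456789ABCDEF"

# Encode as UTF-16 little-endian (no BOM): 2 bytes per character for this text.
payload = literal.encode("utf-16-le")

# Append one null-terminator code unit (0x0000), which occupies 2 bytes.
terminated = payload + b"\x00\x00"

# Count 16-bit code units, terminator included.
code_units_with_terminator = len(terminated) // 2

print(len(literal))                 # characters in the literal
print(len(payload))                 # bytes without the terminator
print(len(terminated))              # bytes with the terminator
print(code_units_with_terminator)   # code units with the terminator
```

Under these assumptions the literal is 16 characters, the terminated buffer is 17 code units, and the terminator adds two bytes but only one code unit, which is why a character count and a byte count give different answers.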
For me, this is academic. But if I (or anyone) needed to call an external function that expected or returned such a value, this would come into play. That’s the conversation that sparked my curiosity.