Encodings and null characters

Robert_Weaver · October 27, 2018, 7:18am

I have a XOJO UDP listener app that I’m using to test some UDP data transfer, and I’m seeing something strange at the end of the data packets. The UDP packets are coming from an ESP8266 Wi-Fi system on a chip. I suspect that the ESP8266 may be sending a null character at the very end of the data packet, but I can’t detect it in the received UDP data in the XOJO app by doing any kind of string search.

If I specify the encoding as ASCII when I read the datagram data, everything is fine. However, if I don’t specify any encoding and then display the received data in a text field, it displays an illegal character symbol (black diamond with question mark) as the final character. I’ve tried everything I can think of to examine the data (EncodeHex, copy to MemoryBlock, etc.) to see what’s there, but nothing shows up except the characters that should be there. Yet the text field still displays the illegal character symbol. And yes, I am using a ReplaceLineEndings function to make sure that EndOfLine characters are properly converted.

Since the incoming data is all ASCII characters, I don’t know why setting or not setting the encoding would make any difference. Can anyone suggest a way to find out what is causing the illegal character symbol in the displayed text?

Emile_Schwarz · October 27, 2018, 9:10am

Are you sure you correctly decoded the UDP service data?
(I have no doubt you did, but since something is wrong, I have to ask.)

This is called Replacement Character and happens when that character is not defined in the user Font.

Are you sure the data is all ASCII (0 to 127)?

Did you check the MemoryBlock contents as hex in the debugger?

Your data may not be a TextInputStream, but a BinaryStream that has to be parsed.
(I have not read the specific RFC, hence the question.)

Robert_Weaver · October 27, 2018, 9:24am

Thanks Emile.

The data part of the UDP datagram is not an input stream. It’s just a string type. My data should all be in the range 0-127.

I think I’ve now convinced myself that there are no extraneous characters. I was under the impression that null characters could go undetected, but I tried concatenating some chr(0)'s to the input data, and I was able to detect them. So, I guess that the EndOfLine character is probably not considered to be a valid character if the encoding is not defined. I had thought that the encoding would default to something like UTF-8 if not otherwise defined. Anyway, I don’t think there’s any problem now.

Emile_Schwarz · October 27, 2018, 10:05am

Robert, did you try storing the UDP data directly in a MemoryBlock and reading its contents as hex in the debugger?
Then you will know for sure what is in the MemoryBlock ($00 or anything else).
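
A minimal sketch of that check, assuming the code runs in the DataAvailable event of a UDPSocket (the log messages are only illustrative):

[code]// Assumption: this runs in the DataAvailable event of a UDPSocket
Dim d As Datagram = Me.Read
Dim mb As MemoryBlock = d.Data // the string's bytes are copied as-is

// Log the byte count and a hex dump of the whole payload
System.DebugLog("Byte count: " + Str(mb.Size))
System.DebugLog(EncodeHex(d.Data, True)) // True puts a space between bytes

// Look at the very last byte, where a stray $00 would show up
If mb.Size > 0 Then
  System.DebugLog("Last byte: $" + Hex(mb.Byte(mb.Size - 1)))
End If[/code]

If the byte count is one more than the number of visible characters, or the dump ends in 00, the sender is including a terminator.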

Also, if the data are ASCII, changing the encoding changes nothing, since ASCII is always included in any encoding.

Greg_O_Lone · October 27, 2018, 2:58pm

Encoding defaults to Nil when getting data from another source: sockets, BinaryStream, TextInputStream, databases, etc.
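
In practice that means telling Xojo what the bytes are before displaying them. A rough sketch, assuming a UDPSocket DataAvailable event, ASCII payloads as described above, and a hypothetical TextField1 control:

[code]// Assumption: this runs in the DataAvailable event of a UDPSocket
Dim d As Datagram = Me.Read
Dim s As String = d.Data

If Encoding(s) Is Nil Then
  // DefineEncoding does not change the bytes; it only labels them
  s = DefineEncoding(s, Encodings.ASCII)
End If

// TextField1 is a hypothetical control used for illustration
TextField1.Text = ReplaceLineEndings(s, EndOfLine)[/code]

DefineEncoding is used here rather than ConvertEncoding because the bytes are already ASCII and only the label is missing.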

Robert_Weaver · October 27, 2018, 9:53pm

Does that mean that a character with a codepoint in the range 0…31 (control chars) would be considered illegal with nil encoding? If so, that would explain what I’m seeing.

DerkJ · October 28, 2018, 1:53am

You can probably try reading the string as a CString (null-terminated string) and then converting it back to a String.

Thomas_Tempelmann · October 31, 2018, 2:47pm

No, Chr(0) to Chr(31) are valid in any encoding, especially if it’s nil.

So, read the data into a regular “String” type, then search with the “B” functions, e.g.:

[code]// property lastChunk as String (declare this as a property of your class)

// In the DataAvailable event:

do
  // check if there's more data to read
  dim s as String = mySocket.ReadAll
  if s = "" then
    // no more data right now
    return
  end if
  lastChunk = lastChunk + s
  do
    // process complete blocks, terminated by a NUL char
    dim nulPos as Integer = lastChunk.InStrB(Chr(0))
    if nulPos < 1 then
      // InStrB returns 0 when nothing is found: no complete block right now
      exit // exit the block processing loop, but keep checking for more data in the outer loop
    else
      // extract the next block (everything before the NUL)
      dim block as String = lastChunk.LeftB(nulPos - 1)
      lastChunk = lastChunk.MidB(nulPos + 1)
      process(block) // do your data processing here
    end if
  loop
loop[/code]

The above is universal code to read asynchronously from an external data source, putting together blocks of data separated by a special delimiter character (here: NUL).

Vigia_Lin · October 31, 2018, 3:04pm

Try using a CString:

[code]// In the DataAvailable event:
Dim d As Datagram
d = Me.Read

Dim temp As CString = d.Data[/code]

Robert_Weaver · November 2, 2018, 12:57am

Thanks for all the responses. I’ve now used the MidB function to examine every character in the message, and I’m fairly confident that there are no null characters. Part of my concern was based on the way that the data was formatted at the sending end. That code is written in C, which has null-terminated strings. It starts with a string, which then has to be converted to a byte array as required by the UDPwrite function. From the way the byte array was dimensioned, it appeared that the terminating null character from the original string was being sent as part of the data.
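
For anyone wanting to repeat the check, a byte-by-byte scan along these lines should do it. This is only a sketch: payload is a hypothetical variable holding the datagram's Data string, and the log wording is illustrative.

[code]// Assumption: payload holds the received datagram data (d.Data)
Dim payload As String = d.Data

// The "B" functions work on raw bytes, so the encoding (or lack of one) doesn't matter
For i As Integer = 1 To LenB(payload)
  Dim b As Integer = AscB(MidB(payload, i, 1))
  If b = 0 Or b > 127 Then
    // anything outside 1..127 is a NUL or a non-ASCII byte
    System.DebugLog("Unexpected byte " + Str(b) + " at position " + Str(i))
  End If
Next[/code]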