HTTPSocket text encoding of returned JSON

I’m grabbing some data in JSON format from data.sparkfun.com. It comes back fine using HTTPSocket. The returned text is unusual in that, if I view the string with MsgBox, it contains the number 38 and a diamond with a question mark in it. There are lots of those in there, amongst the printable JSON characters. JSONItem won’t parse it.
I assume this is a text encoding problem.
I don’t see any mention of text encoding in the received string (as it typically is on an HTML page), and I don’t see any text encoding properties in the HTTPSocket. So I am looking at the String properties related to TextEncoding, Encodings, etc. I’m trying to set the encoding of the string to ASCII, but not finding the syntax.

Can anyone suggest how to best solve this problem? Running on OSX.

To set the encoding to ASCII, you would do this:

Dim s As String = DefineEncoding(originalString, Encodings.ASCII)

You’ll want to use the correct encoding, of course. It may actually be UTF8 rather than ASCII.

Thanks Paul

Yes that works. It seems to turn the diamond/question mark character into a newline.

I also tried simply scanning the string for non-printable characters, and building a new string from only printable. That worked too.

Oddly, the 38’s are actual ASCII 3’s and 8’s. Which have no business being in a JSON string, at least not outside the brackets. There are 39’s and 3a’s also. It’s something like:
38
[{…}
38
,
{…}
3a
,
etc

I don’t see these as being valid hex control characters. Perhaps the HTTPSocket is treating the returned string as a different encoding than it is, perhaps translating spaces or other control characters into ASCII digits?
BTW this data is coming from Phant. This thread may be veering off-topic, unless perhaps this has to do with the way HTTPSocket behaves.

You have to know what encoding the incoming string has and tell Xojo that by using DefineEncoding. In your case, this means that you must know the encoding used by the uploader of the string.

Wow I thought this happened only to me. Many times when I do requests with HTTPSocket I get some random numbers at the beginning. And sometimes, line endings. Weird stuff. I thought it was caused by the specific websites I was connecting to, that’s why I didn’t file a feedback.

HTTPSocket does not know the encoding of the data it receives. It just gets data (which it places in a String). It is your responsibility to know the encoding of your input data and then to tell Xojo the encoding of the data (using DefineEncoding) so that it knows how to deal with it.

The new framework makes this much clearer because the data coming in is put into a MemoryBlock, which you have to specifically convert to Text by supplying the encoding.
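The bytes-versus-text distinction can be illustrated outside Xojo. This is a minimal Python sketch, not Xojo code; the sample payload is made up for illustration and is assumed to be UTF-8. Declaring an encoding only changes how the bytes are interpreted, never the bytes themselves.

```python
# Raw bytes as a socket would deliver them; this sample payload is hypothetical.
raw = b'{"temp":"21.5\xc2\xb0C"}'

# Declaring the encoding interprets the bytes; it does not alter them.
as_utf8 = raw.decode("utf-8")

# Interpreting the same bytes under a wrong encoding yields stray characters.
as_latin1 = raw.decode("latin-1")

print(as_utf8)
print(as_latin1)
```

Re-encoding `as_utf8` as UTF-8 gives back exactly the original bytes, which is the sense in which the data itself was never changed.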

I can’t see any way to tell the HTTPSocket about the encoding, even if I knew what it was.
I tried using Google Chrome, as it has an encoding menu item. It indicated “Western(Windows-1252)”. I don’t see this in the Xojo Encodings list. It appears to be synonymous with ASCII or ANSI. I tried Encodings.ASCII and Encodings.WindowsANSI. Same result.

The source code for Phant is on GitHub. I haven’t yet figured out what format it is transmitting in, but I can’t see any reason for it to be anything but plain text.

If I use the same query in Google Chrome, and view the source, no 38’s. I’m beginning to suspect HTTPSocket. Perhaps time for trial-and-error on a workaround. Use some other technique. Or maybe I just have to parse the text and remove anything that doesn’t fit the proper syntax.

This is the encoding of the Phant website and has nothing to do with the data transferred to the HTTPSocket.

Please re-read Paul’s comment: HTTPSocket does not know the encoding of the data it receives. It gets raw bytes and puts them in a string.

As far as I understand it, one can upload data to Phant and then download it. Data in this case means raw bytes. So you need to find out what the encoding is by asking whoever uploads the data to Phant.

This resembles chunked encoding: a transfer encoding (not a text encoding) where the data is sent in chunks which are preceded by the length of the chunk as a hex integer.
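To make the framing concrete, here is a minimal Python sketch (not Xojo) of stripping chunked transfer framing from a response body. It assumes the headers have already been removed, that the whole body is in memory, and that there are no trailer headers; the function name `decode_chunked` and the sample body are hypothetical.

```python
def decode_chunked(body: bytes) -> bytes:
    """Strip HTTP/1.1 chunked transfer framing and return the payload bytes."""
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end]
        # Ignore any optional chunk extension after a semicolon.
        size = int(size_field.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # A zero-length chunk marks the end of the body.
        start = line_end + 2
        payload += body[start:start + size]
        # Skip past the chunk data and the CRLF that follows it.
        pos = start + size + 2
    return bytes(payload)


# Hypothetical two-chunk body: sizes 9 and 8, then the terminating 0 chunk.
sample = b'9\r\n[{"a":1},\r\n8\r\n{"b":2}]\r\n0\r\n\r\n'
decoded = decode_chunked(sample)
print(decoded.decode("utf-8"))
```

The same loop should port to Xojo with byte-oriented string functions; only after the framing is removed does it make sense to define the text encoding and hand the result to a JSON parser.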

HTTP 1.1?

Yes.

I think you nailed it with the chunked encoding. 38 hex is 56 decimal, and there are 56 characters that follow the 38.
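The arithmetic is easy to check. A tiny Python sketch (not Xojo) that reads the three size lines mentioned earlier in the thread as hexadecimal:

```python
# Chunk-size lines seen in the response, read as hexadecimal numbers.
sizes = {field: int(field, 16) for field in ("38", "39", "3a")}

for field, length in sizes.items():
    print(field, "->", length)
```

So 38, 39 and 3a correspond to chunks of 56, 57 and 58 bytes.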

I can parse it out.

Thanks for your help.