I used the following in Version 2020 Release 1 to retrieve an html page from the internet and save it to a file.
URLConnection1.Send("GET", url, myfolderitem)
The file was retrieved and saved, but when I opened it (and others) in BBEdit I received an “Incorrectly Formed UTF-8” warning. The meta statement in the html page is
meta charset="utf-8"
I did not receive this warning message in the past after retrieving these files in prior versions of Xojo using Xojo.Net.HTTPSocket.
I am unsure where to start looking for the problem. Any suggestions?
Hi Chris!
Which OS are you seeing this on? I just tested on macOS Mojave and the file loaded properly for me both in the browser (as an HTML file) and in Sublime Text:
var f as FolderItem = SpecialFolder.Desktop.Child( "test.html" )
var u as new URLConnection
u.Send( "GET", "https://example.com", f )
I am using Catalina 10.15.6
When I used your code to retrieve https://example.com it worked fine. There was no warning message when I opened the downloaded file in BBEdit.
That’s interesting. Can you private message me the URL you’re using so I can try it?
Using the URL you supplied, I’m able to open the resulting page (which is a 404 for me) in Sublime Text, Chrome, Firefox, Safari. Not sure what’s going on here. Are you able to open the resulting file in a browser?
Hello Chris and Anthony,
My response assumes nothing is different in Xojo’s handling of the file.
I’ve learned not to trust the format of content based on document headers.
I recommend testing the validity of the file's encoding during processing by calling GNU iconv, or an equivalent library, from the shell.
iconv will exit with (from memory) 0 for success and 1 for failure, and identify the position (maybe the byte position) that failed, for troubleshooting. Maybe try iconv -f UTF-8 -t UTF-8 your_file > /dev/null (or direct the output elsewhere as you please). PS: I haven’t tested this, but I think that is close.
The file will either pass or fail. If it passes, the issue may be BBEdit-centric; if it fails, the issue is in the content.
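As a rough sketch of that pass/fail check, the iconv call can be wrapped in a small shell function that only looks at the exit status. The function name is_valid_utf8 and the file name test.html are placeholders for illustration, and this assumes an iconv (such as GNU iconv) that exits non-zero on invalid input.

```shell
#!/bin/sh
# is_valid_utf8 is a hypothetical helper name, not part of iconv.
# It succeeds (exit status 0) when the file decodes cleanly as UTF-8
# and fails (non-zero) when iconv hits a byte sequence it cannot convert.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example call; test.html is a placeholder file name.
if [ -f test.html ]; then
    if is_valid_utf8 test.html; then
        echo "test.html: valid UTF-8"
    else
        echo "test.html: not valid UTF-8"
    fi
fi
```

Dropping the 2>&1 redirection keeps iconv's own error message, which includes the position where the conversion failed.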
You could also test between Xojo versions to see if the handling of url content differs between versions.
I hope that helps.
Kind regards, Andrew
Thanks Andrew. I tried that command on one of the files that was giving me problems and it returned:
iconv: test.html:165:46: cannot convert
On line 165 at character 40 (not 46) there was a character that appeared as an upside down reversed question mark in BBEdit.
I’m not sure what to make of that at the moment, but I will look into it further.
Hello Chris, sounds like either a Spanish question mark (start of sentence) or a BOM - byte order mark (but usually they hide at start of file). Google “remove BOM from file [insert name of your operating system]”. Try finding/removing BOM from your example file. Then re-run the conversion using iconv. Hopefully that’s it.
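Before removing anything, it may help to look at the raw bytes. The sketch below checks whether a file starts with the UTF-8 BOM (bytes EF BB BF) and dumps one line as hex so the offending byte can be identified. The helper names has_utf8_bom and show_line_bytes are made up for this example, test.html and line 165 come from the post above, and it assumes head -c and od are available (they are on macOS and most Linux systems).

```shell
#!/bin/sh
# has_utf8_bom is a hypothetical helper name.
# Succeeds when the first three bytes of the file are EF BB BF.
has_utf8_bom() {
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}

# show_line_bytes is a hypothetical helper name.
# Prints the bytes of line number $2 of file $1 as hex.
# LC_ALL=C makes sed treat the input as plain bytes.
show_line_bytes() {
    LC_ALL=C sed -n "${2}p" "$1" | od -An -tx1
}

# Example calls; the file name and line number are placeholders.
if [ -f test.html ]; then
    if has_utf8_bom test.html; then
        echo "test.html starts with a UTF-8 BOM"
    else
        echo "test.html has no UTF-8 BOM"
    fi
    show_line_bytes test.html 165
fi
```

A lone byte of 80 or above in the hex dump that is not part of a valid multi-byte UTF-8 sequence would suggest the file is in a single-byte encoding and not UTF-8.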
Kind regards, Andrew
I used the terminal and ran “file -I test.html” which generated
“test.html: text/html; charset=iso-8859-1”
Is iso-8859-1 an encoding that I can use to define input in Xojo?
iso-8859-1 is called Encodings.ISOLatin1 in Xojo.
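If a quick fix outside Xojo is enough, the same iconv tool used earlier in this thread can also convert the file. This is only a sketch: latin1_to_utf8 is a made-up helper name, the file names are placeholders, and it assumes the charset guess from file -I is right (that guess is a heuristic, so the real encoding could be something similar such as Windows-1252).

```shell
#!/bin/sh
# latin1_to_utf8 is a hypothetical helper name.
# Reads file $1 as ISO-8859-1 and writes a UTF-8 copy to file $2.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

# Example call; both file names are placeholders.
if [ -f test.html ]; then
    latin1_to_utf8 test.html test-utf8.html
fi
```

Writing to a second file keeps the original download untouched, so the result can be compared in BBEdit before replacing anything.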