Why is my web app having trouble with Unicode special characters passed in XML?

What am I missing here? XMLReader is choking on characters like µ, æ, ±, ®. I am working around this by having the serving database encode the offending character as a numeric character reference, i.e. “®” to “&#174;”.

This web app is a converted Flash/Flex project which did not have any issues with these characters being passed unencoded. As I understand it, they are all perfectly valid characters in an XML string. Chrome has no problem with any of them.

Is there something I need to do with the received XML before passing it to the XMLReader?

Thanks,

John

Apparently, XML just like HTML is not meant to use non-ASCII characters unmodified.

See http://www.w3.org/TR/unicode-xml/

MBS Plugin has functions to encode/decode for XML/HTML.

Normally XML uses UTF-8, so those special characters can be embedded in XML as they are.
But for putting something in HTML you may use the EncodingToHTMLMBS function.
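To illustrate the point about UTF-8, here is a minimal Python sketch (Python rather than Xojo, only so it can be run anywhere). It uses the standard library's ElementTree parser as a stand-in for XMLReader, which is an assumption: the two parsers may differ in detail. The sample element name `title` is made up. It suggests that raw non-ASCII characters are accepted when the bytes really are UTF-8, and rejected when the declaration says UTF-8 but the bytes are in another encoding.

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>'
# Sample text with the characters from the question (ae ligature, (R), micro, plus-minus).
doc = DECL + "<title>Encyclop\u00e6dia \u00ae \u00b5 \u00b1</title>"

# Bytes that really are UTF-8 parse fine, raw non-ASCII characters included.
good = ET.fromstring(doc.encode("utf-8"))
print(good.text)

# Same text encoded as Latin-1 while the declaration still claims UTF-8.
try:
    ET.fromstring(doc.encode("latin-1"))
    latin1_parsed = True
except ET.ParseError as err:
    latin1_parsed = False
    print("parse failed:", err)
```

If the second case matches what the parser sees, the characters themselves are not the problem; the mismatch between the declared and the actual encoding is.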

[quote=233470:@Christian Schmitz]MBS Plugin has functions to encode/decode for XML/HTML.

Normally XML uses UTF-8, so those special characters can be embedded in XML:
But for putting something in HTML you may use EncodingToHTMLMBS function.[/quote]

XML as used on the web is different than XML used as database. All Unicode characters must be encoded for the browser to show them. So indeed EncodingToHTMLMBS would solve the issue.

Thanks all for the replies. I have done all my work creating 3 Xojo Web Apps over the last several months without using any 3rd party plugins. Guess it’s time to start. EncodingToHTMLMBS looks to do exactly what I need.

I see that EncodingToHTMLMBS is included in the MBS Util plugin. Offhand I do not see any need for any of the kits except for maybe the Web Starter Kit. Would it be useful for me now that I have already deployed several Xojo web apps? Any other suggestions as I dive into Xojo plugins for the first time?

Thanks,

John

[quote=233514:@John Baughman]Thanks all for the replies. I have done all my work creating 3 Xojo Web Apps over the last several months without using any 3rd party plugins. Guess it’s time to start. EncodingToHTMLMBS looks to do exactly what I need.

I see that EncodingToHTMLMBS is included in the MBS Util plugin. Offhand I do not see any need for any of the kits except for maybe the Web Starter Kit. Would it be useful for me now that I have already deployed several Xojo web apps? Any other suggestions as I dive into Xojo plugins for the first time?
[/quote]

The alternative is to create your own class with the info at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

That is relatively easy to do, with a dictionary for instance. The most tedious part will be entering the character values.

The MBS solution is instant.
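If numeric character references are acceptable instead of named entities, no lookup table is needed, because the number is simply the character's code point. A minimal sketch in Python (not Xojo; the function name `to_numeric_refs` is made up for illustration) using the built-in `xmlcharrefreplace` error handler:

```python
def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # 'xmlcharrefreplace' turns each character ASCII cannot represent into &#NNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "Encyclop\u00e6dia \u00ae \u00b5 \u00b1"
print(to_numeric_refs(sample))
```

This leaves ASCII text, including existing markup such as `<`, `>` and `&`, untouched, so it only suits input that is already otherwise well-formed.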

@Michel Bujardet The alternative is to create your own class … The most tedious will be to enter the characters values.

Of course, and that is exactly what I was looking to avoid with my post. The plug-in is the only way to go in my book.

John

Cool. Thanks for using the plugin.
I put a lot of work into it, especially to get it to encode all the emojis correctly.

@Christian S Cool. Thanks for using the plugin.

I purchased a license for the MBS Util plugin, but unfortunately it is not what I think I need. Should have read the docs a little closer.

Given an XML value that includes the diphthong (ligature) Unicode character æ in one of the values, the Xojo XMLReader chokes.

XMLReader works fine if I convert æ to its numeric character reference &#230; before I send the XML to the Xojo web app from my database server.

Unfortunately, EncodingToHTMLMBS converts æ to &Ecirc; which also chokes the XMLReader. BTW, I do not think &Ecirc; is the correct ISO character reference. I think it should be &aelig;. My problem, on the other hand, is that I need æ converted to its numeric character reference, not the ISO character reference.

2 Questions for you…

  1. Do you have any other suggestions? If not I will bite the bullet and write a method in the database to handle the problem as Michel suggested and I was trying to avoid.

  2. Does MBS offer refunds? I am guessing that I can find other things in this plugin that will be useful, but in case not…

If anyone is curious as to why æ is even an issue: when writing bibliographies, the word encyclopedia is often spelled encyclopædia. I learn something new every day. :wink:
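On why a named reference can trip an XML parser while a numeric one does not: XML itself predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`). HTML names such as `&aelig;` or `&Ecirc;` are undefined in XML unless a DTD declares them. A small Python sketch, again using ElementTree as a stand-in for XMLReader (an assumption; the helper `try_parse` and the element name `w` are made up):

```python
import xml.etree.ElementTree as ET


def try_parse(fragment):
    """Return the element text, or None if the fragment is rejected."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


numeric = try_parse("<w>Encyclop&#230;dia</w>")    # numeric reference
named = try_parse("<w>Encyclop&aelig;dia</w>")     # HTML-only named entity
predefined = try_parse("<w>&quot;Muon.&quot;</w>")  # one of XML's five

print(numeric, named, predefined)
```

Under that assumption, output from an HTML-entity encoder would be expected to fail in an XML parser whichever named entity it picks.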

Maybe you start by sending me a test project so I can see what is going on?

And maybe ask me by email about bugs?

MsgBox EncodingToHTMLMBS("")
shows &aelig; here, so your text string may not have the right text encoding assigned.

@Christian S shows &aelig; here, so your text string may not have the right text encoding assigned

Interesting. I am getting &Ecirc;

Ecirc Ê U+00CA (202) HTML 2.0 HTMLlat1 ISOlat1 Latin capital letter E with circumflex
aelig æ U+00E6 (230) HTML 2.0 HTMLlat1 ISOlat1 Latin small letter ae (Latin small ligature ae)

That being said, I tested both numeric references &#202; and &#230; and both gave æ from the Xojo XMLReader!? I am so confused. :wink:

So perhaps the issue with Ecirc may be moot, but you can see what I am getting below…

John


<?xml version="1.0" encoding="UTF-8"?>

snippet of xml sent from database:

<rsrchPlan_Bibliograpy>Thomas, Coan, and Ye Jingbo. Muon Physics. Print.&#13;&#13;&quot;Muon.&quot; Http://www.britannica.com. Encylopædia Britannica, 20 July 2006. Web. 18 Oct. 2015.&#13;&#13;&#13;Lancaster, Mark. &quot;My Favourite Particle: The Muon.&quot; Http://www.theguardian.com/science/life-and-physics/. The Guardian, 14 May 2011. Web. 18 Oct. 2015.&#13;</rsrchPlan_Bibliograpy>

Xojo code request to database and call to EncodingToHTMLMBS:

data = socket1.Get("http://" + Host + "/chaos/lookup_Student?" + get, 30)
Dim cleanData As String = EncodingToHTMLMBS(data, 1)

XML snippet in data as viewed in the Xojo debugger:

<rsrchPlan_Bibliograpy>Thomas, Coan, and Ye Jingbo. Muon Physics. Print.&#13;&#13;&quot;Muon.&quot; Http://www.britannica.com. Encylop?dia Britannica, 20 July 2006. Web. 18 Oct. 2015.&#13;&#13;&#13;Lancaster, Mark. &quot;My Favourite Particle: The Muon.&quot; Http://www.theguardian.com/science/life-and-physics/. The Guardian, 14 May 2011. Web. 18 Oct. 2015.&#13;</rsrchPlan_Bibliograpy>

Same XML snippet in cleanData as viewed in the Xojo debugger:

<rsrchPlan_Bibliograpy>Thomas, Coan, and Ye Jingbo. Muon Physics. Print.&#13;&#13;&quot;Muon.&quot; Http://www.britannica.com. Encylop&Ecirc;dia Britannica, 20 July 2006. Web. 18 Oct. 2015.&#13;&#13;&#13;Lancaster, Mark. &quot;My Favourite Particle: The Muon.&quot; Http://www.theguardian.com/science/life-and-physics/. The Guardian, 14 May 2011. Web. 18 Oct. 2015.&#13;</rsrchPlan_Bibliograpy>

@Christian Schmitz

After reading the documentation for the util plugin a bit more closely I see now that I could have tested it out before buying. My bad.

John

data = socket1.Get("http://" + Host + "/chaos/lookup_Student?" + get, 30)
data = DefineEncoding(data, Encodings.UTF8)
Dim cleanData As String = EncodingToHTMLMBS(data, 1)

Please tell Xojo that the data you downloaded is UTF-8.
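To see why the assigned encoding matters, here is a small Python sketch of how the same byte reads under different encodings. The byte values are illustrative: the thread does not show the raw bytes the server sent, so which case applies here is an assumption. Note that labeling the data (as DefineEncoding does) only changes how the bytes are interpreted; it does not convert them.

```python
# Illustrative bytes only; the actual server output is not shown in the thread.
utf8_bytes = "\u00e6".encode("utf-8")      # two bytes: C3 A6
latin1_bytes = "\u00e6".encode("latin-1")  # one byte: E6

# The single byte E6 read under two different single-byte encodings.
as_latin1 = latin1_bytes.decode("latin-1")      # ae ligature
as_macroman = latin1_bytes.decode("mac_roman")  # E with circumflex

# The lone byte E6 is not valid UTF-8 on its own.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Genuine UTF-8 bytes round-trip cleanly once they are treated as UTF-8.
roundtrip = utf8_bytes.decode("utf-8")

print(as_latin1, as_macroman, valid_utf8, roundtrip)
```

If the downloaded data really is UTF-8, labeling it as such should be enough. If the byte for æ turns out to be a single E6, the source may be sending Latin-1 or Windows-1252 despite the UTF-8 declaration, and a conversion would be needed instead of a label.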