XML and CDATA sections

Erik_Pevec · July 28, 2013, 1:41am

I am working on a small tool that opens an XML file, changes some node values, and saves it back as XML. However, the opened XML contains some CDATA sections, which are converted into text nodes. That part is fine, but after saving the document some characters are changed.

Source:
<![CDATA[

my text goes here

]]>

After save I get this:

my text goes heres

Is there any good way to avoid this?
Thanks in advance

Marc_Zeedar · July 28, 2013, 1:58am

To keep it as CDATA, you have to explicitly add it back as a CDATA node before you save. At least, that’s what I recall – it’s been a while since I used CDATA but when I did, I remember running into that. I think Xojo thinks that converting illegal characters to entities is enough and it doesn’t need to be CDATA any more, or maybe it just doesn’t know it was originally CDATA once it’s been loaded into the XML tree: it just knows it’s text data, so when it writes it back out, it’s just text.

Erik_Pevec · July 28, 2013, 2:19am

Thanks for the answer. What is the best practice for adding it back? Setting the node type (to force CDATA) is not permitted, and adding the value back as a: child.Value="" doesn’t really change anything. Do you know of a way?

Marc_Zeedar · July 28, 2013, 2:27am

You can use XMLDocument.CreateCDATASection to create a CDATA section, so you should be able to read in the text, make your changes, and then write it back as a CDATA section. If all your text is CDATA, you could write your own wrapper to do that for text nodes automatically.

Erik_Pevec · July 28, 2013, 2:41am

Actually, there are two different nodes with CDATA values that repeat throughout the XML document. They appear as “defaulttext” and “textdetail”. Do you think it would be better to modify the xml.tostring value and write it out through a text output stream, or is there a way to modify it node by node? I am not sure how to use CreateCDATASection.

Marc_Zeedar · July 28, 2013, 2:48am

I wouldn’t modify the xml.tostring value – if you’re going that route, you might as well process the entire XML as text instead of converting to an XML tree.

Working with the createCDATASection is easy – it works just like writing a text node, except you use the createCDATASection command.
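As a rough sketch of the comparison (not tested against any particular Xojo version; the "root" and "note" element names and the sample text are made up for illustration):

```xojo
// Sketch only: "root", "note" and the sample strings are hypothetical.
Dim xmldoc As New XmlDocument
Dim root As XmlNode = xmldoc.AppendChild(xmldoc.CreateElement("root"))

// Plain text node: special characters are escaped as entities on save.
Dim plain As XmlNode = root.AppendChild(xmldoc.CreateElement("note"))
Dim t As XmlNode = plain.AppendChild(xmldoc.CreateTextNode("a < b"))

// CDATA section: same pattern, but the text is written inside <![CDATA[ ... ]]>.
Dim raw As XmlNode = root.AppendChild(xmldoc.CreateElement("note"))
Dim c As XmlNode = raw.AppendChild(xmldoc.CreateCDATASection("a < b"))

Dim output As String = xmldoc.ToString
```

The only difference between the two branches is which Create call builds the node; appending it to the parent works the same way.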

If those two text types are the only ones that are CDATA, you could check for those type names and then write the data as CDATA. It might be simplest to do this by creating two XML trees: the original document that you read in, and a new one that you write out. You’d make any modifications you want in the interim. Then you’d just need a routine that steps through all the children of the original XML and copies them to the new XML document, making sure to apply your mods and write those “defaulttext” and “textdetail” sections as CDATA. It’s a bit of a hack to work around Xojo losing the CDATA, but not too hard to do, depending on the complexity of the original XML.
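A copy routine along those lines might look roughly like this. It is an untested sketch: CopyChildren is a hypothetical helper name, it assumes dst already belongs to outDoc, and it only carries over elements, attributes, and text (comments and processing instructions are skipped).

```xojo
// Hypothetical helper: copies src's children into dst, which belongs to outDoc.
Sub CopyChildren(src As XmlNode, dst As XmlNode, outDoc As XmlDocument)
  For i As Integer = 0 To src.ChildCount - 1
    Dim node As XmlNode = src.Child(i)
    If node IsA XmlTextNode Then
      // Text inside the two known elements goes out as CDATA, the rest as text.
      If src.Name = "defaulttext" Or src.Name = "textdetail" Then
        Dim c As XmlNode = dst.AppendChild(outDoc.CreateCDATASection(node.Value))
      Else
        Dim t As XmlNode = dst.AppendChild(outDoc.CreateTextNode(node.Value))
      End If
    ElseIf node IsA XmlElement Then
      Dim el As XmlNode = dst.AppendChild(outDoc.CreateElement(node.Name))
      // Carry the attributes over before descending into the element.
      For a As Integer = 0 To node.AttributeCount - 1
        Dim attr As XmlAttribute = node.GetAttributeNode(a)
        el.SetAttribute(attr.Name, attr.Value)
      Next
      CopyChildren(node, el, outDoc)
    End If
  Next
End Sub
```

Any value changes would go in just before the CreateCDATASection or CreateTextNode call, where node.Value is read.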

Erik_Pevec · July 28, 2013, 3:34am

Thanks for the great explanation, and I am sorry for my lack of knowledge of Xojo XML CDATA sections. Anyway, when I read in the XML, all the CDATA nodes are converted to text nodes, which is fine for me as I can modify the values as needed and all the data is correct. The problem appears when saving the text nodes.

Based on the above, your suggested approach is to read the XML in, make all the necessary changes to the nodes as they are, and then copy the source XML to a new XML document by stepping through the nodes and converting the text nodes to CDATA sections. I did a quick test and my saved XML contained the correct data:

<?xml version="1.0" encoding="UTF-8"?><![CDATA[12]]>

Now the problem is preserving line breaks. Is there any way to do so? Or would you rather suggest converting the nodes to CDATA sections when reading the XML?

Erik_Pevec · July 28, 2013, 4:03am

I just wanted to post that I have found a solution based on the suggestions above, in case it helps somebody else. While reading the nodes, I know that nodes with certain names should be CDATA sections, and the snippet below creates a CDATA section in place of the original text node. Line breaks are preserved and xmldoc.savexml works perfectly.

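    // Only these two elements are known to hold CDATA in this document.
    if child.Name = "defaultText" or child.Name = "textDetail" then
      if child.ChildCount > 0 then
        // Keep a reference to the text node that replaced the CDATA on load.
        dim oldText as XmlNode = child.Child(0)
        // Append a CDATA section with the same text, then drop the text node.
        dim n as XmlNode = child.AppendChild(xmldoc.CreateCDATASection(oldText.Value))
        child.RemoveChild(oldText)
      end if
    end if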

Thanks for your help

Marc_Zeedar · July 28, 2013, 5:17am

I believe you could also write this generically by using IsA with XmlCDATASection to check whether a node is a CDATA section, and then use the rest of your code to preserve the CDATA.
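Whether a node loaded from a file still reports as a CDATA section is worth checking first, since earlier in this thread the CDATA came back as plain text nodes. A small diagnostic sketch, assuming child is the element being inspected as in the snippet above:

```xojo
// Sketch: log what kind of node each child of "child" actually is.
For i As Integer = 0 To child.ChildCount - 1
  Dim node As XmlNode = child.Child(i)
  // Test for CDATA first, in case it is also treated as a kind of text node.
  If node IsA XmlCDATASection Then
    System.DebugLog(child.Name + ": CDATA section")
  ElseIf node IsA XmlTextNode Then
    System.DebugLog(child.Name + ": plain text node")
  End If
Next
```

If the CDATA branch never fires after loading, the name-based check is the one to keep.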