XML Nodelist question

Alexander_van_der_Linden · February 24, 2014, 1:56pm

I have this XML file (content.opf, part of an ePub package):

[code]<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta name="calibre:rating" content="2"/>
    <meta name="calibre:series_index" content="1"/>
    <dc:language>UND</dc:language>
    <meta name="calibre:timestamp" content="2010-01-05T22:00:00"/>
    <dc:title>Avontuur in het Verleden</dc:title>
    <meta name="cover" content="cover"/>
    <dc:date>2010-01-05 00:00:00+01:00</dc:date>
    <dc:contributor opf:role="bkp">calibre (0.6.31) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">ec46162e-c1e6-48a9-8a31-1c95db70ba9e</dc:identifier>
    <dc:identifier opf:scheme="ISBN"></dc:identifier>
    <dc:description></dc:description>
    <dc:creator opf:role="aut" opf:file-as="Anderson, Poul">Poul Anderson</dc:creator>
    <dc:publisher>Prisma-boeken</dc:publisher>
    <dc:subject>Science Fiction</dc:subject>
  </metadata>
  <manifest>
    <item href="Ops/1.html" id="id1" media-type="application/xhtml+xml"/>
    <item href="Ops/2.html" id="id2" media-type="application/xhtml+xml"/>
    <item href="Ops/3.html" id="id3" media-type="application/xhtml+xml"/>
    <item href="Ops/4.html" id="id4" media-type="application/xhtml+xml"/>
    <item href="Ops/5.html" id="id5" media-type="application/xhtml+xml"/>
    <item href="Ops/6.html" id="id6" media-type="application/xhtml+xml"/>
    <item href="Ops/7.html" id="id7" media-type="application/xhtml+xml"/>
    <item href="cover.jpg" id="cover" media-type="image/jpeg"/>
    <item href="stylesheet.css" id="css" media-type="text/css"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="id1"/>
    <itemref idref="id2"/>
    <itemref idref="id3"/>
    <itemref idref="id4"/>
    <itemref idref="id5"/>
    <itemref idref="id6"/>
    <itemref idref="id7"/>
  </spine>
  <guide>
    <reference href="titlepage.xhtml" type="cover" title="Cover"/>
  </guide>
</package>[/code]

I want to read all the ‘dc:’ elements.
My code:

[code]'opens the XML file

Dim xmlDoc As New XmlDocument
Dim aNodeList As XmlNodeList
Dim i, j As Integer
xmlDoc.LoadXml(XMLFile)

'perform XQL query
aNodeList = xmlDoc.Xql("/package[1]/metadata[1]/*")

'loop through nodelist and check siblings
For i = 0 To aNodeList.Length - 1
  For j = 0 To aNodeList.Item(i).ChildCount - 1
    MsgBox(aNodeList.Item(i).Name)
  Next
Next[/code]
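A side note on the query: "/package[1]/metadata[1]/*" returns every element child of metadata, including the calibre meta elements, not only the dc: ones; the loop above appears to hide the meta elements only because they have no children. Xojo code can't be run here, so the sketch below uses Python's standard xml.dom.minidom as a stand-in (it is not Xojo's API) to contrast the two selections. It filters on the Dublin Core namespace URI rather than on the literal "dc:" prefix. The XML is an abridged excerpt of the file above, and the function names are hypothetical.

```python
from xml.dom import minidom

# Abridged excerpt of the content.opf above (trimmed for brevity).
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta name="calibre:rating" content="2"/>
    <dc:language>UND</dc:language>
    <dc:title>Avontuur in het Verleden</dc:title>
    <meta name="cover" content="cover"/>
    <dc:creator>Poul Anderson</dc:creator>
  </metadata>
</package>"""

# Namespace URI bound to the "dc" prefix in the file.
DC_NS = "http://purl.org/dc/elements/1.1/"


def metadata_child_names(xml_text):
    """Names of every element child of <metadata>, like /package/metadata/*."""
    doc = minidom.parseString(xml_text)
    metadata = doc.getElementsByTagName("metadata")[0]
    return [n.tagName for n in metadata.childNodes if n.nodeType == n.ELEMENT_NODE]


def dc_element_names(xml_text):
    """Names of the Dublin Core elements only, in document order."""
    doc = minidom.parseString(xml_text)
    # Match on the namespace URI: a file may legally bind another prefix to it.
    return [el.tagName for el in doc.getElementsByTagNameNS(DC_NS, "*")]


print(metadata_child_names(OPF))
print(dc_element_names(OPF))
```

The first call also returns the two meta elements; the second returns only dc:language, dc:title and dc:creator.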

That works OK: I get all the dc: info, but I want the content of each specific sibling, not only its name.

aNodeList.Item(i).ToString does not work; that shows the entire node. I think it must be something like

aNodeList.Item(i).GetAttribute(theAttribute), but what if I don’t know the names of the attributes?
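Both parts of this question have a generic DOM answer: the text lives in the element's text-node children, and the attributes can be enumerated by index, so their names don't need to be known in advance. The sketch below uses Python's xml.dom.minidom as a stand-in for Xojo (not Xojo's API); in Xojo the counterparts are, I believe, XmlNode.AttributeCount and XmlNode.GetAttributeNode(index), but that is an assumption to check against the XmlNode reference. The XML is an abridged excerpt and the helper names are hypothetical.

```python
from xml.dom import minidom

# Abridged excerpt of the content.opf metadata above.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Avontuur in het Verleden</dc:title>
    <dc:identifier opf:scheme="ISBN"></dc:identifier>
    <dc:creator opf:role="aut" opf:file-as="Anderson, Poul">Poul Anderson</dc:creator>
  </metadata>
</package>"""

DC_NS = "http://purl.org/dc/elements/1.1/"


def element_text(el):
    """Concatenate the element's text-node children ('' for an empty element)."""
    return "".join(c.data for c in el.childNodes if c.nodeType == c.TEXT_NODE)


def element_attributes(el):
    """Every attribute as {qualified name: value}, enumerated by index."""
    attrs = el.attributes
    return {attrs.item(i).name: attrs.item(i).value for i in range(attrs.length)}


def describe_dc(xml_text):
    """(name, text, attributes) for each Dublin Core element."""
    doc = minidom.parseString(xml_text)
    return [(el.tagName, element_text(el), element_attributes(el))
            for el in doc.getElementsByTagNameNS(DC_NS, "*")]


for name, text, attrs in describe_dc(OPF):
    print(name, repr(text), attrs)
```

Note that the empty ISBN identifier yields an empty string rather than an error, because its text is built from whatever text children exist.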

Tim_Jones · February 24, 2014, 4:23pm

I’m also trying to work my way through more complex XML examples and am running into a serious lack of documentation from a Xojo perspective. However, this is oddly formatted XML that appears to follow a specific schema used by ePub readers.

[quote=67313:@Alexander van der Linden]<itemref idref="titlepage"/> <itemref idref="id1"/> <itemref idref="id2"/> <itemref idref="id3"/> <itemref idref="id4"/> <itemref idref="id5"/> <itemref idref="id6"/> <itemref idref="id7"/>[/quote]
I wonder whether the Xojo XML parser can cope with this, or whether we’re looking at another limitation of Xojo’s native implementation, similar to its RTF limitations.

Tim
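For what it's worth, the itemref lines are ordinary self-closing elements whose data sits entirely in attributes, so a conforming XML parser should handle them; each idref points at the id of an item in the manifest. The sketch below resolves the spine into a reading order using Python's xml.dom.minidom as a stand-in DOM (not Xojo's API, and it says nothing about Xojo's parser specifically). The XML is an abridged excerpt and reading_order is a hypothetical name.

```python
from xml.dom import minidom

# Abridged excerpt of the manifest and spine from the content.opf above.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item href="Ops/1.html" id="id1" media-type="application/xhtml+xml"/>
    <item href="Ops/2.html" id="id2" media-type="application/xhtml+xml"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="id1"/>
    <itemref idref="id2"/>
  </spine>
</package>"""


def reading_order(xml_text):
    """Resolve each spine <itemref idref=...> to the href of the matching manifest <item>."""
    doc = minidom.parseString(xml_text)
    # Map manifest ids to hrefs; the order of attributes inside <item> doesn't matter.
    hrefs = {item.getAttribute("id"): item.getAttribute("href")
             for item in doc.getElementsByTagName("item")}
    # Self-closing <itemref/> elements carry only attributes, no child nodes.
    return [hrefs[ref.getAttribute("idref")]
            for ref in doc.getElementsByTagName("itemref")]


print(reading_order(OPF))
```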

Alexander_van_der_Linden · February 24, 2014, 7:40pm

Hello Tim,

A couple of months ago I got very frustrated with parsing XML, so I gave it a rest; yesterday I started again. But it is tough going: lots of trial and error, not least because of the ambiguous formats allowed in ePubs.

I think that Xojo can handle this. It is a matter of digging into the possibilities; it is up to the programmer to handle the format. The examples provided are too simple.

Paul_Lefebvre · February 24, 2014, 8:16pm

So for “dc:language” you want to get back “UND”?

If so, you need to go one level deeper in the hierarchy, to the element’s child node. Perhaps something like this:

[code]' open the XML file
Dim f As New FolderItem("ePub.xml")

Dim xmlDoc As New XmlDocument
Dim aNodeList As XmlNodeList
Dim i As Integer
xmlDoc.LoadXml(f)

' perform the XQL query: every child element of <metadata>
aNodeList = xmlDoc.Xql("/package[1]/metadata[1]/*")

' loop through the nodes; an element's text is held by its first child node
For i = 0 To aNodeList.Length - 1
  Dim n As XmlNode
  n = aNodeList.Item(i)

  ' self-closing or empty elements (<meta .../>, <dc:description></dc:description>)
  ' have no child node, so FirstChild is Nil for them
  If n.FirstChild <> Nil Then
    Dim nodeContents As String
    nodeContents = n.FirstChild.Value
    MsgBox(nodeContents)
  End If
Next[/code]

XML is just a big hierarchy. You just need to keep going deeper until you get what you want.
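The "keep going deeper" idea can be written as a small recursive walk that descends into every element and records the text nodes it meets along the way. This is a sketch using Python's xml.dom.minidom as a stand-in DOM (not Xojo's API); whitespace-only text between elements is skipped, the XML is an abridged excerpt, and walk is a hypothetical name.

```python
from xml.dom import minidom

# Abridged excerpt of the content.opf above.
OPF = """<package xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>UND</dc:language>
    <dc:title>Avontuur in het Verleden</dc:title>
  </metadata>
</package>"""


def walk(node, depth=0, out=None):
    """Depth-first walk recording (depth, kind, name or text) for each non-blank node."""
    if out is None:
        out = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            out.append((depth, "element", child.tagName))
            walk(child, depth + 1, out)  # go one level deeper
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            out.append((depth, "text", child.data))
    return out


doc = minidom.parseString(OPF)
for depth, kind, value in walk(doc):
    print("  " * depth + f"{kind}: {value}")
```

In the printed tree, "UND" shows up one level below dc:language, which is exactly where the FirstChild call in the code above looks for it.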

Alexander_van_der_Linden · February 24, 2014, 8:25pm

Hello Paul,

Yes, indeed. Well, ‘UND’ in this context means undefined, which is a valid value. But ‘dc:title’ should return ‘Avontuur in het Verleden’ (a very good read: ‘Guardians of Time’). Maybe I lack some deeper understanding of XML, but if I can retrieve ‘dc:title’ as a valid sibling name, isn’t the text of that sibling on the same level?

Tim_Hare · February 24, 2014, 8:52pm

No. It’s always one level deeper. The text is its own node (a text node), a child of the element.
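This can be checked directly in any DOM: the element node itself has no value, its first child is a text node holding the string, and an empty element has no child at all. That last case is why code that reads FirstChild.Value should test for a missing child first; in Xojo, FirstChild would presumably be Nil there. The sketch uses Python's xml.dom.minidom as a stand-in (not Xojo's API), and first_child_text is a hypothetical helper.

```python
from xml.dom import minidom
from xml.dom import Node

# Abridged excerpt of the metadata above, with one element that has text and one that is empty.
OPF = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Avontuur in het Verleden</dc:title>
  <dc:description></dc:description>
</metadata>"""

DC_NS = "http://purl.org/dc/elements/1.1/"


def first_child_text(el):
    """Value of el's first child if it is a text node, else None."""
    child = el.firstChild
    # <dc:description></dc:description> has no child node at all,
    # so check the child exists before reading its value.
    if child is not None and child.nodeType == Node.TEXT_NODE:
        return child.nodeValue
    return None


doc = minidom.parseString(OPF)
title = doc.getElementsByTagNameNS(DC_NS, "title")[0]
description = doc.getElementsByTagNameNS(DC_NS, "description")[0]

print(title.nodeValue)                              # None: the element itself has no value
print(title.firstChild.nodeType == Node.TEXT_NODE)  # True: the text is a child node
print(first_child_text(title))                      # Avontuur in het Verleden
print(first_child_text(description))                # None: no child node
```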

Paul_Lefebvre · February 24, 2014, 9:05pm

It does, using the code I posted.

Alexander_van_der_Linden · February 24, 2014, 9:07pm

I can’t test it right now, but that has helped me a lot!

Thanks!

Emile_Schwarz · February 25, 2014, 9:13am

Remember xDev Magazine.

It has articles on XML in issues 3.2 (2004-11), 3.3, 4.3 (2006-1), 4.6, 7.2 (2009-01), 8.3 (2010-03) and 11.5 (2013-09/10).

Have a look here.

Data collected with The Completist.