XML Nodelist question

I have this XML file (content.opf, part of an ePub package):

[code]<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta name="calibre:rating" content="2"/>
    <meta name="calibre:series_index" content="1"/>
    <dc:language>UND</dc:language>
    <meta name="calibre:timestamp" content="2010-01-05T22:00:00"/>
    <dc:title>Avontuur in het Verleden</dc:title>
    <meta name="cover" content="cover"/>
    <dc:date>2010-01-05 00:00:00+01:00</dc:date>
    <dc:contributor opf:role="bkp">calibre (0.6.31) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">ec46162e-c1e6-48a9-8a31-1c95db70ba9e</dc:identifier>
    <dc:identifier opf:scheme="ISBN"></dc:identifier>
    <dc:description></dc:description>
    <dc:creator opf:role="aut" opf:file-as="Anderson, Poul">Poul Anderson</dc:creator>
    <dc:publisher>Prisma-boeken</dc:publisher>
    <dc:subject>Science Fiction</dc:subject>
  </metadata>
  <manifest>
    <item href="Ops/1.html" id="id1" media-type="application/xhtml+xml"/>
    <item href="Ops/2.html" id="id2" media-type="application/xhtml+xml"/>
    <item href="Ops/3.html" id="id3" media-type="application/xhtml+xml"/>
    <item href="Ops/4.html" id="id4" media-type="application/xhtml+xml"/>
    <item href="Ops/5.html" id="id5" media-type="application/xhtml+xml"/>
    <item href="Ops/6.html" id="id6" media-type="application/xhtml+xml"/>
    <item href="Ops/7.html" id="id7" media-type="application/xhtml+xml"/>
    <item href="cover.jpg" id="cover" media-type="image/jpeg"/>
    <item href="stylesheet.css" id="css" media-type="text/css"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="id1"/>
    <itemref idref="id2"/>
    <itemref idref="id3"/>
    <itemref idref="id4"/>
    <itemref idref="id5"/>
    <itemref idref="id6"/>
    <itemref idref="id7"/>
  </spine>
  <guide>
    <reference href="titlepage.xhtml" type="cover" title="Cover"/>
  </guide>
</package>[/code]

I want to read all ‘dc:’ elements.
My code:

[code]'opens the XML file

Dim xmlDoc As New XmlDocument
Dim aNodeList As XmlNodeList
Dim i, j As Integer
xmlDoc.LoadXml(XMLFile)

'perform xql query
aNodeList = xmlDoc.Xql("/package[1]/metadata[1]/*")

'loop through nodelist and check siblings
For i = 0 To aNodeList.Length - 1
  For j = 0 To aNodeList.Item(i).ChildCount - 1
    MsgBox(aNodeList.Item(i).Name)
  Next
Next[/code]

That works OK: I get the names of all the dc: elements. But I want the content of each specific sibling, not only its name.

aNodeList.Item(i).ToString does not work; it shows the complete node. I think it must be something like aNodeList.Item(i).GetAttribute(theAttribute), but what if I don’t know the names of the attributes?

I’m also trying to work my way through more complex XML examples and running into a serious lack of documentation from a Xojo perspective. However, this is an oddly formatted XML and appears to adhere to some specific schema used by the epub reader.

[quote=67313:@Alexander van der Linden]<itemref idref="titlepage"/> <itemref idref="id1"/> <itemref idref="id2"/> <itemref idref="id3"/> <itemref idref="id4"/> <itemref idref="id5"/> <itemref idref="id6"/> <itemref idref="id7"/>[/quote]
I wonder if this is something that the Xojo XML parser can cope with or if we’re looking at another limitation of the Xojo native implementation similar to the RTF limitations.

Tim

Hello Tim,

A couple of months ago I got very frustrated with parsing XML, so I gave it a rest, and yesterday I started again. But it is a very terse matter: trial and error, not least because of the ambiguous formats allowed in ePubs.

I think that Xojo can handle this. It is a matter of digging into the possibilities; it is up to the programmer to handle the format. The examples shown in the documentation are too simple.

So for “dc:language” you want to get back “UND”?

If so, you need to get the next node in the hierarchy. Perhaps something like this:

[code]'opens the XML file
Dim f As New FolderItem("ePub.xml")

Dim xmlDoc As New XmlDocument
Dim aNodeList As XmlNodeList
Dim i, j As Integer
xmlDoc.LoadXml(f)

'perform xql query
aNodeList = xmlDoc.Xql("/package[1]/metadata[1]/*")

'loop through nodelist and read the child (text) nodes of each element
For i = 0 To aNodeList.Length - 1
  Dim n As XmlNode
  n = aNodeList.Item(i)

  For j = 0 To n.ChildCount - 1
    'the text is stored in a child node, one level below the element
    Dim nodeContents As String
    nodeContents = n.Child(j).Value
    MsgBox(nodeContents)
  Next
Next[/code]

XML is just a big hierarchy. You just need to keep going deeper until you get what you want.
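
On the earlier question about attributes whose names are not known in advance, here is a minimal sketch. It assumes XmlNode offers AttributeCount and GetAttributeNode(index) as I understand the Xojo XML classes, so check that against the language reference. It also assumes xmlDoc has already been loaded; the variable names nodes, k and attr are made up for illustration.

[code]'Sketch: list every attribute of each metadata child without knowing the names.
'Assumes xmlDoc is an XmlDocument that has already been loaded.
Dim nodes As XmlNodeList
nodes = xmlDoc.Xql("/package[1]/metadata[1]/*")

For i As Integer = 0 To nodes.Length - 1
  Dim n As XmlNode
  n = nodes.Item(i)

  'walk the attributes by index instead of by name
  For k As Integer = 0 To n.AttributeCount - 1
    Dim attr As XmlAttribute
    attr = n.GetAttributeNode(k)
    MsgBox(n.Name + ": " + attr.Name + " = " + attr.Value)
  Next
Next[/code]

For the dc:creator element this should report the opf:role and opf:file-as attributes, and for the meta elements the name and content attributes.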

Hello Paul,

Yes indeed. Well, ‘UND’ in this context means undefined, which is a valid value. But ‘dc:title’ should return ‘Avontuur in het Verleden’ (which is a very good read: ‘Guardians of Time’). Maybe I lack some deeper understanding of XML, but if I can retrieve ‘dc:title’ as a valid sibling name, isn’t the text of that sibling on the same level?

No. It’s always one level deeper. The text is its own node, a child of the element.
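
A small sketch of what that means for a single element, with the caveat that some elements in this file are empty (the ISBN dc:identifier, dc:description and every meta element) and so have no text node at all. It assumes n is an XmlNode taken from the node list, and that FirstChild is Nil when there are no children; textValue is a made-up name.

[code]'Sketch: the element carries the name, its child text node carries the text.
'Assumes n is an XmlNode such as the dc:title element.
Dim textValue As String

If n.FirstChild <> Nil Then
  'one level deeper: the text node below the element
  textValue = n.FirstChild.Value
Else
  'empty element, e.g. <dc:description></dc:description>
  textValue = ""
End If

MsgBox(n.Name + " -> " + textValue)[/code]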

It does return that, using the code I posted.

I can’t test it right now but that has helped me much!

Thanks!

Remember xDev Magazine.

It has articles on XML in issues 3.2 (2004-11), 3.3, 4.3 (2006-1), 4.6, 7.2 (2009-01), 8.3 (2010-03) and 11.5 (2013-09/10).

Have a look here.

Data collected with “The Completist”.