Faster XML DOM?

Curious. We’re running into some performance issues parsing and processing large XML data sets using the Xojo XML DOM.
Is there something out there that’s faster?

Thanks
Gary

Don’t use that - use the XMLReader, which doesn’t require loading the entire XML into memory and can be really fast
It’s event driven & I’ve used it to load the old iTunes XML file that was 180 MB in about 2 minutes

Thanks Norman

So I’m using NodeLists, is that still going to work?

No

The reader is entirely different and you get different events for different nodes.
It puts more work on you but because you don’t have to load the entire document you can load larger documents.
It DOES really force you to rethink HOW you process an XML document and YOU have to track states etc.
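Roughly, the shape is this (a sketch only; the event signatures are from memory, so check them against the XmlReader docs): you subclass XmlReader, add your state as properties, and implement the events.

[code]// Sketch: event handlers on a hypothetical XmlReader subclass
// (added in the IDE). mCurrentName and mBuffer are String
// properties you add yourself to track parser state.

Sub StartElement(name As String, attributeList As XmlAttributeList)
  mCurrentName = name  // remember where we are; the reader won't
  mBuffer = ""         // text for this element accumulates here
End Sub

Sub Characters(s As String)
  // Text can arrive in several chunks per element, so append.
  mBuffer = mBuffer + s
End Sub

Sub EndElement(name As String)
  // The element is complete; mBuffer holds its text content.
  // Do your processing here, then reset whatever state you keep.
  mBuffer = ""
End Sub[/code]

You then feed the XML to the subclass’s Parse method and the events fire as the parser streams through the document.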

The XML Reader is a completely different beast. I found the learning curve to be somewhat painful. BUT, once you figure it out it is fast.

Thanks guys. I’ve never used this, so I’ll need some examples. Are there any out there?

How big is the document?

The XMLReader will be faster for just about any size document BUT at the expense of being able to walk the DOM

This is based on experience with a 150 MB iTunes library file
The XML Document simply barfed trying to load it
The XML Reader sped through it

You are right Norman.
But I was asking about Gary’s document size.

With big documents it’s a matter of finding the balance between loading speed and the kind of work to be done.
XML DOM is more flexible than XML Reader and has more features (XQL, for instance), but it becomes very slow as the document gets larger.
XML Reader is faster but also more rigid; it’s the only choice with very big documents.

With medium-large documents it’s a matter of the work to be done. If you have to run many “queries” against the document, DOM is better (but you have to wait for the load time); if you only need to read the data and transfer it into your software, Reader is much better.

So below is my XML “walk” routine that runs on an adoXML string that could be HUGE. Sometimes upwards of 80 MB.

Yeah, I know that’s a lot… but the reality is, I don’t have much control over how many “customers” are going to come back from the query to the database. I’ve thought about breaking this operation up and splitting the adoXML into single-customer XML results – but that would require multiple hits to the database to get the result, or taking the XML result and parsing it up into individual XML result sets… and both operations are costly.

So, a faster method would be a better way to go, as I think the DOM is just slowing me down. XMLReader sounds interesting, but how the hell do I do something like this?

xmldoc.LoadXml (adoXML)
Dim nodelist As XMLNodeList
Dim count,i as Integer
nodelist = xmldoc.Xql("//z:row/@custnum")
SendToLog ("Start XML DOM Customers")
if nodelist <> nil then
  count = nodelist.Length
  for i = 0 to count - 1
    try
      Dim node as XmlNode
      Dim Customer as New Dictionary
      CustNum =Val(xmldoc.xql("//z:row/@custnum").Item(i).Value)
      Customer.Value ("alt_customer_id") = CustNum
      Customer.Value ("lastname") = xmldoc.xql("//z:row/@lastname").Item(i).Value
      Customer.Value ("firstname") = xmldoc.xql("//z:row/@firstname").Item(i).Value
      Customer.Value ("email") = Trim (xmldoc.xql("//z:row/@email").Item(i).Value)
             
      node = xmldoc.Xql ("//z:row").Item (i)
      xml = node.ToString
      Dim xmlDocCust as New XmlDocument
      xmlDocCust.LoadXml (xml)
      xslt = Me.GetProductImportTemplate (Me.CustomerAdapterName)
      // transform customer xml.
       xml = xmlDocCust.Transform(xslt)
      
      Customer.Value ("xml")  = xml

      // add the customer's record to the SQLite database.

      if Me.AddCustomerToCache (Customer) then
        // increase total customer count.
        nTotalCustomers = nTotalCustomers + 1
      end if
    
    catch err as XmlException
      OutputToUser ("Error: " + err.Message)
      return false
    end try
  next i
  nRecord  = App.Controller.Config.SyncCustomersMax
end if
// end.

Maybe it would be faster and easier to extract the data into its own mini-SQLite database rather than using XML?

Since the result is consistently formed you can

  1. when you get the node that is a Customer you could create a new customer instance (if you have such a thing) & then store the various tags into it as you receive them. You’d then have to write code to mimic the xql queries

  2. you could instead create an in memory sqlite db (or on disk) & do similar to 1 and put the data in the db tables as you get it
    Then everything is just db queries locally

I did something like this for huge iTunes xml files and loaded it into a sqlite db then used the DB
Way faster
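As a sketch of option 2, using the classic Xojo database classes (the table and column names are made up for illustration, and OutputToUser is borrowed from the code above):

[code]// Hypothetical sketch: cache parsed rows in an in-memory SQLite db.
Dim db As New SQLiteDatabase  // no DatabaseFile set = in-memory

If Not db.Connect Then
  OutputToUser("Error: " + db.ErrorMessage)
  Return
End If

db.SQLExecute("CREATE TABLE customers (custnum INTEGER, lastname TEXT, firstname TEXT, email TEXT)")

// As each customer arrives from the reader, insert it:
Dim rec As New DatabaseRecord
rec.IntegerColumn("custnum") = 12345
rec.Column("lastname") = "Smith"
rec.Column("firstname") = "Jane"
rec.Column("email") = "jane@example.com"
db.InsertRecord("customers", rec)
db.Commit

// After the parse, everything is a fast local query:
Dim rs As RecordSet = db.SQLSelect("SELECT * FROM customers ORDER BY lastname")[/code]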

Thanks for the reply.

The issue is this. The adoXML is just that: XML from an ADO recordset off a SQL Server.

So I’m doing the COM access through ADO via Xojo. The recordset returned COULD be jammed into a SQLite database,
but that’s still going to have to loop around 77,000 rows and create it. I’m not sure that’s going to be a performance benefit
over ADO persisting the result set to ADO XML format and then grabbing that and running through it – as you see from the above code.

Would be awesome if ADO allowed a persist to SQLite. Then my job would be easy. Nah, that’s too easy.

So let me try these suggestions.

But you don’t think XMLReader would help me here? I’m also struggling to find an example code base to pull from.
Anybody have a nice example of XMLReader?

Thanks again
Gary
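One possible shape for the adoXML case: ADO persists each record’s fields as attributes on a z:row element, so a single event sees a whole row. A sketch only (event and method names from memory, so check the docs):

[code]// Hypothetical StartElement handler on an XmlReader subclass for
// ADO-persisted XML, where each record is a <z:row .../> element
// whose fields are attributes.

Sub StartElement(name As String, attributeList As XmlAttributeList)
  If name = "z:row" Then
    Dim Customer As New Dictionary
    Customer.Value("alt_customer_id") = Val(attributeList.Value("custnum"))
    Customer.Value("lastname") = attributeList.Value("lastname")
    Customer.Value("firstname") = attributeList.Value("firstname")
    Customer.Value("email") = Trim(attributeList.Value("email"))
    // Hand the finished row off, e.g. to AddCustomerToCache, and
    // run the per-customer XSLT here if you still need it.
  End If
End Sub[/code]

No state tracking is needed here because everything lives in the attributes of a single element.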

loading the XML into an XMLDocument via

xmldoc.LoadXml (adoXML)

is slow esp with very large documents

77,000 rows isn’t that big - the XML file I did this with started out as 150 MB and had 200,000+ “rows”
But you do have to parse the data in a way you didn’t have to using the XMLDocument

Well, it’s 77,000 rows, but about 80 MB of data… and I would agree, 77K is not that much.

So the LoadXml isn’t that bad; it’s the XQL and node walking that’s killing me.

I don’t mind implementing an XMLReader version of this, but I’m sort of looking at the class and scratching my head.

Another thing to keep in mind about XMLReader is that it does not include an XMLWriter. If you need to create or modify XML, the DOM-based classes or direct string operations are needed. As for examples, they tend to be written in Java, but they should be out there on the 'net–noting that the technology is known generically as SAX.

[quote=44274:@Gary MacDougall]Well, it’s 77,000 rows, but about 80 MB of data… and I would agree, 77K is not that much.

So the LoadXml isn’t that bad; it’s the XQL and node walking that’s killing me.

I don’t mind implementing an XMLReader version of this, but I’m sort of looking at the class and scratching my head.[/quote]

well the queries in a db would be a lot quicker

77K rows is not too much.
Probably it’s the XQL that’s killing performance.
You could try to log execution time to understand where you lose performance.

I don’t know the structure of your document (so I can’t suggest better navigation), however reading your code I think you’re running the same XQL too many times.

Here you are requesting all the z:row nodes that have a custnum attribute.
Then you repeat the same XQL over and over to get the value and the other info.

Maybe (but I repeat I don’t know the structure of your document)
you could do:

[code]nodelist = xmldoc.Xql("//z:row")
count = nodelist.Length - 1
for i = 0 to count
  try
    Dim node as XmlNode
    Dim Customer as New Dictionary
    node = nodelist.Item(i)
    CustNum = Val(node.GetAttribute("custnum"))
    Customer.Value ("alt_customer_id") = CustNum
    Customer.Value ("lastname") = node.GetAttribute("lastname")
    Customer.Value ("firstname") = node.GetAttribute("firstname")
    Customer.Value ("email") = Trim (node.GetAttribute("email"))

    xml = node.ToString
    Dim xmlDocCust as New XmlDocument
    xmlDocCust.LoadXml (xml) // Maybe importing a Clone(true) of the node is faster...
    xslt = Me.GetProductImportTemplate (Me.CustomerAdapterName)

…[/code]