Walk an entire XML tree

Hi All,

Does anyone have a sample project / code for how to walk an XML file?
I need to walk every node from top to bottom, and some parts of the XML are deeper than others.

Thanks.

The XMLReader class does just that: it calls certain methods to indicate start and end tags:
https://documentation.xojo.com/api/text/xml/xmlreader.html

Subclass it and add methods to access the various parts, with attributes decoded and passed to those methods. You need to maintain some state as you work through the nodes, but you can add properties to your subclass to do that.

There's a basic sample project in the Example Projects folder at
Example Projects/Text/XML.xojo_binary_project

There's an XMLReader sample project also. XMLReader can cope with huge XML files without running into the memory limitations of the other XML methods. I also find it better for certain types of XML data.

Most of the work will be done using:

StartElement
EndElement

If there is text between XML tags, such as <tag>some text</tag>, then you will need:

Characters

Thanks @Ian_Kennedy and @Tim_Parnell .
I am away from my laptop at the moment, but from what I recall about both of those projects (correct me if I am wrong), neither actually walks the whole tree.
One of them, I don't recall which, shows the top nodes, and then as you click on each node you are able to drill down.
I am looking for an example of, or an understanding of, how to iterate over the whole tree, which is "ragged", i.e. the depth of the hierarchy varies.

I looked at XML.xojo_binary_project briefly, and it did show the whole tree.


Edit: Oh I see the misunderstanding.

You would need to combine the ideas of loading and the ExpandRow event to walk the whole tree.

Depending on what the OP is trying to solve, manipulating a DOM might be easier. That's done via XMLDocument. MBS even has XML functions to traverse via ForEach.
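
For the DOM route, a recursive walk copes with a ragged tree because each call simply descends into whatever children a node happens to have. The following is only a minimal sketch, assuming the whole file fits in memory. WalkNode and its depth parameter are names made up for illustration, and f is assumed to be a valid FolderItem pointing at the XML file.

```xojo
' Hypothetical helper method (for example on a module).
' Logs every element name and every non-blank text value, indented by depth.
Sub WalkNode(node As XmlNode, depth As Integer)
  If node Is Nil Then Return
  
  ' Build the indentation for this level.
  Var indent As String
  For i As Integer = 1 To depth
    indent = indent + "  "
  Next
  
  If node IsA XmlTextNode Then
    ' Text between tags shows up as its own child node.
    Var t As String = node.Value.Trim
    If t <> "" Then System.DebugLog(indent + t)
    Return
  End If
  
  System.DebugLog(indent + node.Name)
  
  ' Recurse into every child, however deep this branch goes.
  For i As Integer = 0 To node.ChildCount - 1
    WalkNode(node.Child(i), depth + 1)
  Next
End Sub

' Calling code, assuming f is a FolderItem for the XML file:
Try
  Var doc As New XmlDocument
  doc.LoadXml(f)
  WalkNode(doc.DocumentElement, 0)
Catch e As XmlException
  System.DebugLog("Could not load XML: " + e.Message)
End Try
```

The trade-off against XMLReader is memory: the DOM holds the entire document at once, which is convenient for small files but may not suit very large ones.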

I was planning to re-examine those anyway, but now I will start with that one. Thanks Tim.

XMLReader does exactly that. It sends each node to StartElement and every closing of a node to EndElement. So for example:

<section> ' Calls StartElement
<subsection> ' Calls StartElement
<cell>Text</cell> ' Calls StartElement, then Characters, then EndElement
<cell>Text 2</cell> ' Calls StartElement, then Characters, then EndElement
</subsection> ' Calls EndElement
<subsection/> ' Calls StartElement and EndElement
</section> ' Calls EndElement

It is ideal for exactly what you want.
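
To see that every level really is visited, one option is to track the nesting depth yourself. This is a rough sketch, not taken from the example projects: it assumes an XMLReader subclass with one added property named Depth (a made-up name) and logs each tag indented by how deep it sits.

```xojo
' Event handlers on a hypothetical XMLReader subclass.
' Added property (an assumption for this sketch): Depth As Integer

Sub StartElement(name As String, attributeList As XmlAttributeList)
  ' Indent by the current depth, then go one level deeper.
  Var indent As String
  For i As Integer = 1 To Depth
    indent = indent + "  "
  Next
  System.DebugLog(indent + name)
  Depth = Depth + 1
End Sub

Sub EndElement(name As String)
  ' Leaving the element, so come back up one level.
  Depth = Depth - 1
End Sub
```

Because the parser raises StartElement and EndElement in document order, it does not matter that some branches are deeper than others; Depth simply rises and falls with the tags.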

I do have the MBS plugins. A ForEach iteration is helpful.
Thanks!

Oh wow! That is interesting! Thanks!

You can even use it with a FolderItem directly, and it deals with reading the file in a chunk-wise fashion.
Instantiate the subclass using the file and then call 'Parse'.
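
As a rough illustration of the file-based approach, the calling code might look something like the sketch below. OrderReader is a hypothetical name for your XMLReader subclass, and the sketch assumes that Parse accepts the FolderItem directly; check the XMLReader documentation for the exact signatures in your Xojo version.

```xojo
' Let the user pick the XML file (desktop project assumed).
Var f As FolderItem = FolderItem.ShowOpenFileDialog("")

If f <> Nil Then
  ' OrderReader is a made-up name for your XMLReader subclass.
  Var reader As New OrderReader
  Try
    ' The events (StartElement, EndElement, Characters) fire during this call.
    reader.Parse(f)
  Catch e As RuntimeException
    ' Malformed XML surfaces as an exception; log it rather than crash.
    System.DebugLog("XML parse error: " + e.Message)
  End Try
End If
```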


OK, so I am playing with the StartElement in the XMLReader.
My XML is more / less like this:

<order>
     <id-00001>
          <order-date>2024-06-01</order-date>
          <item-list>
               <id-00001>
                    <product_cd>AJ589</product_cd>
                    <quantity>3</quantity>
                    <unit-cost>31.99</unit-cost>
                      ...
     </id-00001>
     <id-00002>
         ...
</order>

I am unsure how to trigger the code to drop down and check for the other values.
That is to say, I don't know how to check for or grab the id numbers shown above.
Any ideas?
Thanks.

If you post an actual sample of the XML people can help.
Trying to infer from cropped / invalid snippets is expecting a bit much.

In StartElement you get two parameters passed in: name As String and attributeList As XmlAttributeList.

Name is the tag name within the XML. So in your example "order", "id-00001", "id-00002" etc.

Typically you use a Select Case statement to decide what you need to do:

Select Case name
  Case "order"

  Case "order-date"
  Case "item-list"
End Select

The attribute list holds the attributes of the XML element and can be read by name, much like a Dictionary. For example:

<Dataset Cols="10" Rows="20"/>

Would produce a Dictionary which you can access as follows:

Select Case name
Case "Dataset"
   cols = Val( attributeList.Value( "Cols" ) )
   rows = Val( attributeList.Value( "Rows" ) )
   ' etc.
End Select

In terms of your XML, the ID elements would be better as follows (assuming you are able to change this):

<order>
     <id number="00001">
          <order-date>2024-06-01</order-date>
          <item-list>
               <id number="00001">
                    <product_cd>AJ589</product_cd>
                    <quantity>3</quantity>
                    <unit-cost>31.99</unit-cost>
                      ...
     </id>
     <id number="00002">
         ...
</order>

Then you can work with Name as id and read the number from the attribute list.

For items like this:

<order-date>2024-06-01</order-date>

You need to implement the Characters event; it will provide access to the 2024-06-01 part.
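
One thing to be aware of: event-driven XML parsers may deliver a single run of text in more than one Characters call, so it is usually safer to accumulate the text and only use it when the closing tag arrives. This is a sketch under that assumption; TextBuffer is a made-up property name on the subclass.

```xojo
' Event handlers on a hypothetical XMLReader subclass.
' Added property (an assumption for this sketch): TextBuffer As String

Sub StartElement(name As String, attributeList As XmlAttributeList)
  ' A new element starts, so discard any text collected so far.
  TextBuffer = ""
End Sub

Sub Characters(s As String)
  ' May fire several times for one element; keep appending.
  TextBuffer = TextBuffer + s
End Sub

Sub EndElement(name As String)
  Select Case name
  Case "order-date"
    ' By the closing tag the buffer holds the complete text.
    Var orderDate As String = TextBuffer.Trim
    System.DebugLog("order-date = " + orderDate)
  End Select
  TextBuffer = ""
End Sub
```

Trimming removes the whitespace and line breaks that come from the indentation in the file.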

I understand, Tim. The format is actually proprietary, so I don't think I can post it publicly. So I just have to do my best.

Hi Ian, unfortunately I cannot make any changes to the XML structure.
Thanks for this sample code. I will play with this a bit and see how much closer I get.

THANK YOU!!!

As you can see, you need to add properties to your class in order to keep track of where you are within the XML. For example, for the order date you need to know which order it is part of, so you need something like:

Select Case name
  Case "order"
    WithinOrder = True
  Case "id-00001" To "id-99999"
    If WithinItemList Then
      ' Deal with ID within an item list
    Else
      CurrentOrderNumber = Val( name.Middle( 3 ) )
    End If
  Case "order-date"
    ' Now you can use CurrentOrderNumber to decide which Order you are dealing with.
  Case "item-list"
    WithinItemList = True
End Select

Add an EndElement handler and do the following:

Select Case name
  Case "order"
      WithinOrder = False ' When we get </order> reset the WithinOrder property
  Case "item-list"
      WithinItemList = False ' When we get </item-list> reset the WithinItemList property
End Select

You're effectively building a state machine so you can best decide what to do with the elements you read.
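
If the number of Boolean flags starts to grow, an alternative way to hold the same state is a stack of the currently open element names. The sketch below is only one possible design, not something from the example projects: Path is a made-up array property, and the level checks assume that order ids sit directly under <order> as in the snippet posted above.

```xojo
' Event handlers on a hypothetical XMLReader subclass.
' Added properties (assumptions for this sketch):
'   Path() As String
'   CurrentOrderNumber As Integer

Sub StartElement(name As String, attributeList As XmlAttributeList)
  ' Push the element so Path always describes where we are.
  Path.Add(name)
  
  If name.BeginsWith("id-") Then
    If Path.LastIndex = 1 Then
      ' Second level, directly under <order>: this is an order id.
      CurrentOrderNumber = Val(name.Middle(3))
    Else
      ' Deeper down, e.g. inside <item-list>: this is an item id.
      System.DebugLog("Item at " + String.FromArray(Path, "/"))
    End If
  End If
End Sub

Sub EndElement(name As String)
  ' Pop the element we are leaving.
  If Path.LastIndex >= 0 Then Path.RemoveAt(Path.LastIndex)
End Sub
```

The joined path also makes debugging easier, since you can log exactly where in the hierarchy each value was found.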

I don't intend to come across as mean. I just wouldn't want to assume I can mutate your input in any way, and wanted to be able to build an example from your actual input.

Tim, I did not think you were coming across as mean at all. I just assume people here are speaking well and kindly until I see them not doing that. :slight_smile:
I understand where you are coming from, and nothing would make me happier than to give you guys the complete XML file and have you very quickly help me drill down to my issue, but I can't do that and I know you get that. :slight_smile:
