XML With Repeating Tags to SQL

Mark_Strickland1 · January 26, 2017, 2:44pm

I am trying to figure out how to load a GIANT XML file. Currently I am able to parse all of the Tag Names and Attributes using recursion (I found that suggestion here in the forum). I have figured out the outer most Tag Level that generally corresponds to a SQL Table.

My problem is within what would be a single table there are repeating tags.
Example:

         <DemographicInformation>
            <ParentGuardian>
               <Name>Buaw Paanr</Name>
               <Sex>Male</Sex>
               <LivingIndicator>Yes</LivingIndicator>
               <EducationLevel>5</EducationLevel>
               <Occupation>
                  <OccupationID>63</OccupationID>
                  <OccupationDescription>Installation, Maintenance, and Repair</OccupationDescription>
               </Occupation>
            </ParentGuardian>

            <ParentGuardian>
               <Name>Ahdi Fadmb</Name>
               <Sex>Female</Sex>
               <LivingIndicator>Yes</LivingIndicator>
               <EducationLevel>5</EducationLevel>
            </ParentGuardian>

            <GeneralDemographics>
               <NumberOfDependents>0</NumberOfDependents>
               <Siblings>
                  <Age>22</Age>
                  <Sex>Male</Sex>
               </Siblings>
               <Siblings>
                  <Age>16</Age>
                  <Sex>Male</Sex>
               </Siblings>
            </GeneralDemographics

         </DemographicInformation>

Note that the and tags are repeated. I suppose the there could be a max of 2 tags but there could be any number of tags including none.

I am looking for a generalized method of handling this type of repeating tag as I convert the XML to SQL. If I was to lay out the SQL DB without thinking about the XML I would have a ParentGuardian table and a Siblings table. An alternate approach would be to have a Relationship table and put all types of relationships in the table with codes for Parent, Siblings, etc. In this alternate approach there would be a name column and Siblings would have a NULL name since none was supplied via the XML.

There are a set of tags for that includes various schools attended. This includes one or more tags and one or more for other schools including college. All of these have repeating tags for each course attended with tags for each course like , , , and more. The ends up looking like it is 2 levels deep (School and Course).

At this point I think I can do a complete refresh with each XML file received (truncate and insert without needing any updates).

One thought would be to make a list of tags that were essentially repeating within some other tag. When I encounter that tag I would build a new INSERT using the “saved” database keys (like the student ID number) captured as I parse the XML.

Any tips for transforming XML to SQL would be appreciated.

Thanks.

Scott_Griffitts · January 26, 2017, 4:31pm

Sounds like you need some cross reference tables. I’d put all your people in a “people” table, all your schools in a “schools” table, etc. Each entry in these tables having a unique id, of course. Then create cross reference tables to link these entries together, let’s call it “school_xref”. So for example let’s say Bob attended Acme Elementary and National University. Add Bob to the people table and the two schools to the schools table. Then add Bob’s ID and Acme Elementary’s ID to school_xref. Then add Bob’s ID and National University’s ID to school_xref. At this point, you can query which schools Bob attended. Assuming Bob’s people ID is 1:

Select b.school_name from schools_xref a left join schools b on b.id = a.school_id where a.people_id = 1
That would give you the list of schools Bob attended.

Same with a relationship cross reference table. With a relationship type table and corresponding IDs, you can simply add the relationships that exist, no need for null values if for example the primary person doesn’t have any siblings. Just query the cross reference table for people who have a sibling type relationship with Bob. If no results come back, Bob doesn’t have any siblings.

Mark_Strickland · January 27, 2017, 2:44am

Scott … I like that idea. I have used this type of structure before and called them “bridge tables” but somehow I did not think about using them here.

I once worked with an MS-SQL DB that had XML data in columns. Somebody had the bright idea to fake having an array by putting the XML in a column. The performance was awful and doing a query using the content of the XML was a real mess.

I am just not a fan of XML so I think I had a “mental block” in trying to find a solution.

Thanks.

Antonio_Rinaldi · January 27, 2017, 7:34am

XML is only a human readable structure of a data structure
From what I can see here there is a complete description of your data in one of the two form usually used.
The first form is use the attributes as record fields and nodes for the record (or sub nodes for related record), the second form is use every node as a record or a field like this document.

So you are navigating in your doc, and in a Person node (I suppose) you have some ID field (an identifier for this person) and a SECTION about DemographicInformation (this is a conceptual section not a real data section)

So in your navigation you can have (while reading PERSON NODE) something like:

//PersonNode as XMLNode is the current node dim PersonInfo as XMLNode=PersonNode.firstChild dim PersonDix as New Dictionary while PersonInfo<>nil select case PersonInfo.name case "ID" PersonDic.value("id")=PersonInfo.firstChild.value //OK here you have also to check that the value exists case "DemographicInformation" analyzeDemographicInformation(PersonDic.value("id"), PersonInfo) //Here you should check that ID exists .... end select wend //Now you can save the PersonInfo building the right SQL //you should use a transaction in order to save person info and sub info in all or nothing way
In a similar way continue for the other structures saving after parsing and let the PersonInfo close the transaction

The table structures could be:
PERSON
…other fields

PARENTGUARDIAN
Name Sex LivingIndicator EducationLevel OccupationID

OCCUPATION
OccupationID OccupationDescription

GENERALDEMOGRAPHICS
NumberOfDependents

SIBLINGS
Age Sex

If NumberOfDependents is the unique info of GeneralDemographics and GeneralDemographics is unique for Person you could promote it to the PERSON table (in this case in the analyzeXXXX methods pass the PersonDic instead of only the ID value.

Note that Occupation is a Lookup table. So you have to check if OccupationID is an existing ID in this table before insert or use a replace strategy if the DB let you use it.