Trouble with XMLReader Class

Job: Parse a 75 GB XML file (OpenStreetMap *.osm)
Try: XMLReader.Parse(folderitem)
Result: Crash
Question: Is there any other XML parser class available for Xojo?

As always: which platform? Which version of the platform? Which version of Xojo? What does your code look like? What exactly do you want to do? What does the crash look like? Can you use smaller XML files? Do you run out of memory? Have you checked whether the MBS plugin has an XML parser?


For such a file, you need to use a reader that can read data in chunks.
75 GB will never fit in memory at once,
at least not on a regular computer with a few GB of memory.

And Xojo strings only hold 2 GB.

Ah, you may be able to learn the structure and then use a BinaryStream to read sections of the file and parse those.

Isn’t that what XMLReader does?


Yes, if you feed it in small chunks and not 75 GB at once!

So he needs to feed it the partial XML himself? I don't see that in the docs, but it does make sense.

Yes, see the XMLReader.Parse function. Its second parameter, isFinal, tells the reader whether more data will follow.
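A minimal, untested sketch of what feeding partial XML looks like, assuming `TestXmlReader` is a plain XMLReader subclass with no extra code: every call except the last passes `False` for isFinal. As far as I know the reader is a streaming parser, so a chunk boundary may fall anywhere, even inside a tag.

```xojo
' Sketch: feed one small XML document to an XMLReader in two pieces.
' TestXmlReader is assumed to be a plain subclass of XMLReader.
Var reader As New TestXmlReader

' First piece ends in the middle of a tag; isFinal = False means more data follows.
reader.Parse("<osm><node id=""1""/><no", False)

' Last piece; isFinal = True tells the reader the document is complete.
reader.Parse("de id=""2""/></osm>", True)
```

For a real file, the pieces simply come from a stream read in a loop instead of string literals.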

This code crashed on large XML files:

Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' XmlDocReader_osm = a subclass of XMLReader
  ' with no additional code
  Var reader As New XmlDocReader_osm
  reader.Parse(f)
Catch err As XmlReaderException
End Try

This code works:

Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' TestXmlReader = a subclass of XMLReader
  ' with no additional code
  Var reader As New TestXmlReader
  Var ReadStream As BinaryStream = BinaryStream.Open(f, False)
  
  Do Until ReadStream.EOF
    ' Read the file in 128 MB chunks and feed each one to the reader
    Var mb As MemoryBlock = ReadStream.Read(1024 * 1024 * 128)
    reader.Parse(mb, False)
  Loop
  
  ReadStream.Close
  ' Tell the reader that no more data follows
  reader.Parse("", True)

Catch err As XmlReaderException
End Try

Thanks for help.

Yes, it must crash, because you read too much data at once.

And why do you convert to and from a MemoryBlock when a String would do better?

Because it works, and there is no speed difference compared with this code:

Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' TestXmlReader = a subclass of XMLReader
  ' with no additional code
  Var reader As New TestXmlReader 

  Var ReadStream As TextInputStream = TextInputStream.Open(f)
  Do Until ReadStream.EndOfFile
    ' Read 128 MB worth of text per iteration and feed it to the reader
    Var s As String = ReadStream.Read(1024 * 1024 * 128)
    reader.Parse(s, False)
  Loop
  
  ReadStream.Close
  ' Tell the reader that no more data follows
  reader.Parse("", True)

Catch err As XmlReaderException
End Try

TextInputStream and BinaryStream both parse at 3.5 GB/min.

You should declare Var s As String outside the loop. This should bring you some more speed.


You may get a little more speed by eliminating the string copy, like this:
reader.Parse(ReadStream.Read(1024 * 1024 * 128), False)
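Putting that suggestion into the TextInputStream loop gives something like the sketch below. It is untested and assumes the same empty `TestXmlReader` subclass; the final `Parse("", True)` call is my own addition (assuming an empty final chunk is accepted) so that the reader knows the document has ended.

```xojo
' Sketch: stream a huge XML file through an XMLReader in 128 MB pieces.
' TestXmlReader is assumed to be a plain subclass of XMLReader.
Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return

  Var reader As New TestXmlReader
  Var chunkSize As Integer = 1024 * 1024 * 128
  Var stream As TextInputStream = TextInputStream.Open(f)

  Do Until stream.EndOfFile
    ' Pass the chunk straight to Parse; no intermediate String variable.
    reader.Parse(stream.Read(chunkSize), False)
  Loop
  stream.Close

  ' Signal the end of the document so the parser can finish.
  reader.Parse("", True)
Catch err As XmlReaderException
  ' Report parse errors instead of silently swallowing them.
  System.DebugLog("XML error: " + err.Message)
End Try
```

Only one chunk (plus the parser's own state) is held in memory at a time, which is what keeps a 75 GB file workable.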