Job: Parse a 75 GB XML file (OpenStreetMap *.osm)
Try: XMLReader.Parse(folderitem)
Result: Crash
Question: Is there any other XML parser class available for Xojo?
As always: Which platform? Which version of the platform? Which version of Xojo? What does your code look like? What exactly do you want to do? What does the crash look like? Can you use smaller XML files? Do you run out of memory? Have you checked whether the MBS plugin has an XML parser?
For a file like this, you need a reader that can process the data in chunks.
75 GB will never fit in memory at once,
at least not on a regular computer with a few GB of memory.
And a Xojo String can only hold up to 2 GB.
Ah, you might be able to learn the structure and use a BinaryStream to read sections of the file and parse those.
Isn’t that what XMLReader does?
Yes, if you feed it in small chunks and not 75 GB at once!
So he needs to feed it the partial XML himself? I didn't see that in the docs, but it does make sense.
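As far as I know from the Xojo documentation, XMLReader.Parse also accepts a String plus an isFinal flag, which is what makes chunked feeding possible. A minimal sketch, using a made-up two-line OSM-like document, that splits one document across three calls (the first one ends in the middle of a tag) and marks the last call with isFinal = True. It assumes that chunk boundaries may fall anywhere, as with other incremental parsers; that is worth checking on your Xojo version.

```xojo
' Hypothetical demo: feed one tiny OSM-like document to an XMLReader in three pieces.
' A plain XMLReader is used here; in practice you would use your own subclass
' so that its StartElement/EndElement events do something useful.
Var reader As New XMLReader
Try
  reader.Parse("<osm><node id=""1"" ", False)     ' this chunk ends in the middle of a tag
  reader.Parse("lat=""0.5"" lon=""1.5""/>", False)
  reader.Parse("</osm>", True)                    ' isFinal = True: no more data follows
Catch err As XmlReaderException
  MessageBox("XML error: " + err.Message)
End Try
```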
This code crashes on large XML files:
Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' XmlDocReader_osm = just a subclass of XMLReader,
  ' no additional code
  Var reader As New XmlDocReader_osm
  ' Parses the whole file in a single call
  reader.Parse(f)
Catch err As XmlReaderException
End Try
This code works:
Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' TestXmlReader = just a subclass of XMLReader,
  ' no additional code
  Var reader As New TestXmlReader
  Var ReadStream As BinaryStream = BinaryStream.Open(f, False)
  Do Until ReadStream.EOF
    ' Read 128 MB per pass; the returned String is converted to a MemoryBlock
    Var mb As MemoryBlock = ReadStream.Read(1024 * 1024 * 128)
    reader.Parse(mb, False)
  Loop
  ' Tell the parser that no more data follows
  reader.Parse("", True)
  ReadStream.Close
Catch err As XmlReaderException
End Try
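With "no additional code", the subclass parses the file but does nothing with the data. A hedged sketch of a StartElement event handler that counts OSM node elements and reads their lat/lon attributes. NodeCount, LastLat and LastLon are hypothetical properties you would add to the subclass yourself, and XmlAttributeList.Value(name) is assumed from the Xojo docs.

```xojo
' Event handler in the XMLReader subclass (e.g. TestXmlReader).
' Assumed properties on the subclass (hypothetical):
'   NodeCount As Integer, LastLat As Double, LastLon As Double
Sub StartElement(name As String, attributeList As XmlAttributeList) Handles StartElement
  ' OSM stores each point as <node id="..." lat="..." lon="..."/>
  If name = "node" Then
    NodeCount = NodeCount + 1
    ' Val always uses "." as the decimal separator, matching the OSM format
    LastLat = Val(attributeList.Value("lat"))
    LastLon = Val(attributeList.Value("lon"))
  End If
End Sub
```

Because nothing is kept per element beyond these few properties, memory use stays flat regardless of file size; keep or write out only the data you actually need.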
Thanks for the help.
Yes, it is bound to crash, because you read too much data at once.
And why do you convert to and from a MemoryBlock when a String would do better?
Because it works, and there is no speed difference compared with this code:
Try
  Var f As FolderItem = GetOpenFolderItem("")
  If f Is Nil Then Return
  ' TestXmlReader = just a subclass of XMLReader,
  ' no additional code
  Var reader As New TestXmlReader
  Var ReadStream As TextInputStream = TextInputStream.Open(f)
  Do Until ReadStream.EndOfFile
    Var s As String = ReadStream.Read(1024 * 1024 * 128)
    reader.Parse(s, False)
  Loop
  ' Tell the parser that no more data follows
  reader.Parse("", True)
  ReadStream.Close
Catch err As XmlReaderException
End Try
TextInputStream and BinaryStream both parse at 3.5 GB/min.
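At that rate, a rough estimate for the whole file, taking the 75 GB and 3.5 GB/min figures at face value and counting 1 GB as 1024 MB for the number of 128 MB chunks:

$$
t \approx \frac{75\ \text{GB}}{3.5\ \text{GB/min}} \approx 21.4\ \text{min},
\qquad
n_{\text{chunks}} = \frac{75 \times 1024\ \text{MB}}{128\ \text{MB}} = 600
$$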
You should declare Var s As String outside the loop. This should gain you some more speed.
You may get yourself a little more speed by eliminating the string copy like this:
reader.Parse(ReadStream.Read(1024 * 1024 * 128), False)
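Putting the suggestions from this thread together, here is a sketch of the full loop. Each chunk goes straight into Parse, the stream is closed even if parsing fails, and a final Parse("", True) tells the reader that the document is complete. TestXmlReader is the poster's subclass, the 128 MB chunk size is simply the value used above, and the MessageBox error handling is only an example.

```xojo
' Sketch: chunked parsing loop combining the suggestions in this thread.
Const kChunkSize = 134217728 ' 128 * 1024 * 1024 bytes

Var f As FolderItem = GetOpenFolderItem("")
If f Is Nil Then Return

Var reader As New TestXmlReader
Var readStream As BinaryStream = BinaryStream.Open(f, False)
Try
  Do Until readStream.EndOfFile
    ' Hand each chunk straight to the parser, without an intermediate variable
    reader.Parse(readStream.Read(kChunkSize), False)
  Loop
  ' No more data: let the parser finish and check that the document is complete
  reader.Parse("", True)
Catch err As XmlReaderException
  ' Example handling only: report the parse error
  MessageBox("XML error: " + err.Message)
Finally
  ' Close the file whether or not parsing succeeded
  readStream.Close
End Try
```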