XMLReader out of memory

Hi,

I have been using XMLDocument for quite a while in RB/Xojo and feel pretty comfortable with it. Recently I encountered a task where I needed to read a very large XML document (well, at least large to me) at 812 MB. I figured XMLDocument was going to choke on that, so I switched to XMLReader. I am getting what I want with smaller sample files, but when I call .parse against the larger one it dies pretty much instantly with an out of memory error. The individual records themselves are pretty small… circa 16 elements, but I guess that does not/should not matter. I was assuming that almost nothing would be retained in memory, and I see non-Xojo Expat users only starting to complain about memory at multi-gigabyte files. I am using Xojo 2014r1.1 on OS X. I can probably hand chop this file into parts, but I would rather understand the limitation or find a solution, as I will encounter this again. Thanks in advance.

tf

Are you using parse with a string or a file?
I’ve used the file-based one with really enormous files.

File, sorry for not being clear.

[code]
dim f as FolderItem
f=GetOpenFolderItem("")
if f<>nil then
  myxmlreader=new Globals.tfxmlreader
  myxmlreader.parse f
end if
[/code]

It raises an exception almost immediately, and at that point:

Base is empty
CurrentByteCount is 0
CurrentByteIndex is -1
CurrentColumnNumber is 0
CurrentLineNumber is 1
ErrorCode is 1

So I am guessing it is not even starting. Hmm… there is no encoding defined in the header, let me look into that.

Well, encoding made no difference. For what it is worth, I can’t load this file in several Java-based XML editors without also getting an out of memory error. Maybe for my purposes I will just roll my own with TextInputStream.

I’m having exactly the same problem with a 600 MB file. The wrinkle with my issue is that it works fine on Mac OS X, but throws the “outOfMemory” exception immediately when running in Windows.

Any suggestions would be very much appreciated.

I’m loading render data XML files and I’ve had no problem.

[code]
#if TargetWin32 then
  dim fr as folderItem=GetFolderItem(Main.Savetxt.Text +"temp.xml",FolderItem.PathTypeNative)
#elseif TargetMacOS
  dim fr as folderItem=GetFolderItem(Main.Savetxt.Text +"/"+"temp.xml",FolderItem.PathTypeNative)
#endif
dim xDoc as new xmlDocument
try
  xDoc.LoadXml(fr)
catch
  xDoc=nil //an error occurred
end try
dim SelectCamera as String
dim myFocalLength as String
dim Resolution as String
dim names() as String
dim values() as String
if xDoc<>nil then
  dim xq as XmlNodeList
  dim xt as XmlNodeList
  xq=xDoc.DocumentElement.Xql("Object/Object[@Type='Camera']")
  xt=xDoc.DocumentElement.Xql("Object[@Type='Scene']")
  dim i,n as integer
  n=xq.Length-1

  etc
  etc
[/code]

This is on Windows and Mac.

Nige, we are not talking about the XML DOC/DOM model, but instead, the event driven reader.

And the issue is not if it works, but if it works on very large files, which is what it is supposed to be good for (amongst other things).

My “solution” to my original problem was to create my own file reader, which read blocks of characters until it had a complete top-level element and then passed each such element into a standard XMLDocument.
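
A minimal sketch of that kind of block reader might look like the following. It is an illustration under assumptions, not the actual code used: the record tag name "record", the 1 MB chunk size, UTF-8 encoding, and the HandleRecord method are all hypothetical, and it assumes records are never nested inside each other and that no other tag name (the root element included) begins with the same text as the record tag.

[code]
// Sketch only: read the file in chunks and hand one record at a time to XmlDocument.
Sub ProcessLargeXml(f As FolderItem)
  Const kChunkSize = 1048576 // 1 MB per read (arbitrary choice)
  Dim startTag As String = "<record" // hypothetical record tag
  Dim endTag As String = "</record>" // hypothetical record tag

  Dim bs As BinaryStream = BinaryStream.Open(f, False)
  Dim buffer As String

  While Not bs.EOF
    // Work on raw bytes so a multi-byte character split across chunks does no harm
    buffer = buffer + bs.Read(kChunkSize)

    // Pull out every complete record currently sitting in the buffer
    Do
      Dim startPos As Integer = InStrB(buffer, startTag)
      If startPos = 0 Then
        // No record start yet: keep only a tail that could hold a partial start tag
        buffer = RightB(buffer, LenB(startTag) - 1)
        Exit Do
      End If

      Dim endPos As Integer = InStrB(startPos, buffer, endTag)
      If endPos = 0 Then Exit Do // record is incomplete, read another chunk

      Dim recordLen As Integer = endPos + LenB(endTag) - startPos
      Dim recordXml As String = MidB(buffer, startPos, recordLen)
      buffer = MidB(buffer, startPos + recordLen)

      // Each record is small, so the DOM parser never sees more than one at a time
      Dim doc As New XmlDocument
      doc.LoadXml(DefineEncoding(recordXml, Encodings.UTF8)) // UTF-8 is an assumption
      HandleRecord(doc.DocumentElement) // hypothetical method that does the real work
    Loop
  Wend

  bs.Close
End Sub
[/code]

Memory use stays roughly bounded by the chunk size plus the largest single record, which is the whole point of the workaround. Records that rely on namespace prefixes declared on the root element would not parse on their own this way.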

I’ve used it on XML documents of about 150 MB.
Nothing larger though, so I can’t speak from experience there.

This: [quote]The strength of the SAX specification is that it can scan and parse gigabytes worth of XML documents without hitting resource limits, because it does not try to create the DOM representation in memory.[/quote] is from an article on DexX about the SAX and DOM parsing approaches.

Is it possible that there is something in your event handlers that is causing the problem, Todd? Have you tried “parsing” the file with nothing in any of the reader’s handlers? If it still fails, it sounds like a feedback-worthy report to me.
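
One way to run that test is sketched below. It assumes the base XmlReader class can be instantiated directly (so no event handlers exist at all) and catches RuntimeException broadly just to report which exception is raised; treat it as a diagnostic sketch rather than production code.

[code]
// Sketch only: parse with a bare XmlReader so no handler code can be the culprit.
Dim f As FolderItem = GetOpenFolderItem("")
If f <> Nil Then
  Dim reader As New XmlReader // base class, no events implemented
  Try
    reader.Parse(f)
    MsgBox("Parsed without error.")
  Catch err As RuntimeException
    // Report the exception class and message for a bug report
    MsgBox(Introspection.GetType(err).Name + ": " + err.Message)
  End Try
End If
[/code]

If this still fails on the large file, the handlers are ruled out and the exception name and message can go straight into the report.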

Ok, I’ll give it a shot… as luck would have it the customer wants the data massaged in a different way so I will be looking at it again next week. If it holds then I will be happy to provide the sample file. It is all Danish hockey and football data so I don’t think there is a national security issue.