Stopping XMLReader.Parse before end of the file

I have an XMLReader subclass that reads XLSX worksheet files, and I can use it to read an entire file into my application. I’m trying to implement a preview function using the same code and want it to stop after 10 rows’ worth of data. Spotting the 10th row is easy; however, I can’t find a way of telling the XMLReader to stop reading and return what it has done so far.

I can see a Reset method, but calling that from within StartElement causes a hard crash of the application. Does anyone have a solution for putting the brakes on? I suppose raising an exception could be a way out, but it seems a little extreme.

Can you set the object reference to nil?

Which object? The XMLReader isn’t passed to the StartElement method. If I try:

Self = Nil

I get a syntax error.

But the XMLReader reads a FolderItem, is this correct? Maybe you can just Nil this Object and handle the Exception within the XMLReader Object?

The XMLReader reads a FolderItem, yes. You start that by calling the Parse method; after that you are inside the XMLReader’s own methods, where you have no access to the FolderItem object.

Either way, if you are going to cause an exception, I may as well just cause one myself. For now I’ve subclassed RuntimeException and caught that around the parse point, allowing me to exit when I need to.

I notice that Expat (which XMLReader uses) does have an XML_StopParser function, but it isn’t exposed to Xojo.

IIRC, Expat is fairly old. I know that MBS has an XML reader and I’d imagine it’s a lot newer. But I have no experience with it. And if you’re not an MBS user then sorry for the waste of bandwidth. :slight_smile:

Expat was last updated in 2022, so not that old. I’m afraid I can’t use MBS, sorry. Thanks for the suggestion.

I’ve filed a feature request.

XMLReader should expose the XML_StopParser method

Ok, how about this:

Make a subclass of the XMLReader class and override the Parse method like this:

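Private Sub Parse(s As String)
  ' Split the input into lines so the parser can be fed one piece at a time
  Dim lines() As String = Split(s, EndOfLine)
  
  ' Feed lines until none are left or a handler has set mStopped
  While UBound(lines) > -1 And Not mStopped
    Super.Parse(lines(0), False) ' False: more input may follow
    lines.RemoveAt(0)
  Wend
End Sub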

Then whenever you get to the line where you want to stop, you set mStopped = True.

Unfortunately, Excel files can be megabytes in size… so not really practical…

OK, so take the logic outside of the parser then. You could read the file externally using a TextInputStream and just feed it to the parser in chunks. Or just override the Parse(f As FolderItem) method.
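
A minimal sketch of that chunked approach, assuming a hypothetical XMLReader subclass called PreviewReader with a public Boolean property Stopped that its StartElement handler sets once enough rows have been seen. The method name ParsePreview and the 64 KB chunk size are also assumptions. It uses BinaryStream rather than TextInputStream so the bytes are passed along as read.

```xojo
' Sketch only: PreviewReader and its Stopped property are hypothetical.
Sub ParsePreview(reader As PreviewReader, f As FolderItem)
  Const kChunkSize = 65536 ' assumed chunk size in bytes
  
  ' Open read-only
  Dim stream As BinaryStream = BinaryStream.Open(f, False)
  
  While Not stream.EndOfFile And Not reader.Stopped
    Dim chunk As String = stream.Read(kChunkSize)
    ' The second argument is isFinal: True only for the last chunk of the file
    reader.Parse(chunk, stream.EndOfFile)
  Wend
  
  stream.Close
End Sub
```

The parser still works through the rest of the current chunk after Stopped is set, so the handlers would need to ignore anything past the 10th row.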

For now I’ll stick with wrapping the parse command in a try catch block and triggering an exception subclass:

Try
  Me.oWorksheet = New XLWorksheetLoader( oXLLoader, oPreviewDataset, LoadFileThread )
  Me.oWorksheet.Parse Me.oWorksheetFile

Catch oError As PreviewStopException
   ' Ignore this exception
End Try

Within StartElement, raise the exception when required:

if ReadEnough then
   Raise New PreviewStopException
end if

You still might want to try it. I just did it here with a 1MB file and it took 5166 microseconds to read the file and 0.16 seconds to parse the first 10 lines.

Well, my exception method doesn’t seem to work. All that happens is that the exception is raised, but there’s then a massive delay with nothing operational until the file read is complete. Trying your scheme now. One slight wrinkle is that there is no EndOfLine within the file, so I’ll have to split it another way.

I’m not sure if a partially rendered XML can be used.

<main>
  <whatever>
      <wow>
      </wow>
  </whatever> ## BREAK
</main>

If you read until </wow>, it means that <main> is inconsistent and the entire parse should be considered a fail and should not be used.

I was wondering about that also. That said, I need to read 10 rows of data. I can read the file until I hit </ row> for the 10th time. I can make the file complete by simply adding “</ sheetData></ worksheet>” (spaces added to make it visible in the forum) to the end of it. Everything should then be conformant.

I’ve just checked with a worksheet with 183000 rows in it and the file size reduces from 390MB to 25KB.
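
A rough sketch of that truncation step, assuming the text passed in already contains at least the first rows of the worksheet (for example, the first chunk read from the file). The function name BuildPreviewXML and its parameters are hypothetical; the closing tags are the ones named above.

```xojo
' Sketch only: cuts the text after the maxRows-th </row> and appends the
' closing tags. Assumes the rows sit inside <sheetData> within <worksheet>.
Function BuildPreviewXML(source As String, maxRows As Integer) As String
  Const kRowEnd = "</row>"
  
  Dim position As Integer = 0 ' 1-based position of the last match; 0 = none yet
  For i As Integer = 1 To maxRows
    Dim found As Integer = InStr(position + 1, source, kRowEnd)
    ' Fewer rows than requested: return the text unchanged
    If found = 0 Then Return source
    position = found
  Next
  
  ' Keep everything up to and including the last matched </row>
  Dim cutLength As Integer = position + Len(kRowEnd) - 1
  Return Left(source, cutLength) + "</sheetData></worksheet>"
End Function
```

The result could then be handed to the existing reader in one call to Parse, for example with BuildPreviewXML(firstChunk, 10), where firstChunk is whatever portion of the file has been read.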

If you know the structure and how to hack it for a fake completeness, you could create a CreatePreviewXML(in As FolderItem, out As FolderItem)

It reads the XML from “in” and writes it to “out” (a temp file). Once enough data has been written, you “complete” it with the necessary closing information and close both files.

Then you use “out” as the “in” to your PreviewParser(in)

Yes, that was my plan. I don’t want to mess with multiple readers.

Maybe you need to keep track of all open tags in order, removing each one once it is closed. Once you reach your desired goal, you could just write out the remaining matching closing tags, innermost first. That way, if you receive something different from what you expected, with extra tags, the routine would possibly keep the output readable, though I’m not sure it would be consistent.
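
One way that bookkeeping might look, as a sketch only: it assumes an XMLReader subclass with a hypothetical String array property mOpenTags, and ClosingTags is a made-up helper name.

```xojo
' Sketch only: mOpenTags() As String is a hypothetical property of the subclass.

Sub StartElement(name As String, attributeList As XmlAttributeList)
  ' Remember each element as it opens
  mOpenTags.Add(name)
End Sub

Sub EndElement(name As String)
  ' The element that just closed is the most recently opened one
  If UBound(mOpenTags) > -1 Then mOpenTags.RemoveAt(UBound(mOpenTags))
End Sub

Function ClosingTags() As String
  ' Build the closing tags for everything still open, innermost first
  Dim result As String
  For i As Integer = UBound(mOpenTags) DownTo 0
    result = result + "</" + mOpenTags(i) + ">"
  Next
  Return result
End Function
```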

No, given that the only thing reading this is my code, I’m safe just adding the close tags. It will work fine. It also prevents any issues with my EndDocument method.