Stopping XMLReader.Parse before end of the file

Ian_Kennedy · February 28, 2023, 1:25pm

I have an XMLReader class that reads XLSX worksheet files, and I can use it to read an entire file into my application. I’m trying to implement a preview function using the same code and want it to stop after 10 rows’ worth of data. Spotting the 10th row is easy; however, I can’t seem to find a way of telling the XMLReader to stop reading and return what it has done so far.

I can see a Reset method, but calling that from within StartElement causes a hard crash of the application. Has anyone a solution for putting the brakes on? I suppose raising an exception could be a way out, but it seems a little extreme.

Greg_O · February 28, 2023, 1:31pm

Can you set the object reference to nil?

Ian_Kennedy · February 28, 2023, 1:35pm

Which object? The XMLReader isn’t passed to the StartElement method. If I try:

Self = Nil

I get a syntax error.

Sascha_S · February 28, 2023, 2:51pm

But the XMLReader reads a FolderItem, is this correct? Maybe you can just Nil this Object and handle the Exception within the XMLReader Object?

Ian_Kennedy · February 28, 2023, 3:09pm

The XMLReader reads a FolderItem, yes. You start it by calling the Parse method; after that you are inside the XMLReader’s own methods, where you have no access to the FolderItem object.

Either way, if you are going to cause an exception I may as well just cause one myself. For now I’ve subclassed RuntimeException and catch that around the parse point, allowing me to exit when I need to.

I notice that Expat (which XMLReader uses) does have an XML_StopParser function, but it isn’t exposed to Xojo.

Bob_Keeney3 · February 28, 2023, 5:40pm

IIRC, Expat is fairly old. I know that MBS has an XML reader and I’d imagine it’s a lot newer. But I have no experience with it. And if you’re not an MBS user then sorry for the waste of bandwidth.

Ian_Kennedy · February 28, 2023, 5:42pm

Expat was last updated in 2022, so not that old. I’m afraid I can’t use MBS, sorry. Thanks for the suggestion.

Ian_Kennedy · February 28, 2023, 5:47pm

I’ve filed a feature request.

XMLReader should expose the XML_StopParser method

Greg_O · February 28, 2023, 7:15pm

Ok, how about this:

Make a subclass of the XMLReader class and override the Parse method like this:

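Private Sub Parse(s As String)
  ' Feed the document to the parser one line at a time so that parsing can be
  ' abandoned as soon as mStopped (a Boolean property on the subclass) is set.
  Dim lines() As String = Split(s, EndOfLine)
  
  While UBound(lines) > -1 And Not mStopped
    Super.Parse(lines(0), False) ' False: more data may still follow
    lines.RemoveAt(0)
  Wend
End Sub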

Then whenever you get to the line where you want to stop, you set mStopped = True.

Ian_Kennedy · February 28, 2023, 7:17pm

Unfortunately Excel files can be megabytes in size… so not really practical…

Greg_O · February 28, 2023, 7:18pm

OK, so take the logic outside of the parser then. You could read the file externally using a TextInputStream and just feed it in chunks to the parser, or just override the Parse(f As FolderItem) method.
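A minimal, untested sketch of the chunked-feed idea, written as an override of Parse(f As FolderItem) in the XMLReader subclass. The mStopped property and the 64 KB chunk size are assumptions for illustration. It also assumes that XMLReader copes with a multi-byte character split across two chunks, which should be checked against real data.

```xojo
' Sketch only: goes in the XMLReader subclass.
' mStopped is a hypothetical Boolean property that StartElement or EndElement
' sets to True once enough rows have been read.
Sub Parse(f As FolderItem)
  Const kChunkSize = 65536 ' bytes per read; an arbitrary choice
  
  Dim stream As TextInputStream = TextInputStream.Open(f)
  stream.Encoding = Encodings.UTF8 ' assumption: the worksheet XML is UTF-8
  
  While Not mStopped And Not stream.EndOfFile
    Dim chunk As String = stream.Read(kChunkSize)
    ' isFinal is True only when this chunk reached the end of the file.
    Super.Parse(chunk, stream.EndOfFile)
  Wend
  
  stream.Close
End Sub
```

The parser still delivers events for the remainder of the chunk in which mStopped was set, so the event handlers should return early once the flag is True. When parsing is abandoned this way the final Parse call is never made, so EndDocument should not be expected to fire.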

Ian_Kennedy · February 28, 2023, 7:26pm

For now I’ll stick with wrapping the parse command in a try catch block and triggering an exception subclass:

Try
  Me.oWorksheet = New XLWorksheetLoader( oXLLoader, oPreviewDataset, LoadFileThread )
  Me.oWorksheet.Parse Me.oWorksheetFile

Catch oError As PreviewStopException
   ' Ignore this exception
End Try

Within StartElement, raise the exception when required:

if ReadEnough then
   Raise New PreviewStopException
end if

Greg_O · February 28, 2023, 7:26pm

You still might want to try it. I just did it here with a 1MB file and it took 5166 microseconds to read the file and 0.16 seconds to parse the first 10 lines.

Ian_Kennedy · March 1, 2023, 4:51pm

Well, my exception method doesn’t seem to work. All that happens is that the exception is raised, but there’s then a massive delay with nothing operational until the file read is complete. Trying your scheme now. One slight wrinkle is that there is no EndOfLine within the file, so I’ll have to split it another way.

Rick_Araujo · March 1, 2023, 5:00pm

I’m not sure if a partially rendered XML can be used.

<main>
  <whatever>
      <wow>
      </wow>
  </whatever> ## BREAK
</main>

If you read until </wow>, it means that <main> is inconsistent and the entire parse should be considered a failure and should not be used.

Ian_Kennedy · March 1, 2023, 5:25pm

I was wondering about that also. That said, I need to read 10 rows of data. I can read the file until I hit </ row> for the 10th time. I can then make the file complete by simply adding “</ sheetData></ worksheet>” (spaces added to make it visible in the forum) to the end of it. Everything should then be conformant.

I’ve just checked with a worksheet with 183000 rows in it and the file size reduces from 390MB to 25KB.

Rick_Araujo · March 1, 2023, 6:09pm

If you know the structure and how to hack it for a fake completeness, you could create a CreatePreviewXML(in As FolderItem, out As FolderItem)

It reads the XML from in and writes it to out (a temp file). Once enough data has been copied, you “complete” it with the necessary closing information and close both files.

Then you use “out” as the “in” to your PreviewParser(in)
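The truncate-and-complete approach could be sketched roughly as below. This is an untested illustration, not code from the thread: the parameter names, the maxRows argument, the chunk size and the Boolean return value are all assumptions (source and target are used because In is a reserved word in Xojo). It assumes maxRows is at least 1, that every row ends with a literal </row> tag, and that the API 2.0 String methods IndexOfBytes, LeftBytes and Bytes are available.

```xojo
' Hypothetical helper: copy the start of a worksheet XML file to target,
' stopping after maxRows closing row tags and appending the closing tags.
' Returns False (and writes nothing) if the file has fewer rows than maxRows.
Function CreatePreviewXML(source As FolderItem, target As FolderItem, maxRows As Integer) As Boolean
  Const kChunkSize = 65536 ' bytes per read; an arbitrary choice
  Dim rowEnd As String = "</row>"
  
  Dim inStream As BinaryStream = BinaryStream.Open(source, False)
  Dim buffer As String
  Dim rowsSeen As Integer = 0
  Dim searchFrom As Integer = 0 ' zero-based byte offset into buffer
  
  While rowsSeen < maxRows
    Dim found As Integer = buffer.IndexOfBytes(searchFrom, rowEnd)
    If found >= 0 Then
      ' Count the row and continue searching just past its closing tag.
      rowsSeen = rowsSeen + 1
      searchFrom = found + rowEnd.Bytes
    ElseIf inStream.EndOfFile Then
      Exit While
    Else
      ' No further closing tag in the buffer yet: read more of the file.
      buffer = buffer + inStream.Read(kChunkSize)
    End If
  Wend
  inStream.Close
  
  If rowsSeen < maxRows Then Return False
  
  Dim outStream As BinaryStream = BinaryStream.Create(target, True)
  outStream.Write(buffer.LeftBytes(searchFrom))
  outStream.Write("</sheetData></worksheet>")
  outStream.Close
  Return True
End Function
```

A caller would parse target when the function returns True and fall back to parsing the original file when it returns False. Because the buffer is searched as a whole, a closing tag that straddles two reads is still found.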

Ian_Kennedy · March 1, 2023, 6:24pm

Yes, that was my plan. I don’t want to mess with multiple readers.

Rick_Araujo · March 1, 2023, 6:32pm

Maybe you need to keep track of all open tags in order, removing each one as it is closed. Once you reach your goal, you could just write out the remaining matching closing tags in order. That way, if you receive something different from what you expected, with extra tags, the routine would probably keep the output readable, though I’m not sure it would be consistent.
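A rough, untested sketch of that bookkeeping inside an XMLReader subclass, using its StartElement and EndElement events. The mOpenTags property and the ClosingTags method are hypothetical names introduced for illustration.

```xojo
' Sketch only: goes in the XMLReader subclass.
' mOpenTags() As String is a hypothetical property holding the names of the
' elements that are currently open, outermost first.

Sub StartElement(name As String, attributeList As XmlAttributeList)
  mOpenTags.Add(name)
End Sub

Sub EndElement(name As String)
  ' The parser only reports well-formed nesting, so the element being
  ' closed is always the most recently opened one.
  If mOpenTags.LastIndex >= 0 Then
    mOpenTags.RemoveAt(mOpenTags.LastIndex)
  End If
End Sub

' Build the closing tags for everything still open, innermost first.
Function ClosingTags() As String
  Dim result As String
  For i As Integer = mOpenTags.LastIndex DownTo 0
    result = result + "</" + mOpenTags(i) + ">"
  Next
  Return result
End Function
```

Called right after the 10th row has been closed in a worksheet, ClosingTags would be expected to produce the same closing tags for sheetData and worksheet that are otherwise hard-coded.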

Ian_Kennedy · March 1, 2023, 6:35pm

No, given that the only thing reading this is my code, I’m safe just adding the closing tags. It will work fine. It also prevents any issues with my EndDocument method.