Read gzip or bzip2 compressed text file line by line?

Is there an existing mechanism for reading the contents of a gzip or bzip2 compressed text file in a line-by-line mode? Something like this:

theLine = gzTextInputStream.ReadLine

I considered uncompressing and then using the normal TextInputStream, but these files can be gigabytes in size when uncompressed.

Anyone have something like that?

For bzip2, the Linux command bzgrep (see the bzgrep man page) can search inside a compressed file, so the open source code for that utility might yield ideas. I don’t think decompression can be done line by line, only block by block; once you have an uncompressed block, it consists of lines you can read, with the caveat that a line may straddle two blocks. I assume that is how bzgrep must work.
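To make the block-by-block idea concrete, here is a minimal sketch of the line-splitting step only. It assumes you already have some way of getting decompressed chunks as strings, and that lines end with LF. `TakeCompleteLines` and `pending` are hypothetical names made up for this example; they are not part of bzgrep or of any library or plugin.

```xojo
' Hypothetical helper, not part of any library mentioned in this thread.
' Appends a decompressed chunk to the pending text and returns every
' complete line found so far. Whatever follows the last line break stays
' in "pending" until the next chunk arrives.
' Assumption: lines are terminated by LF, i.e. Chr(10).
Function TakeCompleteLines(ByRef pending As String, chunk As String) As String()
  Dim lines() As String
  pending = pending + chunk
  If pending = "" Then Return lines ' nothing buffered yet

  lines = Split(pending, Chr(10))
  ' The last element is either an unfinished line, or "" when the buffer
  ' ended exactly on a line break. Carry it over in both cases.
  pending = lines.Pop
  Return lines
End Function
```

Call it once per decompressed chunk with the same `pending` variable, and process the returned array each time. After the last chunk, a non-empty `pending` holds the final line if the file did not end with a line break. Memory use stays at roughly one chunk plus one partial line, so the uncompressed file never has to exist in full.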

There’s also a zgrep utility for gzip files.

Thanks, @Bob_Grommes, that’s what I’m already doing, but the overhead is a bit steep in the shell.

I was hoping that Christian or Bjorn would have something like that in their plugins.

For gzip take a look at my open source zlib wrapper:

Reading line by line is supported in “buffered reading” mode which is turned on by default:

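  Dim f As FolderItem ' the gzip file to read (assign it before opening)
  Dim stream As zlib.ZStream = zlib.ZStream.Open(f)
  Do Until stream.EOF
    ' each pass yields the next line of the decompressed text
    Dim line As String = stream.ReadLine()
    ' process line here
  Loop
  stream.Close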

I also have an open source wrapper for bzip2. I didn’t include the ReadLine method, but it could be copied from the zlib wrapper with a few minor changes (both wrappers use the same fundamental code for streaming decompression).

If you use the MBS Xojo Compression Plugin, you can use the GZipFileMBS class to uncompress the .gz file in chunks.

Gzip is already in the framework; they just refuse to expose it:
#20404 - Make existing _GzipString and _GzipFile public methods
