This is one of those simple questions with a lot of valid approaches. But I’m looking for the most efficient way possible.
I have a large text file. It's currently 90 MB, and a new line is added every 3-5 minutes. My app needs to read the last line of this text file every 5 seconds.
Due to the frequency of the reads and the possibility of a collision (my app is reading the file at the exact moment that it’s being updated), I’m trying to be as efficient as I can with this. Looping through the file line-by-line with the ReadLine method is certainly not the right approach for this problem.
Off the top of my head, what I would test first is getting the current file size in bytes, then opening the file and setting the TextInputStream BytePosition property to the total file length minus what you consider a reasonable maximum expected line length. Then read the remainder of the file into memory and split() it into lines. If you don't get at least two lines, your estimated maximum line length was too low: reset the BytePosition to somewhat earlier and try again.
Once you get at least two lines from the split(), what you want is the final element in that array.
If the file can end with an extra line ending, you may need at least three elements in the array, then walk the array backwards looking for a non-empty line.
Note that TextInputStream allows you to set a BYTE position, not a CHARACTER position, so if you may be dealing with multibyte characters in the text, take that into account when estimating a reasonable offset from the file size at which to start reading data.
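The thread is presumably about Xojo, but the idea translates to any language. Here's a rough Python sketch of the approach described above (the `guess` size, doubling retry strategy, and function name are my own illustrative choices, not part of the original suggestion). Reading in binary and discarding the possibly-truncated first chunk also sidesteps the multibyte-character issue mentioned above:

```python
import os

def read_last_line(path, guess=4096, encoding="utf-8"):
    """Read the last non-empty line of a text file without scanning it all.

    Seeks `guess` bytes back from the end, reads the remainder, and splits
    it into lines. If no complete non-empty line is found, the guess is
    doubled and the read retried (the retry strategy is an assumption for
    this sketch).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            offset = max(size - guess, 0)
            f.seek(offset)
            lines = f.read().splitlines()  # handles both LF and CRLF
            if offset > 0:
                # The first element may start mid-line (or mid-character
                # for multibyte encodings), so discard it
                lines = lines[1:]
            # Walk backwards past any trailing empty lines
            for line in reversed(lines):
                if line.strip():
                    return line.decode(encoding, errors="replace")
            if offset == 0:
                return ""  # whole file read; no non-empty line found
            guess *= 2  # estimate was too low; back up further, retry
```

Because the first (potentially truncated) chunk is thrown away, a seek that lands in the middle of a multibyte character can't corrupt the result.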
Question: why would you read the file every 5 seconds, when a new line is only added every 3-5 minutes? Wouldn’t it be more efficient to just read the file when it has changed? I use functions in MBS to register for notifications when the filesystem changes, but then I already have a MBS license so I tend to not look for other alternatives when they have what I want plus examples of how to use it.
If you don’t have MBS, this can probably be done with declares that are OS specific. So be sure to mention what operating system(s) you need this to work on, as you posted in the General category instead of a specific platform.
Read it ONCE and keep track of the total size at that time. On the next read, use a binary stream, jump to the last known size, read “the rest”, and keep track of the new size for next time.
Lather, rinse & repeat.
And you can avoid that first read of the entire file:
Start maybe 4K back from the end and read that.
See if there’s more than one line in there.
If so, you don’t need to jump back any further.
Pull “the last line” out of that and save the size as noted above.
And now you’re off to the races, reading just the last line(s) each time.
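A minimal Python sketch of that bookkeeping (class name and details are my own; this assumes the file is append-only, resetting to the start if it ever shrinks, e.g. after log rotation):

```python
import os

class TailReader:
    """Track a file's size between polls and read only the appended bytes."""

    def __init__(self, path):
        self.path = path
        self.pos = 0        # last known end-of-complete-data position
        self.last_line = ""

    def poll(self):
        """Return the most recent complete line, reading only new data."""
        size = os.path.getsize(self.path)
        if size < self.pos:
            self.pos = 0    # file was truncated/rotated; start over
        if size == self.pos:
            return self.last_line   # nothing new since the last poll
        with open(self.path, "rb") as f:
            f.seek(self.pos)
            new = f.read()
        # Keep only complete lines; a partial line still being written
        # stays unread so the next poll picks it up whole
        complete, sep, partial = new.rpartition(b"\n")
        self.pos += len(new) - len(partial)
        for line in reversed(complete.splitlines()):
            if line.strip():
                self.last_line = line.decode("utf-8", errors="replace")
                break
        return self.last_line
```

Advancing `pos` only past the last newline is what handles the collision concern from the original question: a half-written line is simply re-read on the next poll instead of being returned as garbage.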
These are great ideas, thanks everyone. I think you’re spot on. Read it using random access with maybe a 1K buffer, Split on CRLF into an array and take the final element, then keep track of file-size changes to know the byte position to seek to on subsequent reads.
Question: why would you read the file every 5 seconds, when a new line is only added every 3-5 minutes? Wouldn’t it be more efficient to just read the file when it has changed?
I was never able to get filesystem monitoring APIs to work reliably. They’d work for a while, then randomly stop after some days or weeks until I restarted the application. That was on Windows; I haven’t tried on Mac, but the past experience made me wary of this approach. Using a timer to monitor the file size should be fine for this project. A good idea, though, to be sure.
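A timer-based check can still be very cheap: a stat() call per tick costs far less than opening and reading the file, so the read itself only happens when something actually changed. A Python sketch of that idea (the function name and the mutable-default `state` dict used to remember the previous values are just compact illustrative choices):

```python
import os

def file_changed(path, state={}):
    """Cheap change check: compare size and mtime against the last call.

    Returns True only when the file's size or modification time differs
    from the previous invocation, so a 5-second timer can call this and
    trigger an actual read only on a change.
    """
    st = os.stat(path)
    current = (st.st_size, st.st_mtime)
    if state.get(path) == current:
        return False
    state[path] = current
    return True
```

With a line arriving only every few minutes, this turns roughly 36 reads per line into one.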
Even considering this issue, why read the file every 5 seconds if you know each new line arrives at least 3 minutes after the previous one (as you described it: 3-5 minutes)? If you read it every 3 minutes (or even every 2:30, to be safe), you’ll waste fewer resources for the same result.