This is one of those simple questions with a lot of valid approaches. But I’m looking for the most efficient way possible.
I have a large text file. It's currently 90 MB, and a new line is added every 3-5 minutes. My app needs to read the last line of this text file every 5 seconds.
Due to the frequency of the reads and the possibility of a collision (my app is reading the file at the exact moment that it’s being updated), I’m trying to be as efficient as I can with this. Looping through the file line-by-line with the ReadLine method is certainly not the right approach for this problem.
Off the top of my head, what I would test first is getting the current file size in bytes, then opening the file and setting the TextInputStream BytePosition property to the total file length minus what you consider a reasonable maximum expected line length. Then read the remainder of the file into memory and split() it into lines. If you don't get at least two lines, your estimated maximum line length was too low: reset the BytePosition to somewhat earlier and try again.
Once you get at least two lines from the split(), what you want is the final element in that array.
If the file can end with an extra line ending, you may need at least three elements in the array, then walk the array backwards looking for a non-empty line.
Note that TextInputStream allows you to set a BYTE position, not a CHARACTER position, so if you may be dealing with multibyte characters in the text, take that into account when estimating a reasonable offset from the file size at which to start reading data.
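The thread is presumably about Xojo, but the idea translates to any language. Here's a rough Python sketch of the approach described above (the `guess` size, doubling retry strategy, and function name are my own illustrative choices, not part of the original suggestion). Reading in binary and discarding the possibly-truncated first chunk also sidesteps the multibyte-character issue mentioned above:

```python
import os

def read_last_line(path, guess=4096, encoding="utf-8"):
    """Read the last non-empty line of a text file without scanning it all.

    Seeks `guess` bytes back from the end, reads the remainder, and splits
    it into lines. If no complete non-empty line is found, the guess is
    doubled and the read retried (the retry strategy is an assumption for
    this sketch).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            offset = max(size - guess, 0)
            f.seek(offset)
            lines = f.read().splitlines()  # handles both LF and CRLF
            if offset > 0:
                # The first element may start mid-line (or mid-character
                # for multibyte encodings), so discard it
                lines = lines[1:]
            # Walk backwards past any trailing empty lines
            for line in reversed(lines):
                if line.strip():
                    return line.decode(encoding, errors="replace")
            if offset == 0:
                return ""  # whole file read; no non-empty line found
            guess *= 2  # estimate was too low; back up further, retry
```

Because the first (potentially truncated) chunk is thrown away, a seek that lands in the middle of a multibyte character can't corrupt the result.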
Question: why would you read the file every 5 seconds, when a new line is only added every 3-5 minutes? Wouldn’t it be more efficient to just read the file when it has changed? I use functions in MBS to register for notifications when the filesystem changes, but then I already have a MBS license so I tend to not look for other alternatives when they have what I want plus examples of how to use it.
If you don’t have MBS, this can probably be done with declares that are OS specific. So be sure to mention what operating system(s) you need this to work on, as you posted in the General category instead of a specific platform.
Read it ONCE and keep track of the total size at that time. On the next read, use a binary stream, jump to the last known size, read “the rest”, and keep track of the new size for next time.
Lather, rinse & repeat.
And you can avoid that first read of the entire file:
Start maybe 4K back from the end and read that.
See if there’s more than one line in there.
If so, you don’t need to jump back any further.
Pull “the last line” out of that and save the size as noted above.
And now you’re off to the races, reading just the last line(s) each time.
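A minimal Python sketch of that bookkeeping (class name and details are my own; this assumes the file is append-only, resetting to the start if it ever shrinks, e.g. after log rotation):

```python
import os

class TailReader:
    """Track a file's size between polls and read only the appended bytes."""

    def __init__(self, path):
        self.path = path
        self.pos = 0        # last known end-of-complete-data position
        self.last_line = ""

    def poll(self):
        """Return the most recent complete line, reading only new data."""
        size = os.path.getsize(self.path)
        if size < self.pos:
            self.pos = 0    # file was truncated/rotated; start over
        if size == self.pos:
            return self.last_line   # nothing new since the last poll
        with open(self.path, "rb") as f:
            f.seek(self.pos)
            new = f.read()
        # Keep only complete lines; a partial line still being written
        # stays unread so the next poll picks it up whole
        complete, sep, partial = new.rpartition(b"\n")
        self.pos += len(new) - len(partial)
        for line in reversed(complete.splitlines()):
            if line.strip():
                self.last_line = line.decode("utf-8", errors="replace")
                break
        return self.last_line
```

Advancing `pos` only past the last newline is what handles the collision concern from the original question: a half-written line is simply re-read on the next poll instead of being returned as garbage.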
These are great ideas, thanks everyone. I think you’re spot on. Read it using random access with maybe a 1K buffer, Split on CRLF into an array and take the final element, then keep track of file-size changes to know the byte position to seek to on subsequent reads.
Question: why would you read the file every 5 seconds, when a new line is only added every 3-5 minutes? Wouldn’t it be more efficient to just read the file when it has changed?
I was never able to get filesystem monitoring APIs to work reliably. They’d work for a while, then randomly stop after some days or weeks until I restarted the application. That was on Windows; I haven’t tried on Mac, but the past experience made me wary of this approach. Using a timer to monitor the file size should be fine for this project. A good idea, though, to be sure.
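A timer-based check can still be very cheap: a stat() call per tick costs far less than opening and reading the file, so the read itself only happens when something actually changed. A Python sketch of that idea (the function name and the mutable-default `state` dict used to remember the previous values are just compact illustrative choices):

```python
import os

def file_changed(path, state={}):
    """Cheap change check: compare size and mtime against the last call.

    Returns True only when the file's size or modification time differs
    from the previous invocation, so a 5-second timer can call this and
    trigger an actual read only on a change.
    """
    st = os.stat(path)
    current = (st.st_size, st.st_mtime)
    if state.get(path) == current:
        return False
    state[path] = current
    return True
```

With a line arriving only every few minutes, this turns roughly 36 reads per line into one.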
Even considering this issue, why read the file every 5 seconds if you know each new line arrives at least 3 minutes after the previous one (as you described it: 3-5 minutes)? If you read it every 3 minutes (or even every 2:30, to be safe), you’ll waste fewer resources for the same result.