@Mike_Linacre1: Try something like the following with your Text File please:
Var f As FolderItem
Var textInput As TextInputStream
f = FolderItem.ShowOpenFileDialog("text/plain")
If f <> Nil Then
  If f.Exists Then
    Try
      textInput = TextInputStream.Open(f)
      Var strInput As String = textInput.ReadAll(Encodings.UTF8)
      textInput.Close
      // Normalise all line endings so a single Split handles CR, LF and CRLF
      strInput = strInput.ReplaceLineEndings(EndOfLine.UNIX)
      Var sArray() As String
      sArray = strInput.Split(EndOfLine.UNIX)
      If sArray.Count > 0 Then
        // ... process the lines in sArray here
      End If
    Catch e As IOException
      MessageBox("Error accessing file...")
    End Try
  End If
End If
*The above Code was taken in parts from the Documentation…
Thanks everyone. I generated many similar input files and looked at them with a hex editor. Every line looks the same except for different numbers. They all behaved the same. And yes, .ReadAll would work for these small files, but they are test data for the real files, which can be several GB in size.
Here is the code, omitting all the irrelevant processing of the input line.
var fitem as folderitem
fitem = getfolderitem(fname)
if fitem<>Nil then
  If fitem.Exists Then
    var fstream as textinputstream
    fstream = textinputstream.Open(fitem)
    fstream.encoding = Encodings.UTF8
    while not fstream.EndOfFile
      var s as string = fstream.ReadLine
      // (processing of the input line omitted)
    wend
    fstream.Close
  End If
end if
It is strange because this code has been used in production for several months without problem, then suddenly this occurred.
var f as new folderitem("C:\Users\Rick\Desktop\txt5000\txt5000x60.txt")
If not f.exists then
  MessageBox "txt not found"
  Return
End If
var fs as textinputstream = textinputstream.Open(f)
fs.encoding = Encodings.UTF8
var numLines As Integer = 0
Do until fs.EndOfFile
  var s as string = fs.ReadLine
  If s.Length = 0 Then Continue // Ignore empty ones
  numLines = numLines + 1
  If s.Length <> 60 Then
    MessageBox "Error! line(" + numLines.ToString + ") " + s
    Return // Stop at the first bad line
  End If
Loop
fs.Close
MessageBox "Done OK. Read " + numLines.ToString + " data lines"
One thing comes to mind. The code above is slower than it needs to be and could also be causing a memory-utilisation issue. By declaring the variable inside the while loop, you are destroying and recreating it every time the loop iterates. Not only does this slow the code down, it could be causing an issue for the memory manager.
The following code will only create the string variable once and then change its contents each time the loop iterates.
var s as string
while not fstream.EndOfFile
  s = fstream.ReadLine
wend
It should be faster and has less chance of old strings not being cleared up during the loop. The old code should be perfectly safe, but, perhaps something is failing to free up the memory and resulting in problems? Just a thought. If it does solve the problem then it would seem like a bug in Xojo that would be worth reporting.
Another question comes to mind. Do you have access to a hexdump program? I know there is one as standard on Mac and Linux; I'm not sure about Windows. If you use it on your files, you should be able to spot any “odd characters” within the file that could cause issues. Given that your original post suggested the issue occurs in a different place on each run, the first option is perhaps more likely. [Actually you say you’ve done this]
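For the multi-GB files, where eyeballing a hex editor is impractical, a scan like the following might help. It is only a rough sketch: the name ReportOddBytes is made up, it assumes the data is plain ASCII text (numbers, tabs and line endings), so any legitimate UTF-8 multi-byte character would also be flagged, and it assumes a 64-bit build so that Integer can hold offsets beyond 2 GB.

```xojo
// Hypothetical helper, not from the thread: logs every byte that is neither
// printable ASCII nor Tab/LF/CR. Reads in 1 MB chunks so large files are fine.
Sub ReportOddBytes(f As FolderItem)
  Const kChunkSize = 1048576
  Var bs As BinaryStream = BinaryStream.Open(f, False) // False = read-only
  Var offset As Integer = 0 // assumes a 64-bit build for files over 2 GB
  While Not bs.EndOfFile
    // Read returns a String; assigning it to a MemoryBlock gives byte access
    Var chunk As MemoryBlock = bs.Read(kChunkSize)
    For i As Integer = 0 To chunk.Size - 1
      Var b As Integer = chunk.UInt8Value(i)
      Var ok As Boolean = (b >= 32 And b <= 126) Or b = 9 Or b = 10 Or b = 13
      If Not ok Then
        Var pos As Integer = offset + i
        System.DebugLog("Odd byte &h" + Hex(b) + " at offset " + pos.ToString)
      End If
    Next
    offset = offset + chunk.Size
  Wend
  bs.Close
End Sub
```

If nothing is logged, the bytes on disk are at least within the expected range, which would point away from the file contents and towards how they are being read.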
I suppose another option is that the hard drive is starting to fail and giving bad data from time to time.
A bug in user code that referenced an out-of-scope variable would be caught as an error at compile time, so it would not cause runtime errors. As you said, moving the declaration inside the loop could penalize speed… a tiny bit. That's all.
Unrecoverable hard drive / SSD failures would raise exceptions. If the failure is recoverable, the read would simply succeed.
His source data is damaged. The events that produced it made it this way. It’s not a reading problem.
I would love to live in such a world. But I’ve seen hard drives, when failing, do the most odd things over the years. Including non-deterministic results for a given sector. Times where it reads and reads a sector attempting to get a result and then “succeeds” in reading but with somewhat random results. Yes, unrecoverable failures would result in an exception, however, “recoverable” issues, that still fail to read correctly can occur in the early stages of drive issues.
We have seen other issues where memory cleanup within a loop fails to take place until the loop is exited. That was a problem with Date objects not being destroyed until the loop ends. This “could” be happening here, but I agree it would be a bug and not normal.
And yet the description says he’s looked at the data with a hex editor and it is OK. Also the original description is random failures at different points with each run. Data format issues would not produce that result?
@Mike_Linacre1 For clarity: does a single file process the same way each time, or does the result change each time you run it?
Yes, but since you’re in a debug stage anyway, you could try that suggestion and inspect the resulting array. If you have fewer entries in the array than lines in your files, you’ll know the error doesn’t relate to the use of ReadLine. You could then even examine the entries whose length is greater than the average and spot the difference.
If you see as many entries in the array as you have lines in the file, the issue would logically point to the use of ReadLine. It is definitely worth trying the ReadAll and Split suggestion for this debug phase, is it not?
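As a rough sketch of that comparison, something like the following could be run against the small test files. The function name LineCountsMatch is invented for illustration, and the approach is only suitable for files that fit comfortably in memory, since ReadAll loads the whole file.

```xojo
// Hypothetical debug helper: True when ReadLine and ReadAll + Split
// agree on the number of lines in the file.
Function LineCountsMatch(f As FolderItem) As Boolean
  // Pass 1: count lines with ReadLine
  Var countReadLine As Integer = 0
  Var lineStream As TextInputStream = TextInputStream.Open(f)
  lineStream.Encoding = Encodings.UTF8
  While Not lineStream.EndOfFile
    Var s As String = lineStream.ReadLine
    countReadLine = countReadLine + 1
  Wend
  lineStream.Close

  // Pass 2: read everything, normalise line endings, then split
  Var allStream As TextInputStream = TextInputStream.Open(f)
  Var contents As String = allStream.ReadAll(Encodings.UTF8)
  allStream.Close
  contents = contents.ReplaceLineEndings(EndOfLine.UNIX)
  Var lines() As String = contents.Split(EndOfLine.UNIX)

  // A trailing line ending leaves one empty final element; do not count it
  Var countSplit As Integer = lines.Count
  If countSplit > 0 Then
    If lines(countSplit - 1) = "" Then countSplit = countSplit - 1
  End If

  System.DebugLog("ReadLine: " + countReadLine.ToString + ", Split: " + countSplit.ToString)
  Return countReadLine = countSplit
End Function
```

Running it several times on the same file would also show whether either count changes from run to run.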