Chunk Algorithm Question

Shouldn’t that be “tArray =”, since you get an array on the right, not a string?

Yes, but this was a typing mistake.
I created a small demo project to show you the error. Copy “Test.txt” to your Documents folder!

Found this “Xojo.IO.TextInputStream never returns True for EOF” <https://xojo.com/issue/35713>.

Unfortunately, there still appear to be quite a few bugs in the new framework. I’ve started a couple of new projects using the new framework, and ended up having to go back to the old one, because there were too many problems. Eventually, the new framework will be reliable, but it’s not there yet.

The code should handle that situation correctly. When this situation occurs, it will result in a null string in the last array element. This null string gets appended to the start of the next file read with no ill effect.

Add this line

if input.BytePosition + chunkSize > f.Length Then Exit while

after

While Not input.EOF

OK, EOF works fine. I ran some tests. It looks like the Feedback case above was closed. The problem is in the Xojo.IO.TextInputStream.Read method: the old framework has no problem reading the file even if the “numberOfBytes” parameter is bigger than the file, but the new framework crashes.

OLD Framework - this works fine:

[code]Dim f As FolderItem = SpecialFolder.Documents.Child("Test.txt")

If f.Exists Then

  Dim input As TextInputStream = TextInputStream.Open(f)
  Dim line As String = input.Read(15000)

  MsgBox line
End If[/code]

NEW Framework - this crashes:

[code]Using Xojo.Core
Using Xojo.IO

Dim f As FolderItem = SpecialFolder.Documents.Child("Test.txt")

If f.Exists Then

  Dim input As TextInputStream = TextInputStream.Open(f, TextEncoding.UTF8)
  Dim line As Text = input.Read(15000)

  MsgBox line
End If[/code]

Or use this instead,

While (Not input.EOF) And (input.BytePosition + chunkSize <= f.Length)

[quote=367830:@Asis Patisahusiwa]Or use this instead,

While (Not input.EOF) And (input.BytePosition + chunkSize <= f.Length) [/quote]
Thanks Asis, this is a nice way to “fix” it. Works well!

Does that correctly read the final partial chunk?

Why don’t you use Input.ReadLine?
That way each read gives you a complete line.

  Dim Input As TextInputStream = TextInputStream.Open(f, TextEncoding.UTF8)
  
  While Not Input.EOF
    // Declare tx once per pass; a second Dim of the same name outside the loop would not compile
    Dim tx As Text = Input.ReadLine
    // Process tx as a complete line
  Wend

No, but you can modify the code to use this

While Not input.EOF

    ReDim tArray(-1)

    dim ch as UInt64 = chunkSize
    If input.BytePosition + chunkSize > f.Length Then
            ch = f.Length - input.BytePosition // -1?
    End If
    tArray = input.Read(ch).Split(&uA)

Okay, that’s what I thought. I was going to suggest:

dim bytesToRead As Integer = min(f.length-input.BytePosition,chunkSize)

What do you mean, Asis and Robert? What would the whole method look like with your changes above?
And how can we integrate a global Integer variable to show the progress of reading (for a progress bar)?

[quote=367833:@Jim Shaffer]Why don’t you use Input.ReadLine?
That way each read gives you a complete line.[/quote]
But then I can’t use a chunk method, because it will be reading the whole file.

[quote=367838:@Martin T]What do you mean, Asis and Robert? What would the whole method look like with your changes above?
And how can we integrate an Integer variable to show the progress of reading (for a progress bar)?[/quote]
Your bug is caused by input.Read(chunkSize) where input.BytePosition + chunkSize is larger than file size/length.

But, if you use

While (Not input.EOF) And (input.BytePosition + chunkSize <= f.Length)

then the last part will not be fetched/read.
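To make that concrete, here is a small sketch that only does the arithmetic of that loop condition, with no file access. The function name BytesCovered and the example numbers are made up for illustration.

[code]
' Hypothetical helper: how many bytes does the combined While condition reach?
Function BytesCovered(fileLength As Integer, chunkSize As Integer) As Integer
  Dim position As Integer = 0
  ' Same test as: input.BytePosition + chunkSize <= f.Length
  While position + chunkSize <= fileLength
    position = position + chunkSize
  Wend
  Return position
End Function
[/code]

For a 12000 byte file and a chunkSize of 5000, the loop runs at positions 0 and 5000 and then stops at 10000, because 10000 + 5000 is greater than 12000. The last 2000 bytes are never read.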

For the progress, you can use the values of input.BytePosition and f.Length.
Set the maximum value

ProgressBar1.Maximum = f.length

and current progress

ProgressBar1.Value = input.BytePosition
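Put together, it could look roughly like the sketch below. This is old framework code, so it uses PositionB where the new framework uses BytePosition. The method name ReadWithProgress, the control name ProgressBar1 and the chunk size of 5000 are assumptions, not something from your project.

[code]
Sub ReadWithProgress(f As FolderItem)
  ' Sketch only: ProgressBar1 and chunkSize = 5000 are assumed names/values
  Const chunkSize = 5000
  Dim txtIn As TextInputStream = TextInputStream.Open(f)
  ProgressBar1.Maximum = f.Length
  While Not txtIn.EOF
    ' Never ask for more bytes than are left in the file
    Dim bytesToRead As Integer = Min(f.Length - txtIn.PositionB, chunkSize)
    Dim chunk As String = txtIn.Read(bytesToRead)
    ' Process chunk here
    ProgressBar1.Value = txtIn.PositionB
  Wend
  txtIn.Close
End Sub
[/code]

If the loop runs on the main thread, the bar may not repaint until the loop has finished, so you might need to call ProgressBar1.Refresh after setting Value, or do the reading in a thread and update the bar from a Timer.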

[quote=367835:@Asis Patisahusiwa]No, but you can modify the code to use this

[code]
While Not input.EOF

ReDim tArray(-1)

dim ch as UInt64 = chunkSize
If input.BytePosition + chunkSize > f.Length Then
        ch = f.Length - input.BytePosition // -1?
End If
tArray = input.Read(ch).Split(&uA)

[/code][/quote]
This code handles the case where the last part of the file content is smaller than chunkSize.

Essentially, you ensure that the last file read reads exactly the remaining number of bytes if it’s less than chunkSize. So, the routine will still do all of the reads in chunkSize chunks, except the last one.

[code]Public Sub ReadChunks(txtIn as TextInputStream, f As FolderItem)
  'Reads multiple lines of text in large chunks
  Const chunkSize=5000
  dim residual As String = ""
  While not txtIn.EOF
    'Figure how big a chunk to read
    dim bytesToRead As Integer = min(f.length-txtIn.PositionB,chunkSize)
    dim tArray() As string = split(residual+txtIn.Read(bytesToRead),EndOfLine)
    dim nLines As Integer = UBound(tArray)-1
    residual=tArray(nLines+1) 'save incomplete line at end of current chunk for next read
    for i As integer = 0 to nLines
      'Process complete lines of text here
    Next
  Wend
  'If file doesn't end with an EndOfLine, then the residual string
  'will contain the final partial line which needs to be processed.
  if residual<>"" then
    'Process final incomplete line of text here
  end if

  'Because the residual string is prefixed to front of txtIn.Read this should
  'restore the possibly split EndOfLine in Windows (CR + LF), so that it should
  'work correctly on all platforms.
End Sub
[/code]

Sorry, I’m still posting in old Framework code. I can’t figure out how to implement the new Framework Split function without getting all kinds of type mismatch errors.

Also, since we’ve determined the source of the problem to be reading past the end of file, I don’t think there’s any need to include a redim statement.

Thanks Robert, thanks Asis!

Please correct me if I’m wrong, but wouldn’t it save memory to use the ReDim statement instead of declaring the array again on each pass through the loop?

Either way, Xojo’s memory manager will recover the memory from the old abandoned array. The question is whether it’s done more quickly or not when using redim. Maybe one of the Xojo people can comment on that.

One more question: what if chunkSize > FolderItem.Length? With the code below, the progress bar doesn’t update:

Dim chunkSize As Integer = Min(f.Length, 5000)

If the file is that small, then it will probably get processed fast enough that you don’t need an accurate progress indication. In that case, I would just set the progress bar maximum value to 0 so that it becomes an indeterminate progress bar (barber pole effect), and leave it at that.
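Something like this sketch would do it, placed before the read loop. It assumes the same f, a control named ProgressBar1 and a chunk size of 5000, so adjust the names to your project.

[code]
' Sketch only: ProgressBar1 and chunkSize = 5000 are assumed names/values
Const chunkSize = 5000
If f.Length <= chunkSize Then
  ' Whole file fits in one read: show an indeterminate bar
  ProgressBar1.Maximum = 0
Else
  ProgressBar1.Maximum = f.Length
End If
[/code]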