Improve reading of a file char by char

Hi,

For parsing, I need to read a text file (up to several MB) char by char and append each char to a string when certain conditions are satisfied.
At present I load the whole text file into a string and loop over it using Mid(i, 1). The code below (a standalone, simplified version of what I use) works, but it is extremely slow. As you can test yourself with the attached input file, it takes ~30 s to process a 150 kB file (!).

How can I improve the speed? Ideas?

Thanks.

Test file: https://www.dropbox.com/s/ths009g6esr0b8c/testFILE.txt?dl=0

  #pragma disableBackgroundTasks
  
  Dim time As Double = Microseconds   // start timing
  
  Dim f As New FolderItem("testFILE.txt", FolderItem.PathTypeNative)
  if f = Nil OR f.Exists = false then Return
  
  // Read the whole file into memory as UTF-8, then release the file
  Dim t As TextInputStream
  t = TextInputStream.Open(f)
  t.Encoding = Encodings.UTF8
  Dim mInput As String = t.ReadAll
  t.Close
  
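  // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
  // Xojo string literals do not treat "\" as an escape, so the pattern is "\s+", not "\\s+".
  Dim rg as New RegEx
  rg.SearchPattern = "\s+"
  rg.ReplacementPattern = " "
  rg.Options.ReplaceAllMatches = True
  mInput = rg.Replace(mInput)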
  
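  Dim c As String                 // current character
  Dim entry As String             // text collected for the current field
  Dim found As Boolean = False    // True while inside an entry started by "@"
  Dim paren As Integer = -1       // brace depth; -1 = no brace seen yet
  Dim quotes As Integer = -1      // -1 = no quote seen yet, 1 = inside quotes, 0 = outside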
  
  
  // Walk the string one character at a time (Mid is 1-based)
  for i As Integer = 1 to mInput.Len
    c = mInput.Mid(i, 1)
    
    if c = "@" then 
      if not found then
        // here I initialize a new object
        found = true
        paren = -1
        quotes = -1
      end if
      entry = c
      continue
    end if
    
    if c = """" then
      if quotes = -1 OR quotes = 0 then
        quotes = 1
      elseif quotes = 1 then
        quotes = 0
      end if
      entry = entry + c
      continue
    end if
    
    if c = "{" then
      if paren = -1 then
        paren = 1
      else
        paren = paren + 1
      end if
      entry = entry + c
      continue
    end if
    
    if c = "}" then
      if paren = -1 then
        Return 
      else
        paren = paren - 1
        if found AND paren = 0 AND (quotes = 0 OR quotes = -1) then
          found = false
          // here I add the object to an array
          entry = ""
          continue
        end if
      end if
      entry = entry + c
      continue
    end if
    
    if c = "," then
      if found AND paren = 1 AND (quotes = 0 OR quotes = -1) then
        // here I fill the object
        entry = ""
        continue
      end if
      entry = entry + c
      continue
    end if
    
    entry = entry + c
  next
  
  MsgBox Str((Microseconds-time )/1000000) + " seconds"

The timing may differ depending on the OS you run it on.

And, as you discovered, string concatenation is slow, very slow.

Hint: use arrays to deal with the characters:

At load time, put the whole file into an array, then process it character by character and place the results into another array.
When the work is done, turn the array's contents back into a string and …

Split and Join are your friends here.
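A minimal sketch of that pattern, assuming the classic global `Split` and `Join` functions, where `Split` with an empty delimiter returns one character per element. `StripBraces` is a hypothetical helper that only drops `{` and `}`; the point is the shape: split once, append to an array, join once.

```xojo
// Hypothetical helper: returns source with every "{" and "}" removed.
// It only illustrates the Split / process / Join pattern.
Function StripBraces(source As String) As String
  // An empty delimiter splits the string into individual characters
  Dim chars() As String = Split(source, "")
  Dim kept() As String
  
  For i As Integer = 0 To Ubound(chars)   // arrays are 0-based
    Dim c As String = chars(i)
    If c <> "{" And c <> "}" Then
      kept.Append c                       // appending to an array is cheap
    End If
  Next
  
  // One concatenation at the end instead of one per character
  Return Join(kept, "")
End Function
```

For example, `StripBraces("{a{b}c}")` returns `"abc"`.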

[quote=279571:@Davide Pagano]For parsing, I need to read a text file (up to several MB) char by char and append each char to a string when certain conditions are satisfied.
At present I load the whole text file into a string and loop over it using Mid(i, 1). The code below (a standalone, simplified version of what I use) works, but it is extremely slow. As you can test yourself with the attached input file, it takes ~30 s to process a 150 kB file (!).

How can I improve the speed? Ideas?[/quote]

How do you append to the string? Collecting the pieces in an array and calling Join() once at the end can be a lot faster.

Thank you both for the suggestion to use arrays. However, I have a question:

What do you mean by putting the whole file into an array? Should I split it so that each element of the array holds one char?

Yes. Accessing an array element is way faster than calling Mid, and Join is also much faster than appending. You should see a dramatic improvement.
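Applied to the loop from the first post, this could look roughly like the sketch below. It assumes the whitespace has already been collapsed with the RegEx. `ParseWithArrays` and its `fields()` result are hypothetical: where the original only has the comments "here I fill the object" and "here I add the object to an array", the sketch stores the collected text so there is something to return.

```xojo
// Hypothetical rewrite of the original loop using arrays instead of Mid and "+".
// Expects mInput to have had its whitespace collapsed already.
Function ParseWithArrays(mInput As String) As String()
  Dim chars() As String = Split(mInput, "")  // one character per element
  Dim entryParts() As String                 // pieces of the current field
  Dim fields() As String                     // completed fields (illustration only)
  Dim found As Boolean = False
  Dim paren As Integer = -1
  Dim quotes As Integer = -1
  
  For i As Integer = 0 To Ubound(chars)      // arrays are 0-based, Mid is 1-based
    Dim c As String = chars(i)
    
    Select Case c
    Case "@"
      If Not found Then
        found = True
        paren = -1
        quotes = -1
      End If
      Redim entryParts(-1)                   // same as entry = c
      entryParts.Append c
      
    Case """"
      If quotes = 1 Then
        quotes = 0                           // closing quote
      Else
        quotes = 1                           // opening quote
      End If
      entryParts.Append c
      
    Case "{"
      If paren = -1 Then
        paren = 1
      Else
        paren = paren + 1
      End If
      entryParts.Append c
      
    Case "}"
      If paren = -1 Then Return fields       // unbalanced brace: stop, as the original does
      paren = paren - 1
      If found And paren = 0 And (quotes = 0 Or quotes = -1) Then
        found = False
        fields.Append Join(entryParts, "")   // "here I add the object to an array"
        Redim entryParts(-1)
      Else
        entryParts.Append c
      End If
      
    Case ","
      If found And paren = 1 And (quotes = 0 Or quotes = -1) Then
        fields.Append Join(entryParts, "")   // "here I fill the object"
        Redim entryParts(-1)
      Else
        entryParts.Append c
      End If
      
    Case Else
      entryParts.Append c
    End Select
  Next
  
  Return fields
End Function
```

The state logic is unchanged; only the two hot spots differ: `chars(i)` replaces `Mid(i, 1)`, and the `entry = entry + c` concatenations become `entryParts.Append c`, with a single `Join` when a field is complete.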

OK, many thanks.

Come back here if you still have trouble.

Wow, from ~30 s to ~1.8 s :slight_smile:

Thanks again.

Michel wrote: [quote=279620:@Michel Bujardet]dramatic improvement.[/quote]

Now, you know he is right.