Troubles reading a csv (text) file

This text have been created yesterday and I add more data + relevant code today, minutes ago (10:00).

You may skip part 1 and have a look at the code.
The code as is works fine provided you load a two fields / line csv files, but have a display bug if the file have more than two fields / row (and I only display two fields from each Row).

I can reject the user selected csv file if I found more than two fields in each row, but … I want to understand why a weird display occured in this circumstance (my bug or Xojo bug).

Nota: I shut down, wait a minute, power on the MacBook pro, run Xojo (only) run the application in the IDE and saw the bug.
Then, I shutdown Xojo, power off the computer.
I powered on the computer, clear the trash, runx Xojo, open the project, run it in the IDE and found the same bug.

Remember: this code (a bit different / more lines ferom the original / some non relevant lines changed) displays the same bad behavior than the one I have in my project.
Specifically, I added two new variables to hold the string length - for Cell(0), Cell(1) - removed the try part, removed the core text line reading since the but appears since I read line 1, etc.

At last, I am pleased using two quotes a field to enclose each field and the comma as field delimiter.

Take this as a quizz: who have a bug ? My code (me) or Xojo ?

[yesterday] Not really a quiz (I do not have the answer, yet), but here I go:

think a proprietary application that loads one field a line .txt files or two fields a line .csv file (comma separated and quote enclosed).

Time ellapsed and you are near the end of releasing it.

“Now is the time” to make intensive testings and be creative.

After some days, I am really happy and think it is time to end writting the user’s manual.

BTW: what if I get some csv files from the internet and check these ? Of course, testing with my own csv files is not enough. I was happy to test my application about weird csv files. So, let’s go !

Lord, oh my Lord ! Why ? Why do I found a bug at this stage ? I loose the last character of the second field on these csv files. Why ?

Now the quest. Before going to the code in the project, I am trying to understand what’s wrong in that file, what this file have that mine does not ?

The only difference is the presence of some Return character and… they have more than two fields in each line.

It does not matter: I just have to add a third field ! I started to add a third heading field, load the file and everything is OK.

That is not enough: I added another third field, but this time in the core data (first line only): I open the file and… I MISS ThE LAST CHARACTER OF THE SECOND COLUMN ! (.ra instead of .rar is not correct at all !).

Now the quest search starts.

BTW: the code…

In a loop, I read the .csv text file, one line after another. I use two NthFields to get strings for columns 0 and column 1, leaving the rest of the line where it was. After all, how a code that works perfectly for a line with two fields will make strange results ?

What is your answer ?

In the mean time (now to the moment I send this text), I will try to understand the why of the thing.

Notes:
Adding a third field in the Heading does not reveal an error.
Adding a third field in the first line reveal an error.
Removing the third field in the Heading does not remove the error.

[today, 10:00] Last found bit of information:

the heading(1) string is truncated by 1 character if the csv file have more than 2 fields / line. No need to load more than line 1 to get the bug.

The relevant code is:

[code] Dim ImportFI As FolderItem
Dim ImportTIS As TextInputStream
Dim ALine As String
Dim LocRow As Integer
Dim TheURL As String
Dim TheName As String
Dim URLLen As Integer
Dim NameLen As Integer
Dim MyDelimiter As String

MyDelimiter = Chr(34) + “,” + Chr(34)

// Get a csv file
ImportFI = GetOpenFolderItem(“text/csv”)

// a. Get A TextInputStream Reference and set the encodings to UTF-8
ImportTIS = TextInputStream.Open(ImportFI)
ImportTIS.Encoding = Encodings.UTF8

// b. Get one line of text from the csv text file
ALine = ImportTIS.ReadLine

// c. Compute Cell(0) string and its length
TheURL  = NthField(ALine, MyDelimiter, 1) + Chr(34)
URLLen  = Len(TheURL)

// d. Compute Cell(1) string and its length
TheName = NthField(ALine, MyDelimiter, 2) + Chr(34)
NameLen = Len(TheName)

// e. Fills the header
LB.Heading(0) = Mid(TheURL,2,Len(TheURL)-2)   // Heading(0)
LB.Heading(1) = Mid(TheName,1,Len(TheName)-2) // Heading(1)

Return[/code]

The csv file text can be:

"Head 0","Head 1","Head 2" "Line 1","Line 1","Line 1" "Line 2","Line 2","Line 2" "Line 3","Line 3","Line 3"

To get the correct results, just remove the last field to get only two by line:

"Head 0","Head 1" "Line 1","Line 1" "Line 2","Line 2" "Line 3","Line 3"

BTW: my external csv testing file (taken from the internet) have far more than three fields a line, but these are enough to display the bug.

The bug is securely in your own code. Step through the debugger to find it. I mean, really, you do not even seem to be trying. Your code assumes more than 2 columns. It is guaranteed to fail with 2 columns because of the assumptions you have made. Really, the debugger is your friend here. Step through it and examine all the pieces. In particular, look at TheName. You should see immediately why your code fails.

[quote=181065:@Emile Schwarz]The csv file text can be:

“Head 0”,“Head 1”,“Head 2”
“Line 1”,“Line 1”,“Line 1”
“Line 2”,“Line 2”,“Line 2”
“Line 3”,“Line 3”,“Line 3”[/quote]

I saved this text as table.csv on Desktop and load it with this code

  dim f as FolderItem = SpecialFolder.Desktop.Child("table.csv")
  dim tis as TextInputStream
  dim st() as string
  dim s as string
  
  if f <> nil and f.Exists then
    lb.DeleteAllRows
    tis = TextInputStream.Open(f)
    tis.Encoding =  Encodings.UTF8
    s = tis.ReadAll
    st = split(s, EndOfLine)
    For i as integer = 0 to Ubound(st)
      lb.AddRow
      lb.Cell( i , -1) = st(i).ReplaceAll(",", chr(9))
    next
    tis.Close
    lb.Heading(-1) = lb.Cell( 0 , -1)
    lb.RemoveRow(0)
  end if

Hi Axel,

nice code, thank you.

Troubles will arise if unlucky enough, a comma is set inside of the fields. That is why I use as delimiter “,”: chr(34) + “,” + Chr(34).

Also, the code I shared (above) works fine if there is only two columns (this is my prerequisite), but do not show the last character inColumn(1) if there is more than two columns in the loaded file.

I will wait 'till tomorrow or Monday and try to think then why I have this strange behavior.

Hi Tim,
thank you for your answer.

My code is meant to read only the two first columns (I leave the eventual other not needed columns) and it works fine as is with two columns csv text files.

I have tried with the original project code, but saw nothing. That is the reason why I stripped my code to what I shared. I may need another good night or two to understand what is wrong in my code.

With a two columns csv file, the code below works fine.
LB.Heading(1) = Mid(TheName,1,Len(TheName)-2)

If I use -1 (instead of -2), I get the ending double quote with a two columns csv file, and as is, I miss one character with a three (or more) columns csv file.

BTW: I even tested with the csv file contents in a string (to eliminate any error from File I/O operations). I may see a tree that hides (to me) the forest ;-:).

if you have no (or need no) " in the text you can replace

lb.Cell( i , -1) = st(i).ReplaceAll(",", chr(9))

with

lb.Cell( i , -1) = st(i).ReplaceAll(""",""", chr(9)).ReplaceAll("""","")

or better use tab as a delimiter (easier to read and write to files)

Assuming you can not insist on a tab delimited file, here is tested and working code that will read a CSV file and use UP TO 3 fields from it.
(you can adapt it to other numbers of fields yourself)

I have adapted it from code earlier in this thread…

Start a project.
Add a button and a listbox.
Call the listbox ‘listbox1’, and set it to have 3 columns.

Add this code to the button action event

  dim f as FolderItem = SpecialFolder.Desktop.Child("table.csv")
  dim tis as TextInputStream
  dim st() as string
  dim s as string
  dim scopy as string
  dim cols as integer
  dim lup as integer
  dim thischar as string
  dim bfoundstartquote as boolean

  
  if f <> nil and f.Exists then
    listbox1.DeleteAllRows
    
    tis = TextInputStream.Open(f)
    tis.Encoding =  Encodings.UTF8
    
    while not tis.EOF
      scopy = ""
      s = tis.ReadLine
      bfoundstartquote = false 'reset in case previous line leaves it undefined
      
      for lup = 1 to len(s)
        thischar = mid(s,lup,1)
        db = asc(thischar)
        if thischar = chr(34) or thischar = "“"  or thischar = "”" then
          bfoundstartquote = not(bfoundstartquote)
          thischar = ""
        end if
        
        
        
        'hide the embedded comma/s for the moment
        if thischar  = "," then
          if bfoundstartquote then
            thischar = "~"
          end if
        end if
        
        'build up a copy
        scopy = scopy + thischar
        
      next
      
      
      st = split(scopy,",")
      
      
      listbox1.AddRow
      for cols = 0 to 2
        try
          'fill listbox and put embedded commas back at the same time
          listbox1.cell(listbox1.lastindex,cols) = replaceall(st(cols),"~",",")
        catch
          'may not have been 3 columns
        end try
        
      next
    wend
    tis.close
    
    
  end if

Create a file on your desktop, called table.csv

Paste this text into it.
You will see that it deliberately tries to break the code: some rows have one field, some have embedded commas, and the commas can be ‘smart quotes’ or chr(34)

[quote]
Normal,”Quoted,watch!”,This one is not quoted
One,Two,Three
Apple,Banana,Cherry,Fruitcake,Smoothie
One field only
“This,is,a,test”,”normal”[/quote]

Errata: the commas can be ‘smart quotes’ or chr(34) -> the quotes can be…

As Tim mentioned above if you step through your code with the debugger your problem is obvious. Once you see what you do not expect there look at your code and you will understand why it is not what you expect.

Another option is to do each step on a piece of paper. Write down exactly what your code will return - not what you expect to return.

Your code is written for a very specific input pattern and you have introduced a change to the pattern.

I fall into this trap fairly often and after a period of frustration I take a deep breath and step through the code. Usually I end up kicking myself for not doing it earlier. It is easy to assume that what works in one case will work in all cases.

[quote=181236:@Peter Fargo]As Tim mentioned above if you step through your code with the debugger your problem is obvious. Once you see what you do not expect there look at your code and you will understand why it is not what you expect.

Another option is to do each step on a piece of paper. Write down exactly what your code will return - not what you expect to return.

Your code is written for a very specific input pattern and you have introduced a change to the pattern.

I fall into this trap fairly often and after a period of frustration I take a deep breath and step through the code. Usually I end up kicking myself for not doing it earlier. It is easy to assume that what works in one case will work in all cases.[/quote]
Hi Peter,

Thank you for your kind answer (your experience). I understand it.

During the last two days, I wanted to make a full report of what I’ve done in a TextArea, but I do not had time (what you tell me to write on a paper). I suspect that when there is three or more fields a Line, the conditions of the code running is different and thus the results I get. I think that I have to add a line in the code to check how many fields a line I get and issue a warning to the user. If (s)he want to continue, work accordingly (take into account that I run under a different conditions and treat the second field differently if there are more than two fields).

Not yet resolved, but I feel that I am in the good direction now.

Thank you all.

Below - note the brackets surrounding your delimiter

2 field .csv with “,” as delimiter results: “Head0[”,"]Head1"

3field .csv with “,” as delimiter results: “Head0[”,"]Head1[","]Head2"

Head 1 will have no trailing quote because it has a delimiter on both sides. This will be true for any field in a 3 or more field line except the first and last which only have 1 delimiter

Hope this makes it clear

You do not need to warn the user. In your code test for the fieldCount and act accordingly

YES MAN !

This is 100% clear. Thanks.

This is because of my specific design for this special use.

I used this design elsewhere, but I load all fields and deals with the end of line delimiter special case (certainly, I do not recall, but I do not had the trouble, so…).

I’ve made the error when I stripped down the original code and continue in my erroneous design.

Parsing CSV is NOT easy. There are many special cases you must take into account. Split and NthField are NOT sufficient for the task. I really recommend Norman’s CSV parser. Writing your own (which I have done in the past) is a time consuming and error-prone endeavor.

Tim:

I am (sometimes) naive.

I surround all fields using a quote. At read time, I get each field, remove the leading and ending quotes and display eveything inside. Nothing is escaped since I think I display everything in the defined field in the file.

I get an eye on Fargo screen shot and this describe exactly what I think; I use Mid to remove the leading and ending quote and get what I want. (of course when the project is set to read all fields, not this project).

I certainly forgot something that explain why you wrote “ There are many special cases you must take into account.” Care to give one or two ?
(excepted many lines in a field)

Have you tried the code I wrote and posted for you?

Yes.