Parsing CSV file?

Greetings - Using Xojo 2019 R3.2

I am trying to parse a number of csv files. These files have a consistent format. They are from a BOM (Bill of Material) generator used by the circuit board layout tool KiCad. Two typical lines are:

“Date:”,“Friday, October 27, 2023 at 10:06:17 PM”
“Tool:”,“Eeschema (6.0.11-0)”

They come from a TextInputStream.ReadLine operation. The string viewer in the debugger shows each as a single line (expected since no internal newlines), quotes included. But split is going “bonkers” over the lines, using single commas as split delimiters (it never splits). Numbers spread sheet has no problems!

Internal to each line, it appears that the substring “,” may be a useful split delimiter. Trimming a leading quote from the first split element and a trailing quote from the last might seem to be the way to go but i am a bit hesitant to mess with all those quotes and perhaps loose the identity as a string. I also do not know how to specify “,” as a delimiter in split!

So, to that end, some suggestions would be appreciated.

Many Thanks
Jim Wagner, Oregon Research Electronics

Hi James, you may could replace the occurences of

“,”

to

“;”

and use ; as a separator for split.

best
Thomas

Thanks -

But how do I define the string “.” or “;” (quotes embedded in the string). for the replacement operator? Do you double the quotes or something like that?

Jim

Why do you reinvent the wheel? See GitHub - npalardy/CSVParser: Event based and recordset like CSV parser for Xojo .

7 Likes

You can use Chr(34) for ".
“,” would become Chr(34) + "," + Chr(34)
You can also replace the , with Chr(44).
It then becomes Chr(34) + Chr(44) + Chr(34)

what happens if there is a ; somewhere inside a csv field ? a comma between quotes ?

Whatever delimiter you use, it can occur inside of quotes
When that happens, you get more fields than you expect.
Norm’s CSV parser, linked above, will handle this for you.

In my experience, attempts to simply split on the delimiter will fail.
Quotes mark the start and end of ‘ignore delimiter’… any inside the quotes must be ignored.

3 Likes

I wrote the method below to properly parse CSV data (even if the delimiter appears within quotes). It even handles the old “#” delimiter (what some systems used to use to indicate a date), and can be easily updated to handle any other adhoc scenarios. You don’t need to buy any plugins, etc.

Var outAllLines() As Dictionary

Var csvLines() As String = csvData.Split(EndOfLine)
For Each cLine As String In csvLines
  Var outLine As Dictionary = New Dictionary
  Var curField As Integer = 0
  Var curChar As Int64 = 0
  Var escapeChar As Int64 = 0 // The \ backslash is the CSV escape character. This allows quotes to be embedded in strings by escaping them: \" Therefore, we must ignore ANY AND ALL ESCAPED CHARACTERS.
  Var quoteChar As Int64 = 0 // When a quote has been opened, we must ignore the field delimiter until the quote has been closed.
  Var curData As String = ""
  For Each cChar As String In cLine.Characters
    curChar = curChar + 1
    If cChar = fieldDelim And quoteChar=0 And escapeChar=0 Then // Output current field
      // ********************************************************************************
      // If the data has the old-and-uncommon number-sign delimiters around date data, then simply remove them
      // ********************************************************************************
      If ( curData.Left(1) = "#" And curData.Right(1) = "#" ) And curData.Length > 6 Then
        curData = curData.Middle(1,(curData.Length-2))
      End If
      // ********************************************************************************
      outLine.Value(curField.ToText) = curData
      curData = ""
      curField = curField + 1
    ElseIf cChar = """" And quoteChar=0 And escapeChar=0 Then // If this is an opening quote, then handle as such
      quoteChar = curChar
      Continue
    ElseIf cChar = """" And quoteChar<>0 And escapeChar=0 Then // If this is a closing quote, then handle as such
      quoteChar = 0
      Continue
    ElseIf cChar = "\" And escapeChar=0 Then // If this is an opening escape charcter, then handle as such
      escapeChar = curChar
      Continue
    Else
      // If all else fails, simply add the character to the current field data
      curData = curData + cChar
      escapeChar = 0
    End If
  Next cChar
  If curData<>"" Then // Add last field to output array
    outLine.Value(curField.ToText) = curData
  End If
  // Now add the entire outLine as a new element in the outAllLines array
  outAllLines.Add(outLine)
Next cLine

Return outAllLines
1 Like

if you have endoflines inside quotes pretty sure your code will hang.
it happens quite often when you import csv files frequently…

also if you import on a mac a csv file generated from a windows the endofline will not be the same
you cannot use a split function in a generic csv parser, you must parse every character one by one.

1 Like

Norm’s parser works perfectly.

3 Likes

Was not aware of Norman’s CSV parser and will take a look at that. I will look at your other suggestions also. The key issue in this case seems to be the use of the ASCII double-quote character within the string. The reason for doing that is also pretty obvious; the date string in the first example containing several commas within one field shows it very well.

Folks - I hate to admit this, but i took a look at the git repository that Beatrix linked. First, I don’t know git and second, everything looks strange. I’ve never used a third party library and I have NO CLUE what to do!

Thanks to you all -

Jim

Did you click on the green button at the end of @Beatrix_Willius 's link and select Download Zip ?

Looking at the download, after unzipping, it looks as if (sort of unfortunately) Norman saved the sample project not as a binary project, so the folder is cluttered with bits of it. But I expect if you open the SimpleCSVSample you’ll get to see how it works. I imagine that prolly the stuff you want is in the two folders. If that’s the case then just copy/paste them into your project.

Yes, just open the sample project and copy the CSVParser Classes folder to your project in the IDE (and the CSVWriter folder if you want or need that). Drag a GenericObject from the Library to your window, set its super to CSVParser, and implement code in its events.

I will try that.

Thanks
Jim

You are correct in identifying the weaknesses. But as I said, this can be easily enhanced to satisfy those requirements if necessary, and the user can control the code, without the need to spend time or money investigation addons or plugins.

I tried PaulS’ simple character substitution using split. That did not work.

Then I tried the same after trimming the leading & trailing quote symbols from the string. That DOES work. I am guessing that the leading quote symbol messes up split.

Since this is a captive application and this format is the only CSV format it will ever see, I am going to go with PaulS’ simple solution. BUT I also now know about Norman’s CSV processor and at least the beginnings of how to use it.

Thanks to all of you who made suggestions. I learned a lot today.

Jim

2 Likes

Here’s another option. In the 2018 “Just Code Challenge” I posted a CSV file editor project. It correctly reads and writes CSV files, even when the fields contain the special characters: commas, endofline characters and quotation marks. Fields don’t have to be quoted unless they contain one of the above special characters.

The CSV input parsing is performed by the ImportCSV function which is quite short and well commented.

Thanks! I will certainly take a look.

Jim

Download link isn’t working for me.