Sorting through data in .csv file

Meade_Lewis · April 7, 2015, 3:14pm

I am curious how one would read the information in a .csv file.

So far I have:

[code] Dim dlg As OpenDialog
Dim openfile As FolderItem
dlg = New OpenDialog

dlg.InitialDirectory = SpecialFolder.Desktop

dlg.title = “Select a file from _______________ to upload” //Deleted for security reasons

Dim csvType as New FileType
csvType.Name = “CSV File (*.csv)”
csvType.MacType = “TEXT”
csvType.Extensions = “csv;.csv”

dlg.filter = csvType

openfile = dlg.ShowModal

if openfile <> Nil Then
dim lines() As String
dim txtin as TextInputStream
txtin = TextInputStream.Open(openfile)

else
MsgBox(“No file was selected”)
end[/code]

Now how do I set it to import specific text to specific text fields?

Thank you.

Roger_Clary · April 7, 2015, 3:31pm

dim myArray() As String = split(txtin.readAll, ",")

will create an array of your csv contents delimited by the comma. You will need to determine which element of the array goes where.

Scott_Griffitts · April 7, 2015, 3:39pm

Assuming you don’t have commas in the file which are not delimiters, typically in a string enclosed in quotes.

Norman_P · April 7, 2015, 5:15pm

Dont split just on commas - they can be much more complex that that
You could grab my CSV parser from http://www.great-white-software.com/REALbasic_Code.html

Meade_Lewis · April 7, 2015, 5:34pm

[quote=179022:@Norman Palardy]Dont split just on commas - they can be much more complex that that
You could grab my CSV parser from http://www.great-white-software.com/Great_White_Software/REALbasic_Code.html[/quote]

Commas do seem like a bad way to split the information as there can be line down characters that separate lines as well.

With your CSV Parser, how would I make that work with what I am trying to accomplish?

Norman_P · April 7, 2015, 5:43pm

It’d make reading the CSV simpler
From there you can do any one of several things with the data like loading it into an in memory sqlite database which then lets you write complex queries for filtering & sorting

Meade_Lewis · April 7, 2015, 5:55pm

[quote=179031:@Norman Palardy]It’d make reading the CSV simpler
From there you can do any one of several things with the data like loading it into an in memory sqlite database which then lets you write complex queries for filtering & sorting[/quote]

What if every line needs to be separated? In the example, I am seeing that headers are separated, but not every line.

Norman_P · April 7, 2015, 6:23pm

The parser should be splitting every line into fields
That’s the whole point
If it’s not that’s a bug I’d have to look into

Meade_Lewis · April 7, 2015, 6:34pm

[quote=179038:@Norman Palardy]The parser should be splitting every line into fields
That’s the whole point
If it’s not that’s a bug I’d have to look into[/quote]

I figured out, the issue was in my file. Now how do I fetch specific values based on their location (ex. row 2, column 6)?

Also, the blank cells are being put in as “?” characters in your example, will this be a problem?

Axel_Schneider · April 7, 2015, 6:51pm

[quote=179022:@Norman Palardy]Dont split just on commas - they can be much more complex that that
You could grab my CSV parser from http://www.great-white-software.com/Great_White_Software/REALbasic_Code.html[/quote]

it does not create columns (loading Workbook1.tab)
Xojo 2013

Xojo 2015 (additional trouble with encoding? )

Meade_Lewis · April 7, 2015, 6:54pm

[quote=179043:@Axel Schneider]it does not create columns (loading Workbook1.tab)
Xojo 2013

Xojo 2015 (additional trouble with encoding? )
[/quote]

That is why I was curious how I was to extract specific information out of the file based on column. Thank you for wording that better and simpler than I could.

Scott_Griffitts · April 7, 2015, 7:05pm

Looks like he’s simulating columns by putting the column data in brackets. Looking at the example code:

[code] for i = 0 to ubound(values)

s = s + "[" + ReplaceLineEndings(values(i),"<CR>") + "]"

values is one row’s worth of columns.

Meade_Lewis · April 7, 2015, 7:08pm

So what would you suggest for being the best way to read .csv files cell by cell in order to obtain specific information out of them?

Scott_Griffitts · April 7, 2015, 7:11pm

In this example, values is a row. Which column are you interested in? The 2nd column for example:

dim value_i_need as string value_i_need = values(1)

Axel_Schneider · April 7, 2015, 7:12pm

change CSVParser1 - newline to

  dim i as integer
  dim s as string
  
  for i = 0 to ubound(values)
    s = s + ReplaceLineEndings(values(i),"<CR>") + chr(9)
  next
  
  listbox1.addrow 
 listbox1.Cell(listbox1.LastIndex, -1) = s.ReplaceAll(",", chr(9))

Axel_Schneider · April 7, 2015, 7:29pm

encoding on Xojo 2015 works ( for me) with
CSVPareser.parse

  #pragma disableBackgroundTasks
  #pragma disableBoundsChecking
  #pragma DisableAutoWaitCursor
  
  dim b as binaryStream
  dim c as integer
  dim fieldCount as integer
  dim fieldBuffer as string
  dim state as integer
  dim line as integer
  
  const start_field = 1
  const collect_chars = 2
  const end_field = 3 // generally this is a "processing state" and not used
  const error = 4
  const quoted_field = 5
  const quote_in_quoted_field = 6
  
  redim currentRow(-1)
  fieldCount = 1
  state = start_field
  b = f.openAsBinaryFile(false)
  
  while b.EOF <> true
    
    c = b.ReadByte
    
    select case state
    case start_field
      if fieldCount = 1 then 
        line = line + 1
      end if
      
      if c = ascB(fieldseparator) then // got a separator when we were expecting to start a field so we just got an empty field
                                                      // like ,, in a CSV (the field between the comma's is empty)
        newfield (fieldBuffer)
        fieldCount = fieldCount + 1
        fieldBuffer = ""
        state = start_field
      elseif c = ascB(fieldenclosure) then // got an field enclosing item (a " in a CSV) to start a field so this is a quoted field
        state = quoted_field
      elseif c = 10 or c = 13 then // got a carriage return at the start so this is en empty field as well
        //                                        in fact this is considered end of line
        newfield (fieldBuffer)
        handleLine(line,currentRow)
        redim currentRow(-1)
        
        if c = 13 then // if the line ending was a CR was it a CR LF pair ?
          if b.EOF = true then exit // if we are at EOF then it wasn't and we can exit the loop
          
          c = b.readByte // was the next char a chr 10 ?
          if c = 10 then 
          else
            b.Position = b.position - 1 // no so push it back
          end if
        end if
        fieldcount = 1
        fieldBuffer = ""
        state = start_field
      else
        fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
        // other wise we go a char that we just want to save and go to 
        //                                                   the collecting characters state
        state = collect_chars
      end if
      
    case collect_chars
      if c = ascB(fieldenclosure) then // a quote in an unquoted field .. that's an error
        state = error
      elseif c = ascB(fieldseparator) or c = 10 or c = 13 then // a comma CR or LF (cant have CR, LF or CRLF in an unquoted field
        newfield (fieldBuffer)
        if c = 10 or c = 13 then 
          handleLine(line,currentRow)
          redim currentRow(-1)
          fieldcount = 1
          if c = 13 then // was this a CR LF pair ?
            if b.EOF = true then exit
            c = b.readByte
            if c = 10 then
            else
              b.Position = b.position - 1 // no LF so push this one back
            end if
          end if
        else
          fieldCount = fieldCount + 1
        end if
        
        fieldBuffer = ""
        state = start_field
      else
        fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
        // just add this char to the buffer
        state = collect_chars
      end if
      
      //case end_field
      
    case error // whoops .... an error state
      csvError(line,fieldCount, "An error occured reading field " + str(fieldCount) + " of line " + str(line) )
      
      fieldCount = fieldCount + 1
      fieldBuffer = ""
      state = start_field
      
      c = b.ReadByte // skip forward til we find the start of a new field or the end of line
      while b.eof <> true 
        if c = 13 or c = 10 or c = ascB(fieldseparator) then exit
        c = b.ReadByte
      wend 
      if c = 13 then // found a CR
        if b.EOF = true then exit
        c = b.readByte
        if c = 10 then // was it CR LF ?
        else
          b.Position = b.position - 1 // no push back the char after the CR
        end if
      end if
      
    case quoted_field
      if c = ascB("""") then // we're in a quoted field and got a quote
        state = quote_in_quoted_field // now we are in a quote in a quoted field
      else
        fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
        // just add the character (note this allows embedded CR, LF, etc)
      end if
      
    case quote_in_quoted_field
      if c = ascB("""") then // we've already had one quote did we get a second ie "" ?
        fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
         // add it to the buffer
        state = quoted_field // yup ... add the second one and go back tp collecting characters in a quoted field
      elseif c = ascB(fieldseparator) or c = 10 or c = 13 then // we got a , CR or LF (folloiwing a ")
        
        newfield (fieldBuffer)
        if c= 10 or c = 13 then 
          handleLine(line,currentRow)
          redim currentRow(-1)
          fieldCount = 1
          if c = 13 then // if it was a CR did we get CR LF ?
            if b.EOF = true then exit
            c = b.readByte
            if c = 10 then
            else
              b.Position = b.position - 1 // no push back the character so it is next to be dealt with
            end if
          end if
        else
          fieldCount = fieldCount + 1
        end if
        
        fieldBuffer = ""
        state = start_field
      else // we got a quote followed by something OTHER than a comma , CR or LF 
        // normally this might be considered an error ... but we'll just allow it and move on
        // could add a call to csvError here and spin forward to find the next field start
        fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
         // add it to the buffer
        state = quoted_field // yup ... add the second one and go back tp collecting characters in a quoted field
      end if
      
    end select
    
  wend
  
  if lenB(fieldBuffer) > 0 then // if there was w buffer in progress we got a field of some sort
    if state = quoted_field then // if we were in a quoted field we have an unterminated quote
      csverror(line,fieldCount,"Unterminated quote on line " + str(line) + " field " + str(fieldCount))
    else
      newfield (fieldBuffer)
      handleLine(line,currentRow)
      redim currentRow(-1)
      
      fieldCount = fieldCount + 1
    end if
  end if
  
  b.close

Norman_P · April 7, 2015, 10:35pm

[quote=179043:@Axel Schneider]it does not create columns (loading Workbook1.tab)
Xojo 2013

Xojo 2015 (additional trouble with encoding? )
[/quote]
Maybe
I wrote it 7 or 8 years ago

Christian_Schmitz · April 8, 2015, 9:28am

if you guys use MBS Plugin, there is a SplitCommaSeparatedValuesMBS method:
http://www.monkeybreadsoftware.net/string-splitcsv-method.shtml

Meade_Lewis · April 8, 2015, 12:07pm

[quote=179110:@Christian Schmitz]if you guys use MBS Plugin, there is a SplitCommaSeparatedValuesMBS method:
http://www.monkeybreadsoftware.net/string-splitcsv-method.shtml[/quote]

What is your plugin like?

Can it split on line down as well?

How would you suggest sorting through the data to find specific cells using your plugin?

Axel_Schneider · April 8, 2015, 1:35pm

try this

Test