It’d make reading the CSV simpler
From there you can do any one of several things with the data like loading it into an in memory sqlite database which then lets you write complex queries for filtering & sorting
[quote=179031:@Norman Palardy]It’d make reading the CSV simpler
From there you can do any one of several things with the data like loading it into an in memory sqlite database which then lets you write complex queries for filtering & sorting[/quote]
What if every line needs to be separated? In the example, I am seeing that headers are separated, but not every line.
[quote=179038:@Norman Palardy]The parser should be splitting every line into fields
That’s the whole point
If it’s not that’s a bug I’d have to look into[/quote]
I figured out, the issue was in my file. Now how do I fetch specific values based on their location (ex. row 2, column 6)?
Also, the blank cells are being put in as “?” characters in your example, will this be a problem?
[quote=179043:@Axel Schneider]it does not create columns (loading Workbook1.tab)
Xojo 2013
Xojo 2015 (additional trouble with encoding? )
[/quote]
That is why I was curious how I was to extract specific information out of the file based on column. Thank you for wording that better and simpler than I could.
dim i as integer
dim s as string
for i = 0 to ubound(values)
s = s + ReplaceLineEndings(values(i),"<CR>") + chr(9)
next
listbox1.addrow
listbox1.Cell(listbox1.LastIndex, -1) = s.ReplaceAll(",", chr(9))
encoding on Xojo 2015 works ( for me) with
CSVPareser.parse
#pragma disableBackgroundTasks
#pragma disableBoundsChecking
#pragma DisableAutoWaitCursor
dim b as binaryStream
dim c as integer
dim fieldCount as integer
dim fieldBuffer as string
dim state as integer
dim line as integer
const start_field = 1
const collect_chars = 2
const end_field = 3 // generally this is a "processing state" and not used
const error = 4
const quoted_field = 5
const quote_in_quoted_field = 6
redim currentRow(-1)
fieldCount = 1
state = start_field
b = f.openAsBinaryFile(false)
while b.EOF <> true
c = b.ReadByte
select case state
case start_field
if fieldCount = 1 then
line = line + 1
end if
if c = ascB(fieldseparator) then // got a separator when we were expecting to start a field so we just got an empty field
// like ,, in a CSV (the field between the comma's is empty)
newfield (fieldBuffer)
fieldCount = fieldCount + 1
fieldBuffer = ""
state = start_field
elseif c = ascB(fieldenclosure) then // got an field enclosing item (a " in a CSV) to start a field so this is a quoted field
state = quoted_field
elseif c = 10 or c = 13 then // got a carriage return at the start so this is en empty field as well
// in fact this is considered end of line
newfield (fieldBuffer)
handleLine(line,currentRow)
redim currentRow(-1)
if c = 13 then // if the line ending was a CR was it a CR LF pair ?
if b.EOF = true then exit // if we are at EOF then it wasn't and we can exit the loop
c = b.readByte // was the next char a chr 10 ?
if c = 10 then
else
b.Position = b.position - 1 // no so push it back
end if
end if
fieldcount = 1
fieldBuffer = ""
state = start_field
else
fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
// other wise we go a char that we just want to save and go to
// the collecting characters state
state = collect_chars
end if
case collect_chars
if c = ascB(fieldenclosure) then // a quote in an unquoted field .. that's an error
state = error
elseif c = ascB(fieldseparator) or c = 10 or c = 13 then // a comma CR or LF (cant have CR, LF or CRLF in an unquoted field
newfield (fieldBuffer)
if c = 10 or c = 13 then
handleLine(line,currentRow)
redim currentRow(-1)
fieldcount = 1
if c = 13 then // was this a CR LF pair ?
if b.EOF = true then exit
c = b.readByte
if c = 10 then
else
b.Position = b.position - 1 // no LF so push this one back
end if
end if
else
fieldCount = fieldCount + 1
end if
fieldBuffer = ""
state = start_field
else
fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
// just add this char to the buffer
state = collect_chars
end if
//case end_field
case error // whoops .... an error state
csvError(line,fieldCount, "An error occured reading field " + str(fieldCount) + " of line " + str(line) )
fieldCount = fieldCount + 1
fieldBuffer = ""
state = start_field
c = b.ReadByte // skip forward til we find the start of a new field or the end of line
while b.eof <> true
if c = 13 or c = 10 or c = ascB(fieldseparator) then exit
c = b.ReadByte
wend
if c = 13 then // found a CR
if b.EOF = true then exit
c = b.readByte
if c = 10 then // was it CR LF ?
else
b.Position = b.position - 1 // no push back the char after the CR
end if
end if
case quoted_field
if c = ascB("""") then // we're in a quoted field and got a quote
state = quote_in_quoted_field // now we are in a quote in a quoted field
else
fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
// just add the character (note this allows embedded CR, LF, etc)
end if
case quote_in_quoted_field
if c = ascB("""") then // we've already had one quote did we get a second ie "" ?
fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
// add it to the buffer
state = quoted_field // yup ... add the second one and go back tp collecting characters in a quoted field
elseif c = ascB(fieldseparator) or c = 10 or c = 13 then // we got a , CR or LF (folloiwing a ")
newfield (fieldBuffer)
if c= 10 or c = 13 then
handleLine(line,currentRow)
redim currentRow(-1)
fieldCount = 1
if c = 13 then // if it was a CR did we get CR LF ?
if b.EOF = true then exit
c = b.readByte
if c = 10 then
else
b.Position = b.position - 1 // no push back the character so it is next to be dealt with
end if
end if
else
fieldCount = fieldCount + 1
end if
fieldBuffer = ""
state = start_field
else // we got a quote followed by something OTHER than a comma , CR or LF
// normally this might be considered an error ... but we'll just allow it and move on
// could add a call to csvError here and spin forward to find the next field start
fieldBuffer = DefineEncoding(fieldBuffer + chrB(c), Encodings.UTF8)
// add it to the buffer
state = quoted_field // yup ... add the second one and go back tp collecting characters in a quoted field
end if
end select
wend
if lenB(fieldBuffer) > 0 then // if there was w buffer in progress we got a field of some sort
if state = quoted_field then // if we were in a quoted field we have an unterminated quote
csverror(line,fieldCount,"Unterminated quote on line " + str(line) + " field " + str(fieldCount))
else
newfield (fieldBuffer)
handleLine(line,currentRow)
redim currentRow(-1)
fieldCount = fieldCount + 1
end if
end if
b.close