Parse CSV files from Mac or PC Excel

I have CSV files that will c

Incomplete question?

I have CSV files that will come from a Mac or PC. With some helpful suggestions I can parse Excel CSV files on a Mac.

I ran the same Xojo tool on an Excel CSV file that came from a PC using the same Excel app and parsing fails.

The column widths are different as well as rows. The Xojo app parses " encapsulated fields so the width difference is not due to extra field delimiter “,”.

The number of rows is doubled however CSV files from Mac or PC have 13 10. At the end of each row.

Has anyone else ran into this.? Has anyone parsed Excel CSV files from Excel on a PC? Any suggestions??

Thanks

Carl

How are you reading in the files? If you are reading them all into a variable first, use ReplaceLineEndings to normalize the end of line character.

Well, first we have a XL plugin to read/write excel files without the need of the Excel application.

Second you can simply read CSV with textInputStream class.
For splitting CSV lines, I once put a lot of work into the SplitCommaSeparatedValuesMBS function in our plugins.
(to handle quoting correct)

Maybe that helps you.

Duplication of lines between platforms can almost always be accounted for by the difference in line endings. As Kem says, the very first thing to do is normalize line endings. Either by taking the proper line ending into account or by normalizing the whole text in memory first.

I usually run my imported texts through this, taken from macoslib, I think.

[code]Function ReplaceLineEndings(extends s as string) As string

// For convenience, so that we can use dot syntax and
// don’t have to specify a line ending.

return s.ReplaceLineEndings(EndOfLine)

End Function

Function ReplaceLineEndings(extends s as string, newEnding as string) As string

// For convenience, so that we can use dot syntax.

return ReplaceLineEndings(s, newEnding)

End Function
[/code]

Give the free code I have on here a whirl
http://great-white-software.com/Great_White_Software/REALbasic_Code.html

if you use TextInputstream and read line by line in a loop, you don’t need to worry for the line endings.

Try my CSV parser:

https://dl.dropboxusercontent.com/u/10747116/RealStudioClasses/sbCSV_In_Out.zip

I would like to thank everyone for posting suggestions and links. I have tried them all and here is my follow up. Some of what I found may simply be due to my lack of interfacing your suggestions properly

To begin with I found if I enabled the following from a popup reading the csv became consistent in column row behavior no matter the source MAC, PC, Unix.

[code] Try

textInput = TextInputStream.Open(f)

Select Case Encoding_Chosen
Case 1
  textInput.Encoding = Encodings.MacRoman
Case 2
  textInput.Encoding = Encodings.WindowsANSI
Case 3
  textInput.Encoding = Encodings.UTF8
End

While Not textInput.EOF
  rowFromFile = textInput.ReadLine
  Row_Count = Row_Count + 1
Wend

Catch e As IOException
textInput.Close
MsgBox(“Error accessing file.”)
End Try
[/code]

Christian - What I was caught on was that buried inside some of a cells data was a ‘,’. That is what caused an offset in the comma delimited files as I parsed the data out. Once I saw that I knew I needed to Encapsulate the data in order to keep it in its respective column.

Basically the data I am looking at has the need to be placed into a db and RecordSet structure and then further modifications will occur. Part of this effort is to enable aligning data that is meant for the same destination application from an older to newer application that has some significant differences. The columns vary in depth and have new or missing columns. The need is to match up columns, then adjust the cells data to meet the new format and then continue on. At least to me this is why a db and RecordSet methodology seems to make the most sense.

Proceeding under this methodology means turning csv data with columns exceeding 6000 into a db structure is a challenge for SQLLite or MySQL. Add to this the ListBox limitation of 64 creates a presentation problem for visual verification and a TextBox etc simply does not align ‘columns’ up very well because the data has multiple lengths. A further complication is that some of the Column Headers contain ‘(’ spaces ’ ’ and ‘-’ which means I need an Encapsulation method in order to set column names.

In the absence of a db I read the data in and use single and multi dimensional arrays. Single arrays are simple to search through so I place headers into a single dimensional array. Then once I search the header array I have the column index I can locate the column quickly in the multi dimensional array. It sure would be nice to be able to search a column in a multi dimensional array in a similar manner as a single dimensional array without needing to loop. Well at least I have not figured that out so I loop around. So this method seems to work well but has its limits in that I do not see how to implement with a Web project which is something I would like to accomplish.

Simon - What you have looks like it may be a good parser but I could not get it to work.
To begin with I placed sbCSVIn onto a window and then in a button action event handler and from your General Usage note I added

[code]dim f As FolderItem = GetOpenText(“csv”)
if f = nil then exit sub

dim c As new sbCSV
c.FileName = f
[/code]
This erred on compile because ‘This item does not exist - GetOpenText’
This section of code dim c As new sbCSV failed for the same reason.

It appears that selecting sbCSVin is the needed choice but then GetOpenText always fails

I would like to give this a go if I could figure out how to implement reading a csv file and then after I have performed certain other work write that back out to a file using your two classes

Also not sure how to make sbCSVout work.

Norman - I took a look at yours and was especially interested in the sudo RecordSet method you have. I found that I needed to comment an Exit inside your MoveNext Method where elseif c = 10 or c = 13 then was detected to enable reading and tagging end of line

// newline(line,currentRow) // we have a new line !!!! 'Exit

And then placing an Exit at the end of this else if statement all worked well.

[code] fieldcount = 1
fieldBuffer = “”
state = start_field

    Exit
  else[/code]

Once I got through this I was successfully parsing into RecordSet like structure that your CSVRecordSet class performs. It sure would be nice to make RecordCount work

If I am not using any of anyones Classes - Modules etc correctly please let me know

Thanks

A listbox can have more than 64 columns. The number 64 comes from the fact that there is a max of 64 visible columns. If you have more you just have to scroll back and forth. I doubt if 64 is a problem in most cases unless all of your data is very short in digits or characters.

Record count would require parsing the whole file to get the count

Sorry about the GetOpenText method, this is my own GetOpen method. The code is:

[code]function GetOpenText(ext As String) As FolderItem
dim fn As FolderItem
Dim sType as New FileType

sType.Name = “text/” + ext
sType.MacType = “TEXT”
sType.Extensions = ext

fn = GetOpenFolderItem(sType)
Return fn
end function[/code]
Place this function in a module or in your window and it will work.

Harrie - ok you perked my interest. How does one implement scrolling a Listbox for columns in > 64?

Simon - thanks will check that out

Norman - I am taking a look at that. Here is what I implemented

I place this request Row_Count = c.RecordCount inside a Method, Action etc after this c = new CSVRecordSet(f)
Then inside your CSVRecordSet Method ‘RecordCount’ I placed the following

[code] Dim x As Integer
Dim s As String
Dim myTextInputStream As TextInputStream

myTextInputStream = TextInputStream.Open(myFile)

While myTextInputStream.EOF <> True
s = myTextInputStream.ReadLine
x = x + 1
Wend

return x[/code]

Ah but readline would be wrong :stuck_out_tongue:
A CSV value can contain line endings - fun stuff
Excel has no trouble producing those

Interesting thought and that could possibly happen. I have not seen that but then again that just may be something I have not seen like a ‘,’ in the middle of the data. I have always implemented Readln just as Christian suggested above.

I am interested in learning how to scroll a Listbox. That thought never occurred to me and if that is doable I would like to implement that.

The interesting part about this would be using a 3 dimensional array instead of a RecordSet to display inside a Listbox. I have started by sending array info to Listbox but it has only 1 column. So the thought process I am looking at would be to fill in as many columns up to 64 then stop adding to the Listbox. Then when scroll is requested delete the current 64 and do again for the next 64. Then need to know how to handle if 64 more is not available. I guess just show the last 64. If this thinking process is correct then the next step is to figure out the code sequence.

I have never had the need to scroll any of my listboxes horizontally to date; but, considering that the Listbox has a ScrollBarHorizontal property alongside the ScrollBarVertical property, I would assume that you just have to set ScrollBarHorizontal to true in the IDE. I believe that the 64 visible rows is some inner limitation to the displaying portion of the Listbox internal workings. So just try turning on the ScrollBarHorizontal property and give it a go.

How can I use it.

I understand that its not a class,its a Xojo project
I’m Using Xojo 2015 R2

Regards

[quote=100485:@Simon Berridge]Sorry about the GetOpenText method, this is my own GetOpen method. The code is:

[code]function GetOpenText(ext As String) As FolderItem
dim fn As FolderItem
Dim sType as New FileType

sType.Name = “text/” + ext
sType.MacType = “TEXT”
sType.Extensions = ext

fn = GetOpenFolderItem(sType)
Return fn
end function[/code]
Place this function in a module or in your window and it will work.[/quote]
I read that you forgot to put the GetOpenText. I putted into a module, called MethodName: GetOpenText, Return Type: FileType, Scope:Public.

And I got this error:

Syntax error:
end function

Syntax error:
End sub.

What I’m doing wrong?

The function and end function lines do not go in your code. Use the function line as a guide to filling out the name/parameters/return type fields. Drop end function altogether.