Import text into a ListBox ?

Emile_Schwarz · September 6, 2014, 8:47am

Hi,

I want to create a project where I can import text file into a ListBox.

The problem is the kind of text / kind of text storage into files.

There is csv files and txt files, but the fields separator can be ‘,’, ‘;’, tab, etc.

What stategy do will you use at import file time ?
(I mean the place where the text file is read, one line at a time).

Also, by pure hazard, I double-clicked on a csv file instead of right-click choose to open it with textedit, it then was open with LibreOffice.

Here is the screen shot:

Basically, and even if the screen shot displays french text, I think all of us can understand.
Virgule: ‘,’
Point virgule: ‘;’
Champs entre guillemets: strings inside quotes
Dtecter les nombres spciaux: detect special numbers (?)

If you want to build such project, what will you do:

a. a behind the back code that will read silently (most of the) text files,
b. issue such a window to let the user choose the properties to import the text from the file(s) ?

Gerd_Wilmer · September 6, 2014, 10:37am

Hi Emile
this is quite easy
import the file via textinputstream
then place the text into the listbox
line by line and use Nthfield(separator) to place the text into cells
I have found these separators
///chr(9) = tab
////chr(124) = |
///chr(44) = ,
//chr( 32) = space
There may be more

The user usually knows what type of separator is used, so let the user choose from a PopupMenu
Alternatively you could test the number of possible separators and go with the most common.
(You may run into problems with space though)

Gerd

Emile_Schwarz · September 6, 2014, 11:36am

Why ?

Nota: I saw headings without spaces in a csv file, so

Gerd_Wilmer · September 6, 2014, 8:40pm

Because, when you are guessing the separator used, in large file the number of spaces may be larger than the number of separators.

Emile_Schwarz · September 7, 2014, 7:55am

I prefer to use -1 to store in the whole Row.

////chr(124) = |
This one is new for me (even if I saw a ‘Cell’ with one at the beg and one at the end.

Do you know what software use this sparator ?

Christian_Jung · September 7, 2014, 9:08am

Hi Emile,

if you know the possible separators (and it seems so) my way of dealing with this is having a property “separator”. Then i show the headings (normally the first line of your csv file) inside a listbox with the most common separator “;”.

To read your csv into a Listbox, your best choice should be “nthfield” to separate the columns

read in the csv via textInputstream an take care of special characters,

Dim s, fld As String

  s = tstream.ReadLine
  s = ReplaceAll(s,"'","")
  s = ReplaceAll(s,chr(034),"")

Find out, how often “delimiter” (; or, or whatever) exist in that string

Dim numberOfDelimiters As Integer = countfields(s, delimiter)

then read the first line for separating and split it into fields with that variable “delimiter” and write the splitted field into fld

for i = 1 to numberOfDelimiters
fld = (NthField(s, delimiter, i) // NOW, fld is ONE separated column of your table, using a variable delimiter
listbox.(heading(i-1) = fld
next

If it looks fine, you know what to do next. If it doesnt, you can change the property “separator” via a group of radiobuttons (my choice, because you can set it then using radiobutton.value like

if me.Value then
delimiter = me.Caption
end if

)

or pull down menu into another character ,"," or so and read it again into the listbox to check how it looks…

Emile_Schwarz · September 7, 2014, 10:17am

Thank you Christian for your answer.

Emile_Schwarz · September 7, 2014, 10:24am

What I have so far:

I ask the user to open a file (txt or csv)
I read the first line and fill the ListBox Heading with it.
To do that, I search for the field delimiter and set a bunch of delimiter booleans (for later use)
I generate the temporary ListBox Heading string using split and the found field delimiter
to get a tab delimited string
Then I set the ListBox heading using the -1 trick (fill all columns in one shot)

Then I read the selected text file using an if block and the Booleans noted earlier.
In each parts of the If block, I change the line field delimiter to Chr(9) using split

At the end of the If block, I add one Row and fill it directly (still the -1 trick) at once
with the newly computed string.

I do a lot of more things but out of the scope of the question.

is this what you would do ?

Gerd_Wilmer · September 7, 2014, 10:44am

Hi Emile

the | is also called a “pipe” often used in Linux/Unix line commands
I have come a across it on several occasions (though it is quite rare) in text to import

Your path is basically correct. However if as Christian Jung suggests to replace all known separates to tab, what happens if there are tabs within the text?

If you do not know the separator, there is always the danger to get it wrong.

Here is what I do for tab-delimited (just change the char( whatever)

If f <> Nil Then
textInput =TextInputStream.Open(f)
textInput.Encoding=Encodings.MacRoman //strings are MacRoman
Do
rowFromFile=textInput.ReadLine
If ListBox1.ColumnCount < CountFields(rowFromFile,Chr(9)) Then
ListBox1.ColumnCount=CountFields(rowFromFile,Chr(9))
End If
ListBox1.AddRow NthField(rowFromFile,Chr(9),1)
For i=1 to CountFields(rowFromFile,Chr(9))
oneCell=NthField(rowFromFile,Chr(9),i)
ListBox1.Cell(ListBox1.ListCount-1,i-1)=oneCell
Next
Loop Until textInput.EOF
textInput.Close
End If

Michel_Bujardet · September 7, 2014, 10:49am

When there is a separator within a field, the data must be within quotes. So using Chr(34)+Chr(9)+Chr(34) as separator should do the trick. Same thing for comma or semicolon.

Gerd_Wilmer · September 7, 2014, 10:57am

Sorry Michael, but in years or importing delimited text files I have never come across quotes

Emile_Schwarz · September 7, 2014, 11:12am

Hi Gerd, Michel,

thank you for your answers.

Your path is basically correct.
Thank you.

However if as Christian Jung suggests to replace all known separates to tab,
That is what I do. In fact, I xojoise the string (change the string to be at the Xojos default format who use tabs)

what happens if there are tabs within the text?
We cannot wrote users idiot proof’. As a user, in that case, (WikipediA csv file) I enclose my data inside quotes to holds the text contents and like in Xojo, I use to double the quotes if I have a quote in my string !
Michel answer (sent while I was typing my own answer is 100% correct and better written).

As I write earlier, I make delimiter testings, convert the delimiters from the loaded TIS, and then

ListBox1.AddRow The_Row

(or similar)
The_Row holds the readed TIS text with the tab as field delimiter (changed in an If / End If Block)

One thing I do not do is remove the string quotes (I found cases) and some other testings.

I found earlier today a case where the filename - set on OS X) holds a comma and the string is not quoted so at the end, the row is malformed. This kind of things have to be corrected in a copy of the original file, then re-import the file. Just like many other things

BTW: I can check the number of fields in each loaded row and compare it to the number of fields found in the heading string and issue a warning ? But if I start in this way, I will have to write a file checking routine and add a window to make these checkings in the user’s hands. *

About Encodings:
This is my feeling. I assume everyone now uses UTF as the text encoding. Using strings from dark ages is no more in use, here.

About the pipe:
(the | is also called a “pipe” often used in Linux/Unix line commands)
Also used within the Terminal in OS X. Check the man pags.
Thanks for the info, but it have nothing to do here. I found four of them surrounding two fields in one row of one file. Probably an error of the unknow file creator.

I already do (wastes my time) a filtering / reporting window of the original-text-file-to-be-imported but in a DB (nearly the same work, after all). That reported to the user both visually and saved in a disk file (if the user choosed that) what the program (me) think are errors, like bad date format, birth date after death date, too old or your person, text with too many characters and so on yes, two // in a single date

Only because the file to import (they used Excel to create and deal with their data) was full of errors and the guy in charge was reluctant to do the job. Once he saw the extent of the damage, he started to correct the Excel file and provided me with a far better original file to import ! (far better original file: we cannot invent data when none is available).

We, as programmers, cannot with a magic wand change a bad text file into a near perfect text file. We can only make guesses and stay there.