Hi Gerd, Michel,
thank you for your answers.
Your path is basically correct.
Thank you.
However, if, as Christian Jung suggests, you replace all known separators with tabs,
That is what I do. In fact, I "xojoise" the string: I convert it to Xojo's default format, which uses tabs as the field delimiter.
what happens if there are tabs within the text?
We cannot write ‘idiot-proof’ software for users. As a user, in that case (Wikipedia: CSV file), I enclose my data in quotes to hold the text contents and, like in Xojo, I double the quotes if I have a quote in my string!
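The quoting convention described above (fields enclosed in quotes, a quote inside a field written twice) can be sketched as follows. This is only an illustration in Python, not Xojo code, and the function name `to_tab_delimited` is made up for the example. It relies on Python's standard `csv` module and assumes the incoming delimiter is already known.

```python
import csv
import io

def to_tab_delimited(line: str, delimiter: str = ",") -> str:
    """Re-emit one delimited line with tabs as the field delimiter.

    Quoted fields may contain the delimiter, and a doubled quote ("")
    inside a quoted field stands for one literal quote.
    """
    # csv.reader understands quoted fields and doubled quotes.
    fields = next(csv.reader([line], delimiter=delimiter, quotechar='"'))
    out = io.StringIO()
    # QUOTE_MINIMAL re-quotes only the fields that need it, for example
    # a field that itself contains a tab or a quote.
    writer = csv.writer(out, delimiter="\t", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    # Drop the line terminator added by the writer.
    return out.getvalue().rstrip("\n")
```

A field that holds a tab stays quoted in the result, so a tab inside the text is not mistaken for a field delimiter by a quote-aware reader.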
Michel's answer (sent while I was typing my own) is 100% correct and better written.
As I wrote earlier, I test for the delimiter, convert the delimiters in the text loaded from the TextInputStream (TIS), and then
ListBox1.AddRow The_Row
(or similar)
The_Row holds the text read from the TIS, with the tab as field delimiter (changed in an If / End If block).
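The delimiter test itself is not shown in this post. One possible approach, sketched in Python rather than Xojo, is to count each candidate delimiter outside quoted fields in the heading line and keep the most frequent one. The candidate list and the names `count_outside_quotes` and `guess_delimiter` are assumptions made for the example.

```python
# Hypothetical list of delimiters to try, in order of preference.
CANDIDATES = [",", ";", "\t", "|"]

def count_outside_quotes(line: str, ch: str) -> int:
    """Count occurrences of ch that are not inside a quoted field."""
    count = 0
    in_quotes = False
    for c in line:
        if c == '"':
            # A doubled quote toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif c == ch and not in_quotes:
            count += 1
    return count

def guess_delimiter(header: str, candidates=CANDIDATES) -> str:
    """Return the candidate seen most often outside quotes (tab if none is found)."""
    best = max(candidates, key=lambda ch: count_outside_quotes(header, ch))
    return best if count_outside_quotes(header, best) > 0 else "\t"
```

This is a guess, nothing more: a heading line with a single field, or a file that mixes delimiters, will still fool it.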
One thing I do not do is remove the quotes around strings (I have found such cases), along with some other checks.
Earlier today I found a case where a filename (set on OS X) holds a comma and the string is not quoted, so in the end the row is malformed. This kind of thing has to be corrected in a copy of the original file, which is then re-imported, just like many other things.
BTW: I could check the number of fields in each loaded row, compare it to the number of fields found in the heading string, and issue a warning. But if I start down this road, I will have to write a file-checking routine and add a window to put these checks in the user’s hands. *
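The field-count check mentioned above could look like the sketch below (Python, not Xojo; the function name `find_malformed_rows` is hypothetical). It assumes the first line is the heading and that no quoted field spans several lines, so that row numbers match line numbers.

```python
import csv

def find_malformed_rows(lines, delimiter="\t"):
    """Return (line_number, found, expected) for every row whose
    field count differs from the heading's field count."""
    rows = list(csv.reader(lines, delimiter=delimiter, quotechar='"'))
    if not rows:
        return []
    expected = len(rows[0])  # number of fields in the heading line
    warnings = []
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            warnings.append((number, len(row), expected))
    return warnings
```

With an unquoted filename that holds a comma, the row shows one field too many:

```python
lines = ["name,file,size", "a,report.txt,10", "b,my,file.txt,20"]
print(find_malformed_rows(lines, delimiter=","))  # [(3, 4, 3)]
```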
About Encodings:
This is my feeling: I assume everyone now uses UTF as the text encoding. Strings in encodings from the dark ages are no longer in use here.
About the pipe:
(the | is also called a “pipe”, often used in Linux/Unix command lines)
It is also used within the Terminal in OS X. Check the man pages.
Thanks for the info, but it has nothing to do with the matter here. I found four of them surrounding two fields in one row of one file, probably an error by the unknown creator of the file.
- I already wrote (and it cost me a lot of time) a filtering / reporting window for an original text file to be imported, but into a DB (nearly the same work, after all). It reported to the user, both visually and in a file saved to disk (if the user chose that), what the program (me) thinks are errors: bad date format, birth date after death date, person too old or too young, text with too many characters, and so on.
Yes, two // in a single date.
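Checks of this kind can be sketched as below. This is a Python illustration, not the reporting window described above. The date format, the two limits and the names `parse_date` and `check_record` are all assumptions chosen for the example, and an empty death date is treated as "not recorded".

```python
from datetime import datetime

MAX_TEXT_LENGTH = 50       # hypothetical limit on the text length
MAX_AGE_YEARS = 120        # hypothetical upper bound for a plausible age
DATE_FORMAT = "%d/%m/%Y"   # assumed date format, e.g. 31/12/1950

def parse_date(text):
    """Return a date, or None when the text does not match DATE_FORMAT."""
    try:
        return datetime.strptime(text, DATE_FORMAT).date()
    except ValueError:
        return None

def check_record(name, birth_text, death_text):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if len(name) > MAX_TEXT_LENGTH:
        problems.append("text too long")
    birth = parse_date(birth_text)
    if birth is None:
        problems.append(f"bad birth date: {birth_text!r}")
    death = None
    if death_text:  # an empty death date is accepted as "not recorded"
        death = parse_date(death_text)
        if death is None:
            problems.append(f"bad death date: {death_text!r}")
    if birth is not None and death is not None:
        if birth > death:
            problems.append("birth date after death date")
        elif death.year - birth.year > MAX_AGE_YEARS:
            problems.append("person too old")
    return problems
```

A date such as 12//03/1950 does not match the assumed format, so it is reported as a bad date instead of being silently imported.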
I did that only because the file to import (they used Excel to create and manage their data) was full of errors and the guy in charge was reluctant to do the job. Once he saw the extent of the damage, he started correcting the Excel file and provided me with a far better original file to import! (A far better original file, not a perfect one: we cannot invent data when none is available.)
We, as programmers, cannot wave a magic wand and change a bad text file into a near-perfect one. We can only make guesses and leave it at that.