Help w/ first app - Import txt into app and process it

Hello!
I’m trying to do some data analysis but my data set has grown too big for my usual set of tools. I wanted to use my xojo desktop license but i’m not too good at it. I created a project, a button to engage my data analysis routine but i’m getting stuck afterwards.

My requirements are as follow:
-import a txt file(huge one, roughly 1.5m rows, about 200mb in size)
-split each row into two data subset(date and path) using “d:” as separator
-remove trailing spaces in front and at the end of each cell
-populate a third data subset, calculated using the length of the path item for that row
-export the complete data set (date,path,path length) into a csv.

While i don’t have a sql license i understand that using an external db is not an issue if i just compile and run local right? I have a mssql express 2014 server on my dev machine.

This is my incomplete function, i managed until the data processing part

[code] Dim f As FolderItem
dim tis as TextInputStream
dim s as string
dim fields() as string

'show standard file selector
f = GetOpenFolderItem(“any” )
if f=nil then exit sub 'cancel clicked

'open the file
tis = f.OpenAsTextFile
if tis=nil then 'failed?
MsgBox(“The file could not be opened.”)
exit sub
end if

'read file into grid
while not tis.EOF 'while not end-of-file
s=tis.ReadLine 'read line from file
fields=Split(s,“d:”) 'put items in fields() array
wend

tis.Close 'close file[/code]

Thank you in advance,
Matteo

a. import a txt file(huge one, roughly 1.5m rows, about 200mb in size)
Done.

But, you may use ReplaceLineEndings to be sure the line ending will be the one Xojo was dealing with.

b. split each row into two data subset(date and path) using “d:” as separator
Done.

c. remove trailing spaces in front and at the end of each cell
Use TRIM. Get an eye on what TRIM remove at TRIM at WikipediA: TRIM will remove leading/ending tab for example.

d. populate a third data subset, calculated using the length of the path item for that row
I do not understand.

e. export the complete data set (date,path,path length) into a csv.[/quote]
To read a ListBox contents, you have to:
get the number of Rows in the ListBox (ListBox1.ListCount - 1)
in a loop: (for LoopIdx = 0 To RowCnt)
Get a Row contents using ListBox1.Cell → Cell in ListBox doc

  • Format the Row string for write to file
  • Place here the Write operations)
    end of Loop

To save the strings to disk file) you have to use a TextOutputStream object to do that. Also, the delimiter usually is a comma in .csv files. Read the Join Method to achieve that (inverse of Split…).

Warning: as per docs, TextInputStream and TextOutputStream have to reside in a try block (TextInputStream Example and TextOutputStream Example.

If the above directions are not enough, feel free to call for more info.

[quote]d. populate a third data subset, calculated using the length of the path item for that row
I do not understand.[/quote]

I need to calculate the length of the string contained within the paths(i need to check if the path is longer than 260 char) and populate a third column with the calculated items. My standard powershell script would use measure-object -character but given the size of the dataset i’m trying to use xojo to get my results a bit faster.

An example of a finished csv row would be:

2012/06/11 07:53:07;d:\data\foldername\file.txt;27

I need to get know how many chars “d:\data\foldername\file.txt” has, so i can evaluate if explorer would error out if somebody tried to edit/move/etc that folder contents.

Thank you again,
Matteo

If I understand you correctly, you dont need a database, or a grid.

The source file looks like this?
2012/06/11 07:53:07;d:\data\foldername\file.txt

If you can trust the date to be yyyy/mm/dd hh:mm:ss
Then all you need to do is (pseudocode)

[code]
Open the input file for reading
open a textstream for writing.

read in one inputrow
While not end of input file
'throw away trailing spaces
inputrow = trim(inputrow)

'output the trimmed row plus a filename length
write to output file inputrow + “;” + format(len(inputrow) - 19,“0”)

'get another row
read in inputrow
wend

close the input
close the output[/code]

So your csv file will not use a Comma (Separated Value), but a semi-color. Does not matter.

There are two ways to compute a string length: Len() and LenB() (check in the docs).

ok i refactored my routine

@Jeff Tullin Thank you for the hint, i managed to simplify the application workflow to my skill level, databases are too much for my xojo knowledge :slight_smile:

My quick and dirty code is as follows

[code] Dim f As FolderItem
dim tis as TextInputStream
dim s as string
dim fields() as string
dim inputrow as string
dim outputrow as string
Dim CSVf As FolderItem
Dim fileStream As TextOutputStream

CSVf = GetSaveFolderItem(“any”,“My Info”)
If CSVf <> Nil Then
MsgBox(“Save path is”+ Endofline+CSVf.Nativepath)
End If

'show standard file selector
f = GetOpenFolderItem(“any” )
if f=nil then exit sub 'cancel clicked

'open the file
tis = f.OpenAsTextFile
if tis=nil then 'failed?
MsgBox(“The file could not be opened.”)
exit sub
end if

'compile new file
while not tis.EOF 'while not end-of-file
s=tis.ReadLine 'read line from file
s = trim(s) 'remove trailing spaces
inputrow=trim(s)
inputrow=replace(inputrow,“d:”,";d:")
outputrow=(inputrow + “;” + format(len(inputrow) - 19,“0”))

 'create output filestream if CSV is empty else append to existing file
if CSVf = nil then fileStream = TextOutputStream.Create(CSVf) else fileStream = TextOutputStream.Append(CSVf)

fileStream.WriteLine(outputrow) 'input currentline
fileStream.Close 'close filestream

wend

tis.Close 'close file [/code]

check the results on a 5 row file.
Your code may not process the last line, due to when you test for the EOF.

[quote=145841:@Jeff Tullin]check the results on a 5 row file.
Your code may not process the last line, due to when you test for the EOF.[/quote]
I got all my data on my first successful run.
If there may be potential issues with my while loop, any hint on how to avoid them? Is there a better way to check for the end of the imported file?

TY again!

If you start the loop with “do” and end it with “loop until tis.EOF”, it will guarantee that every line gets tested (although you may instead find it testing an extra empty line at the end).

Why open and close the output file every time through the loop? Open it before you start the loop and close it after the end of the loop. Opening a file over a million times is going to add considerable time to the execution.

See pseudocode.
:wink:

The important part of the sentense is MAY.

I too use to check against EOF at the beg of the loop and have no trouble (the only one I potentially get was an empty line (an EndOfLine character at the end of the file).

Care to explain your concern ?

Based on similar code for database use, where EOF can be set True at a different point.
The thing about pseudocode was that I hadn’t run it personally and fully tested when I posted it.

Ok i’ve converted the code into a method (i’m now using it for another robocopy log cleanup job too) and switched to a do loop. For sake of completion i’m posting the method :slight_smile: Everything works and i’ve managed to execute within reasonable times both jobs.

I’ve created a filetypes file describing a txt and csv save option (its just a formal thing, the content is the same) and the method has two parameters: Type(defining if i’m checking a certain type of export or another one, the only change is the csv header string and the chars to remove from the path count) and separator(to create csv with semicolon or comma separated data).

[code] 'Define variables+objects
Dim f As FolderItem
Dim tis As TextInputStream
Dim s As string
Dim fields() As string
Dim inputrow As string
Dim outputrow As string
Dim CSVf As FolderItem
Dim fileStream As TextOutputStream
Dim chars As Int32
Dim diskletter As string
Dim year As string
Dim diskstring As string
Dim datestring As string
Dim CSVsep As string
Dim CSVheader As string
Dim headermode As Integer
Dim filename As string
Dim diskid As string
Dim yearid As string

'Initialize variables
diskletter =“d”
diskid = diskletter + “:”
year =“2014”
yearid = year + “/”
headermode = 1

'Method setup
If Separator = 0 then
CSVsep = “;”
elseif Separator = 1 then
CSVsep = “,”
end if
If Type = “newfiles” then
chars = 28
CSVheader = “Size” + CSVsep + “Date” + CSVsep + “Path” + CSVsep + “Pathlength”
filename = “newexport.txt”
elseif Type = “longpaths” then
chars = 21
CSVheader = “Date” + CSVsep + “Path” + CSVsep + “Pathlength”
filename = “longexport.txt”
else
MsgBox(“Wrong type, please contact the developer to verify action source code”)
End if
datestring = CSVsep + yearid 'prepare date separator object
diskstring = CSVsep + diskid 'prepare disk letter separator object

'Select output file/folder
CSVf = GetSaveFolderItem(FileTypes1.ALL,filename)
If CSVf <> Nil Then
MsgBox(“Save path is”+ Endofline+CSVf.Nativepath)
End If

'Select input file
f = GetOpenFolderItem(“any” )
if f=nil then exit sub 'cancel clicked

'open the file
tis = f.OpenAsTextFile
if tis=nil then 'failed?
MsgBox(“The file could not be opened.”)
exit sub
end if

'compile new file
do
s=tis.ReadLine 'read line from file
if s <> “” then
s = trim(s) 'remove trailing spaces
inputrow = replace (s, diskid, diskstring) 'use the disk letter to add the first column separator
If type =“newfiles” then
inputrow = replace (inputrow, yearid, datestring) 'use the year to add the second column separator on newfiles mode
end if
outputrow=(inputrow + CSVse" + format(len(inputrow) - chars,“0”)) 'execute the path char count routine and remove the corresponding number of chars from the total

       'create output filestream if CSV is empty else append to existing file
  if Headermode = 1 then
    fileStream = TextOutputStream.Create(CSVf)    'create file
    fileStream.WriteLine(CSVHeader)                         'write header
    Headermode = 0                                                     'revert to standard mode
   else
     fileStream = TextOutputStream.Append(CSVf) 'append data to file
  end if
  
  fileStream.WriteLine(outputrow) 'input currentline
  fileStream.Close 'close filestream
end if

loop until tis.eof

tis.Close 'close file[/code]

Again, thank you all for the all useful hints and suggestions :slight_smile:

I need to create the textoutputstream, insert a row on file creation(column names), write the first data row and then redo the loop appending the current row, can i keep the stream open even if i switch from create to append?

TY! :slight_smile:

Yes.

Create the file once, then use lots of writeln commands.
Or create the file as empty, and every write is an append.

See pseudocode… :slight_smile:

ok redone the file subroutine, quite the speed boost!

I did it this way

[code] 'Initialize the file
fileStream = TextOutputStream.Create(CSVf) 'create file
fileStream.WriteLine(CSVHeader) 'write header
fileStream.Close 'close filestream

'compile new file
do
s=tis.ReadLine 'read line from file
if s <> “” then
s = trim(s) 'remove trailing spaces
inputrow = replace (s, diskid, diskstring) 'use the disk letter to add the first column separator
If type =“newfiles” then
inputrow = replace (inputrow, yearid, datestring) 'use the year to add the second column separator on newfiles mode
end if
outputrow=(inputrow + CSVsep + format(len(inputrow) - chars,“0”)) 'execute the path char count routine and remove the corresponding number of chars from the total
fileStream = TextOutputStream.Append(CSVf) 'append data to file
fileStream.WriteLine(outputrow) 'input currentline
end if
loop until tis.eof
fileStream.Close 'close filestream[/code]

Done some code refactoring and i think that’s good enough to settle it as a finished project

Here is the code:

[code] 'Define variables+objects
Dim f As FolderItem
Dim tis As TextInputStream
Dim s As string
Dim inputrow As string
Dim outputrow As string
Dim CSVf As FolderItem
Dim fileStream As TextOutputStream
Dim chars As Int32
Dim diskletter As string
Dim year As string
Dim diskstring As string
Dim datestring As string
Dim CSVsep As string
Dim CSVheader As string
Dim filename As string
Dim diskid As string
Dim yearid As string

'Initialize variables
diskletter =“d”
diskid = diskletter + “:”
year =“2014”
yearid = year + “/”

'Method setup
If Separator = 0 then
CSVsep = “;”
elseif Separator = 1 then
CSVsep = “,”
end if
If Type = “newfiles” then
chars = 28
CSVheader = “Size” + CSVsep + “Date” + CSVsep + “Path” + CSVsep + “Pathlength”
filename = “newexport.txt”
elseif Type = “longpaths” then
chars = 21
CSVheader = “Date” + CSVsep + “Path” + CSVsep + “Pathlength”
filename = “longexport.txt”
else
MsgBox(“Wrong type, please contact the developer to verify action source code”)
End if

'Prepare CSV separator strings
datestring = CSVsep + yearid
diskstring = CSVsep + diskid

'Select output file/folder and show the filepath
CSVf = GetSaveFolderItem(FileTypes1.ALL,filename)
If CSVf <> Nil Then
MsgBox(“Save path is”+ Endofline+CSVf.Nativepath)
End If

'Select input file
f = GetOpenFolderItem(“any” )
'Cancel button clicked
if f=nil then exit sub

'Open the file
If f <> Nil then
tis=TextInputStream.Open(f)
'Specify encoding of input stream
tis.Encoding=Encodings.MacRoman
'Exit routine if import fails
elseif tis=nil then
'Notify user
MsgBox(“The file could not be opened.”)
exit sub
end if

'Initialize the file, clear it, add the header and close the filestream
fileStream = TextOutputStream.Create(CSVf)
fileStream.WriteLine(CSVHeader)
fileStream.Close

'Compile new file
do
'Read line from file
s=tis.ReadLine
'Check if line is empty
if s <> “” then
'Remove trailing spaces
s = trim(s)
'Use the disk letter to add the first column separator
inputrow = replace (s, diskid, diskstring)
If type =“newfiles” then
'Use the year to add the second column separator on newfiles mode
inputrow = replace (inputrow, yearid, datestring)
end if
'execute the path char count routine and remove the corresponding number of chars from the total
outputrow=(inputrow + CSVsep + format(len(inputrow) - chars,“0”))
'Append data to file
fileStream = TextOutputStream.Append(CSVf)
'Input current line
fileStream.WriteLine(outputrow)
end if
loop until tis.eof

'Close both the robocopy and the export file
fileStream.Close
tis.Close
[/code]