I have two thesaurus text files that I am attempting to combine into a larger thesaurus file. Each file has a number of lines of text, with each line consisting of a list of words separated by commas. The first word is the main entry word one would look up in a thesaurus, while the remaining words on each line (which are listed in alphabetical order) are synonyms for the first word.
A typical line looks like this, where “sharp” is the main entry word and the remaining words are synonyms of “sharp”:
sharp,abrupt,acerb,acerbic,acuate,acute,astringent,astute,carnassial,crisp,cutting
This code has two main parts (one per file), carried out in these steps:
- Open the first file and count its lines, so that progress can be displayed via a label.
- Open the first text file again and add each line to the dictionary. This file has just over 30,000 lines of text.
- Open the second file and count its lines, again to display progress via the label.
- Go through each line of the second text file (about 142,000 lines) and do one of the following:
A. If the dictionary does not already have a matching main entry word for the line being read, add that line to the dictionary.
B. If the dictionary already has a matching main entry word, go through each word on the line and add only those synonyms not already in the dictionary for that main entry word.
- Save the dictionary as a text file.
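To make steps A and B concrete, the intended per-line merge amounts to a sorted union of two comma-separated lists. The following is only a minimal sketch of that intent, not the method posted below: `MergeSynonyms` is a hypothetical helper name, and the usage lines assume the same `D`, `CatWord`, and `CatList` variables used in the method.

```xojo
// Hypothetical helper: returns the sorted union of two comma-separated synonym lists.
Function MergeSynonyms(existing As String, incoming As String) As String
  Dim seen As New Dictionary  // words already collected
  Dim merged() As String      // unique words, in first-seen order

  For Each word As String In Split(existing + "," + incoming, ",")
    // Skip empty fields (for example when one list is empty) and duplicates
    If word <> "" And Not seen.HasKey(word) Then
      seen.Value(word) = True
      merged.Append(word)
    End If
  Next

  merged.Sort
  Return Join(merged, ",")
End Function

// Assumed usage for one line of the second file:
If D.HasKey(CatWord) Then
  D.Value(CatWord) = MergeSynonyms(D.Value(CatWord), CatList)  // step B
Else
  D.Value(CatWord) = CatList  // step A
End If
```

Tracking seen words in a small dictionary compares whole words, whereas a substring search such as `Instr` would also match a word that is merely the tail of a longer one.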
I’ve used very similar code before; however, the application suddenly quits during the second process and never saves the resulting file. I suspect that the dictionary may be getting too large and that the app may be running out of memory; however, I have 48GB installed, so I am not sure whether that is the problem. I have no idea whether there is a memory leak. The OS (10.9.5) displays a message window simply saying that the application had suddenly quit, but gives no details. This happens both in the IDE and in a compiled app.
Dim F as FolderItem
Dim TempIn, TextIn as TextInputStream
Dim TextOut as TextOutputStream
Dim CatLine, CatList, CatListWord, CatWord, DicList, TempArray(-1), TempList, TempText, TLine, TList, TWord, Unique as String
Dim D2, DoIt, TempCount as Integer
Dim D as New Dictionary
// Part 1a: open the existing thesaurus and count its lines for the progress label
F = GetFolderItem("").Child("mthesaur.txt") // Open existing thesaurus & construct dictionary
TempIn = F.OpenAsTextFile
If TempIn <> Nil then
  TempText = TempIn.ReadAll
  TempCount = CountFields(TempText, EndOfLine)
  Label1.Text = Str(TempCount)
  Label1.Refresh
  TempText = ""
  TempCount = 0
End If
TempIn.Close
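// Part 1b: read the first file line by line into the dictionary (main entry word -> synonym list)
F = GetFolderItem("").Child("mthesaur.txt")
If F.Exists and F <> Nil then
  TempIn = F.OpenAsTextFile
  TempIn.Encoding = Encodings.ASCII
  If TempIn <> Nil then
    Do
      TLine = TempIn.ReadLine
      TWord = NthField(TLine, ",", 1) // First word on line is main entry word
      TList = Mid(TLine, Instr(TLine, ",") + 1) // Everything after the first comma
      D.Value(TWord) = TList
      Label1.Text = Str(Val(Label1.Text) - 1) // Count down remaining lines
      Label1.Refresh
    Loop Until TempIn.EOF
  End If
End If
TempIn.Close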
// Part 2: count the lines of the second file, then process each line against the dictionary
F = GetFolderItem("").Child("MyThes-1.0").Child("parsedic")
If F.Exists and F <> Nil then
  TempIn = F.OpenAsTextFile
  If TempIn <> Nil then
    TempText = TempIn.ReadAll
    TempCount = CountFields(TempText, EndOfLine)
    Label1.Text = Str(TempCount)
    Label1.Refresh
    TempIn.Close
  End If
  TextIn = F.OpenAsTextFile
  If TextIn <> Nil then
    TextIn.Encoding = Encodings.ASCII
    Do
      CatLine = TextIn.ReadLine
      Label1.Text = Str(Val(Label1.Text) - 1)
      Label1.Refresh
      CatWord = NthField(CatLine, ",", 1) // First word on line is main entry word
      CatList = Mid(CatLine, Instr(CatLine, ",") + 1) // Remaining words on line after 1st comma are synonyms
      If D.HasKey(CatWord) then // Existing main word in thesaurus
        TempList = D.Value(CatWord)
        // Append each existing synonym to Unique unless Unique already contains it
        For DoIt = 1 to CountFields(TempList, ",")
          CatListWord = NthField(TempList, ",", DoIt)
          If Instr(Unique, CatListWord + ",") = 0 then
            Unique = Unique + CatListWord + ","
          End If
        Next
        Unique = Left(Unique, Len(Unique) - 1) // Remove trailing comma
        TempArray = Split(Unique, ",")
        TempArray.Sort
        Unique = Join(TempArray, ",")
        D.Value(CatWord) = Unique // Reset dictionary
      Else // No existing main word in thesaurus
        TempArray = Split(Unique, ",")
        TempArray.Sort
        Unique = Join(TempArray, ",")
        D.Value(CatWord) = Unique
      End If
    Loop Until TextIn.EOF
  Else
    'MsgBox "Could not open the file."
  End If
Else
  'MsgBox "The file does not exist."
End If
TextIn.Close
// Build the output text from the dictionary, one "word,synonyms" line per entry, then save it
Label1.Text = Str(D.Count)
Label1.Refresh
For D2 = 0 to D.Count - 1
  DicList = DicList + D.Key(D2) + "," + D.Value(D.Key(D2)) + EndOfLine
  Label1.Text = Str(Val(Label1.Text) - 1)
  Label1.Refresh
Next
F = GetFolderItem("").Child("MyThes-1.0").Child("combineddic")
If F <> Nil then
  TextOut = TextOutputStream.Create(F)
  TextOut.Write DicList
  TextOut.Close
End If
The second part of this question: which pragmas should I add to this method to speed it up? For example:
#pragma DisableBackgroundTasks
#pragma NilObjectChecking
#pragma StackOverflowChecking
And where should I add them in the method: at the top, or just inside the main loops?
Note:
This is a utility app I am using to prepare data for another app so I’ve not used a thread & timer as I am not concerned with the app’s window being manipulated while the method is running.