Gedcom-Parser

Hi,

Attached you will find some code for a GEDCOM parser. GEDCOM stores genealogical information. It’s a simple text file.

Every main structure (individuals, families, sources, multimedia objects and notes) has a unique ID, called an XReference. I would like to extract the XReference for each person, but I only get the XReference of the first person record. Please look at my code in xDocument.ParseGedcom():

Dim arrXRef() As String = Split(CurrentLine, "@")
Dim XRef As String = arrXRef(1)
How can I get this working?
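A minimal sketch of a guarded extraction, assuming each record line looks like `0 @I1@ INDI`. The helper name `ExtractXRef` is hypothetical and not part of the attached project; it returns an empty string when a line carries no cross-reference, so it can be called for every line inside the read loop without running past the end of the array.

```xojo
// Hypothetical helper: returns the ID between the first pair of "@" signs,
// or "" when the line has no cross-reference (e.g. "1 SEX M").
Function ExtractXRef(CurrentLine As String) As String
  Dim parts() As String = Split(CurrentLine, "@")
  // "0 @I1@ INDI" splits into "0 ", "I1", " INDI", so Ubound is 2
  If parts.Ubound >= 2 Then
    Return parts(1)
  End If
  Return ""
End Function
```

Note that this also matches pointer lines such as `1 HUSB @I1@`. If only the record's own ID is wanted, additionally check that the line starts with `0 `.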

Xojo-Project
Gedcom Sample file

Thank you all :wink:

OK,

here is my current source. One class, ParseLine, parses each line. The other, ParseGedcom, parses the GEDCOM file.
How can I optimize this code for speed? I would also like to show the parsing progress in a progress bar. How can I implement that?

Any ideas?

Merry Christmas everyone.

[h]ParseLine[/h]

[code]Private a_Level As String
Private a_Line As String
Private a_Tag As String
Private a_Value As String
Private a_XReference As String

Level As String
  Get
    Return a_Level
  End Get
End Property

Tag As String
  Get
    Return a_Tag
  End Get
End Property

Value As String
  Get
    Return a_Value
  End Get
End Property

XReference As String
  Get
    Return a_XReference
  End Get
End Property

Function Get_Int_Level() As Integer
  Return CDbl(a_Level)
End Function

Private Function Get_XReference(Text As String) As String
  // strip the leading and trailing "@"
  Return Mid(Text, 2, Text.Len - 2)
End Function

Private Sub Init()
  // Split without a delimiter splits on spaces
  Dim Tokens() As String = Split(a_Line)

  a_Level = Tokens(0)
  Select Case Tokens.Ubound
  Case 1
    a_Tag = Uppercase(Tokens(1))
  Case 2
    If IsXReference(Tokens(1)) Then
      // e.g. "0 @I1@ INDI"
      a_Tag = Uppercase(Tokens(2))
      a_XReference = Get_XReference(Tokens(1))
    Else
      If IsXReference(Tokens(2)) Then
        // e.g. "1 HUSB @I1@"
        a_Tag = Uppercase(Tokens(1))
        a_XReference = Get_XReference(Tokens(2))
      Else
        // e.g. "1 SEX M"
        a_Tag = Uppercase(Tokens(1))
        a_Value = Tokens(2)
      End If
    End If
  Case Is >= 3
    // everything after the tag is the value
    a_Tag = Uppercase(Tokens(1))
    For i As Integer = 2 To Tokens.Ubound
      a_Value = a_Value + " " + Tokens(i)
    Next
    a_Value = Trim(a_Value)
  End Select
End Sub

Private Function IsXReference(Text As String) As Boolean
  Return Text.Left(1) = "@" And Text.Right(1) = "@"
End Function

Sub Parse(Line As String)
  // reset the state so one instance can be reused for every line
  a_Level = ""
  a_Tag = ""
  a_Value = ""
  a_XReference = ""
  a_Line = ""
  If Line <> "" Then
    a_Line = Line
    Init
  End If
End Sub[/code]

[h]ParseGedcom[/h]

Sub Parse(f As FolderItem, LB As Listbox)
  Dim Gedcom As TextInputStream
  If f <> Nil And f.Exists Then
    LB.DeleteAllRows
    // read Gedcom
    Gedcom = TextInputStream.Open(f)
    Gedcom.Encoding = Encodings.UTF8
    While Not Gedcom.EOF
      Dim Line As String = ConvertEncoding(Trim(Gedcom.ReadLine), Encodings.UTF8)
      Dim Parser As New ParseLine
      // parse Line
      Parser.Parse(Line)
      LB.AddRow(Parser.Level, Parser.Tag, Parser.XReference, Parser.Value)
    Wend
  End If
End Sub

With 70 KB files, parsing should be pretty fast. Unfortunately, the Xojo project doesn’t contain any code. So I’d say: get out the profiler and test. Or post the project with the code :slight_smile:

Dropbox only cooperated on the second try. Then it worked with both the Xojo project and the .ged file.

You used to work in VBA, right? Collection is ancient. You practically never need ByRef in Xojo. At first glance I’d say: load your text file and put everything into an array with Split on EndOfLine. Then work through it line by line.

Ok, here is the Xojo-Project File…

The project in the first post is the old one. This is the current approach:

Xojo Gedcom-Import

Christmas Push :smiley:

Really, no one can help?

Here is the “culprit”:

Dim Line As String = ConvertEncoding(Trim(Gedcom.ReadLine), Encodings.UTF8)

  1. Read everything in at once, unless the files get really big (>100 MB or so).
  2. Split the data into an array.
  3. Then turn the parser loose on it.

[quote=155113:@Beatrix Willius]Here is the “culprit”:

Dim Line As String = ConvertEncoding(Trim(Gedcom.ReadLine), Encodings.UTF8)

  1. Read everything in at once, unless the files get really big (>100 MB or so).
  2. Split the data into an array.
  3. Then turn the parser loose on it.[/quote]
I am not sure. If I read everything via TextInputStream.ReadAll, I have to put it into a Collection. Is ReadAll faster than reading and parsing every line? Think about it: I have to parse every line anyway. So parsing directly during the import with ReadLine is the same as parsing each line after I have put it into a Collection…

Why do you want to use these Collections at all? Something like this, written without Xojo at hand:

Dim GedcomData As String = GedcomFile.ReadAll
Dim GedcomArray(-1) As String = Split(GedcomData, EndOfLine)
For currentLine As Integer = 0 To Ubound(GedcomArray)
  Dim theLine As String = GedcomArray(currentLine)
  ' and now happily parse away on this line of data
Next

ReadLine is considerably slower than ReadAll.

HTH

[quote=155130:@Beatrix Willius]Why do you want to use these Collections at all? Something like this, written without Xojo at hand:

Dim GedcomData As String = GedcomFile.ReadAll
Dim GedcomArray(-1) As String = Split(GedcomData, EndOfLine)
For currentLine As Integer = 0 To Ubound(GedcomArray)
  Dim theLine As String = GedcomArray(currentLine)
  ' and now happily parse away on this line of data
Next

ReadLine is considerably slower than ReadAll.

HTH[/quote]
Thank you. I used your code, but I can’t see a difference. The app freezes again when I load the GEDCOM stress test file from the first post (only 4 megabytes).

How can I show the reading progress in a progress bar?

That is a different problem. How familiar are you with threads and timers? In the Xojo examples, look under Desktop/Updating UI from Thread.
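The thread-and-timer idea can be sketched roughly as follows. All names here (`ParseThread`, `Lines`, `LinesDone`, `Worker`, `ProgressTimer`, `ProgressBar1`, `StartParsing`) are hypothetical, and the snippets are event handlers and one method that would live in the corresponding objects in the IDE. It assumes the classic desktop framework and reuses the `ParseLine` class posted earlier; the example project shipped with Xojo remains the better reference.

```xojo
// --- ParseThread (subclass of Thread) ---
// Properties: Lines() As String, LinesDone As Integer

// Run event: parses off the main thread and never touches a control.
Sub Run()
  Dim Parser As New ParseLine
  For i As Integer = 0 To Lines.Ubound
    Parser.Parse(Trim(Lines(i)))
    // ... store the parsed record in your own data structure here ...
    LinesDone = i + 1
  Next
End Sub

// --- Window ---
// Property: Worker As ParseThread
// Controls: ProgressBar1, ProgressTimer (a Timer, Mode = Off, Period = 100)

Sub StartParsing(f As FolderItem)
  Dim Gedcom As TextInputStream = TextInputStream.Open(f)
  Gedcom.Encoding = Encodings.UTF8
  // normalize line endings so Split works for files from any platform
  Dim Data As String = ReplaceLineEndings(Gedcom.ReadAll, EndOfLine)
  Gedcom.Close

  Worker = New ParseThread
  Worker.Lines = Split(Data, EndOfLine)
  ProgressBar1.Maximum = Worker.Lines.Ubound + 1
  ProgressBar1.Value = 0
  ProgressTimer.Mode = Timer.ModeMultiple
  Worker.Run
End Sub

// ProgressTimer.Action event: runs on the main thread, so it may update the UI.
Sub Action()
  ProgressBar1.Value = Worker.LinesDone
  If Worker.State = Thread.NotRunning Then
    Me.Mode = Timer.ModeOff
  End If
End Sub
```

The thread only writes a plain Integer and the timer only reads it, which keeps all control access on the main thread. Filling the Listbox would then happen once after the thread has finished, not once per line.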

Hello everyone,

I have been working on my GEDCOM import routine. Now I am stuck, because I can’t read alternative names. Each individual record has one or more names. If there is more than one, the import routine should add each additional name as an alternative name, but I only get the first name. The parser checks whether Individual.Name <> Nil. If so, it creates a new sub-object (Personal_Name_Structure). It’s important to get this working, because later I will parse life events, and there will be more than one of those too…

Please have a look; I am waiting for your feedback. Thank you all for your work and have a nice weekend.
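One way to keep every name, sketched under the assumption that each `NAME` line sits at level 1 below a level-0 `INDI` record: append to an array instead of testing a single name property for Nil. The class `Individual`, its `Names()` array and the function `CollectIndividuals` are hypothetical stand-ins for the real Personal_Name_Structure objects, and the `ParseLine` class is the one posted earlier in this thread.

```xojo
// Hypothetical class Individual with properties:
//   XRef As String
//   Names() As String   // first entry = primary name, the rest = alternatives

Function CollectIndividuals(Lines() As String) As Individual()
  Dim Result() As Individual
  Dim Current As Individual   // Nil while outside an INDI record
  Dim Parser As New ParseLine

  For Each Line As String In Lines
    Parser.Parse(Trim(Line))
    If Parser.Level = "0" Then
      // every level-0 line closes the previous record
      Current = Nil
      If Parser.Tag = "INDI" Then
        Current = New Individual
        Current.XRef = Parser.XReference
        Result.Append(Current)
      End If
    ElseIf Parser.Level = "1" And Parser.Tag = "NAME" And Current <> Nil Then
      // append instead of overwrite, so a second NAME is not lost
      Current.Names.Append(Parser.Value)
    End If
  Next

  Return Result
End Function
```

The same append-to-array pattern should carry over to repeated life events, with one array of event objects per individual in place of the string array.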

Martin - Were you ever able to pick this back up and get it working? I am working on my own genealogy program and was planning to write my own Gedcom parser, but if you have something you’ve already worked through, I’d be interested to know more about it.

The recent genealogy-related topics prompted me to have a look at the referenced Wiki article. It’s kind of an ugly format, but it may not be too difficult to preprocess the file into XML format. Also, Googling “GEDCOM to XML” brings up quite a few hits, and it appears there may be a movement afoot to adopt XML as a modern replacement for GEDCOM. So I would be inclined to write a GEDCOM to XML preprocessor rather than try to parse GEDCOM directly. That way, you could make use of Xojo’s XML processing tools.
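To illustrate the preprocessing idea, here is a rough sketch that turns GEDCOM level numbers into XML nesting. It assumes the `ParseLine` class posted earlier in the thread and Xojo's `XmlDocument` classes; the function name `GedcomToXml`, the root element `GEDCOM` and the attribute name `xref` are made up for this example. It does not merge `CONC`/`CONT` continuation lines, and a tag that is not a valid XML element name would raise an exception in `CreateElement`.

```xojo
Function GedcomToXml(Lines() As String) As XmlDocument
  Dim Doc As New XmlDocument
  Dim Root As XmlNode = Doc.AppendChild(Doc.CreateElement("GEDCOM"))
  Dim OpenNodes() As XmlNode   // OpenNodes(n) = most recent element at level n
  Dim Parser As New ParseLine

  For Each Line As String In Lines
    Parser.Parse(Trim(Line))
    If Parser.Tag = "" Then Continue   // blank or malformed line

    Dim Level As Integer = Parser.Get_Int_Level
    // a line may go at most one level deeper than the previous one
    If Level > OpenNodes.Ubound + 1 Then Continue

    Dim Elem As XmlElement = Doc.CreateElement(Parser.Tag)
    If Parser.XReference <> "" Then
      Elem.SetAttribute("xref", Parser.XReference)
    End If
    If Parser.Value <> "" Then
      Call Elem.AppendChild(Doc.CreateTextNode(Parser.Value))
    End If

    // level 0 hangs off the root, level n off the last element at level n - 1
    Dim ParentNode As XmlNode = Root
    If Level > 0 Then ParentNode = OpenNodes(Level - 1)
    Call ParentNode.AppendChild(Elem)

    // forget deeper levels, then remember this element
    ReDim OpenNodes(Level)
    OpenNodes(Level) = Elem
  Next

  Return Doc
End Function
```

Calling `ToString` on the returned document yields the XML text, which can then be queried with the usual XML tools.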

Thinking about this over the past few days (when I should have been doing my real work) I decided to throw together a simple GEDCOM to XML converter project.
Link: GEDCOMtoXML
I’ve tested it with both of the GEDCOM data files that Martin has posted, and it shows no errors with an XML validator. For additional info and caveats, see the notes in the project file.

I am working on a new genealogy program (the last one I wrote was in 1983, on an old CP/M-based machine!). From what I have seen of GEDCOM, both import and export routines are dependent on the structure of the underlying database.

And GEDCOM is now quite antiquated, but still a standard… There are some things in my database GEDCOM simply doesn’t support,