PGN chess file reader in Xojo

Hi,
has anybody created a plugin or something similar with chess-related functions? In particular, I am interested in a parser for PGN chess files. There are plenty of them in other languages, so I wonder if anybody has attempted to write one in Xojo.

Thanks,
Davide

Parsing that format seems pretty simple.

For a description of the format, see:

I was thinking about writing it by myself, but I have a couple of doubts:

  • how do I efficiently read a large text file char by char?
  • how do I efficiently traverse a tree? (I should create one to take into account multiple variations)

Thanks

How large is large for you?

Reading files character by character is always slow. If your file is smaller than 50 to 100 MB, read it in one read operation.

Do a Google search on tree traversal. As far as I remember, there are some articles in XDev.
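
For the variations question, a recursive depth-first walk is usually enough for trees of this size. The sketch below is only an illustration: MoveNode is a hypothetical class (not part of Xojo) that you would create yourself, with a SAN As String property and a Children() As MoveNode property where index 0 is the main continuation and any further entries are variations.

```xojo
' Assumes a hypothetical class MoveNode with two properties:
'   SAN As String            (the move text, e.g. "Nf3")
'   Children() As MoveNode   (index 0 = main continuation, others = variations)
' Appends one line per move to lines(), indenting each variation by its depth.
Sub Walk(node As MoveNode, depth As Integer, lines() As String)
  Var indent As String
  For i As Integer = 1 To depth
    indent = indent + "  "
  Next
  lines.Add(indent + node.SAN)

  For i As Integer = 0 To node.Children.LastIndex
    Var childDepth As Integer = depth
    If i > 0 Then childDepth = depth + 1   ' a variation sits one level deeper
    Walk(node.Children(i), childDepth, lines)
  Next
End Sub
```

Arrays are passed by reference in Xojo, so every recursive call appends to the same lines() array. Calling Walk(root, 0, lines) on the first move fills the array with an indented outline of the game.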

By "char by char" I did not mean reading one character at a time from the text file. My idea was to read N bytes at a time, put them in a MemoryBlock and access it byte by byte (that is, char by char). What is not clear to me is: should I convert each byte to a string? Should I care about big/little endian for cross-platform compatibility? What about encoding? How do I handle byte sequences that span two different blocks (given that I read N bytes at a time)? Should I “consume” the bytes I have read?

Thanks.

That standard calls for ASCII files, so encoding will not be an issue. You can put the whole thing in a MemoryBlock and go byte-by-byte if you want, or use a regular expression to pull out each new move.
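
A minimal sketch of the "whole thing in a MemoryBlock" approach, assuming a non-empty ASCII file that fits comfortably in memory. CountByte is a made-up helper name for illustration. The point is that each byte can be compared as a number, so there is no need to convert bytes to strings or to worry about endianness.

```xojo
' CountByte is a hypothetical helper name, not a Xojo built-in.
' Returns how many bytes in the file equal the given ASCII code.
Function CountByte(f As FolderItem, code As Integer) As Integer
  Var bs As BinaryStream = BinaryStream.Open(f)   ' raises an IOException on failure
  Var mb As MemoryBlock = bs.Read(bs.Length)      ' whole file in one read
  bs.Close

  Var count As Integer
  For i As Integer = 0 To mb.Size - 1
    If mb.Byte(i) = code Then count = count + 1   ' compare numbers, no per-byte String needed
  Next
  Return count
End Function
```

For example, CountByte(f, 123) counts the "{" characters (ASCII 123) in the file. Because the whole file is in one block, the problem of data straddling two blocks does not arise.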

FYI, big/little endian only applies to integers or doubles, never strings. Strings are always left to right.

“Usually, the PGN contains one entire game, but it can also record just a fraction of a game.”

That's not likely to be an enormous file.
It's not a very robust file format… comments can occur anywhere and have a number of indicators

Moves are recorded in algebraic notation, and comments can be inserted after a “;” symbol or between parentheses or curly brackets.

It's probably quite old… JSON or XML would meet these needs so much better by keeping data types separate… comments would be an optional attribute of a move, for example.

Brute-forcing this file means looking for successive numbers followed by a dot, which indicate move groupings.

etc

Inside which you get 2 algebraic notations, which MIGHT be side by side.
Rd1 Qe6

or MIGHT have a comment in the middle

Rd1 {Woah… this is a bad move!} Qe6

or at the end

Rd1 Qe6 ; Another comment.

Since the comment could contain numbers, dots, or semicolons… yikes…

So while it is certainly possible to read the file from left to right, careless comments could easily cause you problems.
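
One way to reduce that risk is to remove the comments in a first pass and parse the moves afterwards. The sketch below is an assumption-laden illustration, not a full PGN reader: StripComments is a hypothetical helper name, it handles only {…} comments and ";" to end-of-line comments, and it is meant for the movetext, not for the tag section. It leaves parentheses alone because, as far as I can tell from the full PGN standard, those enclose variations and not comments.

```xojo
' StripComments is a hypothetical helper name, not a Xojo built-in.
' Removes {...} comments and ;-to-end-of-line comments from PGN movetext.
Function StripComments(pgn As String) As String
  Var chars() As String = pgn.Split("")   ' one element per character
  Var kept() As String
  Var inBrace As Boolean                  ' True while inside { ... }
  Var inLineComment As Boolean            ' True after ";" until end of line

  For Each ch As String In chars
    If inBrace Then
      If ch = "}" Then inBrace = False
    ElseIf inLineComment Then
      If ch = Chr(10) Or ch = Chr(13) Then
        inLineComment = False
        kept.Add(ch)                      ' keep the line break itself
      End If
    ElseIf ch = "{" Then
      inBrace = True
    ElseIf ch = ";" Then
      inLineComment = True
    Else
      kept.Add(ch)
    End If
  Next

  Return String.FromArray(kept, "")
End Function
```

Since the brace state is checked first, digits, dots and semicolons inside a {…} comment are skipped without being interpreted.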

I think this (simple :smiley: ) regular expression should do it:

(?x)
(?-i)
(?(DEFINE)
  (?<PC>[RBNKQ])
  (?<GRID>[a-h])
  (?<SQ>(?&GRID)[1-8])
  (?<CSTL>O-O(?:-O)?)
  (?<PCMV>(?&PC)?(?&GRID)?x?(?&PC)?(?&SQ)[+#]?)
  (?<MV>(?&CSTL)|(?&PCMV))
  (?<CMT>
    \( [^)]* \) |
    \{ [^}]* \} |
    ; .*

  )
)

\b(\d+)             # Move number
\. \s*              # Dot with optional whitespace
((?&CMT))?          # Optional comment
((?&MV))            # First move
\s*                 # Optional whitespace
(?:((?&CMT)) \s*)?  # Optional comment
(?:                 # Optional second move
  ((?&MV))
  \s*
  ((?&CMT))?        # Optional comment
)?

As always, I am in awe of Kem’s regex skills.

Thank you Kem for the time you spent to put together this regex. I will test it. Thank you again

Hi Kem,

I tried your regex with the following file (a valid pgn file with just one game), using the code below, but it does not work. Am I missing something?

Thank you,
Davide.

Code:

Var f As FolderItem = FolderItem.ShowOpenFileDialog("PGN")
If f <> Nil Then
  If f.Exists Then
    Var t As TextInputStream
    Try
      t = TextInputStream.Open(f)
      t.Encoding = Encodings.MacRoman
      Var pgnData As String = t.ReadAll
      Var rg As New RegEx
      Var myMatch As RegExMatch
      rg.SearchPattern = "(?x)(?-i)(?(DEFINE)(?<PC>[RBNKQ])(?<GRID>[a-h])(?<SQ>(?&GRID)[1-8])(?<CSTL>O-O(?:-O)?)(?<PCMV>(?&PC)?(?&GRID)?x?(?&PC)?(?&SQ)[+#]?)(?<MV>(?&CSTL)|(?&PCMV))(?<CMT>\( [^)]* \) |\{ [^}]* \} |; .*))\b(\d+) \. \s*((?&CMT))?((?&MV))\s*(?:((?&CMT)) \s*)?(?:((?&MV))\s*((?&CMT))?)?"
      myMatch = rg.Search(pgnData)
      If myMatch <> Nil Then
        MessageBox(myMatch.SubExpressionString(0))
      Else
        MessageBox("text not found")
      End If
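    Catch err As RegExException
      ' Inside a Try block, each handler must be a Catch clause
      MessageBox(err.Message)
    Catch e As IOException
      MessageBox("Error accessing file.")
    Finally
      ' t is still Nil if the file could not be opened
      If t <> Nil Then t.Close
    End Try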
  End If
End If

PGN File content:

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Player A"]
[Black "Player B"]
[Result "*"]
[EventDate "????.??.??"]
[PlyCount "39"]

1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.Ng5 h6 $4 {A blunder} ( {Was better} 4...d5 {with
equality} 5.exd5 ( 5.Bb5 a6 6.Bxc6+ bxc6 ) 5...Nxd5 6.Nxf7 Kxf7 7.Qf3+ ) 5.Nxf7 
Qe7 6.Nxh8 Nd8 7.Nc3 g5 8.Nd5 Nxd5 9.Qh5+ Qf7 10.Nxf7 Nxf7 11.Bxd5 d6 12.Qxf7+ 
Kd8 13.Qxf8+ Kd7 14.Qf7+ Kd8 15.d4 exd4 16.Qf8+ Kd7 17.Qxh6 Kd8 18.Qf8+ Kd7 
19.Bxg5 c6 20.Qe7# *

This data has a “$” and I don’t see that in the spec on Wikipedia. Do you have documentation on that?

I found it here:

http://www.saremba.de/chessgml/standards/pgn/pgn-complete.htm

It’s an additional annotation, and may appear, I guess, anywhere, just like a comment. Unfortunately, I don’t have time to update the pattern to account for this, or other things I missed (like “=”), but perhaps you want to tackle it.
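
If you would rather not touch the pattern, one option is to strip those annotations before searching. This is only a sketch under the assumption that an annotation is always "$" followed by digits, as in the "$4" of the sample file; StripNAGs is a hypothetical helper name.

```xojo
' StripNAGs is a hypothetical helper name, not a Xojo built-in.
' Removes "$<digits>" annotations (and any whitespace right after them).
Function StripNAGs(movetext As String) As String
  Var rx As New RegEx
  rx.SearchPattern = "\$\d+\s*"
  rx.ReplacementPattern = ""
  rx.Options.ReplaceAllMatches = True   ' replace every occurrence, not just the first
  Return rx.Replace(movetext)
End Function
```

Note that this also removes anything that looks like "$4" inside a comment, which is probably harmless if the comments are discarded anyway.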

FYI, you could have used my pattern as-is through a class or module constant rather than stripping the linefeeds and comments. It makes it easier to read and modify.

Sure, I will try to fix it myself. And thanks for the hint about using a constant for the search pattern.

Ah RegEx, the one true write-only language. You can make it do anything you want. You just can’t figure out what the hell is going on once you have :slight_smile: :stuck_out_tongue_winking_eye:

So is chess, and the millions of people using this file format are unlikely to be enamored with a Xojo-style forced format change :stuck_out_tongue:

A quick question: when executing a regex, does Xojo only return the first occurrence? If so, how can I get the others efficiently?

Thanks

Were the docs always that thin? I can’t find an example.

You need to loop through the matches:

dim theRegex as new RegEx
theRegex.Options.Greedy = False
theRegex.Options.DotMatchAll = True
theRegex.SearchPattern = EscapeRegex(Boundary) + "(.+)(\n\n|\r\r)"
dim theRegexMatch as RegExMatch = theRegex.Search(MessageRawData)

dim theText(-1) as String
dim Starts(-1) as Integer
dim theStart as Integer
while theRegexMatch <> nil
  ' Byte offset just past the current match (SubExpressionStartB is in bytes, so add Bytes, not Length)
  theStart = theRegexMatch.SubExpressionStartB(0) + theRegexMatch.SubExpressionString(0).Bytes
  if theRegexMatch.SubExpressionCount >= 2 then
    Starts.Add(theStart)
    theText.Add(theRegexMatch.SubExpressionString(1))
  end if
  ' Always advance to the next match, even when this one was skipped
  theRegex.SearchStartPosition = theStart
  theRegexMatch = theRegex.Search()
wend
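
For the PGN case, a more compact form of the same loop may be enough. It relies on Search being called without arguments to continue in the same string from where the previous match ended. AllMatches is a hypothetical helper name, and the sketch assumes the pattern never matches an empty string (otherwise the loop might not advance).

```xojo
' AllMatches is a hypothetical helper name, not a Xojo built-in.
' Returns the full text of every match of pattern in source.
Function AllMatches(pattern As String, source As String) As String()
  Var results() As String
  Var rx As New RegEx
  rx.SearchPattern = pattern

  Var m As RegExMatch = rx.Search(source)
  While m <> Nil
    results.Add(m.SubExpressionString(0))   ' 0 = the whole match
    m = rx.Search                           ' no argument: continue after the last match
  Wend
  Return results
End Function
```

Inside the loop, m.SubExpressionString(1), (2) and so on give the captured groups, for example the move number and the two moves.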

This website will help you understand what the regex you’re trying is doing.

@Kem_Tekinay: wouldn’t it be a great feature to add to RegExRX?

Thank you Beatrix, I will take a look at it. I agree that the documentation should be improved…