Help With RegEx

Ron_Bower · August 16, 2015, 3:36pm

I’m working on a program that reads and parses a text file formatted in ADIF (Amateur Data Interchange Format). The specifics of the format are actually quite simple and consist of various “Tags” and related data.

Each “Tag” is enclosed in ‘<’ and ‘>’ and includes the length of the data string that follows it. A special Tag, “<EOR>”, marks the end of an individual data record. Records can contain many tags or just a few, and tag names are not case sensitive.

A sample Tag would be “<Tag1:9>Tag1 Data”, where “Tag1” is the Tag, “9” is the number of characters in the Data, and “Tag1 Data” is the data for Tag1.

Here is a sample of two records in a file:

<call:4>KK3P <country:13>United States <dxcc:3>291 <freq:8>7.015000 <EOR>

<call:5>W2GTX <distance:7>370.646 <gridsquare:6>FN12qt <rst_rcvd:2>59 <rst_sent:2>59 <qso_date:8>20150809 <EOR>

It seems to me that RegEx would be a convenient way to parse the records: first separate the records using the <EOR> tag, then use another RegEx to split each record into its Tags and their related data. A Tag would be a ‘<’ followed by any number of characters, followed by a ‘:’, followed by any set of digits, followed by a ‘>’.

I don’t expect anyone to spend much time on the solution, but I am having a hard time getting started from the Xojo documentation on RegEx so am looking for any pointers or examples that might help.

Thanks in advance.

Ron Bower

Norman_P · August 16, 2015, 4:23pm

Get Kem’s RegExRX from the Apple store.
It might even have a built-in sample that does most of what you want.

Ron_Bower · August 16, 2015, 4:28pm

Thanks for the quick reply, Norman. I should have mentioned that I’m developing in a Windows environment.

Ask_Greiffenberg · August 16, 2015, 4:29pm

I’m no good at regex, but I would use nthField instead.
First make an array by looking for <EOR>
Then separate to a new array by looking for <
Then I would parse the tags and get the data by the length

I would of course make classes for record and records to make it more OOP

Hope it is of some help, though it wasn’t the answer you were looking for.
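The steps above could be sketched roughly as follows. This is an untested illustration, not a vetted implementation: the variable names (`src`, `allRecords`, `header`) are made up for the example, it assumes the end-of-record tag appears with the same casing as the delimiter passed to Split, and it breaks if a data value itself contains a ‘<’.

```xojo
// Sketch only (untested). src is assumed to hold the whole ADIF text.
dim allRecords() as Dictionary

dim records() as string = src.Split( "<EOR>" )
for each rec as string in records
  dim d as new Dictionary

  // Element 0 is whatever precedes the first "<", so start at 1
  dim parts() as string = rec.Split( "<" )
  for i as integer = 1 to parts.Ubound
    dim header as string = parts( i ).NthField( ">", 1 ) // e.g. "call:4"
    dim tag as string = header.NthField( ":", 1 )
    dim length as integer = val( header.NthField( ":", 2 ) )

    // Data starts right after the ">" that follows the header
    d.Value( tag ) = parts( i ).Mid( header.Len + 2, length )
  next

  // Skip the empty chunk that follows the last <EOR>
  if d.Count > 0 then
    allRecords.Append d
  end if
next
```

Taking exactly `length` characters after the ‘>’ means trailing spaces or line breaks between fields are dropped without a separate trim step.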

Norman_P · August 16, 2015, 4:30pm

He has a Windows version too.
It’s written in Xojo, so …

Ron_Bower · August 16, 2015, 4:34pm

Thanks, Ask Greiffenberg, I’ll look into nthField a bit.

Ron_Bower · August 16, 2015, 4:35pm

Okay, Norman, I’ll take a look.

Markus_Winter · August 16, 2015, 4:42pm

+1 on RegExRX. Indispensable if you are using RegEx.

Tim_Parnell · August 16, 2015, 4:44pm

If you get stumped and need a complete solution quickly I am around today and currently looking for work

Ron_Bower · August 16, 2015, 4:47pm

Thanks for the offer, Tim - currently looking at RegEx tutorials on the web.

Eli_Ott · August 16, 2015, 4:51pm

Be aware that the tag can contain a data type identifier. For example:

<aTime:6:T>HHMMSS<nextTag...
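One way a pattern might capture that optional type identifier is with an optional third group. This is an untested sketch; the pattern and the variable names are illustrative only, and it assumes the identifier, when present, follows a second ‘:’ as in the example above.

```xojo
// Sketch only (untested): tag name, length, and an optional data type
dim rx as new RegEx
rx.SearchPattern = "<([^:>]+):(\d+)(?::([^>]+))?>"

dim match as RegExMatch = rx.Search( "<aTime:6:T>HHMMSS" )
if match isa RegExMatch then
  dim tag as string = match.SubExpressionString( 1 )
  dim length as integer = val( match.SubExpressionString( 2 ) )

  // The third group is only there when the tag carries a type
  dim dataType as string
  if match.SubExpressionCount > 3 then
    dataType = match.SubExpressionString( 3 )
  end if
end if
```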

Ron_Bower · August 16, 2015, 4:54pm

Yes, Eli, it certainly can. I ASSUME that once I get a better understanding of Regular Expressions, that I’ll be able to handle all the varieties that can come along.

Kem_Tekinay · August 16, 2015, 5:13pm

First, thanks to all for the kind words re. RegExRX.

I would use RegEx in a limited way, just to identify the tags. Why? Because the tags themselves contain data you need (the length) and a regex pattern can’t make use of that.

So I might do something like this:

// src will contain the entire data
dim rx as new RegEx
rx.SearchPattern = "<([^:]*):(\d+)[^>]*>"

dim dict as new Dictionary

dim match as RegExMatch = rx.Search( src )
while match isa RegExMatch
  src = src.LTrim
  if src.Left( 5 ) = "<EOR>" then
    exit while
  end if

  dim tag as string = match.SubExpressionString( 1 )
  dim length as integer = val( match.SubExpressionString( 2 ) )
  dim matchLen as integer = match.SubExpressionString( 0 ).Len
  dim data as string = src.Mid( matchLen + 1, length )
  dict.Value( tag ) = data

  src = src.Mid( matchLen + length + 1 )
  match = rx.Search( src )
wend

I did this off the top of my head (IOW, not tested) and, with refinement, I’d find a way to eliminate the src = src.Mid line as that will take a lot of time, but you get the idea.

This code will safely read one record. Even if a tag is written like <tag:5><eor>, where the five characters of data happen to be “<eor>”, it will still work, something that splitting on “<EOR>” will not.
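As for dropping the src = src.Mid step, one possible refinement is to leave src untouched and pass a start offset to each search. The sketch below is untested and rests on a few assumptions: that the second parameter of RegEx.Search and the value returned by SubExpressionStartB are zero-based byte offsets, and that the data is single-byte text so byte counts equal character counts. The names `pos`, `tagStart` and `gap` are made up for the example.

```xojo
// Sketch only (untested): read one record without rebuilding src each pass
dim rx as new RegEx
rx.SearchPattern = "<([^:>]*):(\d+)[^>]*>"

dim dict as new Dictionary
dim pos as integer = 0 // zero-based byte offset where the next search starts

dim match as RegExMatch = rx.Search( src, pos )
while match isa RegExMatch
  dim tagStart as integer = match.SubExpressionStartB( 0 )

  // An <EOR> between the previous field and this tag ends the record
  dim gap as string = src.MidB( pos + 1, tagStart - pos )
  if gap.InStr( "<EOR>" ) > 0 then
    exit while
  end if

  dim tag as string = match.SubExpressionString( 1 )
  dim length as integer = val( match.SubExpressionString( 2 ) )
  dim matchLen as integer = match.SubExpressionString( 0 ).LenB
  dict.Value( tag ) = src.MidB( tagStart + matchLen + 1, length )

  // Jump past the data so it is never scanned for tags
  pos = tagStart + matchLen + length
  match = rx.Search( src, pos )
wend
```

Because the search resumes after the data, a value such as “<eor>” is still skipped rather than treated as the end of the record. The first group here excludes ‘>’ as well as ‘:’ so that a match cannot start at <EOR> and run on into the following tag.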

Ron_Bower · August 16, 2015, 5:54pm

Wow - very impressive, Kem !!!

I will be studying the code for a while and may be back with a question or two, but it sure looks like it gives me a good start.

Very appreciative of the help !