Help With RegEx

I’m working on a program that reads and parses a text file formatted in ADIF (Amateur Data Interchange Format). The specifics of the format are actually quite simple and consist of various “Tags” and related data.

Each “Tag” is contained within a set of ‘<’ and ‘>’ and also contains an indication of the length of the data string following the tag. A special Tag “<EOR>” marks the end of an individual data record. Records can contain several tags or just a few, and tags are not case sensitive.

A sample Tag would be “<Tag1:9>Tag1 Data” where “Tag1” is the Tag, “9” is the number of characters in the Data, and “Tag1 Data” is the data for Tag1.

Here is a sample of two records in a file:

<call:4>KK3P <country:13>United States <dxcc:3>291 <freq:8>7.015000 <EOR>

<call:5>W2GTX <distance:7>370.646 <gridsquare:6>FN12qt <rst_rcvd:2>59 <rst_sent:2>59 <qso_date:8>20150809 <EOR>

It seems to me that RegEx would be a convenient way to parse the records: first separating the records using the <EOR> tag, and then using another RegEx to separate each Tag within each record to identify the Tag and the related data. A Tag would be a ‘<’ followed by any number of characters followed by a “:” followed by any set of numbers followed by a ‘>’.

I don’t expect anyone to spend much time on the solution, but I am having a hard time getting started from the Xojo documentation on RegEx so am looking for any pointers or examples that might help.

Thanks in advance.

Ron Bower

Get Kem’s RegExRX from the Apple store.
It might even have a built-in sample that does most of what you want.

Thanks for the quick reply, Norman. I should have mentioned that I’m developing in a Windows environment.

I’m no good at regex, but I would use nthField instead.
First make an array by looking for <EOR>
Then separate to a new array by looking for <
Then I would pass the tags and get the data by the length
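A rough, untested sketch of those three steps might look like the following. It assumes src (a hypothetical variable) already holds the text of the file, that the delimiter matches however your file spells <EOR>, and that no data field contains a "<".

```xojo
// src is assumed to hold the text of the whole ADIF file
Dim records() As String = Split(src, "<EOR>")

For Each rec As String In records
  // Every element after the first starts with a tag header such as "call:4>KK3P "
  Dim parts() As String = Split(rec, "<")

  For i As Integer = 1 To parts.Ubound // parts(0) is whatever preceded the first "<"
    Dim header As String = NthField(parts(i), ">", 1)     // e.g. "call:4"
    Dim tagName As String = NthField(header, ":", 1)      // e.g. "call"
    Dim length As Integer = Val(NthField(header, ":", 2)) // e.g. 4

    // The data starts right after the ">" that closes the header
    Dim data As String = Mid(parts(i), Len(header) + 2, length)

    // Store tagName and data in your record object here
  Next
Next
```

Because this splits on "<" before looking at the lengths, a data field that itself contains "<" or "<EOR>" would be cut in the wrong place.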

I would of course make classes for record and records to make it more OOP

Hope it is of some help, though it wasn’t the answer you were looking for.

He has a Windows version too.
It’s written in Xojo so … :stuck_out_tongue:

Thanks, Ask Greiffenberg, I’ll look into nthField a bit.

Okay, Norman, I’ll take a look.

+1 on RegExRX. Indispensable if you are using RegEx.

If you get stumped and need a complete solution quickly I am around today and currently looking for work :wink:

Thanks for the offer, Tim - currently looking at RegEx tutorials on the web.

Be aware that the tag can contain a data type identifier. For example:

<aTime:6:T>HHMMSS<nextTag...
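As a hedged illustration, a pattern along these lines should pick up the optional type indicator as a third subexpression. The pattern and the variable names are assumptions made for this sketch and have not been tested against real ADIF files.

```xojo
Dim rx As New RegEx
// 1 = tag name, 2 = data length, 3 = optional data type indicator
rx.SearchPattern = "<([^:>]+):(\d+):?([^>]*)>"

Dim match As RegExMatch = rx.Search("<aTime:6:T>HHMMSS")
If match <> Nil Then
  Dim tagName As String = match.SubExpressionString(1)       // "aTime"
  Dim dataLen As Integer = Val(match.SubExpressionString(2)) // 6
  Dim dataType As String = match.SubExpressionString(3)      // "T"
End If
```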

Yes, Eli, it certainly can. I ASSUME that once I get a better understanding of Regular Expressions, that I’ll be able to handle all the varieties that can come along.

First, thanks to all for the kind words re. RegExRX.

I would use RegEx in a limited way, just to identify the tags. Why? Because the tags themselves contain data you need (the length) and a regex pattern can’t make use of that.

So I might do something like this:

// src will contain the entire data
dim rx as new RegEx
rx.SearchPattern = "<([^:]*):(\d+)[^>]*>"

dim dict as new Dictionary

dim match as RegExMatch = rx.Search( src )
while match isa RegExMatch
  src = src.LTrim
  if src.Left( 5 ) = "<EOR>" then
    exit while
  end if

  dim tag as string = match.SubExpressionString( 1 )
  dim length as integer = val( match.SubExpressionString( 2 ) )
  dim matchLen as integer = match.SubExpressionString( 0 ).Len
  dim data as string = src.Mid( matchLen + 1, length )
  dict.Value( tag ) = data

  src = src.Mid( matchLen + length + 1 )
  match = rx.Search( src )
wend

I did this off the top of my head (IOW, not tested) and, with refinement, I’d find a way to eliminate the src = src.Mid… line as that will take a lot of time, but you get the idea.

This code will safely read one record. Even if a tag is written like <tag:5><eor>…, it will still work, something that splitting on “<EOR>” will not.
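One possible way to drop the repeated src = src.Mid… copy is to leave src untouched and pass a start offset to Search instead. The sketch below is untested and rests on several assumptions: the data is plain ASCII (so byte and character counts agree), the start position given to Search and the value of SubExpressionStartB are zero-based byte offsets, and the slightly changed pattern (which also matches tags without a length, such as <EOR>) suits your files.

```xojo
// src will contain the entire data and is never modified
Dim rx As New RegEx
// 1 = tag name, 2 = data length (empty for tags like <EOR>)
rx.SearchPattern = "<([^:>]+):?(\d*)[^>]*>"

Dim dict As New Dictionary
Dim pos As Integer = 0 // zero-based byte offset where the next search starts

Dim match As RegExMatch = rx.Search(src, pos)
While match <> Nil
  Dim tag As String = match.SubExpressionString(1)
  If tag = "EOR" Then // Xojo string comparison ignores case
    Exit While
  End If

  Dim tagStart As Integer = match.SubExpressionStartB(0)
  Dim tagLen As Integer = match.SubExpressionString(0).LenB
  Dim length As Integer = Val(match.SubExpressionString(2))

  // MidB is one-based, so add 1 to the zero-based offset
  dict.Value(tag) = src.MidB(tagStart + tagLen + 1, length)

  // Jump past the data so a "<" inside the data is never taken for a tag
  pos = tagStart + tagLen + length
  match = rx.Search(src, pos)
Wend
```

Like the original, this reads a single record; to read the whole file you would keep pos after the <EOR> tag and start the next record from there.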

Wow - very impressive, Kem !!!

I will be studying the code for a while and may be back with a question or two, but it sure looks like it gives me a good start.

Very appreciative of the help !