I’m working on a program that reads and parses a text file formatted in ADIF (Amateur Data Interchange Format). The specifics of the format are actually quite simple and consist of various “Tags” and related data.
Each “Tag” is contained within a set of ‘<’ and ‘>’ and also contains an indication of the length of the data string following the tag. A special Tag “<EOR>” marks the end of an individual data record. Records can contain many tags or just a few, and tag names are not case sensitive.
A sample Tag would be “<Tag1:9>Tag1 Data” where “Tag1” is the Tag, “9” is the number of characters in the Data, and “Tag1 Data” is the data for Tag1.
It seems to me that RegEx would be a convenient way to parse the records: first separate the records using the “<EOR>” tag, then use another RegEx to split each record into its Tags and their related data. A Tag would be a ‘<’ followed by any number of characters, followed by a ‘:’, followed by one or more digits, followed by a ‘>’.
I don’t expect anyone to spend much time on the solution, but I am having a hard time getting started from the Xojo documentation on RegEx so am looking for any pointers or examples that might help.
I’m no good at regex, but I would use nthField instead.
First make an array by looking for <EOR>
Then separate to a new array by looking for <
Then I would pass the tags and get the data by the length
I would of course make classes for record and records to make it more OOP
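The steps above could be sketched roughly as follows. This is an untested sketch, not a finished implementation: ParseRecord is a hypothetical helper name, the data is assumed to be plain ASCII so that character counts match the ADIF lengths, and instead of splitting on < it walks the string with InStr so that a < inside the data is not mistaken for the start of a tag.

```xojo
// Hypothetical helper: parse one record's text into a Dictionary of tag -> data.
// Assumes plain ASCII data, so character counts equal the ADIF lengths.
Function ParseRecord(recordText As String) As Dictionary
  Dim fields As New Dictionary
  Dim pos As Integer = InStr(1, recordText, "<")
  While pos > 0
    Dim closePos As Integer = InStr(pos, recordText, ">")
    If closePos = 0 Then Exit While

    // header is the text between < and >, e.g. "call:4"
    Dim header As String = Mid(recordText, pos + 1, closePos - pos - 1)
    // Xojo's = on strings ignores case, so this also catches <eor>
    If header = "EOR" Then Exit While

    Dim tagName As String = NthField(header, ":", 1)
    Dim dataLen As Integer = Val(NthField(header, ":", 2))
    fields.Value(Uppercase(tagName)) = Mid(recordText, closePos + 1, dataLen)

    // resume after the data, so its contents are never scanned for tags
    pos = InStr(closePos + 1 + dataLen, recordText, "<")
  Wend
  Return fields
End Function
```

Called with "<call:4>W1AW<band:3>20m<EOR>", this should return a Dictionary holding CALL = "W1AW" and BAND = "20m".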
Hope it is of some help, though it wasn’t the answer you were looking for.
Yes, Eli, it certainly can. I ASSUME that once I get a better understanding of Regular Expressions, that I’ll be able to handle all the varieties that can come along.
First, thanks to all for the kind words re. RegExRX.
I would use RegEx in a limited way, just to identify the tags. Why? Because the tags themselves contain data you need (the length) and a regex pattern can’t make use of that.
So I might do something like this:
// src will contain the entire data
dim rx as new RegEx
// group 1 = tag name, group 2 = data length
rx.SearchPattern = "<([^:]*):(\d+)[^>]*>"
dim dict as new Dictionary
dim match as RegExMatch = rx.Search( src )
while match isa RegExMatch
  src = src.LTrim
  // stop at the end-of-record marker
  if src.Left( 5 ) = "<EOR>" then
    exit while
  end if
  dim tag as string = match.SubExpressionString( 1 )
  dim length as integer = val( match.SubExpressionString( 2 ) )
  // length of the whole tag, e.g. "<call:4>"
  dim matchLen as integer = match.SubExpressionString( 0 ).Len
  // the data is taken by its declared length, not by searching for the next tag
  dim data as string = src.Mid( matchLen + 1, length )
  dict.Value( tag ) = data
  // drop the tag and its data, then look for the next tag
  src = src.Mid( matchLen + length + 1 )
  match = rx.Search( src )
wend
I did this off the top of my head (IOW, not tested) and, with refinement, I’d find a way to eliminate the src = src.Mid line as that will take a lot of time, but you get the idea.
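One possible way to avoid the repeated src = src.Mid is to leave src untouched and restart each search from a byte offset. The sketch below is just as untested as the code above and rests on several assumptions: src is single-byte (ASCII) text so that byte offsets and ADIF lengths line up, rx.Search accepts a zero-based start position as its second argument, and SubExpressionStartB reports offsets relative to the whole string. The pattern is also loosened so that the colon and the length are optional, which lets <EOR> match as an ordinary tag.

```xojo
// src holds the entire ADIF text and is never modified.
// Assumption: single-byte (ASCII) text, so byte offsets equal character offsets.
Dim rx As New RegEx
// group 1 = tag name, group 2 = length (empty for <EOR>)
rx.SearchPattern = "<([^:>]+):?(\d*)[^>]*>"

Dim dict As New Dictionary
Dim pos As Integer = 0 // zero-based byte offset where the next search starts
Dim match As RegExMatch = rx.Search(src, pos)

While match <> Nil
  Dim tag As String = match.SubExpressionString(1)
  // Xojo's = on strings ignores case, so <eor> is caught too
  If tag = "EOR" Then Exit While

  Dim length As Integer = Val(match.SubExpressionString(2))
  // zero-based offset of the first data byte, right after the closing >
  Dim dataStart As Integer = match.SubExpressionStartB(0) + match.SubExpressionString(0).LenB
  dict.Value(tag) = src.MidB(dataStart + 1, length) // MidB is 1-based

  // skip over the data so its contents are never searched for tags
  pos = dataStart + length
  If pos >= src.LenB Then Exit While
  match = rx.Search(src, pos)
Wend
```

Because each search begins after the previous tag's data, nothing is copied per tag, and data that happens to look like a tag is still skipped.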
This code will safely read one record. Even if a tag’s data is itself the text “<eor>”, as in <tag:5><eor>, it will still work, something that splitting on “<EOR>” will not.