RegEx Help Needed

I’m wondering if someone can help me build a RegEx string that will give me an output in the format I want/need.

Here is how the data is going to come into the software from an external device:

 "\"07131134\"\"15036260281\"\"Biamp Systems\""

I could get what I wanted if I were to break this into groups and deal with all the sub-expression strings. However, what I would like to do is match specific data in just the SubExpression(0) string (i.e., the whole match). Here is how I want the output to look:

15036260281 Biamp Systems

Is it possible to create a regex to give that output?

Thanks,

Jon

If you are a Windows person and you do enough RegEx work this might be helpful. It has a 30 day trial.

http://www.regexmagic.com/benefits.html

This may help also:

https://www.regular-expressions.info/regexmagic.html

You can also search for “RegEx Generator” and you will find other tools.

If this is a one shot request I am sure somebody here can help.

Pretty much a one shot. But I’ll look into those.

FYI, there is a Windows version of RegExRX too. :)

The best you can do is match the entire string, then strip the escaped quotes, something like this:

dim rx as new RegEx
rx.SearchPattern = """((?:\\\""|[^""])*)"""

dim match as RegExMatch = rx.Search( myText )
while match isa object
  dim s as string = match.SubExpressionString( 1 ).ReplaceAll( "\""", """" )
  // Do something with s

  match = rx.Search // Fetch the next match; without the assignment the loop never ends
wend

Outside of Xojo escaping, the pattern looks like this:

"((?:\\\"|[^"])*)"
  • Match a quote
  • Inside the subgroup…
  • match every character that is either an escaped quote (\") or not a quote
  • Match the trailing quote

Unfortunately, I don’t want to (and can’t) match the whole string and then add the extra code. I’m trying to design something extremely flexible for parsing some incoming data. This caller ID string is the first customer application. Is there no way to match some parts of the string and not others? For example, I can do the phone number part with:

[0-9]{11}

But then I want to skip the next set of characters, \"\", and then match the text that comes at the end. Is there no way to do that without groupings and sub-expressions? I want a non-capturing group that isn’t part of the main output. Isn’t the (*SKIP) command supposed to do that? I can’t make it work.

See I can do something like:

(?<=\\\"\\\")([0-9]+)\\\"\\\"(.*)\""

Which will give me the following for the input string;

15036260281\"\"Biamp Systems\""

With SubExpressions of:

15036260281
Biamp Systems

What I would love to do is add a positive look around for the \"\" characters and then include the matching text. But I don’t think I could add the space in. So perhaps I need to allow the user to do some Xojo scripting and get access to all the subexpressions they create so they can format the output however they want.
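
One way to get the space in without a post-processing script might be to run the regex as a replace instead of a search: match the whole line, capture the two fields, and rebuild the output in the replacement pattern. This is only a sketch, assuming the input is exactly the sample string from the first post. The names sourceText and formatted are illustrative, and the pattern has not been checked against any other device output.

```xojo
// Sample input from the first post; each literal quote is doubled for Xojo
Dim sourceText As String = """\""07131134\""\""15036260281\""\""Biamp Systems\"""""

Dim rx As New RegEx
// Outside of Xojo escaping: ^.*?\\"\\"(\d+)\\"\\"(.*?)\\".*$
// Skip up to the first \"\" that is followed by digits, capture the digits,
// skip the next \"\", capture the text up to the closing \", then eat the rest
rx.SearchPattern = "^.*?\\""\\""(\d+)\\""\\""(.*?)\\"".*$"

// Rebuild the output from the two captures, with a space between them
rx.ReplacementPattern = "$1 $2"

Dim formatted As String = rx.Replace(sourceText)
```

This still uses two capture groups internally, but the caller only ever sees the final string, so the formatting lives in the replacement pattern and no scripting is needed.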

[quote=420015:@Jon Ogden]Here is how the data is going to come into the software from an external device:

"\"07131134\"\"15036260281\"\"Biamp Systems\""
I could get what I wanted if I were to break this into groups and deal with all the sub-expression strings. However, what I would like to do is match specific data in just the SubExpression(0) string (ie: the whole match). Here is how I want the output to look:

15036260281 Biamp Systems
Is it possible to create a regex to give that output?[/quote]

Yes, it is actually a fairly easy RegEx pattern, but it does require that you use Capture Groups (aka SubExpressions).
The RegEx pattern is:

"\\\".+?"\\\"(\d+)\\\"\\\"(.+?)\\\"

Here’s the RegExRx file you can view and test:
https://www.dropbox.com/s/rk3jz8hsikx4dfo/Extract%20ID%20from%20Device%20Input%20%20Xojo.regexrx?dl=1

I do have one concern: The source string you provided looks like it may already be quote escaped.
Please double check that this is exactly the string you will receive to be processed.

[h]Code to Process RegEx[/h]

// You provided the source text as "\"07131134\"\"15036260281\"\"Biamp Systems\""
// This looks like it has already been quote escaped.
// The line below escapes what you posted above for use in a Xojo string literal.

dim sourceText As String = """\""07131134\""\""15036260281\""\""Biamp Systems\"""""
Dim resultStr as String = "TBD"

Var LF As String = Encodings.UTF8.Chr(10) 

dim rx as new RegEx
rx.SearchPattern = "(?m-Usi)""\\\"".+?""\\\""(\d+)\\\""\\\""(.+?)\\\"""

dim rxOptions as RegExOptions = rx.Options
rxOptions.StringBeginIsLineBegin = false
rxOptions.StringEndIsLineEnd = false
rxOptions.LineEndType = 4

dim match as RegExMatch = rx.Search( sourceText )

If (match isA RegExMatch) Then
  Dim numSubExp As Integer = match.SubExpressionCount   // = full match + number of capture groups
  Dim numCaptureGroups As Integer = numSubExp - 1
  
  If (numCaptureGroups = 2 ) Then
    resultStr = match.SubExpressionString(1) + " " + match.SubExpressionString(2)
  End If
End If


//--- CHANGE Control To The One You want to use ---
regexResultsTA.AddText(resultStr + LF)

//-->15036260281 Biamp Systems

It worked on my test project, running Xojo 19.3.1.47524 (19.3.1.3.47524) on macOS 10.14.6 (Mojave).

Please let us know if this works for you.

Hi Jim,

Thanks for your post. I wish I had it 18 months ago! Here’s what I used as an expression.

(?<=\\\"\\\")(\\+[0-9]+)\\\"\\\"(.*)\\\""

And yes, the data coming in has plenty of the \ character.

Then further, I had an additional search/replace in a script that is this:

[code]Dim s as string

Dim SearchPattern as String = "(?mi-Us)(?:\+?\b1\D?)?(?:[([ ]?|\b)(?'areacode'\d{3})(?:[])] |[]) .\-])?(?'exchange'\d{3})[.\- ]?(?'last4'\d{4})\b"

Dim ReplacementPattern as String = "($1) $2-$3"

dim replacedText as string = RegExReplace(SearchPattern, ReplacementPattern, Expressions(1))

If replacedText = "" Then
  replacedText = Expressions(1)
End If

If ExpressionsUbound = 2 And Expressions(2) <> "" Then
  s = "Call From: " + replacedText + " - " + Expressions(2)
Else
  s = "Call From: " + replacedText
End If[/code]

It ends up working pretty well. I’ll have to take a further look at what you did.

Thanks again.

Jon, I’m glad you found a solution 18 months ago. I didn’t see a solution, and didn’t notice the date, so I made my post.

Hopefully, both of our posts will help others who have a similar need.