Extracting a Word out of a String for validation

My Xojo app reads data from a connected serial device. In this instance, the goal is to extract the model name (modelName).

[code]  If Serial1.Open Then
    modelName = Serial1.ReadAll(Encodings.ASCII)
  End If[/code]
[i]What I'm getting from the external device is:[/i]

[code]1
-0
-2
SRS40KG080[/code]

That’s fine. I don’t want to go into detail (if not required) of what’s happening with the code on the external device side of things.

What I do know, and consistently have, is the model name: SRS40KG080 in this case. There is some residual numeric data (what the heck is -0 minus zero?.. I have no idea), but it’s not important. Nevertheless, the model name is there.

Firstly, if the string contains “SRS”, then the model name is a valid device.
If so, then I also need to extract the whole model name to show in the UI when the device is connected.

eg. Device Connected: SRS40KG080

[code]// sourceText holds the text read from the device
Dim rx As New RegEx
rx.SearchPattern = "(?mi-Us)(SRS.+)"

Dim rxOptions As RegExOptions = rx.Options
rxOptions.LineEndType = 4

Dim match As RegExMatch = rx.Search( sourceText )
If match <> Nil Then
  // SubExpressionString(0) is the entire matched text
  Label1.Text = match.SubExpressionString(0)
End If[/code]

Thanks Jean-Yves,

Using RegEx is beyond my understanding. Is there another way that I might understand?

I’m not saying it’s wrong, and it’s likely the most efficient way, and I’m sure it works, but the only thing I can understand from the code is that the string is searched for a pattern that contains “SRS”.

I don’t understand anything else.

That’s a rather defeatist attitude. How about reading up on it in the online help?

If a valid model number always starts with “SRS” and has a fixed number of characters, then I’d think it’d be pretty easy to find with INSTR and extract with MID.
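
A minimal sketch of that InStr/Mid idea, for illustration only. The sample text and the 10-character length are assumptions based on the SRS40KG080 example above; in the real app the text would come from Serial1.ReadAll. InStr returns a 1-based position (0 when nothing is found), and Mid takes a start position followed by a length.

[code]// Assumed sample input: the device output with CR+LF line endings.
Dim crlf As String = Chr(13) + Chr(10)
Dim received As String = "1" + crlf + "-0" + crlf + "-2" + crlf + "SRS40KG080"

Dim modelLength As Integer = 10 // assumption: the model name is always 10 characters
Dim modelName As String

// InStr returns the 1-based position of "SRS", or 0 if it is not present.
Dim startPos As Integer = InStr(received, "SRS")
If startPos > 0 Then
  // Mid(source, start, length): take modelLength characters from startPos.
  modelName = Mid(received, startPos, modelLength)
End If
// With the sample input above, modelName ends up as "SRS40KG080".[/code]

If "SRS" is not found, modelName simply stays empty, which can double as the "not a valid device" check.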

[quote=406999:@Steve Kelepouris]Thanks Jean-Yves,

Using RegEx is beyond my understanding. Is there another way that I might understand?

I’m not saying it’s wrong, and it’s likely the most efficient way, and I’m sure it works, but the only thing I can understand from the code is that the string is searched for a pattern that contains “SRS”.

I don’t understand anything else.[/quote]
You just have to replace Label1.Text with the string that should receive the “SRSxxxxx” value.
Everything else is just copy-paste.
Try it; it’s not that complicated.
(Although RegEx CAN be very complicated… this one is simple.)

Thanks Markus, I’m not being defeatist, just pragmatic and realistic. What I do know is that understanding RegEx is quite a task. Some people on this forum are very adept at it, and it is a great tool to use if you understand it. But a man has to know his limitations - or how far he is prepared to go to achieve a result, or the same result with less effort.

I’m not saying that eventually I couldn’t get to understand it, I’m sure I could if I put my mind to it. But not at this point, it does not suit my purpose, I have other priorities - if that’s ok.

Thanks Julia,

I was just looking at that. It reminds me of very old FileMaker(5) patterncount, InStr and Mid functions. Great!!.. I now know I can get there with those tools. Thanks, I’ll look into it.

@Markus
Resourcefulness is also a powerful tool which is not written in any book or text. You either have it, or you don’t.

Also, thanks to Jean-Yves. I’m not disregarding what you posted, but just going down a different path.

Hopefully it’s all simple ASCII and you won’t get fouled up by encodings. The Byte versions, InStrB and MidB, are sometimes a better choice when dealing with external data.

If you do decide to go with a RegEx, I recommend this pattern instead:

(?-i)SRS[A-Z\d]+

If the model name is always 10 characters, then this is even better:

(?-i)SRS[A-Z\d]{7}

So that you might consider it (because, in this case, it really is the easiest way to handle it, and it’s exactly what RegEx is designed for), here is what the tokens mean:

(?-i)    a switch that means "not case-insensitive",
         i.e., case-sensitive
SRS      match this exactly
[ ... ]  start a character class, meaning match any
         single character contained within the brackets
[A-Z]    because this is case-sensitive, match any
         uppercase character
\d       match any digit
+        match the preceding token (in this case, the
         character class) one or more times
{7}      match the preceding token exactly seven times

If the model name itself always fits a particular pattern, you can refine this even further if you want to be confident that you are getting a true name. For example:

(?-i)SRS\d{2}[A-Z]{2}\d{3}

That means, case-sensitive, match “SRS”, then two digits, two letters, then three digits.
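
For anyone who wants to try it, here is a rough sketch of how that last pattern could be dropped into the RegEx code posted earlier in the thread. The sample sourceText is an assumption standing in for the serial data, and Label1 is the label from that earlier example.

[code]// Assumed sample input standing in for the text read from the device.
Dim crlf As String = Chr(13) + Chr(10)
Dim sourceText As String = "1" + crlf + "-0" + crlf + "-2" + crlf + "SRS40KG080"

Dim rx As New RegEx
// Case-sensitive: "SRS", two digits, two uppercase letters, three digits.
rx.SearchPattern = "(?-i)SRS\d{2}[A-Z]{2}\d{3}"

Dim match As RegExMatch = rx.Search(sourceText)
If match <> Nil Then
  // SubExpressionString(0) is the whole match, here "SRS40KG080".
  Label1.Text = "Device Connected: " + match.SubExpressionString(0)
Else
  // No match means the text does not contain a model name of this shape.
  Label1.Text = "No SRS device found"
End If[/code]

Because the pattern describes the shape of the name itself, it does not matter where in the text the name appears or what comes after it.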

That is like saying unless you are good at Xojo there is no point in using it.

I think you misunderstand RegEx - a little RegEx goes a long way. I myself will never be an expert in RegEx, SQL, Perl, or the command line … or even Xojo for that matter … but it is very much worth it to know the basics as you can do a LOT with it.

The general rule of thumb is that you need 20% of the effort to master 80% of something … and you can do a LOT with 80%.

If you said it is not worth your investment in time and effort to get the last 20% then I’m with you … but otherwise no, I don’t understand your position at all as a little effort would yield big dividends.

Sure, you can use some tongs to hammer in a nail, and a hammer to get some screws into the wall …

Thanks Julia. It all looks promising and I’ll see how I go and ask back here if I get stuck.

@Kem Tekinay
Thanks Kem. I appreciate your post, you are the expert with RegEx. But if I were to look into RegEx further, then that would be another distraction or path that I really don’t wish to go down.

I have my Xojo Application Software, I have my external hardware device (Arduino) which requires me to learn a bit of C. I also have an actual “mechanical” hardware Load Cell contraption that requires some engineering skills.

I’m just trying to put this all together into a solution. That’s what I’m good at. I don’t need to know the intricacies of how all the parts work, but just how they work together.

Someone creates a wheel. They don’t know what to do with it. I come along and think, hmmmm, “you know what guys, I reckon if we put 4 of these wheel thingies together we could make something of it. I’ll call it a cart.”

Perhaps I’m the “Cart Maker”? :slight_smile: It doesn’t sound glamorous, but I hope my point is understood.

Also, I’ll say that Xojo is a great tool that has allowed me to create/make some of my ideas into something tangible. :slight_smile:

This is the final code that has resolved the problem. I’ve used the InStr and Mid functions:

[code]Dim modelNumber As String
If Serial1.Open Then
  modelNumber = Trim(Serial1.ReadAll(Encodings.ASCII))
End If

Dim location As Integer
location = InStr(1, modelNumber, "SRS")

If location <> 0 Then // It's a valid SRS device
  SRSvalidModel = True
  modelNumber = Mid(modelNumber, location, location + 13)
  deviceName = modelNumber
  btnConnect.Caption = deviceName // display the device/model number
Else
  SRSvalidModel = False
  modelNumber = "GENERIC"
  deviceName = modelNumber
  btnConnect.Caption = deviceName // display the device/model number
End If

PreferencesWRITE[/code]

This works perfectly for what I need. There’s always more than one way to remove the dermal tissue from a feline - my apologies to cat lovers :slight_smile:

An important thing to mention, which threw me for a bit, was the “location” number (i.e. the position where “SRS” begins). Looking at the example below, you could be forgiven for thinking that the location is 6. Well, no, it’s not; it’s 12.

The reason is that there are line feeds/carriage returns that take up 2 characters.

1 -0 -2 SRS40KG080
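
The count can be checked with a small sketch. It assumes the device ends each line with a carriage return plus a line feed (two characters), as described above; the sample text is rebuilt by hand here rather than read from the serial port.

[code]Dim crlf As String = Chr(13) + Chr(10) // carriage return + line feed = 2 characters
Dim received As String = "1" + crlf + "-0" + crlf + "-2" + crlf + "SRS40KG080"

// Characters before "SRS": "1" (1) + CRLF (2) + "-0" (2) + CRLF (2) + "-2" (2) + CRLF (2) = 11,
// so "SRS" starts at position 12.
Dim location As Integer = InStr(received, "SRS") // location is 12

// With the line endings stripped the text is "1-0-2SRS40KG080",
// which is where the expected 6 comes from.
Dim flattened As String = ReplaceAll(received, crlf, "")
Dim flatLocation As Integer = InStr(flattened, "SRS") // flatLocation is 6[/code]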

Anyway, in my case it was more important to have the conditional construct working than how I got there. I know my programming style is very verbose, but it’s important to me that I can come back to this code later and understand what it means.

RegEx is overkill in this situation. With another application where I have to change things in a whole pile of text, then yes, sure, I’d be looking into RegEx.

I’ve settled on 13 characters as the fixed number for the model number (eg. SRSX040KG0080). I’m building the hardware device, therefore I can decide and make the rules.

@Kem
If you have the time, it may be worthwhile for me and others if you take my final code above and make it more concise in any way you like - rip it to shreds. I’m not saying that I’m going to use it now, but it’s definitely worth knowing about for future apps.

@Markus

No no no, on the contraire Markus.
It’s like saying “I AM good at using Xojo, but I don’t understand the finer points”. That doesn’t stop me from using Xojo. I just need to learn more. When it suits me :slight_smile:

Isn’t this a bug?

modelNumber = Mid(modelNumber, location, location + 13)

If, say, the location is set to 10, you will grab 23 characters, but if it’s set to 2, you will grab 15 characters.

I had to take a look at how Mid works. Yes, the last value is optional and it is the length, so instead of ‘location + 13’ it should be just ‘13’.

Maybe it is working because the modelNumber is always the last thing in the string?

Edit: Steve, as I said, it can work perfectly but only if the modelNumber (eg. SRSX040KG0080) is the last thing in the source string.

No, there is no bug.

It makes complete and perfect sense to me and the code works perfectly.

[EDIT]
Ideas will always win regardless of the code :slight_smile:

This is a serial connection without checksum control.
If you have any glitches in the transmission, the model number will not be at the expected location.
With RegEx it would be less problematic…

Steve, just trying to learn and help.

I created a sample from your code on a Button (Action), with TextArea1 and Label1:

[code]Dim modelNumber As String
'If Serial1.Open Then
modelNumber = Trim(TextArea1.Text)
'End If

Dim location As Integer
location = InStr(1, modelNumber, "SRS")

Dim SRSvalidModel As Boolean
Dim deviceName As String
If location <> 0 Then // It's a valid SRS device
  SRSvalidModel = True
  modelNumber = Mid(modelNumber, location, location + 13)
  deviceName = modelNumber
  Label1.Text = deviceName // display the device/model number
Else
  SRSvalidModel = False
  modelNumber = "GENERIC"
  deviceName = modelNumber
  Label1.Text = deviceName // display the device/model number
End If
[/code]
In TextArea1 I put:

1 -0 -2 SRSX040KG0080 123 456 789

When I run this, Label1 shows:

SRSX040KG0080

Then I changed Label1’s Multiline to ON, and now it shows:

SRSX040KG0080 123 456

If I change:

modelNumber = Mid(modelNumber, location, location + 13)

to

modelNumber = Mid(modelNumber, location, 13)

I always get:

SRSX040KG0080

no matter if Multiline is ON or OFF, because now the code only gives me 13 characters and not 13 + location. BTW, location in my test was 9; that’s why the original code gives me 22 characters on my Mac.

Hope this helps. It helped me learn.

The “location” is determined via InStr. I fail to understand the issue.

It is SOLVED.

There are NO Glitches anymore. Why would there be? This works PERFECT.

[EDIT]

Julia solved this very early on.

“Working correctly” is not the same as “there is no bug”. Both Albert and I explained the issue above, but let me try one last time.

The bug is in the parameters you are supplying to Mid. It expects start_position and length, but it looks like you are supplying start_position and end_position. Because Mid is tolerant, if the model number is the last thing in the string anyway, it will appear to work correctly. For example:

s = Mid( "abc", 3, 10000 ) // s will be set to "c"

In the future, if some string appears after the model number, it will stop working the way you expect.
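
To make the difference concrete, here is a small sketch. The sample input, with an extra "OK" line after the model number, is purely hypothetical; the 13-character length is the one settled on earlier in the thread.

[code]// Hypothetical input: the model number is followed by another line.
Dim crlf As String = Chr(13) + Chr(10)
Dim received As String = "-2" + crlf + "SRSX040KG0080" + crlf + "OK"

Dim location As Integer = InStr(received, "SRS") // location is 5

// Passing location + 13 asks for 18 characters. Only 17 remain from
// position 5, so Mid returns everything up to the end of the string:
// the model number, the line ending and "OK".
Dim tooLong As String = Mid(received, location, location + 13)

// Passing 13 asks for exactly the model number: "SRSX040KG0080".
Dim exact As String = Mid(received, location, 13)[/code]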

The location is not the problem; it is the use of Mid. You are using ‘location + 13’ as a length value. If you know that the length is always 13, then you could put 13 there and not ‘location + 13’.

But it is great that it works for you.

I read that [quote]I’m building the hardware device, therefore I can decide and make the rules[/quote]so that is an advantage, and with that your code will always work. My guess is that the 13-character SRS model is always the last thing in the string. You will only have issues if you get something like:

1023045SRSX040KG0080a3246

and you want to extract 13 characters starting with SRS. Using ‘location + 13’ with Mid will return SRSX040KG0080a3246; if you change that to ‘13’, you will get SRSX040KG0080.

But you will not get that because you are building the hardware device and you make the rules. And that’s great. I’m really happy that it works for you.

As I said, I wanted to learn more about Mid (after reading what Kem said) and I learned a lot. Sorry to bother you with my tests, but that’s the way I understand things.