RegEx mystery


I have this RegEx code:

[code]
Dim r As RegEx
Dim s As String
r = New RegEx
r.SearchPattern = "^(.*) ([^ ]+)$"
r.ReplacementPattern = "\2, \1"
r.Options.ReplaceAllMatches = True
s = r.Replace(entrantName)

Return s
[/code]

I use it to convert a name from “John Smith” to “Smith, John”.

This regex

^(.*) ([^ ]+)$

does what I want in BBEdit. I thought it should work in Xojo too, but it doesn’t; it leaves the string unchanged.

The name comes from a database.


Your code, copied exactly from above, but with ‘entrantname’ replaced with “John Smith”, works just fine in Real Studio 2012 r2.1, so either something’s wrong with your input (perhaps the space isn’t a space?) or something’s broken in Xojo. I don’t have the latest version of Xojo to be able to test on, though.

Thanks for checking Hamish.

The name is passed to the Method from a database SELECT, and there are definitely spaces in all the correct places. I am using Xojo 2014 r2. I will go back and try it with an earlier version.

Have you tried passing the name “John Smith” which you just type in?

Hello Hamish,
if you like, you can download the latest Xojo version for free and use it in parallel with your current version of Xojo/Realbasic without any problem, to test things in the new Xojo environment. The only thing you can’t do is build apps. You shouldn’t open your current Realbasic projects with the new version, though, until you want to convert them.

I believe you know this, but it could help to try new things in the new IDE.

Hi Torsten,
Yep, I know this; I’ve not done that because at present we need our app built for Snow Leopard, so we have to stay with RS. I’d love to have the time to play with Xojo, but we’re so busy with our own work that that’s just a forlorn hope at the moment…

I have tried passing “John Smith” to the method and it worked!

I am actually selecting the name from a ListBox cell (I forgot that, but it got into the ListBox cell from a database).

The code for that is:

currentEntrant = CleanEntrantName(CatalogueWin.catEditorList.Cell(i,1))

So actually, the spaces should be spaces, unless the ListBox changes them to something else.

I even tried putting \s where the spaces are, to no avail; except when I actually supplied a string of a name.

It’s actually more frustrating than I thought, as it is actually working, but not from my ListBox.

So what does CleanEntrantName do? My hunch is that the thing which looks like a space isn’t a space. Try taking that method out, and seeing if that helps. You could also try making a listbox into which you load the text “John Smith” manually, so not from the database, to see if that helps.

All CleanEntrantName does is that RegEx function. I called it that because I was thinking I had to do some other cleaning up of the name, but as it comes from a database I have no need to. I will try your other suggestion about manually loading “John Smith” into a ListBox.

Oh No! It gets worse! Creating a ListBox and manually putting “John Smith” in it worked. So what are the “spaces” in the names in my other ListBox that have been supplied by an SQLite database? I guess that is now my question.

Perhaps it has something to do with encoding.

That would be my guess too. What about making the last line of your method this:
s = r.Replace(ConvertEncoding(entrantName, Encodings.UTF8))

Or you could try making the regex
r.SearchPattern = "^(.*)\s([^ ]+)$"
because \s matches any whitespace character, not just a plain space.

Thanks Torsten and Hamish, but no luck. I was in the process of replying that I had made the Cell Editable and copied the space into BBEdit, and it reported it as being HEX 20, so I am not sure but that seems OK.

Hamish I tried your “s = r.Replace(ConvertEncoding(entrantName, Encodings.UTF8))” and that didn’t work either.

I had already changed the spaces to \s, including the one after the ([^ in the second group, but I will put that one back to a space.

OK. Next step: in the CleanEntrantName method, loop through entrantName, character by character, and check what it’s actually being passed by looking at each character’s ASCII value.

Something like

dim aString as string
' Collect the character code of every character in entrantName
for i as integer = 1 to len(entrantName)
  aString = aString + cStr(asc(mid(entrantName, i, 1))) + " "
next

I’ve just tried giving it John Smith and the value returned is “74 111 104 110 32 83 109 105 116 104”, so check that against your input. In particular note the fifth one, which is 32 - space.
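For anyone who wants to run the same check outside the IDE, here is a minimal sketch of that character dump in Python rather than Xojo. The function name `char_codes` is made up for illustration. It only shows the idea: list each character's code so that anything odd, such as stray or trailing spaces (32), becomes visible.

```python
def char_codes(text):
    """Return the code of every character in text, separated by spaces."""
    return " ".join(str(ord(ch)) for ch in text)


# Prints: 74 111 104 110 32 83 109 105 116 104
print(char_codes("John Smith"))
```

Anything other than 32 where a space is expected, or extra 32s at either end, is worth a closer look.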

Wow, I was just replying that “I give up” when your post arrived. I did what you suggested and the result is:

“65 108 105 99 105 97 32 67 117 108 108 105 110 103 32 74 110 114 32 32”

The actual name from the ListBox. It seems the only odd thing is the two spaces at the end.

The name is Alicia Culling Jnr

The Jnr is another problem I am going to face later.

OK, so pop entrantName in a trim() to get rid of those trailing spaces. The expected result will be Jnr, Alicia Culling…
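To see why the trailing spaces matter, here is a rough equivalent using Python's `re` module, not Xojo's RegEx class. `swap_name` is a hypothetical helper, and I am assuming both engines treat this simple pattern the same way. The pattern insists that the string ends with a run of non-space characters, so a name that ends in spaces never matches and comes back unchanged.

```python
import re

# Same pattern as in the original method: everything, a space, then the last word.
PATTERN = re.compile(r"^(.*) ([^ ]+)$")


def swap_name(name):
    """Move the last word to the front, e.g. "John Smith" -> "Smith, John"."""
    return PATTERN.sub(r"\2, \1", name)


raw = "Alicia Culling Jnr  "          # two trailing spaces, as in the ListBox cell
print(repr(swap_name(raw)))           # no match, string is left as it was
print(repr(swap_name(raw.strip())))   # trimmed first, so the pattern can match
```

In Xojo the trimming step would be the Trim() call suggested above.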

Thanks so much for your time, and patience Hamish.

That was the problem, those two little spaces at the end. I mean, I totally understand why, but it is little things like that that trip me up. I have learnt a lot by your trying all the different steps, and finally nailing the problem.

No problem at all. It’s what coding’s about - first working out how to do it, then the myriad exciting ways in which it can break. Someone once told me that the first 90% of the work is the grunt work in making things happen, the second 90% is trapping the bugs, and the third 90% is polishing. Nothing is ever simple! :)

Cliff, the problem could have been resolved entirely in the pattern. You should avoid .* whenever possible, and also take care not to make the pattern overly specific.

In this case, this pattern would have done the job for you without needing to trim:

(\w+) +(\w+)

If you were looking to avoid modifying entries with more than two words in them, e.g., “John G. Jones”, then this pattern:

^ *(\w+) +(\w+) *$

Finally, if some entries might have numbers and you want to avoid those too, replace \w with [a-z]:

^ *([a-z]+) +([a-z]+) *$

The dot token should only be used when you really, really don’t care what it matches.
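A quick way to compare the three patterns above is to run them over a few sample names. This is a sketch using Python's `re` module rather than Xojo. `swap` and the sample strings are made up for illustration, and `re.IGNORECASE` is passed on the assumption that the Xojo search is case-insensitive, since `[a-z]` would otherwise never match a capitalised name.

```python
import re

LOOSE = r"(\w+) +(\w+)"                     # first two words, wherever they are
TWO_WORDS = r"^ *(\w+) +(\w+) *$"           # exactly two words, padding allowed
LETTERS_ONLY = r"^ *([a-z]+) +([a-z]+) *$"  # exactly two words, letters only


def swap(pattern, name):
    """Replace every match of pattern with "second, first"."""
    return re.sub(pattern, r"\2, \1", name, flags=re.IGNORECASE)


for name in ["John Smith  ", "John G. Jones", "Agent 47"]:
    results = [swap(p, name) for p in (LOOSE, TWO_WORDS, LETTERS_ONLY)]
    print(repr(name), results)
```

The loose pattern still rewrites the first two words of “John G. Jones”, while the two anchored patterns leave that entry alone. The anchored patterns also swallow the leading and trailing spaces as part of the match.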