Regex help

Hi,

I am trying to filter the rows of a listbox with this code:

dim rowx as integer
for rowx=Listbox1.ListCount -1 DownTo 0

dim rx2 as new RegEx
rx2.SearchPattern = "^.*\b(000000)\b.*$"
dim match2 as RegExMatch = rx2.Search((Listbox1.Cell(rowx,0)))

if match2 <> nil then // delete rows that contain '000000'
Listbox1.RemoveRow rowx
end if

next

dim row as integer
for row=Listbox1.ListCount -1 DownTo 0

dim rx1 as new RegEx
rx1.SearchPattern = " ^[0-9]"
dim match1 as RegExMatch = rx1.Search((Listbox1.Cell(row,0)))

if match1 is nil then // keep only rows whose first character is a digit from 0 to 9; remove all others
Listbox1.RemoveRow row
end if

next

and this listbox data:

00050
AB001
00024
01254
CD00
98254
88954
00000
10001
56000

How can I get a result like this?
00050
00024
01254
98254
88954
00000
10001
56000

Any help is appreciated,

thanks
Arief

I am not sure EXACTLY what you are after here, but it would seem that you want to delete any entry that has 000000 (six zeros).

My impression, for one thing, is that the ^.* at the beginning and the .*$ at the end have no value at all. To me, your expression is, in meaning, identical to "\b(000000)\b".

The "(" and the ")" have no particular use as best I can tell. They allow you to "capture" the content within for use elsewhere, but in your example they seem to serve no purpose. So perhaps those parentheses are just fluff.

So now we are down to "\b000000\b". It is unclear what the purpose of the \b is. Those mark the beginnings and ends of words. But your example has nothing to suggest that is important.

In any case, your first construction will not match (and thus will not remove, in your use case) A000000 or cat000000 or 000000dog. It will remove 000000 only if it is considered a "word", i.e. it is at the start or end of a line or its boundaries are non-word characters (like a space, period, or semicolon). It will match 000000.cat or 000000 cat or A 000000. It is not clear from your example that this is what you are after. It will also not match 0000000 (seven zeros), so those will be considered OK in your construction.

I would have guessed that you just want "000000" and to exclude rows that meet that simple criterion: six zeros in a row ANYWHERE.
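
To make the word-boundary point concrete, here is a small cross-check written in Python rather than Xojo, on the assumption that Python's re module treats these simple patterns the same way as Xojo's RegEx. The sample strings are the ones mentioned above; the variable names are only for illustration.

```python
import re

word_bounded = re.compile(r"\b000000\b")  # six zeros as a standalone "word"
anywhere = re.compile(r"000000")          # six zeros anywhere in the text

samples = ["000000", "A000000", "cat000000", "000000dog",
           "000000.cat", "000000 cat", "A 000000", "0000000"]

# Map each sample to (matches with \b, matches without \b).
results = {s: (bool(word_bounded.search(s)), bool(anywhere.search(s)))
           for s in samples}

for s, (bounded, loose) in results.items():
    print(f"{s!r:14} word-bounded={bounded!s:5} anywhere={loose}")
```

Only the plain "000000" pattern catches every sample, including the seven-zero one.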


Your second construction " ^[0-9]" can never match because of the space before the ^ (caret) character. ^ means the beginning of the line, so nothing, not even a space, can come before it. I presume you mean "^[0-9]", which in plain English means that any line that starts with a digit will match. That way your code will remove anything that starts with a non-digit.

AB001
CD00
will be removed.

0A01 will survive
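
A quick way to see the effect of that leading space, again as a Python sketch rather than Xojo (assuming the two engines agree on these simple patterns; the row values are taken from the examples above):

```python
import re

with_space = re.compile(r" ^[0-9]")  # as posted: a space, THEN start of line
fixed = re.compile(r"^[0-9]")        # start of line, then one digit

rows = ["00050", "AB001", "CD00", "0A01"]

# Rows that would be kept (i.e. rows where the pattern matches).
kept_with_space = [r for r in rows if with_space.search(r)]
kept_fixed = [r for r in rows if fixed.search(r)]

print(kept_with_space)  # nothing matches, so every row would be removed
print(kept_fixed)
```

With the space in place no row matches, so the loop in the question would empty the listbox.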

Try something like this:

var rx as new RegEx
rx.SearchPattern = "0{6}|^\D"

for row as integer = Listbox1.LastRowIndex downto 0
  if rx.Search( Listbox1.Cell( row, 0 ) ) isa RegExMatch then
    Listbox1.RemoveRow row
  end if
next

(Untested.)

Kem is a RegEx expert, but to expand on his missive:

"0{6}|^\D"

means that matches will occur if six zeros appear anywhere OR (that is the meaning of the | (pipe) character) the line starts with a non-digit.

Those will match as "bad" in your context and thus get removed in Kem's code.
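
As a rough check of that reading, this Python sketch (not Xojo; it assumes the pattern behaves the same in both engines) runs the combined pattern over the listbox data from the question. The final six-zero row is an extra added here for illustration and is not in the posted data.

```python
import re

bad = re.compile(r"0{6}|^\D")  # six zeros anywhere, OR a leading non-digit

rows = ["00050", "AB001", "00024", "01254", "CD00", "98254",
        "88954", "00000", "10001", "56000",
        "000000"]  # extra six-zero row, added for illustration

removed = [r for r in rows if bad.search(r)]
kept = [r for r in rows if not bad.search(r)]

print("removed:", removed)
print("kept:   ", kept)
```

Note that 00000 (five zeros) survives, because 0{6} needs six consecutive zeros.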

Wherein we learn that Robert is way more awake at this hour than I am.

In the first code, yes, I want to remove the rows with six zeros ('000000'); I forgot to include one in the data.
The results contained both six-digit and five-digit values. I don't need the six-digit ones, so I want to remove them.

My goal is to keep only the rows that are five digits, where the first digit is a number.

I once used this pattern: rx1.SearchPattern = "^0\d*$".
It works as long as the first digit is a zero, but when the first digit is not zero the rows are removed.
How can I avoid this using regex?

thanks
arief

There are some problems with the precision of your English here.

You might mean:

I want to only keep rows that are 5 characters long and the first character also has to be a digit.

In precise English, all digits are numbers. Not all characters are numbers. However, all digits are characters. That is to say: A 1 4 B are all characters. 1 5 3 are all digits.

So

0A123
01234

are OK because they are 5 characters long and they start with a digit (i.e. a number)

And then

A0000
B1234

are not OK because, while they are 5 characters long, they do not start with a digit.

and then

000000
1234567
0123

are not OK because they are not 5 characters long even though they start with a digit.

Your construction

"^0\d*$" means that it has to start with a zero and all subsequent characters have to be digits, and there can be 0 or 1 or 3 or 50 of such extra digit characters.

0
01234
0123456545345345345
000000

would all match.

0A0043
12345

would not match
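
The same lists can be checked mechanically. This is a Python sketch, not Xojo, and it assumes "^0\d*$" behaves the same in both engines for these inputs:

```python
import re

starts_with_zero = re.compile(r"^0\d*$")  # a zero, then zero or more digits

samples = ["0", "01234", "0123456545345345345", "000000", "0A0043", "12345"]

matched = [s for s in samples if starts_with_zero.search(s)]
unmatched = [s for s in samples if not starts_with_zero.search(s)]

print("match:   ", matched)
print("no match:", unmatched)
```

This is why rows such as 12345 were being removed: the pattern insists on a leading zero.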


"^\d\d*$" would mean that it has to start with a digit (any digit: 0 1 2 3 4 5 6 7 8 9) and then it is followed by any number of digits

0
5
12345723054993452345
000000
8234321345
would all qualify

0A2345
would not qualify because it contains a non-digit.


Perhaps you mean

^\d{5}$ would mean that it consists only of digits and there have to be 5 of them

12345
00000
34000

would qualify.

0A000
000000
123456

would not qualify
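
To compare the length-unlimited pattern with the five-digit one side by side, here is a Python sketch (not Xojo; it assumes both engines agree on these patterns, and the sample list simply combines values used above):

```python
import re

any_length = re.compile(r"^\d\d*$")   # one digit, then any number of digits
exactly_five = re.compile(r"^\d{5}$") # exactly five digits, nothing else

samples = ["12345", "00000", "34000", "0A000", "000000",
           "123456", "0", "8234321345"]

ok_any_length = [s for s in samples if any_length.search(s)]
ok_exactly_five = [s for s in samples if exactly_five.search(s)]

print("any length:  ", ok_any_length)
print("exactly five:", ok_exactly_five)
```

The {5} quantifier together with the ^ and $ anchors is what rules out both the six-digit and the one-digit rows.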


So I still think that it is not EXACTLY clear what is and is not acceptable in your mind. You might expand your list of things that are acceptable and your list of things that are not acceptable so you can be better understood.

Based on this, try:

var rx as new RegEx
rx.SearchPattern = "^\d[\dA-Z]{4}$" // 5 chars that start with a digit

for row as integer = Listbox1.LastRowIndex downto 0
  if rx.Search( Listbox1.Cell( row, 0 ) ) is nil then
    Listbox1.RemoveRow row
  end if
next
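
For anyone who wants to try the logic outside Xojo, the loop above can be approximated in Python. This is only a sketch: a plain list stands in for the listbox, the sample rows are made up from values used earlier in the thread, and it assumes the pattern behaves the same in both engines for uppercase input.

```python
import re

keep = re.compile(r"^\d[\dA-Z]{4}$")  # 5 chars that start with a digit

# A plain list standing in for the listbox rows (illustrative values).
rows = ["00050", "AB001", "00024", "CD00", "000000", "0A123", "0123", "98254"]

# Walk backwards, like "downto 0", so deleting a row never shifts
# the index of a row that has not been visited yet.
for i in range(len(rows) - 1, -1, -1):
    if keep.search(rows[i]) is None:
        del rows[i]

print(rows)
```

Iterating from the last index down is the important part; removing while counting upwards would skip the row that slides into the freed slot.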

It works now.

Thanks for all the help.

regards,
arief

Forum for Xojo Programming Language and IDE. Copyright © 2021 Xojo, Inc.