RegEx has me beat. I’ve read docs, read tutorials, and can’t get a damned thing to work!
I’m trying to use it to clean up text, in particular to remove non-alphanumeric characters from a string, but all it does is remove a single space.
[code]Dim reg as new RegEx
reg.searchPattern = "[^a-zA-Z0-9]"
reg.replacementPattern = ""
Dim source as string = "wiou fhbv6H98E61-5L!@&L$!@O ^?W""><{O)_R)&&L!_""D"
Dim result as string = reg.replace( source )
msgBox source + endOfLine + result[/code]
[code]Dim reg as new RegEx
reg.searchPattern = "[^a-zA-Z0-9]"
reg.replacementPattern = ""
' Without this option, Replace changes only the first match (here, the first space)
reg.Options.ReplaceAllMatches = True
Dim source as string = "wiou fhbv6H98E61-5L!@&L$!@O ^?W""><{O)_R)&&L!_""D"
Dim result as string = reg.replace( source )
msgBox source + endOfLine + result[/code]
Huh! That simple? I think I just won the moron-of-the-week award!
Thanks, Syed.
Sam, a few things about that pattern.
- It will remove spaces. I assume that’s what you want?
- The RegEx class is case-INsensitive unless you change the option, so while a-zA-Z is not wrong, it is redundant.
- That pattern will remove accented characters like ü or é. If that’s not what you want, try this instead:
[code][^\pL\pN][/code]
That uses each character’s Unicode properties to decide whether it’s a letter or a number, and it will only work in newer versions of Xojo or with the latest MBS plugins.
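A minimal sketch of the Unicode-aware variant in context, assuming a Xojo version whose RegEx engine supports the `\pL` and `\pN` properties (older versions do not, as noted above). The sample string is made up for illustration.

[code]' Remove everything that is not a Unicode letter (\pL) or number (\pN),
' so accented letters such as é or ü survive.
Dim rx As New RegEx
rx.SearchPattern = "[^\pL\pN]"
rx.ReplacementPattern = ""
rx.Options.ReplaceAllMatches = True

Dim source As String = "Café über-app #2!"
Dim result As String = rx.Replace(source)
' result should now be "Caféüberapp2"[/code]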
Thanks, Kem.
I basically want to strip it down to just letters and numbers. My original plan was to remove all non-English characters as well, but now that you mention it, I may have to rethink that.
It’s for renaming files so that they’re safe to post online.
Perhaps ConvertEncoding to ASCII first? That will replace things like ù with just u. Then you should just be able to ReplaceAll( s, "?", "-" ) (or something).
Or run it through your RegEx after ConvertEncoding if you want to be sure to remove Windows-illegal characters.
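Putting those two suggestions together, a file-name cleaner might look roughly like this. The method name `MakeSafeFileName` is hypothetical, and how ConvertEncoding transliterates accented characters can vary, so treat step 1 as an assumption to verify on your own platform.

[code]Function MakeSafeFileName(name As String) As String
  ' Step 1: force the text down to ASCII. Per the advice above, characters
  ' with no ASCII equivalent are expected to come back as "?".
  Dim s As String = ConvertEncoding(name, Encodings.ASCII)

  ' Step 2: turn any "?" into a hyphen.
  s = ReplaceAll(s, "?", "-")

  ' Step 3: drop anything that is not a letter, digit, or hyphen,
  ' which also removes characters Windows forbids in file names.
  ' (RegEx is case-insensitive by default, so a-z also keeps A-Z.)
  Dim rx As New RegEx
  rx.SearchPattern = "[^a-z0-9-]"
  rx.ReplacementPattern = ""
  rx.Options.ReplaceAllMatches = True
  Return rx.Replace(s)
End Function[/code]

Note that step 3 also strips the dot before a file extension, so you would probably run this on the base name only and re-append the extension afterward.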
Thanks so much, Kem, for the extra advice… Pretty darn cool what can be done now!
“mj ? ? Mac aplicacin” = “mojMacAplicacion”
Which is what I originally wanted, but with the preserving accents option, it comes out “mj?MacAplicacin”!
Can I use a second RegEx to collapse consecutive repeats of a character? For instance, if the replacement character for the above string is “_”, it then reads “moj_____Mac_Aplicacion”, which to me looks wrong, so I’d like to replace the “_____” with just a single underscore.
I’ve got it… After much failure, it turns out that collapsing repeated characters is incredibly simple:
[code]searchPattern = "_+"
replacementPattern = "_"[/code]
Hooray!
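For reference, here is a sketch of that two-pass approach end to end, using a made-up input that reproduces the “moj_____Mac_Aplicacion” example from the thread:

[code]Dim rx As New RegEx
rx.Options.ReplaceAllMatches = True

' Pass 1: replace every non-alphanumeric character with an underscore.
rx.SearchPattern = "[^a-z0-9]"
rx.ReplacementPattern = "_"
Dim s As String = rx.Replace("moj --- Mac Aplicacion")
' s is now "moj_____Mac_Aplicacion"

' Pass 2: collapse each run of underscores into a single one.
rx.SearchPattern = "_+"
rx.ReplacementPattern = "_"
s = rx.Replace(s)
' s is now "moj_Mac_Aplicacion"[/code]

A single pass with the pattern "[^a-z0-9]+" and replacement "_" should give the same result, because each run of unwanted characters is then matched as one unit.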
Is there a way to “emulate” Unicode blocks,
like \p{InLatin-1_Supplement}?
You can squeeze any repeated character with:
[code]rx.SearchPattern = "(.)\g1+"
rx.ReplacementPattern = "$1"[/code]
If you only want to squeeze symbols (and you’ve already converted to ASCII), you can use a variation of your original pattern:
[code]rx.SearchPattern = "([^a-z0-9])\g1+"
rx.ReplacementPattern = "$1"[/code]
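To see the difference between the two squeeze patterns, here is a small sketch run against one made-up sample string:

[code]Dim rx As New RegEx
rx.Options.ReplaceAllMatches = True

' Squeeze any repeated character: (.) captures one character,
' \g1+ matches one or more further copies of it, and $1 puts one back.
rx.SearchPattern = "(.)\g1+"
rx.ReplacementPattern = "$1"
Dim squeezedAll As String = rx.Replace("aa__bbb--c")
' squeezedAll is now "a_b-c"

' Squeeze only non-alphanumeric characters, so letters are left alone.
rx.SearchPattern = "([^a-z0-9])\g1+"
rx.ReplacementPattern = "$1"
Dim squeezedSymbols As String = rx.Replace("aa__bbb--c")
' squeezedSymbols is now "aa_bbb-c"[/code]

One caveat: because the RegEx class is case-insensitive by default, and PCRE applies that setting to backreferences too, the first pattern would likely also treat "Aa" as a repeat and squeeze it to "A".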
[quote=103406:@Antonio Rinaldi]There is a way to “emulate” the Unicode Blocks?
Like \p{InLatin-1_Supplement}[/quote]
Since the blocks are contiguous, you can look for the range of code points (listed in Wikipedia’s “Unicode block” article, among other places).
In the example you mentioned, that’s the range U+0080 through U+00FF. The pattern “\x{NNNN}” lets you specify a code point in hex, and you can put the range within a character class, so:
[code]rx.SearchPattern = "[\x{80}-\x{FF}]"[/code]
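A short sketch of that range in use, removing Latin-1 Supplement characters from a made-up string. It assumes the string is UTF-8 (the default for Xojo string literals), so the engine sees é and © as single code points inside the range.

[code]' Remove every character in the Latin-1 Supplement block
' (U+0080 through U+00FF), such as é, ü, or ©.
Dim rx As New RegEx
rx.SearchPattern = "[\x{80}-\x{FF}]"
rx.ReplacementPattern = ""
rx.Options.ReplaceAllMatches = True

Dim result As String = rx.Replace("Café © 2014")
' result should now be "Caf  2014" (both spaces remain)[/code]

The same approach should work for any other block: look up its first and last code points and substitute them into the character class.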
Odd how the link came through, and the forum won’t let me edit the post now. Here it is:
http://en.wikipedia.org/wiki/Unicode_block
Get in line, Sam. No queue jumping allowed.