Filter out non-alphanumeric characters from text

I've been using the following code to filter out illegal characters from a textfield for a valid Windows path. However, I now need only alphanumeric characters (0-9, A-Z, a-z) to pass. Is there an easy way to add all non-alphanumeric characters to this array, as opposed to typing each one in manually? Adding the few illegal characters below was easy for a Windows path, but I can't even imagine putting every non-alphanumeric character in one by one!

dim INVALIDCHAR() as String

INVALIDCHAR=Array(",", "?", "/", "\", ":", ">", "<", "*", "|", """")

for each NAME as string in INVALIDCHAR
  if TEXTFIELD1.Text.InStr(NAME) <> 0 then
    MsgBox "Entry can only contain alphanumeric characters."
    return
  end if
next

Invert your test, thus having a list of valid characters and test for those.
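
A minimal sketch of that inverted test, assuming the same TEXTFIELD1 as in the question. The variable names `allowed` and `entry` are made up for illustration. Each character of the entry is looked up in a string of permitted characters, and the first one that is not found triggers the message.

```xojo
// Every character that is allowed to appear in the entry
dim allowed as String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
dim entry as String = TEXTFIELD1.Text

// Mid and InStr are 1-based in the classic String API
for i as Integer = 1 to entry.Len
  // InStr returns 0 when the character is not in the allowed list
  if allowed.InStr(entry.Mid(i, 1)) = 0 then
    MsgBox "Entry can only contain alphanumeric characters."
    return
  end if
next
```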

I tried that. Couldn’t get it to work.

RegEx is your friend here. If you want to include the underscore character as well, the regex library used by Xojo supports \w to mean that set and \W to mean NOT in that set. So just test your filename against \W and, if there is a match, issue your MsgBox. If you don't want to include the underscore, then test against [^A-Za-z0-9] or whatever group of characters you prefer.

dim rx as new RegEx
rx.SearchPattern = "(?Ui-ms)[^a-zA-Z0-9]"
rx.ReplacementPattern = "_"

dim rxOptions as RegExOptions = rx.Options
rxOptions.LineEndType = 4
rxOptions.ReplaceAllMatches = true

dim replacedText as string = rx.Replace( sourceText )

Created using RegExRX by MacTechnologies Consulting.

He wants to know, not make a replacement. That said, RegEx is the way.

He could compare the original and the filtered String. :smiley:
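
One way that comparison might look, assuming `sourceText` holds the entry as in the replacement example. This sketch strips the unwanted characters with an empty replacement and compares lengths rather than the strings themselves, which sidesteps the case where a replaced character happens to equal its replacement (an underscore replaced by "_"). The name `filteredText` is made up for illustration.

```xojo
dim rx as new RegEx
rx.SearchPattern = "[^a-zA-Z0-9]"
rx.ReplacementPattern = ""          // remove matches instead of substituting
rx.Options.ReplaceAllMatches = true

dim filteredText as String = rx.Replace( sourceText )

if filteredText.Len <> sourceText.Len then
  // At least one non-alphanumeric character was removed
end if
```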

I’m not seeing how to use Regex for what I’m wanting.

I’ve seen character ranges used for KeyDown to do this. Is there a way to just add character ranges in the array?

var rx as new RegEx
rx.SearchPattern = "[^A-Z\d]"

if rx.Search( sourceText ) isa RegExMatch then
  // Contains bad characters
end if

Note that Xojo’s RegEx is case-insensitive by default.

Better just use .Search() instead of .Replace()

Edit: I was holding off on my response because Kem was writing; then he went quiet and I published. Then his answer appeared. :laughing:

Don’t I need to add 0-9 in there somewhere?

That’s what \d means.

If you wanted to allow an underscore too, you could replace the entire pattern with \W, which means, “not a word character (a-z, 0-9, _)”.
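
As a rough sketch, the \W variant dropped into the original validation might look like this. It assumes the TEXTFIELD1 from the question, and the message wording is only a suggestion.

```xojo
var rx as new RegEx
rx.SearchPattern = "\W"   // anything that is not a letter, digit, or underscore

// Search returns nil when nothing matches the pattern
if rx.Search( TEXTFIELD1.Text ) <> nil then
  MsgBox "Entry can only contain letters, digits, and underscores."
  return
end if
```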

Gotcha! I’ll try it out!

I guess a-z is missing from the requirements.

Unless you manually change options to case-sensitive, it’s unnecessary. But it doesn’t hurt or change anything to add it if you wish.
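
For reference, a sketch of what changing that option looks like, using the CaseSensitive property of RegExOptions and assuming `sourceText` holds the entry as in the earlier examples. With case sensitivity switched on, the a-z range is no longer optional: the shorter pattern [^A-Z\d] would then flag lowercase letters as bad characters.

```xojo
var rx as new RegEx
rx.Options.CaseSensitive = true      // default is false
rx.SearchPattern = "[^A-Za-z0-9]"    // a-z must now be listed explicitly

if rx.Search( sourceText ) isa RegExMatch then
  // Contains bad characters
end if
```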

Oh, such behavior breaks the default RegEx expectations.

It needs a clear WARNING in the Xojo RegEx manual page right in the description.

Turns out I can use underscores in my string, so \W works great!

Thanks Kem!
