Remove emojis with regex

My app has a fun bug when a user tries to write PDFs to a folder on Dropbox or BoxCryptor. As a workaround I’m going to remove emojis. Google tells me that

/[\u{1F600}-\u{1F6FF}]/

is the correct pattern to remove emojis. The Xojo regex complains that the syntax isn’t supported and MBS does nothing.

dim theRegex as new RegEx
theRegex.options.ReplaceAllMatches = true
theRegex.Options.Greedy = False
theRegex.Options.DotMatchAll = True
theRegex.SearchPattern = "/[\u{1F600}-\u{1F6FF}]/"
theRegex.ReplacementPattern = ""
Filename = theRegex.Replace(Filename)
Return Filename

and

dim theRegex as new RegExMBS
theRegex.CompileOptionDotAll = True
theRegex.CompileOptionUngreedy = True
if theRegex.Compile("/[\u{1F600}-\u{1F6FF}]/") and theRegex.Study then
  Filename = theRegex.ReplaceAll(Filename, "")
end if
Return Filename

What am I doing wrong, oh regex masters?

Have you tried the \x{XXXX} syntax (i.e. ‘x’ instead of ‘u’)?

Regards

You have emojis in filenames?

@Charlie Robin: doesn’t change the filenames.

@Markus Rauch: yes, unfortunately. I’m archiving emails and “modern” marketers like to send emails with emojis in the subject.

Beatrix, can you post a sample filename?

You may need a pattern like: 1F6[0-9A-F][0-9A-F]

Edit: well it looks like this is in hex and the UTF-8 code for 1F600 is F09F9880
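If you want to check those bytes yourself, a quick sketch like the one below should do it. It assumes the classic framework functions Encodings.UTF8.Chr, EncodeHex and LenB are available in your Xojo version; the variable name is only for illustration.

```xojo
// Build a one-character string holding U+1F600 (grinning face), encoded as UTF-8.
Dim emoji As String = Encodings.UTF8.Chr(&h1F600)

// EncodeHex returns the raw bytes of the string as hexadecimal digits.
System.DebugLog(EncodeHex(emoji))

// LenB reports the byte count; a code point above U+FFFF takes four bytes in UTF-8.
System.DebugLog(Str(LenB(emoji)))
```

The hex output should correspond to the F09F9880 value mentioned above. It also shows why a pattern made of the literal text 1F6… cannot match: the string contains the encoded character, not its hex digits.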

Something odd is going on: one RegEx test app reports errors for that pattern, while another (RegExRX) handles it fine. (But it doesn’t work in Xojo.)

In the absence of a RegEx solution that (reliably) works, perhaps something like this … (The code below had some emojis in the test Text, but that seemed to break my post when I submitted it to the forum?)


var Filename as Text = "test  something else are we done yet?"

var NewFilename as Text = ""

for each codePoint as UInt32 in Filename.Codepoints()
  
  if codePoint >= &h1F600 and codePoint <= &h1F64F then
           // Emoticon
  elseif codePoint >= &h1F300 and codePoint <= &h1F5FF then
           // Misc Symbol / Pictograph
  elseif codePoint >= &h1F680 and codePoint <= &h1F6FF then
           // Transport / Map
  elseif codePoint >= &h2600 and codePoint <= &h26FF then
           // Misc symbol
  elseif codePoint >= &h2700 and codePoint <= &h27BF then
           // Dingbat
  elseif codePoint >= &hFE00 and codePoint <= &hFE0F then
           // Variation Selector
  elseif codePoint >= &h1F900 and codePoint <= &h1F9FF then
           // Supplemental Symbol / Pictograph
  elseif codePoint >= &h1F1E6 and codePoint <= &h1F1FF then
           // Flag
  else
           NewFilename = NewFilename + Text.FromUnicodeCodepoint( codePoint )
  end if
  
next

System.DebugLog( NewFilename )
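If your filename is a String rather than a Text (as in the RegEx attempts above), the same loop could be wrapped in a helper along these lines. This is only a sketch: RemoveEmojis is a made-up name, the ranges are the same eight as in the loop above, and it assumes the incoming String has a valid encoding so that ToText does not raise an exception.

```xojo
Function RemoveEmojis(source As String) As String
  // Hypothetical helper: returns source without the code points in the ranges below.
  // Assumption: source has a defined, valid text encoding (ToText needs one).
  Dim cleaned As Text = ""
  
  For Each codePoint As UInt32 In source.ToText.Codepoints
    Select Case codePoint
    Case &h1F600 To &h1F64F
      // Emoticon: skip
    Case &h1F300 To &h1F5FF
      // Misc symbol / pictograph: skip
    Case &h1F680 To &h1F6FF
      // Transport / map: skip
    Case &h2600 To &h26FF
      // Misc symbol: skip
    Case &h2700 To &h27BF
      // Dingbat: skip
    Case &hFE00 To &hFE0F
      // Variation selector: skip
    Case &h1F900 To &h1F9FF
      // Supplemental symbol / pictograph: skip
    Case &h1F1E6 To &h1F1FF
      // Regional indicator (flag) letter: skip
    Case Else
      // Everything else is kept.
      cleaned = cleaned + Text.FromUnicodeCodepoint(codePoint)
    End Select
  Next
  
  // Text converts implicitly back to String on return.
  Return cleaned
End Function
```

You would then call it as Filename = RemoveEmojis(Filename). Note that the list of ranges is not a complete emoji inventory, so newer emoji blocks may still slip through.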

Thanks, Charlie, that works great!

This should work too:

Dim theRegex As New RegEx
theRegex.options.ReplaceAllMatches = True
theRegex.Options.Greedy = False
theRegex.Options.DotMatchAll = True
theRegex.SearchPattern = "(\%[0-9A-F]{2})"
theRegex.ReplacementPattern = ""
Filename = theRegex.Replace(EncodeURLComponent(Filename))
Return Filename

Edit: this will also remove other characters like &, +, and %. I don’t know if you want that too.