TextEncodings to remove accented characters

Hi there,

In Xojo desktop I’m using a simple function to remove accented characters to allow easy searches.
Example:
I want the user to be able to type “andre” in the search field and find “André”. The following function strips accents and removes any characters that have no ASCII equivalent.

Function stripeText(input As String) As String
Dim output As String =  ConvertEncoding(Lowercase(input), Encodings.ASCII) 
output = output.ReplaceAll("?","")
Return output
End Function

This is the result:

Dim input As String = ""
Dim output As String = stripeText(input)
' output = "eecauoiuoaiuoia"
' all accents removed 

I can’t get the same result with the iOS framework.
I tried:

  Dim memA As MemoryBlock = TextEncoding.ASCII.ConvertTextToData("", True)
  Dim output As text = TextEncoding.UTF8.ConvertDataToText(memA, True)
  ' Output is "???????????????"

And also

  Dim memU As MemoryBlock = TextEncoding.UTF8.ConvertTextToData("", True)
  Dim output As text = TextEncoding.ASCII.ConvertDataToText(memU, True)
  ' Output is "éèçàùòìûôâî????"

How do you translate the stripeText function to iOS?
What’s the iOS equivalent of Dim output As String = ConvertEncoding(Lowercase(input), Encodings.ASCII)?

Any help is welcome.

Thanks !

This is not a direct answer to your question, but still – I would do it the other way around and use the capabilities of iOS (and OS X on the desktop):
NSString in Foundation has a method called “compare:options:range:locale:”. The options include NSDiacriticInsensitiveSearch, which ignores diacritics when comparing (“A” is treated the same as “Å” and “Ä”). There are of course additional options like NSCaseInsensitiveSearch.
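
For stripping rather than comparing, the same two options can also be passed to NSString’s “stringByFoldingWithOptions:locale:”. Here is a minimal sketch of that, not tested on a device. It assumes an iOS project, that Text converts implicitly to and from CFStringRef in a declare, and the variable names are only illustrative:

[code]' Sketch only: fold case and diacritics with NSString.
Declare Function stringByFoldingWithOptions Lib "Foundation" _
  Selector "stringByFoldingWithOptions:locale:" _
  (NSString As CFStringRef, options As UInteger, locale As Ptr) As CFStringRef

' Values of the NSStringCompareOptions flags in Foundation.
Const NSCaseInsensitiveSearch = 1
Const NSDiacriticInsensitiveSearch = 128

Dim source As Text = "André"

' Nil locale means the system (non-localized) folding rules are used.
Dim folded As CFStringRef = stringByFoldingWithOptions(source, _
  NSCaseInsensitiveSearch + NSDiacriticInsensitiveSearch, Nil)

' Assumed implicit conversion from CFStringRef back to Text.
Dim result As Text = folded
' result is expected to be "andre"[/code]

Folding both the search term and the stored values this way would let a plain text comparison match them, without maintaining a list of accented letters by hand.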

URL encoding represents the accents as % notations. So if you strip those, you are left with plain lower ASCII. I have added the dieresis (tréma, umlaut), which is a common accent in French.

dim t as text = ""
dim f as folderitem = SpecialFolder.Temporary.child(t)
t = f.URLPath
t = t.right(t.Length-(t.IndexOf("/tmp/")+5))
t = t.ReplaceAll("%CC","")
t = t.ReplaceAll("%A7","")
for i as integer = 80 to 88
  t = t.ReplaceAll("%"+i.ToText,"")
next
system.DebugLog t
//result : eecauoiuoaiUOIAaeiou

Thanks.
@Michel Bujardet, your option is fine but requires creating a file, even a temporary one, for every search.
I’ll try to find another way, based on the URL encoding trick.
@Eli Ott, I’m not familiar with declares, but I may give it a try.

It’s a pity that the iOS framework is still missing a lot of stuff and that it’s not possible
Even the documentation is not correct as it gives a solution in the Text data type doc:

You can also convert a String with a known encoding to a Text using the String.ToText method:
Dim s As String = "Hello"
Dim t As Text = s.ToText ' t = "Hello"
which gives a compile error “String is not available in iOS, use Text or MemoryBlock instead”

[quote=238442:@Olivier Colard]Thanks.
@Michel Bujardet, your option is fine but requires creating a file, even a temporary one, for every search.
I’ll try to find another way, based on the url encoding trick.[/quote]

Simply use FolderItem.delete after you are done.
Besides, it does not actually create the file; it is just a pointer. The file will only exist if you write to it.

[quote]It’s a pity that the iOS framework is still missing a lot of stuff and that it’s not possible
Even the documentation is not correct as it gives a solution in the Text data type doc:

You can also convert a String with a known encoding to a Text using the String.ToText method:
Dim s As String = "Hello"
Dim t As Text = s.ToText ' t = "Hello"
which gives a compile error “String is not available in iOS, use Text or MemoryBlock instead”[/quote]

The new framework can be used on iOS, where String does not exist, or in Desktop and Web, where String exists. It is not an error in the documentation; String is simply unsupported on iOS, and so String.ToText is not supported there either.

String is not available in iOS.

Since Text is available on all platforms, it also contains code samples for desktop, console and web applications.

Thanks to both of you.
I’ll run some performance tests using Michel’s trick and compare it with a simple loop replacing all accented vowels and “”; there aren’t that many of them.

FYI, I just ran the comparison, converting “” 100 times:

with Michel’s trick:
25157 microseconds

with a simple loop:
12700 microseconds

The difference is only about a hundredth of a second, but I’ll go for the less sexy simple loop:

  Dim i(), o(), output As text
  
  input = input.Lowercase
  i = input.Split
  
  for x As Integer = 0 to i.Ubound
    Select Case i(x)
    Case "","","",""
      o.Append "a"
    Case "","",""
      o.Append "e"
    Case "","",""
      o.Append "i"
    Case "","","",""
      o.Append "o"
    Case "","",""
      o.Append "u"
    Case ""
      o.Append "c"
    Else
      o.Append i(x)
    End Select
  next
  
  
  output = Text.Join(o,"")
  Return output

900 Microseconds:

[code]Declare Function dataUsingEncoding Lib "Foundation" Selector "dataUsingEncoding:allowLossyConversion:" _
(NSString As CFStringRef, NSStringEncoding As UInteger, allowLossyConversion As Boolean) As Ptr
Declare Function NSClassFromString Lib "Foundation" (className As CFStringRef) As Ptr
Declare Function alloc Lib "Foundation" Selector "alloc" (NSClass As Ptr) As Ptr
Declare Function initWithData Lib "Foundation" Selector "initWithData:encoding:" _
(NSClass As Ptr, NSData As Ptr, NSStringEncoding As UInteger) As CFStringRef

Const NSASCIIStringEncoding = 1

Dim t As Text = ""

Dim data As Ptr = dataUsingEncoding(t, NSASCIIStringEncoding, True)
Dim result As CFStringRef = initWithData(alloc(NSClassFromString("NSString")), data, NSASCIIStringEncoding)[/code]
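
Wrapped up as a function, the declares above might translate the original stripeText roughly as follows. This is a sketch and has not been tested on a device. The name StripAccents is made up here. It also assumes that the lossy ASCII conversion leaves “?” for characters it cannot map, and that the CFStringRef result converts implicitly to Text:

[code]Function StripAccents(input As Text) As Text
  ' Hypothetical wrapper (name assumed); same declares as in the post above.
  Declare Function dataUsingEncoding Lib "Foundation" Selector "dataUsingEncoding:allowLossyConversion:" _
    (NSString As CFStringRef, NSStringEncoding As UInteger, allowLossyConversion As Boolean) As Ptr
  Declare Function NSClassFromString Lib "Foundation" (className As CFStringRef) As Ptr
  Declare Function alloc Lib "Foundation" Selector "alloc" (NSClass As Ptr) As Ptr
  Declare Function initWithData Lib "Foundation" Selector "initWithData:encoding:" _
    (NSClass As Ptr, NSData As Ptr, NSStringEncoding As UInteger) As CFStringRef

  Const NSASCIIStringEncoding = 1

  ' Lowercase first, as the desktop version does, then convert lossily to ASCII.
  Dim data As Ptr = dataUsingEncoding(input.Lowercase, NSASCIIStringEncoding, True)
  Dim ascii As CFStringRef = initWithData(alloc(NSClassFromString("NSString")), data, NSASCIIStringEncoding)

  ' Assumed implicit conversion from CFStringRef to Text.
  Dim output As Text = ascii

  ' Drop the placeholders left for unmappable characters.
  ' Like the desktop version, this also removes genuine question marks.
  Return output.ReplaceAll("?", "")
End Function[/code]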

Hi Eli,

That’s much faster and much more sexy ! On my iMac it runs in 490µs
I definitely need to look at declares !

Thanks a lot and have a happy new year, Gesundheit !

I have added that to XojoiOSWrapper https://github.com/Mitchboo/XojoiOSWrapper

Thank you Eli !