Array.Sort can't handle Portuguese words

Suppose I have four Portuguese words:

anúncio
ânus
anverso
zoologia

If I use Array.Sort to handle them, the result is:

anúncio
anverso
zoologia
ânus

How to fix this bug? Thanks for reply!

I actually just filed a bug report today about sorting being wrong.

I thought it was limited to the Navigator in the IDE, but this makes it look like all string sorting could be wrong?

File a new case specific to the Sort function though just to be sure.

Well, it depends on whether you sort with a localized collation or a universal one.
The universal sort puts accented characters after all unaccented ones.

The MBS Plugin has the NSStringArraySortMBS function to sort with localization on Mac.
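
To make the universal ordering concrete: a plain code point sort compares characters by their Unicode values, and â (U+00E2) is larger than z (U+007A). The following is a minimal sketch in Python rather than Xojo, purely as an illustration. It assumes the strings are in composed (NFC) form and says nothing about how Xojo implements Sort internally.

```python
import unicodedata

# The four words from the original question, forced into composed (NFC) form
# so that each accented letter is a single code point.
words = [unicodedata.normalize("NFC", w)
         for w in ["anúncio", "ânus", "anverso", "zoologia"]]

# Plain code point comparison: "ú" (U+00FA) sorts after "v" (U+0076),
# and "â" (U+00E2) sorts after "z" (U+007A).
by_code_point = sorted(words)

print(by_code_point)  # ['anverso', 'anúncio', 'zoologia', 'ânus']
```

A localized sort avoids this by comparing with language-aware collation rules instead of raw character values.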


  Function UnaccentedChar(c As String) As String
    // Pass one char, returns the unaccented equivalent
    // This table handles many European languages, except some Nordic ones due to different collation
    // Adjust as necessary
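    // Each accented character must sit at the same position as its replacement in the unaccented tables below
    Static AccentedLower As String   = "àáâãäçèéêëìíîïñòóôõöšùúûüýÿž"
    Static AccentedUpper As String   = "ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖŠÙÚÛÜÝŸŽ"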
    Static UnAccentedLower As String = "aaaaaceeeeiiiinooooosuuuuyyz"
    Static UnAccentedUpper As String = "AAAAACEEEEIIIINOOOOOSUUUUYYZ"
    c = c.Left(1)
    If InStrB(AccentedLower, c)>0 Then Return UnAccentedLower.Mid(InStr(AccentedLower, c),1) // return the translated lower case grapheme found
    If InStrB(AccentedUpper, c)>0 Then Return UnAccentedUpper.Mid(InStr(AccentedUpper, c),1) // return the translated upper case grapheme found
    Return c
  End Function
  
  Function Unaccented(s As String) As String 
    Dim arrs() As String = s.Split("")
    Dim uarrs() As String
    For Each c As String In arrs
      uarrs.Append(UnaccentedChar(c))
    Next
    Return Join(uarrs, "")
  End Function
  
  Function UnaccentedCompare(s1 As String, s2 As String) As Integer
    s1 = Unaccented(s1)
    s2 = Unaccented(s2)
    If s1>s2 Then Return 1
    If s1<s2 Then Return -1
    Return 0
  End Function
  
  // Much slower than Christian's plugins, but universal

  Sub Window1.MyButton1.Action()
    Dim a() As String = Array("Mão", "Pé", "Mao", "mao", "pe", "âmbar")
    a.Sort(AddressOf UnaccentedCompare)
    MsgBox Join(a, ", ")
  End Sub

Compare your Unaccent function to converting the string encoding to ascii. Do you get the same result?

Did a test. It's close enough to use with Portuguese, because Portuguese doesn’t use “Ž”, which is where I saw it fail.

I also wonder what the result is if the array is Text instead of String.

Use it as you wish. Need Text? Handle it as Text. That’s why I gifted the source here. :smiley:
It’s ready to be adapted even to Danish; just adjust the tables.

Sorry, I wasn’t clear. My first post about converting the encoding was directed to your code, but the second was back to the original question.

I always appreciate someone who posts code to help a fellow coder. :slight_smile:

If you use Text instead of String, the array will be sorted correctly.

[quote=376618:@Hong Zhang]Suppose I have four Portuguese words:

anúncio
ânus
anverso
zoologia

If I use Array.Sort to handle them, the result is:

anúncio
anverso
zoologia
ânus

How to fix this bug? Thanks for reply![/quote]
I don’t get the same result as you. I tested this code:

Dim words() As String = Array("anúncio", "ânus", "anverso", "zoologia")
words.Sort
and I get:
anverso
anúncio
zoologia
ânus

If I change the code from String to Text, I get:
anúncio
anverso
ânus
zoologia

ETA: The problem is that with Text, accented characters have a higher value than unaccented ones. If I sort “alfabeto” and “alfabético”, I want alfabético to go first (e and é have the same ‘value’, so keep comparing until i < o). I will try to understand and test Rick’s code.

Yep, that happens:

[code] Dim ar() As String = Array("alfabeto", "alfabético")
ar.Sort(AddressOf UnaccentedCompare)
MsgBox Join(ar, ", ") // Right collation

Dim art() As Text = Array("alfabeto", "alfabético")
art.Sort
MsgBox Text.Join(art, ", ") // Wrong collation[/code]
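
The behaviour asked for here (treat e and é as the same letter, then keep comparing) can also be sketched without a hand-written table. This is a language-neutral illustration in Python, not Xojo, and not the code posted above: it assumes Unicode decomposition is acceptable for stripping accents, and the names `unaccented` and `collation_key` are made up for this example.

```python
import unicodedata

def unaccented(s: str) -> str:
    # Decompose (NFD) so "é" becomes "e" plus a combining acute accent,
    # then drop every combining mark.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def collation_key(s: str):
    # Primary key: unaccented, case-folded text.
    # Secondary key: the original string, so words that differ only by
    # accent still end up in a deterministic order.
    return (unaccented(s).casefold(), s)

pair = sorted(["alfabeto", "alfabético"], key=collation_key)
four = sorted(["anúncio", "ânus", "anverso", "zoologia"], key=collation_key)

print(pair)  # ['alfabético', 'alfabeto']
print(four)  # ['anúncio', 'ânus', 'anverso', 'zoologia']
```

The tuple key is a design choice: accents are ignored for the main ordering and only used as a tie-breaker, which is roughly what a dictionary ordering for Portuguese does. Letters that do not decompose (for example ø or ß) pass through unchanged, so a table like the one in the posted code may still be needed for other languages.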