Lexical sorting?

I thought that Xojo does lexical sorting. But a simple test

dim a(-1) as String = Array("customer Ä", "customer b", "test", "abracadabra", "something else")
a.Sort

shows that "customer Ä" is incorrectly sorted after "customer b". Ä is an a-umlaut and should be sorted before b. Bug or feature?

And how would I get lexical sorting, regardless of whether this is a bug or a feature?

No, it uses… whatever the opposite of lexical sorting is (drawing a blank, and too tired to look it up at the moment). You could roll your own sorter, or do something like this:

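  // Original strings; "customer Ä" contains a non-ASCII letter
  dim a(-1) as String = Array("customer Ä", "customer b", "test", "abracadabra", "something else")
  
  // Parallel array of sort keys, converted to ASCII
  dim sorter() as string
  for i as integer = 0 to a.Ubound
    sorter.Append a( i ).ConvertEncoding( Encodings.ASCII )
  next
  
  // Sort the ASCII keys and reorder a in step with them
  sorter.SortWith( a )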
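For comparison outside Xojo, the same idea (sort on a simplified key rather than on the original string) can be sketched in Python. This is an analogy, not what ConvertEncoding does internally: the hypothetical `fold_key` helper case-folds the string, splits accented letters into base letter plus combining marks (Unicode NFD), and then drops the marks and anything else that isn't ASCII.

```python
import unicodedata

def fold_key(s: str) -> str:
    """Hypothetical sort key: case-fold, decompose accented letters (NFD),
    then keep only ASCII characters. Combining marks such as U+0308 are
    non-ASCII, so "Ä" ends up as "a"; characters with no ASCII base are lost."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if ord(ch) < 128)

words = ["customer Ä", "customer b", "test", "abracadabra", "something else"]

# Sort the originals, but compare them by their folded keys.
print(sorted(words, key=fold_key))
# ['abracadabra', 'customer Ä', 'customer b', 'something else', 'test']
```

Case-folding before decomposing also turns "ß" into "ss", which otherwise has no ASCII base and would simply vanish from the key.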

Thanks, Kem. This looks like a good idea. I found one feature request from 2005 in Feedback, which is simply embarrassing.

It does sort lexically - the problem is that the collating sequence it uses is not what you expect.
You're expecting the collating sequence to treat Ä as the equivalent of A, but it doesn't.
It uses the code points as the collating sequence, which means that Ä (code point 196, U+00C4; encoded as the two bytes C3 84 in UTF-8) sorts well after B. That isn't quite "lexical" in languages where Ä should sort right next to A, or in cases like German where ß might be treated as "ss".

See http://www.utf8-chartable.de
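The same code-point ordering is easy to reproduce outside Xojo. Python is used here only because it is quick to run; its default string comparison also goes code point by code point, so it gives the order reported in the question.

```python
words = ["customer Ä", "customer b", "test", "abracadabra", "something else"]

# Default string comparison: code point by code point.
print(sorted(words))
# ['abracadabra', 'customer b', 'customer Ä', 'something else', 'test']

print(ord("Ä"), hex(ord("Ä")))   # 196 0xc4: the Unicode code point
print("Ä".encode("utf-8"))       # b'\xc3\x84': its two UTF-8 bytes
print(ord("b"))                  # 98, which is less than 196
```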

Kem's trick forces everything to sort as ASCII, so it treats Ä as A, etc.

I think there's a feature request (FR) in Feedback to be able to set the collating sequence for things like sorting arrays and list boxes.

It’s Mac-only of course, but you might take a look at this Carbon function, which I believe Charles posted code for a few years ago. I use it for lexical sorting in Cocoa builds.

Declare Function UCCompareTextDefault Lib "Carbon" (options as Integer, text1Ptr as CString, text1Length as Integer, text2Ptr as CString, text2Length as Integer, equivalent as Integer, ByRef order as Integer) as Integer

The text encoding must be UTF-16, and the length parameters count UTF-16 code units (UniChars), not bytes.

For the options parameter, use these:

Const kUCCollateComposeInsensitiveMask = 2
Const kUCCollateWidthInsensitiveMask = 4
Const kUCCollateCaseInsensitiveMask = 8
Const kUCCollateDiacritInsensitiveMask = 16
Const kUCCollatePunctuationSignificantMask = 32768
Const kUCCollateDigitsOverrideMask = 65536
Const kUCCollateDigitsAsNumberMask = 131072
Const Null = 0
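Two details of that call are easy to get wrong, so here is a small Python sketch (not Xojo, and not a call into Carbon) that only computes the values you would pass. The option masks are single bits, so they combine with bitwise OR. As I understand Apple's declaration, the lengths are counts of UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two. The `utf16_units` helper is hypothetical.

```python
# Mask values as listed above.
kUCCollateCaseInsensitiveMask = 8
kUCCollateDiacritInsensitiveMask = 16
kUCCollateDigitsAsNumberMask = 131072

# Single-bit masks are combined with bitwise OR.
options = (kUCCollateCaseInsensitiveMask
           | kUCCollateDiacritInsensitiveMask
           | kUCCollateDigitsAsNumberMask)

def utf16_units(s: str) -> int:
    """Hypothetical helper: length in UTF-16 code units (2 bytes each)."""
    return len(s.encode("utf-16-le")) // 2

print(options)                        # 131096
print(utf16_units("customer Ä"))      # 10: Ä is a single code unit
print(utf16_units("\U0001F600"))      # 2: a surrogate pair
```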

My trick is not ideal, since it will discard any characters that don't have an ASCII equivalent. However, changing the sort to this will probably fix it, though I haven't tested it:

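// First sort by the original strings, reordering sorter in step
a.SortWith( sorter )
// Then sort by the ASCII keys, reordering a in step
sorter.SortWith( a )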
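The two-pass idea only pays off if the second sort is stable: strings whose ASCII keys collide then keep the order from the first pass instead of landing in an arbitrary one. Python's `list.sort` is documented as stable; whether Xojo's `SortWith` is, the thread doesn't say, so check before relying on it. A minimal Python sketch, with a hypothetical `ascii_key` standing in for the ASCII conversion:

```python
import unicodedata

def ascii_key(s: str) -> str:
    # Hypothetical stand-in for the ASCII conversion: strip diacritics and
    # drop anything non-ASCII, so distinct strings can share a key.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if ord(ch) < 128)

def two_pass_sort(words):
    result = list(words)
    # Pass 1: order by the original strings (plain code-point order).
    result.sort()
    # Pass 2: order by the lossy key. list.sort is stable, so strings whose
    # keys tie keep their pass-1 order rather than their input order.
    result.sort(key=ascii_key)
    return result

words = ["côté", "cotf", "côte", "coté", "cote"]
print(two_pass_sort(words))          # ['cote', 'coté', 'côte', 'côté', 'cotf']
print(sorted(words, key=ascii_key))  # ties keep input order: ['côté', 'côte', 'coté', 'cote', 'cotf']
```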

I must disagree. In my native tongue, a-umlaut and o-umlaut are definitely sorted after b.

Right, it differs by language. In Scandinavian languages, for example, the vowels with diacritics sort at the end of the alphabet, but in German they sort either as just the underlying vowel or as the vowel followed by e. So here's a follow-up question: is there a way to make it sort in the order the underlying OS uses by default for the current language?
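To make the language difference concrete, here is a Python sketch in which the collating sequence is a parameter. Both key functions are hypothetical and deliberately simplified (real German and Swedish rules have many more details). A real application would take its tables from the OS or a collation library instead.

```python
# Hypothetical, deliberately simplified collation keys for illustration only.

def german_key(s: str) -> str:
    # German "vowel followed by e" variant: ä->ae, ö->oe, ü->ue; ß->ss.
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
    return s.lower().translate(table)

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(s: str) -> list:
    # Swedish: å, ä and ö are separate letters that come after z.
    key = []
    for ch in s.lower():
        if ch in SWEDISH_ALPHABET:
            key.append((1, SWEDISH_ALPHABET.index(ch)))
        else:
            key.append((0, ord(ch)))  # spaces, digits, etc. sort before letters
    return key

COLLATIONS = {"de": german_key, "sv": swedish_key}

def sort_with_collation(words, lang):
    # The collating sequence is chosen by the caller instead of being fixed.
    return sorted(words, key=COLLATIONS[lang])

words = ["customer Ä", "customer b", "customer z"]
print(sort_with_collation(words, "de"))  # ['customer Ä', 'customer b', 'customer z']
print(sort_with_collation(words, "sv"))  # ['customer b', 'customer z', 'customer Ä']
```

For these three strings the Swedish result happens to match plain code-point order, which is why the default sort looks right to a Swedish reader and wrong to a German one.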

Jonathan's code can sort using the OS X sort, which respects the OS language and collating sequence.

But in general - no.
Xojo sorts the way it sorts.

Hence the FR for some way to make the sorts take a collating sequence as a parameter or some other way to have lexical sorting respect the OS level collating sequence.