Possible Xojo.Core.Dictionary bug

Joe_Ranieri · August 31, 2016, 8:07pm

It’s insensitive because neither kCFCompareLocalized nor an explicit locale was passed.

[quote=284665:@Thomas Tempelmann]BTW, if it’s a red-black tree, as Kem suggests, it means that the dict behaves rather inefficiently when it comes to looking up complex text in a large “dict”, because then every node requires a comparison, which will be rather costly in time compared to a true “map” like dictionary that will only have to calculate one hash code and then pick a single match in most cases, though it’ll be a bit more wasteful in memory consumption (but memory’s cheap nowadays, especially with 64 bit support, isn’t it?)

Correct me if I’m wrong.[/quote]

No, you are correct.

Allowing customization via the CompareKeys event is actually the motivating factor for making it be a red-black tree. This will likely change in the future though, since it’s hard to do strict weak ordering with heterogeneous types and isn’t really worth the speed hit.
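To put a rough number on the lookup cost discussed above, here is a minimal sketch in portable C. It is not Xojo's implementation: a binary search over a sorted array stands in for a balanced tree, since both need on the order of log2(n) full key comparisons per lookup, whereas a hash map typically hashes once and compares once. The names `comparisons_for_lookup` and `demo_lookup_cost` are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Number of key comparisons performed by the most recent lookup. */
static size_t comparison_count = 0;

/* strcmp wrapper that counts how often it is called. */
static int counting_compare(const void *a, const void *b) {
    comparison_count++;
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Looks up `key` in a sorted array of n strings and returns how many
   comparisons were needed. Binary search is only a stand-in for a
   balanced tree lookup (assumption for this sketch). */
size_t comparisons_for_lookup(const char *const *sorted_keys, size_t n,
                              const char *key) {
    comparison_count = 0;
    bsearch(&key, sorted_keys, n, sizeof *sorted_keys, counting_compare);
    return comparison_count;
}

/* Builds n zero-padded keys ("key000000", "key000001", ...), which are
   already in sorted order, and looks up the last one. */
size_t demo_lookup_cost(size_t n) {
    assert(n > 0 && n <= 1000000);
    char (*storage)[16] = malloc(n * sizeof *storage);
    const char **keys = malloc(n * sizeof *keys);
    if (!storage || !keys) { free(storage); free(keys); return 0; }
    for (size_t i = 0; i < n; i++) {
        snprintf(storage[i], sizeof storage[i], "key%06zu", i);
        keys[i] = storage[i];
    }
    size_t cost = comparisons_for_lookup(keys, n, storage[n - 1]);
    free(keys);
    free(storage);
    return cost;
}
```

Calling `demo_lookup_cost(1024)` reports several comparisons for a single lookup, and each of them is a full string comparison. With a non-literal Unicode comparison instead of `strcmp`, every one of those steps becomes correspondingly more expensive.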

Thomas_Tempelmann · August 31, 2016, 8:11pm

Thanks for explaining, Joe. Yes, being able to provide a custom comparison function would have been nice, indeed. I am still a bit upset about the fact that this class is called “dictionary” when it’s more like a tree class (though, it’s still a map, or an associative array, just not a hash map as we’ve gotten used to).

Eli_Ott · August 31, 2016, 8:13pm

If one replaces CFSTR in Norman’s example with CFStringCreateWithCString it works correctly (with Unicode encoding):

[code]#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main() {
    CFStringRef str1 = CFStringCreateWithCString(kCFAllocatorDefault, "Kota", kCFStringEncodingUnicode);
    CFStringRef str2 = CFStringCreateWithCString(kCFAllocatorDefault, "Knig", kCFStringEncodingUnicode);

    CFStringCompareFlags flags = kCFCompareNonliteral | kCFCompareForcedOrdering;

    // Compare in both directions; a consistent ordering should give opposite signs.
    CFIndex cmp1 = CFStringCompareWithOptionsAndLocale(str1, str2, CFRangeMake(0, CFStringGetLength(str1)), flags, NULL);
    CFIndex cmp2 = CFStringCompareWithOptionsAndLocale(str2, str1, CFRangeMake(0, CFStringGetLength(str2)), flags, NULL);

    printf("%li, %li\n", cmp1, cmp2);
}[/code]

One question: why was CFSTR used in Norman’s example? It is a macro that takes a C string literal at compile time. That is not the case in question here, as the issue is with runtime string values.

EDIT: CFSTR is UTF-8; CFStringCreateWithCString with kCFStringEncodingUTF8 will also fail.

Joe_Ranieri · August 31, 2016, 8:23pm

kCFStringEncodingUnicode is not UTF-8.

Eli_Ott · August 31, 2016, 8:25pm

I didn’t say the opposite.

Eli_Ott · August 31, 2016, 8:40pm

So it works with kCFStringEncodingUTF16 but not with kCFStringEncodingUTF8.
And CFStringGetLength shows 6 when using kCFStringEncodingUTF16 and 5 when using kCFStringEncodingUTF8.

Will_Shank · September 1, 2016, 2:36am

I tried the code Norman posted (ported to Xojo) and it looks like the kCFCompareNonliteral flag is at issue. Remove that flag and the comparisons yield “Kota” < “Knig” and “Knig” > “Kota”.

kCFCompareNonliteral
https://developer.apple.com/reference/corefoundation/cfstringcompareflags/kcfcomparenonliteral?language=objc

[quote]Specifies that loose equivalence is acceptable, especially as pertains to diacritical marks.

For example, “ö” represented as two distinct characters (“o” and “umlaut”) is equivalent to “ö” represented by a single character (“o-umlaut”). Note that this is not the same as diacritic insensitivity.[/quote]

Note I’m just saying this might be of interest. I’m not sure how to interpret the intent of this flag, or whether it should give those results.

Also the bug is evident just using “o” and “”.
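As a rough illustration of what “non-literal” means in the quoted documentation, the sketch below compares code point sequences literally and after a toy canonical decomposition. It is an assumption-laden model, not Core Foundation's algorithm: the `decompose` table covers only U+00F1 and U+00F6, and `compare_literal` / `compare_nonliteral` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy canonical decomposition covering only two characters (assumption
   for this sketch; real NFD uses the full Unicode data tables). */
static size_t decompose(uint32_t cp, uint32_t *out) {
    switch (cp) {
    case 0x00F1: out[0] = 0x006E; out[1] = 0x0303; return 2; /* n + combining tilde */
    case 0x00F6: out[0] = 0x006F; out[1] = 0x0308; return 2; /* o + combining diaeresis */
    default:     out[0] = cp; return 1;
    }
}

/* Literal comparison: code point by code point, no normalization. */
int compare_literal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb) {
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    return (na > nb) - (na < nb);
}

/* Non-literal comparison: decompose both sides first, then compare. */
int compare_nonliteral(const uint32_t *a, size_t na, const uint32_t *b, size_t nb) {
    uint32_t da[64], db[64];
    size_t la = 0, lb = 0;
    assert(na <= 32 && nb <= 32); /* each code point expands to at most 2 */
    for (size_t i = 0; i < na; i++) la += decompose(a[i], da + la);
    for (size_t i = 0; i < nb; i++) lb += decompose(b[i], db + lb);
    return compare_literal(da, la, db, lb);
}
```

In this model the precomposed sequence {U+00F6} and the decomposed sequence {U+006F, U+0308} differ literally but compare equal non-literally. Swapping the two arguments always flips the sign of the result, which is the property a sorted container relies on.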

Eli_Ott · September 1, 2016, 4:46am

kCFCompareNonliteral means “normalize to NFD”. It is correct IMHO to use kCFCompareNonliteral as far as I understand it. This would be the result of the normalization (in utf8):

ñ = c3b1 > 6e303
ö = c3b6 > 6f308

So the question is why does it work with utf16 encoding but not with utf8 encoding.
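For readers who want to check the byte values above, here is a small portable C sketch of UTF-8 encoding. It assumes the right-hand sides of the mapping read as a base letter followed by a combining code point (6e + U+0303, 6f + U+0308); `utf8_encode` is a hypothetical helper written for this illustration and handles only code points up to U+FFFF, without validating surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point (U+0000..U+FFFF) as UTF-8 into `out`, which
   must have room for 3 bytes. Returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char *out) {
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Encoding U+00F1 gives the two bytes c3 b1 and U+00F6 gives c3 b6, matching the left-hand sides. The decomposed forms encode as 6e cc 83 and 6f cc 88, so the decomposed string is one byte longer in UTF-8 and one code unit longer in UTF-16 than the precomposed one.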

Eli_Ott · September 1, 2016, 5:13am

Yes, and I haven’t found any other combination of two characters between U+00E0 and U+00FF doing this. Only a string with a to z followed by à to ÿ compared to ö followed by nothing or anything leads to that issue.