String.Compare and control characters

Hi, I suppose it’s not a bug, but I think it’s strange.
Enter the code below:

Dim TpTextA, TpTextB As String

TpTextA = "Hello World"
TpTextB = "Hello " + Chr(31) + "World"
MessageBox CStr(TpTextA = TpTextB) + EndOfLine + _
  Str(TpTextA.Compare(TpTextB, ComparisonOptions.CaseSensitive)) + EndOfLine + _
  Str(TpTextA.Compare(TpTextB, ComparisonOptions.CaseInsensitive)) + EndOfLine + _
  Str(TpTextB.Compare(TpTextA, ComparisonOptions.CaseSensitive)) + EndOfLine + _
  Str(TpTextB.Compare(TpTextA, ComparisonOptions.CaseInsensitive))

I think the result should not be 0. Note that (TpTextA = TpTextB) is False.
The result is 0 for every control character except Chr(9), Chr(10), Chr(11), Chr(12) and Chr(13).
I made a small example; click ButtonA and ButtonB.
Test-CompareString

What do you want to achieve?

I would like to know whether it’s a bug or not, and whether I have to file a bug report.
If one string has a control character and the other string does not, then the comparison result should not be 0, should it?
For now I test both (MyStringA = MyStringB) and MyStringA.Compare(MyStringB, ComparisonOptions.CaseInsensitive) in order to obtain the result I want.
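A minimal sketch of that double check, for anyone who wants to wrap it in one place. The name StrictlyEqual is hypothetical (it is not part of Xojo), and the sketch relies only on the behaviour reported in this thread: Compare returns 0 for the Chr(31) case while the = operator returns False.

```xojo
// Hypothetical helper, not part of the Xojo framework.
// Two strings count as equal only if String.Compare AND the = operator agree.
Function StrictlyEqual(a As String, b As String) As Boolean
  If a.Compare(b, ComparisonOptions.CaseInsensitive) <> 0 Then
    Return False
  End If
  
  // In the test above, = reported False for the Chr(31) case even though
  // Compare returned 0, so it serves as the second check.
  Return a = b
End Function
```

Going by the results reported above, StrictlyEqual("Hello World", "Hello " + Chr(31) + "World") should come back False, whereas Compare alone reports the two strings as equal.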

The documentation is completely opaque as to how ranking is determined by string.compare, i.e. how does it decide if one string is “greater than” another. The example tells us that “dog” > “cat” but not why. In your example, maybe it gives equal weight to Chr(32) and Chr(31).


The Chr(31) does not replace the space Chr(32); it sits between the space and the W.
I think I will file a bug report just to clarify.

Because “d” (ascii 100) is greater than “c” (ascii 99) perhaps?

Or because most dogs are greater than cats… :grin:

I submitted a bug report: <https://xojo.com/issue/66547>

I just ran into this today. I get the same results whether I do this:

If FirstStr > SecondStr then…

or this:
Result=FirstStr.Compare(SecondStr)

I get the same results if I convert the strings to text. I added comments to Thomas’ bug report #66547.

Alphabetizing words correctly is a HUGE thing for almost all of my programming. This is a MAJOR BUG.

You haven’t told us what FirstStr and SecondStr contain.

Why is it that recently there has been a rash of content-free questions and reports?

OTOH, I’ve heard it said “Dogs have masters; cats have slaves.”

Although personally I am more of a dog lover…

Oops! Sorry for not including my test phrases. With ordinary words, it does work: comparing “Alice” and “Beth” gives correct results.

If the two words are “10001922” and “1982d”, then the function says that 10001922 is less than 1982d. It also says “dogs23” is greater than “C9999cats999”.

I’d love to know what criteria it’s using; I can’t begin to predict the results.

No, cats have staff.

Looks like it takes the first char of each and compares their ASCII values, then moves to the second char of each, until either it gets a less/greater result for a pair of chars, or runs out of chars in one string. BICBW.
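The rule guessed at above can be sketched like this. It only illustrates that guess; it is not taken from the Xojo documentation, and the control-character results earlier in the thread suggest the real String.Compare does something more involved. CompareByCodePoint is a hypothetical name.

```xojo
// Hypothetical helper: compares two strings one character at a time by
// code point, the way the post above guesses String.Compare behaves.
// Returns -1 if a sorts first, 1 if b sorts first, 0 if they are identical.
Function CompareByCodePoint(a As String, b As String) As Integer
  Var n As Integer = Min(a.Length, b.Length)
  
  For i As Integer = 0 To n - 1
    Var ca As Integer = a.Middle(i, 1).Asc
    Var cb As Integer = b.Middle(i, 1).Asc
    If ca < cb Then Return -1
    If ca > cb Then Return 1
  Next
  
  // All shared positions matched: the shorter string sorts first.
  If a.Length < b.Length Then Return -1
  If a.Length > b.Length Then Return 1
  Return 0
End Function
```

With this rule, “10001922” against “1982d” is decided at the second character (“0” is 48, “9” is 57), so 10001922 comes out as less, and “dogs23” against “C9999cats999” is decided at the first character (“d” is 100, “C” is 67). Both match the results reported above. Note that this sketch is case-sensitive, so it would also put “Zebra” before “apple”.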

Well, it is. This is a string comparison, not a numerical comparison.

Also true. “d” is greater than “C” in an alphabetical sort.

Unless you are on an IBM system using EBCDIC collating sequence. :slight_smile:


Comparing strings that represent numbers with prefixes and suffixes is always a pain. I’ve written my own compare function, but it too has trouble with phrases that contain a mix of letters and numbers. I was hoping that Xojo’s text compare would have a way to sort the way people do, not machines.
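For what it’s worth, a “sort the way people do” comparison is usually called a natural sort: runs of digits are compared as numbers, everything else character by character. Below is a rough sketch under some loud assumptions: all three function names are hypothetical, only ASCII digits count as numbers, letters are compared by lowercase code point rather than by any locale rule, and digit runs too long for an Integer are not handled.

```xojo
// Hypothetical helpers, not part of Xojo or any plugin.

// True if the first character of c is an ASCII digit (0-9).
Function IsAsciiDigit(c As String) As Boolean
  Var code As Integer = c.Asc
  Return code >= 48 And code <= 57
End Function

// Returns the run of digits in s that begins at position start (0-based).
Function DigitRun(s As String, start As Integer) As String
  Var pos As Integer = start
  While pos < s.Length And IsAsciiDigit(s.Middle(pos, 1))
    pos = pos + 1
  Wend
  Return s.Middle(start, pos - start)
End Function

// Returns -1, 0 or 1, comparing digit runs by numeric value.
Function NaturalCompare(a As String, b As String) As Integer
  Var i, j As Integer
  
  While i < a.Length And j < b.Length
    Var ca As String = a.Middle(i, 1)
    Var cb As String = b.Middle(j, 1)
    
    If IsAsciiDigit(ca) And IsAsciiDigit(cb) Then
      // Both sides are at a number: compare the whole digit runs.
      Var runA As String = DigitRun(a, i)
      Var runB As String = DigitRun(b, j)
      Var na As Integer = runA.ToInteger
      Var nb As Integer = runB.ToInteger
      If na < nb Then Return -1
      If na > nb Then Return 1
      i = i + runA.Length
      j = j + runB.Length
    Else
      // Otherwise compare single characters by lowercase code point.
      Var la As Integer = ca.Lowercase.Asc
      Var lb As Integer = cb.Lowercase.Asc
      If la < lb Then Return -1
      If la > lb Then Return 1
      i = i + 1
      j = j + 1
    End If
  Wend
  
  // One string ran out: the one with characters left over sorts later.
  Var restA As Integer = a.Length - i
  Var restB As Integer = b.Length - j
  If restA < restB Then Return -1
  If restA > restB Then Return 1
  Return 0
End Function
```

With this sketch “1982d” sorts before “10001922” (1982 is less than 10001922) and “file2” before “file10”. If the functions live in a module, a String array can be sorted with something like names.Sort(AddressOf NaturalCompare).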

I have the old Sort Library from Charles Yeomans. It’s easy to add another sort order:

// Part of the StringComparator interface.
const NSCaseInsensitiveSearch = 1
const NSDiacriticInsensitiveSearch = 128

Return NSStringCompareMBS(s1, s2, NSDiacriticInsensitiveSearch + NSCaseInsensitiveSearch)

I’m not sure what magic Xojo (or any language) could do for you here. It would seem like the rules for sorting/comparing such combined data would vary by its purpose.

However, you can create your own comparers/sorters with Xojo. So if you had a string value that actually consisted of multiple parts that were concatenated, such as “C9999cats999”, then you could instead create a class with properties for each part: C, 9999, cats, 999.

Put your values in class instances and then implement your own Operator_Compare() method on the class to allow you to compare all those properties in the way that makes sense for you.
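As a rough illustration of that approach, here is what the method could look like on a hypothetical class PartCode with just two of the parts, a text prefix and a number. The class name and property names are made up for this sketch; in practice the class and its properties are created in the IDE, and the remaining parts would be compared the same way.

```xojo
// Hypothetical class PartCode, created in the IDE with these properties:
//   Prefix As String
//   Number As Integer

// Method on PartCode. Xojo calls it for <, >, = and the other comparison
// operators when both operands are PartCode instances.
// Returns a negative value, 0 or a positive value.
Function Operator_Compare(other As PartCode) As Integer
  // Use Is Nil here; other = Nil would call Operator_Compare again.
  If other Is Nil Then Return 1
  
  // Text part first, compared as text.
  Var r As Integer = Prefix.Compare(other.Prefix, ComparisonOptions.CaseInsensitive)
  If r <> 0 Then Return r
  
  // Then the numeric part, compared as a number rather than as characters.
  If Number < other.Number Then Return -1
  If Number > other.Number Then Return 1
  Return 0
End Function
```

With that in place, an expression such as codeA < codeB (both PartCode instances) goes through this method, so a Number of 1982 sorts before 10001922 whatever the digits look like as text.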

Alternatively you could create your own comparison algorithm for Sort() that knows how to parse out the combined string.

Now I’m slightly confused. Neither https://documentation.xojo.com/api/data_types/string.html nor https://documentation.xojo.com/api/data_types/string.html#string-compare appear to mention https://documentation.xojo.com/api/math/operator_compare.html. Does this mean I can’t create an alternative to string.compare() that will do a comparison of two strings to suit myself?

Operator_Compare works on classes. String is not a class, so you cannot use them together.

Although as I type this I wonder if you could do an Operator_Compare as an extension method. That seems unlikely, but off to try…

Edit: Doesn’t seem to do anything in my test.