Please "Un-deprecate" `Text`

Thanks for responding @Geoff_Perlman.

I find myself in a bind here: I have a product I thought was close to release, but I now find it has deep flaws due to String’s mishandling of characters. The “easiest” solution is to change every reference in my project from String to Text, since that will solve the issues, but then I will have a product using a deprecated language feature, which feels wrong and risky.

Adding a bunch of extensions to String could work but I have no doubt that the performance will really suffer since they won’t be as fast as the native Text data type. One of this product’s selling features is its speed so I’m acutely sensitive to performance.

No offence intended but waiting on Xojo to implement a feature request doesn’t seem a viable option as there are no guarantees on turnaround time for implementation.

Why can’t Text and String exist beside each other? If the concern is that there is a risk of confusing customers, then I think that is already the status quo. Those who need to manipulate bytes, are constrained by performance, or know they will only be using common Unicode characters can use String. Those who need 100% coverage of the entire language spectrum can use Text.

I didn’t even realise that existed as it’s hidden away.

<https://xojo.com/issue/65406>

Hi Geoff

Would adding a Unicode mode not cause confusion / introduce weird bugs?

Imagine you have a function that accepts a string and wants to perform a character based Left on it. There is no guarantee that the input string has the correct mode which would cause the function to operate incorrectly. This probably means that you would have to change the mode and possibly restore it which could cause further bugs and potentially unnecessary cloning of data.

Would it not be better to add character-specific functions? It would be obvious what the code was doing, and it would avoid any weird state-related issues.

If you were to introduce a Unicode mode, could you please keep the default as code point so that everybody’s existing code doesn’t have to be updated?

As much as I hate the extreme verbosity of the method names, I think LeftCharacters, MiddleCharacters, RightCharacters, and IndexOfCharacters make the most sense.

As was mentioned, designed in a vacuum I’d say it should be different. The “unadorned” versions should work on characters, not code points. Then we’d have both Bytes and CodePoints variants. But that ship has sailed.

In theory, overloaded versions of Left, Right, Middle, and IndexOf could be introduced to support the enum that was mentioned. The optional parameters make that tricky, but it should still be doable. I’d still prefer the default to work in characters rather than code points, but again that ship has sailed. And if working in characters is super slow like Text is, then maybe that wouldn’t be such a great default.

An enum would be harder to use since auto-complete would be little help. I’d favor the “…Characters” variations.

Without Xojo’s help, a class could probably be created that works this way, but I’d call that class “Text”. :slight_smile:

True, but enums are used a lot in API 2, so it’d be better to fix autocomplete than to avoid things it doesn’t work with. I know better than most that’s easier said than done, but that doesn’t change anything.

You mean you don’t like to see code that reads like a novel? I was raised on C and assembly, so to me code efficiency is important (with readability set by proper coding techniques). But it seems there is a big push for “code efficiency be damned, we need more adjectives, adverbs, and pronouns” to make code Shakespeare would envy.

What has any of that to do with code efficiency?

The compiler drops method names and doesn’t care about them.

So descriptive names are welcome.
The enum approach makes it possible to propagate the setting to our own methods,
e.g. I could have my own CalculateAverageWordLength() take the same enum to say how to count.

I agree, and often use long descriptive variable names to help readability. I have no problem with long keywords, but complex syntax can be very tedious (e.g. DateTime.toString, which means a trip to the LR every single time I use it), especially if autocomplete isn’t working perfectly.

We need a smarter autocomplete with parameter tips as found in some other editors.

That looks awesome, and probably not all that difficult to do if your resources are not all busy renaming stuff.

Famous last words.

If I were you, I would use Text until the day comes when String has the features of Text you need. It’s highly unlikely that Text will go away prior to that.

I don’t think so. You’d have to set the string to Unicode mode, so it would be pretty clear that the various methods act upon it by character rather than by byte. In fact, characters are what most users think those functions on String operate on already. It’s just convenient that in most cases, a character is a byte. In Unicode mode, it would continue to operate as most expect it to even when the characters require multiple bytes.

The current functions already work on multiple bytes, as a code point can consist of more than one. They just don’t work on characters that are composed of multiple code points. Calling it Unicode mode could also be confusing, as the current functions do work on Unicode.
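To make the three levels concrete, here is a minimal sketch, not tested output. It assumes API 2's `String.Bytes` and `String.Length`, that `String.Length` counts code points as described above, and that `Text.Length` counts user-perceived characters. The sample string is just "e" followed by the combining acute accent U+0301, which displays as a single character.

```xojo
// Sketch only: "e" + U+0301 (combining acute accent) displays as one character.
Var s As String = "e" + &u0301

// Number of bytes in the string's encoding.
Var byteCount As Integer = s.Bytes

// Number of code points, assuming String.Length counts code points as discussed.
Var codePointCount As Integer = s.Length

// Assumed to count user-perceived characters, going through Text.
Var characterCount As Integer = s.ToText.Length

System.DebugLog(byteCount.ToString + " bytes, " + _
  codePointCount.ToString + " code points, " + _
  characterCount.ToString + " characters")
```

If those assumptions hold, the three numbers should differ for this string, which is exactly the gap being discussed.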

If you don’t think there could be confusion, could you describe what this function would do if passed a string that contains the characters “:relaxed::grinning:”?

Private Function DoSomething(pString As String) as String
  Return Left(pString, 2)
End Function

The answer is you can’t, as the result would depend entirely on the string’s mode.


If you needed a function to operate on a string a certain way you would have to start switching modes within the function. Something like this…

Private Function DoSomething(pString As String) as String
  If pString.Mode <> String.Mode.Characters Then
    pString.Mode = String.Mode.Characters
  End If

  Return Left(pString, 2)
End Function

Would changing the mode cause pString to be copied and potentially cause performance / memory related issues?


I see a worse problem if pString was passed by ref:

Private Function DoSomething(ByRef pString As String) as String
  If pString.Mode <> String.Mode.Characters Then
    pString.Mode = String.Mode.Characters
  End If

  Return Left(pString, 2)
End Function

The input string’s mode hasn’t been restored which potentially screws up any future code that operates on the string.


The cleaner way would be to have separate functions that deal with user-perceived characters, as there would be no mistaking what the function does and no reliance on any kind of mode.

Private Function DoSomething(pString As String) as String
  Return LeftCharacters(pString, 2)
End Function
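
Until something like that exists natively, a stop-gap could look roughly like the sketch below. `LeftCharacters` is a hypothetical helper, not a Xojo function. It assumes `String.ToText` and `Text.Left` behave as documented, that `Text` counts user-perceived characters, and that the input string has a known encoding. It would live in a module so it can be called as in the snippet above.

```xojo
// Hypothetical helper (not part of Xojo): returns the first `count`
// user-perceived characters by round-tripping through Text.
Public Function LeftCharacters(s As String, count As Integer) As String
  // ToText expects the string to have a known encoding.
  Var t As Text = s.ToText

  // Clamp so we never ask for more characters than exist.
  If count > t.Length Then count = t.Length

  // Text converts back to String implicitly on assignment.
  Var result As String = t.Left(count)
  Return result
End Function
```

The conversion to Text and back happens on every call, so this is likely slower than a native String method. Negative counts are not handled here.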

To be even more clear, Text is not part of the framework. It’s part of the language. As such, it’s going to be around for a very long time, likely far after we have added the needed features to String.

I use words like likely because I can’t predict the future with 100% accuracy. :slight_smile:

Thanks @Geoff_Perlman. That’s very reassuring. I took the executive decision yesterday to spend the last 24 hours porting String to Text. I’m happy to report it wasn’t too tricky and so far the performance is OK. The emoji bug is no more!

I’m just waiting for String.Asc to be replaced with String.AmericanStandardCodeForInformationInterchange :slight_smile:

There should be a new category for this LTS status.
