Are strings immutable?

The Xojo docs explicitly state that the Text data type is immutable. They don’t say one way or the other about its successor in API 2, String.

Coming from the .NET framework, I’m accustomed to strings being immutable (mainly for thread safety reasons) and to the performance tradeoffs involved. I have worked for many years on a code base that’s very heavy on text processing, and one learns that there’s considerable performance lift in avoiding string allocations like the plague in that environment.

For example, code like this in C#:

string string1 = "Foo";
string string2 = "Bar";
string1 = string2;

… represents another allocation in reassigning string1. It does not simply assign a reference to string2 to string1 or copy the existing characters into the existing instance if possible; it creates a whole new string, assigns a copy of the characters of string2 to it, and then discards the old reference to string1 for later garbage collection. This is basically syntax sugar that makes it look like no allocation is happening, yet it is. String concatenation is also a no-no, at least in principle. One uses a StringBuilder instance to construct a string from sequential bits and pieces, or one uses Spans over an array of characters and copies into that. Either way, you create a new string from the final result so that there is only one string allocation, and no extra or hidden ones.

Do these considerations apply in Xojo? I’m guessing that they do, that Memory is more or less a substitute for Span, and that one can write abstractions to construct complex strings with minimal allocations.

Yes, I understand that this is probably not a practical problem in most apps and would constitute premature optimization. I am just wanting to confirm the internal architecture so that I can keep it in mind when I have hot paths with lots of string manipulation.

Another factor is that .NET does optimize away concatenations of (I think currently) up to 4 string fragments, and perhaps Xojo does so as well. It basically does an under-the-hood assembly of a character array and does a fast in-memory copy into it and creates a new string with one allocation. Not having access to the source like I am used to (I would Google, e.g., “System.String source code” to see what it’s actually doing) I am hoping someone has written an article or blog post about this at some point.

I guess a related question (because of how much of an impact on performance object allocations are, and when) is whether Xojo uses garbage collection or reference counting internally. If anyone knows.

Thanks in advance for any info you might have!

Yes, they are immutable.

Reference counting is used.
If you need to concat a lot of strings, you can use an array and the join function.
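A minimal sketch of that array-and-join approach, assuming API 2 (String.FromArray; in API 1 the equivalent is the global Join function). The loop and the fragment contents are made up for illustration:

```xojo
// Collect the fragments in an array instead of concatenating in the loop.
Var parts() As String
For i As Integer = 1 To 1000
  parts.Add("item " + i.ToString)
Next

// Build the final string once from all fragments, separated by commas.
Var result As String = String.FromArray(parts, ",")
```

Each fragment is still its own string, but the growing intermediate results of repeated concatenation are avoided.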

Thanks Christian.

Do you happen to know if there is a string interning mechanism? For example are all string constants in a program preallocated in some sort of internal dictionary? Is this programmer-accessible?

For example if one has a large list or array of classes that contain many string properties, and some of those have low cardinality, you can often save a ton of memory (at the expense of slower performance in the initial loading) if you intern low-cardinality strings. E.g.:

// save another instance of the string
myInstance.CountryCode = "US"

// intern the string. If it’s not already there, a new instance is created, else, the property points to an
// existing instance; any number of instances can share the string "US" instead of allocating many
// instances.

myInstance.CountryCode = String.Intern("US")

Does anything similar exist in the Xojo framework?

Thanks.

All constant strings go into the data segment of the executable, so they are mapped in as read only memory.
If you use a string twice, the compiler should detect it and make them references to the same memory.

Dim a As String = "US"
Dim b As String = "US"

Both variables will then reference the same constant string.

You can also use the constant system, which allows you to define all your literal strings once and then reference them elsewhere in the code. It also allows you, where appropriate, to translate the system into multiple languages (French, German etc., not C / C++).

You can right-click on your literal and select convert to constant. It will allow you to name the constant and change the literal into a reference. In the inspector you use #ConstantName. In your code just ConstantName. You can place constants in the place they work best. For example a global module, a class etc.

So effectively same as the default in .NET, all unique string constants refer to the same string. The question is can this be extended to non-constants. I’m thinking not.

Thanks, Ian. I am not sure this is exactly what I’m wanting but I will investigate. I’m looking for more of a runtime mechanism whereby I can say, “this value has low cardinality and I want to conserve memory by having all unique values point to the same instance”. So as an example imagine a table of addresses. You have 2 address lines, city, state / province, postal / zip code, and country code. There might be 10,000 rows in memory but only 50 or so unique state values, 200 or so unique country values. Rather than have 10,000 strings in memory for the state, you have 50. That kind of thing. In that case I don’t necessarily know in advance what the “constants” are. They can vary at runtime, but the point is that the number of unique values is small.

You could likely create such a thing using a Dictionary. A dictionary is a collection of Key, Value pairs. Key and value can be anything you want. There is a method to ask if the Dictionary already has a given Key. Wrap it all up in a class and you likely have what you are after.
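A rough sketch of what such a wrapper might look like. The class name StringPool, the property mPool, and the method Intern are all hypothetical names, not part of the Xojo framework; only Dictionary, HasKey, and Value are framework calls:

```xojo
// Method of a hypothetical class StringPool that has one
// private property:  mPool As Dictionary

Function Intern(value As String) As String
  // Create the pool lazily on first use.
  If mPool Is Nil Then mPool = New Dictionary

  // If an equal string is already pooled, hand back the pooled one.
  If mPool.HasKey(value) Then
    Return mPool.Value(value)
  End If

  // Otherwise remember this string and return it unchanged.
  mPool.Value(value) = value
  Return value
End Function
```

Usage would be something like myInstance.CountryCode = pool.Intern(code), where pool is a StringPool instance. Two caveats: as far as I know, a Dictionary compares String keys case-insensitively by default, so "us" and "US" would come back as whichever was stored first; and whether this actually saves memory depends on the pooled string's storage being shared by reference, which would need to be measured.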

Yes, Ian, that would essentially be rolling your own string interning mechanism. Sort of. The thing about a built in mechanism is that it’s global to the entire app, and more transparent and simple with fewer opportunities for bugs. A dictionary-based solution would have the upside of more control over lifetime scope. In .NET, once a string is interned, it lives in the interning pool until the app terminates, which might not always be what you want. I tend to use it for lookup tables or cached data that needs to be accessible to the entire app.

Here again, not necessarily a big deal, just trying to get my bearings of what is (im)possible relative to the techniques I’ve often used before in other environments.

If you assign a string to variable s1, then assign s1 to s2, there will be only one allocation of memory and both variables will point to it. The bytes that make up the string are not copied.

If you then concatenate a string to s2, a new allocation of memory will occur to hold the new string.

Did I understand your concern?

Well what I am used to is that by default a string instance wraps a character array that it owns.

If I’m understanding you correctly though in Xojo it works like so:

var s1 as string = "Foo" // allocation
var s2 as string = s1 // no allocation, same instance
s2 = "Bar" // allocation, now its own instance

In .NET AFAIK the 2nd line would create a new string instance unless it is explicitly interned. OTOH I see no reason that has to be the case. The tradeoff is that at runtime you’d have to check to see if a string exists with the same value when creating s2. In a large application, even with a dictionary implementation, that would kill some time. So I think interning was designed to allow you to make the tradeoff between using less memory or having greater speed.

This brings me to the question of value equality vs instance equality, which is always a fun topic in the .NET framework. Is there a means to determine programmatically that s1 and s2 are the same instance? Or is this pointless because 2, 10, or 10,000 strings with the same value (characters) inherently ARE the same instance? In which case, in Xojo, there is only ever value equality to be concerned with. Is that about the size of it?

var s1 as string = "Foo" // allocation
var s2 as string = "F" + "oo" // allocation, same value, separate instance

No, there is no mechanism to determine if two strings are the same instance.

But if this level of allocation is a concern, maybe you need a lower level language.

At the end of the day if it’s fast enough, it’s fast enough. Xojo really isn’t designed primarily for the sort of intense string processing I’m describing anyway, though it’s no slouch either. I am mostly just curious about its internals. Whole books have been written about things like .NET memory management and I kind of like understanding the underpinnings a bit. But strictly speaking – for line-of-business apps and similar, it mostly just has to work and be stable.

Once again, Xojo comes through.

I was thinking a MemoryBlock could be useful for the kind of heavy duty string appending I’m mentioning above. And sure enough, in the docs for MemoryBlock, near the bottom, a sample FastStringAppend project is mentioned (Examples/Advanced/MemoryBlock/FastStringAppend) which is roughly equivalent to the .NET framework’s StringBuilder. It is said to be much faster than string concatenations “if you have very long strings”. I am guessing it is some intersection of quantity of concatenations plus lengths in reality – benchmarking would be needed. But this does cover the use case of many string concatenations on hot paths that I was wondering if I could resort to when appropriate.

I’m sure that pondering the functionality of MemoryBlock will also reveal ways to do things like assemble strings from substrings of other strings without using Mid() and the attendant allocations.

I think it is very fortuitous that Xojo is heavily dogfooded. If the Xojo IDE has to be built in Xojo, it forces the language to keep up pretty well with everything offered by the best of any of the modern programming languages. And that is a Good Thing.

In reality there is also instance equality in Xojo: the Is keyword

If obj1 Is obj2 Then
// obj1 and obj2 are the same instance
End If

This means for example it’s easy to game out whether or not Xojo actually interns string instances by default. So I will use that to investigate further.

IIRC Xojo does not treat strings (don’t need NEW) as objects but rather as an intrinsic type so I don’t expect “Is” to work for strings.
-Karen

Oh, poo. I just did a quick test and you are absolutely correct. I am used to the semantics making strings look intrinsic but they are actually objects in .NET, so you can for example with VB.NET get a string of 20 spaces via new String(" "c,20) or create a string from a character array and probably other overloads I’ve forgotten. But you don’t normally resort to using new with strings. You just assign to them as if they were an intrinsic type, and don’t have to think much about what’s under the hood unless, as I said, you’re doing a ton of string manipulation.

So my takeaway is that Xojo strings have this in common with the .NET framework: strings are immutable, and there is some degree of overhead in doing too much concatenation of strings on hot paths. But though strings may for all we know be objects (or structs) internally, to developers they have 100% intrinsic-type semantics. Some here also say that string interning, which is optional in .NET (and normally only used for string constants defined in code), is universal in Xojo. I was hoping to verify that is the case and identify any exceptions.

A project for another day I guess. Also not really a pragmatic priority – just an item of personal curiosity. My version of taking a clock apart to see how it works.
