Mid versus IndexOf

I’ve been confused about Xojo’s string conventions. At first I gathered that the first position in a string was position 0 (as shown in the IndexOf example in the documentation), then something else made me think that it was position 1, then I went back to zero again.

I’ve finally identified the culprit as Mid, which “returns a portion of a [string]… The first character is numbered 1”.

This seems inconsistent, to say the least. Are there any more traps like this? Or am I missing some subtlety?

I don’t use API 2, so I’m unsure of string handling in it.

Arrays are always 0-based. In API 1, strings were always 1-based… Not sure about API 2… as I said, I don’t use it.

-Karen

In the conversion from API 1 to API 2, string functions changed from 1-based to 0-based. Mid is API 1 and 1-based; IndexOf is API 2 and 0-based. The change was intended to make things more consistent, which is why the string handling functions changed.

IndexOf() and Mid() serve different purposes. IndexOf() locates the start position of a substring, while Mid() extracts a substring at a known starting location. In API 1, the Mid() function is 1-based, whereas in API 2 the counterpart is Middle(), which is 0-based. The name changed so that Mid() could continue to be 1-based for existing code; for those coding in API 2, the thing to use is Middle(). (As I recall, API 1 also had IndexOf() available.)

But generations of BASIC users (not just RB/RS/Xojo) expect Mid as well as the Left and Right string functions. Since Left and Right still work as in traditional BASIC, things get confusing…

IMO they should have left String character positions alone… Besides BASIC history, that would have made going to API 2 practically a LOT easier!

Index twiddling is most used for strings in my experience!

BTW in terms of consistency, I would have guessed that in API 2, Mid would have become Middle, not IndexOf! :wink:

-Karen

It is. They both exist but serve different purposes; Middle() is the API2 0-based counterpart to Mid().

Right, that was an overly-quick response to the OP, who mentioned Mid and IndexOf.
Mid → Middle
InStr → IndexOf

And I agree that they shouldn’t have changed.

Xojo made many cosmetic changes with “API 2.0”.

Many!

Yes, it is. There were many protesting against those changes, but… that is what Xojo considered a “modern” change… There were also suggestions, like having an IDE option to make strictly API 1 or API 2 only apps, but they say it is OK to mix and match different API versions with apparently similar but different behaviors :roll_eyes:

Check some of the changes in “Your Path Forward with API 2.0” on the Xojo Programming Blog.

Some links are also broken because of the change in the documentation

Thanks, guys, that would be perfectly clear — except that I can find nothing about Middle, either in the documentation or in the Introduction!

Look under the docs for String.

https://documentation.xojo.com/api/data_types/string.html#string

Edit: This is a possible failure of the new doc system. In the old docs, the first item that comes up when searching for “Mid” is Mid - Xojo Documentation. That explicitly states the replacement is String.Middle. In the new docs, the first hit is for Text.Mid, which is API 1.0 and is deprecated. But it neither states that nor mentions String.Middle.

We (humans) seem to have a problem with 1s & 0s. Examples of this are:

This is the 21st Century, but our years are 20xx

I am 62 but in my 63rd year

Take a baby, for instance. It starts out being days old, then after about a month becomes weeks old, and isn’t really measured in years of age until the terrible twos.

And we can’t even agree that 2000 was the last year of the 20th century or the first of the 21st.

For consistency Xojo have decreed that all elements will be zero based so the first unicode character in a string will be at position 0 as it would be if the string were an array of unicode characters. Computers have pushed us toward 0 based math, now we just need to apply that to strings too :slight_smile:

https://documentation.xojo.com/api/data_types/string.html#string-middle

Here’s the logic I used to convince Geoff that Middle should be 0-based. It was originally 1-based in the first API2 versions.

If I’m trying to parse something like Var Target As String = "Key=Value", I can use Var KeyLen As Integer = Target.IndexOf("=") which gives me 3. This is conveniently also the number of characters I need to skip to find the equal sign. And what does Left need? A number of characters. So I can do Var LeftPart As String = Target.Left(KeyLen). Now to get the right side, I just need Middle, plus one character to skip the equals sign: Var RightPart As String = Target.Middle(KeyLen + 1).

This all works really nicely in your head, which is important.

Mid and InStr weren’t exactly hard to figure out, but they are absolutely less intuitive.

Var Target As String = "Key=Value"
Var EqualsPosition As Integer = InStr(Target, "=") // 4, because InStr is 1-based
Var LeftPart As String = Left(Target, EqualsPosition - 1) // "Key"
Var RightPart As String = Mid(Target, EqualsPosition + 1) // "Value"

Even after 20 years of writing Xojo code, this feels really weird. It feels like the position is after the equals sign. That I have to go backwards to find the chunk before it.

Essentially, IndexOf is “how many characters did you skip to find the one being searched for.” That sounds right to me. Left and Right skip a certain number of characters too. Why wouldn’t Middle work the same way?
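
To make the comparison concrete, here is a rough sketch in Python rather than Xojo, so it can be run anywhere. The helper names (`instr_1based`, `mid_1based`, `index_of_0based`, `middle_0based`, `left`) are hypothetical stand-ins that only imitate the two conventions discussed above; they are not Xojo’s actual implementations.

```python
# Python sketch (not Xojo). All helpers are hypothetical imitations of the
# 1-based (API 1 style) and 0-based (API 2 style) conventions.

def instr_1based(source: str, find: str) -> int:
    """Imitates InStr: 1-based position, 0 when not found."""
    return source.find(find) + 1

def mid_1based(source: str, start: int) -> str:
    """Imitates Mid: `start` is 1-based (assumes start >= 1)."""
    return source[start - 1:]

def index_of_0based(source: str, find: str) -> int:
    """Imitates IndexOf: 0-based position, -1 when not found."""
    return source.find(find)

def middle_0based(source: str, index: int) -> str:
    """Imitates Middle: `index` is 0-based."""
    return source[index:]

def left(source: str, count: int) -> str:
    """Left takes a character count in both conventions."""
    return source[:count]

target = "Key=Value"

# 1-based style: the position points at the "=", so step back one for the
# key length and forward one to get past the "=".
equals_position = instr_1based(target, "=")
left_part_old = left(target, equals_position - 1)
right_part_old = mid_1based(target, equals_position + 1)

# 0-based style: the index doubles as the number of characters skipped.
key_len = index_of_0based(target, "=")
left_part_new = left(target, key_len)
right_part_new = middle_0based(target, key_len + 1)

print(left_part_old, right_part_old)
print(left_part_new, right_part_new)
```

Both styles produce the same two pieces. The only difference is the `- 1` adjustment in the 1-based version, which is the “going backwards” step described above.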

Index twiddling was the worst part of a few one-based functions. I no longer index twiddle, and I no longer have to guess what the output is. It makes sense in my head now, just like Thom describes.

I am so very grateful it did change.

Computer math at its core level, when dealing with sparse data, is very much offset-based math. Offset-based math is 0-based; if you need a one-based index, you have to do i = i - 1 internally all the time, so most modern languages simply abolished 1-based indexes.

Imagine a vector with this data of 5 blocks of 4 chars each: AAAASSSSDDDDFFFFGGGG

The vector starts at address 12345678 of the memory. Let’s call it “base”.

i is the index of each block

Let’s make i one-based and the offset formula to reach each block is:

offset = (i - 1) * 4 + base

Can we make it faster? Yes, make it zero based and the computation simplifies to just

offset = i * 4 + base

Zero-based indexes are not very natural for mathematicians, so languages such as FORTRAN and MATLAB are still 1-based. The vast majority are 0-based, for simplicity, speed, and to maintain an assumed computer science standard that people expect in most languages these days.
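
The two offset formulas above can be tried out with a small Python sketch. It is only an illustration: `base` here is an offset into a Python string, standing in for the memory address, and the function names are made up for this example.

```python
# Python sketch of the two offset formulas. `base` stands in for the
# memory address of the vector; here it is just an offset into `data`.

BLOCK_SIZE = 4
data = "AAAASSSSDDDDFFFFGGGG"  # 5 blocks of 4 chars each
base = 0

def block_one_based(i: int) -> str:
    """Fetch block i when the first block is numbered 1."""
    offset = (i - 1) * BLOCK_SIZE + base  # extra subtraction on every access
    return data[offset:offset + BLOCK_SIZE]

def block_zero_based(i: int) -> str:
    """Fetch block i when the first block is numbered 0."""
    offset = i * BLOCK_SIZE + base
    return data[offset:offset + BLOCK_SIZE]

print(block_one_based(1), block_zero_based(0))  # first block, both ways
print(block_one_based(5), block_zero_based(4))  # last block, both ways
```

Both functions reach the same blocks; the one-based version just carries the extra `i - 1` on every lookup.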

Thank you! The reason I said that I can find nothing was the result of this search:

I understand. The worst experience in search that I know.

Well, you could have it both ways, as in my native language, PL/I (yes, I know that dates me!):

DCL ARRAY(10); /* 1-based by default. */
DCL ARRAY1(0:9); /* Zero-based. */
DCL ARRAY2(500:509); /* Any bounds you like. */

P.S. Something’s eating all the asterisks except the first and last. Why is that?

P.P.S. Thanks, Tim.

You need to put code inside code brackets. Use three backticks ``` or the </> button.

In Pascal the compiler is very clever and can handle many kinds of predefined index ranges.
That’s elegant, but it has a cost that some languages have chosen not to pay.

type
   vector = array [ 0..999] of real;
   vector1 = array [ 1..999] of real;
   rgb = ( red, green, blue);
   color = array [rgb] of integer;
   
var
   v0: vector;
   v1: vector1;
   aColor: color;

aColor[red] := 100;