I’ve been confused about Xojo’s string conventions. At first I gathered that the first position in a string was position 0 (as shown in the IndexOf example in the documentation), then something else made me think that it was position 1, then I went back to zero again.
I’ve finally identified the culprit as Mid, which “returns a portion of a [string]… The first character is numbered 1”.
This seems inconsistent, to say the least. Are there any more traps like this? Or am I missing some subtlety?
In the conversion from API 1 to API 2, string functions changed from 1-based to 0-based. Mid is API 1 and 1-based; IndexOf is API 2 and 0-based. The change was intended to make indexing more consistent, which is why the string handling functions changed.
IndexOf() and Mid() serve different purposes. IndexOf() locates the start position of a substring, while Mid() extracts a substring at a known starting location. In API1, the Mid() function is 1-based, whereas its API2 counterpart, Middle(), is 0-based. The name changed so that Mid() could stay 1-based for existing code; for those coding in API2, the one to use is Middle(). (As I recall, API1 also had IndexOf() available.)
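To make the difference concrete, here is a rough model in Python rather than Xojo, just so it can be run anywhere. The helper names mid_1based and middle_0based are made up for illustration and only mimic the numbering described above; they are not Xojo functions.

```python
def mid_1based(s, start, length=None):
    """Model of an API 1 style Mid: the first character is position 1."""
    begin = start - 1  # convert the 1-based position to a 0-based offset
    return s[begin:] if length is None else s[begin:begin + length]


def middle_0based(s, start, length=None):
    """Model of an API 2 style Middle: the first character is position 0."""
    return s[start:] if length is None else s[start:start + length]


text = "Key=Value"
# The same substring needs a different start position under each convention.
print(mid_1based(text, 5))     # Value
print(middle_0based(text, 4))  # Value
```

The only difference between the two helpers is the start - 1 adjustment, which is exactly the off-by-one that bites when the two conventions are mixed in one project.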
Xojo made many cosmetic changes with “API 2.0”.
Yes, it is. Many people protested those changes, but that is what Xojo considered a “modern” change. There were also suggestions such as an IDE option to build strictly API 1 or API 2 only apps, but they say it is OK to mix and match API versions whose functions look similar but behave differently.
Edit: This is a possible failure of the new doc system. In the old docs, the first item that comes up when searching for “Mid” is Mid - Xojo Documentation. That explicitly states the replacement is String.Middle. In the new docs, the first hit is for Text.Mid, which is API 1.0 and is deprecated. But it neither states that nor mentions String.Middle.
We (humans) seem to have a problem with 1s & 0s. Examples of this are:
This is the 21st Century, but our years are 20xx
I am 62 but in my 63rd year
Take a baby, for instance. It starts out being days old, then weeks, then after a while months, and isn’t really measured in years of age until the terrible twos.
And we can’t even agree that 2000 was the last year of the 20th century or the first of the 21st.
For consistency, Xojo have decreed that all elements will be zero-based, so the first Unicode character in a string will be at position 0, as it would be if the string were an array of Unicode characters. Computers have pushed us toward 0-based math; now we just need to apply that to strings too.
Here’s the logic I used to convince Geoff that Middle should be 0-based. It was originally 1-based in the first API2 versions.
If I’m trying to parse something like Var Target As String = "Key=Value", I can use Var KeyLen As Integer = Target.IndexOf("=") which gives me 3. This is conveniently also the number of characters I need to skip to find the equal sign. And what does Left need? A number of characters. So I can do Var LeftPart As String = Target.Left(KeyLen). Now to get the right side, I just need Middle, plus one character to skip the equals sign: Var RightPart As String = Target.Middle(KeyLen + 1).
This all works really nicely in your head, which is important.
Mid and InStr weren’t exactly hard to figure out, but they are absolutely less intuitive.
Var Target As String = "Key=Value"
Var EqualsPosition As Integer = InStr(Target, "=") // 1-based, so this is 4
Var LeftPart As String = Left(Target, EqualsPosition - 1)
Var RightPart As String = Mid(Target, EqualsPosition + 1)
Even after 20 years of writing Xojo code, this feels really weird. It feels like the position is after the equals sign. That I have to go backwards to find the chunk before it.
Essentially, IndexOf is “how many characters did you skip to find the one being searched for.” That sounds right to me. Left and Right skip a certain number of characters too. Why wouldn’t Middle work the same way?
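For anyone who wants to check the arithmetic, here is the same Key=Value split written as a small Python sketch. Python's find and slicing are 0-based, so they stand in for IndexOf, Left and Middle; the mid_1based helper is hypothetical and only imitates a 1-based Mid, and find + 1 imitates a 1-based InStr.

```python
target = "Key=Value"

# 0-based: the position is also the number of characters skipped.
key_len = target.find("=")             # 3
left_part = target[:key_len]           # take key_len characters
right_part = target[key_len + 1:]      # skip the "=" itself


def mid_1based(s, start):
    """Hypothetical stand-in for a 1-based Mid."""
    return s[start - 1:]


# 1-based: the position is one more than the characters skipped.
equals_position = target.find("=") + 1                  # 4
left_part_1 = target[:equals_position - 1]              # go back by one
right_part_1 = mid_1based(target, equals_position + 1)  # go forward by one

print(left_part, right_part)      # Key Value
print(left_part_1, right_part_1)  # Key Value
```

Both versions give the same result; the 1-based one just needs an adjustment in both directions, while the 0-based one only needs the + 1 to step over the equals sign.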
Computer math at its core level, when dealing with sparse data, is very offset-based. Offset-based math is 0-based; if you use a 1-based index, you need to do i = i - 1 internally all the time, so most modern languages simply abolished 1-based indexes.
Imagine a vector with this data of 5 blocks of 4 chars each: AAAASSSSDDDDFFFFGGGG
The vector starts at address 12345678 of the memory. Let’s call it “base”.
i is the index of each block
If we make i one-based, the offset formula to reach each block is:
offset = (i - 1) * 4 + base
Can we make it faster? Yes, make it zero based and the computation simplifies to just
offset = i * 4 + base
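A quick Python sketch of the two formulas, in case it helps. Here the string itself plays the role of memory, so base is taken as 0; the function names are just for illustration.

```python
BLOCK_SIZE = 4
data = "AAAASSSSDDDDFFFFGGGG"  # 5 blocks of 4 chars each


def block_one_based(i):
    """Fetch block i where the first block is i = 1."""
    start = (i - 1) * BLOCK_SIZE  # extra subtraction on every access
    return data[start:start + BLOCK_SIZE]


def block_zero_based(i):
    """Fetch block i where the first block is i = 0."""
    start = i * BLOCK_SIZE  # index maps straight to an offset
    return data[start:start + BLOCK_SIZE]


print(block_one_based(3))   # DDDD
print(block_zero_based(2))  # DDDD
```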
Zero-based indexes are not very natural for mathematicians, so languages such as FORTRAN and MATLAB are still 1-based. The vast majority of languages are 0-based, for simplicity, speed, and to follow the computer science convention people now expect.