String.Split with newlines

I need to split a string based on UNIX newlines. What I’m trying to do is end up with a String array of the actual lines of text.

Var s1 As String = EndOfLine.UNIX
Var s2 As String = EndOfLine.UNIX + EndOfLine.UNIX
Var s3 As String = "Hi" + EndOfLine.UNIX + "you"

Var a1() As String = s1.Split(EndOfLine.UNIX) // Two empty elements
Var a2() As String = s2.Split(EndOfLine.UNIX) // Three empty elements
Var a3() As String = s3.Split(EndOfLine.UNIX) // Two elements containing "Hi" and "you"

Break

This to me doesn’t seem like the correct behaviour. What I would expect to see is:

// a1 = [""]

// a2 = ["", ""]

// a3 = ["Hi", "you"]

The function splits ON the character and the character is lost.

Your comment descriptions next to the lines are correct; the expectations for the empty elements are off by one.

// One EOL to split on, two elements
// a1 = ["", ""]

// Two EOLs to split on, three elements
// a2 = ["", "", ""]

// One EOL to split on, two elements
// a3 = ["Hi", "you"]

What I expect is:

  • with no EndOfLine.UNIX: 1 element
  • with one EndOfLine.UNIX: 2 elements
  • with two EndOfLine.UNIX: 3 elements

No matter if the elements are empty or not.

Hmm. Does anyone have a good suggestion then to achieve what I’m trying to do? If it splits on the character and then loses it I don’t think that’s the correct approach to achieve what I’m hoping. It’s not like I can simply count the size of the resultant array and subtract one and that will give me the correct number of lines (because I don’t think it does). For example, a1 and a3 both have two elements but s1 is only one line and s3 is two lines.

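You can simply call TrimRight(EndOfLine.UNIX) on the string and then do the split. Note that TrimRight strips every trailing line ending, not just one, so trailing blank lines are lost.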

Var s1 As String = EndOfLine.UNIX
Var s2 As String = EndOfLine.UNIX + EndOfLine.UNIX
Var s3 As String = "Hi" + EndOfLine.UNIX + "you"

s1 = s1.TrimRight(EndOfLine.UNIX)
Var a1() As String = s1.Split(EndOfLine.UNIX) // s1 is now empty, so there is nothing left to split on
System.DebugLog "s1 = " + String.FromArray(a1, "-")

s2 = s2.TrimRight(EndOfLine.UNIX)
Var a2() As String = s2.Split(EndOfLine.UNIX) // s2 is now empty too: both line endings were trimmed
System.DebugLog "s2 = " + String.FromArray(a2, "-")

s3 = s3.TrimRight(EndOfLine.UNIX)
Var a3() As String = s3.Split(EndOfLine.UNIX) // Two elements containing "Hi" and "you"
System.DebugLog "s3 = " + String.FromArray(a3, "-")

Break

DebugLog (macOS):

s1 =
s2 =
s3 = Hi-you

Build a function that does Trim + Split.
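
One possible shape for such a function, as a sketch only. The name SplitLines is hypothetical, and it assumes each EndOfLine.UNIX terminates a line rather than separating two. Instead of trimming first, it splits first and then drops the single empty element that a final line ending produces, so genuine blank lines survive.

```xojo
// Hypothetical helper, not part of the Xojo framework.
// Assumes each EndOfLine.UNIX terminates a line.
Function SplitLines(source As String) As String()
  Var eol As String = EndOfLine.UNIX
  Var lines() As String = source.Split(eol)

  // A trailing line ending yields one extra empty element at the end;
  // remove only that one so genuine blank lines are kept.
  If source.EndsWith(eol) And lines.Count > 0 Then
    lines.RemoveAt(lines.LastIndex)
  End If

  Return lines
End Function
```

With the strings from the first post, this should give one empty element for s1, two empty elements for s2, and "Hi" and "you" for s3.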

Does your editor offer text wrapping? I would think this method of splitting by endofline wouldn’t be sufficient.

That’s a can of worms.

My current editor (XUICodeEditor) doesn’t, but I do want to support that in the new one I’m working on (at least eventually).

I’m still in the prototyping phase - trying to figure out the best way to store the text and line starts/finishes.

I’m currently leaning towards storing all the text in a gap buffer (line endings as well) and probably storing the start and finish offset of each line in a TextLine class (which are stored in an array in a LineManager class of the editor). By storing all the text in a gap buffer in theory that gives me the chance to soft wrap if needed.
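
As a rough illustration of the line-offset idea only: the sketch below derives the start offset of each line from a plain String using Split. The function name is hypothetical, it is not the TextLine or LineManager design described above, and it assumes the text uses EndOfLine.UNIX exclusively.

```xojo
// Hypothetical helper: returns the start offset of every line in source.
// Offsets are zero-based character positions; assumes EndOfLine.UNIX only.
Function LineStartOffsets(source As String) As Integer()
  Var eol As String = EndOfLine.UNIX
  Var parts() As String = source.Split(eol)
  Var starts() As Integer
  Var offset As Integer = 0

  For Each part As String In parts
    starts.Add(offset)
    // The next line begins after this line's text and its line ending.
    offset = offset + part.Length + eol.Length
  Next

  Return starts
End Function
```

Because Split keeps the empty part after a final line ending, text that ends in EndOfLine.UNIX gets one last offset equal to the text length, which is where an editor would place the caret on the empty last line.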

As I understand it, splitting on a character that is included in the string implies that there are two parts: before AND after.

Var s1 As String = EndOfLine.UNIX
has two parts: "" before and "" after.

If it were
Var s1 As String = ""
then you would have one part, which would be "".

Your s3 has only two parts because it contains only one actual EndOfLine.UNIX.

Your s2 has three parts because it contains two EndOfLines. The first one splits s2 into
"", EndOfLine
and the remaining EndOfLine splits into
"", ""
The result is
"", "", ""

When doing a split you will always end up with n + 1 parts, where n is the number of times the character you are splitting on ACTUALLY appears in the string. You can’t imply more.
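
A quick way to check the n + 1 rule, sketched with the same calls used earlier in the thread (the sample string is made up for illustration):

```xojo
// Made-up sample: two EndOfLine.UNIX delimiters.
Var s As String = "a" + EndOfLine.UNIX + "b" + EndOfLine.UNIX
Var parts() As String = s.Split(EndOfLine.UNIX)

// Two delimiters give three parts: "a", "b" and a trailing "".
System.DebugLog("parts = " + parts.Count.ToString)
```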


Another way to think of it would be like electric poles and wires. The poles are the character(s) you split on and the wires are the returned parts.


Thanks for the explanations guys - they’re really helpful, particularly the wires and poles analogy.

I’m going to create a new topic about managing lines since it’s slightly different than this one.

In fact, it is a matter of “Number of Delimiters” and “Number of Intervals”.

Think:
I was born on a Mardi Gras.
In some rare years, Mardi Gras falls on my birthday. In those years, I turn n years old but have seen n+1 Mardi Gras.

At first it does not seem obvious…