I need to split a string based on UNIX newlines. What I’m trying to do is end up with a String array of the actual lines of text.
Var s1 As String = EndOfLine.UNIX
Var s2 As String = EndOfLine.UNIX + EndOfLine.UNIX
Var s3 As String = "Hi" + EndOfLine.UNIX + "you"
Var a1() As String = s1.Split(EndOfLine.UNIX) // Two empty elements
Var a2() As String = s2.Split(EndOfLine.UNIX) // Three empty elements
Var a3() As String = s3.Split(EndOfLine.UNIX) // Two elements containing "Hi" and "you"
Break
This to me doesn’t seem like the correct behaviour. What I would expect to see is:
The function splits ON the character and the character is lost.
The comment descriptions next to your lines are correct; it's your expectations for the empty elements that are off by one.
// One EOL to split on, two elements
// a1 = ["", ""]
// Two EOLs to split on, three elements
// a2 = ["", "", ""]
// One EOL to split on, two elements
// a3 = ["Hi", "you"]
Hmm. Does anyone have a good suggestion then to achieve what I’m trying to do? If it splits on the character and then loses it I don’t think that’s the correct approach to achieve what I’m hoping. It’s not like I can simply count the size of the resultant array and subtract one and that will give me the correct number of lines (because I don’t think it does). For example, a1 and a3 both have two elements but s1 is only one line and s3 is two lines.
You could simply call TrimRight(EndOfLine.UNIX) on the string and then do the split?
Var s1 As String = EndOfLine.UNIX
Var s2 As String = EndOfLine.UNIX + EndOfLine.UNIX
Var s3 As String = "Hi" + EndOfLine.UNIX + "you"
s1 = s1.TrimRight(EndOfLine.UNIX)
Var a1() As String = s1.Split(EndOfLine.UNIX) // s1 is now "": its only EOL was trimmed
System.DebugLog "s1 = " + String.FromArray(a1, "-")
s2 = s2.TrimRight(EndOfLine.UNIX)
Var a2() As String = s2.Split(EndOfLine.UNIX) // s2 is now "": TrimRight removed both trailing EOLs
System.DebugLog "s2 = " + String.FromArray(a2, "-")
s3 = s3.TrimRight(EndOfLine.UNIX)
Var a3() As String = s3.Split(EndOfLine.UNIX) // Two elements containing "Hi" and "you"
System.DebugLog "s3 = " + String.FromArray(a3, "-")
Break
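One thing to watch with TrimRight: it strips every trailing EOL, so s1 and s2 both end up empty and any blank lines at the end of the text are lost. Here is a minimal sketch of an alternative, assuming a trailing EOL should terminate the last line rather than start a new one. SplitLines is a hypothetical helper name, not something built in, and the array methods assume a recent API 2 version of Xojo.

```xojo
Function SplitLines(source As String) As String()
  // Hypothetical helper: split on UNIX newlines, treating a trailing
  // EOL as the end of the last line instead of the start of a new one.
  Var lines() As String = source.Split(EndOfLine.UNIX)
  
  // Split returns the (empty) part after a trailing EOL as a final
  // element. Drop only that one element so earlier blank lines survive.
  If lines.Count > 0 And source.Right(1) = EndOfLine.UNIX Then
    lines.RemoveAt(lines.LastIndex)
  End If
  
  Return lines
End Function
```

With the strings from this thread, SplitLines(s1) gives one empty line, SplitLines(s2) gives two empty lines, and SplitLines(s3) gives "Hi" and "you".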
My current editor (XUICodeEditor) doesn’t but I do want to support that in the new one I’m working on (at least eventually).
I’m still in the prototyping phase - trying to figure out the best way to store the text and line starts/finishes.
I’m currently leaning towards storing all the text (line endings included) in a gap buffer, and probably storing the start and finish offset of each line in a TextLine class, with the TextLine instances held in an array in the editor's LineManager class. Storing all the text in a gap buffer should, in theory, give me the chance to soft wrap if needed.
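As a rough illustration of the line-offset idea, here is a sketch that scans for EOLs and records where each line begins. It works on a plain String rather than a gap buffer, LineStarts is a hypothetical name, and it assumes zero-based character offsets and that a trailing EOL does not open a new line (an editor may well want the opposite, so the caret can sit on the empty last line).

```xojo
Function LineStarts(source As String) As Integer()
  // Hypothetical helper: zero-based offset of the first character
  // of every line in source, with EndOfLine.UNIX as the separator.
  Var starts() As Integer
  If source.Length = 0 Then Return starts
  
  // The first line always begins at offset 0.
  starts.Add(0)
  
  Var pos As Integer = source.IndexOf(EndOfLine.UNIX)
  // Each EOL that is not the very last character starts a new line
  // at the offset immediately after it.
  While pos >= 0 And pos + 1 < source.Length
    starts.Add(pos + 1)
    pos = source.IndexOf(pos + 1, EndOfLine.UNIX)
  Wend
  
  Return starts
End Function
```

A line's finish offset can then be derived from the next line's start (or the text length for the last line), which would leave only one array to keep updated as the text changes.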
Splitting on a character “that is included” in the string implies that there are two parts, before AND after.
Var s1 As String = EndOfLine.UNIX (thus it has two parts):
"" before and "" after.
If it was
Var s1 As String = ""
then you would have one part, which would be "".
Your s3 only has two parts because it contains only one actual EndOfLine.UNIX.
Your s2 has three parts because it contains two EndOfLines.
The first EndOfLine splits s2 into
"", EndOfLine
and that remaining EndOfLine splits into
"", ""
The result is
"", "", ""
When doing a split you will always end up with n + 1 parts, where n is the number of times the character you are splitting on ACTUALLY appears in the string. You can’t imply more.
Another way to think of it is like electric poles and wires. The wires are the character(s) you split on and the poles are the returned parts, so there is always one more pole than there are wires.