Splitting text by endOfLine

I have been allowing my users to import long lists from a text file rather than typing them into my program, and I want to continue doing so. I have been reading the text file like this:

dim tis As textInputStream = TextInputStream.Open(f)
if tis <> nil Then
  dim theStr As String = tis.ReadAll
  iList() = Split(theStr, EndOfLine)
end if

I find that this works only some of the time depending on which app created the text file. I also tried endOfLine.UNIX and the results are the same. When the file was exported as .txt by Pages, it worked both ways. When by Tex-Edit or Text Wrangler, it didn’t split the string correctly either with .UNIX or not.
This is with a Cocoa application.
Is there something I can do other than requiring my users to use Pages to make this work with all word processing apps?

Try ReplaceLineEndings first so you know what they are

Never mind. I just discovered that changing my code to

    [code]while not tis.EOF
      iList.Append tis.ReadLine
    wend[/code]

lets Xojo do its work without my interference.

That was my point, Karen. Replace them with what?

Taking a peek at: http://documentation.xojo.com/index.php/ReplaceLineEndings should clear things up. It handles what to search for. You then replace it with something you know, such as EndOfLine. You can then split on EndOfLine and know that it will do the right thing since all your line endings (of an unknown type) are now of a known type.

Your solution is fine, but if memory is of no concern, a lot of times it's faster to read the whole file in one action.

something like this?

dim v() as string
v=split(replacelineendings(textinput.readall,endofline.unix),endofline.unix)

where textinput is an open TextInputStream

I have found a universal way that seems to work for me:

[code]dim tis As textInputStream = TextInputStream.Open(f)

if tis <> nil Then
  dim theStr As String = tis.ReadAll
  theStr = ReplaceAll(theStr, Chr(13), "|")
  theStr = ReplaceAll(theStr, Chr(10), "|")
  theStr = ReplaceAll(theStr, "||", "|")
  theStr = ReplaceAll(theStr, "|", EndOfLine.Unix)
  iList() = Split(theStr, EndOfLine.Unix)
end if[/code]

[quote=86238:@Simon Berridge]I have found a universal way that seems to work for me:
[/quote]

What is wrong with ReplaceLineEndings? Dave’s solution does the same thing as yours as far as I can tell, probably quite a bit faster. Yours has some error checking, which is needed, but the manual string manipulation is not. To merge the two, just use:

Dim lines() As String
Dim tis As TextInputStream = TextInputStream.Open(f)

If tis <> Nil Then
    lines = Split(ReplaceLineEndings(tis.ReadAll, EndOfLine), EndOfLine)
End If

Simon, I strongly recommend that you look into ReplaceLineEndings. First, it will do all yours does in one line. Second, yours will fail if the text contains a vertical bar so it’s not reliable.

theStr = ReplaceLineEndings( theStr, EndOfLine.UNIX )
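
To make the vertical bar problem concrete, here is a minimal sketch comparing the two approaches. The sample string and the variable names (viaBar, barLines, rleLines) are made up for illustration and are not from anyone's project.

```xojo
// Hypothetical input: two lines, the first containing a literal vertical bar.
Dim theStr As String = "red|green" + Chr(13) + Chr(10) + "blue"

// Placeholder approach: every Chr(13) and Chr(10) becomes "|" first.
Dim viaBar As String = ReplaceAll(theStr, Chr(13), "|")
viaBar = ReplaceAll(viaBar, Chr(10), "|")
viaBar = ReplaceAll(viaBar, "||", "|")
Dim barLines() As String = Split(viaBar, "|")
// barLines holds three items: "red", "green", "blue"

// ReplaceLineEndings approach: only real line endings are touched.
Dim rleLines() As String = Split(ReplaceLineEndings(theStr, EndOfLine.UNIX), EndOfLine.UNIX)
// rleLines holds two items: "red|green", "blue"
```

The placeholder approach also collapses "||" into "|", so in a file with Unix line endings an intentionally blank line (two Chr(10) in a row) disappears from the result.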

For some of us, just trusting that the Xojo compiler knows what the line endings are when replacing them all with EndOfLine is not enough.

I suffer a little from OCD and have had bad experiences with EOL in the past (Delphi, VB). I know that universally EOL is either #13#10 or #13 and my method guarantees that any/either of these will be picked up.

In fact, the last two lines:

theStr = ReplaceAll(theStr, "|", EndOfLine.Unix)
iList() = Split(theStr, EndOfLine.Unix)

can be changed to:

iList() = Split(theStr, "|")

which will save a step.

I am not advocating using my method, just stating that this is what I use for reading and parsing text files. My programs use a lot of text input and some of the files run into megabytes of data (for a text file that is huge). I see virtually no change in the speed!

I have just seen Kem’s reply. I don’t advocate “|”, it was used as an example. It very much depends upon whether you know the input style or not. I have used “±” and “§” as well! I have to admit to using ReplaceLineEndings too, but I just have a doubt in my mind every time I do!

Apologies, as I do not wish to offend…

Just run a few tests. I’m sure you’ll find that ReplaceLineEndings is much more reliable and faster. No more reason to not trust it than the Split, Mid, Left, Right, or ReplaceAll methods (i.e. any other method Xojo provides for your use).
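
One such test could look like the sketch below. It builds a string that mixes all three line ending styles and checks how many items come back. The sample text and variable names are assumptions chosen for the test, and MsgBox is used only to report the outcome.

```xojo
// Build a sample that mixes Windows (CR+LF), old Mac (CR), and Unix (LF) endings.
Dim CR As String = Chr(13)
Dim LF As String = Chr(10)
Dim sample As String = "one" + CR + LF + "two" + CR + "three" + LF + "four"

// Normalize everything to the platform line ending, then split on it.
Dim parts() As String = Split(ReplaceLineEndings(sample, EndOfLine), EndOfLine)

// Four items (UBound of 3) means every style was recognized as a single line break.
If UBound(parts) = 3 Then
  MsgBox "All line endings were normalized."
Else
  MsgBox "Unexpected item count: " + Str(UBound(parts) + 1)
End If
```

The same test can be pointed at text exported from Pages, Tex-Edit, or TextWrangler by reading the file with ReadAll instead of building the sample by hand.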

Jeremy

In essence I agree with you inasmuch as I trust Xojo for most things.

However, when I see a recommended piece of code that says:

lines = Split(ReplaceLineEndings(tis.ReadAll, EndOfLine), EndOfLine)

I question that it doesn’t look like it will actually do anything!

Also, Split, Mid, Left etc. all work on straight text whereas EOL can be one of two different things and my little mind has some trouble in understanding how it knows the difference!

Just my OCD!

ReplaceLineEndings works fine, I know, I have used it as well. However, I do not use it for my commercial apps as I just can’t control what files my users are loading!

[quote=86246:@Simon Berridge]Jeremy

In essence I agree with you inasmuch as I trust Xojo for most things.

However, when I see a recommended piece of code that says:

lines = Split(ReplaceLineEndings(tis.ReadAll, EndOfLine), EndOfLine)

I question that it doesn’t look like it will actually do anything![/quote]

Maybe the docs would help. What ReplaceLineEndings does is replace all various line ending styles with 1 line ending style. So, it only needs 2 parameters, 1 being the string to do the work on and 2 being the line ending that you want to replace all the various line endings with. So…

str = ReplaceLineEndings("Hi\nHow are you\rWhat are you doing\r\nToday\n\r", EndOfLine)

Will result in str having only 1 line ending style, that of EndOfLine. It will now look like:

"Hi\nHow are you\nWhat are you doing\nToday\n"

You can then split based on that one line ending, EndOfLine.

That’s exactly the situation that ReplaceLineEndings is meant for.

Btw, you don’t need that many lines in any case.

theStr = theStr.ReplaceAll( EndOfLine.Windows, EndOfLine.UNIX )
theStr = theStr.ReplaceAll( EndOfLine.Macintosh, EndOfLine.UNIX )

You can actually use it to replace it with anything or nothing. This is perfectly acceptable.

s = ReplaceLineEndings( s, "$" )

[quote=86238:@Simon Berridge]I have found a universal way that seems to work for me:

dim tis As textInputStream = TextInputStream.Open(f)
if tis <> nil Then
  dim theStr As String = tis.ReadAll
  theStr = ReplaceAll(theStr, Chr(13), "|")
  theStr = ReplaceAll(theStr, Chr(10), "|")
  theStr = ReplaceAll(theStr, "||", "|")
  theStr = ReplaceAll(theStr, "|", EndOfLine.Unix)
  iList() = Split(theStr, EndOfLine.Unix)
end if
[/quote]
Yeah, don't do it this way.
You will never be able to put | in your text.

Just do

if tis <> nil Then
  dim theStr As String = tis.ReadAll
  theStr = replaceLineEndings( theStr , endofline)
  iList() = Split(theStr, EndOfLine)
end if

It's the right way to do it, since ReplaceLineEndings will run FIRST and return a string that has one consistent form of line ending, which you then split on.

As to “not doing anything” … please note that the code I posted and the code mentioned afterwards is NOT the same

My code replaces all line endings of ANY style with a KNOWN value, and splits on that known value, regardless of platform:

v=split(replacelineendings(textinput.readall,endofline.unix),endofline.unix)

other versions mentioned

v=split(replacelineendings(textinput.readall,endofline),endofline)

Not saying my way is “right” and others are “wrong”, just wanted to clarify that two different versions of very similar code were being discussed.

The difference is that using EndOfLine will give you platform correct line endings

Which is, in general, subtly “better”, since if you don’t immediately split but use the text for some other purpose, you already have the platform-correct line endings where you need them.

For the case where the replace is immediately followed by a split it really won’t matter
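
A small sketch of that difference, with made-up sample text and variable names. Both versions split into the same items; only the intermediate strings differ.

```xojo
// Hypothetical input using an old Mac style line ending (CR only).
Dim s As String = "first" + Chr(13) + "second"

// Normalized to whatever the current platform uses.
Dim platformText As String = ReplaceLineEndings(s, EndOfLine)

// Normalized to Chr(10) on every platform.
Dim unixText As String = ReplaceLineEndings(s, EndOfLine.UNIX)

// Each split uses the same ending that was just written, so both arrays
// hold the same two items: "first" and "second".
Dim a() As String = Split(platformText, EndOfLine)
Dim b() As String = Split(unixText, EndOfLine.UNIX)
```

If platformText is later shown in a text field or written back to disk, it already carries the line endings the platform expects, whereas unixText always contains Chr(10).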

Following this thread, I have decided to use…

iList() = Split(ReplaceLineEndings(tis.ReadAll, EndOfLine), EndOfLine)

which seems to be working for all situations.
Thank you all for your enlightening input.
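
For anyone finding this thread later, the pieces discussed above could be wrapped up roughly as follows. This is only a sketch: ReadLines is a hypothetical helper name, and the Try/Catch assumes a Xojo version where TextInputStream.Open raises an IOException when the file cannot be opened.

```xojo
// Hypothetical helper: returns the lines of a text file as a String array.
// An empty array comes back if the file is missing or cannot be opened.
Function ReadLines(f As FolderItem) As String()
  Dim lines() As String
  If f = Nil Then Return lines
  If Not f.Exists Then Return lines

  Try
    Dim tis As TextInputStream = TextInputStream.Open(f)
    If tis <> Nil Then
      Dim theStr As String = tis.ReadAll
      tis.Close
      // Normalize every line ending style to one known value, then split on it.
      lines = Split(ReplaceLineEndings(theStr, EndOfLine), EndOfLine)
    End If
  Catch err As IOException
    // Assumption: the caller treats an unreadable file like an empty one.
  End Try

  Return lines
End Function
```

It would be called as iList = ReadLines(f). Note that a file ending with a line break produces an empty last item, which may need to be skipped when filling a list.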