CR / LF Order in a Text File Affects File Contents

Lol, “EDI Standard” is a mythical beast, like a unicorn or bigfoot.


You can change the line endings in BBEdit to whichever style you choose.

Ah, the elusive “EDI Standard”…

I thought I saw the beast once, back when I was working for a logistics company in the mid-aughts, but when I looked again it had vanished faster than a shipping container full of Nintendo Wiis left in an unsecured parking lot.

Because this topic is about CR LF, I hope I’m not hijacking by asking a related “How to” question.

I have a text file, copied from a web page (drag, select, and cmd-C) and instead of word-wrap, there’s a CR or LF or some combination - I can find out with Hex Fiend - at the end of a line. Thanks for the heads-up on BBEdit’s modification.

I want to change the instances of single CR’s, replacing them with a space, and reduce the double CR’s to one CR.

So this seems to be a variation of the chicken/fox/corn problem. How do I seek and substitute the occurrences of one CR with a space, without changing the occurrences of two CR’s to two spaces?

I’m pretty sure searching for two CR’s and replacing with one CR is straightforward. It’s changing the one CR to a space that gums up the (mental) works.

In yesteryear, I’d seek the two CR’s and replace them with some “nonsense” symbols, like “#@”, then I’d seek the remaining single CR’s, changing them to a space. Finally, I’d seek that “#@” and change it to a single CR.

Is there a more elegant way?

I’m using “CR” in this example, but I’ll use Hex Fiend to find out what is actually used (CR, LF, or CR+LF).

Off the top of my head…

page = page.ReplaceLineEndings( &uA ) // Standardize on LF
page = page.ReplaceAll( &uA + &uA, &uD ) // Replace double-LF with CR
page = page.ReplaceAll( &uA, " " ) // Replace single LF with space
page = page.ReplaceAll( &uD, &uA ) // Restore the LF

If you might have more than two consecutive EOL characters that you want to replace with a single one, you’d have to use a RegEx.

Actually, this might be more elegant.

(Again, not tested.)

var rx as new RegEx
rx.Options.ReplaceAllMatches = true

rx.SearchPattern = "\R(\R?)"
rx.ReplacementPattern = " \1"

page = rx.Replace( page )

Where there is a double EOL, you will get space-EOL. Where there is a single EOL, you will get a space.
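
For the case mentioned earlier, runs of more than two EOL characters, a two-pass variant may be worth trying. This is an untested sketch, and it assumes page is a String that has already been standardized on LF with ReplaceLineEndings, as in the first snippet. The lookaround pattern is a suggestion of mine rather than something taken from the posts above, and it relies on the RegEx class accepting PCRE lookbehind and lookahead.

```xojo
// Untested sketch. Assumes page is a String already standardized on LF,
// e.g. via page = page.ReplaceLineEndings( &uA ) as in the first snippet.
var rx as new RegEx
rx.Options.ReplaceAllMatches = true

// Pass 1: a lone LF (no LF directly before or after it) becomes a space
rx.SearchPattern = "(?<!\n)\n(?!\n)"
rx.ReplacementPattern = " "
page = rx.Replace( page )

// Pass 2: a run of two or more LF collapses to the first LF of the run
rx.SearchPattern = "(\n)\n+"
rx.ReplacementPattern = "\1"
page = rx.Replace( page )
```

With this ordering, a run of three or more EOLs should collapse to one EOL just as a double EOL does, and no extra space is left before the remaining EOL. If that space is wanted as a marker, the second ReplacementPattern could be " \1" instead.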

Thank you Kem, I’ll look at RegEx to understand the symbols.

I’m inputting a text grab into a database. It’s not a lot of text and the database allows a Load-from-Clipboard operation. With the data as it is, every single EOL in the word-wrapped text creates a new record, which I don’t want. So the space substitute will continue the text line instead of making a new record.

The double EOL gives an additional blank record. It’s easy enough to select those and delete them, but better not to have blank records at all.

With your scheme, as it stands, there will be an extra space before the EOL. It actually might be handy to have that as a parsing stop. What did they call those in yesteryear … a sentinel.