RegEx split from Standard Shell Output

Many shell commands print their output in a human-readable, column-aligned format, like this lsof output on Mac OS X. How can one use RegEx to parse each line into an array of values? You can't simply split on whitespace, because the final column is a path, and paths may contain spaces as well.

MBP:~ me$ lsof
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
loginwind 97 me cwd DIR 1,4 1122 2 /
loginwind 97 me txt REG 1,4 840176 20150460 /System/Library/CoreServices/loginwindow.app/Contents/MacOS/loginwindow
loginwind 97 me txt REG 1,4 76240 20068359 /System/Library/Extensions/AppleHDA.kext/Contents/PlugIns/AppleHDAHALPlugIn.bundle/Contents/MacOS/AppleHDAHALPlugIn
loginwind 97 me txt REG 1,4 55248 12563538 /System/Library/ColorSync/Profiles/Generic CMYK Profile.icc
loginwind 97 me txt REG 1,4 35904 12944831 /System/Library/Caches/com.apple.IntlDataCache.le.kbdx
loginwind 97 me txt REG 1,4 50744 22475037 /private/var/db/mds/system/mdsDirectory.db
loginwind 97 me txt REG 1,4 1960 12563542 /System/Library/ColorSync/Profiles/Generic RGB Profile.icc
loginwind 97 me txt REG 1,4 24439776 20082672 /usr/share/icu/icudt53l.dat

Looks like this would do it, as long as every column has a value:

([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+(.*)
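For anyone who wants to try a pattern of this shape outside Xojo, here is a minimal Python sketch. It assumes `+` quantifiers so that each group captures a whole field rather than a single character, and it assumes all eight leading columns are populated. The names `LSOF_LINE` and `parse_line` are made up for the example.

```python
import re

# Eight whitespace-free fields, then everything that remains as the NAME column.
LSOF_LINE = re.compile(r"\s+".join([r"(\S+)"] * 8) + r"\s+(.*)")

def parse_line(line):
    """Return the nine column values, or None if the line does not fit."""
    m = LSOF_LINE.match(line.strip())
    return list(m.groups()) if m else None
```

Because the last group is `(.*)`, any spaces inside the path stay in the final value. A line with fewer than nine fields returns None instead of a misaligned result.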

I’m sure kem will post something else

A non-regex solution would be to split by space anyway. Take the first 8 fields as they are, then join fields 9 through ubound with a space to rebuild the path.
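The same idea can be sketched in Python, where `str.split` accepts a maximum number of splits, so the rejoin step is not needed. This is only an illustration; `split_lsof_line` is a made-up name, and it assumes the first eight columns all have values.

```python
def split_lsof_line(line, n_fixed=8):
    """Split off the first n_fixed whitespace-separated fields.

    Whatever follows them (the NAME column) is returned as one final
    element, with any embedded spaces left untouched.
    """
    return line.rstrip("\n").split(None, n_fixed)
```

Note that runs of spaces between the leading columns collapse into a single separator, which is exactly why an empty column would shift everything after it one position to the left.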

A regex could be worked out, but it would be tricky since there might be missing columns.

In this case, I would forgo RegEx entirely. Instead, look at the man page for lsof for the “-F” option and the section OUTPUT FOR OTHER PROGRAMS. I think you’ll ultimately get more accurate results and have an easier time of it.
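As a rough sketch of what parsing the -F output could look like, here it is in Python. It assumes output from something like `lsof -F pcfn`, where each line starts with a one-character field identifier (p for PID, c for command, f for file descriptor, n for name) and a new file record starts at each f line. The function name and the record layout are my own choices, not anything lsof prescribes.

```python
def parse_lsof_fields(text):
    """Turn `lsof -F` style output into a list of dicts, one per open file."""
    records = []
    process = {}    # fields that apply to the current process (p, c, ...)
    current = None  # the file record currently being filled in
    for line in text.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "p":            # a new process set begins
            process = {"p": value}
            current = None
        elif key == "f":          # a new file set begins
            current = dict(process, f=value)
            records.append(current)
        elif current is not None: # field belongs to the current file
            current[key] = value
        else:                     # field belongs to the process
            process[key] = value
    return records
```

Since every value occupies its own line, a path containing spaces needs no special handling, and an absent field is simply a missing key instead of a shifted column.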

Not knowing anything about it but just looking at it:

  • use instr to find the first “/”
  • divide it into 2 strings using that
  • split the first with space character
  • trim as required
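The steps above might look like this in Python, with `str.find` standing in for instr. It is only a sketch: it assumes the path is absolute and that no earlier column contains a "/". The header line breaks that assumption because of SIZE/OFF, so it would have to be skipped. `split_at_first_slash` is a made-up name.

```python
def split_at_first_slash(line):
    """Split a data line into its leading fields plus the trailing path."""
    idx = line.find("/")           # position of the first "/", or -1
    if idx == -1:
        return line.split()        # no path found: plain whitespace split
    head, path = line[:idx], line[idx:]
    return head.split() + [path.strip()]
```

An empty column would still go unnoticed here, but the path itself comes through intact regardless of how many spaces it contains.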

You can split, but it will have to be on column widths. There can be columns with no data in them, just spaces.
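One way to sketch a column-width split in Python, assuming the original space-padded output is available (the listing earlier in the thread has lost its alignment): treat any character position that is blank in every line, header included, as a gap between columns, and take everything from the NAME header onward as the path. Both function names are hypothetical, and a value that overflows its column would defeat this.

```python
def column_spans(lines, name_start):
    """Find (start, end) spans of the columns to the left of name_start."""
    # A position is a gap if every line is blank (or too short) there.
    is_gap = [all(len(l) <= i or l[i] == " " for l in lines)
              for i in range(name_start)]
    spans, start = [], None
    for i, gap in enumerate(is_gap):
        if not gap and start is None:
            start = i                    # a column begins
        elif gap and start is not None:
            spans.append((start, i))     # a column ends
            start = None
    if start is not None:
        spans.append((start, name_start))
    return spans

def split_fixed(line, spans, name_start):
    """Slice one line by the spans; an empty column yields ''."""
    fields = [line[a:b].strip() for a, b in spans]
    fields.append(line[name_start:])
    return fields
```

Here `name_start` would come from something like `header.index("NAME")`, and the spans are computed once from the header plus all data lines.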

That being so, I’d use your OUTPUT FOR OTHER PROGRAMS plan. Apologies for commenting from a state of customary ignorance.