I’m going to add this as an example in my RegExRX Samples, but thought it would be useful to post.
There are times when you may need to split up a tab-delimited line where the fields may or may not be quoted. This pattern will determine if a field is quoted (" or ' only) and, if so, grab everything between the quotes, including other tabs and newlines. If the field is not quoted, it will return everything until the next tab or end of line.
The pattern (which I will probably revise when I’m more awake in the morning) is this:
    (?x)                          # Free-space mode (whitespace ignored)
    (?:^|\t)                      # start of line or tab (non-capturing)
    (['"])?                       # optional quote at the start of the field
    (                             # start a capturing group
      (?(1)                       # IF the quote exists
        (?:\\\g1 | (?!\g1). )*    # THEN look for slash-quote or a character that is not
                                  # the quote (zero or more)
      |                           # ELSE (there is no leading quote character)
        [^\t\r\n]*                # look for anything not a tab or newline (zero or more)
      )                           # END IF
    )                             # close the capturing group
    (?(1)(\g1))                   # IF the quote exists THEN capture it
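If you want to try the idea outside RegExRX, here is a rough Python sketch of the same pattern. It is an adaptation, not the pattern exactly as written above: Python's `re` module writes the backreference as `\1` rather than `\g1`, and I am assuming the multiline and dot-matches-newline options should be on so that `^` works on every line and a quoted field can span lines. The names `FIELD` and `split_fields` are made up for this example.

```python
import re

# Adapted from the pattern above; \g1 is written as \1 for Python's re module.
# MULTILINE and DOTALL are assumptions of this sketch, not part of the original.
FIELD = re.compile(r"""
    (?:^|\t)                    # start of line or tab (non-capturing)
    (['"])?                     # optional quote at the start of the field
    (                           # group 2: the field contents
      (?(1)                     # IF the quote exists
        (?:\\\1 | (?!\1). )*    # THEN slash-quote, or any character that is not the quote
      |                         # ELSE
        [^\t\r\n]*              # anything that is not a tab or line break
      )
    )
    (?(1)(\1))                  # IF the quote exists THEN capture the closing quote
""", re.VERBOSE | re.MULTILINE | re.DOTALL)


def split_fields(text):
    """Return the contents of each field, without the surrounding quotes."""
    return [m.group(2) for m in FIELD.finditer(text)]


print(split_fields('a\t"b\tc"\td'))  # ['a', 'b\tc', 'd']
```

Escaped quotes inside a quoted field come back with their backslashes still in place, so unescaping them is left as a separate step.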