Regex help : extract between first level commas

Hi folks,

I have a string with items separated by commas (it is a column list in an sql select query)

item1,item2,...itemn

and sometimes I have items that contains commas, but are surrounded by parenthesis

(item1,item1.1,..item1.n),item2,...,itemp

what would be the regex to extract all items from the first-level commas ?
like

code
item2

itemp[/code]

thanks for any help.

(\\([^)]*\\)|[^,]+)(?=,|$)

This assumes that, were parens are used, there won’t be embedded parens, and the closing paren is always followed by a comma or ends the line. It also assumes there is a match in each column as empty columns will be skipped.

there are multiple parenthesis level in one item.
in fact it is to extract some sql subqueries and consider only each resulting column.
the subquery has to be considered as one column.

(select account.code from account where account.id=convert (SUBSTRING_INDEX(SUBSTRING_INDEX(sales_document.custom_fields_raw, 'custom_2921:', -1), ';', 1) , unsigned)), sales_document.code, sales_document.date, account.name, sales_document.customer_reference, sales_document.amount, sales_document.vat_amount, sales_document.amount_with_vat, (select sum(amount) from sales_schedule where sales_schedule.sales_document_id=sales_document.id), status.label

Please post complete representative text and I’ll see what I can do.

it is in my previous post.

I found this expression : ,(?![^\\(\\)]*\\))
here : https://www.statalist.org/forums/forum/general-stata-discussion/general/1485238-extracting-commas-between-specific-characters

it works as it finds the right commas I need to use. then replaces it with a pipe | and I can search with NthField
not the most effective method, but it works …
Kem if you have anything more efficient, I’m taking it !

The pattern I posted works with the text you posted, but since you said there could be embedded parens and your sample text doesn’t show that, I don’t consider it completely representative.

sorry about that, I wasn’t complete with that. the text I posted in my second post is the one that I need to be parsed.

Missed that in the cross-post, sorry.

This is getting trickier than you might imagine. What happens if there is a comma between quotes or in a remark? I think it’s doable, but more work than I can put into it at the moment.

OK, fine, try this:

(?x)
(?(DEFINE)
  (?<comment>--.*|(?U)/\\*[\\s\\S]*\\*/)
  (?<quote>(?U)'[^']*'|"[^"]*")
  (?<paren>(?U)\\(((?&paren)|[^)])*\\))
)
((?&comment)(*SKIP)(*FAIL)|(?&quote)(*SKIP)(*FAIL)|(?&paren)|[^,]+)(?=,|\\z)

everything can be tricky with regex …