Parsing a natural language query into Spotlight/mdfind syntax

I’m working on a project to convert a natural language query like this:

bright red AND green AND “black and blue” NOT “orange”

to syntax I can use with mdfind. It’s not trivial, because one has to distinguish between the words and/or/not and their logical operator counterparts, and preserve quoted text as strings.

As I’m slogging along, I wonder if anyone else has solved parsing problems like this (or if there is some neat shortcut in OS X).

You need a “tokenizer” of some kind that splits the query into “tokens”.
So for the statement above you might have:

word token “bright”
word token “red”
logical token “AND”
word token “green”
logical token “AND”
quoted string “black and blue”
logical token “NOT”
quoted string “orange”
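
A token list like that can be produced with a single regular expression. Here is a minimal sketch in Python, not a finished parser. The names tokenize, OPERATORS and TOKEN_RE are made up for illustration, and it assumes that only upper-case AND/OR/NOT are operators, while lower-case "and"/"or"/"not" are ordinary words. It accepts both straight and curly quotes.

```python
import re

# Assumption: only upper-case AND/OR/NOT count as operators; lower-case
# "and"/"or"/"not" are treated as ordinary words.
OPERATORS = {"AND", "OR", "NOT"}

# Either a quoted string (straight or curly quotes) or a run of
# non-whitespace characters. Quoted alternatives are tried first.
TOKEN_RE = re.compile(r'"([^"]*)"|“([^”]*)”|(\S+)')


def tokenize(query):
    """Return a list of (kind, text) pairs.

    kind is one of 'word', 'logical' or 'quoted'.
    """
    tokens = []
    for match in TOKEN_RE.finditer(query):
        straight, curly, bare = match.groups()
        if straight is not None:
            tokens.append(("quoted", straight))
        elif curly is not None:
            tokens.append(("quoted", curly))
        elif bare in OPERATORS:
            tokens.append(("logical", bare))
        else:
            tokens.append(("word", bare))
    return tokens


if __name__ == "__main__":
    query = 'bright red AND green AND “black and blue” NOT “orange”'
    for kind, text in tokenize(query):
        print(kind, repr(text))
```

Because the quoted alternatives come first in the pattern, the "and" inside “black and blue” never gets a chance to be seen as an operator. An unterminated quote simply falls through to the word case, which you may or may not want.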

Note that “bright red” is not one token, but it may be one logical concept.
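
One way to handle that is a second pass that merges adjacent word tokens into a single term. The sketch below is only an illustration: group_terms is a made-up name, and it assumes the tokens arrive as (kind, text) pairs with the kinds 'word', 'logical' and 'quoted'. Whether a run of words should mean an exact phrase or "all of these words" is a decision for your semantics, and this code does not make it.

```python
def group_terms(tokens):
    """Merge runs of adjacent 'word' tokens into one ('phrase', text) term.

    Logical and quoted tokens are passed through unchanged and end the
    current run of words.
    """
    grouped = []
    for kind, text in tokens:
        if kind == "word" and grouped and grouped[-1][0] == "phrase":
            # Extend the phrase that is currently being built.
            grouped[-1] = ("phrase", grouped[-1][1] + " " + text)
        elif kind == "word":
            # Start a new phrase.
            grouped.append(("phrase", text))
        else:
            grouped.append((kind, text))
    return grouped


if __name__ == "__main__":
    # Hypothetical tokenizer output for: bright red AND green
    tokens = [
        ("word", "bright"),
        ("word", "red"),
        ("logical", "AND"),
        ("word", "green"),
    ]
    print(group_terms(tokens))
```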

After that, the rest is “semantics”, which is fun as heck.