In the context of a line of code, the + and - symbols can denote two different situations:
A binary operation (i.e., requires TWO operands): X = 3 - 2
A unary operation (i.e., applies only to the next operand): X = -3
What I am trying to ensure is that I have covered all the situations that indicate a UNARY scenario.
This is what I have, and I THINK it covers everything… but I’m not 100% sure:
Note: all of these assume that whitespace is ignored.
the +/- is preceded by a (
the +/- is the 1st symbol on a line
the +/- is preceded by an operator (* / + -) or equal sign (=)
the +/- is preceded by a keyword but NOT a variable name
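The four rules above can be sketched as a small check on the text that comes before the +/-. This is a minimal Python sketch, not the poster's code: the function name is_unary and the KEYWORDS set are hypothetical placeholders (substitute your language's real keyword list), and keywords are compared case-insensitively on the assumption that the language is case-insensitive.

```python
import re

# Hypothetical keyword list; substitute the real keywords of your language.
KEYWORDS = {"return", "if", "then", "else", "and", "or", "not"}
OPERATORS = {"*", "/", "+", "-", "="}


def is_unary(line: str, index: int) -> bool:
    """Return True if the +/- at line[index] is unary, using the four rules."""
    if line[index] not in "+-":
        raise ValueError("not a +/- symbol")
    before = line[:index].rstrip()  # whitespace is ignored
    if not before:  # rule: the +/- is the 1st symbol on the line
        return True
    last = before[-1]
    if last == "(" or last in OPERATORS:  # rules: '(' or an operator / '='
        return True
    # rule: preceded by a keyword (but NOT a variable name)
    match = re.search(r"[A-Za-z_]\w*$", before)
    return bool(match and match.group().lower() in KEYWORDS)


print(is_unary("X = -3", 4))     # True
print(is_unary("X = 3 - 2", 6))  # False
```

Note that anything not covered by a rule (a digit, a variable name, a closing ")") falls through to binary, which is the safe default.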
I guess you should also add the edge case where a calculation is broken across multiple lines with the _ line continuation operator?
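One way to handle this is to join continued lines into one logical line before applying the "1st symbol on a line" rule, so a +/- that starts a continuation line is judged by what ends the previous line. A minimal Python sketch, assuming the continuation marker is a space followed by _ at the very end of the line (it ignores comments or strings after the _); logical_lines is a hypothetical name.

```python
from typing import List


def logical_lines(source: str) -> List[str]:
    """Join physical lines ending in ' _' into single logical lines."""
    result: List[str] = []
    pending = ""
    for raw in source.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith(" _") or stripped == "_":
            pending += stripped[:-1]  # drop the '_' and keep collecting
        else:
            result.append(pending + stripped)
            pending = ""
    if pending:  # source ended while a continuation was still open
        result.append(pending)
    return result


print(logical_lines("x = 3 _\n  - 2\ny = -1"))
# ['x = 3   - 2', 'y = -1']
```

After joining, the - that began the second physical line follows the 3, so it is correctly treated as binary.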
Personally, I dislike this kind of syntax ambiguity when it comes to negating a positive value. Because there is no opposite function to Abs (which is true in many languages), in this case I prefer to write (myPositiveNum * -1) rather than -myPositiveNum, except of course when I’m writing a literal negative value, e.g., -3.
You get the same result with (myPositiveNum * -1), plus it stands out more in the code when you’re reading it.
I got caught out the other day when reading some example code in the forum: I totally missed a unary minus (negation) applied to a variable name. It had me really confused for a bit.
Create a lexical analyzer. What you called a “symbol” is an operator: if the previous token was a literal, an identifier, or the end of an expression ( “)” ), the “-” is a “subtract” token; otherwise, treat it as a “negate” token. https://www.rosettacode.org/wiki/Compiler
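A minimal Python sketch of that rule, as a tokenizer that emits separate token kinds (sub/neg for -, and add/pos for +). The token pattern, the kind names and the KEYWORDS set are assumptions for a tiny expression language; keywords get their own kind so that something like Return -x still yields a negate token, matching the keyword rule from the original question.

```python
import re
from typing import List, Tuple

# Assumed token classes for a tiny expression language (hypothetical).
TOKEN_RE = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)"
    r"|(?P<name>[A-Za-z_]\w*)"
    r"|(?P<ws>\s+)"
    r"|(?P<op>[-+*/=()])"
)
KEYWORDS = {"return", "if", "then"}  # hypothetical keyword list


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, value) pairs, classifying each +/- as it goes."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "ws":
            continue  # whitespace between tokens is skipped
        if kind == "name" and value.lower() in KEYWORDS:
            kind = "keyword"
        elif kind == "op" and value in "+-":
            prev = tokens[-1] if tokens else None
            # Binary only if the previous token ends an operand:
            # a literal, an identifier, or a closing ")".
            after_operand = prev is not None and (
                prev[0] in ("num", "name") or prev[1] == ")"
            )
            if value == "-":
                kind = "sub" if after_operand else "neg"
            else:
                kind = "add" if after_operand else "pos"
        tokens.append((kind, value))
    return tokens


print(tokenize("n = y -x"))
# [('name', 'n'), ('op', '='), ('name', 'y'), ('sub', '-'), ('name', 'x')]
```

The decision only ever looks one token back, which is why it fits naturally inside a single lexer pass.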
“-” is unary if, in its context, it is equivalent to (-x).
It’s “binary” when it is preceded by another “expression”.
So in “n = -x” the - is unary, and in “n = y -x” it is binary.
In “n = (2*4) -x” it is also binary.
In “n = 2 * -x” it is unary, the same as “n = 2 * (-x)”.
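For comparison, Python's own parser (the standard-library ast module) makes the same calls on these exact statements. This shows Python's grammar, not the rules of any invented language; shape and minus_kind are hypothetical helper names.

```python
import ast


def shape(src: str) -> str:
    """Dump the syntax tree; grouping-only parentheses leave no trace in it."""
    return ast.dump(ast.parse(src))


def minus_kind(src: str) -> str:
    """Report how Python parsed the first minus found (breadth-first walk)."""
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return "unary"
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub):
            return "binary"
    return "none"


for src in ["n = -x", "n = y -x", "n = (2*4) -x", "n = 2 * -x"]:
    print(src, "->", minus_kind(src))  # unary, binary, binary, unary
print(shape("n = 2 * -x") == shape("n = 2 * (-x)"))  # True
```

The last line confirms that “n = 2 * -x” and “n = 2 * (-x)” produce identical trees.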
What the operators are is defined by the context of the “invented language”. You make the rules.
But in a broader context, a “-” is usually unary when it is NOT preceded by an expression ( (2*3), (2+a), …) or an identifier (a variable), i.e., when there is no operand before the operator.
You also have to skip (consume) whitespace IF you allow whitespace between a unary operator and the number/expression it applies to.
As far as I recall, lexer/parser generators like flex/bison don’t try to resolve this at the level of a “token”, because it can be ambiguous (i.e., there’s no token for “expression”); it is left to a later stage, the parser’s grammar, rather than the lexer.
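One common way to keep the decision out of the lexer is to let the grammar decide: the lexer just emits a plain - token, and the parser treats it as unary or binary depending on where in the grammar it is reached. Below is a minimal hand-written recursive-descent sketch in Python (a different technique from flex/bison, used only because it is self-contained); split_tokens, Parser, evaluate and the env variable lookup are hypothetical.

```python
import re
from typing import Dict, List, Optional

TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()])")


def split_tokens(text: str) -> List[str]:
    """Plain lexer: every '-' is just '-', with no unary/binary decision."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    # Grammar:  expr    := term (('+'|'-') term)*
    #           term    := unary (('*'|'/') unary)*
    #           unary   := ('+'|'-') unary | primary
    #           primary := NUMBER | NAME | '(' expr ')'
    def __init__(self, text: str, env: Optional[Dict[str, float]] = None):
        self.tokens = split_tokens(text)
        self.pos = 0
        self.env = env or {}

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> float:
        value = self.term()
        while self.peek() in ("+", "-"):  # reached after an operand: binary
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self) -> float:
        value = self.unary()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.unary()
            else:
                value /= self.unary()
        return value

    def unary(self) -> float:
        if self.peek() in ("+", "-"):  # reached where an operand must start: unary
            op = self.take()
            operand = self.unary()
            return -operand if op == "-" else operand
        return self.primary()

    def primary(self) -> float:
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok[0].isdigit():
            return float(tok)
        if tok in self.env:
            return self.env[tok]
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str, env: Optional[Dict[str, float]] = None) -> float:
    parser = Parser(text, env)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 * -x", {"x": 5}))  # -10.0
```

In bison grammars the prefix minus is typically written as its own rule with a %prec precedence override; here the same effect comes from which function calls which.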
Compilers can be multi-pass (several steps, e.g., a simple lexical analysis followed by syntactic and semantic analysis) or have fewer steps, sometimes just one pass, when you build a smart analyzer that keeps track of context as it goes and infers things from the tree.