In the context of a line of code, the + and - symbols can denote two different situations.
- A binary operation (i.e. requires TWO arguments): X = 3 - 2
- A unary operation (i.e. applies only to the next argument): X = -3
What I am trying to ensure is that I have all the situations that indicate a UNARY scenario.
This is what I have, and I THINK it covers everything… but I’m not 100% sure.
Note: all these assume that whitespace is ignored.
- If the symbol preceding the +/- is a (
- the +/- is 1st symbol on a line
- the +/- is preceded by an operator (* / + -) or equal sign (=)
- the +/- is preceded by a keyword but NOT a variable name
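The checks in this list could be sketched roughly like this. It is written in Python rather than the language being parsed, and the name `is_unary_sign` and the `KEYWORDS` set are made up for illustration; a real language would supply its own keyword list.

```python
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"return", "if", "then", "else", "and", "or", "not", "to", "step"}
# Characters that, when they come right before a + or -, make it unary.
UNARY_LEAD_INS = set("(*/+-=")

def is_unary_sign(line: str, index: int) -> bool:
    """Return True if the + or - at line[index] looks unary by the rules above."""
    if line[index] not in "+-":
        raise ValueError("index must point at a + or - character")
    before = line[:index].rstrip()       # whitespace is ignored
    if not before:                       # the sign is the first symbol on the line
        return True
    if before[-1] in UNARY_LEAD_INS:     # preceded by ( or an operator or =
        return True
    # Preceded by a word: unary only if that word is a keyword, not a variable name.
    match = re.search(r"[A-Za-z_][A-Za-z_0-9]*$", before)
    return bool(match) and match.group(0).lower() in KEYWORDS
```

For example, `is_unary_sign("X = -3", 4)` is true and `is_unary_sign("X = 3 - 2", 6)` is false. This only looks at one line, so a statement split across lines would need the lines joined first.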
I guess you should add the edge case of when a calculation is broken into multiple lines by the _ line continuation operator?
For myself, I dislike this kind of syntax ambiguity when it comes to treating a positive value as a negative. But because there is no opposite function to Abs (which is true in many languages), I prefer to write variables in this case like (myPositiveNum * -1) rather than -myPositiveNum. Except of course when I’m writing a literal negative value, e.g.,
You get the same result with (myPositiveNum * -1), plus it stands out more in the code when you’re reading it.
I got caught the other day when I was reading some example code in the forum and totally missed the use of inverse assignment on a variable name. Had me really confused for a bit.
Well, the ambiguity exists… and I’m trying to “remove” it by examining the context in which it exists.
And your suggestion of “_” I have covered with the EOL case.
Just looking down my test list for my reformat code script:
i = -(a+b)
i = a^-b (I assume you have all operators covered?)
Do you need to handle comma separators in multi parameter functions?
Create a lexical analyzer. What you called a “symbol” is an operator. If the previous token was a literal, an identifier, or the end of an expression ( “)” ), the “-” is a “subtract token”; otherwise consider it a “negate token”.
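A minimal sketch of that previous-token rule, in Python. The token kind names (`subtract`, `negate`, `add`, `plus` and so on) are invented for this example, and keywords are not distinguished from identifiers here, so a keyword before the sign would need its own kind.

```python
import re

# One token per match: a number, a name, or any single non-space symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(\S))")

def tokenize(source):
    """Return (kind, text) pairs; + and - are tagged by looking at the previous token."""
    tokens = []
    prev_kind = None
    for number, name, symbol in TOKEN_RE.findall(source):
        if number:
            kind, text = "literal", number
        elif name:
            kind, text = "identifier", name
        elif symbol in "+-":
            # A literal, an identifier, or ")" means an operand just ended.
            ends_operand = prev_kind in ("literal", "identifier", "rparen")
            if symbol == "-":
                kind = "subtract" if ends_operand else "negate"
            else:
                kind = "add" if ends_operand else "plus"
            text = symbol
        elif symbol == "(":
            kind, text = "lparen", symbol
        elif symbol == ")":
            kind, text = "rparen", symbol
        else:
            kind, text = "operator", symbol
        tokens.append((kind, text))
        prev_kind = kind
    return tokens
```

With this, the “-” in `n = (2*4) -x` comes out as `subtract`, while the “-” in `i = a^-b` and in `x = x * - (3*1)` comes out as `negate`. Whitespace between the sign and its operand is skipped by the `\s*` in the pattern.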
Pretty much what I am doing.
so it boils down to this
A + or - symbol is considered to be a mathematical operator unless
it is preceded by ( or , or = or any mathematical operator (+ - / * ^ etc.) or any comparison operator (> < etc.)
“-” is unary if in its context it is equivalent to (-x)
It’s “binary” when it is preceded by another “expression”.
So, in “n = -x”, - is unary, and in “n = y -x” it should be binary.
In “n = (2*4) -x” it is also binary.
In “n = 2 * -x” it is unary, the same as “n = 2 * (-x)”.
which is what I just said I thought?
I didn’t read what I said in your words. What’s a “mathematical operator”? Both the unary and binary are.
What the operators are is defined in the context of the “invented language”. You make the rules.
But in a wide context, usually, a “-” is unary when it is NOT preceded by an expression ( (2*3), (2+a), …) or an identifier (a variable), that is, when there is no left-hand operand before the operator.
and then what do you do with
x = x * - 2
x = x * - (3*1)
You also have to skip whitespace IF you allow unary operators to have whitespace between them and the number/expression.
As far as I recall, parser generators like flex/bison don’t try to resolve this at the level of a “token” because it can be ambiguous (i.e. there’s no token for “expression”) and this should be part of the semantic analysis.
Compilers can be multi-pass (multiple steps, like a dumb lexical analysis followed by syntactic and semantic analysis) or have fewer steps, sometimes just one pass, because you constructed a smart analyzer that can keep context while going and infer things from the tree.
In the first pages of Niklaus Wirth’s “Compiler Construction” you find basic answers to all those questions: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
[quote=466826:@Norman Palardy]and then what do you do with
x = x * - 2
x = x * - (3*1)[/quote]
The first translates to x = x * y (y=-2) by the basic rudimentary rule I said, the second is ok also with x = x * y (y=-z, (z=3*1))
x = x * - (foo*bar)
There’s still no TOKEN for “expression”.
That’s usually left to the actual parser, NOT the tokenizer, to decide if the - is unary or not.
EDIT: and this has NOTHING to do with whether or not the compiler is single pass or multi-pass.
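To illustrate leaving the decision to the parser, here is a small recursive descent sketch in Python. The grammar, the class name `Parser`, and the node names (`neg`, `sub`, and so on) are assumptions made for this example; assignment and the ^ operator are left out to keep it short. The tokenizer emits a plain “-” every time, and the sign only becomes unary or binary depending on which grammar rule consumes it.

```python
import re

def lex(source):
    """Split into numbers, names, and single symbols; no unary/binary tagging here."""
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|\S", source)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # expression := term (('+' | '-') term)*
        # A sign seen here follows a complete operand, so it is binary.
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = ("add" if op == "+" else "sub", node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = ("mul" if op == "*" else "div", node, self.factor())
        return node

    def factor(self):
        # factor := ('+' | '-') factor | atom
        # A sign seen here sits where an operand is expected, so it is unary.
        if self.peek() in ("+", "-"):
            op = self.take()
            return ("pos" if op == "+" else "neg", self.factor())
        return self.atom()

    def atom(self):
        # atom := number | name | '(' expression ')'
        token = self.take()
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected )")
            return node
        if token is None or not re.match(r"\w", token):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

def parse(source):
    parser = Parser(lex(source))
    node = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

Here `parse("x * - (foo*bar)")` gives `("mul", "x", ("neg", ("mul", "foo", "bar")))`, and `parse("(2*4) - x")` gives `("sub", ("mul", "2", "4"), "x")`, without the tokenizer ever knowing what an “expression” is.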
The token “(” means “here comes an expression”.
One-pass compilers interleave lexical, syntactic, and semantic analysis.
( may not be the only lead in to an expression depending on the grammar for whatever language it is you’re parsing