Determine if a symbol is Unary or not

DaveS · December 10, 2019, 1:14am

In the context of a line of code the + and - symbols can denote two different situations.

A binary operation (ie. requires TWO arguments) X = 3 - 2
A Unary operation (ie. applies only to the next argument) X = -3

What I am trying to ensure is that I have all the situations that indicate a UNARY scenario.
This is what I have, and I THINK it covers everything… but I’m not 100% sure
Note : all these assume that whitespace is ignored

If the symbol preceding the +/- is a (
the +/- is 1st symbol on a line
the +/- is preceded by an operator (* / + -) or equal sign (=)
the +/- is preceded by a keyword but NOT a variable name
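
The four cases above can be sketched as a check on whatever text precedes the sign. This is only a minimal Python sketch, not the poster's code: the function name is_unary_sign, the KEYWORDS set and the operator set are all assumptions made up for illustration, and a real language would supply its own.

```python
import re

# Hypothetical keyword list; a real language would supply its own.
KEYWORDS = {"return", "if", "then", "else", "and", "or", "not", "mod"}
# Characters after which a sign is treated as unary (assumed operator set).
UNARY_LEADERS = set("(*/+-=")


def is_unary_sign(line: str, pos: int) -> bool:
    """Return True if the + or - at line[pos] is unary under the four rules."""
    if line[pos] not in "+-":
        raise ValueError("position does not point at + or -")
    before = line[:pos].rstrip()           # whitespace is ignored
    if not before:                         # first symbol on the line
        return True
    if before[-1] in UNARY_LEADERS:        # preceded by "(", an operator or "="
        return True
    word = re.search(r"[A-Za-z_]\w*$", before)
    if word:                               # keyword yes, variable name no
        return word.group(0).lower() in KEYWORDS
    return False                           # literal, ")" and so on: binary
```

For example, is_unary_sign("x = -3", 4) is True, while is_unary_sign("x = 3 - 2", 6) is False.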

anon93744516 · December 10, 2019, 2:05am

I guess you should add the edge case of when a calculation is broken into multiple lines by the _ line continuation operator?

For myself, I dislike this kind of syntax ambiguity when it comes to treating a positive value as a negative. But because there is no opposite function to Abs (which is true in many languages), I prefer to write variables in this case like (myPositiveNum * -1) rather than -myPositiveNum. Except of course when I’m writing a literal negative value, e.g., -3.

You get the same result with (myPositiveNum * -1), plus it stands out more in the code when you’re reading it.

I got caught the other day when I was reading some example code in the forum and totally missed the use of inverse assignment on a variable name. Had me really confused for a bit.

DaveS · December 10, 2019, 2:12am

Well, the ambiguity exists… and I’m trying to “remove” it by examining the context in which it exists.

And your suggestion of “_” I have covered with the EOL case

anon20074439 · December 10, 2019, 5:59am

Just looking down my test list for my reformat code script:

i = -(a+b)
a(a,-1)
i = a^-b
(I assume you have all operators covered?)

Robert_Weaver · December 10, 2019, 8:07am

Do you need to handle comma separators in multi parameter functions?
Such as:
y=Atan2(a,-b)

Rick_Araujo · December 10, 2019, 11:45am

Create a lexical analyzer. What you called “symbol” is an operator, if the previous token was a literal or an identifier or an expression ( “)” ), the “-” is “subtract token” otherwise consider it a “negate token”.
https://www.rosettacode.org/wiki/Compiler
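
A rough Python sketch of that previous-token rule, under some loud assumptions: the token kinds (NUM, ID, SYM, ADD, SUB, PLUS, NEG) are names invented here, every identifier is treated as an operand (keywords are not handled), and the number and identifier patterns are deliberately simplistic.

```python
import re

# number | identifier | any other single non-space character
TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(\S)")


def tokenize(src: str) -> list[tuple[str, str]]:
    """Split src into (kind, text) pairs, tagging each sign as binary or unary."""
    tokens = []
    after_operand = False  # True after a literal, an identifier or ")"
    for number, ident, symbol in TOKEN_RE.findall(src):
        if number:
            tokens.append(("NUM", number))
            after_operand = True
        elif ident:
            tokens.append(("ID", ident))
            after_operand = True
        elif symbol in "+-":
            if after_operand:   # something to the left: binary
                kind = "ADD" if symbol == "+" else "SUB"
            else:               # nothing to operate on: unary
                kind = "PLUS" if symbol == "+" else "NEG"
            tokens.append((kind, symbol))
            after_operand = False
        else:
            tokens.append(("SYM", symbol))
            after_operand = symbol == ")"
    return tokens
```

With this, the "-" in tokenize("n = 2 * -x") comes out as NEG and the one in tokenize("n = (2*4) -x") as SUB. Whitespace never reaches the token list, so "x * - 2" is handled the same way as "x * -2".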

DaveS · December 10, 2019, 3:05pm

Pretty much what I am doing.

so it boils down to this
A + or - symbol is considered to be a mathematical operator unless

it is preceded by ( or , or = or any mathematical operator (+ - / * ^ etc) or any comparison operator (> < etc)

Rick_Araujo · December 10, 2019, 5:35pm

“-” is unary if in its context it is equivalent to (-x)
it’s “binary” when it is preceded by another “expression”.
So, in “n = -x”, - is unary, and in “n = y -x” it should be binary.
In “n = (2*4) -x” it is also binary.
In “n = 2 * -x” it is unary, the same as “n = 2 * (-x)”.

DaveS · December 10, 2019, 5:37pm

which is what I just said I thought?

Rick_Araujo · December 10, 2019, 5:40pm

I didn’t read what I said in your words. What’s a “mathematical operator”? Both the unary and binary are.

Rick_Araujo · December 10, 2019, 5:43pm

What the operators are is defined in the context of the “invented language”. You make the rules.
But in a wider context, usually, a “-” is unary when NOT preceded by an expression ( (2*3), (2+a), …) or an identifier (a variable); that is, when there is no left-hand operand before the operator.

Norman_Palardy · December 10, 2019, 5:49pm

and then what do you do with

x = x * - 2
or
x = x * - (3*1)

you also have to skip whitespace etc. IF you allow unary operators to have whitespace between them & the number/expression

As far as I recall, parser generators like flex/bison don’t try to resolve this at the level of a “token” because it can be ambiguous (i.e. there’s no token for “expression”) and this should be part of the semantic analysis

see https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)
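
To illustrate leaving the decision to the parser, here is a small recursive-descent sketch in Python. It is not how flex/bison work internally; the grammar, the precedence choices (unary binds tighter than * and /, and ^ accepts a signed right-hand side) and the node names "neg" and "pos" are all assumptions. It parses only the expression to the right of the "=". The sign is unary simply because the grammar expects an operand at that point.

```python
import re

TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


class Parser:
    def __init__(self, src: str):
        self.tokens = TOKEN_RE.findall(src)  # whitespace is dropped here
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):      # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):   # a sign after an operand: binary
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):      # term := unary (("*" | "/") unary)*
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.unary())
        return node

    def unary(self):     # unary := ("+" | "-") unary | power
        if self.peek() in ("+", "-"):      # a sign where an operand is due: unary
            op = self.take()
            return ("neg" if op == "-" else "pos", self.unary())
        return self.power()

    def power(self):     # power := primary ("^" unary)?
        node = self.primary()
        if self.peek() == "^":
            self.take()
            node = ("^", node, self.unary())
        return node

    def primary(self):   # primary := NUMBER | IDENT | "(" expr ")"
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected )")
            return node
        if tok[0].isdigit() or tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src: str):
    """Parse an expression into nested tuples such as ("*", "x", ("neg", "2"))."""
    parser = Parser(src)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Here parse("x * - 2") gives ("*", "x", ("neg", "2")) and parse("(2*4) -x") gives ("-", ("*", "2", "4"), "x"), without any token-level lookbehind.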

Rick_Araujo · December 10, 2019, 6:16pm

Compilers can be multi-pass (multiple steps, like a dumb lexical analysis followed by syntactic and semantic analysis) or have fewer steps, sometimes just one pass, because you constructed a smart analyzer that can keep contexts while going and infer things from the tree.

In just the first pages of Niklaus Wirth’s “Compiler Construction” you find basic answers to all those questions: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

Rick_Araujo · December 10, 2019, 6:23pm

[quote=466826:@Norman Palardy]and then what do you do with

x = x * - 2
or
x = x * - (3*1)[/quote]

The first translates to x = x * y (with y = -2) by the basic, rudimentary rule I described; the second is also fine, as x = x * y (with y = -z and z = 3*1).

Norman_Palardy · December 10, 2019, 6:49pm

here then

x = x * - (foo*bar)

there’s still no TOKEN for “expression”
that’s usually left to the actual parser, NOT the tokenizer, to decide if the - is unary or not

EDIT : and this has NOTHING to do with whether or not the compiler is single pass or multi-pass

Rick_Araujo · December 10, 2019, 8:25pm

The token “(” means “here comes an expression”.

One-pass compilers interleave lexical, syntactic and semantic analysis.

https://en.wikipedia.org/wiki/One-pass_compiler

Norman_Palardy · December 10, 2019, 9:33pm

( may not be the only lead in to an expression depending on the grammar for whatever language it is you’re parsing

Rick_Araujo · December 11, 2019, 12:28pm

Correct.