Stack operations using arrays

I need to do some symbol manipulation to evaluate some expressions. Arrays with the array.append and array.pop methods seem to be the logical choice for performing stack operations, but an alternative would be to do it using strings. Just wondering if anyone can provide speed comparison info. Thanks.

Edit: To provide a bit more info, my application needs to evaluate about 40 million expressions (average of 20 characters per expression, 8 of which are operators) per run. My present method which is all string based, takes about 80 seconds to do this. However, due to some changes in the input data format, I will need to redo my code, and wondering whether I should completely redo things using arrays.

google “Shunting Yard Algorithm”
This is a very fast method to evaluate expressions… you will have to customize it some depending on what datatypes, functions and so on you want to support, but it is not that complicated.

I have used it in various projects, including “RetroBasic” which is a “TRS80” like emulator written 100% in Xojo.

it uses TWO stacks (arrays), to convert an Infix expression into an RPN type expression, which can then very easily be “processed”

Arrays will almost always be faster than string manipulation.

Thanks. Since the input expression arrives as a string, it became a question of whether I would recover the time taken by completely tokenizing it into an array, and then processing it with array operations, compared to leaving it as the original string and using string functions. Especially since there are some rather nice string functions such as replaceall(), countfields() and nthField(), that get a lot of work done, and save a lot of programming headaches.

Up until now, I’ve been able to take advantages of certain constraints in the input expression (eg., no parentheses) to use a much simpler evaluation method allowing the expression to be quickly processed left to right. The change in input format may require me to switch over to a stack based method though. BTW, the expressions are machine generated and are guaranteed to be syntactically correct, so there’s no need to error check them.

After Dave mentioned it, I remembered that I implemented a stack based expression evaluator in an assembly language IDE that I wrote a few years back. I guess I’ll have to resurrect it and see how fast it is.

why not just a xojo script and then you can process any kind of expression that it can handle ?

even if you have symbolic values you can shove those in before the expression as local variables, tack on the expression and run the whole mess as a script to get output

might mean less work & give you other opportunities to provide additional functionality

Except that as I mentioned in my first post, the application needs to evaluate about 40 million expressions per run. For each one of those, the xojoscript would have to be compiled before running. I have my doubts that it would be faster.

40 million different expressions?
or a handful of equations with 40 million possible sets of datum?

if the 2nd, code the equation into functions, and pass the data to them …

If the first… there is NOTHING that will give you a speed increase
but if you have parends, you can no longer go left to right (and even THAT might have violated precedence rules), but the algorithm I showed you handles the () to as many levels as there might be and DOES turn it into a left to right sequence that does NOT violate the precedence rules.

The only real way to know is to build and time the possibilities. I had a situation where an algorithm could be done with a Dictionary or Array or MemoryBlock. Each had strengths in different parts and I hemmed and hawed on which strength would win out but it wasn’t until implementing all 3 versions that I could say (turned out the one I thought had the least chance was fastest).

Then, profiling how each version performed in the various phases of the algorithm, I get a better sense of their tradeoffs and can better intuit where a Dictionary might be speediest. But if I really want to know and have doubts I need to actually time any entertained possibilities.

And you don’t need to necessarily implement the whole thing, just enough of some critical part to have a concrete measure of how one way will be faster than the other.

This is sort of what I expected, but thought I should get as much input as possible before spending a lot of time running up some blind alley.

[quote=282324:@Dave S]40 million different expressions?
or a handful of equations with 40 million possible sets of datum?
[/quote]
Yes, 40 million unique expressions, not just different data. Actually, in many cases it will be 80 million. However, I was in error when I said that is what the current application is solving. In fact, it’s only about 4 million expressions per run currently. But the updated one will be either 40 million or 80 million, depending on certain criteria.

To elaborate, this is all about solving an arithmetic puzzle. As we input more years into Moore’s law, we receive in return the ability to crunch ridiculously more numbers, and things that we wouldn’t have dreamed of trying a few years ago, are completely practical now. This particular one is a puzzle proposed some time ago:

Given the string of digits: 0123456789, the problem is to insert the arithmetic operators + - * / wherever and as often as necessary to create an expression that evaluates to a certain given value. For example let’s say the given value is the current year 2016. Then the possible solutions are:
0-1-2+3/45678+9
0
1+23456-7-8-9
0/1+2
3456-7-8-9
0+123456-7-8-9
0-1-2+3+45678/9
0+1+2-3+45678/9
0123+45678/9
0/1
23+45678/9
0
1/23+45678/9
0/1/2
3+45678/9
0123+45678/9
0/123+45678/9
0
12/3+45678/9
0/1
2/3+45678/9
01/2/3+45678/9
0/1/2/3+4
5678/9
0
12/3+45678/9
0/12/3+45678/9
0123+45678/9
0/123+45678/9
0
1/23+45678/9
0/1/23+45678/9
0123+45678/9
0/123+4
567*8/9

And for the digit string: 9876543210, the possible solutions are:

9-8+7/65432-1+0
9876-54321+0
9876-5432/1+0
9-8+7/6
5432-1-0
98
76-54321-0
98
76-5432/1-0
9876-5432+10
9876-5432-10
98/7/6/543210
–9+8+7/65432+1+0
–9+8+7/65432+1-0
–9-8+7654+3+210

In my current version of the program, it takes about 80 seconds to generate all of the possible solutions for a given digit string and target value. It does this by the inelegant method of generating every possible permutation of syntactically correct expressions, and then evaluating them to see if they are equal to the target value. However, there are certain target values for which there are no solutions (for example there is no solution for the digit sequence 0123456789 and the target value 3897). So, I’ve been contemplating the possibility of allowing for slightly more complicated expressions which would include parentheses, and other operators such as exponentiation, and factorial. In order to do this I would need to rework the current expression evaluator.

One of the advantages of having the program generate all of the expressions is that I can invent my own syntax where it would be more efficient, as long as it can be easily translated back into infix for final display.

This is the sort of thing that I think about when I’m supposed to be working. But I rationalize it by recognizing that whatever I learn from this exercise, it may come in handy for some serious work later on.

Have you thought of building a tree of subsolutions recursively?

Root: empty Children: 0, 01, 0123, 01234, 012345, 0123456, 01234567, 012345678, 0123456789 Children for each child: + 1, - 1, *1, /1, +12, -12, *12, /12, +123, -123, ... ...
You would need a tree item class along this:

Class TreeItem Property Value As Integer Property ValueRepresentation As String Property OperatorRepresentation As String Property Result As Double // The result of the calculation from the root to this level Property Parent As TreeItem Property Children() As TreeItem End
Then you would recursively step down from the root to the deepest level. When you encounter the 9, you check if the Result property equals to 2016. If true add the item to a “valid results array”. At the end you’ll have an array with all these valid results. For each result you then can construct a string “0 + 1 / 2 * 3…” by walking up via the Parent property.

The whole thing should run in very few seconds (< 5 seconds). The whole thing could then be tweaked (examples: replacing the operator strings with an enumeration, getting rid of the Children property which is probably not needed at all, etc.).

Eli, your suggestion of creating a sub-solution tree has prompted me to take step back and review the entire application. I’m now thinking that I will convert my infix expression generator into a postfix expression generator. I had originally rejected this idea because I had incorrectly assumed that the eventual conversion of successful expressions to infix might reverse the order of some digits. I see now that they will always remain the same. So, working completely in postfix would be faster and would avoid the need for a tree structure. Sub-solutions could then be stored on a stack.

More later…

don’t forget you cannot solve an infix equation from left to right (sometimes)

the correct answer is 3.66667

  • 2/3 = .6667
  • x 4 = 2.6667
    • 1 = 3.6667

but if you solve it left to right you get 4

  • 1+2 = 3
  • / 3 = 1
    • 4 = 4

and that algorithm I keep touting, figures out the correct order (ie. the top solution) it converts it from

to

[quote] 2/3*4+1 [/quote] which CAN be solved correctly left to right … it also handles ()

but if you only have 0 thru 9, + / - * and no other symbols, functions or operators a much simpler method can be used,

  • break into tokens
  • scan tokens for /, if found take left and right, divde and replace all 3 tokens with the answer
  • continue until all / tokens are gone
  • do same for *
  • do same for + and - (both of these can be done in the same pass as they have the same precedence)
  • once all operators are gone, all the is left is “the answer”

if first token is “-” or “+” remove it and apply as the last step, its the only place a unary operator could appear

  • 1+2/3+4
  • becomes [1] [+] [2] [/] [3] [*] [4]
  • becomes [1] [+] [0.66667] [*] [4]
  • becomes [1] [+] [2.6667]
  • becomes [3.66667]

When I said that I was solving infix from left to right, I oversimplified it. What I was doing was splitting the expression into terms wherever a plus or minus sign occurred, and then solved each of those terms, which at that point would consist of only multiply or divide operations. Tt’s a very fast way to solve these expressions as long as there are no parentheses or operators other than + - * /.

that is the method I suggested… BUT you must do the / first, then the * then the + and -
as I illustrated above, or risk a totally wrong answer.

The idea of using a tree structure is to reduce the amount of solutions which need to be calculated.

An example with 123 as possible digits and ±*/ as operators:

[code]Solution no tree with tree
123 0 0
12 + 3 1 1
12 - 3 1 1
12 * 3 1 1
12 / 3 1 1
1 + 23 1 1
1 - 23 1 1
1 * 23 1 1
1 / 23 1 1
1 + 2 + 3 2 1 // 1 + 2 is already calculated above
1 + 2 - 3 2 1 // “”
1 + 2 * 3 2 1 // “”
1 + 2 / 3 2 1 // “”
1 - 2 + 3 2 1 // 1 - 2 is already calculated above
1 - 2 - 3 2 1 // “”
1 - 2 * 3 2 1 // “”
1 - 2 / 3 2 1 // “”
1 * 2 + 3 2 1 // 1 * 2 is already calculated above
1 * 2 - 3 2 1 // “”
1 * 2 * 3 2 1 // “”
1 * 2 / 3 2 1 // “”
1 / 2 + 3 2 1 // 1 / 2 is already calculated above
1 / 2 - 3 2 1 // “”
1 / 2 * 3 2 1 // “”
1 / 2 / 3 2 1 // “”

Calculations 40 24[/code]
And some calculations are especially heavy, like divisions (maybe even the integer one \) or square root. So to bring down time to calculate all the solutions the way to go is really to eliminate re-calculation.

why not precaluclate all the “pairs”, replace them in the equations ( in precedence order), and solve the remaining

After many many many false starts, I finally got this thing working tonight. I can’t recall the last time I had so much trouble getting a programming problem to work. Several days of brain fog certainly didn’t help. I guess this is what I have to look forward to as I get older and dementia sets in.

As I mentioned in my last post, I decided to work entirely in postfix. So, all permutations of expressions are generated in postfix and evaluated as they are generated. Then, only the successful solutions are converted to infix. This is much more efficient, and I’m able to reuse partial calculations of previous terms without having to use a tree structure.

As a result of working in postfix, the permutations of expressions can and will be more general. Whereas the previous expression generator would only generate about 4 million permutations, the new one can generate a whole lot more. The reason for this is that the previous infix expression generator would never use parentheses, while the new generator will create all kinds of expressions which must be parenthesized for the final infix conversion. Consequently, I’ve added some limits in the program to exit after it finds a specified number of solutions. Even so, it generates and evaluates expressions about 25 times faster.

A side benefit of this is that the program is flexible enough that I can easily add more operations such as exponentiation, factorial, etc. Of course, that means even more permutations, but if I set the program to exit after finding 1000 solutions, it’s still faster than the old program.

I still need to fix a couple of bugs in the postfix to infix converter, because it fails to add parentheses in some situations. I know what the problem is. I just have to find the easiest way to fix it.

Not sure if anyone’s interested, but I can post the code once I get it cleaned up.

Please do. Sounds interesting and I’d like to see what you’re actually doing :slight_smile:

This is a polynomial problem. Are you sure that you have checked more clever solutions than a crude calculation of all possibilities?

Shaking head… either you reinvented the “Shunting Yard Algorithm”, or you created a similar version that is not as comprehensive, but I’m sure it had its educational benefits