Tag Archives: Parsing

Infix to Postfix – but Better

Summary

There are many examples of the shunting algorithm that takes infix and converts to postfix (or RPN).  None of them did everything I wanted.  I wanted something full featured sort of like excel.  I wanted:

  1. Negative numbers need to be handled and they are often ignored in converters.
  2. Support single and multi character operations (i.e. <<, >>).
  3. I wanted support for multiple argument functions and even no argument functions.
  4. I wanted solid error messages and an indication where the error was.
  5. I wanted an optimization step where constants are evaluated and reduced prior to execution.
  6. You could name your variables anything you wanted.

The code is here:  InfixParse

Motivation

The motivation was a project where we need a think PC client to process an infix equation and translate this down to a set of opcodes and data for a small processor (memory/horsepower).

The first step is getting the conversation to work in high level constructs.

The second step which will happen later was to convert the high level language constructs to a set of opcodes and data.

Background

There are numerous examples of equation processing but I never found any that did everything sort of like excel did.

This one is pretty good: http://interactivepython.org/runestone/static/pythonds/BasicDS/InfixPrefixandPostfixExpressions.html

Maybe someone else did a full implementation but I never found it.

Tools

PyCharm and Python 3.6.

Definitions

operator – This is a symbol such as +, -, *, /.  It can be more than one character.  The ( is also an operator technically.

function – This is a piece of operation in the form of FUNC(…) with a certain number of arguments defined for that function.  Each argument is separated by a comma.  If there are no arguments (for example, RND for random), then no parenthesis are allowed.  Functions can have zero to “n” arguments.

variable – This is a textual name for an variable (or input) in the equation.  This can be one or more characters.

constant – This is a number in the infix equation.

Steps

As we work through the steps, refer to this equation:

2 * (1 + -3) + abs(-round(rnd, 2))

In text, this is 2 times the sum of 1 and a -3 + the absolute value of a random number rounded to the 2nd decimal place and negated.

The first step is to convert the infix to a set of stack items.  We are effectively splitting on operators.  Then each stack item is evaluated for the type of information between the operators.  After this step, the infix stack looks like this.

[<2, constant>, <*, operator>, <(, operator>, <1, constant>, <+, operator>, <-, operator>, <3, constant>, <), close>, <+, operator>, <ABS, function>, <(, operator>, <-, operator>, <ROUND, function>, <(, operator>, <RND, function>, <,, operator>, <2, constant>, <), close>, <), close>]

Notice there are two subtractions operators that really should be negation.  So the second step is to determine if there is a valid operator in front of the subtraction and if a function, constant, or variable is immediately after it.  After this check, here is the infix stack.  (I used the ! as a negation symbol.)

[<2, constant>, <*, operator>, <(, operator>, <1, constant>, <+, operator>, <!, operator>, <3, constant>, <), close>, <+, operator>, <ABS, function>, <(, operator>, <!, operator>, <ROUND, function>, <(, operator>, <RND, function>, <,, operator>, <2, constant>, <), close>, <), close>]

The next 3 steps are to check the validity of the infix equation.  First, we check to make sure that we do not have operators next to each other.  Next, we check for mismatched parenthesis.  Finally we check for the correct number of arguments in functions.

if anything failure, the python script will tell where the error occurred and what the error was.  This can certainly be beautified into  whatever final UI is chosen.  The information is there.

If there are no errors, the stack is shunted into a postfix stack and the commas and parenthesis are removed.

[<2, constant>, <1, constant>, <3, constant>, <!, operator>, <+, operator>, <*, operator>, <RND, function>, <2, constant>, <ROUND, function>, <!, operator>, <ABS, function>, <+, operator>]

Now, there are constants that can be combined.  While this is a nonsense, there are cases where a couple of constants can be combined.  This example simplifies to this.

[<-4.0, constant>, <RND, function>, <2, constant>, <ROUND, function>, <!, operator>, <ABS, function>, <+, operator>]

Executing this yields a result between negative 4 and negative 3 such as -3.81 or -3.66.

Future

This code will likely be translated into C++ with the functions in C so that it can be shared on the embedded system.   The functions used for optimization and execution must be the exact same and this is a case where we will want to share source.

The python script could be better organized into several modules but this is a sample for the final run which will be in likely C/C++.