Parsing mathematical expressions with pyparsing

Question (votes: 0, answers: 2)

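I'm trying to parse a mathematical expression using pyparsing. I know I could just copy the example calculator from the pyparsing site, but I want to understand it so that I can extend it later. I'm here because I tried to understand that example and couldn't, so I did my best on my own and got this far: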

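import pyparsing as pp

# Assumption: `number` was not shown in the question; the integer tokens in
# the output below suggest a definition along these lines.
number = pp.pyparsing_common.integer

symbol = (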
    pp.Literal("^") |
    pp.Literal("*") |
    pp.Literal("/") |
    pp.Literal("+") |
    pp.Literal("-")
)
operation = pp.Forward()
atom = pp.Group(
    pp.Literal("(").suppress() + operation + pp.Literal(")").suppress()
) | number
operation << (pp.Group(number + symbol + number + pp.ZeroOrMore(symbol + atom)) | atom)
expression = pp.OneOrMore(operation)


print(expression.parseString("9-1+27+(3-5)+9"))

This prints:

[[9, '-', 1, '+', 27, '+', [[3, '-', 5]], '+', 9]]

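It does work. However, I want operator precedence, with everything sorted into nested Groups, and after many attempts I could not find a way to do it. More or less like this: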

[[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]

I want to keep it looking like an AST, because I would like to generate code from it.

I did see operatorPrecedence (a class, similar to Forward?), but I don't think I understand how it works either.

Edit:

Digging deeper into operatorPrecedence, I got this:

expression = pp.operatorPrecedence(number, [
    (pp.Literal("^"), 2, pp.opAssoc.RIGHT),  # binary operator, so numTerms is 2
    (pp.Literal("*"), 2, pp.opAssoc.LEFT),
    (pp.Literal("/"), 2, pp.opAssoc.LEFT),
    (pp.Literal("+"), 2, pp.opAssoc.LEFT),
    (pp.Literal("-"), 2, pp.opAssoc.LEFT)
])

It doesn't handle parentheses (I don't know whether I need to post-process the result), and I need them handled.

python-3.x pyparsing
2 Answers
14 votes

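The actual name for this parsing problem is "infix notation" (and in recent versions of pyparsing, I have renamed operatorPrecedence to infixNotation). To see a typical implementation of infix notation parsing, look at the fourFn.py example on the pyparsing wiki. There you will see an implementation of this simplified BNF for 4-function arithmetic, with precedence of operations: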

operand :: integer or real number
factor :: operand | '(' expr ')'
term :: factor ( ('*' | '/') factor )*
expr :: term ( ('+' | '-') term )*

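Thus, an expression is one or more terms separated by addition or subtraction operations.

A term is one or more factors separated by multiplication or division operations.

A factor is either a lowest-level operand (in this case, just an integer or a real number), or an expr enclosed in ()'s.

Note that this is a recursive parser, since factor is used indirectly in the definition of expr, but expr is also used to define factor.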
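To make the recursion concrete before turning to pyparsing, here is a minimal hand-written sketch of the same BNF in plain Python. It is not taken from fourFn.py; the names tokenize and parse are illustrative, and unlike the pyparsing grammar it nests every binary operation into its own [left, op, right] list.

```python
import re

# One token: a real, an integer, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+\.\d*|\.\d+|\d+|[-+*/()])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # factor :: operand | '(' expr ')'
        token = peek()
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise ValueError(f"expected a number or '(', got {token!r}")
        advance()
        if token == "(":
            node = expr()  # recursion: factor refers back to expr
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        return float(token) if "." in token else int(token)

    def term():
        # term :: factor ( ('*' | '/') factor )*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = [node, op, factor()]
        return node

    def expr():
        # expr :: term ( ('+' | '-') term )*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = [node, op, term()]
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree

print(parse("3+2*4"))
print(parse("(3+2)*4"))
```

Each BNF rule becomes one function, and the while loops play the role of the ( ... )* repetitions.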

In pyparsing, this looks roughly like the following (assuming that integer and real have already been defined):

from pyparsing import Forward, Group, Suppress, ZeroOrMore, oneOf

LPAR,RPAR = map(Suppress, '()')
expr = Forward()
operand = real | integer
factor = operand | Group(LPAR + expr + RPAR)
term = factor + ZeroOrMore( oneOf('* /') + factor )
expr <<= term + ZeroOrMore( oneOf('+ -') + term )

Now, using expr, you can parse any of these:

3
3+2
3+2*4
(3+2)*4
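For anyone who wants to run the grammar as-is, here is a self-contained sketch. The answer leaves integer and real undefined, so this version assumes the ready-made pyparsing_common.integer and pyparsing_common.real expressions, which convert their matches to int and float.

```python
from pyparsing import (
    Forward, Group, Suppress, ZeroOrMore, oneOf, pyparsing_common,
)

# Assumption: stand-ins for the answer's undefined `integer` and `real`.
integer = pyparsing_common.integer
real = pyparsing_common.real

LPAR, RPAR = map(Suppress, "()")
expr = Forward()
operand = real | integer
factor = operand | Group(LPAR + expr + RPAR)
term = factor + ZeroOrMore(oneOf("* /") + factor)
expr <<= term + ZeroOrMore(oneOf("+ -") + term)

for text in ["3", "3+2", "3+2*4", "(3+2)*4"]:
    # parseAll=True rejects trailing input that the grammar cannot match
    print(text, "->", expr.parseString(text, parseAll=True).asList())
```

Only the parenthesized part is wrapped in a Group here, so the last input should come back as [[3, '+', 2], '*', 4] while 3+2*4 stays a flat token list. The grammar enforces precedence while matching, but it does not record it in the result structure.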

The infixNotation pyparsing helper method takes care of all the recursive definitions and grouping, and lets you define this as:

expr = infixNotation(operand,
        [
        (oneOf('* /'), 2, opAssoc.LEFT),
        (oneOf('+ -'), 2, opAssoc.LEFT),
        ])

But this hides all the underlying theory, so if you want to understand how it is implemented, look at the raw solution in fourFn.py.
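One detail relevant to the question: infixNotation groups each precedence level, but a run of operators at the same level comes back as one flat group such as [9, '-', 1, '+', 27]. The sketch below shows one possible way to get the fully nested, AST-like shape the question asks for, by attaching a parse action to each level. The helper name nest_left is made up for this example, and pyparsing_common.integer is assumed as the operand.

```python
from pyparsing import infixNotation, oneOf, opAssoc, pyparsing_common

def nest_left(tokens):
    """Fold a flat [a, op, b, op, c] group into [[a, op, b], op, c]."""
    flat = tokens[0]  # the single group produced for this precedence level
    node = flat[0]
    for i in range(1, len(flat), 2):
        node = [node, flat[i], flat[i + 1]]
    return [node]

operand = pyparsing_common.integer  # assumption: integers only
expr = infixNotation(
    operand,
    [
        (oneOf("* /"), 2, opAssoc.LEFT, nest_left),
        (oneOf("+ -"), 2, opAssoc.LEFT, nest_left),
    ],
)

result = expr.parseString("9-1+27+(3-5)+9", parseAll=True).asList()
print(result[0])
```

Parentheses need no extra handling here, because infixNotation accepts a parenthesized sub-expression wherever an operand may appear.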

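[EDIT - Dec 18, 2022] For those looking for a pre-defined solution, I have packaged infixNotation up into its own pip-installable package, called plusminus. plusminus defines a BaseArithmeticParser class for creating a ready-to-run parser and evaluator that supports these operators: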

  **   ÷   >=  ∈  in   ?:
  *    +   ==  ∉  not  |absolute-value|
  //   -   !=  ∩  and
  /    <   ≠   ∪  ∧
  mod  >   ≤   &  or
  ×    <=  ≥   |  ∨

And these functions:

  abs    ceil   max
  round  floor  str
  trunc  min    bool

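The BaseArithmeticParser class lets you define additional operators and functions for your own domain-specific expressions, and the examples show how to define parsers with custom functions and operators for dice rolling, retail price discounting, and more.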


0 votes

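Late to the party, but I have been learning as I go and trying to develop an equation parser that handles Pint quantities, numpy arrays, numpy functions, and plain numbers. I am not satisfied with the way functions such as np.max are handled. It works, but I feel there must be a cleaner way to handle the parsing and processing of functions in equations. I would appreciate any constructive criticism.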

import numpy as np
from pint import UnitRegistry
import pyparsing as pp

pp.ParserElement.enablePackrat()
ureg = UnitRegistry()

class EquationEvaluator:
    def __init__(self, ureg):
        self.ureg = ureg
        self.setup_parser()
        self.variables = {}

    def setup_parser(self):
        # Define the grammar for parsing
        lparen = pp.Literal("(").suppress()
        rparen = pp.Literal(")").suppress()

        self.equation = pp.Forward()

        # Define basic elements
        integer =  pp.pyparsing_common.integer
        number = pp.pyparsing_common.number
        variable = pp.Word(pp.alphas, pp.alphas + pp.nums + '_')

        # Define units (you can customize this list as needed)
        unit = pp.oneOf(list(self.ureg._units.keys()), asKeyword=True)

        # Define quantity
        quantity = pp.Group(number + unit).setParseAction(self.parse_quantity)

        # Allow numeric literals and variables to be part of the quantity
        numeric = pp.Group(integer).setParseAction(lambda t: float(t[0]) * self.ureg.dimensionless)
        variable_quantity = variable.setParseAction(self.parse_variable)

        arg = pp.Forward() | number | variable
        args = pp.delimited_list(arg)

        functions_1 = list(['COS', 'SIN', 'ARCSIN', 'ARCCOS', 'ARCTAN2', 'TAN'])
        functions_2 = list(['ABS', 'MAX', 'MIN', 'SUM', 'LOG', 'LOG10', 'EXP'])
        self.functions = functions_1 + functions_2
        functions =  pp.oneOf(self.functions, asKeyword=True) + pp.Group(lparen +  pp.OneOrMore(args) + rparen)

        self.constants = list(['PI', 'E', 'C'])
        constants = pp.oneOf(self.constants)

        # Define expressions
        # self.equation <<= pp.infixNotation(variable_quantity | quantity | numeric,
        operand = constants | functions | variable_quantity | number | integer

        funcop = pp.oneOf(self.functions)

        expop = pp.Literal("^")
        signop = pp.oneOf("+ -")
        multop = pp.oneOf("* / %")
        plusop = pp.oneOf("+ -")
        factop = pp.Literal("!")

        # To use the infixNotation helper:
        #   1.  Define the "atom" operand term of the grammar.
        #       For this simple grammar, the smallest operand is either
        #       an integer or a variable.  This will be the first argument
        #       to the infixNotation method.
        #   2.  Define a list of tuples for each level of operator
        #       precedence.  Each tuple is of the form
        #       (opExpr, numTerms, rightLeftAssoc, parseAction), where
        #       - opExpr is the pyparsing expression for the operator;
        #          may also be a string, which will be converted to a Literal
        #       - numTerms is the number of terms for this operator (must
        #          be 1 or 2)
        #       - rightLeftAssoc is the indicator whether the operator is
        #          right or left associative, using the pyparsing-defined
        #          constants opAssoc.RIGHT and opAssoc.LEFT.
        #       - parseAction is the parse action to be associated with
        #          expressions matching this operator expression (the
        #          parse action tuple member may be omitted)
        #   3.  Call infixNotation passing the operand expression and
        #       the operator precedence list, and save the returned value
        #       as the generated pyparsing expression.  You can then use
        #       this expression to parse input strings, or incorporate it
        #       into a larger, more complex grammar.
        #
        self.equation <<= pp.infixNotation(
            operand,
            [
                (funcop, 1, pp.opAssoc.RIGHT),
                (expop, 2, pp.opAssoc.RIGHT),
                # (signop, 1, pp.opAssoc.RIGHT),
                (multop, 2, pp.opAssoc.LEFT),
                (plusop, 2, pp.opAssoc.LEFT),
            ],
        )
        return None
    def parse_quantity(self, tokens):
        """Convert parsed tokens to Pint Quantity."""
        value, unit = tokens[0]
        return float(value) * self.ureg(unit)

    def parse_variable(self, tokens):
        """Retrieve Pint Quantity from variable name."""
        var_name = tokens[0]
        if var_name in self.variables:
            return self.variables[var_name]
        else:
            raise ValueError(f"Variable '{var_name}' not defined.")

    def define_variable(self, name, quantity):
        """Define a variable with a Pint Quantity."""
        self.variables[name] = quantity

    def evaluate(self, expression):
        """Evaluate the mathematical expression with quantities and return a Pint Quantity."""
        try:
            parsed_expression = self.equation.parseString(expression, parseAll=True)

            # there is probably a much better way to handle functions versus operations

            if str(parsed_expression[0]) in self.functions:
                # this is valid for functions
                computed = self.compute(parsed_expression)
            else:
                # this is used for operations
                computed = self.compute(parsed_expression[0])
            return computed

        except Exception as e:
            print('')
            return str(e)
    def compute(self, tokens):
        """Recursively compute the result of parsed tokens."""
        if isinstance(tokens, pp.results.ParseResults):
            if len(tokens) == 1:
                return tokens[0]  # Base case for a single quantity or variable

            elif len(tokens) == 2:
                if tokens[0] == 'COS':
                    arg_vall = self.compute(tokens[1])
                    value = np.cos(arg_vall)
                    return value
                elif tokens[0] == 'SIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.sin(arg_vall)
                    return value
                elif tokens[0] == 'TAN':
                    arg_vall = self.compute(tokens[1])
                    value = np.tan(arg_vall)
                    return value
                elif tokens[0] == 'ARCTAN2':
                    arg_val_1 = self.compute(tokens[1][0])
                    arg_val_2 = self.compute(tokens[1][1])
                    value = np.arctan2(arg_val_1, arg_val_2)
                    return value
                elif tokens[0] == 'ARCSIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.arcsin(arg_vall)
                    return value
                elif tokens[0] == 'ARCCOS':
                    arg_vall = self.compute(tokens[1])
                    value = np.arccos(arg_vall)
                    return value
                elif tokens[0] == 'ABS':
                    arg_vall = self.compute(tokens[1])
                    value = np.abs(arg_vall)
                    return value
                elif tokens[0] == 'MAX':
                    arg_vall = self.compute(tokens[1])
                    value = np.max(arg_vall)
                    return value
                elif tokens[0] == 'MIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.min(arg_vall)
                    return value
                elif tokens[0] == 'SUM':
                    arg_vall = self.compute(tokens[1])
                    value = np.sum(arg_vall)
                    return value
                elif tokens[0] == 'LOG':
                    arg_vall = self.compute(tokens[1])
                    value = np.log(arg_vall)
                    return value
                elif tokens[0] == 'LOG10':
                    arg_vall = self.compute(tokens[1])
                    value = np.log10(arg_vall)
                    return value
                elif tokens[0] == 'EXP':
                    arg_vall = self.compute(tokens[1])
                    value = np.exp(arg_vall)
                    return value

            # the following handles any odd number (>1) of tokens
            else:
                n_tokens = len(tokens)
                left = self.compute(tokens[0])
                for index in range(1, n_tokens, 2):
                    operator = tokens[index]
                    right = self.compute(tokens[index + 1])
                    left = self.compute_value(left, right, operator)
                return left
        else:
            if isinstance(tokens, str):
                if tokens == 'PI':
                    return np.pi
                elif tokens == 'E':
                    return np.e
        return tokens
    def compute_value(self, left, right, operator):

        value = None
        if operator == '+':
            value = left + right
        elif operator == '-':
            value = left - right
        elif operator == '*':
            value = left * right
        elif operator == '/':
            value = left / right
        elif operator == '^':
            value = left ** right
        elif operator == '%':
            value = left % right

        return value

# Example usage
if __name__ == "__main__":

    # ureg should probably be expressed as a singleton design pattern
    evaluator = EquationEvaluator(ureg)

    # Define some variables using the preferred method
    evaluator.define_variable("length", 10.2 * evaluator.ureg("meter"))
    evaluator.define_variable("width", 3.0 * evaluator.ureg("foot"))
    evaluator.define_variable("mass", 5.0 * evaluator.ureg("kilogram"))

    short_array = np.arange(0.0, 5) * evaluator.ureg.meter * evaluator.ureg.newton
    evaluator.define_variable("torque", short_array)

    np_array_1 = np.ones((3, 3))
    np_array_2 = np.ones((3, 3)) * np.pi
    evaluator.define_variable("np_array_1", np_array_1)
    evaluator.define_variable("np_array_2", np_array_2)

    angle_360 = np.arange(0., 360.1, 15) * ureg.degree
    sine_wave = np.sin(angle_360)
    evaluator.define_variable("sine_wave", sine_wave)
    evaluator.define_variable('tan', 111.11)

    # Example expressions using both variables and direct quantities
    good_expressions = [
        "length",
        "width",
        "mass",
        "torque",
        "mass ^ 2.0",
        "-3 + 4 * 5",
        "+3 + 4 * -5",
        "length % width",
        "length - length -width",
        "length + width / 2",
        "length + width",
        "width * width * width",
        "((width + width)* width) / length",
        "((mass * 2))",
        "torque / width",
        "length * 2 + width",
        "np_array_1 + np_array_2",
        "np_array_1 / np_array_2",
        "np_array_1 % np_array_2",
        "np_array_1 % 0.45",
        "-1.0 * np_array_1 % 0.21",
        "np_array_1 % -0.21",
        "COS(np_array_1)",
        "COS(3.14159/4.0)",
        "SIN(0.0)",
        "ARCTAN2(np_array_1, np_array_2)",
        "PI",
        "PI*2.0",
        "COS(PI*2.0)",
        "ABS(-100/10.0)",
        "ABS(sine_wave)",
        "1.3e10",
        "TAN(sine_wave)",
        "ARCSIN(sine_wave) *180.0/PI",
        'MAX(sine_wave)',
        'MIN(sine_wave)',
        'MIN(14.5)',
        'sine_wave/(MAX(sine_wave/2))',
        'SUM(sine_wave)',
        'ARCCOS(sine_wave)',

    ]
    bad_expressions = [
        "length ^ width",
        "length ^ ",
        "torque % width",
        "ABS(sine_wave, 2)",
    ]

    print('Good Expressions {}'.format('#'*60))
    for expr in good_expressions:
        result = evaluator.evaluate(expr)
        print("Result of '{}':\n    {}".format(expr, result))

    print('\n')
    print('Bad Expressions {}'.format('#'*60))
    for expr in bad_expressions:
        result = evaluator.evaluate(expr)
        print("Result of '{}':\n    {}".format(expr, result))

The questionable areas of the code are as follows:

def evaluate(self, expression):
    """Evaluate the mathematical expression with quantities and return a Pint Quantity."""
    try:
        parsed_expression = self.equation.parseString(expression, parseAll=True)

        # there is probably a much better way to handle functions versus operations

        if str(parsed_expression[0]) in self.functions:
            # this is valid for functions
            computed = self.compute(parsed_expression)
        else:
            # this is used for operations
            computed = self.compute(parsed_expression[0])
        return computed

    except Exception as e:
        print('')
        return str(e)

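I would also like a better way to access the tokens in the following function calls, i.e. I want to handle the number of arguments supplied to a function call in a more general way.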

    elif len(tokens) == 2:
        if tokens[0] == 'COS':
            arg_vall = self.compute(tokens[1])
            value = np.cos(arg_vall)
            return value
        elif tokens[0] == 'SIN':
            arg_vall = self.compute(tokens[1])
            value = np.sin(arg_vall)
            return value
        elif tokens[0] == 'TAN':
            arg_vall = self.compute(tokens[1])
            value = np.tan(arg_vall)
            return value
        elif tokens[0] == 'ARCTAN2':
            arg_val_1 = self.compute(tokens[1][0])
            arg_val_2 = self.compute(tokens[1][1])
            value = np.arctan2(arg_val_1, arg_val_2)
            return value
        elif tokens[0] == 'ARCSIN':
            arg_vall = self.compute(tokens[1])
            value = np.arcsin(arg_vall)
            return value
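One common way to collapse a chain like this is a lookup table that maps each function name to a callable and its expected argument count. The sketch below is only an illustration: FUNCTIONS and call_function are hypothetical names, and the standard-library math module stands in for the numpy calls so that the example runs on its own.

```python
import math

# Hypothetical table: function name -> (callable, expected argument count).
# math.* is a stand-in here; np.cos, np.arctan2, etc. could be used instead.
FUNCTIONS = {
    "COS": (math.cos, 1),
    "SIN": (math.sin, 1),
    "TAN": (math.tan, 1),
    "ARCSIN": (math.asin, 1),
    "ARCCOS": (math.acos, 1),
    "ARCTAN2": (math.atan2, 2),
    "ABS": (abs, 1),
    "EXP": (math.exp, 1),
}

def call_function(name, args):
    """Look up `name`, check the argument count, and apply the function."""
    try:
        func, arity = FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"Unknown function '{name}'") from None
    if len(args) != arity:
        raise ValueError(
            f"{name} expects {arity} argument(s), got {len(args)}"
        )
    return func(*args)

print(call_function("COS", [0.0]))
print(call_function("ARCTAN2", [1.0, 1.0]))
```

With a table like this, the compute method would evaluate the argument tokens into a list once and make a single call_function call, and an input such as ABS(sine_wave, 2) would be rejected by the argument-count check instead of by a missing branch.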