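I'm trying to parse math expressions with pyparsing. I know I could just copy the example calculator from the pyparsing site, but I want to understand it so that I can add to it later. I'm here because I tried to understand that example and couldn't, so I did my best and came up with this: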
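import pyparsing as pp

# Assumed definition: the question does not show how number is built.
# Converting to int matches the printed output below.
number = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

symbol = (
    pp.Literal("^") |
    pp.Literal("*") |
    pp.Literal("/") |
    pp.Literal("+") |
    pp.Literal("-")
)
operation = pp.Forward()
atom = pp.Group(
    pp.Literal("(").suppress() + operation + pp.Literal(")").suppress()
) | number
operation << (pp.Group(number + symbol + number + pp.ZeroOrMore(symbol + atom)) | atom)
expression = pp.OneOrMore(operation)
print(expression.parseString("9-1+27+(3-5)+9"))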
This prints:
[[9, '-', 1, '+', 27, '+', [[3, '-', 5]], '+', 9]]
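It does work, kind of. I want precedence, with everything sorted into Groups, but after many attempts I couldn't find a way to do it. More or less like this: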
[[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]
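I want to keep it looking like an AST, because I would like to generate code from it.
I did see the operatorPrecedence class? It looks similar to Forward, but I don't think I understand how it works either.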
EDIT:
Trying operatorPrecedence in more depth, I got this:
expression = pp.operatorPrecedence(number, [
    (pp.Literal("^"), 1, pp.opAssoc.RIGHT),
    (pp.Literal("*"), 2, pp.opAssoc.LEFT),
    (pp.Literal("/"), 2, pp.opAssoc.LEFT),
    (pp.Literal("+"), 2, pp.opAssoc.LEFT),
    (pp.Literal("-"), 2, pp.opAssoc.LEFT)
])
It doesn't handle parentheses (I don't know whether I will have to post-process the result), and I need to handle them.
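The actual name for this parsing problem is "infix notation" (and in recent versions of pyparsing, I have renamed operatorPrecedence to infixNotation). To see a typical implementation of infix notation parsing, look at the fourFn.py example on the pyparsing wiki. There you will see an implementation of this simplified BNF for 4-function arithmetic, with precedence of operations: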
operand :: integer or real number
factor :: operand | '(' expr ')'
term :: factor ( ('*' | '/') factor )*
expr :: term ( ('+' | '-') term )*
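So an expression is one or more terms separated by addition or subtraction operations.
A term is one or more factors separated by multiplication or division operations.
A factor is either the lowest-level operand (in this case, just an integer or a real number) or an expr enclosed in ().
Note that this is a recursive parser: factor is used indirectly in the definition of expr, but expr is also used to define factor.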
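To make the recursion concrete without any library, here is a minimal hand-written recursive-descent sketch of the same BNF. It is not how fourFn.py or pyparsing implement it; the names `tokenize`, `Parser`, and `parse` are made up for illustration, and building one nested list per binary operation is an assumption chosen to match the AST-like shape the question asks for.

```python
import re

# A number (real before integer) or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|[-+*/()])")


def tokenize(text):
    """Split the input string into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def factor(self):
        # factor :: operand | '(' expr ')'
        token = self.peek()
        if token is None or token in {"+", "-", "*", "/", ")"}:
            raise ValueError(f"expected a number or '(', got {token!r}")
        self.advance()
        if token == "(":
            node = self.expr()  # recursion: factor refers back to expr
            if self.peek() != ")":
                raise ValueError("expected ')'")
            self.advance()
            return node
        return float(token) if "." in token else int(token)

    def term(self):
        # term :: factor ( ('*' | '/') factor )*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = [node, op, self.factor()]  # fold to the left
        return node

    def expr(self):
        # expr :: term ( ('+' | '-') term )*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = [node, op, self.term()]  # fold to the left
        return node


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("9-1+27+(3-5)+9"))
# [[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]
```

Each grammar rule becomes one method, and the `while` loops implement the `( ... )*` repetition. Folding inside the loop is what produces left-nested binary nodes instead of one flat list.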
In pyparsing, this looks roughly like the following (assuming that integer and real have already been defined):
LPAR,RPAR = map(Suppress, '()')
expr = Forward()
operand = real | integer
factor = operand | Group(LPAR + expr + RPAR)
term = factor + ZeroOrMore( oneOf('* /') + factor )
expr <<= term + ZeroOrMore( oneOf('+ -') + term )
Now, using expr, you can parse any of these:
3
3+2
3+2*4
(3+2)*4
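As a runnable version of the grammar above, the following sketch fills in the two pieces the snippet leaves undefined. The definitions of `integer` and `real` and the helper `parse` are assumptions made for illustration; with no parse actions attached, the tokens come back as strings.

```python
import pyparsing as pp

# Assumed definitions; the grammar above leaves integer and real undefined.
integer = pp.Word(pp.nums)
real = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))

LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()
operand = real | integer
factor = operand | pp.Group(LPAR + expr + RPAR)
term = factor + pp.ZeroOrMore(pp.oneOf("* /") + factor)
expr <<= term + pp.ZeroOrMore(pp.oneOf("+ -") + term)


def parse(text):
    """Parse the whole string and return plain nested lists."""
    return expr.parseString(text, parseAll=True).asList()


for text in ("3", "3+2", "3+2*4", "(3+2)*4"):
    print(text, "->", parse(text))
# 3 -> ['3']
# 3+2 -> ['3', '+', '2']
# 3+2*4 -> ['3', '+', '2', '*', '4']
# (3+2)*4 -> [['3', '+', '2'], '*', '4']
```

Note that this grammar recognizes precedence but only the parenthesized part is wrapped in a Group, so `3+2*4` comes back as a flat token list.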
The infixNotation pyparsing helper method takes care of all the recursive definitions and grouping, and lets you define this as:
expr = infixNotation(operand,
    [
        (oneOf('* /'), 2, opAssoc.LEFT),
        (oneOf('+ -'), 2, opAssoc.LEFT),
    ])
But this obscures all the underlying theory, so if you are trying to understand how this is implemented, look at the raw solution in fourFn.py.
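For comparison, here is a small self-contained sketch of the infixNotation version. Using `pp.pyparsing_common.integer` as the operand and the helper name `parse` are assumptions for illustration. It shows two things relevant to the question: parentheses are handled by infixNotation itself, and a chain of operators at the same precedence level is returned as one flat group rather than as nested binary groups.

```python
import pyparsing as pp

# Assumed operand: integers converted to int by pyparsing_common.
operand = pp.pyparsing_common.integer

expr = pp.infixNotation(
    operand,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),
    ],
)


def parse(text):
    """Parse the whole string and return plain nested lists."""
    return expr.parseString(text, parseAll=True).asList()


print(parse("3+2*4"))    # [[3, '+', [2, '*', 4]]]
print(parse("(3+2)*4"))  # [[[3, '+', 2], '*', 4]]
print(parse("9-1+27"))   # [[9, '-', 1, '+', 27]]
```

If strictly binary nodes are needed, the flat group for each level can be folded from the left in a post-processing step or in a parse action attached to that level.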
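[EDIT - Dec 18, 2022] For those looking for a pre-defined solution, I've packaged infixNotation up into its own pip-installable package called plusminus. plusminus defines a BaseArithmeticParser class for creating a ready-to-run parser and evaluator that supports these operators: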
**   ÷   >=  ∈  in   ?:
*    +   ==  ∉  not  |absolute-value|
//   -   !=  ∩  and
/    <   ≠   ∪  ∧
mod  >   ≤   &  or
×    <=  ≥   |  ∨
And these functions:
abs    ceil   max
round  floor  str
trunc  min    bool
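The BaseArithmeticParser class allows you to define additional operators and functions for your own domain-specific expressions, and the examples show how to define parsers with custom functions and operators for dice rolling, retail price discounts, and more.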
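Late to the party, but I have been learning from various sources and trying to develop an equation parser that handles Pint quantities, numpy arrays, numpy functions, and plain numbers. I am not happy with the way I handle functions such as np.max. It works, but I feel there must be a cleaner way to parse and process functions in equations. I would appreciate any constructive criticism.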
import numpy as np
from pint import UnitRegistry
import pyparsing as pp
pp.ParserElement.enablePackrat()
ureg = UnitRegistry()
class EquationEvaluator:
    def __init__(self, ureg):
        self.ureg = ureg
        self.setup_parser()
        self.variables = {}

    def setup_parser(self):
        # Define the grammar for parsing
        lparen = pp.Literal("(").suppress()
        rparen = pp.Literal(")").suppress()
        self.equation = pp.Forward()
        # Define basic elements
        integer = pp.pyparsing_common.integer
        number = pp.pyparsing_common.number
        variable = pp.Word(pp.alphas, pp.alphas + pp.nums + '_')
        # Define units (you can customize this list as needed)
        unit = pp.oneOf(list(self.ureg._units.keys()), asKeyword=True)
        # Define quantity
        quantity = pp.Group(number + unit).setParseAction(self.parse_quantity)
        # Allow numeric literals and variables to be part of the quantity
        numeric = pp.Group(integer).setParseAction(lambda t: float(t[0]) * self.ureg.dimensionless)
        variable_quantity = variable.setParseAction(self.parse_variable)
        arg = pp.Forward() | number | variable
        args = pp.delimited_list(arg)
        functions_1 = list(['COS', 'SIN', 'ARCSIN', 'ARCCOS', 'ARCTAN2', 'TAN'])
        functions_2 = list(['ABS', 'MAX', 'MIN', 'SUM', 'LOG', 'LOG10', 'EXP'])
        self.functions = functions_1 + functions_2
        functions = pp.oneOf(self.functions, asKeyword=True) + pp.Group(lparen + pp.OneOrMore(args) + rparen)
        self.constants = list(['PI', 'E', 'C'])
        constants = pp.oneOf(self.constants)
        # Define expressions
        # self.equation <<= pp.infixNotation(variable_quantity | quantity | numeric,
        operand = constants | functions | variable_quantity | number | integer
        funcop = pp.oneOf(self.functions)
        expop = pp.Literal("^")
        signop = pp.oneOf("+ -")
        multop = pp.oneOf("* / %")
        plusop = pp.oneOf("+ -")
        factop = pp.Literal("!")
        # To use the infixNotation helper:
        # 1. Define the "atom" operand term of the grammar.
        #    For this simple grammar, the smallest operand is either
        #    an integer or a variable. This will be the first argument
        #    to the infixNotation method.
        # 2. Define a list of tuples for each level of operator
        #    precedence. Each tuple is of the form
        #    (opExpr, numTerms, rightLeftAssoc, parseAction), where
        #    - opExpr is the pyparsing expression for the operator;
        #      may also be a string, which will be converted to a Literal
        #    - numTerms is the number of terms for this operator (must
        #      be 1 or 2)
        #    - rightLeftAssoc is the indicator whether the operator is
        #      right or left associative, using the pyparsing-defined
        #      constants opAssoc.RIGHT and opAssoc.LEFT.
        #    - parseAction is the parse action to be associated with
        #      expressions matching this operator expression (the
        #      parse action tuple member may be omitted)
        # 3. Call infixNotation passing the operand expression and
        #    the operator precedence list, and save the returned value
        #    as the generated pyparsing expression. You can then use
        #    this expression to parse input strings, or incorporate it
        #    into a larger, more complex grammar.
        #
        self.equation <<= pp.infixNotation(
            operand,
            [
                (funcop, 1, pp.opAssoc.RIGHT),
                (expop, 2, pp.opAssoc.RIGHT),
                # (signop, 1, pp.opAssoc.RIGHT),
                (multop, 2, pp.opAssoc.LEFT),
                (plusop, 2, pp.opAssoc.LEFT),
            ],
        )
        return None

    def parse_quantity(self, tokens):
        """Convert parsed tokens to Pint Quantity."""
        value, unit = tokens[0]
        return self.ureg(float(value)) * self.ureg(unit)

    def parse_variable(self, tokens):
        """Retrieve Pint Quantity from variable name."""
        var_name = tokens[0]
        if var_name in self.variables:
            return self.variables[var_name]
        else:
            raise ValueError(f"Variable '{var_name}' not defined.")

    def define_variable(self, name, quantity):
        """Define a variable with a Pint Quantity."""
        self.variables[name] = quantity

    def evaluate(self, expression):
        """Evaluate the mathematical expression with quantities and return a Pint Quantity."""
        try:
            parsed_expression = self.equation.parseString(expression, parseAll=True)
            # there is probably a much better way to handle functions versus operations
            if str(parsed_expression[0]) in self.functions:
                # this is valid for functions
                computed = self.compute(parsed_expression)
            else:
                # this is used for operations
                computed = self.compute(parsed_expression[0])
            return computed
        except Exception as e:
            print('')
            return str(e)

    def compute(self, tokens):
        """Recursively compute the result of parsed tokens."""
        if isinstance(tokens, pp.results.ParseResults):
            if len(tokens) == 1:
                return tokens[0]  # Base case for a single quantity or variable
            elif len(tokens) == 2:
                if tokens[0] == 'COS':
                    arg_vall = self.compute(tokens[1])
                    value = np.cos(arg_vall)
                    return value
                elif tokens[0] == 'SIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.sin(arg_vall)
                    return value
                elif tokens[0] == 'TAN':
                    arg_vall = self.compute(tokens[1])
                    value = np.tan(arg_vall)
                    return value
                elif tokens[0] == 'ARCTAN2':
                    arg_val_1 = self.compute(tokens[1][0])
                    arg_val_2 = self.compute(tokens[1][1])
                    value = np.arctan2(arg_val_1, arg_val_2)
                    return value
                elif tokens[0] == 'ARCSIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.arcsin(arg_vall)
                    return value
                elif tokens[0] == 'ARCCOS':
                    arg_vall = self.compute(tokens[1])
                    value = np.arccos(arg_vall)
                    return value
                elif tokens[0] == 'ABS':
                    arg_vall = self.compute(tokens[1])
                    value = np.abs(arg_vall)
                    return value
                elif tokens[0] == 'MAX':
                    arg_vall = self.compute(tokens[1])
                    value = np.max(arg_vall)
                    return value
                elif tokens[0] == 'MIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.min(arg_vall)
                    return value
                elif tokens[0] == 'SUM':
                    arg_vall = self.compute(tokens[1])
                    value = np.sum(arg_vall)
                    return value
                elif tokens[0] == 'LOG':
                    arg_vall = self.compute(tokens[1])
                    value = np.log(arg_vall)
                    return value
                elif tokens[0] == 'LOG10':
                    arg_vall = self.compute(tokens[1])
                    value = np.log10(arg_vall)
                    return value
                elif tokens[0] == 'EXP':
                    arg_vall = self.compute(tokens[1])
                    value = np.exp(arg_vall)
                    return value
            # the following handles any odd number (>1) of tokens
            else:
                n_tokens = len(tokens)
                left = self.compute(tokens[0])
                for index in range(1, n_tokens, 2):
                    operator = tokens[index]
                    right = self.compute(tokens[index + 1])
                    left = self.compute_value(left, right, operator)
                return left
        else:
            if isinstance(tokens, str):
                if tokens == 'PI':
                    return np.pi
                elif tokens == 'E':
                    return np.e
            return tokens

    def compute_value(self, left, right, operator):
        value = None
        if operator == '+':
            value = left + right
        elif operator == '-':
            value = left - right
        elif operator == '*':
            value = left * right
        elif operator == '/':
            value = left / right
        elif operator == '^':
            value = left ** right
        elif operator == '%':
            value = left % right
        return value

# Example usage
if __name__ == "__main__":
    # ureg should probably be expressed as a singleton design pattern
    evaluator = EquationEvaluator(ureg)
    # Define some variables using the preferred method
    evaluator.define_variable("length", 10.2 * evaluator.ureg("meter"))
    evaluator.define_variable("width", 3.0 * evaluator.ureg("foot"))
    evaluator.define_variable("mass", 5.0 * evaluator.ureg("kilogram"))
    short_array = np.arange(0.0, 5) * evaluator.ureg.meter * evaluator.ureg.newton
    evaluator.define_variable("torque", short_array)
    np_array_1 = np.ones((3, 3))
    np_array_2 = np.ones((3, 3)) * np.pi
    evaluator.define_variable("np_array_1", np_array_1)
    evaluator.define_variable("np_array_2", np_array_2)
    angle_360 = np.arange(0., 360.1, 15) * ureg.degree
    sine_wave = np.sin(angle_360)
    evaluator.define_variable("sine_wave", sine_wave)
    evaluator.define_variable('tan', 111.11)
    # Example expressions using both variables and direct quantities
    good_expressions = [
        "length",
        "width",
        "mass",
        "torque",
        "mass ^ 2.0",
        "-3 + 4 * 5",
        "+3 + 4 * -5",
        "length % width",
        "length - length -width",
        "length + width / 2",
        "length + width",
        "width * width * width",
        "((width + width)* width) / length",
        "((mass * 2))",
        "torque / width",
        "length * 2 + width",
        "np_array_1 + np_array_2",
        "np_array_1 / np_array_2",
        "np_array_1 % np_array_2",
        "np_array_1 % 0.45",
        "-1.0 * np_array_1 % 0.21",
        "np_array_1 % -0.21",
        "COS(np_array_1)",
        "COS(3.14159/4.0)",
        "SIN(0.0)",
        "ARCTAN2(np_array_1, np_array_2)",
        "PI",
        "PI*2.0",
        "COS(PI*2.0)",
        "ABS(-100/10.0)",
        "ABS(sine_wave)",
        "1.3e10",
        "TAN(sine_wave)",
        "ARCSIN(sine_wave) *180.0/PI",
        'MAX(sine_wave)',
        'MIN(sine_wave)',
        'MIN(14.5)',
        'sine_wave/(MAX(sine_wave/2))',
        'SUM(sine_wave)',
        'ARCCOS(sine_wave)',
    ]
    bad_expressions = [
        "length ^ width",
        "length ^ ",
        "torque % width",
        "ABS(sine_wave, 2)",
    ]
    print('Good Expressions {}'.format('#'*60))
    for expr in good_expressions:
        result = evaluator.evaluate(expr)
        print("Result of '{}':\n {}".format(expr, result))
    print('\n')
    print('Bad Expressions {}'.format('#'*60))
    for expr in bad_expressions:
        result = evaluator.evaluate(expr)
        print("Result of '{}':\n {}".format(expr, result))
The questionable areas of the code are the following:
    def evaluate(self, expression):
        """Evaluate the mathematical expression with quantities and return a Pint Quantity."""
        try:
            parsed_expression = self.equation.parseString(expression, parseAll=True)
            # there is probably a much better way to handle functions versus operations
            if str(parsed_expression[0]) in self.functions:
                # this is valid for functions
                computed = self.compute(parsed_expression)
            else:
                # this is used for operations
                computed = self.compute(parsed_expression[0])
            return computed
        except Exception as e:
            print('')
            return str(e)
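One possible way to remove the function-versus-operation special case is to wrap each function call in its own Group, so that a call is a single node wherever it appears in the tree. The sketch below is a reduced stand-in for the grammar above, not a drop-in replacement: the names `func_name`, `func_call`, and `parse`, the shortened function list, and the plain numeric operand are all assumptions for illustration.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()

number = pp.pyparsing_common.number
# Hypothetical, shortened function list for the sketch.
func_name = pp.oneOf("COS SIN MAX", asKeyword=True)
# The outer Group makes the whole call one node: ['MAX', [arg, arg, ...]].
# Arguments are full expressions, so nested arithmetic is allowed.
func_call = pp.Group(func_name + pp.Group(LPAR + pp.delimitedList(expr) + RPAR))

operand = func_call | number
expr <<= pp.infixNotation(
    operand,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),
    ],
)


def parse(text):
    """Parse the whole string and return plain nested lists."""
    return expr.parseString(text, parseAll=True).asList()


print(parse("MAX(1, 2+3)"))  # [['MAX', [1, [2, '+', 3]]]]
print(parse("1+MAX(2,3)"))   # [[1, '+', ['MAX', [2, 3]]]]
```

With this shape the top-level result always has exactly one element, so an evaluator could pass `parsed[0]` to its recursive function without first checking whether the expression starts with a function name.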
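I would also like a better way to access the tokens in the function calls below, i.e. I would like to handle the number of arguments supplied to a function call in a more generic way.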
            elif len(tokens) == 2:
                if tokens[0] == 'COS':
                    arg_vall = self.compute(tokens[1])
                    value = np.cos(arg_vall)
                    return value
                elif tokens[0] == 'SIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.sin(arg_vall)
                    return value
                elif tokens[0] == 'TAN':
                    arg_vall = self.compute(tokens[1])
                    value = np.tan(arg_vall)
                    return value
                elif tokens[0] == 'ARCTAN2':
                    arg_val_1 = self.compute(tokens[1][0])
                    arg_val_2 = self.compute(tokens[1][1])
                    value = np.arctan2(arg_val_1, arg_val_2)
                    return value
                elif tokens[0] == 'ARCSIN':
                    arg_vall = self.compute(tokens[1])
                    value = np.arcsin(arg_vall)
                    return value
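A common way to make a chain like this generic is a dispatch table that maps each function name to a callable and its expected argument count. The following is only a sketch: `FUNCTIONS` and `call_function` are hypothetical names, and the standard-library `math` module stands in for numpy so the example has no dependencies (`np.cos`, `np.arctan2`, and so on would slot into the table the same way).

```python
import math

# Function name -> (callable, expected number of arguments).
FUNCTIONS = {
    "COS": (math.cos, 1),
    "SIN": (math.sin, 1),
    "TAN": (math.tan, 1),
    "ARCSIN": (math.asin, 1),
    "ARCCOS": (math.acos, 1),
    "ARCTAN2": (math.atan2, 2),
    "ABS": (abs, 1),
    "EXP": (math.exp, 1),
}


def call_function(name, args):
    """Look up a function by name, check its arity, and call it."""
    try:
        func, arity = FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"Unknown function '{name}'.") from None
    if len(args) != arity:
        raise ValueError(f"{name} expects {arity} argument(s), got {len(args)}.")
    return func(*args)


print(call_function("COS", [0.0]))           # 1.0
print(call_function("ARCTAN2", [1.0, 1.0]))  # pi/4
```

Assuming the parsed call keeps the function name in `tokens[0]` and the argument group in `tokens[1]`, the whole `elif` chain could then shrink to one line along the lines of `return call_function(tokens[0], [self.compute(arg) for arg in tokens[1]])`, and a call such as `ABS(sine_wave, 2)` would be rejected by the arity check.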