A very simple parsing library, based on a top-down algorithm.
tdparser
This library aims to provide an efficient way to write simple lexers/parsers in Python, using a top-down parsing algorithm.
The code is maintained on GitHub, and documentation is available on ReadTheDocs.
Other Python libraries provide parsing/lexing tools (see http://nedbatchelder.com/text/python-parsers.html for a few examples); tdparser distinguishes itself by:

- Avoiding docstring-based grammar definitions
- Providing a generic parser structure, able to handle any grammar
- Not generating code
- Letting the user decide the nature of parsing results: abstract syntax tree, final expression, etc.
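The generic structure behind these features is the classic top-down operator-precedence (Pratt) loop. As a rough illustration of how `nud`, `led` and `lbp` interact, and of the last point above (the caller chooses what a parse result is), here is a self-contained sketch that builds an abstract syntax tree instead of a value. The `Num`, `Plus`, `End` and `Parser` names are invented for this sketch and are not tdparser's actual API:

```python
import re

class Num:
    lbp = 0
    def __init__(self, text):
        self.value = int(text)
    def nud(self, parser):
        return ('num', self.value)          # leaf of the AST

class Plus:
    lbp = 10
    def led(self, left, parser):
        # Build an AST node instead of evaluating, illustrating that
        # the token classes decide what a "parse result" is
        return ('+', left, parser.expression(self.lbp))

class End:
    lbp = 0                                 # never binds, stops the loop

class Parser:
    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self.current = next(self.tokens, End())
    def advance(self):
        self.current = next(self.tokens, End())
    def expression(self, rbp=0):
        token = self.current
        self.advance()
        left = token.nud(self)              # prefix handler
        while rbp < self.current.lbp:       # infix loop, driven by precedence
            token = self.current
            self.advance()
            left = token.led(left, self)    # infix handler
        return left

def tokenize(text):
    for part in re.findall(r'\d+|\+', text):
        yield Plus() if part == '+' else Num(part)

def parse(text):
    return Parser(tokenize(text)).expression()
```

Here `parse("1+2+3")` yields the nested tuple `('+', ('+', ('num', 1), ('num', 2)), ('num', 3))`, showing the left-associative grouping produced by the precedence loop.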
Example
Here is the definition of a simple arithmetic parser:
    import re
    from tdparser import Lexer, Token

    class Integer(Token):
        def __init__(self, text):
            self.value = int(text)

        def nud(self, context):
            """What the token evaluates to"""
            return self.value

    class Addition(Token):
        lbp = 10  # Precedence

        def led(self, left, context):
            """Compute the value of this token when between two expressions."""
            # Fetch the expression to the right, stopping at the next boundary
            # of same precedence
            right_side = context.expression(self.lbp)
            return left + right_side

    class Substraction(Token):
        lbp = 10  # Same precedence as addition

        def led(self, left, context):
            return left - context.expression(self.lbp)

        def nud(self, context):
            """When a '-' is present on the left of an expression."""
            # This means that we are returning the opposite of the next expression
            return -context.expression(self.lbp)

    class Multiplication(Token):
        lbp = 20  # Higher precedence than addition/subtraction

        def led(self, left, context):
            return left * context.expression(self.lbp)

    lexer = Lexer(with_parens=True)
    lexer.register_token(Integer, re.compile(r'\d+'))
    lexer.register_token(Addition, re.compile(r'\+'))
    lexer.register_token(Substraction, re.compile(r'-'))
    lexer.register_token(Multiplication, re.compile(r'\*'))

    def parse(text):
        return lexer.parse(text)
Using it returns the expected values:
    >>> parse("1+1")
    2
    >>> parse("1 + -2 * 3")
    -5
Adding new tokens is straightforward:
    class Division(Token):
        lbp = 20  # Same precedence as Multiplication

        def led(self, left, context):
            return left // context.expression(self.lbp)

    lexer.register_token(Division, re.compile(r'/'))
And using it:
    >>> parse("3 + 12 / 3")
    7
Let's add the exponentiation operator:
    class Power(Token):
        lbp = 30  # Higher than mult

        def led(self, left, context):
            # We pick expressions with a lower precedence, so that
            # 2 ** 3 ** 2 computes as 2 ** (3 ** 2)
            return left ** context.expression(self.lbp - 1)

    lexer.register_token(Power, re.compile(r'\*\*'))
And use it:
    >>> parse("2 ** 3 ** 2")
    512
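Subtracting one from the binding power is the standard Pratt-parsing trick for right associativity: the recursive call accepts further `**` tokens before the current one closes. The effect can be demonstrated with a standalone sketch (pure Python, no tdparser; the `parse` helper, `LBP` constant and `right_assoc` flag are invented for this illustration):

```python
import re

LBP = 30  # binding power of '**', as in the Power token above

def parse(text, right_assoc=True):
    tokens = re.findall(r'\d+|\*\*', text)
    pos = [0]

    def peek_lbp():
        # Binding power of the next token; 0 at end of input
        return LBP if pos[0] < len(tokens) and tokens[pos[0]] == '**' else 0

    def expression(rbp):
        left = int(tokens[pos[0]]); pos[0] += 1   # nud: integers only
        while rbp < peek_lbp():
            pos[0] += 1                           # consume '**'
            # A right binding power of LBP - 1 lets the recursive call
            # claim further '**' tokens first: 2 ** 3 ** 2 groups as
            # 2 ** (3 ** 2). With LBP it would group as (2 ** 3) ** 2.
            left = left ** expression(LBP - 1 if right_assoc else LBP)
        return left

    return expression(0)
```

With `right_assoc=True`, `parse("2 ** 3 ** 2")` returns `2 ** (3 ** 2) == 512`; passing `right_assoc=False` makes the operator left-associative and yields `(2 ** 3) ** 2 == 64`.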