flexparser

Parsing made fun ... using typing.


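Why write another parser? I asked myself the same question while working on this project. There are clearly excellent parsers out there, but I wanted to experiment with another way of writing them.

The idea is quite simple. You write a class for every type of content (called here ParsedStatement) you need to parse. Each class should have a from_string constructor. We use the typing module extensively to make the output structure easy to use and less error prone.

For example: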

from dataclasses import dataclass

import flexparser as fp

@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
    """Parses the following `this <- other`
    """

    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s):
        lhs, rhs = s.split("<-")
        return cls(lhs.strip(), rhs.strip())

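(Using a frozen dataclass is not necessary, but it is convenient. Being a dataclass, you get init, str, repr, etc. for free. Being frozen, and therefore effectively immutable, makes instances easier to reason about.)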
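As a small illustration of the remark above, the following sketch uses only the standard library to show what a frozen dataclass provides. Pair is a hypothetical stand-in and not a flexparser class.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a parsed statement with two fields."""

    lhs: str
    rhs: str


a = Pair("one", "other")
b = Pair("one", "other")

# __repr__ and __eq__ are generated from the field definitions.
print(repr(a))
print(a == b)

# frozen=True (with the default eq=True) also makes instances hashable,
# so equal instances collapse to one element in a set.
unique = {a, b}

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    a.lhs = "two"
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)
```

Because instances cannot change after construction, a parsed result can be shared or cached without defensive copies.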

In certain cases you might want to signal to the parser that this class is not appropriate for parsing the statement.

@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
    """Parses the following `this <- other`
    """

    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s):
        if "<-" not in s:
            # This means: I do not know how to parse it
            # try with another ParsedStatement class.
            return None
        lhs, rhs = s.split("<-")
        return cls(lhs.strip(), rhs.strip())

You might also want to indicate that this is the right ParsedStatement, but something in it is not right:

@dataclass(frozen=True)
class InvalidIdentifier(fp.ParsingError):
    value: str


@dataclass(frozen=True)
class Assigment(fp.ParsedStatement):
    """Parses the following `this <- other`
    """

    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s):
        if "<-" not in s:
            # This means: I do not know how to parse it
            # try with another ParsedStatement class.
            return None
        lhs, rhs = (p.strip() for p in s.split("<-"))

        if not str.isidentifier(lhs):
            return InvalidIdentifier(lhs)

        return cls(lhs, rhs)
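The validity check above relies on Python's built-in str.isidentifier. The short sketch below shows what that method accepts. The additional keyword check is an assumption about what a stricter parser might want and is not something the example above does.

```python
import keyword

candidates = ["one", "2two", "three", "_private", "with space", "class"]

# str.isidentifier() checks only the lexical form of a name.
lexical = {name: name.isidentifier() for name in candidates}

# Reserved words such as "class" pass isidentifier(); rejecting them too
# needs an extra keyword.iskeyword() check.
strict = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}

for name in candidates:
    print(f"{name!r}: lexical={lexical[name]}, strict={strict[name]}")
```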

Put this into source.txt:

one <- other
2two <- new
three <- newvalue
one == three

and then run the following code:

parsed = fp.parse("source.txt", Assigment)
for el in parsed.iter_statements():
    print(repr(el))

This will produce the following output:

BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)

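The result is a collection of ParsedStatement or ParsingError instances, bracketed by BOF and EOS, which mark the beginning of the file and the end of the stream respectively. Alternatively, the collection can begin with BOR, which stands for beginning of resource and is used when parsing a Python resource shipped with a package.

Notice that there are two correctly parsed statements (Assigment), one error found (InvalidIdentifier) and one unknown (UnknownStatement).

Cool, right? Just by writing a from_string method that outputs a data structure, you get a usable structure of parsed objects.

Now what? Let's say we want to support equality comparison. Simply do: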

@dataclass(frozen=True)
class EqualityComparison(fp.ParsedStatement):
    """Parses the following `this == other`
    """

    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s):
        if "==" not in s:
            return None
        lhs, rhs = (p.strip() for p in s.split("=="))

        return cls(lhs, rhs)

parsed = fp.parse("source.txt", (Assigment, EqualityComparison))
for el in parsed.iter_statements():
    print(repr(el))

and run it again:

BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
Assigment(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
EqualityComparison(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)

Need to group certain statements together? Welcome to Block. This construct lets you group statements:

class Begin(fp.ParsedStatement):

    @classmethod
    def from_string(cls, s):
        if s == "begin":
            return cls()

        return None

class End(fp.ParsedStatement):

    @classmethod
    def from_string(cls, s):
        if s == "end":
            return cls()

        return None

class ParserConfig:
    pass

class AssigmentBlock(fp.Block[Begin, Assigment, End, ParserConfig]):
    pass

parsed = fp.parse("source.txt", (AssigmentBlock, EqualityComparison))

Run the code:

BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='37bc23cde7cad3ece96b7abf64906c84decc116de1e0486679eb6ca696f233a403f756e2e431063c82abed4f0e342294c2fe71af69111faea3765b78cb90c03f'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source1.txt'), mtime=1658550284.9419456)
UnknownStatement(start_line=1, start_col=0, end_line=1, end_col=12, raw='one <- other')
UnknownStatement(start_line=2, start_col=0, end_line=2, end_col=11, raw='2two <- new')
UnknownStatement(start_line=3, start_col=0, end_line=3, end_col=17, raw='three <- newvalue')
UnknownStatement(start_line=4, start_col=0, end_line=4, end_col=12, raw='one == three')
EOS(start_line=5, start_col=0, end_line=5, end_col=0, raw=None)

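Notice that there are now a lot of UnknownStatement entries, because we instructed the parser to only look for assignments within a block. So change your text file to: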

begin
one <- other
2two <- new
three <- newvalue
end
one == three

and try again:

BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='3d8ce0051dcdd6f0f80ef789a0df179509d927874f242005ac41ed886ae0b71a30b845b9bfcb30194461c0ef6a3ca324c36f411dfafc7e588611f1eb0269bb5a'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source2.txt'), mtime=1658550707.1248093)
Begin(start_line=1, start_col=0, end_line=1, end_col=5, raw='begin')
Assigment(start_line=2, start_col=0, end_line=2, end_col=12, raw='one <- other', lhs='one', rhs='other')
InvalidIdentifier(start_line=3, start_col=0, end_line=3, end_col=11, raw='2two <- new', value='2two')
Assigment(start_line=4, start_col=0, end_line=4, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
End(start_line=5, start_col=0, end_line=5, end_col=3, raw='end')
EqualityComparison(start_line=6, start_col=0, end_line=6, end_col=12, raw='one == three', lhs='one', rhs='three')
EOS(start_line=7, start_col=0, end_line=7, end_col=0, raw=None)
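To make the grouping idea concrete, here is a minimal sketch using only the standard library that collects the lines between "begin" and "end" into one unit. It does not use flexparser. The function name group_blocks and the choice to raise ValueError on an unclosed block are assumptions of this sketch, and the library may handle that case differently.

```python
from typing import List, Tuple, Union

# A grouped block is represented here as (opening, body, closing).
Grouped = Tuple[str, Tuple[str, ...], str]


def group_blocks(lines: List[str]) -> List[Union[str, Grouped]]:
    """Group lines between 'begin' and 'end'; leave other lines as-is."""
    result: List[Union[str, Grouped]] = []
    body = None  # None means we are currently outside a block
    for line in lines:
        if body is None:
            if line == "begin":
                body = []
            else:
                result.append(line)
        elif line == "end":
            result.append(("begin", tuple(body), "end"))
            body = None
        else:
            body.append(line)
    if body is not None:
        # Assumption of this sketch: an unclosed block is an error.
        raise ValueError("block opened with 'begin' was never closed")
    return result


grouped = group_blocks(
    ["begin", "one <- other", "three <- newvalue", "end", "one == three"]
)
print(grouped)
```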

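So far we have used parsed.iter_statements to iterate over all parsed statements. But let's look at parsed itself, an object of type ParsedProject. It is a thin wrapper over a dictionary that maps files to parsed content. Because we provided a single file that does not link to any other, our parsed object contains a single element. The key is None, indicating that the file 'source.txt' was loaded from the root location (None). The content is a ParsedSourceFile object with the following attributes:

path: full path of the source file
mtime: modification time of the source file
content_hash: hash of the serialized content
config: extra parameters that can be given to the parser (see below).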
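The path, mtime and content_hash attributes can be pictured with a short standard-library sketch. describe_file is a hypothetical helper, not part of flexparser. It hashes the raw file bytes with hashlib.blake2b (the algorithm name that appears in the output above), which may differ from what the library actually hashes.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect path, mtime and a blake2b content hash for one file.

    Hypothetical helper: it hashes the raw bytes, which may differ from
    what flexparser actually hashes.
    """
    data = path.read_bytes()
    return {
        "path": path.resolve(),
        "mtime": path.stat().st_mtime,
        "content_hash": hashlib.blake2b(data).hexdigest(),
    }


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("one <- other\n", encoding="utf-8")
    info = describe_file(source)
    # Hashing the same content twice gives the same digest.
    same = describe_file(source)["content_hash"] == info["content_hash"]

    # Changing the content changes the digest.
    source.write_text("one <- changed\n", encoding="utf-8")
    changed = describe_file(source)["content_hash"] != info["content_hash"]

print(len(info["content_hash"]), same, changed)
```

A content hash of this kind lets a caller detect that a file changed even when its modification time is unreliable.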

ParsedSource(
    parsed_source=parse.<locals>.CustomRootBlock(
        opening=BOF(start_line=0, start_col=0, end_line=0, end_col=0, raw=None, content_hash=Hash(algorithm_name='blake2b', hexdigest='3d8ce0051dcdd6f0f80ef789a0df179509d927874f242005ac41ed886ae0b71a30b845b9bfcb30194461c0ef6a3ca324c36f411dfafc7e588611f1eb0269bb5a'), path=PosixPath('/Users/grecco/Documents/code/flexparser/examples/in_readme/source2.txt'), mtime=1658550707.1248093),
        body=(
            Block.subclass_with.<locals>.CustomBlock(
                opening=Begin(start_line=1, start_col=0, end_line=1, end_col=5, raw='begin'),
                body=(
                    Assigment(start_line=2, start_col=0, end_line=2, end_col=12, raw='one <- other', lhs='one', rhs='other'),
                    InvalidIdentifier(start_line=3, start_col=0, end_line=3, end_col=11, raw='2two <- new', value='2two'),
                    Assigment(start_line=4, start_col=0, end_line=4, end_col=17, raw='three <- newvalue', lhs='three', rhs='newvalue')
                ),
                closing=End(start_line=5, start_col=0, end_line=5, end_col=3, raw='end')),
            EqualityComparison(start_line=6, start_col=0, end_line=6, end_col=12, raw='one == three', lhs='one', rhs='three')),
        closing=EOS(start_line=7, start_col=0, end_line=7, end_col=0, raw=None)),
    config=None
)

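A few things to notice:

We were using a block even without knowing it. RootBlock is a special type of Block that starts and ends automatically with the file.
opening, body and closing are automatically annotated with the possible ParsedStatement types (plus ParsingError), so autocompletion works in most IDEs.
The same is true for the ParsedStatement classes you define (this is the reason we use dataclass). This makes working with the actual parsing results very convenient.
That annoying subclass_with.<locals> appears because we built a class on the fly when using Block.subclass_with. You can get rid of it (which is actually useful for serialization) by explicitly subclassing Block in your code (see below).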
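The nested opening/body/closing layout shown above can be flattened back into a sequence of statements with a short recursive walk. The sketch below uses hypothetical stand-in classes (Statement and Block) rather than flexparser's own types, so it only illustrates the shape of the traversal.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for a parsed statement."""

    raw: str


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a block: opening, body, closing."""

    opening: Statement
    body: Tuple[Union["Block", Statement], ...]
    closing: Statement


def iter_statements(node: Union[Block, Statement]) -> Iterator[Statement]:
    """Yield every statement of a nested block in document order."""
    if isinstance(node, Statement):
        yield node
        return
    yield node.opening
    for child in node.body:
        yield from iter_statements(child)
    yield node.closing


inner = Block(
    Statement("begin"),
    (Statement("one <- other"), Statement("three <- newvalue")),
    Statement("end"),
)
root = Block(Statement("BOF"), (inner, Statement("one == three")), Statement("EOS"))

flat = [s.raw for s in iter_statements(root)]
print(flat)
```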

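Multiple source files

Most projects have more than one source file, and these files are connected to each other. A file might refer to another file that also needs to be parsed (e.g. an #include statement in C). flexparser provides the IncludeStatement base class specifically for this purpose.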

@dataclass(frozen=True)
class Include(fp.IncludeStatement):
    """A naive implementation of #include "file"
    """

    value: str

    @classmethod
    def from_string(cls, s):
        if not s.startswith("#include "):
            # Not an include line: let another class try.
            return None

        value = s[len("#include "):].strip().strip('"')

        return cls(value)

    @property
    def target(self):
        return self.value

The only difference is that you need to implement a target property that returns the file name or resource this statement refers to.
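What a caller might do with such a target can be sketched with the standard library alone. Both helpers below are hypothetical. In particular, resolving a relative target next to the including file is an assumption of this sketch, and flexparser's own lookup rules may differ.

```python
from pathlib import Path
from typing import Optional


def parse_include(statement: str) -> Optional[str]:
    """Return the target of a naive `#include "file"` line, else None."""
    prefix = "#include "
    if not statement.startswith(prefix):
        return None
    return statement[len(prefix):].strip().strip('"')


def resolve_target(current_file: Path, target: str) -> Path:
    """Resolve an include target relative to the file that mentions it.

    Assumption for this sketch: relative targets are looked up next to
    the including file.
    """
    candidate = Path(target)
    if candidate.is_absolute():
        return candidate
    return current_file.parent / candidate


target = parse_include('#include "defs/units.txt"')
resolved = resolve_target(Path("project/main.txt"), target)
print(resolved.as_posix())
```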

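Customizing statementization

statementi ... what? flexparser works by trying to parse each statement with one of the known classes. So it is fair to ask what a statement is in this context and how you can configure it to your needs. A text file is split into non-overlapping strings called statements. Parsing works as follows:

Each file is split into statements (which can be single-line or multi-line).
Each statement is parsed with the first of the contextually available ParsedStatement or Block subclasses that returns a ParsedStatement or ParsingError.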
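The second step can be pictured as a first-match loop. The sketch below is self-contained and does not import flexparser. Arrow, DoubleEquals, Unknown and parse_statement are hypothetical names used only to illustrate the idea that a from_string returning None hands the statement to the next candidate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Arrow:
    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s: str) -> Optional["Arrow"]:
        if "<-" not in s:
            return None  # not mine, let another class try
        lhs, rhs = (p.strip() for p in s.split("<-", 1))
        return cls(lhs, rhs)


@dataclass(frozen=True)
class DoubleEquals:
    lhs: str
    rhs: str

    @classmethod
    def from_string(cls, s: str) -> Optional["DoubleEquals"]:
        if "==" not in s:
            return None
        lhs, rhs = (p.strip() for p in s.split("==", 1))
        return cls(lhs, rhs)


@dataclass(frozen=True)
class Unknown:
    raw: str


def parse_statement(s: str, classes: Sequence[type]):
    """Return the first non-None result, or Unknown if nobody claims s."""
    for cls in classes:
        result = cls.from_string(s)
        if result is not None:
            return result
    return Unknown(s)


lines = ["one <- other", "one == three", "hello"]
results = [parse_statement(line, (Arrow, DoubleEquals)) for line in lines]
for r in results:
    print(repr(r))
```

The order of the candidate classes matters in this sketch: the first class that returns something other than None wins.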

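You can customize how each line is split into statements with two arguments provided to parse:

strip_spaces (bool): indicates that leading and trailing spaces must be removed before attempting to parse. (default: True)
delimiters (dict): indicates how each line must be subsplit. (default: do not divide)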

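An example delimiter entry is {";": (fp.DelimiterInclude.SKIP, fp.DelimiterAction.CONTINUE)}, which tells the statementizer (sorry) that a new statement should begin whenever it encounters a ";". DelimiterMode.SKIP means that the ";" should not be added to the previous statement nor to the next one. Other valid values are SPLIT_AFTER and SPLIT_BEFORE, which append the delimiter character to the previous or the next statement respectively. The second element tells the statementizer (sorry again) what to do next. Valid values are CONTINUE, CAPTURE_NEXT_TIL_EOL, STOP_PARSING_LINE and STOP_PARSING.

This is useful with comments. For example, {"#": (fp.DelimiterMode.WITH_NEXT, fp.DelimiterAction.CAPTURE_NEXT_TIL_EOL)} tells the statementizer (it is not funny anymore) to stop splitting after the first "#" and capture everything that follows.

This allows: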

## This will work as a single statement
# This will work as a single statement #
# This will work as # a single statement #
a = 3 # this will produce two statements (a=3, and the rest)
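The splitting behaviour described above can be approximated with a small standalone function. This is a sketch under stated assumptions and not flexparser's implementation. The Include and Action enums are hypothetical and only borrow member names from the text, only single-character delimiters and two actions are handled, and empty statements are dropped.

```python
import enum
from typing import Dict, List, Tuple


class Include(enum.Enum):
    SKIP = enum.auto()          # drop the delimiter
    SPLIT_AFTER = enum.auto()   # delimiter stays with the previous statement
    SPLIT_BEFORE = enum.auto()  # delimiter starts the next statement


class Action(enum.Enum):
    CONTINUE = enum.auto()              # keep looking for delimiters
    CAPTURE_NEXT_TIL_EOL = enum.auto()  # take the rest of the line as-is


def split_line(
    line: str,
    delimiters: Dict[str, Tuple[Include, Action]],
    strip_spaces: bool = True,
) -> List[str]:
    """Split one line into statements (hypothetical helper)."""
    statements: List[str] = []
    current = ""
    i = 0
    while i < len(line):
        ch = line[i]
        i += 1
        if ch not in delimiters:
            current += ch
            continue
        include, action = delimiters[ch]
        if include is Include.SPLIT_AFTER:
            statements.append(current + ch)
            current = ""
        elif include is Include.SPLIT_BEFORE:
            statements.append(current)
            current = ch
        else:  # Include.SKIP
            statements.append(current)
            current = ""
        if action is Action.CAPTURE_NEXT_TIL_EOL:
            current += line[i:]
            break
    statements.append(current)
    if strip_spaces:
        statements = [s.strip() for s in statements]
    # Assumption of this sketch: empty statements are discarded.
    return [s for s in statements if s]


rules = {
    ";": (Include.SKIP, Action.CONTINUE),
    "#": (Include.SPLIT_BEFORE, Action.CAPTURE_NEXT_TIL_EOL),
}

single = split_line("# This will work as # a single statement #", rules)
double = split_line("a = 3 # this will produce two statements", rules)
mixed = split_line("a <- 1; b <- 2 # note; still comment", rules)
print(single, double, mixed, sep="\n")
```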

Explicit Block classes

from typing import Union

class AssigmentBlock(fp.Block[Begin, Assigment, End]):
    pass

class EntryBlock(fp.RootBlock[Union[AssigmentBlock, EqualityComparison]]):
    pass

parsed = fp.parse("source.txt", EntryBlock)

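Customizing parsing

In certain cases you might want to leave some configuration details to the user. We have a method for that! Instead of overriding from_string, override from_string_and_config. The second argument is an object that can be given to the parser, which in turn passes it to every ParsedStatement class.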

import numbers

@dataclass(frozen=True)
class NumericAssigment(fp.ParsedStatement):
    """Parses the following `this <- other`
    """

    lhs: str
    rhs: numbers.Number

    @classmethod
    def from_string_and_config(cls, s, config):
        if "<-" not in s:
            # This means: I do not know how to parse it
            # try with another ParsedStatement class.
            return None
        lhs, rhs = s.split("<-")
        # The numeric type is chosen by the user through the config object.
        return cls(lhs.strip(), config.numeric_type(rhs.strip()))

class Config:

    numeric_type = float

parsed = fp.parse("source.txt", NumericAssigment, Config)
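Since the config object only needs a numeric_type attribute that can be called with a string, swapping the numeric representation comes down to passing a different config. The standalone sketch below illustrates this without importing flexparser. parse_numeric_assignment and the three config classes are hypothetical names.

```python
from decimal import Decimal
from fractions import Fraction


class FloatConfig:
    numeric_type = float


class DecimalConfig:
    numeric_type = Decimal


class FractionConfig:
    numeric_type = Fraction


def parse_numeric_assignment(s, config):
    """Hypothetical stand-alone version of the conversion step."""
    if "<-" not in s:
        return None
    lhs, rhs = s.split("<-", 1)
    # The config decides how the right-hand side is converted.
    return lhs.strip(), config.numeric_type(rhs.strip())


as_float = parse_numeric_assignment("x <- 0.1", FloatConfig)
as_decimal = parse_numeric_assignment("x <- 0.1", DecimalConfig)
as_fraction = parse_numeric_assignment("x <- 1/3", FractionConfig)
print(as_float, as_decimal, as_fraction)
```

Keeping the conversion in the config rather than in the statement class means the same parser class can serve users with different precision needs.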

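This project started as part of Pint, the Python units package.

See AUTHORS for a list of the maintainers.

To review an ordered list of notable changes for each version of the project, see CHANGES.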


Release history

0.3.1 (this version): Jun 6, 2024
0.3: Mar 9, 2024
0.2.1 (yanked): Mar 8, 2024
0.2: Nov 27, 2023
0.1: Jun 4, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flexparser-0.3.1.tar.gz (31.4 kB), uploaded Jun 6, 2024, Source

Built Distribution

flexparser-0.3.1-py3-none-any.whl (27.3 kB), uploaded Jun 6, 2024, Python 3

Hashes for flexparser-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`36f795d82e50f5c9ae2fde1c33f21f88922fdd67b7629550a3cc4d0b40a66856`
MD5	`0548caa6dd104740ff6cfa9af88156c6`
BLAKE2b-256	`dce4a73612499d9c8c450c8f4878e8bb8b3b2dce4bf671b21dd8d5c6549525a7`

Hashes for flexparser-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e3e2936bec1f9277f777ef77297522087d96adb09624d4fe4240fd56885c013`
MD5	`40b15b88e5dafdda764d4caaa78afe58`
BLAKE2b-256	`a3285ce78a4838bb9da1bd9f64bc79ba12ddbfcb4824a11ef41da6f05d3240ef`
