A fast parser for reStructuredText


rst_fast_parse

A fast, spec-compliant*, concrete syntax parser for reStructuredText.

Work in progress, use at your own risk.

Features

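Error-tolerant parsing; by design it never raises an exception
Concrete syntax tokens with full source maps
Diagnostics for common issues
No required dependencies
Functional parsing design with no mutable global state (thread-safe)
Fully typed, using "strict" mypy settings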

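This parser is not intended as a complete replacement for the docutils/sphinx rST parser. The initial goal is to parse the "outline" of a reStructuredText document without necessarily knowing everything about every role and directive, for use in tools such as linters, formatters and language servers (rather than waiting for a full sphinx build).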

Incremental parsing and formatting are planned.

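* Spec-compliant for all rST syntax (extensively tested against docutils). Because directives and roles are highly dynamic, however, no specification exists for all of them.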

Usage

To parse a string, use the parse_string function.

from rst_fast_parse import parse_string

nodes, diagnostics = parse_string("""
Title
-----
hallo
there *world!*
""",
inline_sourcemaps=True
)
assert nodes.debug_repr() == """\
<title style='-'> 1-2
  <inline> 1-1
    <text> 1:0-1:5
<paragraph> 3-4
  <inline> 3-4
    <text> 3:0-4:6
    <emphasis> 4:6-4:14
"""

Improving performance

Note that if you only need block-level parsing, passing parse_inlines=False gives a noticeable speed-up.

from rst_fast_parse import parse_string

nodes, diagnostics = parse_string("""
Hello
-----
*world!*
""",
parse_inlines=False)
assert nodes.debug_repr() == """\
<title style='-'> 1-2
  <inline> 1-1
<paragraph> 3-3
  <inline> 3-3
"""

The inline_sourcemaps option is also disabled by default, because it affects performance as well.

For example, parsing the reStructuredText specification file (>3000 lines) currently takes:

25ms with parse_inlines=False
35ms with parse_inlines=True
44ms with parse_inlines=True, inline_sourcemaps=True
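Timings like these depend on the machine and the input. To compare the options on your own files, you could use a generic timing helper like the one below. `best_time_ms` is a hypothetical name. It accepts any callable, so the rst_fast_parse call appears only in a comment, and the file name `spec.rst` is a placeholder:

```python
import time
from typing import Any, Callable


def best_time_ms(func: Callable[..., Any], *args: Any, repeat: int = 5, **kwargs: Any) -> float:
    """Call func(*args, **kwargs) `repeat` times and return the fastest run in milliseconds."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)


# Possible use with rst_fast_parse (requires the package and a local file):
# from rst_fast_parse import parse_string
# text = open("spec.rst", encoding="utf-8").read()
# for opts in ({"parse_inlines": False}, {"parse_inlines": True},
#              {"parse_inlines": True, "inline_sourcemaps": True}):
#     print(opts, f"{best_time_ms(parse_string, text, **opts):.1f}ms")
```

Taking the minimum of several runs reduces noise from other processes. It does not remove warm-up effects, so treat single-digit differences with caution.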

Nested sections

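Unlike docutils, the parser does not automatically nest sections based on their title underline/overline styles. Linting and formatting tools do not usually need this nesting, and leaving it out keeps incremental parsing possible.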

If you want nested sections, use the nest_sections function:

from rst_fast_parse import parse_string, nest_sections

nodes, diagnostics = parse_string("""
Header 1
========
Header 1.1
----------
""")
nodes = nest_sections(nodes)
assert nodes.debug_repr() == """\
<section> 1-4
  <title style='='> 1-2
    <inline> 1-1
      <text>
  <section> 3-4
    <title style='-'> 3-4
      <inline> 3-3
        <text>
"""

Directive parsing

Directives are highly dynamic and tightly coupled to docutils/sphinx, so the parser does not attempt to parse every directive.

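Instead, a default mapping from the standard directives to simple declarative directive definitions is provided. You can modify these definitions as needed and pass them to the parser.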

from rst_fast_parse import parse_string, get_default_directives

print(get_default_directives())

nodes, diagnostics = parse_string("""
.. note:: This is a note
     :class: my-note
""",
directives={
    'note': {
      "argument": False,  # whether the directive can have an argument
      "options": True,  # whether it can have an options block
      "content": True,  # whether it can have a content block
      "parse_content": True,  # whether to parse the content as rST
    }
})
assert nodes.debug_repr() == """\
<directive name='note'> 1-2
  <options>
    <option name='class'> 2-2
  <body>
    <paragraph> 1-1
      <inline> 1-1
        <text>
"""

Diagnostics

Diagnostics are returned for any known issues found during parsing.

from rst_fast_parse import parse_string

nodes, diagnostics = parse_string("""
- list `no role name`
no blank line
""")
assert nodes.debug_repr() == """\
<bullet_list symbol='-'> 1-1
  <list_item> 1-1
    <paragraph> 1-1
      <inline> 1-1
        <text>
        <role>
<paragraph> 2-2
  <inline> 2-2
    <text>
"""
assert [d.as_dict() for d in diagnostics] == [
  {
    'code': 'block.blank_line',
    'message': 'Blank line expected after Bullet list',
    'line_start': 1,
    'character_end': 21
  },
  {
    'code': 'inline.role_no_name',
    'message': 'Inline role without name.',
    'line_start': 1,
    'character_start': 7,
    'character_end': 21
  }
]
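`as_dict()` appears to report 0-based lines (the leading newline of the source above is line 0), while the command line output prints 1-based positions. The first dictionary also suggests that `character_start` is omitted when it is 0. Under both assumptions, a small hypothetical helper can render diagnostics in the same `file:line:column: message [code]` style as the CLI:

```python
def format_diagnostic(diag: dict, filename: str = "<stdin>") -> str:
    """Render an as_dict()-style diagnostic as 'file:line:column: message [code]'."""
    line = diag["line_start"] + 1  # assumed 0-based -> 1-based
    column = diag.get("character_start", 0) + 1  # assumed default of 0 when omitted
    return f"{filename}:{line}:{column}: {diag['message']} [{diag['code']}]"


diagnostics = [
    {"code": "block.blank_line", "message": "Blank line expected after Bullet list",
     "line_start": 1, "character_end": 21},
    {"code": "inline.role_no_name", "message": "Inline role without name.",
     "line_start": 1, "character_start": 7, "character_end": 21},
]
for d in diagnostics:
    print(format_diagnostic(d))
# <stdin>:2:1: Blank line expected after Bullet list [block.blank_line]
# <stdin>:2:8: Inline role without name. [inline.role_no_name]
```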

Available diagnostic codes

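source.tab_in_line: Warns about tabs in a line, which can degrade the performance of source maps.
block.blank_line: Warns about a missing blank line between syntax blocks.
block.title_line: Warns about problems with title underlines/overlines.
block.title_disallowed: Warns about an unexpected title in a context where titles are not allowed.
block.paragraph_indentation: Warns about unexpected indentation of paragraph lines.
block.literal_no_content: Warns about a literal block with no content.
block.target_malformed: Warns about a malformed hyperlink target.
block.substitution_malformed: Warns about a malformed substitution definition.
block.table_malformed: Warns about a malformed table.
block.inconsistent_title_level: Warns about inconsistent title levels, e.g. a level 1 title style followed by a level 3 style.
block.directive_indented_options: Warns when the second line of a directive starts with an indented `:`.
block.directive_malformed: Warns about a malformed directive.
inline.no_closing_marker: Warns about inline markup with no closing marker.
inline.role_malformed: Warns about a malformed inline role.
inline.role_no_name: Warns about an inline role with no name.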
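The codes are dotted and grouped by prefix (`source.`, `block.`, `inline.`), so a caller could suppress whole groups with shell-style patterns. The helper below is a hypothetical post-processing step on `as_dict()` output, not a documented rst_fast_parse option:

```python
from fnmatch import fnmatchcase


def filter_diagnostics(diagnostics: list[dict], ignore: list[str]) -> list[dict]:
    """Drop diagnostics whose code matches any of the shell-style `ignore` patterns."""
    return [
        d for d in diagnostics
        if not any(fnmatchcase(d["code"], pattern) for pattern in ignore)
    ]


found = [
    {"code": "source.tab_in_line", "line_start": 0},
    {"code": "block.blank_line", "line_start": 1},
    {"code": "inline.role_no_name", "line_start": 1},
]
kept = filter_diagnostics(found, ignore=["source.*", "inline.role_*"])
print([d["code"] for d in kept])  # ['block.blank_line']
```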

Walking the node tree

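Use the walk_children function to walk the children of a (block) node. One built-in use of it is the walk_line_inside function, which yields every node that contains a given line number.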

from rst_fast_parse import parse_string
from rst_fast_parse.nodes import walk_line_inside

nodes, diagnostics = parse_string("""
- a

  1. content

- b
""")
assert [e.tagname for e in walk_line_inside(nodes, 3)] == [
  'bullet_list', 'list_item', 'enum_list', 'list_item', 'paragraph', 'inline'
]
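Conceptually, this kind of lookup is a depth-first walk. It yields a node when the line falls within that node's line range and then descends into the node's children. The `Node` class and `nodes_at_line` function below are hypothetical stand-ins (only `tagname`, a line range and `children`), with line ranges chosen to match the example source above. This is a sketch, not the library's implementation:

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    tagname: str
    line_start: int
    line_end: int
    children: list["Node"] = field(default_factory=list)


def nodes_at_line(nodes: list[Node], line: int) -> Iterator[Node]:
    """Yield, depth-first, every node whose line range contains `line`."""
    for node in nodes:
        if node.line_start <= line <= node.line_end:
            yield node
            # Children lie inside their parent's range, so only descend into containing nodes.
            yield from nodes_at_line(node.children, line)


tree = [
    Node("bullet_list", 1, 5, [
        Node("list_item", 1, 3, [
            Node("paragraph", 1, 1, [Node("inline", 1, 1)]),
            Node("enum_list", 3, 3, [
                Node("list_item", 3, 3, [
                    Node("paragraph", 3, 3, [Node("inline", 3, 3)]),
                ]),
            ]),
        ]),
        Node("list_item", 5, 5, [
            Node("paragraph", 5, 5, [Node("inline", 5, 5)]),
        ]),
    ]),
]
print([n.tagname for n in nodes_at_line(tree, 3)])
# ['bullet_list', 'list_item', 'enum_list', 'list_item', 'paragraph', 'inline']
```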

Command line usage

There is also a simple command line interface (CLI) for linting reStructuredText from stdin or files.

$ echo "- a\n1. *b" | python -m rst_fast_parse.cli.lint --print-ast --ast-maps -
<bullet_list symbol='-'> 0-0
  <list_item> 0-0
    <paragraph> 0-0
      <inline> 0-0
        <text> 0:2-0:3
<enum_list ptype='period' etype='arabic'> 1-1
  <list_item> 1-1
    <paragraph> 1-1
      <inline> 1-1
        <problematic> 1:3-1:4
        <text> 1:4-1:5

<stdin>:1:1: Blank line expected after Bullet list [block.blank_line]
<stdin>:2:4: Inline emphasis no closing marker. [inline.no_closing_marker]

Found 2 error.
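Tools that call the linter as a subprocess may want its diagnostics back as structured data. Assuming every diagnostic line has the `file:line:column: message [code]` shape shown above, a regular expression can split those lines. `parse_lint_line` and `LintResult` are hypothetical names. The AST dump and the summary line do not match the pattern, so they are skipped:

```python
import re
from typing import NamedTuple, Optional

LINE_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+): (?P<message>.*) \[(?P<code>[\w.]+)\]$"
)


class LintResult(NamedTuple):
    file: str
    line: int    # 1-based, as printed by the CLI
    column: int  # 1-based, as printed by the CLI
    message: str
    code: str


def parse_lint_line(text: str) -> Optional[LintResult]:
    """Parse one diagnostic line of CLI output, or return None if it is not one."""
    match = LINE_RE.match(text)
    if match is None:
        return None
    return LintResult(match["file"], int(match["line"]), int(match["column"]),
                      match["message"], match["code"])


output = """\
<text> 1:4-1:5

<stdin>:1:1: Blank line expected after Bullet list [block.blank_line]
<stdin>:2:4: Inline emphasis no closing marker. [inline.no_closing_marker]

Found 2 error."""
results = [r for r in map(parse_lint_line, output.splitlines()) if r is not None]
for r in results:
    print(r.code, r.line, r.column)
# block.blank_line 1 1
# inline.no_closing_marker 2 4
```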

Design decisions

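Unlike docutils, the parser does not automatically nest sections based on title underline styles. This allows for incremental parsing and a simpler structure.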

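We want to avoid user-defined "dynamic" code execution, such as for parsing directive content, as far as possible. Such code would limit our future options for porting the codebase to another language, configuring it with a declarative format, or running it in a sandboxed environment.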

License

For now, the project is under a fairly restrictive license, and the distributed code is somewhat obfuscated.

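This is to reduce the risk of the codebase being "maliciously" copied, especially during development, which unfortunately has happened to me in the past 😒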

Changelog

0.0.16

🎉 Add inline parsing
🎉 Add character-level source maps for diagnostics
Refactor elements to nodes

0.0.15

🎉 Add directive parsing
Replace ElementProtocol.line_inside with the walk_line_inside function.
Replace ElementList with RootElement.
Add InlineElement, ParagraphElement, BulletListElement, EnumListElement, FieldListElement, FieldItemElement, DefinitionListElement, DefinitionItemElement.

Release history

0.0.16 (this version): Aug 20, 2024
0.0.15: Aug 18, 2024
0.0.14: Aug 15, 2024
0.0.13: Aug 13, 2024
0.0.12: Aug 13, 2024
0.0.11: Aug 12, 2024

Download files

Source distribution

No source distribution files are available for this release.

Built distribution

rst_fast_parse-0.0.16-py3-none-any.whl (106.9 kB)

Uploaded Aug 20, 2024, Python 3

Hashes for rst_fast_parse-0.0.16-py3-none-any.whl

Algorithm	Hash digest
SHA256	`d506cafb451877d09ec6b5a2f227b67bf03fa5e0798f350a6f614ad74d773fbc`
MD5	`07d209286994b7fb3db1e165fb0b66d1`
BLAKE2b-256	`558bdb82ece0d0026245185052ec74006b0149a7d3414b45d619bf1c8644bed6`
