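Dependency parse searching

Project description

depgrep

Dependency parse searching for CONLL-U DataFrames

Version 0.1.3

Note: this tool currently has no tests, CI, etc. Using it outside of the depgrep method provided by the buzz library is not advised.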

Installation

pip install depgrep

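Usage

The tool is designed to work with corpora made from CONLL-U files and parsed into DataFrames by buzz. The best approach is to use buzz to model the corpus and then use its depgrep method.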

pip install buzz

Then, in Python:

from buzz import Corpus
corpus = Corpus('path/to/conll/files')
query = 'l"have"'  # match the lemma "have"

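Syntax

depgrep searches work through a combination of nodes and relations, just like Tgrep2, on which this tool is based.

Nodes

A node targets one token feature (word, lemma, POS, wordclass, dependency role, etc.). It may be specified as a regular expression or as a simple string match: f/amod|nsubj/ will match tokens filling the nsubj or amod role; l"be" will match the lemma be.

The first part of the node query chooses which token attribute is to be searched. It can be any of: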

w : word
l : lemma
p : part of speech tag
x : wordclass / XPOS
f : dependency role
i : index in sentence
s : sentence number
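
To make the node format concrete, the sketch below splits a node into its attribute, its match type and its pattern. It is plain Python for illustration only and is not depgrep's own parser; the names ATTRIBUTES, NODE_RE and describe_node are hypothetical, and the sketch assumes that / delimits a regular expression and " delimits a string match, as in the examples above.

```python
import re

# Attribute letters from the list above, mapped to their descriptions.
ATTRIBUTES = {
    "w": "word",
    "l": "lemma",
    "p": "part of speech tag",
    "x": "wordclass / XPOS",
    "f": "dependency role",
    "i": "index in sentence",
    "s": "sentence number",
}

# One attribute letter, an opening delimiter, the pattern, the same delimiter.
NODE_RE = re.compile(r'^([A-Za-z])(["/])(.*)\2$')


def describe_node(node):
    """Split a node such as f/amod|nsubj/ into its parts (illustrative only)."""
    match = NODE_RE.match(node)
    if match is None or match.group(1).lower() not in ATTRIBUTES:
        raise ValueError(f"not a recognised node expression: {node!r}")
    letter, delimiter, pattern = match.groups()
    return {
        "attribute": ATTRIBUTES[letter.lower()],
        "kind": "regex" if delimiter == "/" else "string",
        "pattern": pattern,
    }


print(describe_node('f/amod|nsubj/'))
print(describe_node('l"be"'))
```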

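Case sensitivity is controlled by the case of the attribute you are searching: p/VB/ is case-insensitive, while P/VB/ is case-sensitive. Therefore, the following query matches words ending in ing, ING, Ing, etc.: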

w/ing$/

To make the whole query case-insensitive, use the case_sensitive=False keyword argument.
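
The case rule can be imitated with Python's re module. The sketch below does not call depgrep: match_words is a hypothetical helper, and its case_sensitive parameter only mimics one reading of the keyword argument mentioned above, namely that False makes every node case-insensitive.

```python
import re


def match_words(words, node, case_sensitive=True):
    """Return the words matched by a regex node such as w/ing$/ (illustrative only)."""
    letter, pattern = node[0], node[2:-1]
    # A lowercase attribute letter means case-insensitive matching. Passing
    # case_sensitive=False is assumed to switch case sensitivity off everywhere.
    ignore_case = letter.islower() or not case_sensitive
    flags = re.IGNORECASE if ignore_case else 0
    return [word for word in words if re.search(pattern, word, flags)]


words = ["Running", "SING", "sing", "sang"]
print(match_words(words, "w/ing$/"))                        # ['Running', 'SING', 'sing']
print(match_words(words, "W/ing$/"))                        # ['Running', 'sing']
print(match_words(words, "W/ing$/", case_sensitive=False))  # ['Running', 'SING', 'sing']
```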

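Relations

Relations specify the relationship between nodes. For example, f"nsubj" <- f"ROOT" locates nominal subjects governed by nodes in the ROOT role. The thing you want to find is the leftmost node in the query. So, while the above query finds nominal subject tokens, you could use the inverse relation, f"ROOT" -> f"nsubj", to return the ROOT tokens instead.

Available relations: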

a = b   : a and b are the same node
a & b   : a and b are the same node (same as =)

a <- b  : a is a dependent of b
a <<- b : a is a descendant of b, with any distance in between
a <-: b : a is the only dependent of b
a <-N b : a is a descendant of b by N generations

a -> b  : a is the governor of b
a ->> b : a is an ancestor of b, with any distance in between
a ->: b : a is the only governor of b (as is normal in many grammars)
a ->N b : a is an ancestor of b by N generations

a + b   : a is immediately to the left of b
a +N b  : a is N places to the left of b
a <| b  : a is left of b, with any distance in between

a - b   : a is immediately to the right of b
a -N b  : a is N places to the right of b
a |> b  : a is right of b, with any distance in between

a $ b   : a and b share a governor (i.e. are sisters)

a $> b  : a is a sister of and to the right of b
a $< b  : a is a sister of and to the left of b
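
A few of these relations can be spelled out on a toy parse to show what they test. The sketch below is not depgrep code: the TOKENS table, its made-up parse of "The cat sat quietly" and all helper names are hypothetical. It also assumes that a token is not its own sister.

```python
# Toy parse of "The cat sat quietly": index -> (word, role, governor index).
# A governor index of 0 means the token has no governor.
TOKENS = {
    1: ("The", "det", 2),
    2: ("cat", "nsubj", 3),
    3: ("sat", "ROOT", 0),
    4: ("quietly", "advmod", 3),
}


def governor(index):
    return TOKENS[index][2]


def is_dependent(a, b):
    """a <- b : a is a dependent of b."""
    return governor(a) == b


def is_descendant(a, b):
    """a <<- b : a is a descendant of b, with any distance in between."""
    current = governor(a)
    while current != 0:
        if current == b:
            return True
        current = governor(current)
    return False


def is_immediately_left(a, b):
    """a + b : a is immediately to the left of b."""
    return a + 1 == b


def are_sisters(a, b):
    """a $ b : a and b share a governor (assumed here to be distinct tokens)."""
    return a != b and governor(a) == governor(b)


def nsubj_under_root():
    """Emulate f"nsubj" <- f"ROOT": the result is the leftmost node, the nsubj."""
    return [
        TOKENS[a][0]
        for a in TOKENS
        for b in TOKENS
        if TOKENS[a][1] == "nsubj" and TOKENS[b][1] == "ROOT" and is_dependent(a, b)
    ]


print(nsubj_under_root())   # ['cat']
print(is_descendant(1, 3))  # True: "The" depends on "cat", which depends on "sat"
print(are_sisters(2, 4))    # True: "cat" and "quietly" are both governed by "sat"
```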

Negation

Add ! before a relation to negate it: f"ROOT" != x"VERB" will find non-verbal ROOT nodes.
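
In plain Python terms, the negated query keeps the tokens that pass the first test and fail the second. The sketch below runs on made-up tokens and does not use depgrep; TOKENS and root_not_verb are hypothetical names.

```python
# Toy tokens as (word, role, wordclass); the values are made up for illustration.
TOKENS = [
    ("Fire", "ROOT", "NOUN"),
    ("sat", "ROOT", "VERB"),
    ("cat", "nsubj", "NOUN"),
]


def root_not_verb(tokens):
    """Emulate f"ROOT" != x"VERB": ROOT tokens that are not also VERB tokens."""
    return [word for (word, role, xpos) in tokens if role == "ROOT" and not xpos == "VERB"]


print(root_not_verb(TOKENS))  # ['Fire']
```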

Brackets

Brackets can be used to make more complex queries:

f"amod" = l/^[abc]/ <- (f/nsubj/ != x/NOUN/)

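The above translates to: match adjectival modifiers starting with a, b or c that are governed by nominal subjects that are not nouns.

Note that without brackets, each relation/node refers to the leftmost node. In the following, the plural noun must be the same node as the nsubj, not the ROOT: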

f"nsubj" <- f"ROOT" = p"NNS"
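
The effect of bracketing can be checked on a toy parse. The sketch below is plain Python, not depgrep: it emulates the unbracketed query above and a bracketed variant, f"nsubj" <- (f"ROOT" = p"NNS"), which is written here by analogy with the earlier bracket example. TOKENS and both function names are hypothetical.

```python
# Toy parse of "Dogs bark": index -> (word, role, POS tag, governor index).
TOKENS = {
    1: ("Dogs", "nsubj", "NNS", 2),
    2: ("bark", "ROOT", "VBP", 0),
}


def unbracketed():
    """f"nsubj" <- f"ROOT" = p"NNS": both conditions constrain the nsubj."""
    return [
        word
        for (word, role, pos, gov) in TOKENS.values()
        if role == "nsubj"
        and gov in TOKENS
        and TOKENS[gov][1] == "ROOT"
        and pos == "NNS"
    ]


def bracketed():
    """f"nsubj" <- (f"ROOT" = p"NNS"): the NNS test now constrains the ROOT."""
    return [
        word
        for (word, role, pos, gov) in TOKENS.values()
        if role == "nsubj"
        and gov in TOKENS
        and TOKENS[gov][1] == "ROOT"
        and TOKENS[gov][2] == "NNS"
    ]


print(unbracketed())  # ['Dogs']
print(bracketed())    # []: the ROOT "bark" is tagged VBP, not NNS
```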

OR expressions

You can use the pipe (|) to create an OR expression.

# match all kinds of modifiers
x"ADJ" | f"amod" | f"appos" | p/^JJ/
x"NOUN" <- f"ROOT" | = p"NNS"

In the last query above, we match nouns that are either governed by ROOT or are plural.
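
Read as a filter, the query x"NOUN" <- f"ROOT" | = p"NNS" keeps a noun if either alternative holds. The sketch below applies that reading to a made-up parse of "Dogs chase the ball in parks at night"; it does not call depgrep, and TOKENS and the helper names are hypothetical.

```python
# Toy tokens: index -> (word, role, POS tag, wordclass, governor index).
TOKENS = {
    1: ("Dogs", "nsubj", "NNS", "NOUN", 2),
    2: ("chase", "ROOT", "VBP", "VERB", 0),
    3: ("the", "det", "DT", "DET", 4),
    4: ("ball", "obj", "NN", "NOUN", 2),
    5: ("in", "case", "IN", "ADP", 6),
    6: ("parks", "nmod", "NNS", "NOUN", 4),
    7: ("at", "case", "IN", "ADP", 8),
    8: ("night", "nmod", "NN", "NOUN", 6),
}


def governed_by_root(index):
    gov = TOKENS[index][4]
    return gov in TOKENS and TOKENS[gov][1] == "ROOT"


def nouns_under_root_or_plural():
    """Nouns that are governed by ROOT, or are tagged NNS, or both."""
    return [
        token[0]
        for index, token in TOKENS.items()
        if token[3] == "NOUN" and (governed_by_root(index) or token[2] == "NNS")
    ]


print(nouns_under_root_or_plural())  # ['Dogs', 'ball', 'parks']
```

Here "night" is excluded because it is neither plural nor governed by the ROOT.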

Wildcards

You can use __ or * to stand in for any token. To match any token that is the governor of a verb, do:

__ -> x"VERB"
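
The wildcard places no condition on the token itself, so only the relation does any filtering. The sketch below emulates the query on a made-up parse of "She wants to sleep"; it is not depgrep code, and TOKENS and governors_of_verbs are hypothetical names.

```python
# Toy parse of "She wants to sleep": index -> (word, wordclass, governor index).
TOKENS = {
    1: ("She", "PRON", 2),
    2: ("wants", "VERB", 0),
    3: ("to", "PART", 4),
    4: ("sleep", "VERB", 2),
}


def governors_of_verbs():
    """Emulate __ -> x"VERB": any token, whatever its features, that governs a VERB."""
    governors = {
        gov
        for (_, xpos, gov) in TOKENS.values()
        if xpos == "VERB" and gov in TOKENS
    }
    return [TOKENS[index][0] for index in sorted(governors)]


print(governors_of_verbs())  # ['wants']
```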

Project details

Release history

0.1.3 (this version): Aug 20, 2019
0.1.2: Aug 9, 2019
0.1.1: Aug 9, 2019
0.0.1: Jun 11, 2019

Download files

Source distribution

depgrep-0.1.3.tar.gz (12.6 kB)

Uploaded Aug 20, 2019, source

Built distribution

depgrep-0.1.3-py3-none-any.whl (10.8 kB)

Uploaded Aug 20, 2019, Python 3

Hashes for depgrep-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`dc0ca8e8be4f4645b8a9e3eec19e71092144a8da32aa8bd93d7874e21c480acf`
MD5	`5faf625ad4410a9a32008457c403d2fa`
BLAKE2b-256	`874885b55230d0a6e0f11b5843fcafdb899f96ed7e5f815425837bd11e681ec2`

Hashes for depgrep-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`06d4f136ed8bdfa2e6264d0b1de7023d3a01afc7f930572413266825262bbaaf`
MD5	`0ebf54179beaebf6db75b15f9580657e`
BLAKE2b-256	`f18936076afd0e1bb7adbe93991b91bccf244f11d2c37124fe04a69ec08c95e1`
