A set of utilities for training probabilistic context-free grammars and using them to score new sentences.
Project description
A library for training probabilistic context-free grammars (PCFGs) and applying them to text.
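For reference, these are the standard PCFG definitions such a library builds on. A tree's probability is the product of the probabilities of the productions it uses (with multiplicity), and the usual maximum-likelihood (relative-frequency) estimate divides each production's count in the treebank by the count of its left-hand side. The README does not say which estimator, smoothing, or logarithm base kasami uses; the negative values returned by `score` in the example are consistent with a log-probability, but that reading is an assumption.

```latex
P(T) = \prod_{(A \to \beta) \in T} q(A \to \beta)
\qquad
\log P(T) = \sum_{(A \to \beta) \in T} \log q(A \to \beta)

\hat{q}(A \to \beta) = \frac{\operatorname{count}(A \to \beta)}{\sum_{\gamma} \operatorname{count}(A \to \gamma)}
```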
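Reference: Kasami, T. (1965). An efficient recognition and syntax-analysis algorithm for context-free languages (No. Scientific-2). University of Hawaii, Department of Electrical Engineering.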
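Kasami's report describes what is now known as the CYK (Cocke-Younger-Kasami) algorithm. The README does not say how, or whether, kasami uses it internally, so the block below is only a minimal sketch of the technique itself: a CYK recognizer for a grammar in Chomsky normal form. The function name `cyk_recognize` and the toy grammar in `LEXICAL` and `BINARY` are hypothetical and not part of kasami.

```python
from typing import Dict, List, Set, Tuple

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
LEXICAL: Dict[str, Set[str]] = {"I": {"NP"}, "am": {"V"}, "a": {"Det"}, "teapot": {"N"}}
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}


def cyk_recognize(words: List[str],
                  lexical: Dict[str, Set[str]],
                  binary: Dict[Tuple[str, str], Set[str]],
                  start: str = "S") -> bool:
    """Return True if `start` derives `words` under a CNF grammar.

    lexical maps a word w to every nonterminal A with a rule A -> w;
    binary maps a pair (B, C) to every nonterminal A with a rule A -> B C.
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals that derive the span words[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(lexical.get(word, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            cell = table[i][length - 1]
            for split in range(1, length):  # size of the left sub-span
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        cell |= binary.get((b, c), set())
    return start in table[0][n - 1]


print(cyk_recognize("I am a teapot".split(), LEXICAL, BINARY))  # True
print(cyk_recognize("teapot a am I".split(), LEXICAL, BINARY))  # False
```

The three nested span loops (length, start, split point) give the algorithm its cubic running time in sentence length. A probabilistic variant keeps the best probability per nonterminal in each cell instead of a plain set.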
Example usage
```python
>>> from bllipparser import RerankingParser
>>>
>>> from kasami import TreeScorer
>>> from kasami.normalizers import bllip
>>>
>>> # Download the WSJ-PTB3 parsing model and load it into bllip's RerankingParser
... bllip_rrp = RerankingParser.fetch_and_load('WSJ-PTB3')
>>> bllip_parse = lambda s: bllip.normalize_tree(bllip_rrp.parse(s)[0].ptb_parse)
>>>
>>> tree = bllip_parse("I am a little teapot")
>>> print(tree)
(S1 (S (NP (PRP 'I')) (VP (VBP 'am') (NP (DT 'a') (JJ 'little') (NN 'teapot')))))
>>> print(tree.format(depth=1))
(S1
    (S
        (NP
            (PRP 'I')
        )
        (VP
            (VBP 'am')
            (NP
                (DT 'a')
                (JJ 'little')
                (NN 'teapot')
            )
        )
    )
)
>>>
>>> for production in tree:
... print(str(production))
...
(S1 S)
(S NP VP)
(NP PRP)
(PRP 'I')
(VP VBP NP)
(VBP 'am')
(NP DT JJ NN)
(DT 'a')
(JJ 'little')
(NN 'teapot')
>>> sentences = ["I am a little teapot",
... "Here is my handle",
... "Here is my spout",
... "When I get all steamed up I just shout tip me over and pour me out",
... "I am a very special pot",
... "It is true",
... "Here is an example of what I can do",
... "I can turn my handle into a spout",
... "Tip me over and pour me out"]
>>>
>>>
>>> teapot_grammar = TreeScorer.from_tree_bank(bllip_parse(s) for s in sentences)
>>>
>>> teapot_grammar.score(bllip_parse("Here is a little teapot"))
-9.392661928770137
>>> teapot_grammar.score(bllip_parse("It is my handle"))
-10.296301543090733
>>> teapot_grammar.score(bllip_parse("I am a spout"))
-10.40166205874856
>>> teapot_grammar.score(bllip_parse("Your teapot is gay"))
-12.96352974967269
>>> teapot_grammar.score(bllip_parse("Your mom's teapot is asldasnldansldal"))
-19.424997926026403
```
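The `for production in tree` loop in the example lists one production per internal node, parent before children and left to right, with words quoted. The block below is a hedged sketch of what `TreeScorer.from_tree_bank` and `score` might compute, assuming plain relative-frequency estimates and no smoothing. It uses nested tuples instead of kasami's own tree class, and `productions`, `show`, `train`, and `score` are hypothetical names, not kasami's API. kasami's score for the nonsense word in the example is finite, so the library apparently handles unseen productions differently than this sketch, which returns negative infinity.

```python
import math
from collections import Counter

# Hypothetical stand-in for a parse tree (not kasami's Tree class):
# a tuple (label, child, ...) where each child is a subtree or a word (str).


def productions(tree):
    """Yield (lhs, rhs) pairs in preorder: a node before its children, left to right.

    Words are stored with repr() quotes so they never collide with labels.
    """
    label, *children = tree
    yield label, tuple(repr(c) if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def show(production):
    """Format a production the way the README prints it, e.g. (NP DT JJ NN)."""
    lhs, rhs = production
    return "(" + " ".join((lhs,) + rhs) + ")"


def train(treebank):
    """Relative-frequency estimate: P(A -> b) = count(A -> b) / count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_counts[lhs, rhs] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def score(grammar, tree):
    """Natural-log probability of a tree: the sum of log P(rule) over its productions."""
    total = 0.0
    for rule in productions(tree):
        p = grammar.get(rule)
        if p is None:
            return -math.inf  # unseen production: this sketch does no smoothing
        total += math.log(p)
    return total


teapot = ("S1", ("S", ("NP", ("PRP", "I")),
                 ("VP", ("VBP", "am"),
                  ("NP", ("DT", "a"), ("JJ", "little"), ("NN", "teapot")))))
it_is_true = ("S1", ("S", ("NP", ("PRP", "It")),
                     ("VP", ("VBP", "is"), ("ADJP", ("JJ", "true")))))

for production in productions(teapot):
    print(show(production))  # the same ten lines, in the same order, as the README listing

grammar = train([teapot, it_is_true])
print(score(grammar, teapot))  # log(1/72) with this two-tree "treebank"
```

With only two training trees, `NP -> PRP` gets probability 2/3 because two of the three `NP` nodes expand that way; a real treebank simply supplies many more counts.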
Authors:
* Aaron Halfaker -- https://github.com/halfak
... with a lot borrowed from https://github.com/aetilley