A text comprehension engine in Python.
Project description
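What is Cahoots?
Cahoots is software that attempts to categorize snippets of text into one of several categories. It runs a series of parsers and returns a list of possible data types and interpretations, each with a confidence value. In short, it tries to "understand" the snippet of text you provide.
Its ideal use is inside a long-running daemon or service. It can also run as a standalone Cahoots server (details are in the project documentation).
What isn't Cahoots?
Cahoots is not designed for mapping and mining large bodies of text. A text mining engine could use Cahoots to classify specific snippets extracted from a large body of text, but Cahoots itself is not a tool for mining text.
Cahoots is not something to integrate into a non-daemonized web application that starts up and instantiates on every page view. The bootstrap process can be somewhat heavy and should run once, during application startup.
Installation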
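One way to follow the "bootstrap once" advice in a long-running service is to build the parser lazily and share the single instance across requests. The sketch below is only an illustration: `SharedParser` and `build_cahoots_parser` are hypothetical helper names, not part of Cahoots. The only Cahoots call it relies on is `CahootsParser(bootstrap=True)`, as shown under basic module usage.

```python
import threading


class SharedParser:
    """Builds an expensive parser once and reuses it for every request.

    Hypothetical helper for illustration; not part of the Cahoots API.
    """

    def __init__(self, factory):
        self._factory = factory          # callable that builds the parser
        self._lock = threading.Lock()    # guards the first construction
        self._parser = None

    def get(self):
        # Double-checked locking: only the first caller pays the bootstrap cost.
        if self._parser is None:
            with self._lock:
                if self._parser is None:
                    self._parser = self._factory()
        return self._parser


def build_cahoots_parser():
    # Imported lazily so this module still loads where cahoots is not installed.
    from cahoots.parser import CahootsParser
    return CahootsParser(bootstrap=True)


# Created at import time; the heavy bootstrap runs on the first .get() call.
shared = SharedParser(build_cahoots_parser)
```

A request handler would then call `shared.get().parse(text)`. Calling `shared.get()` once during application startup moves the bootstrap cost out of the first request.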
sudo pip install cahoots
Basic module usage
>>> from cahoots.parser import CahootsParser
>>> cahoots = CahootsParser(bootstrap=True)
>>> results = cahoots.parse('http://www.google.com/')
>>> results
{
    'date': '2015-03-22T23:47:36.340187',
    'query': 'http://www.google.com/',
    'top': <cahoots.result.ParseResult object>,
    'results': {
        'count': 1,
        'matches': [
            <cahoots.result.ParseResult object>
        ],
        'types': [
            "URI"
        ]
    },
    'execution_seconds': 0.006306886672973633
}
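The returned dictionary can be inspected with ordinary key lookups. As a minimal sketch, the hypothetical helper `summarize` below (not part of Cahoots) reads only the keys visible in the output above. It runs against a hand-built sample dictionary so that it works without Cahoots installed; the `ParseResult` objects are replaced by placeholders because their attributes are not shown here.

```python
def summarize(results):
    """Return a one-line summary of a Cahoots-style result dictionary."""
    matches = results['results']
    # Fall back to a readable label when no parser matched.
    types = ', '.join(matches['types']) or 'no match'
    return '{query!r}: {count} match(es) [{types}] in {ms:.2f} ms'.format(
        query=results['query'],
        count=matches['count'],
        types=types,
        ms=results['execution_seconds'] * 1000,  # seconds to milliseconds
    )


# Sample mirroring the structure shown above; 'top' and 'matches' are
# placeholders standing in for ParseResult objects.
sample = {
    'date': '2015-03-22T23:47:36.340187',
    'query': 'http://www.google.com/',
    'top': None,
    'results': {'count': 1, 'matches': [], 'types': ['URI']},
    'execution_seconds': 0.006306886672973633,
}

print(summarize(sample))
```

With a real parser, the same function should accept the value returned by `cahoots.parse(...)`, assuming the dictionary layout matches the example output.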
Basic server usage
$ cahootserver --help
Cahoots Server Help:
    -h, --help
        Show this help
    -p [port], --port [port]
        Set the port the server should listen on
    -d, --debug
        Run the server in debug mode (errors displayed, debug output)
$ sudo cahootserver --port 80 --debug
* 00:38:18 04/04/15 CDT * Bootstrapping AddressParser
* 00:38:18 04/04/15 CDT * Bootstrapping CoordinateParser
* 00:38:18 04/04/15 CDT * Bootstrapping DateParser
* 00:38:18 04/04/15 CDT * Bootstrapping EmailParser
* 00:38:18 04/04/15 CDT * Bootstrapping LandmarkParser
* 00:38:18 04/04/15 CDT * Bootstrapping MeasurementParser
* 00:53:19 04/05/15 CDT * Bootstrapping NameParser
* 00:38:18 04/04/15 CDT * Bootstrapping PostalCodeParser
* 00:38:18 04/04/15 CDT * Bootstrapping ProgrammingParser
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
* Restarting with reloader
* 00:38:18 04/04/15 CDT * Bootstrapping AddressParser
* 00:38:18 04/04/15 CDT * Bootstrapping CoordinateParser
* 00:38:18 04/04/15 CDT * Bootstrapping DateParser
* 00:38:18 04/04/15 CDT * Bootstrapping EmailParser
* 00:38:18 04/04/15 CDT * Bootstrapping LandmarkParser
* 00:38:18 04/04/15 CDT * Bootstrapping MeasurementParser
* 00:53:19 04/05/15 CDT * Bootstrapping NameParser
* 00:38:18 04/04/15 CDT * Bootstrapping PostalCodeParser
* 00:38:18 04/04/15 CDT * Bootstrapping ProgrammingParser
# CTRL+C pressed
$ sudo cahootserver --port 80
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
Documentation
License
The MIT License (MIT)

Copyright (c) 2012-2015 Serenity Software, LLC

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
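Cahoots uses many code samples to train a Bayesian classifier. All code samples come from projects released under the BSD or MIT licenses. This code is never executed.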
Project details
Hashes for Cahoots-0.5.2.zip
Algorithm | Hash digest
---|---
SHA256 | cf16cff674c3fcb4f8f9cc90195928f38e6e4261c352a46ee6ebb4d73d7d8d84
MD5 | a4c563c2853b0cf779fb9003ced910dc
BLAKE2b-256 | 6b5ca82ebed2368b30a0852a222ccd9400955646de9ae79d1bdb920c3aad5b13