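An experimental project for in-memory data flow pipelines.
Project description
Experiments in data flow programming.
After some experimentation, Apache Beam's Python SDK got the API right. Use that instead.
Standard word count example
Get the 5 most common words in LICENSE.txt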
from collections import Counter
from tinyflow.serial import ops, Pipeline
pipe = Pipeline() \
    | "Split line into words" >> ops.flatmap(lambda x: x.lower().split()) \
    | "Remove empty lines" >> ops.filter(bool) \
    | "Produce the 5 most common words" >> ops.counter(5) \
    | "Sort by frequency desc" >> ops.sort(key=lambda x: x[1], reverse=True)
with open('LICENSE.txt') as f:
    results = dict(pipe(f))
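Using only Python's standard library
from collections import Counter
import itertools as it
with open('LICENSE.txt') as f:
    # Lowercase each line and split it into words
    lines = (line.lower().split() for line in f)
    # Flatten the per-line word lists into a single stream of words
    words = it.chain.from_iterable(lines)
    count = Counter(words)
    # Keep the 5 most common words, matching the pipeline example
    results = dict(count.most_common(5))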
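The `Pipeline() | "label" >> op` syntax in the tinyflow example relies on Python operator overloading. The sketch below shows one way such a syntax could be built. It is not tinyflow's actual implementation: the names `Op`, `FlatMap`, `MostCommon`, and this `Pipeline` class are hypothetical and exist only for illustration.

```python
from collections import Counter


class Op:
    """Hypothetical base class: one pipeline step with an optional label."""

    label = None

    def __rrshift__(self, label):
        # Handles `"some label" >> op`. str does not know how to shift by
        # an Op, so Python falls back to the right operand's __rrshift__.
        self.label = label
        return self

    def __call__(self, stream):
        raise NotImplementedError


class FlatMap(Op):
    """Apply func to each item and flatten the resulting iterables."""

    def __init__(self, func):
        self.func = func

    def __call__(self, stream):
        for item in stream:
            yield from self.func(item)


class MostCommon(Op):
    """Count all items and emit the n most common (item, count) pairs."""

    def __init__(self, n):
        self.n = n

    def __call__(self, stream):
        return iter(Counter(stream).most_common(self.n))


class Pipeline:
    """Immutable chain of steps; `pipeline | op` returns a new Pipeline."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def __or__(self, op):
        return Pipeline(self.steps + (op,))

    def __call__(self, stream):
        # Feed the output of each step into the next one.
        for op in self.steps:
            stream = op(stream)
        return stream


# `>>` binds more tightly than `|`, so each label attaches to its op
# before the op is appended to the pipeline.
pipe = (
    Pipeline()
    | "Split line into words" >> FlatMap(lambda line: line.lower().split())
    | "Keep the 2 most common words" >> MostCommon(2)
)

result = dict(pipe(["a b a", "b a c"]))
```

Because `FlatMap` is a generator, nothing is read from the input until the counting step consumes the stream.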
Developing
$ git clone https://github.com/geowurster/tinyflow.git
$ cd tinyflow
$ pip install -e .\[all\]
$ pytest --cov tinyflow --cov-report term-missing
License
See LICENSE.txt
Changelog
See CHANGES.md