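A Python package for Substrait.
Project description
Substrait
A Python package for Substrait, the cross-language specification for data compute operations.
Installation
You can install the Python substrait bindings from PyPI or conda-forge.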
pip install substrait
conda install -c conda-forge python-substrait # or use mamba
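Goals
This project aims to provide a Python interface to the Substrait specification. It will allow users to construct and manipulate Substrait plans from Python for evaluation by a Substrait consumer such as DataFusion or DuckDB.
Non-goals
This project is not an execution engine for Substrait plans.
Status
This is an experimental package that is still under development.
Example
Produce a Substrait plan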
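The substrait.proto module provides access to the classes that represent a Substrait plan, which makes it possible to create new plans.
Here is an example plan equivalent to SELECT first_name FROM people, where the people table has first_name and surname columns of type String.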
>>> from substrait import proto
>>> plan = proto.Plan(
...   relations=[
...     proto.PlanRel(
...       root=proto.RelRoot(
...         names=["first_name"],
...         input=proto.Rel(
...           read=proto.ReadRel(
...             named_table=proto.ReadRel.NamedTable(names=["people"]),
...             base_schema=proto.NamedStruct(
...               names=["first_name", "surname"],
...               struct=proto.Type.Struct(
...                 types=[
...                   proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED)),
...                   proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED))
...                 ] # /types
...               ) # /struct
...             ) # /base_schema
...           ) # /read
...         ) # /input
...       ) # /root
...     ) # /PlanRel
...   ] # /relations
... )
>>> print(plan)
relations {
  root {
    input {
      read {
        base_schema {
          names: "first_name"
          names: "surname"
          struct {
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
          }
        }
        named_table {
          names: "people"
        }
      }
    }
    names: "first_name"
  }
}
>>> serialized_plan = plan.SerializeToString()
>>> serialized_plan
b'\x1aA\x12?\n1\n/\x12#\n\nfirst_name\n\x07surname\x12\x0c\n\x04b\x02\x10\x02\n\x04b\x02\x10\x02:\x08\n\x06people\x12\nfirst_name'
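To make the byte string above less opaque, the following sketch decodes only its outermost protobuf field with the Python standard library. It assumes the standard protobuf wire format (a varint key, then a varint length for length-delimited fields). The read_varint helper is hypothetical, written for this illustration, and is not part of the substrait package.

```python
# The serialized plan from the example above, copied verbatim.
serialized_plan = (
    b"\x1aA\x12?\n1\n/\x12#\n\nfirst_name\n\x07surname\x12\x0c"
    b"\n\x04b\x02\x10\x02\n\x04b\x02\x10\x02:\x08\n\x06people\x12\nfirst_name"
)


def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos).

    Hypothetical helper for illustration only.
    """
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:              # high bit clear: last byte of the varint
            return value, pos
        shift += 7


# The key packs the field number (upper bits) and the wire type (low 3 bits).
key, pos = read_varint(serialized_plan, 0)
field_number, wire_type = key >> 3, key & 0x07

# Wire type 2 means "length-delimited": a varint length follows the key.
length, pos = read_varint(serialized_plan, pos)

print(field_number, wire_type, length, len(serialized_plan))
# prints: 3 2 65 67
```

Field 3 with wire type 2 is the single relations entry of the plan. Its 65-byte payload plus the 2 header bytes account for all 67 bytes of the message.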
Consume the Substrait plan
The same plan generated in the previous example can be loaded back from its binary representation using the Plan.ParseFromString method.
>>> from substrait.proto import Plan
>>> p = Plan()
>>> p.ParseFromString(serialized_plan)
67
>>> p
relations {
  root {
    input {
      read {
        base_schema {
          names: "first_name"
          names: "surname"
          struct {
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
          }
        }
        named_table {
          names: "people"
        }
      }
    }
    names: "first_name"
  }
}
Load a Substrait plan from JSON
A Substrait plan can be loaded from its JSON representation using the substrait.json.load_json and substrait.json.parse_json functions.
>>> import substrait.json
>>> jsontext = """{
...   "relations":[
...     {
...       "root":{
...         "input":{
...           "read":{
...             "baseSchema":{
...               "names":[
...                 "first_name",
...                 "surname"
...               ],
...               "struct":{
...                 "types":[
...                   {
...                     "string":{
...                       "nullability":"NULLABILITY_REQUIRED"
...                     }
...                   },
...                   {
...                     "string":{
...                       "nullability":"NULLABILITY_REQUIRED"
...                     }
...                   }
...                 ]
...               }
...             },
...             "namedTable":{
...               "names":[
...                 "people"
...               ]
...             }
...           }
...         },
...         "names":[
...           "first_name"
...         ]
...       }
...     }
...   ]
... }"""
>>> substrait.json.parse_json(jsontext)
relations {
  root {
    input {
      read {
        base_schema {
          names: "first_name"
          names: "surname"
          struct {
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
          }
        }
        named_table {
          names: "people"
        }
      }
    }
    names: "first_name"
  }
}
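Note that the JSON form uses lowerCamelCase keys (baseSchema, namedTable) where the printed plan shows snake_case field names (base_schema, named_table). As a rough sketch that relies only on Python's standard json module, not on substrait.json, the same document can be inspected as plain dictionaries before it is handed to a parser. The jsontext below is a compacted copy of the example above, and the variable names are illustrative.

```python
import json

# Compacted copy of the JSON plan from the example above.
jsontext = """{"relations": [{"root": {"input": {"read": {
  "baseSchema": {"names": ["first_name", "surname"],
    "struct": {"types": [
      {"string": {"nullability": "NULLABILITY_REQUIRED"}},
      {"string": {"nullability": "NULLABILITY_REQUIRED"}}]}},
  "namedTable": {"names": ["people"]}}},
  "names": ["first_name"]}}]}"""

doc = json.loads(jsontext)
root = doc["relations"][0]["root"]
read = root["input"]["read"]

table = read["namedTable"]["names"]      # table being read
columns = read["baseSchema"]["names"]    # column names, in schema order
# Each type object has a single key naming the type kind (here "string").
types = [next(iter(t)) for t in read["baseSchema"]["struct"]["types"]]
output = root["names"]                   # names of the plan's output columns

print(table, dict(zip(columns, types)), output)
# prints: ['people'] {'first_name': 'string', 'surname': 'string'} ['first_name']
```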
Produce a Substrait plan with Ibis
Let's use an existing Substrait producer, Ibis, to provide an example with Python Substrait as the consumer.
In [1]: import ibis
In [2]: movie_ratings = ibis.table(
   ...:     [
   ...:         ("tconst", "str"),
   ...:         ("averageRating", "str"),
   ...:         ("numVotes", "str"),
   ...:     ],
   ...:     name="ratings",
   ...: )
   ...:
In [3]: query = movie_ratings.select(
   ...:     movie_ratings.tconst,
   ...:     avg_rating=movie_ratings.averageRating.cast("float"),
   ...:     num_votes=movie_ratings.numVotes.cast("int"),
   ...: )
In [4]: from ibis_substrait.compiler.core import SubstraitCompiler
In [5]: compiler = SubstraitCompiler()
In [6]: protobuf_msg = compiler.compile(query).SerializeToString()
In [7]: from substrait.proto import Plan
In [8]: my_plan = Plan()
In [9]: my_plan.ParseFromString(protobuf_msg)
Out[9]: 186
In [10]: print(my_plan)
relations {
  root {
    input {
      project {
        common {
          emit {
            output_mapping: 3
            output_mapping: 4
            output_mapping: 5
          }
        }
        input {
          read {
            common {
              direct {
              }
            }
            base_schema {
              names: "tconst"
              names: "averageRating"
              names: "numVotes"
              struct {
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                nullability: NULLABILITY_REQUIRED
              }
            }
            named_table {
              names: "ratings"
            }
          }
        }
        expressions {
          selection {
            direct_reference {
              struct_field {
              }
            }
            root_reference {
            }
          }
        }
        expressions {
          cast {
            type {
              fp64 {
                nullability: NULLABILITY_NULLABLE
              }
            }
            input {
              selection {
                direct_reference {
                  struct_field {
                    field: 1
                  }
                }
                root_reference {
                }
              }
            }
            failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
          }
        }
        expressions {
          cast {
            type {
              i64 {
                nullability: NULLABILITY_NULLABLE
              }
            }
            input {
              selection {
                direct_reference {
                  struct_field {
                    field: 2
                  }
                }
                root_reference {
                }
              }
            }
            failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
          }
        }
      }
    }
    names: "tconst"
    names: "avg_rating"
    names: "num_votes"
  }
}
version {
  minor_number: 24
  producer: "ibis-substrait"
}
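The output_mapping values 3, 4 and 5 in the plan above are easier to read with the Substrait convention in mind: a project relation's direct output is its input fields followed by its expressions, and emit selects from that combined list by index. The pure-Python sketch below models only that indexing rule. The list names and expression labels are illustrative, and nothing here calls the substrait package.

```python
# Fields produced by the read relation, in base_schema order.
input_fields = ["tconst", "averageRating", "numVotes"]

# One label per `expressions` entry of the project relation, in plan order.
# An empty struct_field means field 0, the protobuf default.
expressions = [
    "field 0 (tconst)",
    "cast(field 1 as fp64)",
    "cast(field 2 as i64)",
]

# Direct output of the projection: input fields first, then expressions.
direct_output = input_fields + expressions

# emit.output_mapping picks which of those columns leave the relation.
output_mapping = [3, 4, 5]
emitted = [direct_output[i] for i in output_mapping]

# The root relation's names label the emitted columns positionally.
root_names = ["tconst", "avg_rating", "num_votes"]
for name, source in zip(root_names, emitted):
    print(f"{name} <- {source}")
```

Indices 0 to 2 would refer to the untouched input columns, so mapping to 3, 4 and 5 keeps only the three computed expressions.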