Arrow, pydantic style
Project description
Welcome to arrowdantic.
Arrowdantic is a small Python library backed by a mature Rust implementation of Apache Arrow that can interoperate with
- Parquet,
- Apache Arrow, and
- ODBC (databases).

For simple (but data-heavy) data engineering tasks, this package essentially replaces pyarrow: it supports reading from and writing to Parquet and Arrow with higher performance and higher safety (e.g. no segfaults).

Additionally, it supports reading from and writing to ODBC-compliant databases (e.g. postgres, mongoDB) at performance equal to or higher than turbodbc's.

This package is particularly suited for environments such as AWS Lambda: it takes 8 MB of disk space, versus 82 MB for pyarrow.
Features
- Declare and access Arrow-backed arrays (integers, floats, booleans, strings, binary); see the sketch after this list
- Read from and write to Apache Arrow IPC files
- Read from and write to Apache Parquet
- Read from and write to ODBC-compliant databases (e.g. postgres, mongoDB)
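A minimal sketch of declaring and accessing arrays, using only constructors and accessors that appear in the examples below; iterating non-timestamp arrays with list() is an assumption by analogy with the timezone example at the end.

import arrowdantic as ad

# Arrays are built from Python values; None marks a null slot.
ints = ad.Int32Array([1, 2, None])
strings = ad.StringArray(["aa", None])

# Each array carries its Arrow data type.
assert ints.type == ad.DataType.int32()
assert strings.type == ad.DataType.string()

# Iterating an array yields Python values back (assumed, by analogy with
# the TimestampArray iteration shown in the timezone example).
assert list(ints) == [1, 2, None]
assert list(strings) == ["aa", None]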
Examples
Use Parquet
import io

import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ParquetFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ParquetFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays
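The example above advances the reader once with next(). A minimal sketch for draining a multi-chunk file, assuming ad.ParquetFileReader supports plain Python iteration and accepts any binary file-like object (the example above passes an io.BytesIO); the file name example.parquet is hypothetical.

import arrowdantic as ad

with open("example.parquet", "rb") as f:  # hypothetical file name
    reader = ad.ParquetFileReader(f)  # assumes any binary file-like object is accepted
    all_chunks = []
    for chunk in reader:  # assumes the reader is a plain Python iterator
        all_chunks.append(chunk.arrays())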
Use Arrow files
import io

import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ArrowFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ArrowFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays
Use ODBC
import arrowdantic as ad

arrays = [ad.Int32Array([1, None]), ad.StringArray(["aa", None])]

with ad.ODBCConnector(r"Driver={SQLite3};Database=sqlite-test.db") as con:
    # create an empty table with a schema
    con.execute("DROP TABLE IF EXISTS example;")
    con.execute("CREATE TABLE example (c1 INT, c2 TEXT);")

    # insert the arrays
    con.write("INSERT INTO example (c1, c2) VALUES (?, ?)", ad.Chunk(arrays))

    # read the arrays
    with con.execute("SELECT c1, c2 FROM example", 1024) as chunks:
        assert chunks.fields() == [
            ad.Field("c1", ad.DataType.int32(), True),
            ad.Field("c2", ad.DataType.string(), True),
        ]
        chunk = next(chunks)
        assert chunk.arrays() == arrays
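The query result above is advanced once with next(). A minimal sketch for consuming every returned chunk and materializing the columns as Python lists, assuming the chunks object supports plain Python iteration and that list(array) yields Python values here as it does for TimestampArray in the timezone example below.

import arrowdantic as ad

with ad.ODBCConnector(r"Driver={SQLite3};Database=sqlite-test.db") as con:
    with con.execute("SELECT c1, c2 FROM example", 1024) as chunks:
        c1_values, c2_values = [], []
        for chunk in chunks:  # assumes chunks is a plain Python iterator
            c1, c2 = chunk.arrays()
            c1_values.extend(list(c1))  # list(array) assumed to yield Python values
            c2_values.extend(list(c2))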
Use timezones
This package fully supports datetime and its conversion to and from Arrow:
import datetime

import arrowdantic as ad

dt = datetime.datetime(
    year=2021,
    month=1,
    day=1,
    hour=1,
    minute=1,
    second=1,
    microsecond=1,
    tzinfo=datetime.timezone.utc,
)
a = ad.TimestampArray([dt, None])
assert (
    str(a)
    == 'Timestamp(Microsecond, Some("+00:00"))[2021-01-01 01:01:01.000001 +00:00, None]'
)
assert list(a) == [dt, None]
assert a.type == ad.DataType.timestamp(datetime.timezone.utc)
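Timestamp arrays can be combined with the file APIs shown earlier. A minimal sketch round-tripping a timezone-aware column through Parquet, assuming TimestampArray is written and read back the same way the UInt32Array is in the Parquet example; the field name "ts" is arbitrary.

import datetime
import io

import arrowdantic as ad

dt = datetime.datetime(2021, 1, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc)
arrays = [ad.TimestampArray([dt, None])]
schema = ad.Schema([ad.Field("ts", arrays[0].type, True)])  # "ts" is an arbitrary name

data = io.BytesIO()
with ad.ParquetFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(arrays))
data.seek(0)

chunk = next(ad.ParquetFileReader(data))
# assumes the values survive the round-trip unchanged
assert list(chunk.arrays()[0]) == [dt, None]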
Project details
Hashes for arrowdantic-0.2.3-cp310-cp310-macosx_10_7_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | f211b8bd5262d5bd8be098f8a2ca43b7fcdd17cedb810e2f3cf1a2773bf89273
MD5 | 5fefdd79a8b1fb1ee20183ea494e3218
BLAKE2b-256 | fdb7301ec72c4f2f9d1180b9d2d4a40a06b31ef4e90b7e175725a9ff3df74d4d

Hashes for arrowdantic-0.2.3-cp39-cp39-macosx_10_7_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 711dbcec7cd2bf2b727b4369f4b34270d7a0707d8da96f629600de7a677ae944
MD5 | 4a8bfa638e0dba3bfec6b5ea677f5d75
BLAKE2b-256 | c5a438f1d055cd306da2fa2da6276aca4497d8a5f738a193dd0f5b4bf000be82

Hashes for arrowdantic-0.2.3-cp38-cp38-macosx_10_7_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | eda287dac0104e60bb4b246b14f8f215756ef3a8ec4007f2a0919b2157ecf755
MD5 | 79b93073a60a2b0f212955bb121b1966
BLAKE2b-256 | f8f277bb5ceee722fe26ed7ae1bf75c83c6dbaa5e0135f50ba3aeeae27c71e00

Hashes for arrowdantic-0.2.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 12644b44e6e29d4ef835cc717d2f9e6bf7d645b12888cf4b9420eb2794099847
MD5 | 7c1479f8aeab8b122b92b0ecccd3fb01
BLAKE2b-256 | 3612c31c047833de18fc2f2f7bfdf2ce52217ef0df68c72c5e140235f9949261

Hashes for arrowdantic-0.2.3-cp37-cp37m-macosx_10_7_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 06fe232564ad73fb09b3a06461e3e64b12304eac0e2fa7de5562852d34ea8db5
MD5 | 1c2f4c82626f5b997160b2a1e9fd7f42
BLAKE2b-256 | 64a9b3392c796e38b20823d9cb0998c3937549bfcafbffcfc93adfdc6423173b