跳转到主要内容

Arrow -> PostgreSQL编码器

项目描述

pgpq

将PyArrow RecordBatches转换为Postgres的本地二进制格式。

用法

"""Example for README.md"""
from tempfile import mkdtemp
import psycopg
import pyarrow.dataset as ds
import requests
from pgpq import ArrowToPostgresBinaryEncoder

# let's get some example data
tmpdir = mkdtemp()
with open(f"{tmpdir}/yellow_tripdata_2023-01.parquet", mode="wb") as f:
    resp = requests.get(
        "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet"
    )
    resp.raise_for_status()
    f.write(resp.content)

# load an arrow dataset
# arrow can load datasets from partitioned parquet files locally or in S3/GCS
# it handles buffering, matching globs, etc.
dataset = ds.dataset(tmpdir)

# create an encoder object which will do the encoding
# and give us the expected Postgres table schema
encoder = ArrowToPostgresBinaryEncoder(dataset.schema)
# get the expected Postgres destination schema
# note that this is _not_ the same as the incoming arrow schema
# and not necessarily the schema of your permanent table
# instead it's the schema of the data that will be sent over the wire
# which for example does not have timezones on any timestamps
pg_schema = encoder.schema()
# assemble ddl for a temporary table
# it's often a good idea to bulk load into a temp table to:
# (1) Avoid indexes
# (2) Stay in-memory as long as possible
# (3) Be more flexible with types
#     (you can't load a SMALLINT into a BIGINT column without casting)
cols = [f'"{col_name}" {col.data_type.ddl()}' for col_name, col in pg_schema.columns]
ddl = f"CREATE TEMP TABLE data ({','.join(cols)})"

with psycopg.connect("postgres://postgres:postgres@localhost:5432/postgres") as conn:
    with conn.cursor() as cursor:
        cursor.execute(ddl)  # type: ignore
        with cursor.copy("COPY data FROM STDIN WITH (FORMAT BINARY)") as copy:
            copy.write(encoder.write_header())
            for batch in dataset.to_batches():
                copy.write(encoder.write_batch(batch))
            copy.write(encoder.finish())
        # load into your actual table, possibly doing type casts
        # cursor.execute("INSERT INTO \"table\" SELECT * FROM data")

项目详情


下载文件

下载适合您平台的文件。如果您不确定选择哪个,请了解更多关于 安装包 的信息。

源代码分发

pgpq-0.9.0.tar.gz (24.9 kB 查看哈希值)

上传时间: 源代码

构建的分发

pgpq-0.9.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (605.6 kB 查看哈希值)

上传时间: PyPy manylinux: glibc 2.17+ x86-64

pgpq-0.9.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (572.8 kB 查看哈希值)

上传时间 PyPy manylinux: glibc 2.17+ ARM64

pgpq-0.9.0-pp39-pypy39_pp73-macosx_10_7_x86_64.whl (563.5 kB 查看哈希值)

上传时间 PyPy macOS 10.7+ x86-64

pgpq-0.9.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (605.9 kB 查看哈希值)

上传时间 PyPy manylinux: glibc 2.17+ x86-64

pgpq-0.9.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (572.6 kB 查看哈希值)

上传时间 PyPy manylinux: glibc 2.17+ ARM64

pgpq-0.9.0-pp38-pypy38_pp73-macosx_10_7_x86_64.whl (563.7 kB 查看哈希值)

上传时间 PyPy macOS 10.7+ x86-64

pgpq-0.9.0-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (609.2 kB 查看哈希值)

上传时间 PyPy manylinux: glibc 2.17+ x86-64

pgpq-0.9.0-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (575.6 kB 查看哈希值)

上传时间 PyPy manylinux: glibc 2.17+ ARM64

pgpq-0.9.0-pp37-pypy37_pp73-macosx_10_7_x86_64.whl (565.5 kB 查看哈希值)

上传时间 PyPy macOS 10.7+ x86-64

pgpq-0.9.0-cp37-abi3-win_amd64.whl (501.0 kB 查看哈希值)

上传时间 CPython 3.7+ Windows x86-64

pgpq-0.9.0-cp37-abi3-win32.whl (442.2 kB 查看哈希值)

上传时间 CPython 3.7+ Windows x86

pgpq-0.9.0-cp37-abi3-musllinux_1_2_x86_64.whl (776.2 kB 查看哈希值)

上传于 CPython 3.7+ musllinux: musl 1.2+ x86-64

pgpq-0.9.0-cp37-abi3-musllinux_1_2_i686.whl (787.5 kB 查看哈希值)

上传于 CPython 3.7+ musllinux: musl 1.2+ i686

pgpq-0.9.0-cp37-abi3-musllinux_1_2_armv7l.whl (806.5 kB 查看哈希值)

上传于 CPython 3.7+ musllinux: musl 1.2+ ARMv7l

pgpq-0.9.0-cp37-abi3-musllinux_1_2_aarch64.whl (749.7 kB 查看哈希值)

上传于 CPython 3.7+ musllinux: musl 1.2+ ARM64

pgpq-0.9.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (604.6 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ x86-64

pgpq-0.9.0-cp37-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl (789.9 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ s390x

pgpq-0.9.0-cp37-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (842.1 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ ppc64le

pgpq-0.9.0-cp37-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl (893.3 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ ppc64

pgpq-0.9.0-cp37-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (543.3 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ ARMv7l

pgpq-0.9.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (572.0 kB 查看哈希值)

上传于 CPython 3.7+ manylinux: glibc 2.17+ ARM64

pgpq-0.9.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.whl (638.7 kB 查看哈希值)

上传时间 CPython 3.7+ manylinux: glibc 2.5+ i686

pgpq-0.9.0-cp37-abi3-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (1.1 MB 查看哈希值)

上传时间 CPython 3.7+ macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

pgpq-0.9.0-cp37-abi3-macosx_10_7_x86_64.whl (563.7 kB 查看哈希值)

上传时间 CPython 3.7+ macOS 10.7+ x86-64

由...