Python DB-API and SQLAlchemy dialect for Pinot.

This module allows accessing Pinot via its SQL API.

Currently supported Pinot version: 1.1.0

Usage

Using the DB API to query the Pinot Broker directly

from pinotdb import connect

# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)
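
Each row comes back as a plain sequence, so column names have to be looked up separately. The following is a minimal sketch, assuming the pinotdb cursor fills in the standard DB-API `description` attribute after `execute`. `rows_as_dicts` is a hypothetical helper, not part of pinotdb, and the in-memory sqlite3 database is only a stand-in so the snippet runs without a Pinot broker.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Yield each remaining row of a DB-API cursor as a {column: value} dict."""
    # description is a sequence of 7-item tuples; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# Stand-in for a pinotdb connection (assumption: the cursor follows DB-API 2.0).
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE places (place TEXT)")
curs.execute("INSERT INTO places VALUES ('37.77,-122.42')")
curs.execute("SELECT place FROM places LIMIT 10")
records = list(rows_as_dicts(curs))
print(records)
```

With a real broker, the same helper would be applied to the cursor returned by `conn.cursor()` from `pinotdb.connect(...)`.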

For HTTPS:

from pinotdb import connect

# this assumes that 443 is the broker secure https port
conn = connect(host='localhost', port=443, path='/query/sql', scheme='https')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)

Pinot also supports basic authentication, for example:

conn = connect(host="localhost", port=443, path="/query/sql", scheme="https", username="my-user", password="my-password", verify_ssl=True)

To pass additional query parameters (such as useMultistageEngine=true), supply them through the queryOptions argument of the execute method. For example:

curs.execute("select * from airlineStats air limit 10", queryOptions="useMultistageEngine=true")

Passing the Pinot database context

[!IMPORTANT] This feature is available from version 5.1.5 onwards.

from pinotdb import connect

# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http', database='dbName')
curs = conn.cursor()
curs.execute("""
    SELECT col1 from table1 LIMIT 10
""")
for row in curs:
    print(row)

where:

  • dbName: the database context to pass
  • table1: a table that resides under the dbName database

If no database is specified, the connection uses the default database context.

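Using SQLAlchemy

Since the db engine requires more information than the Pinot broker alone provides, you also need to supply the Pinot controller, which serves table and schema information.

The db engine connection string is formatted like this: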

pinot+<pinot-broker-protocol>://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=<pinot-controller-protocol>://<pinot-controller-host>:<pinot-controller-port>/

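The default scheme is HTTP, so you can omit it. For example, pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/ and pinot://localhost:8099/query/sql?controller=localhost:9000/ work the same way.

For HTTPS, you must explicitly specify the https scheme together with the port number: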

pinot+https://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/

For example, pinot+https://pinot-broker.pinot.live:443/query/sql?controller=https://pinot-controller.pinot.live/

Note that the broker port 443 must be stated explicitly.
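
The connection string format above can be assembled from its parts. This is a rough sketch only: `pinot_sqlalchemy_url` is a hypothetical helper, not part of pinotdb, and it assumes the broker and controller share the same scheme.

```python
def pinot_sqlalchemy_url(broker_host, broker_port, controller_host,
                         controller_port=None, scheme="http",
                         broker_path="/query/sql"):
    """Assemble a pinot+<scheme>:// connection string from its parts."""
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    controller = f"{scheme}://{controller_host}"
    if controller_port is not None:
        controller += f":{controller_port}"
    # The broker port is always written out, since HTTPS requires it explicitly.
    return (f"pinot+{scheme}://{broker_host}:{broker_port}{broker_path}"
            f"?controller={controller}/")


http_url = pinot_sqlalchemy_url("localhost", 8099, "localhost", 9000)
https_url = pinot_sqlalchemy_url("pinot-broker.pinot.live", 443,
                                 "pinot-controller.pinot.live", scheme="https")
print(http_url)
print(https_url)
```

The two printed strings reproduce the HTTP and HTTPS examples given in this section.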

This can be used as the connection from Superset to Pinot.

If you have basic authentication:

pinot+https://<my-user>:<my-password>@<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/[&&verify_ssl=<true/false>]

For example, pinot+https://my-user:my-password@my-secure-pinot-broker:443/query/sql?controller=https://my-secure-pinot-controller/&&verify_ssl=true
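
Credentials embedded in a URL can clash with URL syntax when they contain characters such as @ or /. The sketch below assumes the usual SQLAlchemy convention that the user name and password in a connection URL are percent-encoded; whether the Pinot dialect decodes them identically has not been verified here. `basic_auth_url` is a hypothetical helper and all host names are placeholders.

```python
from urllib.parse import quote, unquote, urlsplit


def basic_auth_url(user, password, broker, controller, verify_ssl=True):
    """Build a pinot+https URL with percent-encoded basic-auth credentials."""
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    flag = "true" if verify_ssl else "false"
    return (f"pinot+https://{creds}@{broker}/query/sql"
            f"?controller=https://{controller}/&&verify_ssl={flag}")


url = basic_auth_url("my-user", "p@ss/word", "my-secure-pinot-broker:443",
                     "my-secure-pinot-controller")
print(url)

# Round-trip check with the standard library: the broker host and port
# are still parsed correctly despite the special characters.
parts = urlsplit(url)
print(parts.hostname, parts.port, unquote(parts.password))
```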

Below is a sample script that queries Pinot using sqlalchemy:

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('pinot://localhost:8099/query/sql?controller=http://localhost:9000/')  # uses HTTP by default :(
# or, using explicit HTTP:
# engine = create_engine('pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/')
# or, using explicit HTTPS:
# engine = create_engine('pinot+https://localhost:8099/query/sql?controller=https://localhost:9000/')
# or, provide extra argument to connect with multi-stage engine enabled:
# engine = create_engine(
#     "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
#     connect_args={"useMultistageEngine": "true"}
# )

places = Table('places', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=places).scalar())

To configure query parameters (such as timeoutMs=10000) at the engine level, pass them when creating the engine. For example:

engine = create_engine(
        "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
        connect_args={"query_options": "useMultistageEngine=true;timeoutMs=10000"})
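
The query_options value above is a semicolon-separated list of key=value pairs. When the options live in a dict, a small helper can produce that string. This is only a sketch: `format_query_options` is a hypothetical name, not a pinotdb function.

```python
def format_query_options(options):
    """Join a dict of Pinot query options into 'key=value;key=value' form."""
    parts = []
    for key, value in options.items():
        # Render Python booleans in the lowercase form used in the example above.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)


query_options = format_query_options(
    {"useMultistageEngine": True, "timeoutMs": 10000}
)
print(query_options)  # useMultistageEngine=true;timeoutMs=10000
```

The resulting string could then be passed as connect_args={"query_options": query_options} when creating the engine.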

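Passing the Pinot database context

[!IMPORTANT] This feature is available from version 5.1.5 onwards.

Each connection should query only one Pinot database, so the context is supplied through the connection string itself.

The db engine connection string is formatted like this: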

pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/&database=dbName

where dbName is the database context to pass. If none is specified, queries use the default database context.
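
Because the database context is just another query parameter, it can be appended to an existing connection string. A minimal sketch, where `with_database` is a hypothetical helper and the host names are the placeholders used above:

```python
def with_database(connection_string, database=None):
    """Append a database context to a Pinot SQLAlchemy connection string."""
    if not database:
        # No context given: leave the URL alone so the default context applies.
        return connection_string
    separator = "&" if "?" in connection_string else "?"
    return f"{connection_string}{separator}database={database}"


base = "pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/"
scoped = with_database(base, "dbName")
print(scoped)
```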

Examples with Pinot Quickstart

Start Pinot Batch Quickstart

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type batch

Once the Pinot batch quickstart is up, you can run the sample script below to query Pinot:

python3 examples/pinot_quickstart_batch.py

Sample output:

Sending SQL to Pinot: SELECT * FROM baseballStats LIMIT 5
[0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 11, 11, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SFN', 0, 2004]
[2, 45, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 45, 43, 'aardsda01', 'David Allan', 1, 0, 0, 0, 1, 0, 0, 'CHN', 0, 2006]
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 25, 2, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'CHA', 0, 2007]
[1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 47, 5, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 1, 'BOS', 0, 2008]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 73, 3, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SEA', 0, 2009]

Sending SQL to Pinot: SELECT playerName, sum(runs) FROM baseballStats WHERE yearID>=2000 GROUP BY playerName LIMIT 5
['Scott Michael', 26.0]
['Justin Morgan', 0.0]
['Jason Andre', 0.0]
['Jeffrey Ellis', 0.0]
['Maximiliano R.', 16.0]

Sending SQL to Pinot: SELECT playerName,sum(runs) AS sum_runs FROM baseballStats WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5
['Adrian', 1820.0]
['Jose Antonio', 1692.0]
['Rafael', 1565.0]
['Brian Michael', 1500.0]
['Alexander Emmanuel', 1426.0]

Start Pinot Hybrid Quickstart

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type hybrid

Below is an example run against the Pinot hybrid quickstart:

python3 examples/pinot_quickstart_hybrid.py
Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[171, 153, 19393, 0, 8, 8, 1433, '1400-1459', 0, 1425, 1240, 165, 'null', 0, 'WN', -2147483648, 1, 27, 17540, 0, 2, 2, 1242, '1200-1259', 0, 'MDW', 13232, 1323202, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 861, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 402, 1, -2147483648, -2147483648, 1, -2147483648, 'BOS', 10721, 1072102, 30721, 'Boston, MA', 'MA', 25, 'Massachusetts', 13, 1, ['null'], -2147483648, 'N556WN', 6, 12, -2147483648, 'WN', -2147483648, 1254, 1427, 2014]
[183, 141, 20398, 1, 17, 17, 1302, '1200-1259', 1, 1245, 1005, 160, 'null', 0, 'MQ', 0, 1, 27, 17540, 0, -6, 0, 959, '1000-1059', -1, 'CMH', 11066, 1106603, 31066, 'Columbus, OH', 'OH', 39, 'Ohio', 44, 990, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 3574, 1, 0, -2147483648, 1, 17, 'MIA', 13303, 1330303, 32467, 'Miami, FL', 'FL', 12, 'Florida', 33, 1, ['null'], 0, 'N605MQ', 13, 29, -2147483648, 'MQ', 0, 1028, 1249, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '2100-2159', -2147483648, 2131, 2005, 146, 'null', 0, 'OO', -2147483648, 1, 27, 17541, 1, 52, 52, 2057, '2000-2059', 3, 'COS', 11109, 1110902, 30189, 'Colorado Springs, CO', 'CO', 8, 'Colorado', 82, 809, 4, -2147483648, [11292], 1, [1129202], ['DEN'], -2147483648, 73, [9], 0, ['null'], [9], [-2147483648], [2304], 1, -2147483648, '2014-01-27', 5554, 1, -2147483648, -2147483648, 1, -2147483648, 'IAH', 12266, 1226603, 31453, 'Houston, TX', 'TX', 48, 'Texas', 74, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN', 'CHS', 'PDX', 'LAX', 'EWR', 'SFO', 'PIT', 'RDU', 'RAP', 'LSE', 'SAN', 'SBN', 'IAH', 'OAK', 'BRO', 'JFK', 'SAT', 'ORD', 'ACY', 'DFW', 'BWI'], -2147483648, 'N795SK', -2147483648, 19, -2147483648, 'OO', -2147483648, 2116, -2147483648, 2014]
[153, 125, 20436, 1, 41, 41, 1442, '1400-1459', 2, 1401, 1035, 146, 'null', 0, 'F9', 2, 1, 27, 17541, 1, 34, 34, 1109, '1000-1059', 2, 'DEN', 11292, 1129202, 30325, 'Denver, CO', 'CO', 8, 'Colorado', 82, 967, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 658, 1, 8, -2147483648, 1, 31, 'SFO', 14771, 1477101, 32457, 'San Francisco, CA', 'CA', 6, 'California', 91, 1, ['null'], 0, 'N923FR', 11, 17, -2147483648, 'F9', 0, 1126, 1431, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '1400-1459', -2147483648, 1432, 1314, 78, 'B', 1, 'OO', -2147483648, 1, 27, 17541, -2147483648, -2147483648, -2147483648, -2147483648, '1300-1359', -2147483648, 'EAU', 11471, 1147103, 31471, 'Eau Claire, WI', 'WI', 55, 'Wisconsin', 45, 268, 2, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 5455, 1, -2147483648, -2147483648, 1, -2147483648, 'ORD', 13930, 1393003, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 1, ['null'], -2147483648, 'N903SW', -2147483648, -2147483648, -2147483648, 'OO', -2147483648, -2147483648, -2147483648, 2014]

Sending SQL to Pinot: SELECT count(*) FROM airlineStats LIMIT 5
[17772]

Sending SQL to Pinot: SELECT AirlineID, sum(Cancelled) FROM airlineStats WHERE Year > 2010 GROUP BY AirlineID LIMIT 5
[20409, 40.0]
[19930, 16.0]
[19805, 60.0]
[19790, 115.0]
[20366, 172.0]

Sending SQL to Pinot: select OriginCityName, max(Flights) from airlineStats group by OriginCityName ORDER BY max(Flights) DESC LIMIT 5
['Casper, WY', 1.0]
['Deadhorse, AK', 1.0]
['Austin, TX', 1.0]
['Chicago, IL', 1.0]
['Monterey, CA', 1.0]

Sending SQL to Pinot: SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM airlineStats WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5
['Chicago, IL', 178.0]
['Atlanta, GA', 111.0]
['New York, NY', 65.0]
['Houston, TX', 62.0]
['Denver, CO', 49.0]

Sending Count(*) SQL to Pinot
17773

Sending SQL: "SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM "airlineStats" WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5" to Pinot
[('Chicago, IL', 178.0), ('Atlanta, GA', 111.0), ('New York, NY', 65.0), ('Houston, TX', 62.0), ('Denver, CO', 49.0)]

Examples with the existing pinot.live demo cluster

Run the script below to query the pinot.live demo cluster in two ways: through the pinotdb connection and through sqlalchemy.

python3 examples/pinot_live.py

Response:

Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[384, 359, 19805, 0, 13, 13, 1238, '1200-1259', 0, 1225, 900, 385, 'null', 0, 'AA', -2147483648, 3, 1, 16071, 0, 14, 14, 914, '0900-0959', 0, 'LAX', 12892, 1289203, 32575, 'Los Angeles, CA', 'CA', 6, 'California', 91, 2475, 10, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-01', 1, 1, -2147483648, -2147483648, 1, -2147483648, 'JFK', 12478, 1247802, 31703, 'New York, NY', 'NY', 36, 'New York', 22, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN', 'CHS', 'PDX', 'LAX', 'EWR', 'SFO', 'PIT', 'RDU', 'RAP', 'LSE', 'SAN', 'SBN', 'IAH', 'OAK', 'BRO', 'JFK', 'SAT', 'ORD', 'ACY', 'DFW', 'BWI', 'TPA', 'BFL', 'BOS', 'SNA', 'ISN'], -2147483648, 'N338AA', 5, 20, -2147483648, 'AA', -2147483648, 934, 1233, 2014]
[269, 251, 19805, 0, -36, 0, 1549, '1600-1659', -2, 1625, 825, 300, 'null', 0, 'AA', -2147483648, 3, 1, 16071, 0, -5, 0, 820, '0800-0859', -1, 'JFK', 12478, 1247802, 31703, 'New York, NY', 'NY', 36, 'New York', 22, 2248, 9, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-01', 44, 1, -2147483648, -2147483648, 1, -2147483648, 'LAS', 12889, 1288903, 32211, 'Las Vegas, NV', 'NV', 32, 'Nevada', 85, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN', 'CHS', 'PDX', 'LAX', 'EWR', 'SFO', 'PIT', 'RDU', 'RAP', 'LSE', 'SAN', 'SBN', 'IAH', 'OAK'], -2147483648, 'N3DVAA', 6, 12, -2147483648, 'AA', -2147483648, 832, 1543, 2014]
[307, 288, 19805, 0, -26, 0, 2039, '2100-2159', -2, 2105, 1340, 325, 'null', 0, 'AA', -2147483648, 3, 1, 16071, 0, -8, 0, 1332, '1300-1359', -1, 'LAX', 12892, 1289203, 32575, 'Los Angeles, CA', 'CA', 6, 'California', 91, 2556, 11, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-01', 162, 1, -2147483648, -2147483648, 1, -2147483648, 'HNL', 12173, 1217301, 32134, 'Honolulu, HI', 'HI', 15, 'Hawaii', 2, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN'], -2147483648, 'N5FCAA', 8, 11, -2147483648, 'AA', -2147483648, 1343, 2031, 2014]
[141, 126, 19805, 0, -19, 0, 1456, '1500-1559', -2, 1515, 1135, 160, 'null', 0, 'AA', -2147483648, 3, 1, 16071, 0, 0, 0, 1135, '1100-1159', 0, 'DCA', 11278, 1127802, 30852, 'Washington, DC', 'VA', 51, 'Virginia', 38, 1192, 5, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-01', 130, 1, -2147483648, -2147483648, 1, -2147483648, 'DFW', 11298, 1129803, 30194, 'Dallas/Fort Worth, TX', 'TX', 48, 'Texas', 74, 1, ['null'], -2147483648, 'N3EGAA', 4, 11, -2147483648, 'AA', -2147483648, 1146, 1452, 2014]
[300, 277, 19805, 0, -8, 0, 32, '0001-0559', -1, 40, 1625, 315, 'null', 0, 'AA', -2147483648, 3, 1, 16071, 0, 7, 7, 1632, '1600-1659', 0, 'JFK', 12478, 1247802, 31703, 'New York, NY', 'NY', 36, 'New York', 22, 2475, 10, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-01', 180, 1, -2147483648, -2147483648, 1, -2147483648, 'LAX', 12892, 1289203, 32575, 'Los Angeles, CA', 'CA', 6, 'California', 91, 1, ['null'], -2147483648, 'N335AA', 10, 13, -2147483648, 'AA', -2147483648, 1645, 22, 2014]

Sending Count(*) SQL to Pinot
9746

Sending SQL: "SELECT playerName, sum(runs) AS sum_runs FROM "baseballStats" WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5" to Pinot
[(19790, 581.0), (19977, 522.0), (19690, 520.0), (19805, 481.0), (20409, 410.0), (21171, 385.0), (19930, 378.0), (20355, 377.0), (19393, 326.0), (20437, 268.0)]

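Development

To develop this library, you need Poetry and tox installed.

Once they are installed, test the library:

  1. Start the Pinot quickstart (used by the integration tests): $ make run-pinot
  2. In another shell, run: $ make init
  3. Then: $ make test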

Release

Prepare release credentials

First, configure your release credentials. The simplest way is to register a PyPI API token with Poetry:

$ poetry config pypi-token.pypi <your_api_token_generated_from_pypi.org>

This Poetry configuration only needs to be set once. Alternatively, you can supply a username and password when publishing:

$ poetry publish --username=<your_username> --password='<your_password>'

Build and publish a new Pinot DB-API release to PyPI

Bump the project to the next appropriate version according to SemVer. For example, to bump the patch version, run:

$ poetry version patch
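
For a plain MAJOR.MINOR.PATCH version, a patch bump increments the last component and leaves the others alone. The snippet below illustrates only that arithmetic. `bump_patch` is a hypothetical function, the version number is made up, and Poetry's own handling of pre-release versions is not reproduced.

```python
def bump_patch(version):
    """Return the next patch version for a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


next_version = bump_patch("1.2.3")
print(next_version)  # 1.2.4
```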

Build the distribution:

$ poetry build

Then publish it to pinotdb on PyPI:

$ poetry publish

You can also go to the GitHub Action "Pinotdb Pypi Publisher" and click Run workflow to publish to PyPI.
