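Trino support for the Feast offline store
Project description
Feast Trino Support
Trino is not currently on the Feast roadmap. This project aims to add Trino support for the offline store.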
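Version compatibility
The feast-trino plugin has been tested on Python 3.7, 3.8, and 3.9.
The table below lists the Feast and Trino versions that the current feast-trino plugin has been tested against.
feast-trino | Feast | Trino |
---|---|---|
1.0.* | 0.15.* to 0.18.* | 364 |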
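As a quick sanity check before installing, the tested ranges from the table can be encoded in a few lines of Python. This is only a sketch: `parse_major_minor` and `is_tested_combination` are hypothetical helper names that are not part of feast-trino, and the Feast version passed in at the bottom is a placeholder.

```python
import sys

# Tested ranges copied from the compatibility table above (feast-trino 1.0.*).
TESTED_PYTHON = {(3, 7), (3, 8), (3, 9)}
TESTED_FEAST_MINOR = range(15, 19)  # 0.15.* through 0.18.*


def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '0.17.0' or 'v0.15.1'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def is_tested_combination(python_version: tuple, feast_version: str) -> bool:
    """True if both versions fall inside the ranges listed in the table."""
    feast_major, feast_minor = parse_major_minor(feast_version)
    return (
        tuple(python_version[:2]) in TESTED_PYTHON
        and feast_major == 0
        and feast_minor in TESTED_FEAST_MINOR
    )


if __name__ == "__main__":
    # "0.17.0" is a placeholder; substitute the Feast version you have installed.
    print(is_tested_combination(sys.version_info[:2], "0.17.0"))
```

A combination outside these ranges may still work; the table only records what was tested.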
Quickstart
Install feast-trino
- Install the stable version
pip install feast-trino
- Install the development version (not stable)
pip install git+https://github.com/shopify/feast-trino.git@main
Create a feature repository
feast init feature_repo
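Edit feature_store.yaml
Set the offline_store type to feast_trino.TrinoOfflineStore. The example configuration uses the full module path, feast_trino.trino.TrinoOfflineStore.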
project: feature_repo
registry: data/registry.db
provider: local
offline_store:
    type: feast_trino.trino.TrinoOfflineStore
    host: localhost
    port: 8080
    catalog: memory
    connector:
        type: memory
online_store:
    path: data/online_store.db
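A small check along the following lines can catch a misspelled or missing key in the offline_store section before Feast loads the file. It is a sketch only: the dictionary mirrors the example configuration by hand, `missing_offline_store_keys` is a hypothetical helper, and treating all five keys as required is an assumption based on the example rather than on the plugin's schema.

```python
# Mirrors the offline_store section of the example feature_store.yaml.
offline_store = {
    "type": "feast_trino.trino.TrinoOfflineStore",
    "host": "localhost",
    "port": 8080,
    "catalog": "memory",
    "connector": {"type": "memory"},
}

# Assumption: these are the keys used in the example, not a confirmed schema.
REQUIRED_KEYS = ("type", "host", "port", "catalog", "connector")


def missing_offline_store_keys(config: dict) -> list:
    """Return the keys from REQUIRED_KEYS that are absent from config, in order."""
    return [key for key in REQUIRED_KEYS if key not in config]


print(missing_offline_store_keys(offline_store))
```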
Create the Trino table
Edit feature_repo/example.py
# This is an example feature definition file
import pandas as pd
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, FileSource, ValueType, FeatureStore
from feast_trino.connectors.upload import upload_pandas_dataframe_to_trino
from feast_trino import TrinoSource
from feast_trino.trino_utils import Trino
store = FeatureStore(repo_path="feature_repo")
client = Trino(
    user="user",
    catalog=store.config.offline_store.catalog,
    host=store.config.offline_store.host,
    port=store.config.offline_store.port,
)
client.execute_query("CREATE SCHEMA IF NOT EXISTS feast")
client.execute_query("DROP TABLE IF EXISTS feast.driver_stats")
input_df = pd.read_parquet("./feature_repo/data/driver_stats.parquet")
upload_pandas_dataframe_to_trino(
    client=client,
    df=input_df,
    table_ref="feast.driver_stats",
    connector_args={"type": "memory"},
)
# The sample data was read from a parquet file and uploaded to Trino above, which is
# convenient for local development. The source below points Feast at that Trino table.
# See the Feast documentation for more info.
driver_hourly_stats = TrinoSource(
    event_timestamp_column="event_timestamp",
    table_ref="feast.driver_stats",
    created_timestamp_column="created",
)
# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)
# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
store.apply([driver, driver_hourly_stats_view])
# Run a historical retrieval query
output_df = store.get_historical_features(
    entity_df="""
    SELECT
        1004 AS driver_id,
        TIMESTAMP '2021-11-21 15:00:00+00:00' AS event_timestamp
    """,
    features=["driver_hourly_stats:conv_rate"]
).to_df()
print(output_df.head())
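The example passes entity_df as a SQL string with a single row. To request several entities at once, the same statement can be generated from a list of rows. The sketch below assumes that a multi-row `UNION ALL` query is accepted in the same place as the single-row one; `build_entity_sql` is a hypothetical helper and the second driver ID is illustrative.

```python
from datetime import datetime, timezone


def build_entity_sql(rows):
    """Build a Trino SELECT ... UNION ALL statement from (driver_id, datetime) pairs."""
    selects = []
    for driver_id, ts in rows:
        # Render the timestamp in the same literal form used in the example above.
        literal = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00:00")
        selects.append(
            f"SELECT {int(driver_id)} AS driver_id, "
            f"TIMESTAMP '{literal}' AS event_timestamp"
        )
    return "\nUNION ALL\n".join(selects)


entity_sql = build_entity_sql(
    [
        (1004, datetime(2021, 11, 21, 15, 0, tzinfo=timezone.utc)),
        (1005, datetime(2021, 11, 21, 16, 0, tzinfo=timezone.utc)),  # illustrative ID
    ]
)
print(entity_sql)
```

The resulting string could then be passed as `entity_df=entity_sql` in the call to `store.get_historical_features`.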
Apply the feature definitions
python feature_repo/example.py
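The script finishes with a historical retrieval for driver 1004 at a given timestamp, against a feature view whose ttl is one day. Feast resolves such a request with a point-in-time lookup: it takes the newest feature row that is not later than the entity timestamp and not older than the ttl. The pure-Python sketch below illustrates that rule only; it is not how the plugin builds its Trino query, the rows are made up, and the inclusive boundaries are an assumption.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, entity_ts, ttl):
    """Return the newest row with entity_ts - ttl <= event_timestamp <= entity_ts, or None."""
    candidates = [
        row for row in feature_rows
        if entity_ts - ttl <= row["event_timestamp"] <= entity_ts
    ]
    return max(candidates, key=lambda row: row["event_timestamp"], default=None)


# Made-up rows for illustration only; they are not taken from driver_stats.parquet.
rows = [
    {"event_timestamp": datetime(2021, 11, 20, 10, 0), "conv_rate": 0.1},  # older than ttl
    {"event_timestamp": datetime(2021, 11, 21, 14, 0), "conv_rate": 0.5},  # selected
    {"event_timestamp": datetime(2021, 11, 21, 16, 0), "conv_rate": 0.9},  # in the future
]
match = point_in_time_lookup(rows, datetime(2021, 11, 21, 15, 0), timedelta(days=1))
print(match["conv_rate"])  # 0.5
```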
Development and testing
Development
git clone https://github.com/shopify/feast-trino.git
cd feast-trino
# create a virtual environment
python -m venv venv/
source venv/bin/activate
make build
# before commit
make format
make lint
Unit tests
make start-local-cluster
make test
make kill-local-cluster
Note: you can access the Trino web UI at http://localhost:8080/ui/, which makes it easy to look up the queries that were run.
Testing against the Feast universal suite
make install-feast-submodule
make start-local-cluster
make test-python-universal
make kill-local-cluster
Using different versions of Feast or Trino
The Makefile defines the following defaults:
- FEAST_VERSION: v0.15.1
- TRINO_VERSION: 364
As a result, running make install-feast-submodule automatically builds Feast v0.15.1. To try another version, for example v0.14.1, run make install-feast-submodule FEAST_VERSION=v0.14.1.
The same applies to TRINO_VERSION when starting the local cluster: make start-local-cluster TRINO_VERSION=XXX
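When several version combinations need to be tried, the make invocations can be composed programmatically. The sketch below only builds the argument lists; `make_command` is a hypothetical helper, and the targets and variable names are the ones mentioned in this section.

```python
def make_command(target, **overrides):
    """Return the argv list for `make <target> NAME=value ...`."""
    args = ["make", target]
    args += [f"{name}={value}" for name, value in overrides.items()]
    return args


install_cmd = make_command("install-feast-submodule", FEAST_VERSION="v0.14.1")
cluster_cmd = make_command("start-local-cluster", TRINO_VERSION="364")
print(" ".join(install_cmd))
print(" ".join(cluster_cmd))

# To actually run one of them from the repository root:
# import subprocess
# subprocess.run(install_cmd, check=True)
```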