eland · PyPI · Python Package Index

Python client and toolkit for DataFrames, big data, machine learning, and ETL in Elasticsearch

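Project description

About

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Where possible, the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, or scikit-learn and their Elasticsearch-powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to access large datasets stored in Elasticsearch.

Eland also provides tools to upload trained machine learning models from common libraries such as scikit-learn, XGBoost, and LightGBM into Elasticsearch.

Getting Started

Eland can be installed from PyPI with pip: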

$ python -m pip install eland

If you use Eland to upload NLP models to Elasticsearch, install the PyTorch extras:

$ python -m pip install 'eland[pytorch]'

Eland can also be installed from Conda Forge with Conda:

$ conda install -c conda-forge eland

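Compatibility

Supports Python 3.8, 3.9, 3.10, 3.11 and Pandas 1.5
Supports Elasticsearch clusters on version 7.11 or later; 8.13 or later is recommended for all features to work. If you use the NLP with PyTorch feature, make sure the minor version of Eland matches the minor version of your Elasticsearch cluster. For all other features, it is sufficient for the major versions to match.
You need to install the appropriate version of PyTorch to import an NLP model. Run python -m pip install 'eland[pytorch]' to install that version.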
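
The version-matching rule above can be sketched as a small check. This is only an illustration and not part of Eland: the helper names `parse_major_minor` and `versions_compatible` are hypothetical, and the sketch looks only at the first two dot-separated components of each version string.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return the (major, minor) components of a version string such as '8.15.2'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def versions_compatible(eland_version: str, es_version: str, uses_nlp: bool) -> bool:
    """Hypothetical helper encoding the compatibility rule stated above.

    NLP with PyTorch: major and minor versions must both match.
    All other features: matching major versions are sufficient.
    """
    eland_major, eland_minor = parse_major_minor(eland_version)
    es_major, es_minor = parse_major_minor(es_version)
    if eland_major != es_major:
        return False
    if uses_nlp:
        return eland_minor == es_minor
    return True
```

For instance, under this rule Eland 8.15.2 against an 8.13 cluster would pass for DataFrame work but not for importing NLP models.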

Prerequisites

Users installing Eland on Debian-based distributions may need to install prerequisite packages for Eland's transitive dependencies:

$ sudo apt-get install -y \
  build-essential pkg-config cmake \
  python3-dev libzip-dev libjpeg-dev

Note that other distributions such as CentOS, RedHat, and Arch may require a different package manager and different package names.

Docker

If you only want to run the available scripts without installing Eland, use the Docker image. It can be used interactively:

$ docker run -it --rm --network host docker.elastic.co/eland/eland

Installed scripts can also be run without an interactive shell, for example:

$ docker run -it --rm --network host \
    docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url http://host.docker.internal:9200/ \
      --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
      --task-type ner

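Connecting to Elasticsearch

Eland uses the Elasticsearch low-level client to connect to Elasticsearch. This client supports a range of connection options and authentication options.

You can pass either an instance of elasticsearch.Elasticsearch to Eland APIs or a string containing the host to connect to: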

import eland as ed

# Connecting to an Elasticsearch instance running on 'http://localhost:9200'
df = ed.DataFrame("http://localhost:9200", es_index_pattern="flights")

# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="cluster-name:...",
    basic_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")
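
In scripts it is often convenient to keep connection details out of the source. The sketch below assembles keyword arguments for `elasticsearch.Elasticsearch` from environment variables. The variable names (`ES_URL`, `ES_API_KEY`, `ES_USERNAME`, `ES_PASSWORD`) and both helper functions are assumptions made for illustration; only `hosts`, `api_key`, and `basic_auth` are client parameters, and they are assumed here to behave as in the 8.x client.

```python
import os
from typing import Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Build Elasticsearch client keyword arguments from (hypothetical) env vars."""
    env = os.environ if env is None else env
    kwargs: Dict[str, object] = {"hosts": env.get("ES_URL", "http://localhost:9200")}
    if env.get("ES_API_KEY"):
        # Prefer an API key when one is provided.
        kwargs["api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        kwargs["basic_auth"] = (env["ES_USERNAME"], env["ES_PASSWORD"])
    return kwargs


def open_flights_dataframe():
    """Connect and wrap the 'flights' index (needs eland and a reachable cluster)."""
    # Imported lazily so the helper above can be used without these packages.
    import eland as ed
    from elasticsearch import Elasticsearch

    es = Elasticsearch(**client_kwargs_from_env())
    return ed.DataFrame(es, es_index_pattern="flights")
```

Passing the client instance rather than a URL string keeps authentication settings in one place if several Eland objects share the same connection.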

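DataFrames in Eland

eland.DataFrame wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data in Elasticsearch from a Jupyter Notebook without overloading your machine.

➤ Eland DataFrame API documentation

➤ Advanced examples in a Jupyter Notebook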

>>> import eland as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   AvgTicketPrice      13059 non-null  float64       
 1   Cancelled           13059 non-null  bool          
 2   Carrier             13059 non-null  object        
...      
 24  OriginWeather       13059 non-null  object        
 25  dayOfWeek           13059 non-null  int64         
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB

# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
8        960.869736       True  ...         0 2018-01-01 12:09:35
26       975.812632       True  ...         0 2018-01-01 15:38:32
311      946.358410       True  ...         0 2018-01-01 11:51:12
651      975.383864       True  ...         2 2018-01-03 21:13:17
950      907.836523       True  ...         2 2018-01-03 05:14:51

[5 rows x 27 columns]

# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02
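
Because filtering and aggregation are deferred to Elasticsearch, expressions like the ones above become search requests rather than in-memory operations. The sketch below hand-builds request bodies in the Elasticsearch query DSL that are roughly equivalent to the filter and aggregation examples. It illustrates the idea only: the helper names are hypothetical, the requests Eland actually generates may differ, and the `term` clause assumes `Carrier` is mapped as a keyword field.

```python
from typing import Dict, Iterable


def build_filter_query(carrier: str, min_price: float, cancelled: bool) -> Dict:
    """Bool query mirroring: Carrier == x AND AvgTicketPrice > y AND Cancelled == z."""
    return {
        "bool": {
            "filter": [
                {"term": {"Carrier": carrier}},
                {"range": {"AvgTicketPrice": {"gt": min_price}}},
                {"term": {"Cancelled": cancelled}},
            ]
        }
    }


def build_stats_aggs(fields: Iterable[str]) -> Dict:
    """One extended_stats aggregation per field.

    extended_stats reports sum, min and std_deviation among other statistics.
    """
    return {f"{field}_stats": {"extended_stats": {"field": field}} for field in fields}


# Roughly df[...].head(): fetch the first five matching documents.
filter_body = {
    "size": 5,
    "query": build_filter_query("Kibana Airlines", 900.0, True),
}

# Roughly df[[...]].aggregate([...]): no documents returned, only aggregations.
agg_body = {
    "size": 0,
    "aggs": build_stats_aggs(["DistanceKilometers", "AvgTicketPrice"]),
}
```

Either body could be sent with the low-level client's search API; only the small result set travels back to the local machine.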

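Machine Learning in Eland

Regression and classification

Eland allows trained regression and classification models from the scikit-learn, XGBoost, and LightGBM libraries to be serialized and used as inference models in Elasticsearch.

➤ Eland Machine Learning API documentation

➤ Read more about Machine Learning in Elasticsearch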

>>> from sklearn import datasets
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> training_data = datasets.make_classification(n_features=5)
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
    es_client="http://localhost:9200",
    model_id="xgb-classifier",
    model=xgb_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
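
Once a model such as `xgb-classifier` has been imported, one common way to use it inside Elasticsearch is an ingest pipeline with an inference processor. The sketch below only builds the pipeline definition as a Python dictionary. The helper name, the document field names, and the `ml.prediction` target field are hypothetical; the processor options shown (`model_id`, `target_field`, `field_map`, `inference_config`) reflect my understanding of the inference processor and should be checked against the Elasticsearch documentation for your version.

```python
from typing import Dict, Mapping


def build_inference_pipeline(
    model_id: str,
    field_map: Mapping[str, str],
    target_field: str = "ml.prediction",
) -> Dict:
    """Build an ingest pipeline definition that runs a classification model.

    field_map maps document field names to the feature names the model
    was imported with (f0..f4 in the example above).
    """
    return {
        "description": f"Classify incoming documents with {model_id}",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    "field_map": dict(field_map),
                    "inference_config": {"classification": {}},
                }
            }
        ],
    }


# Hypothetical document fields mapped onto the model's feature names.
pipeline = build_inference_pipeline(
    "xgb-classifier",
    {f"measurement_{i}": f"f{i}" for i in range(5)},
)
```

The resulting dictionary could be registered with the ingest pipeline API of the Elasticsearch client, after which documents indexed through the pipeline would receive a prediction in the target field.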

NLP with PyTorch

For NLP tasks, Eland allows importing PyTorch-trained BERT models into Elasticsearch. Models can be either plain PyTorch models or supported transformers models from the Hugging Face model hub.

$ eland_import_hub_model \
  --url http://localhost:9200/ \
  --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
  --task-type ner \
  --start

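The example above will automatically start a model deployment. This is a good shortcut for initial experimentation, but for anything that needs good throughput you should omit the --start argument from the Eland command line and instead start the model using the ML UI in Kibana. The --start argument deploys the model with one allocation and one thread per allocation, which will not offer good performance. When you start the model deployment using the ML UI in Kibana or the Elasticsearch API, you can set the threading options to make the best use of your hardware.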
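
As a rough illustration of the API route, the sketch below builds the request path for the start trained model deployment endpoint with explicit allocation and threading options. The helper name is hypothetical. The `number_of_allocations` and `threads_per_allocation` parameters and the power-of-two restriction reflect my reading of the Elasticsearch API documentation, so verify them for your cluster version. The model ID shown is an assumption about how the hub ID is rewritten on import.

```python
from urllib.parse import quote, urlencode


def start_deployment_path(
    model_id: str,
    number_of_allocations: int = 1,
    threads_per_allocation: int = 1,
) -> str:
    """Return the path and query string for starting a model deployment."""
    if number_of_allocations < 1:
        raise ValueError("number_of_allocations must be at least 1")
    # Assumed constraint: threads_per_allocation must be a power of two.
    if threads_per_allocation < 1 or threads_per_allocation & (threads_per_allocation - 1):
        raise ValueError("threads_per_allocation must be a power of two")
    query = urlencode(
        {
            "number_of_allocations": number_of_allocations,
            "threads_per_allocation": threads_per_allocation,
        }
    )
    return f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start?{query}"


# Assumed Elasticsearch model ID for the hub model used in the examples.
path = start_deployment_path(
    "elastic__distilbert-base-cased-finetuned-conll03-english",
    number_of_allocations=2,
    threads_per_allocation=4,
)
```

A POST to this path starts the deployment with two allocations of four threads each, instead of the single allocation and single thread that --start uses.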

>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.common import es_version
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel

>>> es = elasticsearch.Elasticsearch("http://elastic:mlqa_admin@localhost:9200")
>>> es_cluster_version = es_version(es)

# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel(model_id="elastic/distilbert-base-cased-finetuned-conll03-english", task_type="ner", es_version=es_cluster_version)
Downloading: 100%|██████████| 257/257 [00:00<00:00, 108kB/s]
Downloading: 100%|██████████| 954/954 [00:00<00:00, 372kB/s]
Downloading: 100%|██████████| 208k/208k [00:00<00:00, 668kB/s] 
Downloading: 100%|██████████| 112/112 [00:00<00:00, 43.9kB/s]
Downloading: 100%|██████████| 249M/249M [00:23<00:00, 11.2MB/s]

# Export the model in a TorchScript representation which Elasticsearch uses
>>> tmp_path = "models"
>>> Path(tmp_path).mkdir(parents=True, exist_ok=True)
>>> model_path, config, vocab_path = tm.save(tmp_path)

# Import model into Elasticsearch
>>> ptm = PyTorchModel(es, tm.elasticsearch_model_id())
>>> ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
100%|██████████| 63/63 [00:12<00:00,  5.02it/s]

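Feedback 🗣️

The engineering team at Elastic is looking for developers to participate in research and feedback sessions to learn more about how you use Eland and what improvements we can make to its design and your workflow. If you are interested in sharing your insights into developer experience and language client design, please fill out this short form. Depending on the number of responses we receive, we may contact you either for a one-on-one conversation or for a focus group with other developers who use the same client. Thank you in advance. Your feedback is crucial to improving the user experience for all Elasticsearch developers!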

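Release history

8.15.2 (this version): October 2, 2024
8.15.1: October 1, 2024
8.15.0: August 13, 2024
8.14.0: June 10, 2024
8.13.1: May 3, 2024
8.13.0: March 27, 2024
8.12.1: February 1, 2024
8.12.0: January 19, 2024
8.11.1: November 22, 2023
8.11.0: November 8, 2023
8.10.1: October 11, 2023
8.10.0: October 9, 2023
8.9.0: August 24, 2023
8.7.0: March 30, 2023
8.3.0: July 11, 2022
8.2.0: May 11, 2022
8.1.0: March 31, 2022
8.0.0: February 10, 2022
8.0.0b1 (pre-release): December 16, 2021
7.14.1b1 (pre-release): August 30, 2021
7.14.0b1 (pre-release): August 9, 2021
7.13.0b1 (pre-release): June 22, 2021
7.10.1b1 (pre-release): January 12, 2021
7.10.0b1 (pre-release): October 29, 2020
7.9.1a1 (pre-release): September 30, 2020
7.9.0a1 (pre-release): August 18, 2020
7.7.0a1 (pre-release): May 20, 2020
7.6.0a5 (pre-release): April 14, 2020
7.6.0a4 (pre-release): March 23, 2020
7.6.0a3 (pre-release): February 15, 2020
7.6.0a2 (pre-release): February 15, 2020
7.6.0a1 (pre-release): February 15, 2020
7.5.1a4 (pre-release): February 5, 2020
7.5.1a3 (pre-release): January 16, 2020
7.5.1a2 (pre-release): January 10, 2020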

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eland-8.15.2.tar.gz (137.6 kB)

Uploaded October 2, 2024 (Source)

Built Distribution

eland-8.15.2-py3-none-any.whl (166.0 kB)

Uploaded October 2, 2024 (Python 3)

Hashes for eland-8.15.2.tar.gz
Algorithm	Hash digest
SHA256	`16f8d23458407e735928dcd4ed61af97187fba29a149e86e12e9234c2ec85636`
MD5	`ab3130e85f04049385e5b4a1817d0fda`
BLAKE2b-256	`bc5c69bde9b8ec7a74f6cfdd556040505616320dd836a9afc54b867b3222ce10`

Hashes for eland-8.15.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`73401d918141aaf2f8969a1b5068c2ae010b106539f128918073d3873d911dae`
MD5	`dd1fe9caca153caa93928f7c9e5deea2`
BLAKE2b-256	`9491954ca6aaa9e48cef3cf55780847855ce152738e4b4ebf4b657bda9f57dcf`
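
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the SHA256 values are copied from the tables above, and the file is assumed to keep its original name on disk.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for the 8.15.2 release files.
EXPECTED_SHA256 = {
    "eland-8.15.2.tar.gz": "16f8d23458407e735928dcd4ed61af97187fba29a149e86e12e9234c2ec85636",
    "eland-8.15.2-py3-none-any.whl": "73401d918141aaf2f8969a1b5068c2ae010b106539f128918073d3873d911dae",
}


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """True only if the file name is known and its digest matches the table."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

pip can perform an equivalent check itself when a requirements file pins hashes and is installed with `--require-hashes`.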
