Pipeline Profiler tool. Enables exploration of D3M pipelines in Jupyter Notebooks


Project description

PipelineProfiler

An AutoML pipeline exploration tool compatible with Jupyter Notebooks. It supports the auto-sklearn and D3M pipeline formats.

System screen

(Shift-click to select multiple pipelines)

Paper: https://arxiv.org/abs/2005.00160

Video: https://youtu.be/2WSYoaxLLJ8

Blog: Medium post

Demo

Live demo (Google Colab)

In a Jupyter Notebook:

import PipelineProfiler
data = PipelineProfiler.get_heartstatlog_data()
PipelineProfiler.plot_pipeline_matrix(data)

Installation

Option 1: install via pip

pip install pipelineprofiler

Option 2: run the Docker image

docker build -t pipelineprofiler .
docker run -p 9999:8888 pipelineprofiler

Then copy the access token and log in to Jupyter in your browser at:

localhost:9999

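Data preprocessing

PipelineProfiler reads data from the D3M metalearning database. You can download this data from: https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/metalearningdb_dump_20200304.tar.gz

To explore the pipelines, you need to merge two files: pipelines.json and pipeline_runs.json. To do so, run: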

python -m PipelineProfiler.pipeline_merge [-n NUMBER_PIPELINES] pipeline_runs_file pipelines_file output_file
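
The merge step can also be scripted from Python. The sketch below only assembles the command line shown above and runs it with `subprocess`. The helper names `build_merge_command` and `merge` are hypothetical, and reading `-n` as a cap on the number of pipelines is an inference from its placeholder name, not something this page states.

```python
import subprocess
import sys


def build_merge_command(runs_file, pipelines_file, output_file, number_pipelines=None):
    """Assemble the pipeline_merge command line as a list of arguments."""
    cmd = [sys.executable, "-m", "PipelineProfiler.pipeline_merge"]
    if number_pipelines is not None:
        # Optional -n flag, passed through exactly as in the usage line above.
        cmd += ["-n", str(number_pipelines)]
    # Positional arguments keep the documented order: runs, pipelines, output.
    cmd += [runs_file, pipelines_file, output_file]
    return cmd


def merge(runs_file, pipelines_file, output_file, number_pipelines=None):
    """Run the merge; raises CalledProcessError if the command fails."""
    cmd = build_merge_command(runs_file, pipelines_file, output_file, number_pipelines)
    subprocess.run(cmd, check=True)
```

With the package installed, `merge("pipeline_runs.json", "pipelines.json", "output_file.json")` would run the same command as the line above; the file names here are placeholders for your own paths.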

Pipeline exploration

import PipelineProfiler
import json

In a Jupyter Notebook, load output_file:

with open("output_file.json", "r") as f:
    pipelines = json.load(f)

Then plot the pipelines with:

PipelineProfiler.plot_pipeline_matrix(pipelines[:10])
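
Because the call above plots only a slice, it can help to first see how the merged pipelines are distributed across problems. A minimal sketch, assuming each record carries the `problem` -> `id` field that the post-processing code on this page reads; `count_by_problem` is a hypothetical helper, not part of PipelineProfiler, and the toy records stand in for the real merged file.

```python
from collections import Counter


def count_by_problem(pipelines):
    """Count pipeline records per problem id (assumes pipeline['problem']['id'])."""
    return Counter(p["problem"]["id"] for p in pipelines)


# Toy records standing in for the merged output file.
toy = [
    {"problem": {"id": "problem_a"}},
    {"problem": {"id": "problem_a"}},
    {"problem": {"id": "problem_b"}},
]
counts = count_by_problem(toy)
print(counts.most_common())
```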

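Data postprocessing

You may want to group the pipelines by problem type and select the top k pipelines from each team. To do so, use the following code: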

from collections import defaultdict

def get_top_k_pipelines_team(pipelines, k):
    # Group the pipelines by the team (pipeline source) that produced them.
    team_pipelines = defaultdict(list)
    for pipeline in pipelines:
        source = pipeline['pipeline_source']['name']
        team_pipelines[source].append(pipeline)
    # Keep each team's k best pipelines, ranked by normalized score.
    for team in team_pipelines.keys():
        team_pipelines[team] = sorted(team_pipelines[team], key=lambda x: x['scores'][0]['normalized'], reverse=True)
        team_pipelines[team] = team_pipelines[team][:k]
    # Flatten the per-team selections back into a single list.
    new_pipelines = []
    for team in team_pipelines.keys():
        new_pipelines.extend(team_pipelines[team])
    return new_pipelines

def sort_pipeline_scores(pipelines):
    # Order pipelines from highest to lowest raw score value.
    return sorted(pipelines, key=lambda x: x['scores'][0]['value'], reverse=True)

# Group all pipelines by the problem they were run on.
pipelines_problem = {}
for pipeline in pipelines:
    problem_id = pipeline['problem']['id']
    if problem_id not in pipelines_problem:
        pipelines_problem[problem_id] = []
    pipelines_problem[problem_id].append(pipeline)
# For each problem, keep the top 100 pipelines per team, sorted by score.
for problem in pipelines_problem.keys():
    pipelines_problem[problem] = sort_pipeline_scores(get_top_k_pipelines_team(pipelines_problem[problem], k=100))
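
To see what the per-team selection does, here is a small self-contained sketch on toy data. It re-implements the same idea in compact form rather than reusing the functions above; `top_k_per_team` and `make_record` are hypothetical names, and the toy records contain only the fields the selection reads.

```python
from collections import defaultdict


def top_k_per_team(pipelines, k):
    """Keep the k best pipelines of each team, ranked by normalized score."""
    by_team = defaultdict(list)
    for p in pipelines:
        by_team[p["pipeline_source"]["name"]].append(p)
    kept = []
    for members in by_team.values():
        # Sort each team's pipelines in place, best first, then truncate.
        members.sort(key=lambda p: p["scores"][0]["normalized"], reverse=True)
        kept.extend(members[:k])
    return kept


def make_record(team, score):
    """Build a toy record with only the fields the selection reads."""
    return {
        "pipeline_source": {"name": team},
        "scores": [{"normalized": score, "value": score}],
    }


toy = [make_record("team_a", 0.9), make_record("team_a", 0.5),
       make_record("team_a", 0.7), make_record("team_b", 0.6)]
best = top_k_per_team(toy, k=2)
summary = [(p["pipeline_source"]["name"], p["scores"][0]["normalized"]) for p in best]
print(summary)
```

With k=2, team_a keeps its two highest-scoring pipelines and team_b keeps its only one. In the real data, each value of `pipelines_problem` is a list of pipelines, so it should be possible to pass one problem's list to `PipelineProfiler.plot_pipeline_matrix` in the same way as `pipelines[:10]` earlier.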


Release history

0.1.18 (this version): May 4, 2022
0.1.17: Feb 15, 2021
0.1.16: Jan 30, 2021
0.1.15: Jul 10, 2020
0.1.14: Jul 8, 2020
0.1.13: Jul 1, 2020
0.1.12: Jun 2, 2020
0.1.11: May 27, 2020
0.1.10: May 20, 2020
0.1.9: May 19, 2020
0.1.8: May 19, 2020
0.1.7: May 19, 2020
0.1.6: May 19, 2020
0.1.5: May 18, 2020
0.1.4: May 14, 2020
0.1.3: May 12, 2020
0.1.2: May 4, 2020
0.1.1: May 4, 2020
0.1.0: May 4, 2020

Download files

Source distribution

pipelineprofiler-0.1.18.tar.gz (871.9 kB), uploaded May 4, 2022

Built distribution

pipelineprofiler-0.1.18-py3-none-any.whl (881.1 kB), uploaded May 4, 2022, Python 3

Hashes for pipelineprofiler-0.1.18.tar.gz
Algorithm	Hash digest
SHA256	`1e14ed6ed8a08e0726853c4841263411a566d2dcef44624dfa02ecf6ce264432`
MD5	`a69147df0bc3d8f11e0712a0503c331c`
BLAKE2b-256	`4639204e9f0a7fde560e178dd82d987b747d450a0521b5b4db4bf1d9792ece4d`

Hashes for pipelineprofiler-0.1.18-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6efe8bfe0bdfbe153d34f27d9c8100c5d02a67fd825591faed5170137011d9d9`
MD5	`05196d855613f0ff8833028b8d8c8c98`
BLAKE2b-256	`6ec198c22e87afba0a74248632c8b93277b1069083ce580c38edc91c6ecaa30a`
