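HyperImpute - A library for NaNs and nulls.

HyperImpute simplifies the process of selecting a data imputation algorithm for your ML pipelines. It includes a range of novel algorithms for missing data and is compatible with sklearn.

HyperImpute features

:rocket: Fast and extensible dataset imputation algorithms, compatible with sklearn.
:key: New iterative imputation method: HyperImpute.
:cyclone: Classic methods: MICE, MissForest, GAIN, MIRACLE, MIWAE, Sinkhorn, SoftImpute, etc.
:fire: Pluggable architecture.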

:rocket: Installation

The library can be installed from PyPI using

$ pip install hyperimpute

or from source, using

$ pip install .
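
After either command, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of HyperImpute.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# `installed_version` is an illustrative helper, not a HyperImpute function.
version = installed_version("hyperimpute")
print("hyperimpute:", version if version else "not installed")
```

If this prints "not installed", the `pip` used for installation probably belongs to a different Python environment.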

:boom: Sample usage

List the available imputers

from hyperimpute.plugins.imputers import Imputers

imputers = Imputers()

imputers.list()

Impute a dataset using one of the available methods

import pandas as pd
import numpy as np
from hyperimpute.plugins.imputers import Imputers

X = pd.DataFrame([[1, 1, 1, 1], [4, 5, np.nan, np.nan], [3, 3, 9, 9], [2, 2, 2, 2]])

method = "gain"

plugin = Imputers().get(method)
out = plugin.fit_transform(X.copy())

print(method, out)
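
Whichever method is chosen, a quick sanity check on the result can catch problems early. The sketch below is independent of HyperImpute: `check_imputation` is a hypothetical helper, and a plain column-mean fill with pandas stands in for the plugin so that the example runs on its own. It assumes the imputer leaves observed entries unchanged, which may not hold for every method.

```python
import numpy as np
import pandas as pd


def check_imputation(original: pd.DataFrame, imputed) -> None:
    """Raise AssertionError if `imputed` still has NaNs or altered observed cells."""
    before = original.to_numpy(dtype=float)
    after = np.asarray(imputed, dtype=float)
    assert before.shape == after.shape, "shape changed during imputation"
    assert not np.isnan(after).any(), "missing values remain"
    observed = ~np.isnan(before)
    assert np.allclose(before[observed], after[observed]), "observed values changed"


X = pd.DataFrame([[1, 1, 1, 1], [4, 5, np.nan, np.nan], [3, 3, 9, 9], [2, 2, 2, 2]])

# Stand-in imputer (assumption): fill each column with the mean of its observed values.
out = X.fillna(X.mean())

check_imputation(X, out)
print(out)
```

The same check can be applied to the output of `plugin.fit_transform(X.copy())`.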

Specify the baseline models for HyperImpute

import pandas as pd
import numpy as np
from hyperimpute.plugins.imputers import Imputers

X = pd.DataFrame([[1, 1, 1, 1], [4, 5, np.nan, np.nan], [3, 3, 9, 9], [2, 2, 2, 2]])

plugin = Imputers().get(
    "hyperimpute",
    optimizer="hyperband",
    classifier_seed=["logistic_regression"],
    regression_seed=["linear_regression"],
)

out = plugin.fit_transform(X.copy())
print(out)

Use an imputer with a SKLearn pipeline

import pandas as pd
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor

from hyperimpute.plugins.imputers import Imputers

X = pd.DataFrame([[1, 1, 1, 1], [4, 5, np.nan, np.nan], [3, 3, 9, 9], [2, 2, 2, 2]])
y = pd.Series([1, 2, 1, 2])

imputer = Imputers().get("hyperimpute")

estimator = Pipeline(
    [
        ("imputer", imputer),
        ("forest", RandomForestRegressor(random_state=0, n_estimators=100)),
    ]
)

estimator.fit(X, y)
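
Once fitted, a pipeline of this shape can also predict on new rows that contain missing values, because the imputation step runs before the regressor. The sketch below illustrates that with sklearn's `SimpleImputer` as a stand-in for the HyperImpute plugin, so it runs without HyperImpute installed; the new rows in `X_new` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

X = pd.DataFrame([[1, 1, 1, 1], [4, 5, np.nan, np.nan], [3, 3, 9, 9], [2, 2, 2, 2]])
y = pd.Series([1, 2, 1, 2])

# Stand-in (assumption): SimpleImputer takes the place of the HyperImpute plugin.
estimator = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="mean")),
        ("forest", RandomForestRegressor(random_state=0, n_estimators=10)),
    ]
)
estimator.fit(X, y)

# Illustrative new rows with gaps; the imputer step fills them before the forest sees them.
X_new = pd.DataFrame([[2, np.nan, 3, 3], [np.nan, 4, np.nan, 8]])
pred = estimator.predict(X_new)
print(pred)
```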

Write a new imputation plugin

from sklearn.impute import KNNImputer
from hyperimpute.plugins.imputers import Imputers, ImputerPlugin

imputers = Imputers()

knn_imputer = "custom_knn"

class KNN(ImputerPlugin):
    def __init__(self) -> None:
        super().__init__()
        self._model = KNNImputer(n_neighbors=2, weights="uniform")

    @staticmethod
    def name():
        return knn_imputer

    @staticmethod
    def hyperparameter_space():
        return []

    def _fit(self, *args, **kwargs):
        self._model.fit(*args, **kwargs)
        return self

    def _transform(self, *args, **kwargs):
        return self._model.transform(*args, **kwargs)

imputers.add(knn_imputer, KNN)

assert imputers.get(knn_imputer) is not None
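
The plugin above delegates all of its work to sklearn's `KNNImputer`. To see what the wrapped model does on its own, here is a small standalone sketch with the same settings; the input matrix is illustrative and not taken from the HyperImpute documentation.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative data: two of the twelve entries are missing.
X = np.array(
    [[1.0, 2.0, np.nan], [3.0, 4.0, 3.0], [np.nan, 6.0, 5.0], [8.0, 8.0, 7.0]]
)

# Same settings as the model wrapped by the plugin.
model = KNNImputer(n_neighbors=2, weights="uniform")
out = model.fit_transform(X)
print(out)
```

Each missing entry is replaced by the unweighted average of that feature over the two nearest rows in which it is observed.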

Benchmark imputation models on a dataset

from sklearn.datasets import load_iris
from hyperimpute.plugins.imputers import Imputers
from hyperimpute.utils.benchmarks import compare_models

X, y = load_iris(as_frame=True, return_X_y=True)

imputer = Imputers().get("hyperimpute")

compare_models(
    name="example",
    evaluated_model=imputer,
    X_raw=X,
    ref_methods=["ice", "missforest"],
    scenarios=["MAR"],
    miss_pct=[0.1, 0.3],
    n_iter=2,
)
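
The general idea behind such a benchmark is to hide known values, impute them, and measure how far the imputed values are from the truth. The sketch below illustrates that idea with NumPy only. It is not how `compare_models` is implemented: it masks values completely at random (MCAR) rather than using the "MAR" scenario above, uses column-mean imputation as the model under test, and reports RMSE as an assumed error measure. All function names are hypothetical.

```python
import numpy as np


def mask_mcar(X: np.ndarray, miss_pct: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of X with roughly `miss_pct` of the entries set to NaN at random."""
    X_miss = X.astype(float).copy()
    X_miss[rng.random(X.shape) < miss_pct] = np.nan
    return X_miss


def mean_impute(X_miss: np.ndarray) -> np.ndarray:
    """Baseline imputer: replace NaNs with the column mean of the observed values."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)


def rmse_on_missing(X_true: np.ndarray, X_miss: np.ndarray, X_imputed: np.ndarray) -> float:
    """RMSE restricted to the entries that were masked out."""
    hidden = np.isnan(X_miss)
    return float(np.sqrt(np.mean((X_true[hidden] - X_imputed[hidden]) ** 2)))


rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))  # synthetic ground truth, for illustration only

for miss_pct in (0.1, 0.3):
    X_miss = mask_mcar(X_true, miss_pct, rng)
    score = rmse_on_missing(X_true, X_miss, mean_impute(X_miss))
    print(f"miss_pct={miss_pct}: RMSE={score:.3f}")
```

Because the synthetic columns are independent standard normals, mean imputation should score an RMSE close to 1 here; a model that exploits correlations between columns can only do better on correlated data.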

📓 Tutorials

:zap: Imputation methods

The following table lists the default imputation plugins:

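Strategy	Description	Code
HyperImpute	Iterative imputer using both regression and classification methods based on linear models, trees, XGBoost, CatBoost and neural nets	`plugin_hyperimpute.py`
Mean	Replace the missing values using the mean along each column with `SimpleImputer`	`plugin_mean.py`
Median	Replace the missing values using the median along each column with `SimpleImputer`	`plugin_median.py`
Most-frequent	Replace the missing values using the most frequent value along each column with `SimpleImputer`	`plugin_most_freq.py`
MissForest	Iterative imputation method based on Random Forests using `IterativeImputer` and `ExtraTreesRegressor`	`plugin_missforest.py`
ICE	Iterative imputation method based on regularized linear regression using `IterativeImputer` and `BayesianRidge`	`plugin_ice.py`
MICE	Multiple imputations based on ICE using `IterativeImputer` and `BayesianRidge`	`plugin_mice.py`
SoftImpute	`Low-rank matrix approximation via nuclear-norm regularization`	`plugin_softimpute.py`
EM	Iterative procedure which uses other variables to impute a value (Expectation), then checks whether that is the value most likely (Maximization) - `EM imputation algorithm`	`plugin_em.py`
Sinkhorn	`Missing Data Imputation using Optimal Transport`	`plugin_sinkhorn.py`
GAIN	`GAIN: Missing Data Imputation using Generative Adversarial Nets`	`plugin_gain.py`
MIRACLE	`MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms`	`plugin_miracle.py`
MIWAE	`MIWAE: Deep Generative Modelling and Imputation of Incomplete Data`	`plugin_miwae.py`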
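
Several of the plugins in the table are iterative: they start from a simple fill and then repeatedly re-predict each incomplete column from the others. The sketch below shows that round-robin idea with NumPy. It is a simplified illustration, not the code of any plugin: it uses ordinary least squares where ICE uses `BayesianRidge`, and the function name `iterative_impute` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def iterative_impute(X: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Round-robin imputation: regress each incomplete column on all the others."""
    X = X.astype(float)
    missing = np.isnan(X)
    # Start from the column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            # Design matrix: every other column plus an intercept term.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([others, np.ones(len(others))])
            # Fit where column j is observed, then predict where it is missing.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = A[rows] @ coef
    return filled


# Synthetic data (assumption): three strongly correlated columns with small noise.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
noise = 0.1 * rng.normal(size=(100, 2))
X_true = np.column_stack([a, 2 * a + 1 + noise[:, 0], -a + 3 + noise[:, 1]])

X_miss = X_true.copy()
X_miss[:10, 1] = np.nan
X_miss[10:20, 2] = np.nan
hidden = np.isnan(X_miss)

X_mean = np.where(hidden, np.nanmean(X_miss, axis=0), X_miss)
X_iter = iterative_impute(X_miss)


def rmse(X_hat: np.ndarray) -> float:
    """Error on the entries that were hidden."""
    return float(np.sqrt(np.mean((X_hat[hidden] - X_true[hidden]) ** 2)))


print(f"mean imputation RMSE:      {rmse(X_mean):.3f}")
print(f"iterative imputation RMSE: {rmse(X_iter):.3f}")
```

On correlated data like this, the iterative scheme should recover the hidden entries far more accurately than a column-mean fill, which is the motivation for the iterative plugins.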

:hammer: Tests

Install the testing dependencies using

pip install .[testing]

The tests can be executed using

pytest -vsx

Citing

If you use this code, please cite the associated paper:

@article{Jarrett2022HyperImpute,
  doi = {10.48550/ARXIV.2206.07769},
  url = {https://arxiv.org/abs/2206.07769},
  author = {Jarrett, Daniel and Cebere, Bogdan and Liu, Tennison and Curth, Alicia and van der Schaar, Mihaela},
  keywords = {Machine Learning (stat.ML), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {HyperImpute: Generalized Iterative Imputation with Automatic Model Selection},
  year = {2022},
  booktitle={39th International Conference on Machine Learning},
}
