pyepal · PyPI · The Python Package Index

PyePAL implements the epsilon-PAL active learning algorithm.

Project description



A generalized Python implementation of a modified version of the ε-PAL algorithm [1, 2].

See the documentation for more details.

Installation

To install the latest stable release, use

pip install pyepal

To install the latest development version, use

pip install git+https://github.com/kjappelbaum/pyepal.git

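Developers can install the extras [testing, docs, pre-commit]. Installation should take only a few minutes.

Additional notes

On macOS you might need to install libomp (e.g., brew install libomp) to enable multithreading in some of the models.
Python 3.7 and 3.8 are currently supported.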

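Usage

The main logic is implemented in the PALBase class. There are some prebuilt classes for common use cases (GPy, sklearn) that inherit from this class. For more details on how to use the code, and for tutorials, see the documentation.

Pre-built classes

scikit-learn

If you want to use a list of sklearn models, you can use the PALSklearn class. To use it for one step, you can follow the code snippet below. The basic principles are the same for all the different PAL classes.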

import numpy as np
from pyepal import PALSklearn
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# For each objective, initialize a model
gpr_objective_0 = GaussianProcessRegressor(RBF())
gpr_objective_1 = GaussianProcessRegressor(RBF())

# The minimal input to create a PAL instance is a list of models,
# the design space (X, in ML terms "feature matrix") and the number of objectives
palsklearn_instance = PALSklearn(X, [gpr_objective_0, gpr_objective_1], 2)

# the next step is to provide some initial measurements.
# You can do this with the update_train_set function, which you
# can use throughout the active learning process to update the training set.
# For this, provide a numpy array of indices in your design space
# and the corresponding measurements
sampled_indices = np.array([1,2,3])
measurements = np.array([[1,2],
                        [0.8, 1],
                        [7,1]])
palsklearn_instance.update_train_set(sampled_indices, measurements)

# Now, you're ready to run the first iteration.
# This will return the next index to sample and update all the attributes
# If there are no unclassified samples left, it will return None and
# print a statement saying that the classification is completed
index_to_sample = palsklearn_instance.run_one_step()

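GPy

If you want to use a list of GPy models, you can use the PALGPy class.

Coregionalized GPR

Coregionalized GPR models can exploit correlations between the objectives and also work when some objectives have not been measured for all samples.

Custom classes

If you inherit from PALBase, you need to implement the _train() and _predict() functions. If you want to tune the hyperparameters of your models while new training points are added, you can implement the _should_optimize_hyperparameters() function and the _set_hyperparameters() function, which sets the hyperparameters for the model(s).

If you need to train a model, use self.design_space as the feature matrix and self.y as the target vector. Note that in self.y all objectives are turned into maximization problems. That is, if one of your objectives is a minimization problem, PyePAL flips its sign in self.y.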
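The sign convention for self.y can be illustrated with a minimal sketch in plain Python. The helper name to_maximization and the +1/-1 encoding of the goals are assumptions made for this illustration only; they are not part of the PyePAL API.

```python
def to_maximization(measurements, goals):
    """Flip the sign of every objective that should be minimized.

    measurements: list of rows, one value per objective.
    goals: list with +1 (maximize) or -1 (minimize) per objective.
    Both the function name and this encoding are hypothetical.
    """
    if any(len(row) != len(goals) for row in measurements):
        raise ValueError("every row needs exactly one value per objective")
    return [[goal * value for goal, value in zip(goals, row)] for row in measurements]


raw = [[1.0, 2.0], [0.8, 1.0], [7.0, 1.0]]
# Objective 0 is maximized, objective 1 is minimized.
y = to_maximization(raw, goals=[1, -1])
print(y)
```

After this transformation a larger value is better for every column, which is the convention a custom _train() implementation can rely on when it reads self.y.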

A basic example of how a custom class can be implemented is the PALSklearn class:

class PALSklearn(PALBase):
    """PAL class for a list of Sklearn (GPR) models, with one model per objective"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
        for i, model in enumerate(self.models):
            model.fit(self.design_space[self.sampled], self.y[self.sampled, i].reshape(-1,1))

    def _predict(self):
        means, stds = [], []
        for model in self.models:
            mean, std = model.predict(self.design_space, return_std=True)
            means.append(mean.reshape(-1, 1))
            stds.append(std.reshape(-1, 1))

        self.means = np.hstack(means)
        self.std = np.hstack(stds)

For scheduling the hyperparameter optimization, some predefined schedules are available in the pyepal.pal.schedules module.
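As an illustration of what such a schedule might look like, here is a minimal sketch of a function that triggers hyperparameter optimization at a fixed interval. The name optimize_every and its signature are hypothetical; check the pyepal.pal.schedules module for the schedules that actually ship with the package.

```python
def optimize_every(iteration, frequency=10):
    """Return True when hyperparameters should be re-optimized.

    Hypothetical schedule: optimize on every `frequency`-th iteration.
    """
    if frequency <= 0:
        raise ValueError("frequency must be a positive integer")
    return iteration % frequency == 0


# Iterations (out of the first 30) on which the optimization would run.
due = [i for i in range(1, 31) if optimize_every(i, frequency=10)]
print(due)
```

A custom class could call a function like this from its _should_optimize_hyperparameters() implementation, passing its own iteration counter.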

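Testing the algorithm

If the full design space is already known, you can use a while loop to fully explore the space with PyePAL. For the theoretical guarantees of PyePAL to hold, you need to sample until all uncertainties are below epsilon. In practice, it is usually enough to require, as a termination criterion, that no unclassified samples are left. For this you can use the following snippet: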

import numpy as np
from pyepal import PALGPy
from pyepal.utils import exhaust_loop
from pyepal.models.gpr import build_model

# indices for initialization
sample_idx = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 60, 70])

# build one model per objective
model_0 = build_model(X[sample_idx], y[sample_idx], 0)
model_1 = build_model(X[sample_idx], y[sample_idx], 1)

# initialize the PAL instance
palinstance = PALGPy(X, [model_0, model_1], 2, beta_scale=1)
palinstance.update_train_set(sample_idx, y[sample_idx])

# This will run the sampling and training as long as there
# are unclassified samples
exhaust_loop(palinstance, y)
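The control flow of such a loop can be sketched without PyePAL installed. The ToyPAL class below is a hypothetical stand-in that only mimics the two methods used in this README (update_train_set and run_one_step); it treats every sampled point as classified, which is a simplification, and it is not how exhaust_loop or the real PAL classes are implemented.

```python
class ToyPAL:
    """Hypothetical stand-in exposing update_train_set() and run_one_step()."""

    def __init__(self, n_points):
        self.unclassified = set(range(n_points))
        self.sampled = {}

    def update_train_set(self, indices, measurements):
        # Store the measurements; the toy treats sampled points as classified.
        for index, measurement in zip(indices, measurements):
            self.sampled[index] = measurement
            self.unclassified.discard(index)

    def run_one_step(self):
        # Return the next index to sample, or None when nothing is left.
        if not self.unclassified:
            return None
        return min(self.unclassified)


def run_until_classified(pal, y):
    """Sample until run_one_step() returns None; return the number of steps."""
    steps = 0
    while True:
        index = pal.run_one_step()
        if index is None:
            break
        pal.update_train_set([index], [y[index]])
        steps += 1
    return steps


y = [[0.1 * i, 1.0 - 0.1 * i] for i in range(6)]
pal = ToyPAL(len(y))
pal.update_train_set([0, 1], [y[0], y[1]])
steps = run_until_classified(pal, y)
print(steps, sorted(pal.sampled))
```

With a real PAL instance, points can also be classified from model predictions alone, so the loop typically stops after far fewer measurements than there are points in the design space.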

To measure the performance, you can use the get_hypervolume function from pyepal.pal.utils. More indicators are implemented in packages such as deap, pagmo, or pymoo.
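For intuition about what a hypervolume indicator measures, the following sketch computes it for two maximization objectives as the area dominated by a set of points and bounded by a reference point. The function hypervolume_2d is written for this illustration only; it is not the implementation behind get_hypervolume, whose signature may differ.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` above `reference` (both objectives maximized)."""
    # Ignore points that do not improve on the reference in both objectives.
    candidates = [p for p in points if p[0] > reference[0] and p[1] > reference[1]]
    # Sweep from the largest first objective to the smallest.
    candidates.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_second = reference[1]
    for first, second in candidates:
        # Only points that raise the second objective add a new strip of area.
        if second > best_second:
            area += (first - reference[0]) * (second - best_second)
            best_second = second
    return area


points = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.5, 1.5)]
hv = hypervolume_2d(points, reference=(0.0, 0.0))
print(hv)  # 6.0
```

The dominated point (1.5, 1.5) does not contribute, so a larger hypervolume reflects a better approximation of the Pareto front rather than simply more sampled points.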

References

1. Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.
2. Zuluaga, M.; Sergent, G.; Krause, A.; Püschel, M. Active Learning for Multi-Objective Optimization; Dasgupta, S., McAllester, D., Eds.; Proceedings of Machine Learning Research; PMLR: Atlanta, Georgia, USA, 2013; Vol. 28, pp 462–470.

Citation

If you find this code useful for your work, please cite:

The paper in which we describe the implementation and apply it to materials discovery: Jablonka, K. M.; Giriprasad, M. J.; Wang, S.; Smit, B.; Yoo, B. Bias Free Multiobjective Active Learning for Materials Design and Discovery, ChemRxiv 2020 (10.26434/chemrxiv.13200197.v1).
The original paper that describes the ε-PAL algorithm: Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.

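Acknowledgments

This research was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement 666983, MaGic), by the NCCR-MARVEL, funded by the Swiss National Science Foundation, and by the Swiss National Science Foundation (SNSF) under grant 200021_172759. Part of the work was performed as part of the Explore Together internship program at BASF.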

Release history

0.6.1 (this version): January 19, 2021
0.6.0: January 17, 2021
0.5.0: January 15, 2021
0.1.7: November 3, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

pyepal-0.6.1.tar.gz (60.1 kB), uploaded January 19, 2021, source

Built distribution

pyepal-0.6.1-py3-none-any.whl (97.7 kB), uploaded January 19, 2021, Python 3

Hashes for pyepal-0.6.1.tar.gz

Algorithm	Hash digest
SHA256	`986381baebc9f6406dc9874f10186ab8e62e603847f4ea7902f09071a786a002`
MD5	`fa86abe2a8ed1eefbf86dbbdfcb541d5`
BLAKE2b-256	`1b39368a465e56b1dd69e4bec6deb0e831e13e1bdec5a9e8a9513217635e35e6`

Hashes for pyepal-0.6.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`bdd49ce460e0d2f7ff1534acae302eb674cf9c779af3e9072d47c33e03620e97`
MD5	`022d3d4275d1c6c57d64b329e94ac0fe`
BLAKE2b-256	`d2c94d23a83632d8be2acbb35ad3b7edb3562af0888d88bf52fa73495130617c`
