frouros · PyPI · The Python Package Index

An open-source Python library for drift detection in machine learning systems


Project description


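Frouros is a Python library for drift detection in machine learning systems. It provides a combination of classical and more recent algorithms for both concept drift and data drift detection.

"Everything changes and nothing stands still"

"You cannot step twice into the same stream"

Heraclitus of Ephesus (535-475 BC)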

⚡️ Quickstart

🔄 Concept drift

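As a quick example, we can take the breast cancer dataset, induce concept drift in it, and show how to use a concept drift detector such as DDM (Drift Detection Method). We can also see how concept drift affects performance in terms of accuracy.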

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from frouros.detectors.concept_drift import DDM, DDMConfig
from frouros.metrics import PrequentialError

np.random.seed(seed=31)

# Load breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# Define and fit model
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X=X_train, y=y_train)

# Detector configuration and instantiation
config = DDMConfig(
    warning_level=2.0,
    drift_level=3.0,
    min_num_instances=25,  # minimum number of instances before checking for concept drift
)
detector = DDM(config=config)

# Metric to compute accuracy
metric = PrequentialError(alpha=1.0)  # alpha=1.0 is equivalent to normal accuracy

def stream_test(pipeline, X_test, y_test, metric, detector):
    """Simulate a data stream over X_test and y_test (y_test holds the true labels)."""
    drift_flag = False
    metric_error = 0.0
    for i, (X_i, y_i) in enumerate(zip(X_test, y_test)):
        y_pred = pipeline.predict(X_i.reshape(1, -1))
        error = 1 - (y_pred.item() == y_i.item())  # 0 if correct, 1 if misclassified
        metric_error = metric(error_value=error)  # running (prequential) error
        _ = detector.update(value=error)
        status = detector.status
        if status["drift"] and not drift_flag:
            drift_flag = True
            print(f"Concept drift detected at step {i}. Accuracy: {1 - metric_error:.4f}")
    if not drift_flag:
        print("No concept drift detected")
    print(f"Final accuracy: {1 - metric_error:.4f}\n")

# Simulate data stream (assuming the test label is available after each prediction)
# No concept drift is expected to occur
stream_test(
    pipeline=pipeline,
    X_test=X_test,
    y_test=y_test,
    metric=metric,
    detector=detector,
)
# >> No concept drift detected
# >> Final accuracy: 0.9766

# IMPORTANT: Induce/simulate concept drift in the last part (20%)
# of y_test by flipping approximately 50% of its labels, thereby changing P(y|X)
drift_size = int(y_test.shape[0] * 0.2)
y_test_drift = y_test[-drift_size:]
modify_idx = np.random.rand(*y_test_drift.shape) <= 0.5
y_test_drift[modify_idx] = (y_test_drift[modify_idx] + 1) % len(np.unique(y_test))
y_test[-drift_size:] = y_test_drift

# Reset detector and metric
detector.reset()
metric.reset()

# Simulate data stream (assuming the test label is available after each prediction)
# Concept drift is expected to occur because of the label modification
stream_test(
    pipeline=pipeline,
    X_test=X_test,
    y_test=y_test,
    metric=metric,
    detector=detector,
)
# >> Concept drift detected at step 142. Accuracy: 0.9510
# >> Final accuracy: 0.8480

More concept drift examples can be found in the Frouros documentation.
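For intuition about what the `warning_level`, `drift_level` and `min_num_instances` settings of `DDMConfig` control, the sketch below implements the decision rule described by Gama et al. (2004) in plain Python. It tracks the running error rate p and its binomial standard deviation s, remembers where p + s was lowest, and flags a warning or a drift when p + s exceeds that minimum by `warning_level` or `drift_level` standard deviations. The function `ddm_stream` and its reset behaviour are illustrative assumptions, not part of the Frouros API, and the library's own implementation may differ in details.

```python
import math


def ddm_stream(errors, warning_level=2.0, drift_level=3.0, min_num_instances=25):
    """Hypothetical sketch of the textbook DDM rule over a stream of 0/1 errors.

    Returns one status per step: "stable", "warning" or "drift".
    Not Frouros' implementation; for illustration only.
    """
    statuses = []
    n = 0
    p = 0.0
    p_min = s_min = float("inf")
    for error in errors:
        n += 1
        p += (error - p) / n  # running error rate p_i
        s = math.sqrt(p * (1 - p) / n)  # binomial standard deviation s_i
        if n < min_num_instances:
            # Not enough evidence yet to compare against the minimum
            statuses.append("stable")
            continue
        if p + s < p_min + s_min:
            # Remember the best (lowest) p_i + s_i seen so far
            p_min, s_min = p, s
        if p + s > p_min + drift_level * s_min:
            statuses.append("drift")
            # Start over after a drift, as if a new concept had begun
            n, p = 0, 0.0
            p_min = s_min = float("inf")
        elif p + s > p_min + warning_level * s_min:
            statuses.append("warning")
        else:
            statuses.append("stable")
    return statuses


# Error rate of about 10% for 200 steps, then every prediction is wrong
errors = [1 if i % 10 == 9 else 0 for i in range(200)] + [1] * 100
statuses = ddm_stream(errors)
print(statuses.index("drift"))
```

With the stream above, the sketch reports a run of warnings shortly after the error rate jumps at step 200, followed by a drift once p + s exceeds the minimum by three standard deviations.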

📊 Data drift

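As a quick example, we can take the iris dataset, induce data drift in one of its features, and show how to use a data drift detector such as the Kolmogorov-Smirnov test.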

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from frouros.detectors.data_drift import KSTest

np.random.seed(seed=31)

# Load iris dataset
X, y = load_iris(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# Set the feature index to which detector is applied
feature_idx = 0

# IMPORTANT: Induce/simulate data drift in the selected feature of X_test by
# applying some Gaussian noise, thereby changing P(X)
X_test[:, feature_idx] += np.random.normal(
    loc=0.0,
    scale=3.0,
    size=X_test.shape[0],
)

# Define and fit model
model = DecisionTreeClassifier(random_state=31)
model.fit(X=X_train, y=y_train)

# Set significance level for hypothesis testing
alpha = 0.001
# Define and fit detector
detector = KSTest()
_ = detector.fit(X=X_train[:, feature_idx])

# Apply detector to the selected feature of X_test
result, _ = detector.compare(X=X_test[:, feature_idx])

# Check if drift is taking place
if result.p_value <= alpha:
    print(f"Data drift detected at feature {feature_idx}")
else:
    print(f"No data drift detected at feature {feature_idx}")
# >> Data drift detected at feature 0
# Therefore, we can reject H0 (both samples come from the same distribution).

More data drift examples can be found in the Frouros documentation.
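The `KSTest` detector compares the empirical distribution of the reference feature (`X_train[:, feature_idx]`) with that of the new feature values. As a hedged illustration of the statistic behind the test, the following pure-Python sketch computes the two-sample Kolmogorov-Smirnov statistic D, the largest absolute gap between the two empirical CDFs. The helper name `ks_statistic` is hypothetical, and it does not compute the p-value that the detector exposes as `result.p_value`; `scipy.stats.ks_2samp` is a standard implementation that returns both.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Hypothetical helper: two-sample KS statistic D = sup_x |F_ref(x) - F_cur(x)|."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum is attained
    # at one of the observed values
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / len(cur)  # fraction of current values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Identical samples give D = 0 and completely separated samples give D = 1; the further the noisy feature moves away from the training distribution, the larger D becomes and the smaller the resulting p-value.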

🛠 Installation

Frouros can be installed via pip:

pip install frouros

🕵🏻‍♂️️ Drift detection methods

The currently implemented detectors are listed in the following table.

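Drift detector	Type	Family	Univariate (U) / Multivariate (M)	Numerical (N) / Categorical (C)	Method	Reference
Concept drift	Streaming	Change detection	U	N	BOCD	Adams and MacKay (2007)
Concept drift	Streaming	Change detection	U	N	CUSUM	Page (1954)
Concept drift	Streaming	Change detection	U	N	Geometric Moving Average	Roberts (1959)
Concept drift	Streaming	Change detection	U	N	Page Hinkley	Page (1954)
Concept drift	Streaming	Statistical process control	U	N	DDM	Gama et al. (2004)
Concept drift	Streaming	Statistical process control	U	N	ECDD-WT	Ross et al. (2012)
Concept drift	Streaming	Statistical process control	U	N	EDDM	Baena-García et al. (2006)
Concept drift	Streaming	Statistical process control	U	N	HDDM-A	Frias-Blanco et al. (2014)
Concept drift	Streaming	Statistical process control	U	N	HDDM-W	Frias-Blanco et al. (2014)
Concept drift	Streaming	Statistical process control	U	N	RDDM	Barros et al. (2017)
Concept drift	Streaming	Window based	U	N	ADWIN	Bifet and Gavaldà (2007)
Concept drift	Streaming	Window based	U	N	KSWIN	Raab et al. (2020)
Concept drift	Streaming	Window based	U	N	STEPD	Nishida and Yamauchi (2007)
Data drift	Batch	Distance based	U	N	Bhattacharyya distance	Bhattacharyya (1946)
Data drift	Batch	Distance based	U	N	Earth Mover's distance	Rubner et al. (2000)
Data drift	Batch	Distance based	U	N	Energy distance	Székely et al. (2013)
Data drift	Batch	Distance based	U	N	Hellinger distance	Hellinger (1909)
Data drift	Batch	Distance based	U	N	Histogram intersection normalized complement	Swain and Ballard (1991)
Data drift	Batch	Distance based	U	N	Jensen-Shannon distance	Lin (1991)
Data drift	Batch	Distance based	U	N	Kullback-Leibler divergence	Kullback and Leibler (1951)
Data drift	Batch	Distance based	M	N	Maximum Mean Discrepancy	Gretton et al. (2012)
Data drift	Batch	Distance based	U	N	Population Stability Index	Wu and Olson (2010)
Data drift	Batch	Statistical test	U	N	Anderson-Darling test	Scholz and Stephens (1987)
Data drift	Batch	Statistical test	U	N	Baumgartner-Weiss-Schindler test	Baumgartner et al. (1998)
Data drift	Batch	Statistical test	U	C	Chi-square test	Pearson (1900)
Data drift	Batch	Statistical test	U	N	Cramér-von Mises test	Cramér (1928)
Data drift	Batch	Statistical test	U	N	Kolmogorov-Smirnov test	Massey Jr (1951)
Data drift	Batch	Statistical test	U	N	Kuiper's test	Kuiper (1960)
Data drift	Batch	Statistical test	U	N	Mann-Whitney U test	Mann and Whitney (1947)
Data drift	Batch	Statistical test	U	N	Welch's t-test	Welch (1947)
Data drift	Streaming	Distance based	M	N	Maximum Mean Discrepancy	Gretton et al. (2012)
Data drift	Streaming	Statistical test	U	N	Incremental Kolmogorov-Smirnov test	dos Reis et al. (2016)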
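To illustrate what a batch, distance-based univariate method from the table computes, here is a minimal pure-Python sketch of the Population Stability Index (Wu and Olson, 2010): PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the fractions of the reference and current samples that fall into bin i. The equal-width binning over the reference range, the `eps` smoothing for empty bins and the function name `population_stability_index` are assumptions made for this illustration; Frouros' own PSI detector may bin and smooth differently.

```python
import math


def population_stability_index(reference, current, num_bins=10, eps=1e-6):
    """Hypothetical sketch of PSI between two univariate numerical samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero bin width for constant data

    def proportions(sample):
        counts = [0] * num_bins
        for x in sample:
            # Values outside the reference range are clipped into the first/last bin
            idx = min(max(int((x - lo) / width), 0), num_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so empty bins do not produce log(0)
        return [max(c / len(sample), eps) for c in counts]

    expected = proportions(reference)  # e_i: reference fractions
    actual = proportions(current)  # a_i: current fractions
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


reference = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in reference]
print(population_stability_index(reference, reference))  # 0.0
print(population_stability_index(reference, shifted) > 0.1)  # True
```

Each term (aᵢ − eᵢ) · ln(aᵢ / eᵢ) is non-negative, so PSI is zero only when both samples fill the bins in the same proportions and grows as the current distribution moves away from the reference one.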

❗ What is and what is not Frouros?

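Unlike other libraries that, in addition to drift detection algorithms, include other functionality such as anomaly/outlier detection, adversarial detection or imbalanced learning, Frouros has, and will only ever have, one purpose: drift detection.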

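We firmly believe that machine learning libraries and frameworks should not follow the "jack of all trades, master of none" principle. Instead, they should focus on a single task and do it well.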

✅ Who is using Frouros?

Frouros is currently being actively used in the following projects to implement drift detection in machine learning pipelines:

AI4EOSC.
iMagine.

If you want your project listed here, do not hesitate to send us a pull request.

👍 Contributing

Check out the contributing section.

💬 Citation

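Although the Frouros paper is still a preprint, you can cite the preprint version if you want to cite it (it will be replaced by the paper once it is published).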

@article{cespedes2022frouros,
  title={Frouros: A Python library for drift detection in machine learning systems},
  author={C{\'e}spedes-Sisniega, Jaime and L{\'o}pez-Garc{\'\i}a, {\'A}lvaro },
  journal={arXiv preprint arXiv:2208.06868},
  year={2022}
}

📝 License

Frouros is open-source software licensed under the BSD-3-Clause license.

🙏 Acknowledgements

Frouros has received funding from the Agencia Estatal de Investigación, Unidad de Excelencia María de Maeztu, ref. MDM-2017-0765.


Release history

0.8.0 (this version): Apr 3, 2024
0.7.1: Feb 22, 2024
0.7.0: Feb 18, 2024
0.6.1: Aug 14, 2023
0.6.0: Aug 4, 2023
0.5.1: Jul 22, 2023
0.5.0: Jul 18, 2023
0.4.1: Jul 3, 2023
0.4.0: Jun 21, 2023
0.3.2: Jun 9, 2023
0.3.1: Jun 2, 2023
0.3.0: May 31, 2023
0.2.7: May 11, 2023
0.2.6: Apr 30, 2023
0.2.5: Apr 28, 2023
0.2.4: Apr 28, 2023
0.2.3: Apr 28, 2023
0.2.2: Apr 10, 2023
0.2.1: Mar 17, 2023
0.2.0: Mar 10, 2023
0.1.0: Aug 6, 2022

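Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

frouros-0.8.0.tar.gz (79.1 kB)

Uploaded Apr 3, 2024 · Source

Built distribution

frouros-0.8.0-py3-none-any.whl (126.0 kB)

Uploaded Apr 3, 2024 · Python 3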

Hashes for frouros-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`aeca3180eea5e6d279a716ed3230fc8dafcba15782fcdcf8281acf818569b1f5`
MD5	`1e981aaa6fcfb479964c063a7bd76008`
BLAKE2b-256	`67af72f2b051d80b7fd4a752085020648f8046dcbf81086fbda07378a95e50d0`

Hashes for frouros-0.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a5459b89ee77ab6149e18888501bd7d30f886a664b8b4cbbe67935e8e4d14cb`
MD5	`0032197a2df0e7e4230d51ccf260e914`
BLAKE2b-256	`1ab362988e8ebd7a87c5715f7dfd8c3a1feaf73d72e8034ee55a60ff262cd23a`
