A Python package for sanitizing machine-learning-related labels in a standard way.
Sanitize ML Labels
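Sanitize ML Labels is a Python package designed to standardize and sanitize machine-learning-related labels. It currently supports over 100 labels, including metrics and model names.
If you have ML-related labels and find yourself repeatedly renaming and cleaning them by hand to get consistent formatting, this package ensures they are always sanitized in a standard way.
How do I install this package?
You can install it using pip: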
pip install sanitize_ml_labels
Usage examples
Here are some common use cases for normalizing labels.
Example for metrics
from sanitize_ml_labels import sanitize_ml_labels
labels = [
    "acc",
    "loss",
    "auroc",
    "lr"
]
assert sanitize_ml_labels(labels) == [
    "Accuracy",
    "Loss",
    "AUROC",
    "Learning rate"
]
Example for models
from sanitize_ml_labels import sanitize_ml_labels
labels = [
    "mlp",
    "cnn",
    "ffNN",
    "Feed-forward neural network",
    "perceptron",
    "recurrent neural network",
    "LStM"
]
assert sanitize_ml_labels(labels) == [
    "MLP",
    "CNN",
    "FFNN",
    "FFNN",
    "Perceptron",
    "RNN",
    "LSTM"
]
assert sanitize_ml_labels("vanilla mlp") == "MLP"
assert sanitize_ml_labels("vanilla cnn") == "CNN"
assert sanitize_ml_labels([
    "Large Language Model",
    "transe",
    "Generative Pre-trained Transformer",
    "Graph Convolutional Neural Network",
    "Convolutional Graph Neural Network",
    "Graph Neural Network",
    "Graph Attention Network",
    "Graph Attention Neural Network",
]) == ["LLM", "TransE", "GPT", "GCN", "GCN", "GNN", "GAT", "GAT"]
Sometimes every model name carries a prefix such as "vanilla", "simple", or "basic". The package removes these prefixes for you.
from sanitize_ml_labels import sanitize_ml_labels
labels = [
    "vanilla mlp",
    "vanilla cnn",
    "vanilla ffnn",
    "vanilla perceptron"
]
assert sanitize_ml_labels(labels) == ["MLP", "CNN", "FFNN", "Perceptron"]
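To illustrate the idea behind the prefix removal, here is a minimal standalone sketch. It is not the package's implementation: `strip_generic_prefix` and `GENERIC_PREFIXES` are hypothetical names introduced for illustration, and the sketch only drops the prefix without normalizing the model name that follows.

```python
# Hypothetical sketch, not the package's code: drop a leading generic qualifier.
GENERIC_PREFIXES = ("vanilla", "simple", "basic")


def strip_generic_prefix(label: str) -> str:
    """Return the label without a leading generic qualifier, if present."""
    words = label.split()
    # Only strip when something remains after the prefix.
    if len(words) > 1 and words[0].lower() in GENERIC_PREFIXES:
        return " ".join(words[1:])
    return label


print(strip_generic_prefix("vanilla mlp"))
print(strip_generic_prefix("Simple cnn"))
```

A label that consists of the qualifier alone (for example "basic") is returned unchanged, so the sketch never produces an empty label.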
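Corner cases
Sometimes labels contain hyphenated terms that need to be recognized and normalized correctly. We use a heuristic based on an extended list of over 45K hyphenated English words, originally sourced from the Metadata consulting website.
The lookup heuristic, written by Tommaso Fontana, identifies hyphenated words efficiently and accurately.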
from sanitize_ml_labels import sanitize_ml_labels
# Running the following
assert sanitize_ml_labels("non-existent-edges-in-graph") == "Non-existent edges in graph"
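The general idea of a dictionary-based hyphen lookup can be sketched as follows. This is an assumption-laden illustration, not the heuristic the package actually uses: `KNOWN_HYPHENATED`, `split_hyphenated_label`, and the `max_parts` parameter are hypothetical, and the three-word set stands in for the much larger word list.

```python
# Hypothetical sketch, not the package's code: keep hyphens only inside
# known compounds. This tiny set is a stand-in for a large word list.
KNOWN_HYPHENATED = {"non-existent", "feed-forward", "pre-trained"}


def split_hyphenated_label(label: str, max_parts: int = 3) -> str:
    """Replace hyphens with spaces, except inside known hyphenated words."""
    parts = label.split("-")
    words = []
    i = 0
    while i < len(parts):
        # Try the longest candidate compound first, then shorter ones.
        for size in range(min(max_parts, len(parts) - i), 1, -1):
            candidate = "-".join(parts[i:i + size])
            if candidate.lower() in KNOWN_HYPHENATED:
                words.append(candidate)
                i += size
                break
        else:
            # No known compound starts here: keep the single part as a word.
            words.append(parts[i])
            i += 1
    sentence = " ".join(words)
    # Capitalize only the first character, leaving the rest untouched.
    return sentence[:1].upper() + sentence[1:]


print(split_hyphenated_label("non-existent-edges-in-graph"))
```

Trying the longest candidate first means a longer known compound wins over a shorter one that shares its opening words.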
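Extra utilities
Beyond label sanitization, the package also provides methods to check the normalization of metrics.
Is normalized metric
Checks whether the metric falls in the range [0, 1].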
from sanitize_ml_labels import is_normalized_metric
assert not is_normalized_metric("MSE")
assert is_normalized_metric("acc")
assert is_normalized_metric("accuracy")
assert is_normalized_metric("AUROC")
assert is_normalized_metric("auprc")
Is absolutely normalized metric
Checks whether the metric falls in the range [-1, 1].
from sanitize_ml_labels import is_absolutely_normalized_metric
assert not is_absolutely_normalized_metric("auprc")
assert is_absolutely_normalized_metric("MCC")
assert is_absolutely_normalized_metric("Markedness")
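One possible use of these two checks is choosing fixed axis limits when plotting metric histories. The sketch below is an assumption about how they might be combined, not something the package provides: `axis_limits` and the two `demo_*` predicates are hypothetical, and the checks are passed in as arguments so the example runs on its own.

```python
from typing import Callable, Optional, Tuple


def axis_limits(
    metric: str,
    is_normalized: Callable[[str], bool],
    is_absolutely_normalized: Callable[[str], bool],
) -> Optional[Tuple[float, float]]:
    """Return fixed axis limits for bounded metrics, or None otherwise."""
    if is_normalized(metric):
        return (0.0, 1.0)
    if is_absolutely_normalized(metric):
        return (-1.0, 1.0)
    return None


# Stand-in predicates for demonstration only; with the package installed,
# is_normalized_metric and is_absolutely_normalized_metric could be
# passed in their place.
def demo_is_normalized(metric: str) -> bool:
    return metric in {"acc", "AUROC"}


def demo_is_absolutely_normalized(metric: str) -> bool:
    return metric in {"MCC"}


print(axis_limits("acc", demo_is_normalized, demo_is_absolutely_normalized))
print(axis_limits("MCC", demo_is_normalized, demo_is_absolutely_normalized))
print(axis_limits("MSE", demo_is_normalized, demo_is_absolutely_normalized))
```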
Should be maximized
Checks whether a metric should be maximized or minimized. An unknown metric raises a NotImplementedError.
from sanitize_ml_labels import should_be_maximized
assert not should_be_maximized("MSE")
assert should_be_maximized("AUROC")
assert should_be_maximized("accuracy")
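A check like this is handy for model selection, where the best score is the highest for some metrics and the lowest for others. The following is a minimal sketch under that assumption: `pick_best` and `demo_should_be_maximized` are hypothetical names, and the stand-in predicate only mimics the documented behavior (including raising NotImplementedError for unknown metrics) so the example runs without the package.

```python
from typing import Callable, Dict


def pick_best(
    scores: Dict[str, float],
    metric: str,
    maximize: Callable[[str], bool],
) -> str:
    """Return the model name with the best score for the given metric."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Higher is better for metrics such as accuracy, lower for errors such as MSE.
    choose = max if maximize(metric) else min
    return choose(scores, key=scores.get)


# Stand-in for demonstration only; with the package installed,
# should_be_maximized could be passed in its place.
def demo_should_be_maximized(metric: str) -> bool:
    if metric in {"accuracy", "AUROC"}:
        return True
    if metric in {"MSE"}:
        return False
    raise NotImplementedError(f"Unknown metric: {metric}")


scores = {"MLP": 0.25, "CNN": 0.75}
print(pick_best(scores, "accuracy", demo_should_be_maximized))
print(pick_best(scores, "MSE", demo_should_be_maximized))
```

Because an unknown metric raises rather than silently defaulting, a caller that compares models on a custom metric has to state the direction explicitly.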
License
This software is released under the MIT license. See LICENSE.