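A library that provides a simple interface for creating new metrics and an easy-to-use toolkit for metric computation and checkpointing.
Project description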
TorchEval
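This library is currently in Alpha and does not yet have a stable release. The API may change and may not be backward compatible. If you have suggestions for improvements, please open a GitHub issue. We'd love to hear your feedback.
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training, and tools for PyTorch model evaluation.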
Installing TorchEval
Requires Python >= 3.8 and PyTorch >= 1.11
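To check whether an environment meets these minimums, a small sketch follows. The helper names version_tuple and meets_minimum are hypothetical (they are not part of TorchEval or PyTorch), and the parser only reads the leading numeric parts of a version string, so local suffixes such as +cu121 and nightly tags such as .dev20240601 are ignored.

```python
import sys


def version_tuple(version: str, width: int = 3) -> tuple:
    """Hypothetical helper: turn "2.1.0+cu121" or "2.4.0.dev20240601"
    into a tuple of its leading numeric components, e.g. (2, 1, 0)."""
    numbers = []
    for part in version.split("+")[0].split("."):
        # Keep only the leading digits of each dotted component ("0a0" -> 0).
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
        if len(numbers) == width:
            break
    return tuple(numbers)


def meets_minimum(version: str, minimum: tuple) -> bool:
    # Tuple comparison: (1, 11, 0) >= (1, 11) is True, (1, 10, 2) >= (1, 11) is False.
    return version_tuple(version) >= minimum


python_ok = sys.version_info[:2] >= (3, 8)
try:
    import torch

    torch_ok = meets_minimum(torch.__version__, (1, 11))
except ImportError:
    torch_ok = False  # PyTorch is not installed at all
print(f"Python >= 3.8: {python_ok}, PyTorch >= 1.11: {torch_ok}")
```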
From pip
pip install torcheval
For the nightly build version
pip install --pre torcheval-nightly
From source
git clone https://github.com/pytorch/torcheval
cd torcheval
pip install -r requirements.txt
python setup.py install
Quick Start
More examples can be found in the examples directory.
cd torcheval
python examples/simple_example.py
Documentation
Documentation can be found at pytorch.org/torcheval
Using TorchEval
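TorchEval can run on CPU, on GPU, and in multi-process or multi-GPU settings. Metrics are provided through two interfaces: functional and class-based. The functional interface, found in torcheval.metrics.functional, is useful when your program runs as a single process. For multi-process or multi-GPU configurations, use the class-based interface found in torcheval.metrics, which provides a much simpler experience. The class-based interface also lets you defer part of a metric's computation by calling update() several times before calling compute(). This can be beneficial even in a single-process setting, because it reduces computational overhead.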
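The deferred pattern can be pictured with a toy accumulator: update() only adds to cheap running totals, and compute() turns those totals into a value over everything seen since the last reset(). The class below is a hypothetical plain-Python sketch of that idea; RunningAccuracy is not a TorchEval class, and TorchEval's own metrics operate on torch tensors.

```python
from typing import Sequence


class RunningAccuracy:
    """Hypothetical toy metric (not part of TorchEval) that mimics the
    update()/compute()/reset() pattern of the class-based interface."""

    def __init__(self) -> None:
        self.reset()

    def update(self, preds: Sequence[int], targets: Sequence[int]) -> None:
        # Only running totals are kept; no accuracy is computed here.
        if len(preds) != len(targets):
            raise ValueError("preds and targets must have the same length")
        self.num_correct += sum(p == t for p, t in zip(preds, targets))
        self.num_total += len(targets)

    def compute(self) -> float:
        # One final division over all data seen since the last reset().
        return self.num_correct / self.num_total if self.num_total else 0.0

    def reset(self) -> None:
        self.num_correct = 0
        self.num_total = 0


metric = RunningAccuracy()
metric.update([0, 1, 2], [0, 1, 1])  # 2 of 3 correct
metric.update([3, 3], [3, 0])        # 1 of 2 correct
print(metric.compute())              # 3 / 5 = 0.6
```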
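Single Process
For use in a single-process program, the simplest approach is to use a functional metric: import the metric function and pass in the model outputs and targets. The example below shows a minimal PyTorch training loop that evaluates the multiclass classification accuracy on every fourth batch of data.
Functional Version (immediate computation of metric)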
```python
import torch
from torcheval.metrics.functional import multiclass_accuracy

NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4

model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()

metric_history = []
for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)

    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()

    # metric only computed every 4 batches,
    # data from previous three batches is lost
    if (batch + 1) % eval_frequency == 0:
        metric_history.append(multiclass_accuracy(outputs, target))
```
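Here outputs has shape (BATCH_SIZE, NUM_CLASSES). For such 2-D scores, multiclass_accuracy converts each row to a predicted label with argmax and, with its default micro averaging, returns the fraction of samples whose prediction matches the target. A plain-Python sketch of that computation follows; argmax and micro_multiclass_accuracy are hypothetical helpers, not TorchEval functions, and ties resolve to the first maximum in this sketch.

```python
from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    # Index of the largest score; max() keeps the first of equal maxima.
    return max(range(len(row)), key=lambda i: row[i])


def micro_multiclass_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Hypothetical sketch of micro-averaged multiclass accuracy for
    scores of shape (num_samples, num_classes)."""
    preds = [argmax(row) for row in scores]
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)


scores = [
    [0.1, 2.0, 0.3],  # predicted class 1
    [1.5, 0.2, 0.1],  # predicted class 0
    [0.0, 0.1, 0.9],  # predicted class 2
    [0.7, 0.8, 0.2],  # predicted class 1
]
targets = [1, 0, 1, 1]
print(micro_multiclass_accuracy(scores, targets))  # 3 of 4 correct -> 0.75
```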
Single Process with Deferred Computation
Class Version (enables deferred computation of metric)
```python
import torch
from torcheval.metrics import MulticlassAccuracy

NUM_BATCHES = 16
BATCH_SIZE = 8
INPUT_SIZE = 10
NUM_CLASSES = 6
eval_frequency = 4

model = torch.nn.Sequential(torch.nn.Linear(INPUT_SIZE, NUM_CLASSES), torch.nn.ReLU())
optim = torch.optim.Adagrad(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()
metric = MulticlassAccuracy()

metric_history = []
for batch in range(NUM_BATCHES):
    input = torch.rand(size=(BATCH_SIZE, INPUT_SIZE))
    target = torch.randint(size=(BATCH_SIZE,), high=NUM_CLASSES)
    outputs = model(input)

    loss = loss_fn(outputs, target)
    optim.zero_grad()
    loss.backward()
    optim.step()

    # metric only computed every 4 batches,
    # data from previous three batches is included
    metric.update(outputs, target)  # pass the model outputs, not the raw input features
    if (batch + 1) % eval_frequency == 0:
        metric_history.append(metric.compute())
        # remove old data so that the next call
        # to compute is only based off next 4 batches
        metric.reset()
```
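The practical difference between the two loops is which batches feed each entry of metric_history. The sketch below replays both loops over made-up per-batch (correct, total) counts; functional_history and deferred_history are hypothetical names, and the counts are illustrative, not results from TorchEval.

```python
from typing import List, Tuple


def functional_history(batches: List[Tuple[int, int]], eval_frequency: int = 4) -> List[float]:
    # Mirrors the functional loop: only the batch on which the metric is
    # evaluated contributes; the other three batches are never looked at.
    history = []
    for i, (correct, total) in enumerate(batches):
        if (i + 1) % eval_frequency == 0:
            history.append(correct / total)
    return history


def deferred_history(batches: List[Tuple[int, int]], eval_frequency: int = 4) -> List[float]:
    # Mirrors the class-based loop: counts accumulate like update(),
    # and are cleared like reset() right after each compute().
    history = []
    correct_sum = total_sum = 0
    for i, (correct, total) in enumerate(batches):
        correct_sum += correct
        total_sum += total
        if (i + 1) % eval_frequency == 0:
            history.append(correct_sum / total_sum)
            correct_sum = total_sum = 0
    return history


# Illustrative counts for 8 batches of 8 samples each.
batches = [(2, 8), (4, 8), (6, 8), (8, 8), (1, 8), (1, 8), (1, 8), (1, 8)]
print(functional_history(batches))  # [1.0, 0.125]
print(deferred_history(batches))    # [0.625, 0.125]
```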
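Multi-Process or Multi-GPU
Below is a minimal example of usage across multiple devices. In the usual torch.distributed paradigm, each device is assigned its own process, along with a unique numeric ID called its "global rank", counting up from 0.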
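How local and global ranks relate can be sketched under one assumption: every machine runs the same number of processes and ranks are handed out machine by machine, which is the common layout when launching with torchrun. The function global_rank and its parameters are hypothetical illustrations, not a torch.distributed API.

```python
def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    # Assumed layout: machine 0 owns ranks 0..nproc_per_node-1,
    # machine 1 owns the next block, and so on.
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


# 2 machines x 4 processes each -> world size 8, global ranks 0..7
layout = {
    (node, local): global_rank(node, local, 4)
    for node in range(2)
    for local in range(4)
}
print(layout[(1, 2)])  # 1 * 4 + 2 = 6
```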
Class Version (enables deferred computation and multi-processing)
```python
import os

import torch
import torch.distributed as dist
from torcheval.metrics.toolkit import sync_and_compute
from torcheval.metrics import MulticlassAccuracy

# Using torch.distributed: launch with e.g. torchrun, which sets the
# LOCAL_RANK, RANK and WORLD_SIZE environment variables read below
local_rank = int(os.environ["LOCAL_RANK"])  # rank on local machine, i.e. unique ID within a machine
global_rank = int(os.environ["RANK"])  # rank in global pool, i.e. unique ID within the entire process group
world_size = int(os.environ["WORLD_SIZE"])  # total number of processes or "ranks" in the entire process group

device = torch.device(
    f"cuda:{local_rank}"
    if torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    else "cpu"
)
if device.type == "cuda":
    torch.cuda.set_device(device)
# Initialize the default process group that sync_and_compute communicates over
dist.init_process_group(backend="nccl" if device.type == "cuda" else "gloo")

metric = MulticlassAccuracy(device=device)
num_epochs, num_batches = 4, 8

for epoch in range(num_epochs):
    for i in range(num_batches):
        input = torch.randint(high=5, size=(10,), device=device)
        target = torch.randint(high=5, size=(10,), device=device)

        # Add data to metric locally
        metric.update(input, target)

        # metric.compute() returns the metric value computed from
        # all data seen by the local process since the last reset()
        local_compute_result = metric.compute()

        # sync_and_compute(metric) syncs metric data across all ranks and computes the metric value
        global_compute_result = sync_and_compute(metric)
        if global_rank == 0:
            print(global_compute_result)

    # metric.reset() clears the data on each process so that subsequent
    # calls to compute() only act on new data
    metric.reset()

dist.destroy_process_group()
```
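Conceptually, syncing combines every rank's metric state (here, running counts) before computing, rather than averaging the per-rank results. The simulation below replaces the torch.distributed collective with a plain Python list; AccuracyState and simulated_sync_and_compute are hypothetical names for illustration, not TorchEval internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccuracyState:
    """Hypothetical per-rank state: the running counts an accuracy metric keeps."""
    num_correct: int = 0
    num_total: int = 0

    def merge(self, others: List["AccuracyState"]) -> "AccuracyState":
        # Combining states is just summing the counts; self is not mutated.
        merged = AccuracyState(self.num_correct, self.num_total)
        for other in others:
            merged.num_correct += other.num_correct
            merged.num_total += other.num_total
        return merged

    def compute(self) -> float:
        return self.num_correct / self.num_total


def simulated_sync_and_compute(states: List[AccuracyState], rank: int) -> float:
    # Stand-in for a collective: the given rank "receives" every state,
    # merges them into its own, then computes on the combined counts.
    local = states[rank]
    others = states[:rank] + states[rank + 1:]
    return local.merge(others).compute()


# Two ranks that saw different amounts of data.
states = [AccuracyState(9, 10), AccuracyState(5, 30)]
print([s.compute() for s in states])          # local values: 0.9 and about 0.167
print(simulated_sync_and_compute(states, 0))  # global: 14 / 40 = 0.35, not the mean of local values
```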
See the examples directory for more examples.
Contributing
We welcome PRs! See the CONTRIBUTING file.
License
TorchEval is BSD licensed, as found in the LICENSE file.
Download files
Source Distribution
Built Distribution
Hashes for torcheval_nightly-2024.7.31.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fd0cbad3065b5ba1cc13968ad92435a111c4e0bbd0510cf695eb325934c7fed |
| MD5 | 6077f75e34219c25688f365b7083e8bb |
| BLAKE2b-256 | 05385b1c895aeb15f69e10ef2e89229ba5503cf891c01b001227670d2c58c1ca |
Hashes for torcheval_nightly-2024.7.31-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 645726fdd00fcf769894275e90412e685ea6ba72d3c4efe3ebbcbccd47abf1ed |
| MD5 | 92f1c576a29d1671f455714e3656540f |
| BLAKE2b-256 | 6e4c0243e1fbfc8cc2caf8442d7a5e8fcdbc81aef92724239780da82426b40c4 |
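A downloaded file can be checked against the digests in these tables with Python's hashlib; BLAKE2b-256 corresponds to hashlib.blake2b with a 32-byte digest. The helpers file_digests and verify are hypothetical, and the usage lines assume the wheel was saved to the current directory under its published file name.

```python
import hashlib
import os


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file, reading it in chunks."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def verify(path: str, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return file_digests(path)["SHA256"] == expected_sha256.lower()


# Assumed location: the wheel downloaded into the current directory.
WHEEL = "torcheval_nightly-2024.7.31-py3-none-any.whl"
EXPECTED_SHA256 = "645726fdd00fcf769894275e90412e685ea6ba72d3c4efe3ebbcbccd47abf1ed"
if os.path.exists(WHEEL):
    print("SHA256 matches:", verify(WHEEL, EXPECTED_SHA256))
```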