Pooch: A friend to fetch your data files

Documentation (latest) • Documentation (main branch) • Contributing • Contact

Part of the Fatiando a Terra project

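About

Just want to download a file without messing with requests and urllib? Trying to add sample datasets to your Python package? Pooch is here to help!

Pooch is a Python library that manages data by downloading files from a server (only when needed) and storing them in a local data cache (a folder on your computer).

  • Pure Python and minimal dependencies.
  • Download files over HTTP, FTP, and from data repositories such as Zenodo and figshare.
  • Built-in post-processors to unzip/decompress the data after download.
  • Designed to be extended: create custom downloaders and post-processors.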
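
As an illustration of the last point, Pooch's documentation describes a post-processor as a callable that receives the downloaded file name, an action string ("download", "update", or "fetch"), and the Pooch instance, and returns the file name to hand back to the caller. The following is a minimal sketch under that assumption. `decompress_xz` is a hypothetical name, and the built-in processors are likely the better choice for real use.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(fname, action, pooch_instance):
    """
    Hypothetical post-processor: decompress an .xz file next to the original.

    Assumes the (fname, action, pooch) call signature described in the
    Pooch documentation for custom processors.
    """
    source = Path(fname)
    target = source.with_name(source.name + ".decomp")
    # Redo the work when the file was (re)downloaded or the output is
    # missing; a plain "fetch" of an existing file reuses the result.
    if action in ("download", "update") or not target.exists():
        with lzma.open(source, "rb") as compressed, open(target, "wb") as out:
            shutil.copyfileobj(compressed, out)
    return str(target)
```

A function like this would be passed as the `processor` argument of `pooch.retrieve` or of a Pooch instance's `fetch` method.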

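Are you a scientist or researcher? Pooch can help you too!

  • Host your data on a repository and download it using the DOI.
  • Automatically download data using code instead of telling colleagues to do it themselves.
  • Make sure everyone running the code has the same version of the data files.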
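
Pinning a data file to one version relies on a checksum: the examples in this README pass a `known_hash` string of the form `algorithm:hexdigest`. As a rough sketch of how such a string could be produced with only the standard library (`hash_string` is a hypothetical helper, not part of Pooch):

```python
import hashlib


def hash_string(path, alg="sha256", chunk_size=65536):
    """Return 'alg:hexdigest' for the file at *path*, reading it in chunks."""
    hasher = hashlib.new(alg)
    with open(path, "rb") as data_file:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: data_file.read(chunk_size), b""):
            hasher.update(chunk)
    return f"{alg}:{hasher.hexdigest()}"
```

Anyone whose copy of the file produces a different string has a different (corrupted or outdated) version of the data.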

Projects using Pooch

SciPy, scikit-image, xarray, Ensaio, GemPy, MetPy, napari, Satpy, yt, PyVista, icepack, histolab, seaborn-image, Open AR-Sandbox, climlab, mne-python, GemGIS, SHTOOLS, MOABB, GeoViews, ScopeSim, Brainrender, pyxem, cellfinder, PVGeo, geosnap, BioCypher, cf-xarray, Scirpy, rembg, DASCore, scikit-mobility, Py-ART, HyperSpy, RosettaSciIO, eXSpy

If you're using Pooch, please submit a pull request adding your project to the list.

Example

For a scientist downloading a data file for analysis:

import pooch
import pandas as pd

# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
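
The behavior described in the comments above (download once, then check the hash on later runs) can be pictured with a small standard-library sketch. This is not Pooch's implementation; `fetch_cached`, `sha256_of`, and the `download` callable are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA256 hex digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch_cached(name, cache_dir, known_sha256, download):
    """
    Return the path to *name* inside *cache_dir*, downloading only if needed.

    *download* is any callable that takes the output path and writes the
    file there (a stand-in for an HTTP, FTP, or DOI downloader).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    # Download when the file is missing or its hash no longer matches.
    if not target.exists() or sha256_of(target) != known_sha256:
        download(target)
        if sha256_of(target) != known_sha256:
            raise ValueError(f"Hash of downloaded file '{name}' does not match")
    return str(target)
```

Calling `fetch_cached` twice with the same arguments triggers the `download` callable only once, because the second call finds a cached file with the expected hash.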

For package developers including sample data in their projects:

"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version

# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use PEP440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data).
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    pkg_resources.resource_stream("mypackage", "registry.txt")
)

# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
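
The registry file mentioned in the comments above holds one file name and its SHA256 hash per line, separated by a space. A minimal sketch of writing such a file for a folder of data files follows. `write_registry` is a hypothetical helper shown only to illustrate the format; Pooch's documentation also describes a `pooch.make_registry` function for this purpose.

```python
import hashlib
from pathlib import Path


def write_registry(data_dir, output):
    """Write '<relative name> <sha256>' lines for every file in *data_dir*."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Use forward slashes so names match the URL paths under base_url.
        lines.append(f"{path.relative_to(data_dir).as_posix()} {digest}")
    Path(output).write_text("\n".join(lines) + "\n")
    return lines
```

The output file should live outside `data_dir` (for example, inside the package source tree) so that it does not end up listing itself.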

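Getting involved

🗨️ Contact us: Find out more about how to reach us at fatiando.org/contact.

👩🏾‍💻 Contributing to project development: Please read our Contributing Guide to see how you can help and give feedback.

🧑🏾‍🤝‍🧑🏼 Code of conduct: This project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.

Imposter syndrome disclaimer: We want your help. No, really. There may be a little voice inside your head telling you that you're not ready, that you aren't skilled enough to contribute. We assure you that the little voice in your head is wrong. Most importantly, there are many valuable ways to contribute besides writing code.

This disclaimer was adapted from the MetPy project.

License

This is free software: you can redistribute it and/or modify it under the terms of the BSD 3-clause License. A copy of this license is provided in LICENSE.txt.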
