Package for extracting software repository metadata

Scraper

Scraper is a tool for scraping and visualizing open source data from various code hosting platforms, such as GitHub.com, GitHub Enterprise, GitLab.com, hosted GitLab, and Bitbucket Server.

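Getting Started: Code.gov

Code.gov is a recently launched US federal government website that gives the public access to metadata about the government's custom-developed software. The site requires metadata to function, and this Python library can help with that!

To get started, you will need a GitHub Personal Auth Token to make requests to the GitHub API. Set it as GITHUB_API_TOKEN in your environment or in your shell rc file: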

    $ export GITHUB_API_TOKEN=XYZ

    $ echo "export GITHUB_API_TOKEN=XYZ" >> ~/.bashrc
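Before a long run it can help to confirm that the token is actually visible to Python. The following is a minimal sketch, not part of Scraper itself: the helper name get_github_token is hypothetical, and the only detail taken from this README is the variable name GITHUB_API_TOKEN.

```python
import os


def get_github_token(env=None):
    """Return the GitHub token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. This helper is illustrative and not part of Scraper.
    """
    env = os.environ if env is None else env
    token = env.get("GITHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_API_TOKEN is not set; export it before running scraper"
        )
    return token


if __name__ == "__main__":
    try:
        get_github_token()
        print("GITHUB_API_TOKEN is set")
    except RuntimeError as err:
        print(err)
```

The token value is never printed, so the check is safe to run in shared terminals or CI logs.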

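Additionally, to perform the labor hours estimation, you will need to install cloc into your environment. This is typically done with a package manager such as npm or homebrew.

Then, to generate a code.json file for your agency, you will need a config.json file to coordinate the platforms you will connect to and scrape data from. An example config file can be found in demo.json. Once you have your config file, you are ready to install and run the scraper!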
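A quick way to check whether cloc is reachable is to look it up on the PATH. This is a small sketch using only the Python standard library; the helper name find_tool is hypothetical, and how Scraper itself locates cloc is not described here.

```python
import shutil


def find_tool(name="cloc"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("cloc not found on PATH; install it to enable labor hours estimation")
    else:
        print(f"cloc found at {path}")
```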

    # Install Scraper from a local copy of this repository
    $ pip install -e .
    # OR
    # Install Scraper from PyPI
    $ pip install llnl-scraper

    # Run Scraper with your config file ``config.json``
    $ scraper --config config.json

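A full example of the resulting code.json file can be found here.

Config File Options

The configuration file is a JSON file that specifies which repository platforms to pull projects from, as well as settings that can override incomplete or inaccurate data returned by the scraping.

The basic structure is: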

{
    // REQUIRED
    "contact_email": "...",  // Used when the contact email cannot be found otherwise

    // OPTIONAL
    "agency": "...",         // Your agency abbreviation here
    "organization": "...",   // The organization within the agency
    "permissions": { ... },  // Object containing default values for usageType and exemptionText

    // Platform configurations, described in more detail below
    "GitHub": [ ... ],
    "GitLab": [ ... ],
    "Bitbucket": [ ... ],
}

"GitHub": [
    {
        "url": "https://github.com",  // GitHub.com or GitHub Enterprise URL to inventory
        "token": null,                // Private token for accessing this GitHub instance
        "public_only": true,          // Only inventory public repositories

        "connect_timeout": 4,  // The timeout in seconds for connecting to the server
        "read_timeout": 10,    // The timeout in seconds to wait for a response from the server

        "orgs": [ ... ],    // List of organizations to inventory
        "repos": [ ... ],   // List of single repositories to inventory
        "exclude": [ ... ]  // List of organizations / repositories to exclude from inventory
    }
],

"GitLab": [
    {
        "url": "https://gitlab.com",  // GitLab.com or hosted GitLab instance URL to inventory
        "token": null,                // Private token for accessing this GitHub instance
        "fetch_languages": false,     // Include individual calls to API for language metadata. Very slow, so defaults to false. (eg, for 191 projects on internal server, 5 seconds for False, 12 minutes, 38 seconds for True)

        "orgs": [ ... ],    // List of organizations to inventory
        "repos": [ ... ],   // List of single repositories to inventory
        "exclude": [ ... ]  // List of groups / repositories to exclude from inventory
    }
]

"Bitbucket": [
    {
        "url": "https://bitbucket.internal",  // Base URL for a Bitbucket Server instance
        "username": "",                       // Username to authenticate with
        "password": "",                       // Password to authenticate with
        "token": "",                          // Token to authenticate with, if supplied username and password are ignored

        "exclude": [ ... ]  // List of projects / repositories to exclude from inventory
    }
]

"TFS": [
    {
        "url": "https://tfs.internal",  // Base URL for a Team Foundation Server (TFS) or Visual Studio Team Services (VSTS) or Azure DevOps instance
        "token": null,                  // Private token for accessing this TFS instance

        "exclude": [ ... ]  // List of projects / repositories to exclude from inventory
    }
]
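The options above can also be assembled programmatically, which avoids hand-editing mistakes. The sketch below builds a configuration with a single GitHub entry and serializes it with Python's json module. The helper names build_config and write_config, the example email address, and the example organization name are all hypothetical; the key names and default values are copied from the listings above. Whether Scraper accepts empty "repos" and "exclude" lists is an assumption. Note that the // comments in the listings are annotations only: standard JSON does not allow comments, so the generated file contains none.

```python
import json


def build_config(contact_email, agency=None, organization=None, github_orgs=()):
    """Build a config dict using the keys documented in this README.

    Illustrative helper, not part of Scraper. Only `contact_email`
    is marked REQUIRED in the basic structure above.
    """
    if not contact_email:
        raise ValueError("contact_email is required")

    config = {"contact_email": contact_email}
    if agency:
        config["agency"] = agency
    if organization:
        config["organization"] = organization

    if github_orgs:
        config["GitHub"] = [
            {
                "url": "https://github.com",
                "token": None,        # serialized as JSON null
                "public_only": True,
                "connect_timeout": 4,
                "read_timeout": 10,
                "orgs": list(github_orgs),
                "repos": [],          # assumption: empty lists are accepted
                "exclude": [],
            }
        ]
    return config


def write_config(config, path="config.json"):
    """Write the config as plain JSON (no comments)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=4)
        fh.write("\n")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cfg = build_config(
        "opensource@example.gov",
        agency="EXAMPLE",
        github_orgs=["example-org"],
    )
    print(json.dumps(cfg, indent=4))
```

Calling write_config(cfg) would then produce a config.json that can be passed to scraper --config config.json.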

License

Scraper is released under the MIT license. For more details, see the LICENSE file.

LLNL-CODE-705597

Release history

Version	Release date
0.14.0 (this version)	November 30, 2023
0.13.0	September 7, 2023
0.12.0	September 29, 2022
0.11.0	July 5, 2022
0.10.0	February 8, 2021
0.9.0	July 14, 2020
0.8.1	August 1, 2019
0.7.0	March 19, 2019
0.6.1	February 6, 2019
0.6.0.dev0 (pre-release)	February 6, 2019
0.5.1	August 30, 2018
0.5.0	August 28, 2018
0.4.0	August 26, 2018
0.3.0	August 22, 2018
0.2.1	March 23, 2018

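Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llnl-scraper-0.14.0.tar.gz (27.8 kB)

Uploaded November 30, 2023 (Source)

Built Distribution

llnl_scraper-0.14.0-py3-none-any.whl (32.2 kB)

Uploaded November 30, 2023 (Python 3)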

Hashes for llnl-scraper-0.14.0.tar.gz

Algorithm	Hash digest
SHA256	`881fbe04c0f0df3dfe6a887413bfd126921f2ec3344f5d9e797629be0aaab60d`
MD5	`99ea32f736954c72b620c2ad007bc3b8`
BLAKE2b-256	`a1a9d32afd4ad6c1ca185856ab62c421e6920c8d9555c349765ea470060220ec`

Hashes for llnl_scraper-0.14.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`015e080d24888ef2d48aa9f4602bed866e373adc59276b42bdb1b59ed6d9ad2a`
MD5	`86edefd15a1f1ddbb40a3518831c0dc8`
BLAKE2b-256	`1ff77928878103c1a03c4be1d850f98645de9413a97c927f10ddbb18cfbc2fb7`
