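Documentation builder tool

Project description

doc-builder

This is the package we use to build the documentation of our Hugging Face repos.

Installation

You can install it from PyPI with: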

pip install hf-doc-builder

To install from source, clone this repository and then run:

cd doc-builder
pip install -e .

Previewing

To preview the docs, use the following command:

doc-builder preview {package_name} {path_to_docs}

For example:

doc-builder preview datasets ~/Desktop/datasets/docs/source/

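Note: the preview command only works with existing doc files. When you add a completely new file, you need to update _toctree.yml and restart the preview command (press ctrl-c to stop it, then call doc-builder preview ... again).

Note: the preview command does not work on Windows.

Doc building

To build the documentation of a given package, use the following command: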

#Add --not_python_module if not building doc for a python lib
doc-builder build {package_name} {path_to_docs} --build_dir {build_dir}

For example, here is how you would build the Datasets documentation (requires pip install datasets[dev]) if you have cloned the repo in ~/git/datasets:

doc-builder build datasets ~/git/datasets/docs/source --build_dir ~/tmp/test-build

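This generates MDX files that you can preview like any Markdown file in your favorite editor. To view the documentation as HTML, you need node version 14 or higher installed. Then you can run (still using Datasets as the example):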

doc-builder build datasets ~/git/datasets/docs/source --build_dir ~/tmp/test-build --html

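This builds the HTML files in ~/tmp/test-build. You can then inspect those files in your browser.

doc-builder can also automatically convert some of the documentation guides or tutorials into notebooks. This requires two steps:

1. Add [[open-in-colab]] to the tutorial for which you want to build a notebook.
2. Add --notebook_dir {path_to_notebook_folder} to the build command.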
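
As a minimal sketch, the build invocations above can be assembled programmatically. The helper name `build_command` and its keyword arguments are hypothetical; only the flags shown in this README (`--build_dir`, `--html`, `--notebook_dir`, `--not_python_module`) come from doc-builder. The function only builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_command(
    package_name: str,
    path_to_docs: str,
    build_dir: str,
    html: bool = False,
    notebook_dir: Optional[str] = None,
    not_python_module: bool = False,
) -> List[str]:
    """Assemble a `doc-builder build` argument list (hypothetical helper)."""
    cmd = ["doc-builder", "build", package_name, path_to_docs, "--build_dir", build_dir]
    if html:
        cmd.append("--html")
    if notebook_dir is not None:
        cmd += ["--notebook_dir", notebook_dir]
    if not_python_module:
        cmd.append("--not_python_module")
    return cmd


cmd = build_command("datasets", "~/git/datasets/docs/source", "~/tmp/test-build", html=True)
print(" ".join(cmd))
```

If you pass such a list to `subprocess.run(cmd, check=True)`, no shell is involved, so paths containing `~` would need `os.path.expanduser` first.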

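Writing in notebooks

You can write your docs in Jupyter notebooks and use doc-builder to convert them to mdx files.

In some situations, such as courses and tutorials, it can make more sense to write in Jupyter notebooks and use the doc-builder converter than to write mdx files directly.

The process is:

1. Enable the flag convert_notebooks: true in your build_main_documentation.yml and build_pr_documentation.yml.
2. With this flag enabled, doc-builder converts all .ipynb files under path_to_docs to mdx files.

You can also convert .ipynb files to mdx files locally: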

doc-builder notebook-to-mdx {path to notebook file or folder containing notebook files}

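GitHub Actions templates

doc-builder provides templates for GitHub Actions, so you can build your documentation on every pull request, on every push to a given branch, and so on. To use them in your project, create the following files in the .github/workflows/ directory:

build_main_documentation.yml: responsible for building the docs for the main branch, releases, etc.
build_pr_documentation.yml: responsible for building the docs on each pull request.
upload_pr_documentation.yml: responsible for uploading the PR artifacts to the Hugging Face Hub.
delete_doc_comment_trigger.yml: responsible for removing the comments that the HuggingFaceDocBuilder bot posts with the URL of the PR documentation.

Within each workflow, the main thing to include is a pointer from the uses field to the corresponding workflow in doc-builder. For example, this is what the PR workflow looks like in the datasets library: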

name: Build PR Documentation

on:
  pull_request:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main # Runs this doc-builder workflow
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
      package: datasets # Replace this with your package name

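Note the use of special arguments such as pr_number and package under the with field. You can find the available options by inspecting each of the doc-builder workflow files.

Enabling multilingual documentation

doc-builder can also convert documentation that has been translated from the English source into one or more languages. To enable the conversion, structure the documentation directories as follows: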

doc_folder
├── en
│   ├── _toctree.yml
│   ├── _redirects.yml
│   ...
└── es
    ├── _toctree.yml
    ├── _redirects.yml
    ...

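Note that each language directory has its own table of contents file _toctree.yml and that all languages sit under a single doc_folder directory; see the course repo for an example. You can then build an individual language subset as follows: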

doc-builder build {package_name} {path_to_docs} --build_dir {build_dir} --language {lang_id}

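To automatically build the documentation for all languages via the GitHub Actions templates, provide the languages argument to your workflow as a space-separated list of the languages you wish to build, e.g. languages: en es.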
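
The layout requirement can be checked before a build with a short script. This is only a sketch: `find_languages` is a hypothetical helper, not part of doc-builder, and it assumes that a language counts as buildable when its directory contains a _toctree.yml.

```python
import tempfile
from pathlib import Path
from typing import List


def find_languages(doc_folder: Path) -> List[str]:
    """Return the sorted names of subdirectories that contain a _toctree.yml."""
    return sorted(
        child.name
        for child in doc_folder.iterdir()
        if child.is_dir() and (child / "_toctree.yml").is_file()
    )


# Build a throwaway doc_folder with the layout shown above, plus one incomplete language.
with tempfile.TemporaryDirectory() as tmp:
    doc_folder = Path(tmp)
    for lang in ("en", "es"):
        (doc_folder / lang).mkdir()
        (doc_folder / lang / "_toctree.yml").write_text("- sections: []\n")
    (doc_folder / "fr").mkdir()  # no _toctree.yml, so it is skipped

    languages = find_languages(doc_folder)

# Value for the `languages` workflow argument: a space-separated list.
languages_arg = " ".join(languages)
print(languages_arg)
```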

Redirects

You can optionally provide a _redirects.yml file for "old links". The yml file should look like this:

how_to: getting_started
package_reference/classes: package_reference/main_classes
# old_local: new_local
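
To make the old_local: new_local mapping concrete, here is a small sketch of a lookup over the same entries. The `resolve` helper is hypothetical, and how doc-builder itself applies redirects is not described in this README; the mapping is written as a dict to avoid depending on a YAML parser.

```python
from typing import Dict

# Same entries as the _redirects.yml example above.
REDIRECTS: Dict[str, str] = {
    "how_to": "getting_started",
    "package_reference/classes": "package_reference/main_classes",
}


def resolve(local: str, redirects: Dict[str, str]) -> str:
    """Return the new location for an old link, or the input if no redirect exists."""
    return redirects.get(local, local)


print(resolve("how_to", REDIRECTS))        # getting_started
print(resolve("installation", REDIRECTS))  # installation
```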

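Fixing and testing doc-builder

If you are working on a fix or an update of the doc-builder tool itself, you will eventually want to test it in the CI of another repository (transformers, diffusers, courses, etc.). To do so, set the doc_builder_revision argument in your workflow file to point to your branch. Here is an example from the transformers.js project: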

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@my-test-branch
    with:
      repo_owner: xenova
      commit_sha: ${{ github.sha }}
      pr_number: ${{ github.event.number }}
      package: transformers.js
      path_to_docs: transformers.js/docs/source
      pre_command: cd transformers.js && npm install && npm run docs-api
      additional_args: --not_python_module
      doc_builder_revision: my-test-branch # <- add this line

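Once the docs build has completed in your project, you can drop that change.

Writing documentation for Hugging Face libraries

doc-builder expects Markdown, so you should write any new documentation in ".mdx" files for tutorials, guides, and API documentation. For docstrings, we follow the Google format, with the main difference that you should use Markdown instead of reStructuredText (hopefully, that will be easier!).

Values that should be put in code should be surrounded by backticks: `like so`. Note that argument names and objects such as True, None, or any strings should usually be put in code.

Multi-line code blocks are useful for displaying examples. As usual in Markdown, they are written between two lines of three backticks.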

```
# first line of code
# second line
# etc
```

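We follow the doctest syntax for the examples so that we can automatically test that the results stay consistent with the library.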
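
As an illustration of the doctest syntax, the docstring below contains one example that can be checked automatically. The function `add` is made up for this sketch, and the runner code uses only Python's standard doctest module, not doc-builder.

```python
import doctest


def add(a: int, b: int) -> int:
    """
    Add two integers.

    Example:

    >>> add(2, 3)
    5
    """
    return a + b


# Collect and run the examples found in the docstring of `add`.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, name="add", globs={"add": add}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results)
```

If the expected output in the docstring drifted from what the function returns, `results.failed` would be non-zero.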

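Internal link to object

Syntax

[`XXXClass`] or [`~XXXClass`] // for class
[`XXXClass.method`] or [`~XXXClass.method`] // for method

Examples: [here](https://github.com/huggingface/transformers/blob/eb849f6604c7dcc0e96d68f4851e52e253b9f0e5/docs/source/en/model_doc/sew-d.md?plain=1#L39) & [here](https://github.com/huggingface/transformers/blob/6f79d264422245d88c7a34032c1a8254a0c65752/examples/research_projects/performer/modeling_flax_performer.py#L48) (as used inside a docstring).

When mentioning a class, function, or method, it is recommended to use the following syntax for internal links so that our tool automatically adds a link to its documentation: [`XXXClass`] or [`function`]. This requires the class or function to be in the main package.

If you want to create a link to some internal class or function, you need to provide its path. For instance, in the Transformers documentation, [`file_utils.ModelOutput`] creates a link to the documentation of ModelOutput. This link is described as `file_utils.ModelOutput`. To drop the path and keep only the name of the linked object in the description, add a ~: [`~file_utils.ModelOutput`] generates a link described as `ModelOutput`.

The same works for methods: you can use either [`XXXClass.method`] or [`~XXXClass.method`].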
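
The effect of the ~ prefix on the link description can be sketched in a few lines. This is an illustration of the rule stated above, not doc-builder's actual resolver (which also has to locate the target page); `LINK_RE` and `link_description` are hypothetical names.

```python
import re

# Matches the bracket-and-backtick link syntax, e.g. [`~file_utils.ModelOutput`].
LINK_RE = re.compile(r"\[`(~?)([\w.]+)`\]")


def link_description(markup: str) -> str:
    """Return the text a link would be described by, per the rule above."""
    match = LINK_RE.fullmatch(markup)
    if match is None:
        raise ValueError(f"not an object link: {markup!r}")
    tilde, path = match.groups()
    # With a leading ~, only the last component of the path is kept.
    return path.rsplit(".", 1)[-1] if tilde else path


print(link_description("[`file_utils.ModelOutput`]"))   # file_utils.ModelOutput
print(link_description("[`~file_utils.ModelOutput`]"))  # ModelOutput
```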

External link to object

Syntax

[`XXXLibrary.XXXClass`] or [`~XXXLibrary.XXXClass`] // for class
[`XXXLibrary.XXXClass.method`] or [`~XXXLibrary.XXXClass.method`] // for method

Example: [here](https://github.com/huggingface/transformers/blob/0f0e1a2c2bff68541a5b9770d78e0fb6feb7de72/docs/source/en/accelerate.md?plain=1#L29) linking to an `accelerate` object from inside transformers.

Tip

To write a block that you would like highlighted as a note or warning, place your content between the following markers.

Syntax

> [!TIP]
> Here is a tip. Go to this url [website](www.tip.com)
> 
> Second line

or

<Tip>

Write your note here

</Tip>

Example: [here](https://github.com/huggingface/transformers/blob/0f0e1a2c2bff68541a5b9770d78e0fb6feb7de72/docs/source/en/create_a_model.md#L282-L286).

For warnings, change the introduction to:

Syntax

> [!WARNING]

or

`<Tip warning={true}>`

Example: [here](https://github.com/huggingface/transformers/blob/eb849f6604c7dcc0e96d68f4851e52e253b9f0e5/docs/source/de/autoclass_tutorial.md#L102-L108).

Framework content

If your documentation has a block that is framework-dependent (PyTorch vs TensorFlow vs Flax), you can use the following syntax:

Syntax

<frameworkcontent>
<pt>
PyTorch content goes here
</pt>
<tf>
TensorFlow content goes here
</tf>
<flax>
Flax content goes here
</flax>
</frameworkcontent>

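Example: [here](https://github.com/huggingface/transformers/blob/eb849f6604c7dcc0e96d68f4851e52e253b9f0e5/docs/source/de/autoclass_tutorial.md#L84-L131).

Note: all frameworks are optional (for instance, you can write a PyTorch-only block) and the order does not matter.

Options

Show alternatives (for instance, code blocks for different versions of a library) in a way that lets the user select an option and see only the selected content.

Syntax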

<hfoptions id="some id">
<hfoption id="id for option 1">
{YOUR MARKDOWN}
</hfoption>
<hfoption id="id for option 2">
{YOUR MARKDOWN}
</hfoption>
... however many <hfoption> tags
</hfoptions>

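Example: [here](https://github.com/huggingface/diffusers/blob/75ea54a1512ac443d517ab35cb9bf45f8d6f326e/docs/source/en/using-diffusers/kandinsky.md?plain=1#L30-L81).

Note: for multiple <hfoptions id="some id"> blocks on the same page, consider using the same id so that when a user selects one option, the selection applies to all the other hfoptions blocks. If you do not want this behavior, use different ids.

Anchor link

Anchor links for Markdown headings are generated automatically, with the following rules: 1. lowercase, 2. replace spaces with a dash -, 3. strip [^a-z0-9-].

Syntax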

## My awesome section
// the anchor link is: `my-awesome-section`

Example: here

You can also define a custom anchor link.

Syntax

## My awesome section[[some-section]]
// the anchor link is: `some-section`

Example: here
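
The three anchor rules and the [[custom-id]] override are simple enough to sketch directly. The function below is an illustration written from the rules stated in this section, not doc-builder's own implementation; `heading_anchor` is a hypothetical name.

```python
import re
from typing import Tuple

# A trailing [[custom-id]] on a heading overrides the generated anchor.
CUSTOM_ANCHOR_RE = re.compile(r"^(.*?)\[\[(.+?)\]\]\s*$")


def heading_anchor(heading: str) -> Tuple[str, str]:
    """Return (title, anchor) for a Markdown heading line such as '## My section'."""
    title = heading.lstrip("#").strip()
    custom = CUSTOM_ANCHOR_RE.match(title)
    if custom:
        return custom.group(1).strip(), custom.group(2)
    anchor = title.lower()                      # 1. lowercase
    anchor = anchor.replace(" ", "-")           # 2. spaces become dashes
    anchor = re.sub(r"[^a-z0-9-]", "", anchor)  # 3. strip everything else
    return title, anchor


print(heading_anchor("## My awesome section"))
print(heading_anchor("## My awesome section[[some-section]]"))
```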

LaTeX

LaTeX display mode: $$...$$

Syntax

$$Y = X * \textbf{dequantize}(W); \text{quantize}(W)$$

Example: here

LaTeX inline mode: \\( ... )\\

Syntax

\\( Y = X * \textbf{dequantize}(W); \text{quantize}(W) )\\

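Example: here

Code blocks

Code blocks are written with the regular Markdown syntax ```. However, there is a special flag you can put in your mdx files to change the wrapping style of the resulting HTML from overflow/scrollbar to wrap.

Syntax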

<!-- WRAP CODE BLOCKS -->

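Example: here

Writing API documentation (Python)

Autodoc

To show the full documentation of any object of the Python library you are documenting, use the [[autodoc]] marker.

Syntax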

[[autodoc]] SomeObject

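Example: here

If the object is a class, this includes every public method of it that is documented. If for some reason you do not want a method to be displayed in the documentation, you can specify which methods should appear in the docs. Here is an example:

Syntax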

[[autodoc]] XXXTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

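Example: here

If you just want to add a method that is not documented (for instance, magic methods such as __call__ are not documented by default), you can put the methods to add in a list that contains all:

Syntax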

## XXXTokenizer

[[autodoc]] XXXTokenizer
    - all
    - __call__

Example: here
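
The three selection behaviors described above (no list, an explicit list, and a list containing all) can be modeled as follows. This is a sketch under stated assumptions: `select_methods` is a hypothetical helper, "public and documented" is taken to mean "no leading underscore and a non-empty docstring", and `XXXTokenizer` is a stand-in class defined only for the demonstration.

```python
import inspect
from typing import List


def select_methods(cls: type, listed: List[str]) -> List[str]:
    """Pick the method names to document, following the rules described above."""
    # Assumption: public and documented = no leading underscore, non-empty docstring.
    documented_public = [
        name
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_") and member.__doc__
    ]
    if not listed:
        return documented_public
    if "all" in listed:
        extras = [name for name in listed if name != "all" and name not in documented_public]
        return documented_public + extras
    return list(listed)


class XXXTokenizer:
    """Stand-in class, defined here only for the demonstration."""

    def __call__(self, text):
        return text.split()

    def save_vocabulary(self, path):
        """Save the vocabulary to `path`."""

    def get_special_tokens_mask(self, ids):
        """Return a mask marking special tokens."""

    def _helper(self):
        """Private, so never picked up implicitly."""


print(select_methods(XXXTokenizer, []))
print(select_methods(XXXTokenizer, ["save_vocabulary"]))
print(select_methods(XXXTokenizer, ["all", "__call__"]))
```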

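Code blocks from file references

You can create a code block by referencing a file excerpt with the <literalinclude> (Sphinx-inspired) syntax. There should be JSON between the opening and closing <literalinclude> tags.

Syntax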

<literalinclude>
{"path": "./data/convert_literalinclude_dummy.txt", # relative path
"language": "python", # defaults to "" (empty str)
"start-after": "START python_import",  # defaults to start of file
"end-before": "END python_import",  # defaults to end of file
"dedent": 7 # defaults to 0
}
</literalinclude>
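
The options above suggest how an excerpt is selected. The sketch below shows one plausible reading, assuming Sphinx-like semantics: the excerpt starts on the line after the first line containing the start-after marker, ends on the line before the first later line containing the end-before marker, and dedent removes that many leading characters from each line. `extract_excerpt` is a hypothetical helper, not doc-builder's implementation.

```python
from typing import List, Optional


def extract_excerpt(
    text: str,
    start_after: Optional[str] = None,
    end_before: Optional[str] = None,
    dedent: int = 0,
) -> str:
    """Return the lines between two marker lines, with `dedent` leading characters removed."""
    lines: List[str] = text.splitlines()
    start = 0
    if start_after is not None:
        # First line containing the marker; the excerpt begins on the next line.
        start = next(i for i, line in enumerate(lines) if start_after in line) + 1
    end = len(lines)
    if end_before is not None:
        # First line at or after `start` containing the marker; it is excluded.
        end = next(i for i in range(start, len(lines)) if end_before in lines[i])
    return "\n".join(line[dedent:] for line in lines[start:end])


SOURCE = """\
    # START python_import
    import numpy as np
    import scipy as sp
    # END python_import
    x = np.zeros(3)
"""

excerpt = extract_excerpt(SOURCE, "START python_import", "END python_import", dedent=4)
print(excerpt)
```

A missing marker raises `StopIteration` in this sketch; a real tool would likely report a clearer error.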

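Writing source documentation

Description

For the description string of a class or function, use Markdown with all the custom syntax described above.

Example: here

Arguments

Arguments of a function/class/method should be defined with the Args: (or Arguments: or Parameters:) prefix, followed by a line return and an indentation. The argument should be followed by its type, with its shape if it is a tensor, then a colon and its description.

Syntax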

    Args:
        n_layers (`int`): The number of layers of the model.

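Example: here

If the description is too long to fit on one line, add another level of indentation after the argument before writing the description.

Syntax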

    Args:
        input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`AlbertTokenizer`]. See [`~PreTrainedTokenizer.encode`] and
            [`~PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)

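Example: here

You can see the full example here.

Attributes

If a class is similar to a dataclass but its parameters do not match the available attributes of the class, as in the example below, the Attributes heading should be rewritten as **Attributes** so that the documentation renders these properly. Otherwise, doc-builder assumes that Attributes is synonymous with Parameters.

Syntax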

  class SomeClass:
      """
      Docstring
-     Attributes:
+     **Attributes**:
          - **attr_a** (`type_a`) -- Doc a
          - **attr_b** (`type_b`) -- Doc b
      """
      def __init__(self, param_a, param_b):
          ...

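Parameter typing and default value

For optional arguments or arguments with defaults, we follow the syntax below. Imagine we have a function with the following signature:

def my_function(x: str = None, a: float = 1):

Then its documentation should look like this:

Syntax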

    Args:
        x (`str`, *optional*):
            This argument controls ...
        a (`float`, *optional*, defaults to 1):
            This argument is used to ...

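Example: here

Note that we always omit the "defaults to `None`" when `None` is the default for any argument. Also note that even if the first line describing your argument type and its default gets long, you cannot break it across several lines. You can, however, write as many lines as you want in the indented description (see the input_ids example above).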
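
These conventions are mechanical enough to derive from a signature. The sketch below generates the first line of each Args: entry with Python's standard inspect module; `arg_headers` is a hypothetical helper (not part of doc-builder), and it assumes every parameter carries a plain class annotation.

```python
import inspect
from typing import List


def arg_headers(func) -> List[str]:
    """Build the first line of each `Args:` entry from a function signature."""
    headers = []
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        type_name = getattr(annotation, "__name__", str(annotation))
        parts = [f"`{type_name}`"]
        if param.default is not inspect.Parameter.empty:
            parts.append("*optional*")
            # A default of None is never spelled out.
            if param.default is not None:
                parts.append(f"defaults to {param.default!r}")
        headers.append(f"{name} ({', '.join(parts)}):")
    return headers


def my_function(x: str = None, a: float = 1):
    ...


for header in arg_headers(my_function):
    print(header)
```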

If the type of your argument is a class defined in the package, you can use the syntax we saw earlier to link to its documentation:

    Args:
        config ([`BertConfig`]):
            Model configuration class with all the parameters of the model.

            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.

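Returns

The return block should be introduced with the Returns: prefix, followed by a line return and an indentation. The first line should be the type of the return, followed by a line return. There is no need to indent further for the elements building the return.

Here is an example of a single-value return:

Syntax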

    Returns:
        `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.

Example: [here](https://github.com/huggingface/transformers/blob/910faa3e1f1c566b23a0318f78f5caf5bda8d3b2/examples/flax/language-modeling/run_t5_mlm_flax.py#L273-L275)

Here is an example of a tuple return, comprising several objects:

Syntax

    Returns:
        `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
        - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
          Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
        - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
          Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

Example: [here](https://github.com/huggingface/transformers/blob/003a0cf8cc4d78e47ef9debfb1e93a5c1197ca9a/examples/research_projects/bert-loses-patience/pabee/modeling_pabee_albert.py#L107-L130)

Yields

Similarly, Yields is also supported.

Syntax

Yields:
    `tuple[str, io.BufferedReader]`:
        2-tuple (path_within_archive, file_object).
        File object is opened in binary mode.

Example: [here](https://github.com/huggingface/datasets/blob/f56fd9d6c877ffa6fb44fb832c13b61227c9cc5b/src/datasets/download/download_manager.py#L459-L462C17)

Raises

You can also document Raises.

Syntax

    Args:
        config ([`BertConfig`]):
            Model configuration class with all the parameters of the model.

            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.

    Raises:
        `pa.ArrowInvalidError`: if the arrow data casting fails
        TypeError: if the target type is not supported according, e.g.
            - point1
            - point2
        [`HTTPError`](https://2.python-requests.org/en/master/api/#requests.HTTPError) if credentials are invalid
        [`HTTPError`](https://2.python-requests.org/en/master/api/#requests.HTTPError) if connection got lost

    Returns:
        `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.

Example: [here](https://github.com/huggingface/transformers/blob/1b2381c46b834a89e447f7a01f0961c4e940d117/src/transformers/models/mask2former/image_processing_mask2former.py#L167-L168)
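
Putting the Args, Raises, and Returns conventions together, a complete docstring might look like the one below. The function `mark_special_tokens` and its behavior are invented for this sketch; it does not come from any Hugging Face library.

```python
from typing import List


def mark_special_tokens(token_ids: List[int], special_ids: List[int] = None) -> List[int]:
    """
    Mark which tokens in a sequence are special tokens.

    Args:
        token_ids (`List[int]`):
            Indices of the tokens in the sequence.
        special_ids (`List[int]`, *optional*):
            Indices that count as special tokens. When omitted, no token is treated as special.

    Raises:
        TypeError: if `token_ids` is not a list.

    Returns:
        `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
    """
    if not isinstance(token_ids, list):
        raise TypeError("token_ids must be a list")
    special = set(special_ids or [])
    return [1 if token_id in special else 0 for token_id in token_ids]


print(mark_special_tokens([101, 7592, 102], special_ids=[101, 102]))  # [1, 0, 1]
```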

Directives for Added, Changed, and Deprecated

There are Added, Changed, and Deprecated directives.

Syntax

    Args:
        cache_dir (`str`, *optional*): Directory to cache data.
        config_name (`str`, *optional*): Name of the dataset configuration.
            It affects the data generated on disk: different configurations will have their own subdirectories and
            versions.
            If not provided, the default configuration is used (if it exists).

            <Added version="2.3.0">

            `name` was renamed to `config_name`.

            </Added>
        name (`str`): Configuration name for the dataset.

            <Deprecated version="2.3.0">

            Use `config_name` instead.

            </Deprecated>

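Example: [here](https://github.com/huggingface/datasets/blob/a1e1867e932f14233244fb25713f3c94c46ff50a/src/datasets/combine.py#L53)

Developing svelte locally

We use svelte components for the doc UI (the Tip component, the Docstring component, etc.).

Follow these steps to develop svelte locally:

Create this file if it does not already exist: `doc-builder/kit/src/routes/_toctree.yml`. Its contents should be: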

- sections: 
  - local: index
    title: Index page
  title: Index page

Create this file if it does not already exist: `doc-builder/kit/src/routes/index.mdx`. Its contents should be whatever you would like to test. For example:

<script lang="ts">
import Tip from "$lib/Tip.svelte";
import Youtube from "$lib/Youtube.svelte";
import Docstring from "$lib/Docstring.svelte";
import CodeBlock from "$lib/CodeBlock.svelte";
import CodeBlockFw from "$lib/CodeBlockFw.svelte";
</script>

<Tip>

  [Here](https://myurl.com)

</Tip>

## Some heading
And some text [Here](https://myurl.com)

Physics is the natural science that studies matter,[a] its fundamental constituents, its motion and behavior through space and time, and the related entities of energy and force.[2] Physics is one of the most fundamental scientific disciplines, with its main goal being to understand how the universe behaves.[b][3][4][5] A scientist who specializes in the field of physics is called a physicist.

Install the dependencies and run dev mode:

cd doc-builder/kit
npm ci
npm run dev -- --open

Start developing. See the svelte files in doc-builder/kit/src/lib for reference. The flow should be:
1. Create a svelte component in doc-builder/kit/src/lib
2. Import it and test it in doc-builder/kit/src/routes/index.mdx

Project details

Release history

0.5.0 (this version): March 11, 2024
0.4.0: August 11, 2022
0.3.0: April 29, 2022
0.2.0: March 25, 2022
0.1.1: March 16, 2022
0.1.0: March 15, 2022

Download files

Source distribution

hf-doc-builder-0.5.0.tar.gz (152.4 KB), uploaded March 11, 2024, source

Built distribution

hf_doc_builder-0.5.0-py3-none-any.whl (67.8 KB), uploaded March 11, 2024, Python 3

Hashes for hf-doc-builder-0.5.0.tar.gz

Algorithm	Hash digest
SHA256	`e557660f76d1d90ac79e96d7b17642eca83f16a0a89ceabde180f0437d28179b`
MD5	`326cc01a61ab9e41d73e7ae7163b14ef`
BLAKE2b-256	`97f73a4e5528b79891159cd428d6c6d09bd47ca6b4c26315b9bb77828b162586`

Hashes for hf_doc_builder-0.5.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`bdceb44a26b7eb90a344afb3560306e23e95f0676f94fbd8a8111cf7f096e2ea`
MD5	`10c74814809dd615215a615d5dcb06d7`
BLAKE2b-256	`4089b794a9f2708f2926d3050132541952bc44eeeb1e560dc34fd00bace47655`
