
Cleans the LaTeX code of your paper to submit to arXiv.

Project description

arxiv_latex_cleaner

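This tool allows you to easily clean the LaTeX code of your paper to submit to arXiv. From a folder containing all your code, e.g. /path/to/latex/, it creates a new folder /path/to/latex_arXiv/ that is ready to ZIP and upload to arXiv.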

Example call:

arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'

Or read the settings directly from a config file:

arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml

Installation

pip install arxiv-latex-cleaner
Note: arxiv_latex_cleaner is only compatible with Python >=3.9.

If you are using MacOS, you can install it with Homebrew:

brew install arxiv_latex_cleaner

Alternatively, you can download the source code:

git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help

And install it as a command-line program directly from the source code:

python setup.py install

Main features

Privacy-oriented

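  • Removes all auxiliary files (.aux, .log, .out, etc.).
  • Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be). This also includes the \begin{comment} ... \end{comment}, \iffalse ... \fi, and \if0 ... \fi environments.
  • Optionally removes user-defined commands listed in commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
  • Optionally allows you to define custom regex replacement rules through a cleaner_config.yaml file.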
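
To make the comment-removal idea concrete, here is a minimal Python sketch. It is not the tool's implementation: the function name strip_comments and the regex are assumptions for illustration, and it ignores several real LaTeX edge cases (verbatim environments, a '%' placed at the end of a line to suppress whitespace, and a '%' that follows an escaped backslash).

```python
import re

# Matches an unescaped '%' and everything after it on the same line.
# Simplification: '\\%' (escaped backslash followed by a real comment) is not handled.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    """Remove '%' comments line by line, dropping lines that held only a comment."""
    kept = []
    for line in tex.splitlines():
        stripped = _COMMENT.sub("", line)
        # Skip lines that consisted solely of a comment.
        if stripped.strip() == "" and line.strip() != "":
            continue
        kept.append(stripped.rstrip())
    return "\n".join(kept)


sample = (
    "Accuracy is 95\\% here. % TODO: double-check this number\n"
    "% this whole line is a private note\n"
    "Next line."
)
print(strip_comments(sample))
```

The escaped \% in the first line survives, while the trailing note and the comment-only line are dropped.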

Size-oriented

There is a 50MB limit on arXiv submissions, so to make your submission fit, the tool:

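  • Removes all unused .tex files (those that are not in the root and are not included in any other .tex file).
  • Removes all unused images that take up space (those that are not actually included in any .tex file).
  • Optionally resizes all images to im_size pixels to reduce the size of the submission. You can allowlist some images with images_allowlist to skip the global size.
  • Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can allowlist some PDFs with images_allowlist to skip the global size.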
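
The interaction between im_size and images_allowlist can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names target_size and size_limit_for are hypothetical, the limit is applied to the longest side as the usage text describes, and images that are already small enough are assumed to be left untouched.

```python
import json


def target_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side pixels."""
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: images that are already small enough are not upscaled.
        return width, height
    scale = max_side / longest
    return int(width * scale), int(height * scale)


def size_limit_for(path: str, allowlist: dict[str, int], default: int) -> int:
    """Per-image limit from the allowlist, falling back to the global default."""
    return allowlist.get(path, default)


# Same dictionary format as the --images_allowlist argument in the example call.
allowlist = json.loads('{"images/im.png": 2000}')
print(target_size(4000, 3000, size_limit_for("images/im.png", allowlist, 500)))
print(target_size(4000, 3000, size_limit_for("images/other.png", allowlist, 500)))
```

With a global im_size of 500, the allowlisted image is limited to 2000 pixels on its longest side, while every other image is limited to 500.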

TikZ picture source code concealment

To prevent the upload of TikZ picture source code or raw simulation data, this feature:

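  • Replaces each \begin{tikzpicture} ... \end{tikzpicture} environment with the corresponding \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
  • Requires the externally compiled TikZ pictures to be available as .pdf files in the folder EXTERNAL_TIKZ_FOLDER. See section 52 (Externalization Library) of the PGF/TikZ manual for information on TikZ picture externalization.
  • Only replaces environments that are preceded by a \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}), where the externalized picture_name.pdf filename matches picture_name.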
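
The replacement rule described above could be sketched as follows. This is a simplified illustration, not the tool's code: the function name hide_tikz and the regex are assumptions, and the non-greedy match does not handle nested tikzpicture environments.

```python
import re

# \tikzsetnextfilename{name}, optional whitespace, then one tikzpicture environment.
_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{(?P<name>[^}]+)\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,
)


def hide_tikz(tex: str, external_folder: str) -> str:
    """Replace named tikzpicture environments with \\includegraphics calls."""

    def to_includegraphics(match: re.Match) -> str:
        # A function replacement is inserted literally, so backslashes are safe.
        return "\\includegraphics{%s/%s.pdf}" % (external_folder, match.group("name"))

    return _TIKZ.sub(to_includegraphics, tex)


named = "\\tikzsetnextfilename{plot}\n\\begin{tikzpicture}\\draw (0,0) -- (1,1);\\end{tikzpicture}"
unnamed = "\\begin{tikzpicture}\\draw (0,0) circle (1);\\end{tikzpicture}"
print(hide_tikz(named, "ext_tikz"))
print(hide_tikz(unnamed, "ext_tikz"))
```

The environment without a preceding \tikzsetnextfilename is left unchanged, which mirrors the third point in the list.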

More sophisticated pattern replacement based on regex group captures

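While writing a paper, it is sometimes convenient to use a set of custom LaTeX commands. To get rid of them for the arXiv submission, you can use regex patterns with insertion templates that simply revert them to plain LaTeX.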

{
    "pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
    "insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
    "description" : "Replace figcomp"
}

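The pattern above finds all \figcomp{path}{w1}{w2} commands and replaces them with \parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}. Note that the insertion template is filled with the named groups captured by the pattern. Note also that the replacement is processed before the \includegraphics commands are processed and the corresponding file paths are copied, which ensures that all image files are copied to the cleaned version. See cleaner_config.yaml for details on how to specify the patterns.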
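
To see how the named groups feed the insertion template, the rule can be replayed with Python's standard re module and str.format. This is a sketch under assumptions, not the tool's actual code path: apply_rule is a hypothetical helper, and the whitespace removal step is added here only so that the output matches the compact form shown in the text.

```python
import re

# The same rule as above, written as a Python dict with raw strings.
rule = {
    "pattern": r"(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}",
    "insertion": r"\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}",
    "description": "Replace figcomp",
}


def apply_rule(tex: str, rule: dict) -> str:
    """Replace every match of rule['pattern'] with the filled insertion template."""
    pattern = re.compile(rule["pattern"])

    def fill(match: re.Match) -> str:
        # '{{' and '}}' in the template become literal braces; {first}, {second}
        # and {third} are filled from the named capture groups.
        filled = rule["insertion"].format(**match.groupdict())
        # Assumption: drop whitespace inside the inserted text only.
        return "".join(filled.split())

    return pattern.sub(fill, tex)


print(apply_rule(r"\figcomp{img.png}{0.5}{0.9}", rule))
```

The doubled braces in the template are what keep the literal LaTeX braces apart from the format fields.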

Usage

usage: arxiv_latex_cleaner@v1.0.8  [-h] [--resize_images] [--im_size IM_SIZE]
                                   [--compress_pdf]
                                   [--pdf_im_resolution PDF_IM_RESOLUTION]
                                   [--images_allowlist IMAGES_ALLOWLIST]
                                   [--keep_bib]
                                   [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                                   [--commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]]
                                   [--environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]]
                                   [--if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]]
                                   [--use_external_tikz USE_EXTERNAL_TIKZ]
                                   [--svg_inkscape [SVG_INKSCAPE]]
                                   [--config CONFIG] [--verbose]
                                   input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_allowlist IMAGES_ALLOWLIST
                        Images (and PDFs) that won't be resized to the default
                        resolution, but the one provided here. Value is pixel
                        for images, and dpi for PDFs, as in --im_size and
                        --pdf_im_resolution, respectively. Format is a
                        dictionary as: '{"path/to/im.jpg": 1000}'
  --keep_bib            Avoid deleting the *.bib files.
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands. For example, to delete
                        all occurrences of \todo1{} and \todo2{}, run the tool
                        with `--commands_to_delete todo1 todo2`. Please note
                        that the positional argument `input_folder` cannot
                        come immediately after `commands_to_delete`, as the
                        parser does not have any way to know if it's another
                        command to delete.
  --commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]
                        LaTeX commands that will be deleted but the text 
                        wrapped in the commands will be retained. Useful for
                        commands that change text formats and colors, which
                        you may want to remove but keep the text within. Usages
                        are exactly the same as commands_to_delete. Note that if
                        the commands listed here duplicate that after
                        commands_to_delete, the default action will be retaining
                        the wrapped text.
  --environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]
                        LaTeX environments that will be deleted. Useful for e.g. 
                        user-defined comment environments. For example, to 
                        delete all occurrences of \begin{note} ... \end{note},
                        run the tool with `--environments_to_delete note`. 
                        Please note that the positional argument `input_folder`
                        cannot come immediately after
                        `environments_to_delete`, as the parser does not have
                        any way to know if it's another environment to delete.
  --if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]
                        Constant TeX primitive conditionals (\iffalse, \iftrue,
                        etc.) are simplified, i.e., true branches are kept, false
                        branches deleted. To parse the conditional constructs
                        correctly, all commands starting with `\if` are assumed to
                        be TeX primitive conditionals (e.g., declared by
                        \newif\ifvar). Some known exceptions to this rule are
                        already included (e.g., \iff, \ifthenelse, etc.), but you
                        can add custom exceptions using `--if_exceptions iffalt`.
  --use_external_tikz USE_EXTERNAL_TIKZ
                        Folder (relative to input folder) containing
                        externalized tikz figures in PDF format.
  --svg_inkscape [SVG_INKSCAPE]
                        Include PDF files generated by Inkscape via the
                        `\includesvg` command from the `svg` package. This is
                        done by replacing the `\includesvg` calls with
                        `\includeinkscape` calls pointing to the generated
                        `.pdf_tex` files. By default, these files and the
                        generated PDFs are located under `./svg-inkscape`
                        (relative to the input folder), but a different path
                        (relative to the input folder) can be provided in case a
                        different `inkscapepath` was set when loading the `svg`
                        package.
  --config CONFIG       Read settings from `.yaml` config file. If command
                        line arguments are provided additionally, the config
                        file parameters are updated with the command line
                        parameters.
  --verbose             Enable detailed output.
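
The warning that `input_folder` cannot come immediately after `commands_to_delete` follows from how multi-value options are parsed. The sketch below reproduces the effect with a reduced, hypothetical parser, assuming the CLI is built on Python's argparse (which the usage format suggests); build_parser and try_parse are illustrative names and are not part of the tool.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A reduced parser with the same argument shapes as in the usage text."""
    parser = argparse.ArgumentParser(prog="cleaner_sketch")
    parser.add_argument("input_folder")
    # nargs="+" greedily consumes every following value up to the next option.
    parser.add_argument("--commands_to_delete", nargs="+", default=[])
    parser.add_argument("--verbose", action="store_true")
    return parser


def try_parse(argv: list[str]):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# Positional first: parsed as intended.
print(try_parse(["paper", "--commands_to_delete", "todo1", "todo2"]))
# Positional right after the list: "paper" is swallowed as a third command.
print(try_parse(["--commands_to_delete", "todo1", "todo2", "paper"]))
# Another option in between ends the list, so the positional is found again.
print(try_parse(["--commands_to_delete", "todo1", "--verbose", "paper"]))
```

Putting `input_folder` first, or separating it from the list with another option, avoids the ambiguity.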

Testing

python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test

Note

This is not an officially supported Google product.
