A command line client for the Global Pathogen Analysis Service
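A command line client for interacting with the Global Pathogen Analysis Service (https://www.gpas.cloud/). It performs fast, parallel, client-side decontamination and upload, and automatically renames downloaded output files with the original sample identifiers for convenience while preserving privacy. It can be installed with Conda or Docker and supports Ubuntu Linux, MacOS and Windows.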
Command line interface | Python API (unstable) |
---|---|
✅ `gpas upload` | ✅ `lib.Batch().upload()` |
✅ `gpas download` | ✅ `lib.download_async()` |
✅ `gpas validate` | ✅ `validation.validate()` |
✅ `gpas status` | ✅ `lib.fetch_status_async()` |
Install
With conda
Miniconda is recommended (see the Miniconda installation guide). On a recent ARM-based Mac, you need to install Miniconda and gpas-cli inside a Rosetta terminal.
# Create and activate the conda environment
curl -OJ https://raw.githubusercontent.com/GlobalPathogenAnalysisService/gpas-cli/main/environment.yml
conda env create -f environment.yml
conda activate gpas-sc2
# Show gpas-cli version
gpas --version
# Updating? Run this before creating the conda environment
conda remove -n gpas-sc2 --all
With docker
gpas-cli releases are pushed to Docker Hub for easy installation on most platforms and architectures.
# Fetch image, show gpas-cli version
docker run oxfordmmm/gpas-cli:latest gpas --version
# Fetch image, upload example data using a bound volume
docker run \
-v /Users/bede/Research/Git/gpas-cli/tests/test-data:/test-data \
oxfordmmm/gpas-cli:latest \
gpas upload \
--environment dev \
--token /test-data/token.json \
--out-dir /test-data/output \
/test-data/large-nanopore-bam.csv
# Build image from scratch, show gpas-cli version
curl -OJ https://raw.githubusercontent.com/GlobalPathogenAnalysisService/gpas-cli/main/Dockerfile
docker run --rm $(docker build -q .) gpas --version
# Build image from scratch, upload example data
docker run \
-v /Users/bede/Research/Git/gpas-cli/tests/test-data:/test-data \
$(docker build -q .) \
gpas upload \
--environment dev \
--token /test-data/token.json \
--out-dir /test-data/output \
/test-data/large-nanopore-bam.csv
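With pip
The PyPI package can be installed with `pip install` inside a Python 3.10+ environment, provided that the binary dependencies `samtools` and `readItAndKeep` are installed manually.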
# Install inside a new Python environment
python3 -m venv gpas-sc2
source gpas-sc2/bin/activate
pip install gpas
# Show gpas-cli version
gpas --version
# If samtools and read-it-and-keep are not in $PATH, tell gpas-cli where to find them
export GPAS_SAMTOOLS_PATH=path/to/samtools
export GPAS_READITANDKEEP_PATH=path/to/readItAndKeep
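Before running gpas-cli after a pip install, it can help to check that both binaries are discoverable. The following is a minimal sketch of such a check, not gpas-cli's own lookup logic: the helper name `resolve_binary` is hypothetical, and the sketch assumes that an explicitly set `GPAS_*_PATH` variable takes precedence over `$PATH`.

```python
import os
import shutil


def resolve_binary(name, env_var, environ=None):
    """Return a path for `name`, or None if it cannot be found.

    Assumption: an explicitly set environment variable wins over $PATH.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return override
    # Fall back to searching the directories listed in PATH
    return shutil.which(name, path=environ.get("PATH"))


if __name__ == "__main__":
    for name, env_var in [
        ("samtools", "GPAS_SAMTOOLS_PATH"),
        ("readItAndKeep", "GPAS_READITANDKEEP_PATH"),
    ]:
        print(f"{name}: {resolve_binary(name, env_var) or 'not found'}")
```

Note that the sketch only reports where a binary would be looked up; it does not verify that the file at an overridden path exists or is executable.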
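With PyInstaller
Static Linux, MacOS and Windows executables are built for each release. They are intended for distribution with the GUI client but can also be used standalone. They can be downloaded from the "Artifacts" section of each workflow run listed here: [link](https://github.com/GlobalPathogenAnalysisService/gpas-cli/actions/workflows/distribute.yml)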
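Authentication
Most `gpas-cli` actions require a valid API token (`token.json`). It can be saved using the "Get API token" button on the "Upload Client" page of the GPAS portal. If you cannot see this button, ask the GPAS team to enable it for you. If you would like to try GPAS, please get in touch!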
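A quick sanity check on a saved token file can catch a truncated or mis-saved download before any upload is attempted. The sketch below assumes only that `token.json` is a JSON document, as its extension suggests; it makes no assumption about the fields inside, and the helper name `load_token` is hypothetical.

```python
import json
from pathlib import Path


def load_token(path):
    """Read a token file and return its parsed JSON content.

    Raises FileNotFoundError if the file is missing and ValueError if it
    does not contain valid JSON. No particular fields are assumed.
    """
    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(f"Token file not found: {token_path}")
    try:
        return json.loads(token_path.read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(f"{token_path} is not valid JSON: {exc}") from exc
```

Passing this check does not mean the token is accepted by GPAS; only the service can confirm that.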
Command line usage
gpas validate
Validates an `upload_csv` and checks that the fastq or bam files it references exist.
gpas validate large-nanopore-fastq.csv
# Validate supplied tags
gpas validate --environment dev --token token.json large-nanopore-fastq.csv
% gpas validate -h
usage: gpas validate [-h] [--token TOKEN] [--environment {dev,staging,prod}] [--json-messages] upload_csv
Validate an upload CSV. Validates tags remotely if supplied with an authentication token
positional arguments:
upload_csv Path of upload CSV
options:
-h, --help show this help message and exit
--token TOKEN Path of auth token available from GPAS Portal
(default: None)
--environment {dev,staging,prod}
GPAS environment to use
(default: prod)
--json-messages Emit JSON to stdout
(default: False)
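When validation is one step of a larger script, the command can be driven from Python with `subprocess`. This is a rough sketch that uses only the flags listed in the help text above; the helper names are hypothetical, and because the exit codes and message format are not documented here, the output is returned unprocessed.

```python
import subprocess


def build_validate_command(upload_csv, token=None, environment="prod", json_messages=False):
    """Assemble the argv list for `gpas validate` from the documented flags."""
    command = ["gpas", "validate"]
    if token is not None:
        command += ["--token", str(token)]
    command += ["--environment", environment]
    if json_messages:
        command.append("--json-messages")
    command.append(str(upload_csv))
    return command


def run_validate(upload_csv, **kwargs):
    """Run `gpas validate` and return (exit code, stdout, stderr) as-is."""
    result = subprocess.run(
        build_validate_command(upload_csv, **kwargs),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

For example, `run_validate("large-nanopore-fastq.csv", token="token.json", environment="dev")` mirrors the second command shown above.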
gpas upload
Validates, decontaminates and uploads the reads specified in `upload_csv` to the chosen GPAS environment.
gpas upload --environment dev --token token.json large-illumina-bam.csv
# Dry run; skip submission
gpas upload --dry-run --environment dev --token token.json large-illumina-bam.csv
# Offline mode; quit after decontamination
gpas upload tests/test-data/large-nanopore-fastq.csv
% gpas upload -h
usage: gpas upload [-h] [--token TOKEN] [--working-dir WORKING_DIR] [--out-dir OUT_DIR] [--processes PROCESSES] [--dry-run]
[--debug] [--environment {dev,staging,prod}] [--json-messages]
upload_csv
Validate, decontaminate and upload reads to the GPAS platform
positional arguments:
upload_csv Path of upload csv
options:
-h, --help show this help message and exit
--token TOKEN Path of auth token available from GPAS Portal
(default: None)
--working-dir WORKING_DIR
Path of directory in which to make intermediate files
(default: /tmp)
--out-dir OUT_DIR Path of directory in which to save mapping CSV
(default: .)
--processes PROCESSES
Number of tasks to execute in parallel. 0 = auto
(default: 0)
--dry-run Exit before submitting files
(default: False)
--debug Emit verbose debug messages
(default: False)
--environment {dev,staging,prod}
GPAS environment to use
(default: prod)
--json-messages Emit JSON to stdout
(default: False)
gpas download
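Downloads `json`, `fasta`, `vcf` and `bam` outputs from the GPAS platform, given either the `mapping_csv` generated during batch upload or a comma-separated list of sample guids. Passing both `--mapping-csv` and `--rename` saves the output files under local sample names, without the platform ever knowing those names.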
# Download and rename BAMs for a previous upload
gpas download --rename --mapping-csv C-a06cbab8.mapping.csv --file-types bam token.json
# Download all outputs for a single guid
gpas download --guids 6e024eb1-432c-4b1b-8f57-3911fe87555f --file-types json,vcf,bam,fasta token.json
% gpas download -h
usage: gpas download [-h] [--mapping-csv MAPPING_CSV] [--guids GUIDS] [--file-types FILE_TYPES] [--out-dir OUT_DIR] [--rename]
[--debug] [--environment {dev,staging,prod}]
token
Download analytical outputs from the GPAS platform for given a mapping csv or list of guids
positional arguments:
token Path of auth token (available from GPAS Portal)
options:
-h, --help show this help message and exit
--mapping-csv MAPPING_CSV
Path of mapping CSV generated at upload time
(default: None)
--guids GUIDS Comma-separated list of GPAS sample guids
(default: )
--file-types FILE_TYPES
Comma separated list of outputs to download (json,fasta,bam,vcf)
(default: fasta)
--out-dir OUT_DIR Path of output directory
(default: /Users/bede/Research/Git/gpas-cli)
--rename Rename outputs using local sample names (requires --mapping-csv)
(default: False)
--debug Emit verbose debug messages
(default: False)
--environment {dev,staging,prod}
GPAS environment to use
(default: prod)
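The constraints stated in the help text (the four supported file types, and `--rename` requiring `--mapping-csv`) can be checked before the command is ever run. The sketch below is illustrative only: `build_download_command` is a hypothetical helper, and it assumes that at least one of a mapping CSV or a guid list must be given.

```python
VALID_FILE_TYPES = ("json", "fasta", "bam", "vcf")


def build_download_command(
    token,
    mapping_csv=None,
    guids=(),
    file_types=("fasta",),
    out_dir=None,
    rename=False,
    environment="prod",
):
    """Assemble the argv list for `gpas download`, checking documented constraints."""
    if not mapping_csv and not guids:
        raise ValueError("Pass a mapping CSV or at least one guid")
    if rename and not mapping_csv:
        raise ValueError("--rename requires --mapping-csv")
    unknown = [t for t in file_types if t not in VALID_FILE_TYPES]
    if unknown:
        raise ValueError(f"Unsupported file types: {unknown}")

    command = ["gpas", "download"]
    if mapping_csv:
        command += ["--mapping-csv", str(mapping_csv)]
    if guids:
        command += ["--guids", ",".join(guids)]
    command += ["--file-types", ",".join(file_types)]
    if out_dir is not None:
        command += ["--out-dir", str(out_dir)]
    if rename:
        command.append("--rename")
    command += ["--environment", environment, str(token)]
    return command
```

The returned list can be passed directly to `subprocess.run`.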
gpas status
Checks the processing status of an uploaded batch, given either the `mapping_csv` generated at upload time or a comma-separated list of sample guids.
gpas status --mapping-csv example_mapping.csv --environment dev token.json
gpas status --guids 6e024eb1-432c-4b1b-8f57-3911fe87555f --format json token.json
% gpas status -h
usage: gpas status [-h] [--mapping-csv MAPPING_CSV] [--guids GUIDS] [--format {table,csv,json}] [--rename] [--raw]
[--environment {dev,staging,prod}]
token
Check the status of samples submitted to the GPAS platform
positional arguments:
token Path of auth token available from GPAS Portal
options:
-h, --help show this help message and exit
--mapping-csv MAPPING_CSV
Path of mapping CSV generated at upload time
(default: None)
--guids GUIDS Comma-separated list of GPAS sample guids
(default: )
--format {table,csv,json}
Output format
(default: table)
--rename Use local sample names (requires --mapping-csv)
(default: False)
--raw Emit raw response
(default: False)
--environment {dev,staging,prod}
GPAS environment to use
(default: prod)
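For scripted monitoring, `--format json` gives output that is easier to consume than the default table. The following sketch rests on two assumptions that are not confirmed by the help text: that stdout contains a single JSON document and nothing else, and that a failed request produces a non-zero exit code. The structure of the JSON is not documented here, so it is returned as parsed. Both helper names are hypothetical.

```python
import json
import subprocess


def build_status_command(token, mapping_csv=None, guids=(), environment="prod"):
    """Assemble argv for `gpas status --format json` from the documented flags."""
    if not mapping_csv and not guids:
        raise ValueError("Pass a mapping CSV or at least one guid")
    command = ["gpas", "status", "--format", "json"]
    if mapping_csv:
        command += ["--mapping-csv", str(mapping_csv)]
    if guids:
        command += ["--guids", ",".join(guids)]
    command += ["--environment", environment, str(token)]
    return command


def fetch_status(token, **kwargs):
    """Run `gpas status` and return the parsed JSON, whatever its shape."""
    result = subprocess.run(
        build_status_command(token, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: stdout holds a single JSON document and nothing else
    return json.loads(result.stdout)
```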
Development and testing
Use pre-commit to apply black style at commit time (this should happen automatically).
git clone https://github.com/GlobalPathogenAnalysisService/gpas-cli
cd gpas-cli
conda env create -f environment-dev.yml
conda activate gpas-sc2-dev
pip install --upgrade --force-reinstall --editable ./
# Offline unit tests
pytest tests/test_gpas.py
# The full test suite requires a valid token for dev inside tests/test-data
pytest --cov=gpas