
Utilities for expanding dask-jobqueue with appropriate settings for NCAR's clusters

Project description

ncar-jobqueue

ncar-jobqueue provides utilities for configuring dask-jobqueue with appropriate default settings for NCAR's clusters.

The following compute servers are supported:

  • Cheyenne (cheyenne.ucar.edu)
  • Casper (DAV) (casper.ucar.edu)
  • Hobart (hobart.cgd.ucar.edu)
  • Izumi (izumi.unified.ucar.edu)

Badges

CI (GitHub Workflow Status), Code Coverage Status, Conda, PyPI, License

Installation

NCAR-jobqueue can be installed from PyPI with pip:

python -m pip install ncar-jobqueue

NCAR-jobqueue is also available from conda-forge for conda installations:

conda install -c conda-forge ncar-jobqueue

Configuration

ncar-jobqueue provides a configuration file with appropriate default settings for the different clusters. This configuration file resides in ~/.config/dask/ncar-jobqueue.yaml:

ncar-jobqueue.yaml
cheyenne:
  pbs:
    #project: XXXXXXXX
    name: dask-worker-cheyenne
    cores: 18 # Total number of cores per job
    memory: '109GB' # Total amount of memory per job
    processes: 18 # Number of Python processes per job
    interface: ib0 # Network interface to use like eth0 or ib0
    queue: regular
    walltime: '01:00:00'
    resource-spec: select=1:ncpus=36:mem=109GB
    log-directory: '/glade/scratch/${USER}/dask/cheyenne/logs'
    local-directory: '/glade/scratch/${USER}/dask/cheyenne/local-dir'
    job-extra: []
    env-extra: []
    death-timeout: 60

casper-dav:
  pbs:
    #project: XXXXXXXX
    name: dask-worker-casper-dav
    cores: 2 # Total number of cores per job
    memory: '25GB' # Total amount of memory per job
    processes: 1 # Number of Python processes per job
    interface: ib0
    walltime: '01:00:00'
    resource-spec: select=1:ncpus=1:mem=25GB
    queue: casper
    log-directory: '/glade/scratch/${USER}/dask/casper-dav/logs'
    local-directory: '/glade/scratch/${USER}/dask/casper-dav/local-dir'
    job-extra: []
    env-extra: []
    death-timeout: 60

hobart:
  pbs:
    name: dask-worker-hobart
    cores: 10 # Total number of cores per job
    memory: '96GB' # Total amount of memory per job
    processes: 10 # Number of Python processes per job
    # interface: null              # ib0 doesn't seem to be working on Hobart
    queue: medium
    walltime: '08:00:00'
    resource-spec: nodes=1:ppn=48
    log-directory: '/scratch/cluster/${USER}/dask/hobart/logs'
    local-directory: '/scratch/cluster/${USER}/dask/hobart/local-dir'
    job-extra: ['-r n']
    env-extra: []
    death-timeout: 60

izumi:
  pbs:
    name: dask-worker-izumi
    cores: 10 # Total number of cores per job
    memory: '96GB' # Total amount of memory per job
    processes: 10 # Number of Python processes per job
    # interface: null              # ib0 doesn't seem to be working on Hobart
    queue: medium
    walltime: '08:00:00'
    resource-spec: nodes=1:ppn=48
    log-directory: '/scratch/cluster/${USER}/dask/izumi/logs'
    local-directory: '/scratch/cluster/${USER}/dask/izumi/local-dir'
    job-extra: ['-r n']
    env-extra: []
    death-timeout: 60

Note

  • To configure the default project account used by dask-jobqueue when submitting batch jobs, uncomment the project key/line in ~/.config/dask/ncar-jobqueue.yaml and set it to an appropriate value (a programmatic alternative is sketched below).
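
Since dask loads every YAML file it finds under ~/.config/dask/, the values above can also be inspected, and overridden for a single session, from Python. dask-jobqueue fills in keyword arguments you do not pass explicitly from this configuration, so setting the key here should play the same role as uncommenting the line in the file. A minimal sketch, assuming the shipped defaults are in place ('UABC0001' is a placeholder, not a real project code):

import dask

# The configuration keys mirror the YAML layout shown above.
dask.config.get('cheyenne.pbs.cores')  # -> 18 with the shipped defaults

# Override the project account for the current Python session only,
# as an alternative to editing ncar-jobqueue.yaml.
dask.config.set({'cheyenne.pbs.project': 'UABC0001'})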

Usage

Note

⚠️ Online documentation for dask-jobqueue is available at https://jobqueue.dask.org/. ⚠️

Casper

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Cheyenne

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Hobart

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Izumi

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)
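
On each of these machines the object returned by NCARCluster behaves like any other dask cluster, so the usual dask.distributed scaling and shutdown calls apply. A minimal sketch continuing the sessions above (the adapt bounds are illustrative only):

>>> # Let the scheduler grow and shrink the worker pool with the workload,
>>> # instead of keeping a fixed number of jobs.
>>> adaptive = cluster.adapt(minimum=0, maximum=36)
>>> # ... run computations through `client` ...
>>> client.close()
>>> cluster.close()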

Non-NCAR machines

On non-NCAR machines, ncar-jobqueue warns the user and falls back to distributed.LocalCluster:

>>> from ncar_jobqueue import NCARCluster
.../ncar_jobqueue/cluster.py:17: UserWarning: Unable to determine which NCAR cluster you are running on... Returning a `distributed.LocalCluster` class.
warn(message)
>>> from dask.distributed import Client
>>> cluster = NCARCluster()
>>> cluster
LocalCluster(3a7dd0f6, 'tcp://127.0.0.1:64184', workers=4, threads=8, memory=17.18 GB)
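
Code written against NCARCluster therefore runs unchanged on the fallback cluster; for example, a small illustrative computation:

>>> client = Client(cluster)
>>> import dask.array as da
>>> da.ones((1000, 1000)).sum().compute()
1000000.0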

