CLI DGGS indexer for vector geospatial data

vector2dggs

A Python-based CLI tool for indexing vector files to a DGGS in parallel, writing the output in Parquet format.

This is the vector equivalent of raster2dggs.

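Currently only the H3 DGGS is supported. The tool was developed for a specific internal use case, so it probably has other limitations, although it is intended as a general-purpose abstraction. Contributions, suggestions, bug reports and strongly worded letters are all welcome.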

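Currently only polygons are supported. Both coverages (strictly non-overlapping polygons) and sets of polygons that may overlap are handled. Overlapping polygons are captured by allowing DGGS cell IDs to be non-unique (repeated) in the output.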
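To illustrate what repeated cell IDs mean in practice, here is a minimal sketch that groups output rows by cell ID and reports the cells covered by more than one feature. The rows, the feature IDs "A" and "B", and the helper names are hypothetical; real output would be read from the Parquet store.

```python
from collections import defaultdict

# Hypothetical (cell ID, feature ID) rows, as they might appear in the output
# when two polygons, "A" and "B", overlap on one cell.
rows = [
    ("8cbb53a734553ff", "A"),
    ("8cbb53a734467ff", "A"),
    ("8cbb53a734467ff", "B"),  # same cell, second feature
    ("8cbb53a548b11ff", "B"),
]


def features_by_cell(rows):
    """Group feature IDs by cell ID, keeping every repeat of a cell."""
    grouped = defaultdict(list)
    for cell_id, feature_id in rows:
        grouped[cell_id].append(feature_id)
    return dict(grouped)


def overlapping_cells(rows):
    """Return only the cells that are covered by more than one feature."""
    return {
        cell: feats
        for cell, feats in features_by_cell(rows).items()
        if len(set(feats)) > 1
    }


print(overlapping_cells(rows))  # {'8cbb53a734467ff': ['A', 'B']}
```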

Example use case for vector2dggs, showing parcels indexed to a high H3 resolution

Installation

pip install vector2dggs

Usage

vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
  should be a directory, not a file or database table, as it will instead be
  the write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG  [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index  [required]
  -pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 Parent resolution for the output
                                  partition. Defaults to resolution - 6
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -k, --keep_attributes           Retain attributes in output. The default is
                                  to create an output that only includes H3
                                  cell ID and the ID given by the -id field
                                  (or the default index ID).
  -ch, --chunksize INTEGER        The number of rows per index partition to
                                  use when spatially partioning. Adjusting
                                  this number will trade off memory use and
                                  time.  [default: 50; required]
  -s, --spatial_sorting [hilbert|morton|geohash]
                                  Spatial sorting method when perfoming
                                  spatial partitioning.  [default: hilbert]
  -crs, --cut_crs INTEGER         Set the coordinate reference system (CRS)
                                  used for cutting large polygons (see `--cur-
                                  threshold`). Defaults to the same CRS as the
                                  input. Should be a valid EPSG code.
  -c, --cut_threshold INTEGER     Cutting up large polygons into smaller
                                  pieces based on a target length. Units are
                                  assumed to match the input CRS units unless
                                  the `--cut_crs` is also given, in which case
                                  units match the units of the supplied CRS.
                                  [default: 5000; required]
  -t, --threads INTEGER           Amount of threads used for operation
                                  [default: 7]
  -tbl, --table TEXT              Name of the table to read when using a
                                  spatial database connection as input
  -g, --geom_col TEXT             Column name to use when using a spatial
                                  database connection as input  [default:
                                  geom]
  --tempdir PATH                  Temporary data is created during the
                                  execution of this program. This parameter
                                  allows you to control where this data will
                                  be written.
  -o, --overwrite
  --version                       Show the version and exit.
  --help                          Show this message and exit.
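If the tool is driven from a script, the options above can be assembled into an argument list. The following is a rough sketch that uses only long option names shown in the help text. The helper names `default_parent_res` and `build_command` and the file names are hypothetical, and clamping the default parent resolution at 0 is an assumption, since the help text only states "resolution - 6".

```python
import shlex


def default_parent_res(resolution: int) -> int:
    """Default partition resolution per the help text: resolution - 6.

    Clamping at 0 is an assumption for resolutions below 6; the help text
    does not say what the tool does in that case.
    """
    return max(0, resolution - 6)


def build_command(vector_input, output_directory, resolution,
                  id_field=None, keep_attributes=False, overwrite=False):
    """Assemble a `vector2dggs h3` argument list from documented options."""
    args = ["vector2dggs", "h3", "--resolution", str(resolution)]
    if id_field is not None:
        args += ["--id_field", id_field]
    if keep_attributes:
        args.append("--keep_attributes")
    if overwrite:
        args.append("--overwrite")
    # Positional arguments come last: VECTOR_INPUT then OUTPUT_DIRECTORY.
    return args + [vector_input, output_directory]


cmd = build_command("parcels.gpkg", "parcels.parquet", 12,
                    id_field="title_no", overwrite=True)
print(shlex.join(cmd))
print(default_parent_res(12))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once vector2dggs is installed.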

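Visualising output

Output is in the Apache Parquet format: a directory with one file per partition.

For a quick look at your output, you can read the Apache Parquet with pandas, then use h3-pandas and geopandas to convert it to a GeoPackage or GeoParquet for visualisation in a desktop GIS such as QGIS. The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use cases:

  • Joining attribute data from the original feature-level data onto the computed DGGS cells.
  • Joining other data to this output on the H3 cell ID. (The output has a column named like h3_\d{2}, e.g. h3_09 or h3_12, according to the target resolution.)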
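The two joins listed above can be sketched with pandas as follows. The toy frame stands in for real vector2dggs output, with cell IDs and title numbers copied from the sample session in this README. The `land_use` and `elevation_m` columns and both auxiliary tables are hypothetical.

```python
import pandas as pd

# Toy stand-in for vector2dggs output: indexed by H3 cell ID, one ID column.
cells = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA94D/635", "NA57C/785"]},
    index=pd.Index(
        ["8cbb53a734553ff", "8cbb53a734467ff", "8cbb53a548b11ff"], name="h3_12"
    ),
)

# Hypothetical feature-level attributes keyed by the same ID field.
features = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA57C/785"], "land_use": ["residential", "rural"]}
)

# Use case 1: attach feature attributes to every cell of that feature.
cells_with_attrs = cells.join(features.set_index("title_no"), on="title_no")

# Hypothetical second dataset already indexed to the same H3 resolution.
other = pd.DataFrame(
    {"elevation_m": [12.5, 40.0]},
    index=pd.Index(["8cbb53a734553ff", "8cbb53a548b11ff"], name="h3_12"),
)

# Use case 2: join on the H3 cell ID; cells with no match get NaN.
combined = cells_with_attrs.join(other, how="left")
print(combined)
```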

GeoParquet output (hexagon boundaries)

>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
                  title_no                                           geometry
h3_12                                                                        
8cbb53a734553ff  NA94D/635  POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff  NA94D/635  POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff  NA94D/635  POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff  NA94D/635  POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff  NA94D/635  POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
...                    ...                                                ...
8cbb53a548b2dff  NA62D/324  POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff  NA62D/324  POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff  NA57C/785  POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff  NA57C/785  POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff  NA57C/785  POLYGON ((174.30149 -35.69332, 174.30147 -35.6...

[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')
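The cell IDs in the index above are H3 indexes written as hexadecimal strings, and the resolution is encoded in the ID itself. As a small sketch, assuming the standard H3 index bit layout (resolution stored in bits 52 to 55), the resolution and the matching h3_NN column name can be recovered without any H3 library. The helper names `h3_resolution` and `h3_column_name` are hypothetical.

```python
def h3_resolution(cell_id: str) -> int:
    """Read the resolution from a hexadecimal H3 cell ID (bits 52-55)."""
    return (int(cell_id, 16) >> 52) & 0xF


def h3_column_name(cell_id: str) -> str:
    """Column name in the h3_NN pattern for this cell's resolution."""
    return f"h3_{h3_resolution(cell_id):02d}"


print(h3_resolution("8cbb53a734553ff"))   # 12
print(h3_column_name("8cbb53a734553ff"))  # h3_12
```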

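For development

In brief, to get started:

  • Install Poetry
  • Install GDAL
    • If you are on Windows, pip install gdal may be necessary before running the subsequent commands.
    • On Linux, install GDAL 3.6+ according to your platform-specific instructions, including the development headers, i.e. libgdal-dev.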
  • Create the virtual environment with poetry install. This will install the necessary dependencies.
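  • Subsequently, the virtual environment can be re-activated with poetry shell.

If you run poetry install, the CLI tool will be aliased, so you can use vector2dggs rather than poetry run vector2dggs, which is the alternative if you do not run poetry install.

Alternatively, you can bypass Poetry and install with pip, using pip install -e .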

Code formatting

Please run black . before committing.

Example commands

With a local GPKG

vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet

With a PostgreSQL/PostGIS connection

vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -tbl topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet

Citation

@software{vector2dggs,
  title={{vector2dggs}},
  author={Ardo, James and Law, Richard},
  url={https://github.com/manaakiwhenua/vector2dggs},
  version={0.6.1},
  date={2023-04-20}
}

APA/Harvard

Ardo, J., & Law, R. (2023). vector2dggs (0.6.1) [Computer software]. https://github.com/manaakiwhenua/vector2dggs
