CLI DGGS indexer for vector geospatial data
Project description
vector2dggs
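Python-based CLI tool to index vector files to DGGS in parallel, writing out to Parquet.
This is the vector equivalent of raster2dggs.
Currently only the H3 DGGS is supported, and the tool probably has other limitations since it was developed for a specific internal use case, though it is intended as a general-purpose abstraction. Contributions, suggestions, bug reports and strongly worded letters are all welcome.
Only polygon geometries are currently supported. Both coverages (strictly non-overlapping polygons) and sets of polygons that may overlap are handled; overlapping polygons are captured by allowing DGGS cell IDs to be non-unique (repeated) in the output.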
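Because cell IDs may repeat where polygons overlap, downstream code should not assume one row per cell. The following is a minimal sketch of how repeated cell IDs could be grouped to find overlaps. The rows are hypothetical values shaped like the tool's default output (cell ID plus feature ID), and `features_by_cell` is an illustrative helper, not part of vector2dggs.

```python
from collections import defaultdict

# Hypothetical (cell ID, feature ID) rows, shaped like the default output.
rows = [
    ("8cbb53a734553ff", "NA94D/635"),
    ("8cbb53a734467ff", "NA94D/635"),
    ("8cbb53a734553ff", "NA62D/324"),  # same cell, second polygon: an overlap
]


def features_by_cell(rows):
    """Group feature IDs under each cell ID, keeping every repeat."""
    grouped = defaultdict(list)
    for cell_id, feature_id in rows:
        grouped[cell_id].append(feature_id)
    return dict(grouped)


grouped = features_by_cell(rows)

# Cells that belong to more than one input polygon.
overlapping = {cell: ids for cell, ids in grouped.items() if len(ids) > 1}
print(overlapping)
```

With pandas, the same check can be made on a DataFrame indexed by cell ID using `df.index.duplicated(keep=False)`.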
Installation
pip install vector2dggs
Usage
vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
  should be a directory, not a file or database table, as it will instead be
  the write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index [required]
  -pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 Parent resolution for the output
                                  partition. Defaults to resolution - 6
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -k, --keep_attributes           Retain attributes in output. The default is
                                  to create an output that only includes H3
                                  cell ID and the ID given by the -id field
                                  (or the default index ID).
  -ch, --chunksize INTEGER        The number of rows per index partition to
                                  use when spatially partitioning. Adjusting
                                  this number will trade off memory use and
                                  time. [default: 50; required]
  -s, --spatial_sorting [hilbert|morton|geohash]
                                  Spatial sorting method when performing
                                  spatial partitioning. [default: hilbert]
  -crs, --cut_crs INTEGER         Set the coordinate reference system (CRS)
                                  used for cutting large polygons (see
                                  `--cut_threshold`). Defaults to the same CRS
                                  as the input. Should be a valid EPSG code.
  -c, --cut_threshold INTEGER     Cutting up large polygons into smaller
                                  pieces based on a target length. Units are
                                  assumed to match the input CRS units unless
                                  the `--cut_crs` is also given, in which case
                                  units match the units of the supplied CRS.
                                  [default: 5000; required]
  -t, --threads INTEGER           Amount of threads used for operation
                                  [default: 7]
  -tbl, --table TEXT              Name of the table to read when using a
                                  spatial database connection as input
  -g, --geom_col TEXT             Column name to use when using a spatial
                                  database connection as input [default:
                                  geom]
  --tempdir PATH                  Temporary data is created during the
                                  execution of this program. This parameter
                                  allows you to control where this data will
                                  be written.
  -o, --overwrite
  --version                       Show the version and exit.
  --help                          Show this message and exit.
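Visualising output
Output is in the Apache Parquet format, with one file per partition.
For a quick look at your output, you can read the Apache Parquet with pandas, then use h3-pandas and geopandas to convert it into a GeoPackage or GeoParquet for visualisation in a desktop GIS such as QGIS. The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use cases: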
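- Joining attribute data from the original feature-level data onto the computed DGGS cells.
- Joining other data to this output on the H3 cell ID. (The output has a column like h3_\d{2}, e.g. h3_09 or h3_12, according to the target resolution.)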
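The h3_\d{2} naming rule can be checked in code before joining on the cell ID. The sketch below assumes only that naming pattern; `h3_column_name` and `find_h3_column` are hypothetical helpers written for illustration and are not part of vector2dggs.

```python
import re

# Pattern quoted above: "h3_" followed by a two-digit resolution.
H3_COLUMN = re.compile(r"h3_(\d{2})")


def h3_column_name(resolution):
    """Build the expected cell ID column name for an H3 resolution (0-15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    return f"h3_{resolution:02d}"


def find_h3_column(names):
    """Return (name, resolution) for the first matching name, else None."""
    for name in names:
        match = H3_COLUMN.fullmatch(name)
        if match:
            return name, int(match.group(1))
    return None


print(h3_column_name(9))                       # h3_09
print(find_h3_column(["title_no", "h3_12"]))   # ('h3_12', 12)
```

When the cell ID is the DataFrame index rather than a regular column, as in the pandas session below, pass `[df.index.name]` instead of `df.columns`.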
GeoParquet output (hexagon boundaries):
>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
title_no geometry
h3_12
8cbb53a734553ff NA94D/635 POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff NA94D/635 POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff NA94D/635 POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff NA94D/635 POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff NA94D/635 POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
... ... ...
8cbb53a548b2dff NA62D/324 POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff NA62D/324 POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff NA57C/785 POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff NA57C/785 POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff NA57C/785 POLYGON ((174.30149 -35.69332, 174.30147 -35.6...
[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')
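For development
In brief, to get started:
- Install Poetry
- Install GDAL
  - If you're on Windows, pip install gdal may be necessary before running the subsequent commands.
  - On Linux, install GDAL 3.6+ according to your platform-specific instructions, including development headers, i.e. libgdal-dev.
- Create the virtual environment with poetry init. This will install necessary dependencies.
- Subsequently, the virtual environment can be re-activated with poetry shell.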
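If you run poetry install, the CLI tool will be aliased, so you can simply use vector2dggs rather than poetry run vector2dggs, which is the alternative if you do not run poetry install.
Alternatively, you can install with pip using pip install -e . and bypass Poetry.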
Code formatting
Please run black . before committing.
Example commands
With a local GPKG:
vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet
With a PostgreSQL/PostGIS connection:
vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -tbl topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet
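When the tool is driven from a Python script rather than a shell, the same invocations can be assembled as an argument list. This is a rough sketch using only the long option names listed in the help text above; `build_h3_command` is a hypothetical helper, and the file paths are placeholders.

```python
import shlex


def build_h3_command(vector_input, output_directory, resolution, *,
                     verbosity=None, id_field=None, parent_res=None,
                     threads=None, table=None, overwrite=False):
    """Assemble an argument list for `vector2dggs h3` (illustrative only)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    cmd = ["vector2dggs", "h3", "--resolution", str(resolution)]
    optional = [
        ("--verbosity", verbosity),
        ("--id_field", id_field),
        ("--parent_res", parent_res),
        ("--threads", threads),
        ("--table", table),
    ]
    # Only pass options that were explicitly set, so the CLI defaults apply.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if overwrite:
        cmd.append("--overwrite")
    # Positional arguments come last: VECTOR_INPUT then OUTPUT_DIRECTORY.
    cmd += [vector_input, output_directory]
    return cmd


cmd = build_h3_command(
    "./nz-property-titles.gpkg",      # placeholder input path
    "./nz-property-titles.parquet",   # placeholder output directory
    12,
    verbosity="DEBUG",
    id_field="title_no",
    overwrite=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. No shell is involved in that case, so a leading `~` in a path is not expanded and would need `os.path.expanduser` first.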
Citation
@software{vector2dggs,
title={{vector2dggs}},
author={Ardo, James and Law, Richard},
url={https://github.com/manaakiwhenua/vector2dggs},
version={0.6.1},
date={2023-04-20}
}
APA/Harvard
Ardo, J., & Law, R. (2023). vector2dggs (0.6.1) [Computer software]. https://github.com/manaakiwhenua/vector2dggs