
Parser for GIS metadata standards including ArcGIS, FGDC and ISO-19115


gis-metadata-parser

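XML parsers for GIS metadata, designed to read, validate, update and output a core set of properties that have been mapped between the most common standards. These currently include:

  • FGDC
  • ISO-19139 (and ISO-19115)
  • ArcGIS (tested with ArcGIS format 1.0).

This library is compatible with Python versions 2.7 and 3.4 through 3.6.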


Installation

Install with pip install gis-metadata-parser.

Usage

Parsers can be instantiated from files, XML strings or URLs. They can also be converted from one standard to another.

from gis_metadata.arcgis_metadata_parser import ArcGISParser
from gis_metadata.fgdc_metadata_parser import FgdcParser
from gis_metadata.iso_metadata_parser import IsoParser
from gis_metadata.metadata_parser import get_metadata_parser

# From file objects
with open(r'/path/to/metadata.xml') as metadata:
    fgdc_from_file = FgdcParser(metadata)

with open(r'/path/to/metadata.xml') as metadata:
    iso_from_file = IsoParser(metadata)

# Detect FGDC standard based on root element, metadata
fgdc_from_string = get_metadata_parser(
    """<?xml version='1.0' encoding='UTF-8'?>
    <metadata>
        <idinfo>
        </idinfo>
    </metadata>
    """
)

# Detect ArcGIS standard based on root element and its nodes
arcgis_from_string = get_metadata_parser(
    """<?xml version='1.0' encoding='UTF-8'?>
    <metadata>
        <dataIdInfo></dataIdInfo>
        <distInfo></distInfo>
        <dqInfo></dqInfo>
    </metadata>
    """
)

# Detect ISO standard based on root element, MD_Metadata or MI_Metadata
iso_from_string = get_metadata_parser(
    """
    <?xml version='1.0' encoding='UTF-8'?>
    <MD_Metadata>
        <identificationInfo>
        </identificationInfo>
    </MD_Metadata>
    """
)

# Convert from one standard to another
fgdc_converted = iso_from_file.convert_to(FgdcParser)
iso_converted = fgdc_from_file.convert_to(IsoParser)
arcgis_converted = iso_converted.convert_to(ArcGISParser)

# Output supported properties as key value pairs (dict)
fgdc_key_vals = fgdc_from_file.convert_to(dict)
iso_key_vals = iso_from_file.convert_to(dict)
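The comments above describe how get_metadata_parser picks a standard from the root element. The following is a minimal sketch of that idea using only the standard library. `detect_standard` and `local_name` are hypothetical helpers and are not part of gis_metadata. The sketch also assumes that the presence of any one of the ArcGIS sections (dataIdInfo, distInfo, dqInfo) is enough to identify ArcGIS. In real code, call get_metadata_parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of root-element detection; NOT the library's
# get_metadata_parser implementation, only an illustration of the rules
# stated in the comments of the usage example.
ARCGIS_NODES = {'dataIdInfo', 'distInfo', 'dqInfo'}
ISO_ROOTS = {'MD_Metadata', 'MI_Metadata'}


def local_name(tag):
    """Drop a '{namespace}' prefix, if present, from an element tag."""
    return tag.rsplit('}', 1)[-1]


def detect_standard(xml_text):
    """Return 'arcgis', 'fgdc' or 'iso' for a metadata XML string."""
    root = ET.fromstring(xml_text.strip())
    root_name = local_name(root.tag)
    if root_name in ISO_ROOTS:
        return 'iso'
    if root_name == 'metadata':
        children = {local_name(child.tag) for child in root}
        if children & ARCGIS_NODES:   # ArcGIS-specific sections present
            return 'arcgis'
        if 'idinfo' in children:      # FGDC identification section
            return 'fgdc'
    raise ValueError('Unrecognized metadata standard: ' + root_name)


print(detect_standard('<metadata><idinfo/></metadata>'))  # fgdc
```

The ISO roots are checked first because they are unambiguous. FGDC and ArcGIS both use a `metadata` root, so for those two the child elements decide.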

Finally, the properties of a parser can be updated, validated, applied and output:

with open(r'/path/to/metadata.xml') as metadata:
    fgdc_from_file = FgdcParser(metadata)

# Example simple properties
fgdc_from_file.title
fgdc_from_file.abstract
fgdc_from_file.place_keywords
fgdc_from_file.thematic_keywords

# :see: gis_metadata.utils.SUPPORTED_PROPS for list of all supported properties

# Complex properties
fgdc_from_file.attributes
fgdc_from_file.bounding_box
fgdc_from_file.contacts
fgdc_from_file.dates
fgdc_from_file.digital_forms
fgdc_from_file.larger_works
fgdc_from_file.process_steps
fgdc_from_file.raster_info

# :see: gis_metadata.utils.COMPLEX_DEFINITIONS for structure of all complex properties

# Update properties
fgdc_from_file.title = 'New Title'
fgdc_from_file.dates = {'type': 'single', 'values': '1/1/2016'}

# Apply updates
fgdc_from_file.validate()                                      # Ensure updated properties are valid
fgdc_from_file.serialize()                                     # Output updated XML as a string
fgdc_from_file.write()                                         # Output updated XML to existing file
fgdc_from_file.write(out_file_or_path='/path/to/updated.xml')  # Output updated XML to new file
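To show what validate() checks for a complex value such as `{'type': 'single', 'values': '1/1/2016'}`, here is a rough sketch of a dates validator. `validate_dates` is hypothetical. The type names ('single', 'range', 'multiple') and the value counts per type are assumptions modeled on gis_metadata.utils, not confirmed library rules.

```python
# Hypothetical validator for a `dates` value. The type names and the value
# counts below are assumptions modeled on gis_metadata.utils; call the
# parser's own validate() in real code.
DATE_VALUE_COUNTS = {
    'single': (1, 1),       # exactly one date
    'range': (2, 2),        # a begin date and an end date
    'multiple': (2, None),  # two or more dates
}


def validate_dates(dates):
    """Return the list of date values, or raise ValueError if invalid."""
    if not dates:
        return []  # treat an empty value as "no dates"
    if not isinstance(dates, dict):
        raise ValueError('dates must be a dict')

    date_type = dates.get('type')
    if date_type not in DATE_VALUE_COUNTS:
        raise ValueError('Invalid date type: {0!r}'.format(date_type))

    values = dates.get('values') or []
    if isinstance(values, str):
        values = [values]  # allow a single date to be given as a string
    values = [value for value in values if value]

    min_count, max_count = DATE_VALUE_COUNTS[date_type]
    if len(values) < min_count or (max_count is not None and len(values) > max_count):
        raise ValueError('Wrong number of values for {0!r} dates: {1}'.format(date_type, len(values)))
    return values


print(validate_dates({'type': 'single', 'values': '1/1/2016'}))  # ['1/1/2016']
```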

Extending and Customizing

Tips

There are a few unwritten (until now) rules about how the metadata parsers are wired to work:

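  1. Properties are generally defined by XPATH in each parser._data_map
  2. Simple parser properties accept only values of type string or list of strings
  3. XPATHs configured in the data map support references to element attributes: 'path/to/element/@attr'
  4. Complex parser properties are defined by custom parser/updater functions instead of by XPATH
  5. Complex parser properties accept values of type dict containing simple properties, or lists of such dicts
  6. XPATHs in the data map with leading underscores are parsed, but not validated or written out
  7. XPATHs in the data map that "shadow" another property but have a leading underscore serve as secondary values
  8. Secondary values are used in the absence of a primary value, i.e. when the primary location (element or attribute) is missing
  9. Additional underscores indicate further locations to check for missing values, e.g. title, _title, __title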
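Rules 3 and 7 to 9 amount to a lookup chain. The sketch below tries `prop`, then `_prop`, then `__prop`, and so on, and returns the first non-empty value. An XPATH ending in `/@attr` is read as an attribute. `read_xpath` and `resolve_property` are hypothetical helpers built on xml.etree.ElementTree, not the library's parsing code. The sample data map reuses the metadata_language XPATHs from the example further down.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of rules 3 and 7-9: look up `prop`, then `_prop`,
# `__prop`, ... in a data map and return the first non-empty value found.
# Not the library's parsing code.


def read_xpath(root, xpath):
    """Return the stripped text or attribute value at `xpath`, or ''."""
    if '/@' in xpath:
        # 'path/to/element/@attr' refers to an attribute of the element
        element_path, attr = xpath.rsplit('/@', 1)
        element = root.find(element_path)
        value = element.get(attr) if element is not None else None
    else:
        value = root.findtext(xpath)
    return (value or '').strip()


def resolve_property(root, data_map, prop):
    """Try the primary XPATH first, then each underscore-prefixed fallback."""
    key = prop
    while key in data_map:
        value = read_xpath(root, data_map[key])
        if value:
            return value
        key = '_' + key
    return ''


data_map = {
    'metadata_language': 'language/CharacterString',
    '_metadata_language': 'language/LanguageCode/@codeListValue',
}
root = ET.fromstring(
    '<MD_Metadata><language><LanguageCode codeListValue="eng"/></language></MD_Metadata>'
)
print(resolve_property(root, data_map, 'metadata_language'))  # eng
```

The primary location `language/CharacterString` is missing here, so the secondary attribute reference supplies the value.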

Here are some examples of existing secondary properties:

# In the ArcGIS parser for distribution contact phone:

ARCGIS_TAG_FORMATS = frozendict({
    ...
    'dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/cntPhone/voiceNum',
    '_dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/voiceNum',  # If not in cntPhone
    ...
})

# In the FGDC parser for sub-properties in the contacts definition:

FGDC_DEFINITIONS = dict({k: dict(v) for k, v in iteritems(COMPLEX_DEFINITIONS)})
FGDC_DEFINITIONS[CONTACTS].update({
    '_name': '{_name}',
    '_organization': '{_organization}'
})
...
class FgdcParser(MetadataParser):
    ...
    def _init_data_map(self):
        ...
        ct_format = FGDC_TAG_FORMATS[CONTACTS]
        fgdc_data_structures[CONTACTS] = format_xpaths(
            ...
            name=ct_format.format(ct_path='cntperp/cntper'),
            _name=ct_format.format(ct_path='cntorgp/cntper'),  # If not in cntperp
            organization=ct_format.format(ct_path='cntperp/cntorg'),
            _organization=ct_format.format(ct_path='cntorgp/cntorg'),  # If not in cntperp
        )

# Also see the ISO parser for secondary and tertiary sub-properties in the attributes definition:

ISO_DEFINITIONS = dict({k: dict(v) for k, v in iteritems(COMPLEX_DEFINITIONS)})
ISO_DEFINITIONS[ATTRIBUTES].update({
    '_definition_source': '{_definition_src}',
    '__definition_source': '{__definition_src}',
    '___definition_source': '{___definition_src}'
})
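The FGDC excerpt fills a complex definition, whose values are `'{placeholder}'` templates, with concrete XPATHs. Below is a small stand-in for that step. `fill_xpaths` is a hypothetical function written in the spirit of gis_metadata.utils.format_xpaths. The definition is simplified, and `path/to/cntinfo/{ct_path}` is a placeholder base path, not the real FGDC_TAG_FORMATS value.

```python
# Hypothetical stand-in for gis_metadata.utils.format_xpaths: fill each
# '{placeholder}' template in a complex definition with a concrete XPATH.
# The definition and base path below are simplified placeholders.


def fill_xpaths(definition, **xpaths):
    """Return a new dict mapping each sub-property to its formatted XPATH."""
    return {key: template.format(**xpaths) for key, template in definition.items()}


contact_definition = {
    'name': '{name}',
    '_name': '{_name}',
    'organization': '{organization}',
    '_organization': '{_organization}',
}
ct_format = 'path/to/cntinfo/{ct_path}'  # hypothetical base path

contact_xpaths = fill_xpaths(
    contact_definition,
    name=ct_format.format(ct_path='cntperp/cntper'),
    _name=ct_format.format(ct_path='cntorgp/cntper'),          # If not in cntperp
    organization=ct_format.format(ct_path='cntperp/cntorg'),
    _organization=ct_format.format(ct_path='cntorgp/cntorg'),  # If not in cntperp
)
print(contact_xpaths['_name'])  # path/to/cntinfo/cntorgp/cntper
```

Building a new dict leaves the shared definition untouched. The same reasoning explains why the excerpts copy COMPLEX_DEFINITIONS before updating it.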

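Example

Any of the supported parsers can be extended to include more of a standard's supported data. In this example we'll add two new properties to the IsoParser:

  • metadata_language: a simple string field describing the language of the metadata file itself (not the dataset)
  • metadata_contacts: a complex structure that leverages and enhances the existing contacts structure

This example will cover:

  1. Adding a new simple property
  2. Configuring a secondary location for a property
  3. Referencing an element attribute in an XPATH
  4. Adding a new complex property
  5. Customizing a complex property to include a new sub-property

This example is also specifically covered by unit tests.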

from gis_metadata.iso_metadata_parser import IsoParser
from gis_metadata.utils import COMPLEX_DEFINITIONS, CONTACTS, format_xpaths, ParserProperty


class CustomIsoParser(IsoParser):

    def _init_data_map(self):
        super(CustomIsoParser, self)._init_data_map()

        # 1. Basic property: text or list (with secondary location referencing `codeListValue` attribute)

        lang_prop = 'metadata_language'
        self._data_map[lang_prop] = 'language/CharacterString'                    # Parse from here if present
        self._data_map['_' + lang_prop] = 'language/LanguageCode/@codeListValue'  # Otherwise, try from here

        # 2. Complex structure (reuse of contacts structure plus phone)

        # 2.1 Define some basic variables
        ct_prop = 'metadata_contacts'
        ct_xpath = 'contact/CI_ResponsibleParty/{ct_path}'
        ct_definition = dict(COMPLEX_DEFINITIONS[CONTACTS])  # Copy, so the shared definitions are not modified
        ct_definition['phone'] = '{phone}'

        # 2.2 Reuse CONTACT structure to specify locations per prop (adapted from parent to add `phone`)
        self._data_structures[ct_prop] = format_xpaths(
            ct_definition,
            name=ct_xpath.format(ct_path='individualName/CharacterString'),
            organization=ct_xpath.format(ct_path='organisationName/CharacterString'),
            position=ct_xpath.format(ct_path='positionName/CharacterString'),
            phone=ct_xpath.format(
                ct_path='contactInfo/CI_Contact/phone/CI_Telephone/voice/CharacterString'
            ),
            email=ct_xpath.format(
                ct_path='contactInfo/CI_Contact/address/CI_Address/electronicMailAddress/CharacterString'
            )
        )

        # 2.3 Set the contact root to insert new elements at "contact" level given the defined path:
        #   'contact/CI_ResponsibleParty/...'
        # By default we would get multiple "CI_ResponsibleParty" elements under a single "contact"
        # This way we get multiple "contact" elements, each with its own single "CI_ResponsibleParty"
        self._data_map['_{prop}_root'.format(prop=ct_prop)] = 'contact'

        # 2.4 Leverage the default methods for parsing complex properties (or write your own parser/updater)
        self._data_map[ct_prop] = ParserProperty(self._parse_complex_list, self._update_complex_list)

        # 3. And finally, let the parent validation logic know about the two new custom properties

        self._metadata_props.add(lang_prop)
        self._metadata_props.add(ct_prop)


with open(r'/path/to/metadata.xml') as metadata:
    iso_from_file = CustomIsoParser(metadata)

iso_from_file.metadata_language
iso_from_file.metadata_contacts
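A complex list property like metadata_contacts yields one dict per repeated `contact` element, which is why the example sets the root to `contact`. The sketch below illustrates that resulting shape with xml.etree.ElementTree. `parse_complex_list` here is a hypothetical helper that reads XPATHs relative to each `contact` element. It is not the library's `_parse_complex_list`, which works differently in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of parsing a "complex list" property such as
# metadata_contacts into a list of dicts. It is not the library's
# _parse_complex_list; XPATHs here are relative to each `contact` element.
CONTACT_XPATHS = {
    'name': 'CI_ResponsibleParty/individualName/CharacterString',
    'organization': 'CI_ResponsibleParty/organisationName/CharacterString',
    'position': 'CI_ResponsibleParty/positionName/CharacterString',
    'phone': 'CI_ResponsibleParty/contactInfo/CI_Contact/phone/CI_Telephone/voice/CharacterString',
    'email': ('CI_ResponsibleParty/contactInfo/CI_Contact/address/CI_Address/'
              'electronicMailAddress/CharacterString'),
}


def parse_complex_list(root, root_xpath, sub_xpaths):
    """Build one dict of simple values per element matching `root_xpath`."""
    results = []
    for element in root.findall(root_xpath):
        item = {
            key: (element.findtext(xpath) or '').strip()
            for key, xpath in sub_xpaths.items()
        }
        if any(item.values()):  # skip entries with no values at all
            results.append(item)
    return results


xml_text = (
    '<MD_Metadata>'
    '<contact><CI_ResponsibleParty>'
    '<individualName><CharacterString>Jane Doe</CharacterString></individualName>'
    '</CI_ResponsibleParty></contact>'
    '<contact><CI_ResponsibleParty>'
    '<organisationName><CharacterString>Example Org</CharacterString></organisationName>'
    '</CI_ResponsibleParty></contact>'
    '</MD_Metadata>'
)
contacts = parse_complex_list(ET.fromstring(xml_text), 'contact', CONTACT_XPATHS)
print(len(contacts))  # 2
```

Each dict contains every sub-property key, including the new `phone`. Missing values come back as empty strings.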

Download files

Source distribution

gis-metadata-parser-2.0.1.tar.gz (52.4 kB)

Built distribution

gis_metadata_parser-2.0.1-py3-none-any.whl (57.6 kB, Python 3)
