
Microsoft Azure AI Document Intelligence client library for Python

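Azure AI Document Intelligence (previously known as Form Recognizer) is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:

  • Layout - Extract content and structure (e.g. words, selection marks, tables) from documents.
  • Document - Analyze key-value pairs in addition to general layout from documents.
  • Read - Read page information from documents.
  • Prebuilt - Extract common field values from select document types (e.g. receipts, invoices, business cards, ID documents, U.S. W-2 tax documents, among others) using prebuilt models.
  • Custom - Build custom models from your own data to extract tailored field values in addition to general layout from documents.
  • Classifiers - Build custom classification models that combine layout and language features to accurately detect and identify the documents you process within your application.
  • Add-on capabilities - Extract barcodes/QR codes, formulas, font/style, etc., or enable high resolution mode for large documents with optional parameters.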

Source code | Package (PyPI) | API reference documentation | Product documentation | Samples

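Disclaimer

The latest service API is currently available only in some Azure regions; the available regions can be found here.

Getting started

Installing the package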

python -m pip install azure-ai-documentintelligence

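This table shows the relationship between SDK versions and supported API service versions:

SDK version    Supported API service version
1.0.0b1        2023-10-31-preview
1.0.0b2        2024-02-29-preview

Older API versions are supported in azure-ai-formrecognizer; please see the Migration Guide for detailed instructions on how to update your application.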

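Prerequisites

  • Python 3.8 or later is required to use this package.
  • You need an Azure subscription to use this package.
  • An existing Azure AI Document Intelligence instance.

Create a Cognitive Services or Document Intelligence resource

Document Intelligence supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Document Intelligence access only, create a Document Intelligence resource. Please note that you will need a single-service resource if you intend to use Azure Active Directory authentication.

You can create either resource using the Azure portal or the Azure CLI.

Below is an example of how you can create a Document Intelligence resource using the CLI: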

# Create a new resource group to hold the Document Intelligence resource
# if using an existing resource group, skip this step
az group create --name <your-resource-group-name> --location <location>
# Create the Document Intelligence resource
az cognitiveservices account create \
    --name <your-resource-name> \
    --resource-group <your-resource-group-name> \
    --kind FormRecognizer \
    --sku <sku> \
    --location <location> \
    --yes

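For more information about creating the resource or how to get the location and SKU information, see here.

Authenticate the client

In order to interact with the Document Intelligence service, you will need to create an instance of a client. An endpoint and a credential are necessary to instantiate the client object.

Get the endpoint

You can find the endpoint for your Document Intelligence resource using the Azure portal or the Azure CLI: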

# Get the endpoint for the Document Intelligence resource
az cognitiveservices account show --name "resource-name" --resource-group "resource-group-name" --query "properties.endpoint"

Either a regional endpoint or a custom subdomain can be used for authentication. They are formatted as follows:

Regional endpoint: https://<region>.api.cognitive.microsoft.com/
Custom subdomain: https://<resource-name>.cognitiveservices.azure.com/

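A regional endpoint is the same for every resource in a region. A complete list of supported regional endpoints can be consulted here. Please note that regional endpoints do not support AAD authentication.

A custom subdomain, on the other hand, is a name that is unique to the Document Intelligence resource. Custom subdomains can only be used by single-service resources.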
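
As an illustration of the two endpoint shapes above, the following sketch classifies an endpoint string by its host name. The helper names endpoint_kind and supports_aad are hypothetical and not part of the SDK; they only pattern-match the two formats shown above and assume nothing else about the service.

```python
from urllib.parse import urlparse

# Host suffixes taken from the two endpoint formats shown above.
REGIONAL_SUFFIX = ".api.cognitive.microsoft.com"
CUSTOM_SUFFIX = ".cognitiveservices.azure.com"


def endpoint_kind(endpoint: str) -> str:
    """Return "regional", "custom-subdomain", or "unknown" for an endpoint URL.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    host = (urlparse(endpoint).hostname or "").lower()
    if host.endswith(REGIONAL_SUFFIX):
        return "regional"
    if host.endswith(CUSTOM_SUFFIX):
        return "custom-subdomain"
    return "unknown"


def supports_aad(endpoint: str) -> bool:
    """Per the note above, regional endpoints do not support AAD authentication."""
    return endpoint_kind(endpoint) == "custom-subdomain"


print(endpoint_kind("https://westus2.api.cognitive.microsoft.com/"))
print(supports_aad("https://my-resource.cognitiveservices.azure.com/"))
```

A check like this could run before choosing between key-based and AAD credentials, so that a regional endpoint is not paired with a token credential by mistake.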

Get the API key

The API key can be found in the Azure portal or by running the following Azure CLI command:

az cognitiveservices account keys list --name "<resource-name>" --resource-group "<resource-group-name>"

Create the client with AzureKeyCredential

To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_intelligence_client = DocumentIntelligenceClient(endpoint, credential)

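Create the client with an Azure Active Directory credential

AzureKeyCredential authentication is used in the examples in this getting started guide, but you can also authenticate with Azure Active Directory using the azure-identity library. Note that regional endpoints do not support AAD authentication. Create a custom subdomain name for your resource in order to use this type of authentication.

To use the DefaultAzureCredential type shown below, or other credential types provided with the Azure SDK, install the azure-identity package:

pip install azure-identity

You will also need to register a new AAD application and grant access to Document Intelligence by assigning the "Cognitive Services User" role to your service principal.

Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.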

"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
credential = DefaultAzureCredential()

document_intelligence_client = DocumentIntelligenceClient(endpoint, credential)
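
When relying on the three service principal variables, it can help to confirm they are set before constructing the credential. The sketch below uses only the standard library; missing_credential_vars is a hypothetical helper name, not an azure-identity API. Note that DefaultAzureCredential may also pick up other credential sources, so this check is relevant only when the environment variables are the intended mechanism.

```python
import os

# The three variable names listed above.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credential_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All service principal variables are set.")
```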

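Key concepts

DocumentIntelligenceClient

DocumentIntelligenceClient provides operations for analyzing input documents using prebuilt and custom models through the begin_analyze_document API. Use the model_id parameter to select the type of model for analysis. See a full list of supported models here.

DocumentIntelligenceClient also provides operations for classifying documents through the begin_classify_document API. Custom classification models can classify each page in an input file to identify the documents within, and can also identify multiple documents or multiple instances of a single document within an input file.

Sample code snippets are provided to illustrate using a DocumentIntelligenceClient here. More information about analyzing documents, including supported features, locales, and document types, can be found in the service documentation.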

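DocumentIntelligenceAdministrationClient

DocumentIntelligenceAdministrationClient provides operations for:

  • Building custom models to analyze the specific fields you specify. A DocumentModelDetails is returned indicating the document types the model can analyze, as well as the estimated confidence for each field. See the service documentation for a more detailed explanation.
  • Creating a composed model from a collection of existing models.
  • Managing models created in your account.
  • Listing operations or getting a specific model operation created within the last 24 hours.
  • Copying a custom model from one Document Intelligence resource to another.
  • Building and managing a custom classification model to classify the documents you process within your application.

Please note that models can also be built using a graphical user interface such as Document Intelligence Studio.

Sample code snippets are provided to illustrate using a DocumentIntelligenceAdministrationClient here.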

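Long-running operations

Long-running operations consist of an initial request sent to the service to start an operation, followed by polling the service at intervals to determine whether the operation has completed or failed and, if it has succeeded, to get the result.

Methods that analyze documents, build models, or copy/compose models are modeled as long-running operations. The client exposes a begin_<method-name> method that returns an LROPoller or AsyncLROPoller. Callers should wait for the operation to complete by calling result() on the poller object returned from the begin_<method-name> method. Sample code snippets are provided below to illustrate using long-running operations.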
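
The pattern described above can be sketched without the SDK. The ToyPoller class and begin_toy_operation function below are hypothetical stand-ins, not the azure-core implementation; they only mimic the shape of a begin_ method that returns a poller whose result() blocks until the operation reaches a terminal state.

```python
import time


class ToyPoller:
    """Hypothetical stand-in for a poller object; not the azure-core LROPoller."""

    def __init__(self, check_status, interval=0.01):
        self._check_status = check_status  # callable returning (status, value)
        self._interval = interval
        self._status = "running"
        self._value = None

    def status(self):
        return self._status

    def done(self):
        return self._status in ("succeeded", "failed")

    def result(self):
        # Poll at a fixed interval until the operation reaches a terminal state.
        while not self.done():
            self._status, self._value = self._check_status()
            if not self.done():
                time.sleep(self._interval)
        if self._status == "failed":
            raise RuntimeError("operation failed")
        return self._value


def begin_toy_operation(polls_needed=3):
    """Start a fake operation that succeeds after `polls_needed` status checks."""
    state = {"polls": 0}

    def check_status():
        state["polls"] += 1
        if state["polls"] >= polls_needed:
            return "succeeded", {"polls": state["polls"]}
        return "running", None

    return ToyPoller(check_status)


poller = begin_toy_operation()
print(poller.status())  # the fake operation has not been polled yet
print(poller.result())  # blocks until the fake operation completes
```

The real pollers follow the same calling convention: keep the object returned by the begin_ method and call result() when the outcome is needed.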

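Examples

The following sections provide several code snippets covering some of the most common Document Intelligence tasks.

Extract layout

Extract text, selection marks, text styles, and table structures, along with their bounding region coordinates, from documents.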

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult

def _in_span(word, spans):
    for span in spans:
        if word.span.offset >= span.offset and (word.span.offset + word.span.length) <= (span.offset + span.length):
            return True
    return False

def _format_polygon(polygon):
    if not polygon:
        return "N/A"
    return ", ".join([f"[{polygon[i]}, {polygon[i + 1]}]" for i in range(0, len(polygon), 2)])

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
    )
result: AnalyzeResult = poller.result()

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = []
            if page.words:
                for word in page.words:
                    print(f"......Word '{word.content}' has a confidence of {word.confidence}")
                    if _in_span(word, line.spans):
                        words.append(word)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{_format_polygon(line.polygon)}'"
            )

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{_format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
            )

if result.paragraphs:
    print(f"----Detected #{len(result.paragraphs)} paragraphs in the document----")
    # Sort all paragraphs by span's offset to read in the right order.
    result.paragraphs.sort(key=lambda p: (p.spans.sort(key=lambda s: s.offset), p.spans[0].offset))
    print("-----Print sorted paragraphs-----")
    for paragraph in result.paragraphs:
        if not paragraph.bounding_regions:
            print(f"Found paragraph with role: '{paragraph.role}' within N/A bounding region")
        else:
            print(f"Found paragraph with role: '{paragraph.role}' within")
            print(
                ", ".join(
                    f" Page #{region.page_number}: {_format_polygon(region.polygon)} bounding region"
                    for region in paragraph.bounding_regions
                )
            )
        print(f"...with content: '{paragraph.content}'")
        print(f"...with offset: {paragraph.spans[0].offset} and length: {paragraph.spans[0].length}")

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(
                    f"Table # {table_idx} location on page: {region.page_number} is {_format_polygon(region.polygon)}"
                )
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(
                        f"...content on page {region.page_number} is within bounding polygon '{_format_polygon(region.polygon)}'"
                    )

print("----------------------------------------")
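
The two helpers in the sample above work on plain numbers, so their behavior can be checked without calling the service. The sketch below re-implements them against hypothetical stand-in data: Span is an illustrative named tuple (the SDK returns its own span objects), and the polygon is assumed to be a flat list of alternating x and y coordinates, as the pairing in _format_polygon implies.

```python
from collections import namedtuple

# Illustrative stand-in for the SDK's span objects (an offset and a length).
Span = namedtuple("Span", ["offset", "length"])


def span_contains(outer, inner):
    """Return True when `inner` lies entirely within `outer`."""
    return (
        inner.offset >= outer.offset
        and inner.offset + inner.length <= outer.offset + outer.length
    )


def format_polygon(polygon):
    """Group a flat [x1, y1, x2, y2, ...] list into "[x, y]" pairs."""
    if not polygon:
        return "N/A"
    return ", ".join(
        f"[{polygon[i]}, {polygon[i + 1]}]" for i in range(0, len(polygon), 2)
    )


line_span = Span(offset=10, length=12)  # covers offsets 10 through 21
word_span = Span(offset=15, length=5)   # covers offsets 15 through 19
print(span_contains(line_span, word_span))
print(format_polygon([0, 0, 4, 0, 4, 2, 0, 2]))
```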

Extract figures from documents

Extract figures from the document as cropped images.

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        output=[AnalyzeOutputOption.FIGURES],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()
operation_id = poller.details["operation_id"]

if result.figures:
    for figure in result.figures:
        if figure.id:
            response = document_intelligence_client.get_analyze_result_figure(
                model_id=result.model_id, result_id=operation_id, figure_id=figure.id
            )
            with open(f"{figure.id}.png", "wb") as writer:
                writer.writelines(response)
else:
    print("No figures found.")

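Analyze documents with PDF output

Convert an analog PDF into a PDF with embedded text. Such text can enable text search within the PDF or allow the PDF to be used in LLM chat scenarios.

Note: For now, this feature is only supported by prebuilt-read. All other models will return an error.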

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-read",
        analyze_request=f,
        output=[AnalyzeOutputOption.PDF],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()
operation_id = poller.details["operation_id"]

response = document_intelligence_client.get_analyze_result_pdf(model_id=result.model_id, result_id=operation_id)
with open("analyze_result.pdf", "wb") as writer:
    writer.writelines(response)

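Using the general document model

Analyze key-value pairs, tables, styles, and selection marks from documents using the general document model provided by the Document Intelligence service. Select the general document model by passing model_id="prebuilt-document" into the begin_analyze_document method. Note that the snippet below instead passes "prebuilt-layout" with the KEY_VALUE_PAIRS feature enabled.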

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import DocumentAnalysisFeature, AnalyzeResult

def _in_span(word, spans):
    for span in spans:
        if word.span.offset >= span.offset and (word.span.offset + word.span.length) <= (span.offset + span.length):
            return True
    return False

def _format_bounding_region(bounding_regions):
    if not bounding_regions:
        return "N/A"
    return ", ".join(
        f"Page #{region.page_number}: {_format_polygon(region.polygon)}" for region in bounding_regions
    )

def _format_polygon(polygon):
    if not polygon:
        return "N/A"
    return ", ".join([f"[{polygon[i]}, {polygon[i + 1]}]" for i in range(0, len(polygon), 2)])

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        features=[DocumentAnalysisFeature.KEY_VALUE_PAIRS],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()

if result.styles:
    for style in result.styles:
        if style.is_handwritten:
            print("Document contains handwritten content: ")
            print(",".join([result.content[span.offset : span.offset + span.length] for span in style.spans]))

print("----Key-value pairs found in document----")
if result.key_value_pairs:
    for kv_pair in result.key_value_pairs:
        if kv_pair.key:
            print(
                f"Key '{kv_pair.key.content}' found within "
                f"'{_format_bounding_region(kv_pair.key.bounding_regions)}' bounding regions"
            )
        if kv_pair.value:
            print(
                f"Value '{kv_pair.value.content}' found within "
                f"'{_format_bounding_region(kv_pair.value.bounding_regions)}' bounding regions\n"
            )

for page in result.pages:
    print(f"----Analyzing document from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = []
            if page.words:
                for word in page.words:
                    print(f"......Word '{word.content}' has a confidence of {word.confidence}")
                    if _in_span(word, line.spans):
                        words.append(word)
            print(
                f"...Line #{line_idx} has {len(words)} words and text '{line.content}' within "
                f"bounding polygon '{_format_polygon(line.polygon)}'"
            )

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{_format_polygon(selection_mark.polygon)}' and has a confidence of "
                f"{selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and {table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(
                    f"Table # {table_idx} location on page: {region.page_number} is {_format_polygon(region.polygon)}"
                )
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(
                        f"...content on page {region.page_number} is within bounding polygon '{_format_polygon(region.polygon)}'\n"
                    )
print("----------------------------------------")
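
Read more about the features provided by the prebuilt-document model here.

Using prebuilt models

Extract fields from select document types such as receipts, invoices, business cards, identity documents, and U.S. W-2 tax documents using prebuilt models provided by the Document Intelligence service.

For example, to analyze fields from a sales receipt, use the prebuilt receipt model by passing model_id="prebuilt-receipt" into the begin_analyze_document method: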

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult

def _format_price(price_dict):
    return "".join([f"{p}" for p in price_dict.values()])

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-receipt", analyze_request=f, locale="en-US", content_type="application/octet-stream"
    )
receipts: AnalyzeResult = poller.result()

if receipts.documents:
    for idx, receipt in enumerate(receipts.documents):
        print(f"--------Analysis of receipt #{idx + 1}--------")
        print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
        if receipt.fields:
            merchant_name = receipt.fields.get("MerchantName")
            if merchant_name:
                print(
                    f"Merchant Name: {merchant_name.get('valueString')} has confidence: "
                    f"{merchant_name.confidence}"
                )
            transaction_date = receipt.fields.get("TransactionDate")
            if transaction_date:
                print(
                    f"Transaction Date: {transaction_date.get('valueDate')} has confidence: "
                    f"{transaction_date.confidence}"
                )
            items = receipt.fields.get("Items")
            if items:
                print("Receipt items:")
                for idx, item in enumerate(items.get("valueArray")):
                    print(f"...Item #{idx + 1}")
                    item_description = item.get("valueObject").get("Description")
                    if item_description:
                        print(
                            f"......Item Description: {item_description.get('valueString')} has confidence: "
                            f"{item_description.confidence}"
                        )
                    item_quantity = item.get("valueObject").get("Quantity")
                    if item_quantity:
                        print(
                            f"......Item Quantity: {item_quantity.get('valueString')} has confidence: "
                            f"{item_quantity.confidence}"
                        )
                    item_total_price = item.get("valueObject").get("TotalPrice")
                    if item_total_price:
                        print(
                            f"......Total Item Price: {_format_price(item_total_price.get('valueCurrency'))} has confidence: "
                            f"{item_total_price.confidence}"
                        )
            subtotal = receipt.fields.get("Subtotal")
            if subtotal:
                print(
                    f"Subtotal: {_format_price(subtotal.get('valueCurrency'))} has confidence: {subtotal.confidence}"
                )
            tax = receipt.fields.get("TotalTax")
            if tax:
                print(f"Total tax: {_format_price(tax.get('valueCurrency'))} has confidence: {tax.confidence}")
            tip = receipt.fields.get("Tip")
            if tip:
                print(f"Tip: {_format_price(tip.get('valueCurrency'))} has confidence: {tip.confidence}")
            total = receipt.fields.get("Total")
            if total:
                print(f"Total: {_format_price(total.get('valueCurrency'))} has confidence: {total.confidence}")
        print("--------------------------------------")
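
The field-access pattern in the receipt sample (look up a field, skip it if absent, then read its typed value) can be tried on plain dictionaries. The payload below is hypothetical: real results come from poller.result(), and the inner keys of "valueCurrency" are an assumption made only for this example. The outer keys ("valueString", "valueArray", "valueObject", "valueCurrency") are the ones used in the sample above.

```python
def format_price(price_dict):
    """Concatenate the parts of a currency value, as _format_price does above."""
    return "".join(str(part) for part in price_dict.values())


# Hypothetical payload for illustration; not actual service output.
sample_fields = {
    "MerchantName": {"valueString": "Contoso", "confidence": 0.98},
    "Items": {
        "valueArray": [
            {
                "valueObject": {
                    "Description": {"valueString": "Coffee", "confidence": 0.95},
                    "TotalPrice": {
                        "valueCurrency": {"currencySymbol": "$", "amount": 3.5},
                        "confidence": 0.9,
                    },
                }
            },
        ]
    },
    "Total": {"valueCurrency": {"currencySymbol": "$", "amount": 3.5}, "confidence": 0.97},
}


def summarize(fields):
    """Collect printable lines, skipping any field that is absent."""
    lines = []
    name = fields.get("MerchantName")
    if name:
        lines.append(f"Merchant: {name.get('valueString')}")
    for item in (fields.get("Items") or {}).get("valueArray", []):
        obj = item.get("valueObject", {})
        desc = obj.get("Description", {}).get("valueString", "N/A")
        price = obj.get("TotalPrice")
        price_text = format_price(price["valueCurrency"]) if price else "N/A"
        lines.append(f"Item: {desc} {price_text}")
    total = fields.get("Total")
    if total:
        lines.append(f"Total: {format_price(total['valueCurrency'])}")
    return lines


for line in summarize(sample_fields):
    print(line)
```

Guarding every lookup matters because a prebuilt model returns only the fields it actually found on a given document.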

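You are not limited to receipts! There are a few prebuilt models to choose from, each of which has its own set of supported fields. See other supported prebuilt models here.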

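Build a custom model

Build a custom model on your own document type. The resulting model can be used to analyze values from the types of documents it was trained on. Provide a container SAS URL to the Azure Storage Blob container where you store the training documents.

More details on setting up a container and the required file structure can be found in the service documentation.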

# Let's build a model to use for this sample
import os
import uuid
from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    DocumentBuildMode,
    BuildDocumentModelRequest,
    AzureBlobContentSource,
    DocumentModelDetails,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["DOCUMENTINTELLIGENCE_STORAGE_CONTAINER_SAS_URL"]

document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(endpoint, AzureKeyCredential(key))
poller = document_intelligence_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model: DocumentModelDetails = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
if model.doc_types:
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:")
        if doc_type.field_schema:
            for field_name, field in doc_type.field_schema.items():
                if doc_type.field_confidence:
                    print(
                        f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                        f"{doc_type.field_confidence[field_name]}"
                    )

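Analyze documents using a custom model

Analyze document fields, tables, selection marks, and more. These models are trained with your own data, so they are tailored to your documents. For best results, you should only analyze documents of the same document type that the custom model was built with.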

import os
from collections import Counter

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult

def _print_table(header_names, table_data):
    # Print a two-dimensional array like a table.
    max_len_list = []
    for i in range(len(header_names)):
        col_values = list(map(lambda row: len(str(row[i])), table_data))
        col_values.append(len(str(header_names[i])))
        max_len_list.append(max(col_values))

    row_format_str = "".join(f"{{:<{width + 4}}}" for width in max_len_list)

    print(row_format_str.format(*header_names))
    for row in table_data:
        print(row_format_str.format(*row))

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Make sure your document's type is included in the list of document types the custom model can analyze
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        model_id=model_id, analyze_request=f, content_type="application/octet-stream"
    )
result: AnalyzeResult = poller.result()

if result.documents:
    for idx, document in enumerate(result.documents):
        print(f"--------Analyzing document #{idx + 1}--------")
        print(f"Document has type {document.doc_type}")
        print(f"Document has document type confidence {document.confidence}")
        print(f"Document was analyzed with model with ID {result.model_id}")
        if document.fields:
            for name, field in document.fields.items():
                field_value = field.get("valueString") if field.get("valueString") else field.content
                print(
                    f"......found field of type '{field.type}' with value '{field_value}' and with confidence {field.confidence}"
                )

    # Extract table cell values
    SYMBOL_OF_TABLE_TYPE = "array"
    SYMBOL_OF_OBJECT_TYPE = "object"
    KEY_OF_VALUE_OBJECT = "valueObject"
    KEY_OF_CELL_CONTENT = "content"

    for doc in result.documents:
        if doc.fields is not None:
            for field_name, field_value in doc.fields.items():
                # Dynamic Table cell information store as array in document field.
                if field_value.type == SYMBOL_OF_TABLE_TYPE and field_value.value_array:
                    col_names = []
                    sample_obj = field_value.value_array[0]
                    if KEY_OF_VALUE_OBJECT in sample_obj:
                        col_names = list(sample_obj[KEY_OF_VALUE_OBJECT].keys())
                    print("----Extracting Dynamic Table Cell Values----")
                    table_rows = []
                    for obj in field_value.value_array:
                        if KEY_OF_VALUE_OBJECT in obj:
                            value_obj = obj[KEY_OF_VALUE_OBJECT]
                            extract_value_by_col_name = lambda key: (
                                value_obj[key].get(KEY_OF_CELL_CONTENT)
                                if key in value_obj and KEY_OF_CELL_CONTENT in value_obj[key]
                                else "None"
                            )
                            row_data = list(map(extract_value_by_col_name, col_names))
                            table_rows.append(row_data)
                    _print_table(col_names, table_rows)

                elif (
                    field_value.type == SYMBOL_OF_OBJECT_TYPE
                    and KEY_OF_VALUE_OBJECT in field_value
                    and field_value[KEY_OF_VALUE_OBJECT] is not None
                ):
                    rows_by_columns = list(field_value[KEY_OF_VALUE_OBJECT].values())
                    is_fixed_table = all(
                        (
                            rows_of_column["type"] == SYMBOL_OF_OBJECT_TYPE
                            and Counter(list(rows_by_columns[0][KEY_OF_VALUE_OBJECT].keys()))
                            == Counter(list(rows_of_column[KEY_OF_VALUE_OBJECT].keys()))
                        )
                        for rows_of_column in rows_by_columns
                    )

                    # Fixed Table cell information store as object in document field.
                    if is_fixed_table:
                        print("----Extracting Fixed Table Cell Values----")
                        col_names = list(field_value[KEY_OF_VALUE_OBJECT].keys())
                        row_dict: dict = {}
                        for rows_of_column in rows_by_columns:
                            rows = rows_of_column[KEY_OF_VALUE_OBJECT]
                            for row_key in list(rows.keys()):
                                if row_key in row_dict:
                                    row_dict[row_key].append(rows[row_key].get(KEY_OF_CELL_CONTENT))
                                else:
                                    row_dict[row_key] = [
                                        row_key,
                                        rows[row_key].get(KEY_OF_CELL_CONTENT),
                                    ]

                        col_names.insert(0, "")
                        _print_table(col_names, list(row_dict.values()))

print("------------------------------------")
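
The column-width logic in _print_table above can be tried on its own. The sketch below is a lightly adapted variant that returns the formatted lines instead of printing them; the function name format_table, the padding parameter, and the sample rows are made up for illustration.

```python
def format_table(header_names, table_data, padding=4):
    """Left-align each column to its widest cell plus `padding` spaces."""
    widths = []
    for i, header in enumerate(header_names):
        cell_lengths = [len(str(row[i])) for row in table_data]
        widths.append(max(cell_lengths + [len(str(header))]))
    # Build one format string such as "{:<10}{:<7}" and reuse it for every row.
    row_format = "".join(f"{{:<{width + padding}}}" for width in widths)
    lines = [row_format.format(*header_names)]
    lines.extend(row_format.format(*row) for row in table_data)
    return lines


rows = [["Widget", "2"], ["Gadget", "10"]]
for line in format_table(["Item", "Qty"], rows):
    print(line)
```

Returning the lines rather than printing them makes the formatting easy to verify separately from the extraction code.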

Alternatively, a document URL can also be used to analyze documents using the begin_analyze_document method.

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/receipt/contoso-receipt.png"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-receipt", AnalyzeDocumentRequest(url_source=url)
)
receipts: AnalyzeResult = poller.result()

Manage your models

Manage the custom models attached to your account.

# Let's build a model to use for this sample
import os
import uuid
from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    DocumentBuildMode,
    BuildDocumentModelRequest,
    AzureBlobContentSource,
    DocumentModelDetails,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["DOCUMENTINTELLIGENCE_STORAGE_CONTAINER_SAS_URL"]

document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(endpoint, AzureKeyCredential(key))
poller = document_intelligence_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model: DocumentModelDetails = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
if model.doc_types:
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:")
        if doc_type.field_schema:
            for field_name, field in doc_type.field_schema.items():
                if doc_type.field_confidence:
                    print(
                        f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                        f"{doc_type.field_confidence[field_name]}"
                    )
account_details = document_intelligence_admin_client.get_resource_info()
print(
    f"Our resource has {account_details.custom_document_models.count} custom models, "
    f"and we can have at most {account_details.custom_document_models.limit} custom models"
)
# Next, we get a paged list of all of our custom models
models = document_intelligence_admin_client.list_models()

print("We have the following 'ready' models with IDs and descriptions:")
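# Use a separate loop variable so that `model` still refers to the model built above.
for listed_model in models:
    print(f"{listed_model.model_id} | {listed_model.description}")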
my_model = document_intelligence_admin_client.get_model(model_id=model.model_id)
print(f"\nModel ID: {my_model.model_id}")
print(f"Description: {my_model.description}")
print(f"Model created on: {my_model.created_date_time}")
print(f"Model expires on: {my_model.expiration_date_time}")
if my_model.warnings:
    print("Warnings encountered while building the model:")
    for warning in my_model.warnings:
        print(f"warning code: {warning.code}, message: {warning.message}, target of the error: {warning.target}")
# Finally, we will delete this model by ID
document_intelligence_admin_client.delete_model(model_id=my_model.model_id)

from azure.core.exceptions import ResourceNotFoundError

try:
    document_intelligence_admin_client.get_model(model_id=my_model.model_id)
except ResourceNotFoundError:
    print(f"Successfully deleted model with ID {my_model.model_id}")

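Add-on capabilities

Document Intelligence supports more sophisticated analysis capabilities. These optional features can be enabled or disabled depending on the document extraction scenario.

The add-on capabilities available in this SDK include barcode/QR code extraction, formula extraction, font/style detection, and high resolution mode for large documents.

Note that some add-on capabilities may incur additional charges. See pricing: https://azure.microsoft.com/pricing/details/ai-document-intelligence/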

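Troubleshooting

General

The Document Intelligence client library will raise exceptions defined in Azure Core. Error codes and messages raised by the Document Intelligence service can be found in the service documentation.

Logging

This library uses the standard logging library for logging.

Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.

Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per-operation with the logging_enable keyword argument.

See the full SDK logging documentation with examples here.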
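
A minimal sketch of the logger setup, using only the standard logging module. It assumes the Azure SDK libraries log under the "azure" logger namespace; the client calls appear only as comments because they need a live resource.

```python
import logging
import sys

# Azure SDK libraries are assumed here to log under the "azure" logger namespace.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

# Send log records to stdout with a timestamped format.
handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# With the logger configured, DEBUG-level request/response logging is switched
# on with the logging_enable keyword described above, for example:
#   DocumentIntelligenceClient(endpoint, credential, logging_enable=True)
#   client.begin_analyze_document(..., logging_enable=True)
```

Because DEBUG output includes unredacted headers, it is best limited to local troubleshooting sessions.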

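Optional configuration

Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes available configurations for retries, logging, transport protocols, and more.

Next steps

More sample code

See the Sample README for several code snippets illustrating common patterns used in the Document Intelligence Python API.

Additional documentation

For more extensive documentation on Azure AI Document Intelligence, see the Document Intelligence documentation on docs.microsoft.com.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.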
