Microsoft Azure AI Document Intelligence client library for Python
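Azure AI Document Intelligence (previously known as Form Recognizer) is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:
- Layout - Extract content and structure (e.g. words, selection marks, tables) from documents.
- Document - Analyze key-value pairs in addition to general layout from documents.
- Read - Read page information from documents.
- Prebuilt - Extract common field values from select document types (e.g. receipts, invoices, business cards, ID documents, U.S. W-2 tax documents, among others) using prebuilt models.
- Custom - Build custom models from your own data to extract tailored field values in addition to general layout from documents.
- Classifiers - Build custom classification models that combine layout and language features to accurately detect and identify the documents you process within your application.
- Add-on capabilities - Extract barcodes/QR codes, formulas, font/style, etc., or enable high-resolution mode for large documents with optional parameters.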
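Source code | Package (PyPI) | API reference documentation | Product documentation | Samples
Disclaimer
The latest service API is currently available only in some Azure regions; see the product documentation for the list of available regions.
Getting started
Install the package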
python -m pip install azure-ai-documentintelligence
This table shows the relationship between SDK versions and supported API service versions:
SDK version | Supported API service version |
---|---|
1.0.0b1 | 2023-10-31-preview |
1.0.0b2 | 2024-02-29-preview |
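Older API versions are supported in azure-ai-formrecognizer; see the Migration Guide for detailed instructions on how to update your application.
Prerequisites
- Python 3.8 or later is required to use this package.
- You need an Azure subscription to use this package.
- An existing Azure AI Document Intelligence instance.
Create a Cognitive Services or Document Intelligence resource
Document Intelligence supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Document Intelligence access only, create a Document Intelligence resource. Please note that you will need a single-service resource if you intend to use Azure Active Directory authentication.
You can create either resource using the Azure Portal or the Azure CLI.
Below is an example of how you can create a Document Intelligence resource using the CLI: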
# Create a new resource group to hold the Document Intelligence resource
# if using an existing resource group, skip this step
az group create --name <your-resource-group-name> --location <location>
# Create the Document Intelligence resource
az cognitiveservices account create \
    --name <your-resource-name> \
    --resource-group <your-resource-group-name> \
    --kind FormRecognizer \
    --sku <sku> \
    --location <location> \
    --yes
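For more information about creating the resource or how to get the location and SKU information, see the service documentation.
Authenticate the client
In order to interact with the Document Intelligence service, you will need to create an instance of a client. An endpoint and a credential are necessary to instantiate the client object.
Get the endpoint
You can find the endpoint for your Document Intelligence resource using the Azure Portal or the Azure CLI: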
# Get the endpoint for the Document Intelligence resource
az cognitiveservices account show --name "resource-name" --resource-group "resource-group-name" --query "properties.endpoint"
Either a regional endpoint or a custom subdomain can be used for authentication. They are formatted as follows:
Regional endpoint: https://<region>.api.cognitive.microsoft.com/
Custom subdomain: https://<resource-name>.cognitiveservices.azure.com/
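A regional endpoint is the same for every resource in a region. The complete list of supported regional endpoints is available in the service documentation. Please note that regional endpoints do not support AAD authentication.
A custom subdomain, on the other hand, is a name that is unique to the Document Intelligence resource. Custom subdomains can only be used by single-service resources.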
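The two endpoint formats can be told apart by their host name. The following is a minimal sketch, not part of the SDK: `classify_endpoint` and `supports_aad` are hypothetical helper names, and the host suffixes are taken from the two formats shown above (other clouds may use different suffixes).

```python
from urllib.parse import urlparse

# Host suffixes taken from the two endpoint formats shown above (assumption:
# public Azure cloud only).
REGIONAL_SUFFIX = ".api.cognitive.microsoft.com"
CUSTOM_SUFFIX = ".cognitiveservices.azure.com"


def classify_endpoint(endpoint: str) -> str:
    """Return 'regional', 'custom-subdomain', or 'unknown' for an endpoint URL."""
    host = (urlparse(endpoint).hostname or "").lower()
    if host.endswith(REGIONAL_SUFFIX):
        return "regional"
    if host.endswith(CUSTOM_SUFFIX):
        return "custom-subdomain"
    return "unknown"


def supports_aad(endpoint: str) -> bool:
    """Per the text above, only custom subdomains support AAD authentication."""
    return classify_endpoint(endpoint) == "custom-subdomain"


print(classify_endpoint("https://westus2.api.cognitive.microsoft.com/"))
print(supports_aad("https://my-resource.cognitiveservices.azure.com/"))
```

A check like this can be run before choosing between key-based and AAD authentication.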
Get the API key
The API key can be found in the Azure Portal or by running the following Azure CLI command:
az cognitiveservices account keys list --name "<resource-name>" --resource-group "<resource-group-name>"
Create the client with AzureKeyCredential
To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_intelligence_client = DocumentIntelligenceClient(endpoint, credential)
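Create the client with an Azure Active Directory credential
The examples in this getting started guide use AzureKeyCredential authentication, but you can also authenticate with Azure Active Directory using the azure-identity library. Note that regional endpoints do not support AAD authentication. Create a custom subdomain name for your resource in order to use this type of authentication.
To use the DefaultAzureCredential type shown below, or other credential types provided with the Azure SDK, install the azure-identity package: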
pip install azure-identity
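You will also need to register a new AAD application and grant access to Document Intelligence by assigning the "Cognitive Services User" role to your service principal.
Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.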
"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.identity import DefaultAzureCredential
endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
credential = DefaultAzureCredential()
document_intelligence_client = DocumentIntelligenceClient(endpoint, credential)
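Before constructing the credential, it can help to confirm that the three environment variables named above are actually set. The sketch below uses only the standard library; `missing_credential_vars` is a hypothetical helper name, not an SDK function.

```python
import os

# Environment variable names taken from the text above.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credential_vars()
if missing:
    print(f"Not set: {', '.join(missing)}")
else:
    print("All service principal variables are set.")
```

Note that DefaultAzureCredential also tries other credential sources, so a missing variable is not necessarily an error; the check is only a quick diagnostic for the service principal setup described here.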
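Key concepts
DocumentIntelligenceClient
DocumentIntelligenceClient provides operations for analyzing input documents using prebuilt and custom models through the begin_analyze_document API. Use the model_id parameter to select the type of model for analysis. See the service documentation for a full list of supported models. DocumentIntelligenceClient also provides operations for classifying documents through the begin_classify_document API. Custom classification models can classify each page in an input file to identify the documents within, and can also identify multiple documents or multiple instances of a single document within an input file.
Sample code snippets that illustrate using a DocumentIntelligenceClient are provided in the Examples section. More information about analyzing documents, including supported features, locales, and document types, can be found in the service documentation.
DocumentIntelligenceAdministrationClient
DocumentIntelligenceAdministrationClient provides operations for:
- Building custom models to analyze the specific fields you specify. A DocumentModelDetails is returned, indicating the document types the model can analyze as well as the estimated confidence for each field. See the service documentation for a more detailed explanation.
- Creating a composed model from a collection of existing models.
- Managing the models created in your account.
- Listing operations or getting a specific model operation created within the last 24 hours.
- Copying a custom model from one Document Intelligence resource to another.
- Building and managing a custom classification model to classify the documents you process within your application.
Please note that models can also be built using a graphical user interface such as Document Intelligence Studio.
Sample code snippets that illustrate using a DocumentIntelligenceAdministrationClient are provided in the Examples section.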
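Long-running operations
A long-running operation consists of an initial request sent to the service to start the operation, followed by polling the service at intervals to determine whether the operation has completed or failed and, if it has succeeded, to get the result.
Methods that analyze documents, build models, or copy/compose models are modeled as long-running operations. The client exposes a method prefixed with begin_ that returns an LROPoller or AsyncLROPoller. Callers should wait for the operation to complete by calling result() on the poller object returned from the begin_ method. The Examples section contains sample code snippets that use long-running operations.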
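The start-then-poll pattern can be sketched without calling the service. In the illustration below, `FakePoller` and `begin_fake_analysis` are hypothetical stand-ins written for this example only; they mimic the shape of a begin_ method and its poller (`done()` and `result()`), not the SDK's actual implementation.

```python
import time


class FakePoller:
    """Hypothetical stand-in for a poller object; not part of the SDK."""

    def __init__(self, polls_until_done, final_result):
        self._remaining = polls_until_done
        self._final_result = final_result

    def done(self):
        # True once the simulated operation has finished.
        return self._remaining <= 0

    def result(self, interval=0.01):
        # Block until the operation finishes, polling at a fixed interval.
        while not self.done():
            time.sleep(interval)
            self._remaining -= 1  # one simulated status check
        return self._final_result


def begin_fake_analysis(document_name):
    """Start a simulated operation and return its poller immediately."""
    return FakePoller(
        polls_until_done=3,
        final_result={"document": document_name, "status": "succeeded"},
    )


poller = begin_fake_analysis("invoice.pdf")
started_done = poller.done()   # the operation has only just started
result = poller.result()       # blocks until the simulated operation completes
print(started_done, result["status"])
```

The caller never polls explicitly: the begin_ call returns at once, and `result()` hides the polling loop.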
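Examples
The following sections provide several code snippets covering some of the most common Document Intelligence tasks.
Extract layout
Extract text, selection marks, text styles, and table structures, along with their bounding region coordinates, from documents.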
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult


def _in_span(word, spans):
    # A word belongs to a line when its span lies entirely inside one of the line's spans.
    for span in spans:
        if word.span.offset >= span.offset and (word.span.offset + word.span.length) <= (span.offset + span.length):
            return True
    return False


def _format_polygon(polygon):
    # The polygon is a flat list [x0, y0, x1, y1, ...]; print it as coordinate pairs.
    if not polygon:
        return "N/A"
    return ", ".join([f"[{polygon[i]}, {polygon[i + 1]}]" for i in range(0, len(polygon), 2)])


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
# path_to_sample_documents is a placeholder for the path to a local document file.
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
    )
result: AnalyzeResult = poller.result()

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = []
            if page.words:
                for word in page.words:
                    print(f"......Word '{word.content}' has a confidence of {word.confidence}")
                    if _in_span(word, line.spans):
                        words.append(word)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{_format_polygon(line.polygon)}'"
            )

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{_format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
            )

if result.paragraphs:
    print(f"----Detected #{len(result.paragraphs)} paragraphs in the document----")
    # Sort all paragraphs by span's offset to read in the right order.
    # The inner sort orders each paragraph's spans in place and returns None,
    # so the paragraphs are effectively ordered by their first span's offset.
    result.paragraphs.sort(key=lambda p: (p.spans.sort(key=lambda s: s.offset), p.spans[0].offset))
    print("-----Print sorted paragraphs-----")
    for paragraph in result.paragraphs:
        if not paragraph.bounding_regions:
            print(f"Found paragraph with role: '{paragraph.role}' within N/A bounding region")
        else:
            print(f"Found paragraph with role: '{paragraph.role}' within")
            print(
                ", ".join(
                    f" Page #{region.page_number}: {_format_polygon(region.polygon)} bounding region"
                    for region in paragraph.bounding_regions
                )
            )
        print(f"...with content: '{paragraph.content}'")
        print(f"...with offset: {paragraph.spans[0].offset} and length: {paragraph.spans[0].length}")

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(
                    f"Table # {table_idx} location on page: {region.page_number} is {_format_polygon(region.polygon)}"
                )
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(
                        f"...content on page {region.page_number} is within bounding polygon '{_format_polygon(region.polygon)}'"
                    )

print("----------------------------------------")
Extract figures from documents
Extract figures from documents as cropped images.
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        output=[AnalyzeOutputOption.FIGURES],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()
operation_id = poller.details["operation_id"]

if result.figures:
    for figure in result.figures:
        if figure.id:
            # Download each cropped figure and save it as a PNG named after its ID.
            response = document_intelligence_client.get_analyze_result_figure(
                model_id=result.model_id, result_id=operation_id, figure_id=figure.id
            )
            with open(f"{figure.id}.png", "wb") as writer:
                writer.writelines(response)
else:
    print("No figures found.")
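Get the analysis result as a PDF
Convert an analog (scanned) PDF into a PDF with embedded text. The embedded text enables text search within the PDF and allows the PDF to be used in LLM chat scenarios.
Note: For now, this feature is only supported by prebuilt-read. All other models will return an error.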
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-read",
        analyze_request=f,
        output=[AnalyzeOutputOption.PDF],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()
operation_id = poller.details["operation_id"]

# Download the searchable PDF produced by the analysis and write it to disk.
response = document_intelligence_client.get_analyze_result_pdf(model_id=result.model_id, result_id=operation_id)
with open("analyze_result.pdf", "wb") as writer:
    writer.writelines(response)
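Using the general document model
Analyze key-value pairs, tables, styles, and selection marks from documents using the general document model provided by the Document Intelligence service. Select the general document model by passing model_id="prebuilt-document" into the begin_analyze_document method. Note that the sample below instead passes "prebuilt-layout" together with the KEY_VALUE_PAIRS feature to obtain key-value pairs: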
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import DocumentAnalysisFeature, AnalyzeResult


def _in_span(word, spans):
    for span in spans:
        if word.span.offset >= span.offset and (word.span.offset + word.span.length) <= (span.offset + span.length):
            return True
    return False


def _format_bounding_region(bounding_regions):
    if not bounding_regions:
        return "N/A"
    return ", ".join(
        f"Page #{region.page_number}: {_format_polygon(region.polygon)}" for region in bounding_regions
    )


def _format_polygon(polygon):
    if not polygon:
        return "N/A"
    return ", ".join([f"[{polygon[i]}, {polygon[i + 1]}]" for i in range(0, len(polygon), 2)])


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        features=[DocumentAnalysisFeature.KEY_VALUE_PAIRS],
        content_type="application/octet-stream",
    )
result: AnalyzeResult = poller.result()

if result.styles:
    for style in result.styles:
        if style.is_handwritten:
            print("Document contains handwritten content: ")
            print(",".join([result.content[span.offset : span.offset + span.length] for span in style.spans]))

print("----Key-value pairs found in document----")
if result.key_value_pairs:
    for kv_pair in result.key_value_pairs:
        if kv_pair.key:
            print(
                f"Key '{kv_pair.key.content}' found within "
                f"'{_format_bounding_region(kv_pair.key.bounding_regions)}' bounding regions"
            )
        if kv_pair.value:
            print(
                f"Value '{kv_pair.value.content}' found within "
                f"'{_format_bounding_region(kv_pair.value.bounding_regions)}' bounding regions\n"
            )

for page in result.pages:
    print(f"----Analyzing document from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = []
            if page.words:
                for word in page.words:
                    print(f"......Word '{word.content}' has a confidence of {word.confidence}")
                    if _in_span(word, line.spans):
                        words.append(word)
            print(
                f"...Line #{line_idx} has {len(words)} words and text '{line.content}' within "
                f"bounding polygon '{_format_polygon(line.polygon)}'"
            )

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{_format_polygon(selection_mark.polygon)}' and has a confidence of "
                f"{selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and {table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(
                    f"Table # {table_idx} location on page: {region.page_number} is {_format_polygon(region.polygon)}"
                )
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(
                        f"...content on page {region.page_number} is within bounding polygon '{_format_polygon(region.polygon)}'\n"
                    )

print("----------------------------------------")
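Read more about the features provided by the prebuilt-document model in the service documentation.
Using prebuilt models
Extract fields from select document types such as receipts, invoices, business cards, identity documents, and U.S. W-2 tax documents using prebuilt models provided by the Document Intelligence service.
For example, to analyze fields from a sales receipt, use the prebuilt receipt model by passing model_id="prebuilt-receipt" into the begin_analyze_document method: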
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult


def _format_price(price_dict):
    # Concatenate the values of a currency field (for example symbol and amount) in dictionary order.
    return "".join([f"{p}" for p in price_dict.values()])


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-receipt", analyze_request=f, locale="en-US", content_type="application/octet-stream"
    )
receipts: AnalyzeResult = poller.result()

if receipts.documents:
    for idx, receipt in enumerate(receipts.documents):
        print(f"--------Analysis of receipt #{idx + 1}--------")
        print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
        if receipt.fields:
            merchant_name = receipt.fields.get("MerchantName")
            if merchant_name:
                print(
                    f"Merchant Name: {merchant_name.get('valueString')} has confidence: "
                    f"{merchant_name.confidence}"
                )
            transaction_date = receipt.fields.get("TransactionDate")
            if transaction_date:
                print(
                    f"Transaction Date: {transaction_date.get('valueDate')} has confidence: "
                    f"{transaction_date.confidence}"
                )
            items = receipt.fields.get("Items")
            if items:
                print("Receipt items:")
                for idx, item in enumerate(items.get("valueArray")):
                    print(f"...Item #{idx + 1}")
                    item_description = item.get("valueObject").get("Description")
                    if item_description:
                        print(
                            f"......Item Description: {item_description.get('valueString')} has confidence: "
                            f"{item_description.confidence}"
                        )
                    item_quantity = item.get("valueObject").get("Quantity")
                    if item_quantity:
                        print(
                            f"......Item Quantity: {item_quantity.get('valueString')} has confidence: "
                            f"{item_quantity.confidence}"
                        )
                    item_total_price = item.get("valueObject").get("TotalPrice")
                    if item_total_price:
                        print(
                            f"......Total Item Price: {_format_price(item_total_price.get('valueCurrency'))} has confidence: "
                            f"{item_total_price.confidence}"
                        )
            subtotal = receipt.fields.get("Subtotal")
            if subtotal:
                print(
                    f"Subtotal: {_format_price(subtotal.get('valueCurrency'))} has confidence: {subtotal.confidence}"
                )
            tax = receipt.fields.get("TotalTax")
            if tax:
                print(f"Total tax: {_format_price(tax.get('valueCurrency'))} has confidence: {tax.confidence}")
            tip = receipt.fields.get("Tip")
            if tip:
                print(f"Tip: {_format_price(tip.get('valueCurrency'))} has confidence: {tip.confidence}")
            total = receipt.fields.get("Total")
            if total:
                print(f"Total: {_format_price(total.get('valueCurrency'))} has confidence: {total.confidence}")
        print("--------------------------------------")
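You are not limited to receipts! There are a few prebuilt models to choose from, each of which has its own set of supported fields. See the service documentation for other supported prebuilt models.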
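Build a custom model
Build a custom model on your own document type. The resulting model can be used to analyze values from the types of documents it was trained on. Provide a container SAS URL for the Azure Storage Blob container where you store the training documents.
More details on setting up a container and the required file structure can be found in the service documentation.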
# Let's build a model to use for this sample
import os
import uuid

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    DocumentBuildMode,
    BuildDocumentModelRequest,
    AzureBlobContentSource,
    DocumentModelDetails,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["DOCUMENTINTELLIGENCE_STORAGE_CONTAINER_SAS_URL"]

document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(endpoint, AzureKeyCredential(key))
poller = document_intelligence_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model: DocumentModelDetails = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
if model.doc_types:
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:")
        if doc_type.field_schema:
            for field_name, field in doc_type.field_schema.items():
                if doc_type.field_confidence:
                    print(
                        f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                        f"{doc_type.field_confidence[field_name]}"
                    )
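Analyze documents using a custom model
Analyze document fields, tables, selection marks, and more. These models are trained with your own data, so they are tailored to your documents. For best results, only analyze documents of the same document type that the custom model was built with.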
import os
from collections import Counter

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult


def _print_table(header_names, table_data):
    # Print a two-dimensional array like a table.
    max_len_list = []
    for i in range(len(header_names)):
        col_values = list(map(lambda row: len(str(row[i])), table_data))
        col_values.append(len(str(header_names[i])))
        max_len_list.append(max(col_values))

    row_format_str = "".join(map(lambda len: f"{{:<{len + 4}}}", max_len_list))

    print(row_format_str.format(*header_names))
    for row in table_data:
        print(row_format_str.format(*row))


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
# custom_model_id is a placeholder for the ID of a model you built earlier.
model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Make sure your document's type is included in the list of document types the custom model can analyze
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        model_id=model_id, analyze_request=f, content_type="application/octet-stream"
    )
result: AnalyzeResult = poller.result()

if result.documents:
    for idx, document in enumerate(result.documents):
        print(f"--------Analyzing document #{idx + 1}--------")
        print(f"Document has type {document.doc_type}")
        print(f"Document has document type confidence {document.confidence}")
        print(f"Document was analyzed with model with ID {result.model_id}")
        if document.fields:
            for name, field in document.fields.items():
                field_value = field.get("valueString") if field.get("valueString") else field.content
                print(
                    f"......found field of type '{field.type}' with value '{field_value}' and with confidence {field.confidence}"
                )

    # Extract table cell values
    SYMBOL_OF_TABLE_TYPE = "array"
    SYMBOL_OF_OBJECT_TYPE = "object"
    KEY_OF_VALUE_OBJECT = "valueObject"
    KEY_OF_CELL_CONTENT = "content"

    for doc in result.documents:
        if doc.fields is not None:
            for field_name, field_value in doc.fields.items():
                # Dynamic Table cell information store as array in document field.
                if field_value.type == SYMBOL_OF_TABLE_TYPE and field_value.value_array:
                    col_names = []
                    sample_obj = field_value.value_array[0]
                    if KEY_OF_VALUE_OBJECT in sample_obj:
                        col_names = list(sample_obj[KEY_OF_VALUE_OBJECT].keys())
                    print("----Extracting Dynamic Table Cell Values----")
                    table_rows = []
                    for obj in field_value.value_array:
                        if KEY_OF_VALUE_OBJECT in obj:
                            value_obj = obj[KEY_OF_VALUE_OBJECT]
                            extract_value_by_col_name = lambda key: (
                                value_obj[key].get(KEY_OF_CELL_CONTENT)
                                if key in value_obj and KEY_OF_CELL_CONTENT in value_obj[key]
                                else "None"
                            )
                            row_data = list(map(extract_value_by_col_name, col_names))
                            table_rows.append(row_data)
                    _print_table(col_names, table_rows)

                elif (
                    field_value.type == SYMBOL_OF_OBJECT_TYPE
                    and KEY_OF_VALUE_OBJECT in field_value
                    and field_value[KEY_OF_VALUE_OBJECT] is not None
                ):
                    rows_by_columns = list(field_value[KEY_OF_VALUE_OBJECT].values())
                    is_fixed_table = all(
                        (
                            rows_of_column["type"] == SYMBOL_OF_OBJECT_TYPE
                            and Counter(list(rows_by_columns[0][KEY_OF_VALUE_OBJECT].keys()))
                            == Counter(list(rows_of_column[KEY_OF_VALUE_OBJECT].keys()))
                        )
                        for rows_of_column in rows_by_columns
                    )
                    # Fixed Table cell information store as object in document field.
                    if is_fixed_table:
                        print("----Extracting Fixed Table Cell Values----")
                        col_names = list(field_value[KEY_OF_VALUE_OBJECT].keys())
                        row_dict: dict = {}
                        for rows_of_column in rows_by_columns:
                            rows = rows_of_column[KEY_OF_VALUE_OBJECT]
                            for row_key in list(rows.keys()):
                                if row_key in row_dict:
                                    row_dict[row_key].append(rows[row_key].get(KEY_OF_CELL_CONTENT))
                                else:
                                    row_dict[row_key] = [
                                        row_key,
                                        rows[row_key].get(KEY_OF_CELL_CONTENT),
                                    ]
                        col_names.insert(0, "")
                        _print_table(col_names, list(row_dict.values()))

print("------------------------------------")
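The column-width logic of `_print_table` can be checked on its own with made-up rows. The sketch below is a variant written for this illustration: `format_table` is a hypothetical name, and unlike the sample's helper it returns the formatted lines instead of printing them, which makes the result easier to inspect.

```python
def format_table(header_names, table_data):
    """Return left-aligned table lines; each column is padded to its widest cell plus 4."""
    widths = []
    for i, header in enumerate(header_names):
        cell_lengths = [len(str(row[i])) for row in table_data]
        cell_lengths.append(len(str(header)))
        widths.append(max(cell_lengths))

    # Build a format string such as "{:<12}{:<7}" from the column widths.
    row_format = "".join(f"{{:<{width + 4}}}" for width in widths)

    lines = [row_format.format(*header_names)]
    lines.extend(row_format.format(*row) for row in table_data)
    return lines


# Made-up cell contents standing in for extracted table values.
rows = [["Pen", "2"], ["Notebook", "10"]]
table_lines = format_table(["Item", "Qty"], rows)
for line in table_lines:
    print(line.rstrip())
```

Each column is sized from its longest entry, header included, so columns stay aligned even when a cell is wider than its header.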
Additionally, a document URL can be used to analyze documents with the begin_analyze_document method.
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/receipt/contoso-receipt.png"
# Pass the document location as url_source instead of uploading file bytes.
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-receipt", AnalyzeDocumentRequest(url_source=url)
)
receipts: AnalyzeResult = poller.result()
Manage your models
Manage the custom models attached to your account.
# Let's build a model to use for this sample
import os
import uuid

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    DocumentBuildMode,
    BuildDocumentModelRequest,
    AzureBlobContentSource,
    DocumentModelDetails,
)
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ResourceNotFoundError

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["DOCUMENTINTELLIGENCE_STORAGE_CONTAINER_SAS_URL"]

document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(endpoint, AzureKeyCredential(key))
poller = document_intelligence_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model: DocumentModelDetails = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
if model.doc_types:
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:")
        if doc_type.field_schema:
            for field_name, field in doc_type.field_schema.items():
                if doc_type.field_confidence:
                    print(
                        f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                        f"{doc_type.field_confidence[field_name]}"
                    )

account_details = document_intelligence_admin_client.get_resource_info()
print(
    f"Our resource has {account_details.custom_document_models.count} custom models, "
    f"and we can have at most {account_details.custom_document_models.limit} custom models"
)

# Next, we get a paged list of all of our custom models
models = document_intelligence_admin_client.list_models()

print("We have the following 'ready' models with IDs and descriptions:")
# Use a separate loop variable so that `model` keeps referring to the model built above.
for listed_model in models:
    print(f"{listed_model.model_id} | {listed_model.description}")

my_model = document_intelligence_admin_client.get_model(model_id=model.model_id)
print(f"\nModel ID: {my_model.model_id}")
print(f"Description: {my_model.description}")
print(f"Model created on: {my_model.created_date_time}")
print(f"Model expires on: {my_model.expiration_date_time}")
if my_model.warnings:
    print("Warnings encountered while building the model:")
    for warning in my_model.warnings:
        print(f"warning code: {warning.code}, message: {warning.message}, target of the error: {warning.target}")

# Finally, we will delete this model by ID
document_intelligence_admin_client.delete_model(model_id=my_model.model_id)

# Fetching the deleted model should now raise ResourceNotFoundError.
try:
    document_intelligence_admin_client.get_model(model_id=my_model.model_id)
except ResourceNotFoundError:
    print(f"Successfully deleted model with ID {my_model.model_id}")
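Add-on capabilities
Document Intelligence supports more sophisticated analysis capabilities. These optional features can be enabled or disabled depending on the document extraction scenario.
The add-on capabilities available in this SDK include barcode/QR code, formula, and font/style extraction, as well as a high-resolution mode for large documents.
Note that some add-on capabilities will incur additional charges. See pricing: https://azure.microsoft.com/pricing/details/ai-document-intelligence/
Troubleshooting
General
The Document Intelligence client library will raise exceptions defined in Azure Core. Error codes and messages raised by the Document Intelligence service can be found in the service documentation.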
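Logging
This library uses the standard logging library for logging.
Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per operation with the logging_enable keyword argument.
See the full SDK logging documentation for examples.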
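As a minimal sketch of the setup described above, the snippet below configures the standard `logging` module so that DEBUG output from the Azure libraries is written to stdout. The logger name `"azure"` and the format string are assumptions made here; the commented lines show where `logging_enable` would be passed and are left as comments so that the block runs without the SDK installed.

```python
import logging
import sys

# Assumption: the Azure SDK libraries log under the "azure" logger namespace.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

# Send log records to stdout with a simple, illustrative format.
handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# With the package installed, detailed logging can then be enabled on the client:
# client = DocumentIntelligenceClient(endpoint, credential, logging_enable=True)
# or for a single operation:
# poller = client.begin_analyze_document(
#     "prebuilt-layout", analyze_request=f, content_type="application/octet-stream", logging_enable=True
# )
```

Because DEBUG logging can include unredacted headers and request bodies, it is best kept to troubleshooting sessions.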
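Optional configuration
Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes the available configurations for retries, logging, transport protocols, and more.
Next steps
More sample code
See the Sample README for several code snippets illustrating common patterns used in the Document Intelligence Python API.
Additional documentation
For more extensive documentation on Azure AI Document Intelligence, see the Document Intelligence documentation on docs.microsoft.com.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.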
Project details
Hashes for azure_ai_documentintelligence-1.0.0b4.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 1aa36f0617b0c129fdc82b039b7084fd5b69af08e8e0cb500108b9f6efd61b36 |
MD5 | a4c90a06333893e95290431cb1c4c3ab |
BLAKE2b-256 | 183a1a8f5cb7df48eeb456bb3498bf49f236316095267be4df82ae09a562c52a |
Hashes for azure_ai_documentintelligence-1.0.0b4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | c3a90560b4029e232dbab1334ac2f3dda4cae7c1f60dad277fe21a876dd6bb9f |
MD5 | 7e0f6f0fcb8de48c5542e5c93183b502 |
BLAKE2b-256 | b793282ce2ab36081d33d79b9c825d775ee556713af8137c7af6de1a42ccf5e5 |