Amazon Textract Pipeline component for adding page dimensions to PAGE blocks
Project description
Textract-Pipeline-PageDimensions
Provides functions to add page dimensions (doc_width and doc_height) to the PAGE blocks of the Textract JSON schema, stored under a custom attribute. For example:
{'PageDimension': {'doc_width': 1549.0, 'doc_height': 370.0} }
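To illustrate where this attribute sits, here is a minimal sketch that reads it back from a raw Textract JSON dictionary. It assumes the attribute is serialized under a `Custom` key on each PAGE block, as the jq filter in the command-line sample suggests. The helper name `get_page_dimensions` and the sample dictionary are hypothetical and not part of the library.

```python
from typing import Any, Dict, List


def get_page_dimensions(textract_json: Dict[str, Any]) -> List[Dict[str, float]]:
    """Return the PageDimension entry of every PAGE block, in document order."""
    dimensions = []
    for block in textract_json.get("Blocks", []):
        if block.get("BlockType") != "PAGE":
            continue
        # Assumption: custom attributes live under the "Custom" key of the block.
        custom = block.get("Custom") or {}
        if "PageDimension" in custom:
            dimensions.append(custom["PageDimension"])
    return dimensions


# Hand-written stand-in for a processed Textract response (illustrative only).
sample_response = {
    "Blocks": [
        {
            "BlockType": "PAGE",
            "Id": "page-1",
            "Custom": {"PageDimension": {"doc_width": 1549.0, "doc_height": 370.0}},
        },
        {"BlockType": "LINE", "Id": "line-1", "Text": "hello"},
    ]
}

print(get_page_dimensions(sample_response))
```

PAGE blocks that have not been processed simply contribute nothing to the result, so the helper can also be used to check whether dimensions were added at all.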
Install
> python -m pip install amazon-textract-pipeline-pagedimensions
Make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role. (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
Samples
Add page dimensions for a local file
This sample uses amazon-textract-caller and amazon-textract-pipeline-pagedimensions.
> python -m pip install amazon-textract-caller
from textractpagedimensions.t_pagedimensions import add_page_dimensions
from textractcaller.t_call import call_textract
from trp.trp2 import TDocument, TDocumentSchema
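# the same local file is sent to Textract and then read to obtain its dimensions
input_file = '<path to some image file>'
j = call_textract(input_document=input_file)
t_document: TDocument = TDocumentSchema().load(j)
# adds the PageDimension custom attribute to each PAGE block
add_page_dimensions(t_document=t_document, input_document=input_file)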
print(t_document.pages[0].custom['PageDimension'])
# output will be something like this:
# {
# 'doc_width': 1544,
# 'doc_height': 1065
# }
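One common reason to want page dimensions is to convert Textract's bounding boxes into absolute coordinates. Textract reports each BoundingBox (Left, Top, Width, Height) as a ratio of the page width and height. The sketch below scales such a box by doc_width and doc_height. The helper name `to_absolute_box` and the sample values are hypothetical, and the result is in whatever unit the dimensions were reported in.

```python
from typing import Dict


def to_absolute_box(
    bounding_box: Dict[str, float], page_dimension: Dict[str, float]
) -> Dict[str, float]:
    """Scale a ratio-based Textract BoundingBox by the page dimensions."""
    width = page_dimension["doc_width"]
    height = page_dimension["doc_height"]
    return {
        "left": bounding_box["Left"] * width,
        "top": bounding_box["Top"] * height,
        "width": bounding_box["Width"] * width,
        "height": bounding_box["Height"] * height,
    }


# Illustrative values: dimensions as in the sample output, box chosen by hand.
page_dimension = {"doc_width": 1544, "doc_height": 1065}
box = {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}

absolute = to_absolute_box(box, page_dimension)
print(absolute)
```

In real use, the page dimension would come from `t_document.pages[0].custom['PageDimension']` and the box from the Geometry of a block on that page.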
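Using the Amazon Textract Helper command line tool with PageDimensions
Together with the Amazon Textract Helper and the Amazon Textract Response Parser, we can build a pipeline that adds both page dimension and orientation information, as a short demonstration of how to enrich the Textract JSON.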
> python -m pip install amazon-textract-helper amazon-textract-response-parser amazon-textract-pipeline-pagedimensions
> amazon-textract --input-document "s3://amazon-textract-public-content/blogs/2-pager-different-dimensions.pdf" | amazon-textract-pipeline-pagedimensions --input-document "s3://amazon-textract-public-content/blogs/2-pager-different-dimensions.pdf" | amazon-textract-pipeline --components add_page_orientation | jq '.Blocks[] | select(.BlockType=="PAGE") | .Custom'
{
"PageDimension": {
"doc_width": 1549,
"doc_height": 370
},
"Orientation": 0
}
{
"PageDimension": {
"doc_width": 1079,
"doc_height": 505
},
"Orientation": 0
}
Project details
Hashes for amazon-textract-pipeline-pagedimensions-0.0.9.tar.gz
Algorithm | Hash digest
---|---
SHA256 | efafbaf97d11a2c25ac2a69362a0ff7d98883ff5341f9349ad5021619e4ec4f2
MD5 | 07a75fff0fce031b73ef1925492331ac
BLAKE2b-256 | c4c173efaf519831daca742cf181458fc9097542f037636f7a2b3112c53fe61a
Hashes for amazon_textract_pipeline_pagedimensions-0.0.9-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d8f4d40c0e14f24664077677af79f40c3858e2344f7f6cf38e0bb8961bdadb5e
MD5 | e568009b4ef8ff2f9b602abcd53777e1
BLAKE2b-256 | 85e84e12c544ccc841ac5669d47a30f837ddaafe2477afd60baa029c9de2afbc