Amazon Textract Overlay Tools
Project description
Textract-Overlayer
amazon-textract-overlayer provides functions that help overlay bounding boxes on documents.
Installation
> python -m pip install amazon-textract-overlayer
Make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role. (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
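Samples
The main method is get_bounding_boxes, which returns bounding boxes based on the Textract_Types values passed in. It is mostly taken from the amazon-textract command, which is part of the amazon-textract-helper package.
The following returns bounding boxes for the WORD and CELL data types.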
from textractoverlayer.t_overlay import DocumentDimensions, get_bounding_boxes
from textractcaller.t_call import Textract_Features, Textract_Types, call_textract
from PIL import Image

input_document = "<<replace with the path to your document image>>"
image = Image.open(input_document)
# assumption: TABLES is requested so that CELL blocks are present in the response
features = [Textract_Features.TABLES]
doc = call_textract(input_document=input_document, features=features)
# image is a PIL.Image.Image in this case
document_dimension: DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
overlay = [Textract_Types.WORD, Textract_Types.CELL]
bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)
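Textract reports each block's bounding box as ratios of the page size (Left, Top, Width, Height between 0 and 1), which is why the document dimensions are passed in. The sketch below illustrates that scaling with the standard library only. `PixelBox` and `to_pixel_box` are hypothetical names used for illustration; this is not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    """Hypothetical container for a box in absolute document units."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def to_pixel_box(ratio_box: dict, doc_width: float, doc_height: float) -> PixelBox:
    """Scale a Textract-style ratio box (Left/Top/Width/Height) to document units."""
    left = ratio_box["Left"] * doc_width
    top = ratio_box["Top"] * doc_height
    return PixelBox(
        xmin=left,
        ymin=top,
        xmax=left + ratio_box["Width"] * doc_width,
        ymax=top + ratio_box["Height"] * doc_height,
    )


# A box covering the horizontal middle half of a 1000 x 800 image
box = to_pixel_box(
    {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25},
    doc_width=1000,
    doc_height=800,
)
```

The resulting xmin, ymin, xmax and ymax values play the same role as the attributes used in the drawing code in this README.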
The actual drawing of the bounding boxes onto the image is done in the amazon-textract command from the amazon-textract-helper package and looks like this:
from PIL import Image, ImageDraw
image = Image.open(input_document)
rgb_im = image.convert('RGB')
draw = ImageDraw.Draw(rgb_im)
# check the impl in amazon-textract-helper for ways to associate different colors to types
for bbox in bounding_box_list:
    draw.rectangle(xy=[bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], outline=(128, 128, 0), width=2)
rgb_im.show()
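One possible way to give each type its own outline color is a lookup table with a fallback. This is a minimal sketch, not the amazon-textract-helper implementation: the palette, `DEFAULT_COLOR` and `color_for` are hypothetical, and it assumes the type of a bounding box is available as a name such as "WORD" or "CELL".

```python
from typing import Dict, Tuple

Color = Tuple[int, int, int]

# Hypothetical palette keyed by type name; values are RGB in the 0-255 range PIL expects.
TYPE_COLORS: Dict[str, Color] = {
    "WORD": (128, 128, 0),
    "CELL": (0, 128, 128),
}
# Fallback for any type that has no entry in the palette.
DEFAULT_COLOR: Color = (255, 0, 0)


def color_for(type_name: str) -> Color:
    """Return the outline color for a type name, or the fallback color."""
    return TYPE_COLORS.get(type_name, DEFAULT_COLOR)


word_color = color_for("WORD")
unknown_color = color_for("SOMETHING_ELSE")
```

In the drawing loop, the fixed outline value could then be replaced by a call to `color_for`, passing whatever attribute of the bounding box carries its type. Check the library source for the actual attribute name.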
To draw bounding boxes on a PDF document, the following code can be used:
import fitz
# for local stored files
file_path = "<<replace with the local path to your pdf file>>"
doc = fitz.open(file_path)
# for files stored in S3 the streaming object can be used
# doc = fitz.open(stream="<<replace with stream_object_variable>>", filetype="pdf")
# draw boxes
for p, page in enumerate(doc):
    p += 1
    for bbox in bounding_box_list:
        if bbox.page_number == p:
            page.draw_rect(
                [bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], color=(0, 1, 0), width=2
            )
# save file locally
doc.save("<<local path for output file>>")
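The loop above scans the full bounding box list once per page. For documents with many pages, the boxes can be grouped by page number first. The sketch below shows the grouping step with the standard library only. `group_by_page` is a hypothetical helper, and the sample boxes are stand-ins that only carry the `page_number` attribute used above.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_page(boxes):
    """Map each page number to the list of boxes that belong to it."""
    pages = defaultdict(list)
    for bbox in boxes:
        pages[bbox.page_number].append(bbox)
    return pages


# Stand-in boxes; real entries would come from get_bounding_boxes
sample_boxes = [
    SimpleNamespace(page_number=1, xmin=10, ymin=10, xmax=50, ymax=20),
    SimpleNamespace(page_number=2, xmin=15, ymin=30, xmax=60, ymax=40),
    SimpleNamespace(page_number=1, xmin=10, ymin=25, xmax=80, ymax=35),
]
boxes_by_page = group_by_page(sample_boxes)
```

With this grouping, the inner loop only needs to iterate over `boxes_by_page[p]` for the current page.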
Project details
Hashes for amazon-textract-overlayer-0.0.12.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f6b7f87381d62a84aa8f159c218600f7e6742771a58e6126515b1849a105e288
MD5 | af47810e9f5d286af3dc34e654ccfca9
BLAKE2b-256 | be41cdfc5dcab9eaf3c2b3aedc7d49bfa18cecae06d0f87e2732bf39ce2f5aa7
Hashes for amazon_textract_overlayer-0.0.12-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 68ac82fbee1fa8080a79cb2cba304d94e07862b856fbbaebe50fc2f23195926c
MD5 | 4c23cdcda519fe9683c52969617490ef
BLAKE2b-256 | 7bd665dd95f8807c7bba6f6ace217ae00c505504b09ca39d2c7559a2f4edff18