amazon_textract_textractor_document_analysis_forms_tables_extraction.py

python

Synchronously processes a local file or S3 object using Amazo

aws-samples/amazon-textract-textractor

from textractor import Textractor
from textractor.data.constants import TextractFeatures

# Initialize the Textractor client
extractor = Textractor(profile_name="default")

# Call Textract to analyze a document (local file or S3 path)
# This example uses the 'FORMS' and 'TABLES' features
document = extractor.analyze_document(
    file_source="test.png",
    features=[TextractFeatures.FORMS, TextractFeatures.TABLES]
)

# Access and print the extracted text
print(document.text)

# Access specific elements like tables or forms
for table in document.tables:
    print(table.to_pandas())

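# Form fields are exposed as key-value entities on the document
# (document.key_values); the Document object has no `forms` attribute.
for field in document.key_values:
    print(f"Key: {field.key}, Value: {field.value}")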
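
The field loop above only prints each pair. A minimal sketch of collecting the pairs into a plain dictionary follows, assuming each field exposes `key` and `value` attributes that render sensibly with `str()`. The helper name `fields_to_dict` and the sample data are hypothetical and not part of the Textractor library; the stand-in objects let the sketch run without AWS credentials.

```python
from types import SimpleNamespace


def fields_to_dict(fields):
    """Collect key/value pairs into a dict, keeping the first value seen per key.

    `fields` is any iterable of objects with `key` and `value` attributes
    (an assumption based on the loop in the snippet above).
    """
    result = {}
    for field in fields:
        key = str(field.key).strip()
        value = str(field.value).strip()
        # setdefault keeps the first occurrence, so a repeated key
        # does not silently overwrite an earlier value.
        result.setdefault(key, value)
    return result


# Hypothetical stand-in data shaped like extracted form fields.
sample_fields = [
    SimpleNamespace(key="Name: ", value="Jane Doe"),
    SimpleNamespace(key="Date: ", value="2024-01-01"),
    SimpleNamespace(key="Name: ", value="duplicate entry"),
]

print(fields_to_dict(sample_fields))
```

Keeping the first occurrence is a design choice for this sketch; a form with legitimately repeated keys may call for collecting the values into a list instead.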