tika_quickstart_file_url_text_metadata_extraction.py

python

Parses a file or URL to extract text and metadata using the Tika Server.

import tika
from tika import parser

# Initialize the Tika server (optional, will happen automatically on first parse)
tika.initVM()

# Parse the content from a file or URL
parsed = parser.from_file('sample.pdf')

# Extract and print the metadata and content
print(parsed["metadata"])
print(parsed["content"])
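
The result above is a plain dict, and in practice "content" can be None when Tika finds no extractable text (an image-only PDF, for example), while individual metadata values may come back as either a string or a list. A minimal sketch of a defensive reader follows. The helper name summarize_parsed, the preview_chars parameter, and the sample dict are hypothetical and written for illustration; they are not part of the tika package, and no Tika server is needed to run the sketch.

```python
def summarize_parsed(parsed, preview_chars=200):
    """Return (content_type, preview) from a Tika-style result dict.

    Hypothetical helper for illustration; not part of the tika package.
    """
    metadata = parsed.get("metadata") or {}
    # "content" may be None when no text could be extracted
    content = parsed.get("content") or ""

    # A metadata value may be a single string or a list of strings
    content_type = metadata.get("Content-Type", "unknown")
    if isinstance(content_type, list):
        content_type = content_type[0] if content_type else "unknown"

    # Collapse runs of whitespace and newlines, then truncate
    preview = " ".join(content.split())[:preview_chars]
    return content_type, preview


# Hand-written stand-in for a parser.from_file(...) result (assumed shape)
sample = {
    "metadata": {"Content-Type": "application/pdf"},
    "content": "\n\n  Hello,\n  Tika!  \n",
}
print(summarize_parsed(sample))  # ('application/pdf', 'Hello, Tika!')
```

With a real result, the same call would be summarize_parsed(parser.from_file('sample.pdf')).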