tika_server_file_parser_text_and_metadata_extraction.py

python

Parses a file (PDF, Word, etc.) into text and metadata using the Tika server.

Source: pypi.org
import tika
from tika import parser

# Optional setup call; the Tika server jar is downloaded and started on first use if not already present
tika.initVM()

# Parse a file; the path below is a placeholder
parsed = parser.from_file('path/to/your/file.pdf')

# Print the extracted metadata (a dict) and the text content (a string, or None if no text was found)
print(parsed["metadata"])
print(parsed["content"])
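
The result of parser.from_file is a plain dict, and its "content" value may be None when Tika extracts no text (for example, a scanned PDF without OCR). The sketch below shows one way to normalize that result before using it. The helper names clean_parsed and parse_file are hypothetical and not part of the tika package, and the "Content-Type" metadata key is assumed to be present for most files, not guaranteed.

```python
from typing import Any, Dict


def clean_parsed(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a Tika-style result dict (hypothetical helper, not part of tika)."""
    # "content" may be None when no text was extracted, so fall back to ""
    content = parsed.get("content") or ""
    # "metadata" is expected to be a dict; fall back to an empty one if missing
    metadata = parsed.get("metadata") or {}
    return {
        "text": content.strip(),
        "metadata": metadata,
        # Assumed key: Tika usually reports the detected MIME type here
        "content_type": metadata.get("Content-Type"),
    }


def parse_file(path: str) -> Dict[str, Any]:
    """Parse a file through the Tika server and return the normalized result."""
    # Imported lazily so clean_parsed can be used without tika installed
    from tika import parser

    return clean_parsed(parser.from_file(path))
```

Calling parse_file('path/to/your/file.pdf') would then return a dict whose "text" entry is always a string, which avoids a TypeError when slicing or searching the extracted text.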