readability_lxml_extract_webpage_title_and_content.py

python

Fetches a web page and extracts the cleaned-up title and main content b

import requests
from readability import Document

# Fetch the page; the timeout keeps the request from hanging indefinitely,
# and raise_for_status() stops early on HTTP error responses.
response = requests.get('http://example.com', timeout=10)
response.raise_for_status()
doc = Document(response.text)

# Cleaned-up page title.
print(doc.title())
# 'Example Domain'

# Main content, returned as an HTML string.
print(doc.summary())
# '<html><body><div><body \n class="page"><h1>Example Domain</h1><p>This domain is established to be used for illustrative examples in documents. You may use this\n    domain in examples without prior coordination or asking for permission.</p><p><a href="http://www.iana.org/domains/example">More information...</a></p></body></div></body></html>'
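
As the output above shows, doc.summary() returns HTML rather than plain text. If only the text is needed, one possible approach is to strip the tags with the standard library's html.parser. The sketch below is an illustration, not part of readability-lxml: TextExtractor and html_to_text are hypothetical helper names, and the sample string is a shortened stand-in for a real summary so the example runs without network access.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags.

    Hypothetical helper for illustration; not part of readability-lxml.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep only chunks that contain something besides whitespace.
        text = data.strip()
        if text:
            self.parts.append(text)


def html_to_text(html):
    """Return the text chunks of `html`, joined with newlines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Shortened stand-in for the string returned by doc.summary().
sample = (
    "<html><body><div><h1>Example Domain</h1>"
    "<p>This domain is for examples.</p></div></body></html>"
)
print(html_to_text(sample))
```

In real use, html_to_text(doc.summary()) would take the place of the sample string. Each text node becomes its own line, so text broken up by inline tags such as links is split across lines; joining with a space instead of a newline may suit running prose better.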