readability_lxml_extract_clean_html_and_title.py
Python. Extracts the cleaned HTML content and title from a raw HTML string using readability-lxml.
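import requests
from readability import Document

# Fetch the raw HTML of the page; fail early on HTTP errors or a hung connection.
url = "http://python-readability.readthedocs.io/en/latest/"
response = requests.get(url, timeout=10)
response.raise_for_status()

# Document wraps the raw HTML string and exposes the extracted parts.
doc = Document(response.text)

# Page title as found in the document.
print(doc.title())
# 'readability-lxml — readability-lxml 0.6.2 documentation'

# Cleaned HTML of the main content, returned as a string.
print(doc.summary())
# '<html><body><div><body class="wy-body-for-nav" role="document">\n <div class="wy-grid-for-nav">\n...'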
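
Since doc.summary() returns an HTML string, a common next step is to reduce it to plain text. The following is a minimal sketch that uses only the standard library's html.parser. The names TextCollector and html_to_text, and the sample_summary string, are hypothetical and used for illustration only; they are not part of readability-lxml. In practice the string returned by doc.summary() would be passed in instead.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self.parts.append(stripped)


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.parts)


# Assumed stand-in for the string returned by doc.summary().
sample_summary = (
    "<html><body><div><h1>Title</h1>"
    "<p>First paragraph.</p><p>Second <b>bold</b> one.</p>"
    "</div></body></html>"
)
text = html_to_text(sample_summary)
print(text)
```

Joining with single spaces keeps inline elements such as <b> from splitting a sentence across lines; a version that preserves paragraph breaks would need to track block-level tags separately.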