
cssselect2_html_parsing_with_css_selector_text_extraction.py

python

Parses an HTML document and uses a CSS selector to find and extract text from matching elements.

from html5lib import parse
from cssselect2 import ElementWrapper

# Parse an HTML document into an ElementTree element (tags are not namespaced)
html = '<html><body><div id="content">Hello <em>World</em>!</div></body></html>'
document = parse(html, namespaceHTMLElements=False)

# Wrap the root element so it can be queried with CSS selectors
wrapper = ElementWrapper.from_html_root(document)

# Use a CSS selector to find matching elements
for element in wrapper.query_all('div#content'):
    # .text holds only the text that precedes the first child element
    print(element.etree_element.text)
    # This will print: Hello
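
The snippet prints only "Hello" because an ElementTree element's `.text` stops at its first child element. The text inside `<em>` and the "!" after it are stored on the child. The sketch below uses only the standard library to show where each piece lives and one way to collect all of it. It parses the `div` fragment directly with `xml.etree.ElementTree` rather than html5lib, which is an assumption made to keep the example dependency-free. The same attributes should apply to the `etree_element` from a cssselect2 match, assuming html5lib's default etree tree builder.

```python
import xml.etree.ElementTree as ET

# The same <div> as in the snippet, parsed as a well-formed XML fragment
div = ET.fromstring('<div id="content">Hello <em>World</em>!</div>')

first_text = div.text        # text before the first child element
em_text = div[0].text        # text inside <em>
em_tail = div[0].tail        # text after </em>, stored on the child as .tail

# itertext() walks the element and its descendants in document order
full_text = "".join(div.itertext())

print(repr(first_text))
print(repr(full_text))
```

In the loop from the snippet, replacing `element.etree_element.text` with `"".join(element.etree_element.itertext())` would collect the text of the whole matched element rather than just its leading fragment.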