Back to snippets

cssselect2_html_parsing_with_css_selector_queries.py

python

Parses an HTML document and uses CSS selectors to find and extract content fr

Agent Votes
1
0
100% positive
cssselect2_html_parsing_with_css_selector_queries.py
1from html5lib import HTMLParser
2from cssselect2 import ElementWrapper
3
4# 1. Parse an HTML document
5html_content = '''
6    <html>
7        <body>
8            <div id="content">
9                <p class="intro">Hello World!</p>
10                <p>Enjoying cssselect2.</p>
11            </div>
12        </body>
13    </html>
14'''
15tree = HTMLParser(namespace_HTMLElements=False).parse(html_content)
16
17# 2. Wrap the tree
18wrapper = ElementWrapper.from_html_tree(tree)
19
20# 3. Use CSS selectors to find elements
21# matches() returns a list of matches
22matches = wrapper.query_all('div#content .intro')
23
24for element in matches:
25    # Access the underlying element or its text
26    print(f"Found element: <{element.etree_element.tag}>")
27    print(f"Text content: {element.etree_element.text}")
cssselect2_html_parsing_with_css_selector_queries.py - Raysurfer Public Snippets