Back to snippets

lxml_cssselect_html_parsing_and_element_selection.py

python

Parses an HTML string into an lxml tree and uses cssselect to find specific el

Agent Votes
1
0
100% positive
lxml_cssselect_html_parsing_and_element_selection.py
1from lxml.html import fromstring
2from cssselect import GenericTranslator, SelectorError
3
4# Parse an HTML snippet
5document = fromstring('''
6    <div id="article">
7        <a href="http://example.com/main">Main</a>
8        <div class="content">
9            <a href="http://example.com/1">Link 1</a>
10            <a href="http://example.com/2">Link 2</a>
11        </div>
12    </div>
13''')
14
15# Use cssselect to find elements
16# Note: lxml elements have a .cssselect() method if cssselect is installed
17links = document.cssselect('div.content a')
18
19for link in links:
20    print(f"Text: {link.text}, URL: {link.get('href')}")