tree_sitter_html_parser_quickstart_with_sexp_output.py

python

This quickstart demonstrates how to initialize the HTML parser, parse a

tree-sitter/py-tree-sitter

import tree_sitter_html
from tree_sitter import Language, Parser

# Load the HTML language grammar
HTML_LANGUAGE = Language(tree_sitter_html.language())

# Initialize the parser with the HTML language
parser = Parser(HTML_LANGUAGE)

# Define the HTML source code to parse
src = """
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <div id="main">
      <h1>Hello Tree-sitter</h1>
    </div>
  </body>
</html>
"""

# Parse the source code (source must be in bytes)
tree = parser.parse(bytes(src, "utf8"))

# Access the root node and print its type
root_node = tree.root_node
print(f"Root node type: {root_node.type}")

# Print the whole tree as an S-expression (this does not search for elements).
# Recent py-tree-sitter releases dropped Node.sexp(); str(node) yields the
# S-expression there, so it is used here instead.
print(str(root_node))
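
The script above prints the whole tree but does not search it for specific elements. A minimal sketch of such a search follows. The helper names `iter_nodes` and `nodes_of_type` are hypothetical and not part of py-tree-sitter; they rely only on the `type` and `children` attributes that its nodes expose.

```python
def iter_nodes(node):
    """Yield node and all of its descendants in depth-first pre-order."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(current.children))


def nodes_of_type(node, node_type):
    """Return node and every descendant whose .type equals node_type."""
    return [n for n in iter_nodes(node) if n.type == node_type]
```

Assuming the HTML grammar labels text content with the node type `text`, calling `nodes_of_type(root_node, "text")` on the tree parsed above should return the nodes for "Hello World" and "Hello Tree-sitter"; each node's `text` attribute holds the matched bytes. An explicit stack is used instead of recursion so that deeply nested documents do not hit Python's recursion limit.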