seo_fundamentals_skill_docs_and_html_page_checker.py

python

A command-line SEO audit tool that scans HTML, JSX, and TSX files in a project directory, checks them against SEO best practices (title tags, meta descriptions, Open Graph tags, heading hierarchy, and image alt attributes), and outputs a JSON summary of the issues found.

# SKILL.md

---
name: seo-fundamentals
description: >
  Core principles of SEO including E-E-A-T, Core Web Vitals, technical foundations,
  content quality, and how modern search engines evaluate pages. This skill explains
  *why* SEO works, not how to execute specific optimizations.
allowed-tools: Read, Glob, Grep
---

---

# SEO Fundamentals

> **Foundational principles for sustainable search visibility.**
> This skill explains _how search engines evaluate quality_, not tactical shortcuts.

---

## 1. E-E-A-T (Quality Evaluation Framework)

E-E-A-T is **not a direct ranking factor**.
It is a framework used by search engines to **evaluate content quality**, especially for sensitive or high-impact topics.

| Dimension             | What It Represents                 | Common Signals                                      |
| --------------------- | ---------------------------------- | --------------------------------------------------- |
| **Experience**        | First-hand, real-world involvement | Original examples, lived experience, demonstrations |
| **Expertise**         | Subject-matter competence          | Credentials, depth, accuracy                        |
| **Authoritativeness** | Recognition by others              | Mentions, citations, links                          |
| **Trustworthiness**   | Reliability and safety             | HTTPS, transparency, accuracy                       |

> Pages competing in the same space are often differentiated by **trust and experience**, not keywords.

---
36
37## 2. Core Web Vitals (Page Experience Signals)
38
39Core Web Vitals measure **how users experience a page**, not whether it deserves to rank.
40
41| Metric  | Target  | What It Reflects    |
42| ------- | ------- | ------------------- |
43| **LCP** | < 2.5s  | Loading performance |
44| **INP** | < 200ms | Interactivity       |
45| **CLS** | < 0.1   | Visual stability    |
46
47**Important context:**
48
49- CWV rarely override poor content
50- They matter most when content quality is comparable
51- Failing CWV can _hold back_ otherwise good pages
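
As a rough sketch, the "good" thresholds from the table above can be applied mechanically to field data. The threshold values come straight from the table; the function names are illustrative, not a real API:

```python
# Core Web Vitals "good" thresholds, as listed in the table above.
CWV_TARGETS = {
    "LCP": 2.5,   # seconds
    "INP": 200,   # milliseconds
    "CLS": 0.1,   # unitless layout-shift score
}

def meets_target(metric: str, value: float) -> bool:
    """Return True if a field-data value meets the 'good' threshold."""
    return value < CWV_TARGETS[metric]

def page_passes(measurements: dict) -> bool:
    """A page passes CWV only when all three metrics meet their targets."""
    return all(meets_target(m, v) for m, v in measurements.items())

print(page_passes({"LCP": 2.1, "INP": 180, "CLS": 0.05}))  # True
print(page_passes({"LCP": 3.4, "INP": 180, "CLS": 0.05}))  # False (slow LCP)
```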

---

## 3. Technical SEO Principles

Technical SEO ensures pages are **accessible, understandable, and stable**.

### Crawl & Index Control

| Element           | Purpose                |
| ----------------- | ---------------------- |
| XML sitemaps      | Help discovery         |
| robots.txt        | Control crawl access   |
| Canonical tags    | Consolidate duplicates |
| HTTP status codes | Communicate page state |
| HTTPS             | Security and trust     |
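
For illustration only, a minimal robots.txt applying the crawl-control ideas above might look like this (`example.com` is a placeholder domain; real rules depend entirely on the site):

```text
# Illustrative robots.txt: allow crawling, fence off a private area,
# and point crawlers at the sitemap for discovery.
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```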

### Performance & Accessibility

| Factor                 | Why It Matters                |
| ---------------------- | ----------------------------- |
| Page speed             | User satisfaction             |
| Mobile-friendly design | Mobile-first indexing         |
| Clean URLs             | Crawl clarity                 |
| Semantic HTML          | Accessibility & understanding |

---

## 4. Content SEO Principles

### Page-Level Elements

| Element          | Principle                    |
| ---------------- | ---------------------------- |
| Title tag        | Clear topic + intent         |
| Meta description | Click relevance, not ranking |
| H1               | Page’s primary subject       |
| Headings         | Logical structure            |
| Alt text         | Accessibility and context    |

### Content Quality Signals

| Dimension   | What Search Engines Look For |
| ----------- | ---------------------------- |
| Depth       | Fully answers the query      |
| Originality | Adds unique value            |
| Accuracy    | Factually correct            |
| Clarity     | Easy to understand           |
| Usefulness  | Satisfies intent             |

---

## 5. Structured Data (Schema)

Structured data helps search engines **understand meaning**, not boost rankings directly.

| Type           | Purpose                |
| -------------- | ---------------------- |
| Article        | Content classification |
| Organization   | Entity identity        |
| Person         | Author information     |
| FAQPage        | Q&A clarity            |
| Product        | Commerce details       |
| Review         | Ratings context        |
| BreadcrumbList | Site structure         |

> Schema enables eligibility for rich results but does not guarantee them.
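
As an illustrative sketch, an Article block from the table above can be emitted as JSON-LD using the schema.org vocabulary. The field values below are placeholders, not recommendations:

```python
import json

# Minimal Article JSON-LD payload (schema.org types). All values are
# placeholders; a real page would fill these from its own content.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example Headline",
    "author": {"@type": "Person", "name": "Jane Author"},
    "datePublished": "2024-01-15",
}

# Wrap the JSON in the script tag search engines expect.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(article, indent=2)
    + "\n</script>"
)
print(snippet)
```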

---

## 6. AI-Assisted Content Principles

Search engines evaluate **output quality**, not authorship method.

### Effective Use

- AI as a drafting or research assistant
- Human review for accuracy and clarity
- Original insights and synthesis
- Clear accountability

### Risky Use

- Publishing unedited AI output
- Factual errors or hallucinations
- Thin or duplicated content
- Keyword-driven text with no value

---

## 7. Relative Importance of SEO Factors

There is **no fixed ranking factor order**.
However, when competing pages are similar, importance tends to follow this pattern:

| Relative Weight | Factor                      |
| --------------- | --------------------------- |
| Highest         | Content relevance & quality |
| High            | Authority & trust signals   |
| Medium          | Page experience (CWV, UX)   |
| Medium          | Mobile optimization         |
| Baseline        | Technical accessibility     |

> Technical SEO enables ranking; content quality earns it.

---

## 8. Measurement & Evaluation

SEO fundamentals should be validated using **multiple signals**, not single metrics.

| Area        | What to Observe            |
| ----------- | -------------------------- |
| Visibility  | Indexed pages, impressions |
| Engagement  | Click-through, dwell time  |
| Performance | CWV field data             |
| Coverage    | Indexing status            |
| Authority   | Mentions and links         |

---

> **Key Principle:**
> Sustainable SEO is built on _useful content_, _technical clarity_, and _trust over time_.
> There are no permanent shortcuts.


# seo_checker.py

```python
#!/usr/bin/env python3
"""
SEO Checker - Search Engine Optimization Audit
Checks HTML/JSX/TSX pages for SEO best practices.

PURPOSE:
    - Verify meta tags, titles, descriptions
    - Check Open Graph tags for social sharing
    - Validate heading hierarchy
    - Check image accessibility (alt attributes)

WHAT IT CHECKS:
    - HTML files (actual web pages)
    - JSX/TSX files (React page components)
    - Only files that are likely PUBLIC pages

Usage:
    python seo_checker.py <project_path>
"""
import sys
import json
import re
from pathlib import Path
from datetime import datetime

# Fix Windows console encoding
try:
    sys.stdout.reconfigure(encoding='utf-8', errors='replace')
except Exception:
    pass  # reconfigure() unavailable (older Python or non-reconfigurable stream)


# Directories to skip
SKIP_DIRS = {
    'node_modules', '.next', 'dist', 'build', '.git', '.github',
    '__pycache__', '.vscode', '.idea', 'coverage', 'test', 'tests',
    '__tests__', 'spec', 'docs', 'documentation', 'examples'
}

# Files to skip (not pages)
SKIP_PATTERNS = [
    'config', 'setup', 'util', 'helper', 'hook', 'context', 'store',
    'service', 'api', 'lib', 'constant', 'type', 'interface', 'mock',
    '.test.', '.spec.', '_test.', '_spec.'
]


def is_page_file(file_path: Path) -> bool:
    """Check if this file is likely a public-facing page."""
    name = file_path.name.lower()
    stem = file_path.stem.lower()

    # Skip utility/config files
    if any(skip in name for skip in SKIP_PATTERNS):
        return False

    # Check path - pages in specific directories are likely pages
    parts = [p.lower() for p in file_path.parts]
    page_dirs = ['pages', 'app', 'routes', 'views', 'screens']

    if any(d in parts for d in page_dirs):
        return True

    # Filename indicators for pages
    page_names = ['page', 'index', 'home', 'about', 'contact', 'blog',
                  'post', 'article', 'product', 'landing', 'layout']

    if any(p in stem for p in page_names):
        return True

    # HTML files are usually pages
    if file_path.suffix.lower() in ['.html', '.htm']:
        return True

    return False


def find_pages(project_path: Path) -> list:
    """Find page files to check."""
    patterns = ['**/*.html', '**/*.htm', '**/*.jsx', '**/*.tsx']

    files = []
    for pattern in patterns:
        for f in project_path.glob(pattern):
            # Skip excluded directories
            if any(skip in f.parts for skip in SKIP_DIRS):
                continue

            # Check if it's likely a page
            if is_page_file(f):
                files.append(f)

    return files[:50]  # Limit to 50 files


def check_page(file_path: Path) -> dict:
    """Check a single page for SEO issues."""
    issues = []

    try:
        content = file_path.read_text(encoding='utf-8', errors='ignore')
    except Exception as e:
        return {"file": str(file_path.name), "issues": [f"Error: {e}"]}

    # Detect if this is a layout/template file (has Head component)
    is_layout = 'Head>' in content or '<head' in content.lower()

    # 1. Title tag
    has_title = '<title' in content.lower() or 'title=' in content or 'Head>' in content
    if not has_title and is_layout:
        issues.append("Missing <title> tag")

    # 2. Meta description
    has_description = 'name="description"' in content.lower() or 'name=\'description\'' in content.lower()
    if not has_description and is_layout:
        issues.append("Missing meta description")
    # Match og: only inside a property/name attribute, so substrings
    # like "blog:" don't count as Open Graph tags
    has_og = re.search(r'(property|name)=["\']og:', content, re.I) is not None
    if not has_og and is_layout:
        issues.append("Missing Open Graph tags")

    # 4. Heading hierarchy - multiple H1s
    h1_matches = re.findall(r'<h1[^>]*>', content, re.I)
    if len(h1_matches) > 1:
        issues.append(f"Multiple H1 tags ({len(h1_matches)})")

    # 5. Images without alt
    img_pattern = r'<img[^>]+>'
    imgs = re.findall(img_pattern, content, re.I)
    for img in imgs:
        if 'alt=' not in img.lower():
            issues.append("Image missing alt attribute")
            break
        if 'alt=""' in img or "alt=''" in img:
            issues.append("Image has empty alt attribute")
            break

    # 6. Check for canonical link (nice to have)
    # has_canonical = 'rel="canonical"' in content.lower()

    return {
        "file": str(file_path.name),
        "issues": issues
    }


def main():
    project_path = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()

    print(f"\n{'='*60}")
    print(f"  SEO CHECKER - Search Engine Optimization Audit")
    print(f"{'='*60}")
    print(f"Project: {project_path}")
    print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("-"*60)

    # Find pages
    pages = find_pages(project_path)

    if not pages:
        print("\n[!] No page files found.")
        print("    Looking for: HTML, JSX, TSX in pages/app/routes directories")
        output = {"script": "seo_checker", "files_checked": 0, "passed": True}
        print("\n" + json.dumps(output, indent=2))
        sys.exit(0)

    print(f"Found {len(pages)} page files to analyze\n")

    # Check each page
    all_issues = []
    for f in pages:
        result = check_page(f)
        if result["issues"]:
            all_issues.append(result)

    # Summary
    print("=" * 60)
    print("SEO ANALYSIS RESULTS")
    print("=" * 60)

    if all_issues:
        # Group by issue type
        issue_counts = {}
        for item in all_issues:
            for issue in item["issues"]:
                issue_counts[issue] = issue_counts.get(issue, 0) + 1

        print("\nIssue Summary:")
        for issue, count in sorted(issue_counts.items(), key=lambda x: -x[1]):
            print(f"  [{count}] {issue}")

        print(f"\nAffected files ({len(all_issues)}):")
        for item in all_issues[:5]:
            print(f"  - {item['file']}")
        if len(all_issues) > 5:
            print(f"  ... and {len(all_issues) - 5} more")
    else:
        print("\n[OK] No SEO issues found!")

    total_issues = sum(len(item["issues"]) for item in all_issues)
    passed = total_issues == 0

    output = {
        "script": "seo_checker",
        "project": str(project_path),
        "files_checked": len(pages),
        "files_with_issues": len(all_issues),
        "issues_found": total_issues,
        "passed": passed
    }

    print("\n" + json.dumps(output, indent=2))

    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()

```
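
For a quick sanity check, the per-page heuristics can be exercised in isolation. The regexes below mirror those in `check_page`; the sample HTML is invented and deliberately non-compliant:

```python
import re

# Self-contained demo of the per-page checks: a page with no meta
# description, two H1s, and an image without alt text.
html = """<html><head></head><body>
<h1>One</h1><h1>Two</h1>
<img src="a.png">
</body></html>"""

issues = []
if 'name="description"' not in html.lower():
    issues.append("Missing meta description")
if len(re.findall(r'<h1[^>]*>', html, re.I)) > 1:
    issues.append("Multiple H1 tags")
if any('alt=' not in img.lower() for img in re.findall(r'<img[^>]+>', html, re.I)):
    issues.append("Image missing alt attribute")

print(issues)  # all three issues fire for this sample
```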