seo_fundamentals_skill_docs_and_html_page_checker.py

python

A command-line SEO audit tool that scans HTML, JSX, and TSX files in a project directory, checks them against SEO best practices (title tags, meta descriptions, Open Graph tags, heading hierarchy, and image alt attributes), and outputs a JSON summary of the issues found.

# SKILL.md

---
name: seo-fundamentals
description: >
  Core principles of SEO including E-E-A-T, Core Web Vitals, technical foundations,
  content quality, and how modern search engines evaluate pages. This skill explains
  *why* SEO works, not how to execute specific optimizations.
allowed-tools: Read, Glob, Grep
---

---

# SEO Fundamentals

> **Foundational principles for sustainable search visibility.**
> This skill explains _how search engines evaluate quality_, not tactical shortcuts.

---

## 1. E-E-A-T (Quality Evaluation Framework)

E-E-A-T is **not a direct ranking factor**.
It is a framework used by search engines to **evaluate content quality**, especially for sensitive or high-impact topics.

| Dimension             | What It Represents                 | Common Signals                                      |
| --------------------- | ---------------------------------- | --------------------------------------------------- |
| **Experience**        | First-hand, real-world involvement | Original examples, lived experience, demonstrations |
| **Expertise**         | Subject-matter competence          | Credentials, depth, accuracy                        |
| **Authoritativeness** | Recognition by others              | Mentions, citations, links                          |
| **Trustworthiness**   | Reliability and safety             | HTTPS, transparency, accuracy                       |

> Pages competing in the same space are often differentiated by **trust and experience**, not keywords.

---
36
37## 2. Core Web Vitals (Page Experience Signals)
38
39Core Web Vitals measure **how users experience a page**, not whether it deserves to rank.
40
41| Metric  | Target  | What It Reflects    |
42| ------- | ------- | ------------------- |
43| **LCP** | < 2.5s  | Loading performance |
44| **INP** | < 200ms | Interactivity       |
45| **CLS** | < 0.1   | Visual stability    |
46
47**Important context:**
48
49- CWV rarely override poor content
50- They matter most when content quality is comparable
51- Failing CWV can _hold back_ otherwise good pages
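
As a rough sketch, the "good" thresholds from the table above can be applied mechanically to field data. The threshold values come straight from the table; the function names are illustrative, not a real API:

```python
# Core Web Vitals "good" thresholds, as listed in the table above.
CWV_TARGETS = {
    "LCP": 2.5,   # seconds
    "INP": 200,   # milliseconds
    "CLS": 0.1,   # unitless layout-shift score
}

def meets_target(metric: str, value: float) -> bool:
    """Return True if a field-data value meets the 'good' threshold."""
    return value < CWV_TARGETS[metric]

def page_passes(measurements: dict) -> bool:
    """A page passes CWV only when all three metrics meet their targets."""
    return all(meets_target(m, v) for m, v in measurements.items())

print(page_passes({"LCP": 2.1, "INP": 180, "CLS": 0.05}))  # True
print(page_passes({"LCP": 3.4, "INP": 180, "CLS": 0.05}))  # False (slow LCP)
```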

---

## 3. Technical SEO Principles

Technical SEO ensures pages are **accessible, understandable, and stable**.

### Crawl & Index Control

| Element           | Purpose                |
| ----------------- | ---------------------- |
| XML sitemaps      | Help discovery         |
| robots.txt        | Control crawl access   |
| Canonical tags    | Consolidate duplicates |
| HTTP status codes | Communicate page state |
| HTTPS             | Security and trust     |
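
For illustration only, a minimal robots.txt applying the crawl-control ideas above might look like this (`example.com` is a placeholder domain; real rules depend entirely on the site):

```text
# Illustrative robots.txt: allow crawling, fence off a private area,
# and point crawlers at the sitemap for discovery.
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```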

### Performance & Accessibility

| Factor                 | Why It Matters                |
| ---------------------- | ----------------------------- |
| Page speed             | User satisfaction             |
| Mobile-friendly design | Mobile-first indexing         |
| Clean URLs             | Crawl clarity                 |
| Semantic HTML          | Accessibility & understanding |

---

## 4. Content SEO Principles

### Page-Level Elements

| Element          | Principle                    |
| ---------------- | ---------------------------- |
| Title tag        | Clear topic + intent         |
| Meta description | Click relevance, not ranking |
| H1               | Page’s primary subject       |
| Headings         | Logical structure            |
| Alt text         | Accessibility and context    |

### Content Quality Signals

| Dimension   | What Search Engines Look For |
| ----------- | ---------------------------- |
| Depth       | Fully answers the query      |
| Originality | Adds unique value            |
| Accuracy    | Factually correct            |
| Clarity     | Easy to understand           |
| Usefulness  | Satisfies intent             |

---

## 5. Structured Data (Schema)

Structured data helps search engines **understand meaning**, not boost rankings directly.

| Type           | Purpose                |
| -------------- | ---------------------- |
| Article        | Content classification |
| Organization   | Entity identity        |
| Person         | Author information     |
| FAQPage        | Q&A clarity            |
| Product        | Commerce details       |
| Review         | Ratings context        |
| BreadcrumbList | Site structure         |

> Schema enables eligibility for rich results but does not guarantee them.
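
As an illustrative sketch, an Article block from the table above can be emitted as JSON-LD using the schema.org vocabulary. The field values below are placeholders, not recommendations:

```python
import json

# Minimal Article JSON-LD payload (schema.org types). All values are
# placeholders; a real page would fill these from its own content.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example Headline",
    "author": {"@type": "Person", "name": "Jane Author"},
    "datePublished": "2024-01-15",
}

# Wrap the JSON in the script tag search engines expect.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(article, indent=2)
    + "\n</script>"
)
print(snippet)
```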

---

## 6. AI-Assisted Content Principles

Search engines evaluate **output quality**, not authorship method.

### Effective Use

- AI as a drafting or research assistant
- Human review for accuracy and clarity
- Original insights and synthesis
- Clear accountability

### Risky Use

- Publishing unedited AI output
- Factual errors or hallucinations
- Thin or duplicated content
- Keyword-driven text with no value

---

## 7. Relative Importance of SEO Factors

There is **no fixed ranking factor order**.
However, when competing pages are similar, importance tends to follow this pattern:

| Relative Weight | Factor                      |
| --------------- | --------------------------- |
| Highest         | Content relevance & quality |
| High            | Authority & trust signals   |
| Medium          | Page experience (CWV, UX)   |
| Medium          | Mobile optimization         |
| Baseline        | Technical accessibility     |

> Technical SEO enables ranking; content quality earns it.

---

## 8. Measurement & Evaluation

SEO fundamentals should be validated using **multiple signals**, not single metrics.

| Area        | What to Observe            |
| ----------- | -------------------------- |
| Visibility  | Indexed pages, impressions |
| Engagement  | Click-through, dwell time  |
| Performance | CWV field data             |
| Coverage    | Indexing status            |
| Authority   | Mentions and links         |

---

> **Key Principle:**
> Sustainable SEO is built on _useful content_, _technical clarity_, and _trust over time_.
> There are no permanent shortcuts.


# seo_checker.py

```python
#!/usr/bin/env python3
"""
SEO Checker - Search Engine Optimization Audit
Checks HTML/JSX/TSX pages for SEO best practices.

PURPOSE:
    - Verify meta tags, titles, descriptions
    - Check Open Graph tags for social sharing
    - Validate heading hierarchy
    - Check image accessibility (alt attributes)

WHAT IT CHECKS:
    - HTML files (actual web pages)
    - JSX/TSX files (React page components)
    - Only files that are likely PUBLIC pages

Usage:
    python seo_checker.py <project_path>
"""
import sys
import json
import re
from pathlib import Path
from datetime import datetime

# Fix Windows console encoding
try:
    sys.stdout.reconfigure(encoding='utf-8', errors='replace')
except Exception:
    pass  # reconfigure() unavailable (older Python or non-reconfigurable stream)


# Directories to skip
SKIP_DIRS = {
    'node_modules', '.next', 'dist', 'build', '.git', '.github',
    '__pycache__', '.vscode', '.idea', 'coverage', 'test', 'tests',
    '__tests__', 'spec', 'docs', 'documentation', 'examples'
}

# Files to skip (not pages)
SKIP_PATTERNS = [
    'config', 'setup', 'util', 'helper', 'hook', 'context', 'store',
    'service', 'api', 'lib', 'constant', 'type', 'interface', 'mock',
    '.test.', '.spec.', '_test.', '_spec.'
]


def is_page_file(file_path: Path) -> bool:
    """Check if this file is likely a public-facing page."""
    name = file_path.name.lower()
    stem = file_path.stem.lower()

    # Skip utility/config files
    if any(skip in name for skip in SKIP_PATTERNS):
        return False

    # Check path - pages in specific directories are likely pages
    parts = [p.lower() for p in file_path.parts]
    page_dirs = ['pages', 'app', 'routes', 'views', 'screens']

    if any(d in parts for d in page_dirs):
        return True

    # Filename indicators for pages
    page_names = ['page', 'index', 'home', 'about', 'contact', 'blog',
                  'post', 'article', 'product', 'landing', 'layout']

    if any(p in stem for p in page_names):
        return True

    # HTML files are usually pages
    if file_path.suffix.lower() in ['.html', '.htm']:
        return True

    return False


def find_pages(project_path: Path) -> list:
    """Find page files to check."""
    patterns = ['**/*.html', '**/*.htm', '**/*.jsx', '**/*.tsx']

    files = []
    for pattern in patterns:
        for f in project_path.glob(pattern):
            # Skip excluded directories
            if any(skip in f.parts for skip in SKIP_DIRS):
                continue

            # Check if it's likely a page
            if is_page_file(f):
                files.append(f)

    return files[:50]  # Limit to 50 files


def check_page(file_path: Path) -> dict:
    """Check a single page for SEO issues."""
    issues = []

    try:
        content = file_path.read_text(encoding='utf-8', errors='ignore')
    except Exception as e:
        return {"file": str(file_path.name), "issues": [f"Error: {e}"]}

    # Detect if this is a layout/template file (has Head component)
    is_layout = 'Head>' in content or '<head' in content.lower()

    # 1. Title tag
    has_title = '<title' in content.lower() or 'title=' in content or 'Head>' in content
    if not has_title and is_layout:
        issues.append("Missing <title> tag")

    # 2. Meta description
    has_description = 'name="description"' in content.lower() or 'name=\'description\'' in content.lower()
    if not has_description and is_layout:
        issues.append("Missing meta description")
    # Match og: only inside a property/name attribute, so substrings
    # like "blog:" don't count as Open Graph tags
    has_og = re.search(r'(property|name)=["\']og:', content, re.I) is not None
    if not has_og and is_layout:
        issues.append("Missing Open Graph tags")

    # 4. Heading hierarchy - multiple H1s
    h1_matches = re.findall(r'<h1[^>]*>', content, re.I)
    if len(h1_matches) > 1:
        issues.append(f"Multiple H1 tags ({len(h1_matches)})")

    # 5. Images without alt
    img_pattern = r'<img[^>]+>'
    imgs = re.findall(img_pattern, content, re.I)
    for img in imgs:
        if 'alt=' not in img.lower():
            issues.append("Image missing alt attribute")
            break
        if 'alt=""' in img or "alt=''" in img:
            issues.append("Image has empty alt attribute")
            break

    # 6. Check for canonical link (nice to have)
    # has_canonical = 'rel="canonical"' in content.lower()

    return {
        "file": str(file_path.name),
        "issues": issues
    }


def main():
    project_path = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()

    print(f"\n{'='*60}")
    print(f"  SEO CHECKER - Search Engine Optimization Audit")
    print(f"{'='*60}")
    print(f"Project: {project_path}")
    print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("-"*60)

    # Find pages
    pages = find_pages(project_path)

    if not pages:
        print("\n[!] No page files found.")
        print("    Looking for: HTML, JSX, TSX in pages/app/routes directories")
        output = {"script": "seo_checker", "files_checked": 0, "passed": True}
        print("\n" + json.dumps(output, indent=2))
        sys.exit(0)

    print(f"Found {len(pages)} page files to analyze\n")

    # Check each page
    all_issues = []
    for f in pages:
        result = check_page(f)
        if result["issues"]:
            all_issues.append(result)

    # Summary
    print("=" * 60)
    print("SEO ANALYSIS RESULTS")
    print("=" * 60)

    if all_issues:
        # Group by issue type
        issue_counts = {}
        for item in all_issues:
            for issue in item["issues"]:
                issue_counts[issue] = issue_counts.get(issue, 0) + 1

        print("\nIssue Summary:")
        for issue, count in sorted(issue_counts.items(), key=lambda x: -x[1]):
            print(f"  [{count}] {issue}")

        print(f"\nAffected files ({len(all_issues)}):")
        for item in all_issues[:5]:
            print(f"  - {item['file']}")
        if len(all_issues) > 5:
            print(f"  ... and {len(all_issues) - 5} more")
    else:
        print("\n[OK] No SEO issues found!")

    total_issues = sum(len(item["issues"]) for item in all_issues)
    passed = total_issues == 0

    output = {
        "script": "seo_checker",
        "project": str(project_path),
        "files_checked": len(pages),
        "files_with_issues": len(all_issues),
        "issues_found": total_issues,
        "passed": passed
    }

    print("\n" + json.dumps(output, indent=2))

    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()

```
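
For a quick sanity check, the per-page heuristics can be exercised in isolation. The regexes below mirror those in `check_page`; the sample HTML is invented and deliberately non-compliant:

```python
import re

# Self-contained demo of the per-page checks: a page with no meta
# description, two H1s, and an image without alt text.
html = """<html><head></head><body>
<h1>One</h1><h1>Two</h1>
<img src="a.png">
</body></html>"""

issues = []
if 'name="description"' not in html.lower():
    issues.append("Missing meta description")
if len(re.findall(r'<h1[^>]*>', html, re.I)) > 1:
    issues.append("Multiple H1 tags")
if any('alt=' not in img.lower() for img in re.findall(r'<img[^>]+>', html, re.I)):
    issues.append("Image missing alt attribute")

print(issues)  # all three issues fire for this sample
```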