seo_fundamentals_skill_docs_and_html_page_checker.py
A command-line SEO audit tool that scans HTML, JSX, and TSX files in a project directory to check for SEO best practices including meta tags, title tags, Open Graph tags, heading hierarchy, and image alt attributes, outputting a JSON summary of issues found.
# SKILL.md

---
name: seo-fundamentals
description: >
  Core principles of SEO including E-E-A-T, Core Web Vitals, technical foundations,
  content quality, and how modern search engines evaluate pages. This skill explains
  *why* SEO works, not how to execute specific optimizations.
allowed-tools: Read, Glob, Grep
---

---

# SEO Fundamentals

> **Foundational principles for sustainable search visibility.**
> This skill explains _how search engines evaluate quality_, not tactical shortcuts.

---

## 1. E-E-A-T (Quality Evaluation Framework)

E-E-A-T is **not a direct ranking factor**.
It is a framework used by search engines to **evaluate content quality**, especially for sensitive or high-impact topics.

| Dimension             | What It Represents                 | Common Signals                                      |
| --------------------- | ---------------------------------- | --------------------------------------------------- |
| **Experience**        | First-hand, real-world involvement | Original examples, lived experience, demonstrations |
| **Expertise**         | Subject-matter competence          | Credentials, depth, accuracy                        |
| **Authoritativeness** | Recognition by others              | Mentions, citations, links                          |
| **Trustworthiness**   | Reliability and safety             | HTTPS, transparency, accuracy                       |

> Pages competing in the same space are often differentiated by **trust and experience**, not keywords.

---

## 2. Core Web Vitals (Page Experience Signals)

Core Web Vitals measure **how users experience a page**, not whether it deserves to rank.

| Metric  | Target   | What It Reflects    |
| ------- | -------- | ------------------- |
| **LCP** | ≤ 2.5 s  | Loading performance |
| **INP** | ≤ 200 ms | Interactivity       |
| **CLS** | ≤ 0.1    | Visual stability    |

**Important context:**

- Good CWV scores rarely compensate for poor content
- They matter most when content quality is comparable
- Failing CWV can _hold back_ otherwise good pages
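
The targets in the table can be applied mechanically to measured values. The following is a minimal sketch, not part of the checker script: the function name `failing_metrics` and the sample numbers are hypothetical, and treating a value exactly at the target as passing is an assumption about the boundary.

```python
# "Good" targets from the table above: LCP in seconds, INP in milliseconds, CLS unitless.
GOOD_THRESHOLDS = {"LCP": 2.5, "INP": 200, "CLS": 0.1}


def failing_metrics(measurements: dict) -> list:
    """Return the names of metrics whose measured value exceeds its target."""
    return [
        name
        for name, limit in GOOD_THRESHOLDS.items()
        # Assumption: a value exactly at the target still counts as passing
        if name in measurements and measurements[name] > limit
    ]


sample = {"LCP": 3.1, "INP": 180, "CLS": 0.05}  # hypothetical field data
print(failing_metrics(sample))  # ['LCP']
```
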

---

## 3. Technical SEO Principles

Technical SEO ensures pages are **accessible, understandable, and stable**.

### Crawl & Index Control

| Element           | Purpose                |
| ----------------- | ---------------------- |
| XML sitemaps      | Help discovery         |
| robots.txt        | Control crawl access   |
| Canonical tags    | Consolidate duplicates |
| HTTP status codes | Communicate page state |
| HTTPS             | Security and trust     |
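
To make the robots.txt and sitemap rows concrete, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content and the `example.com` URLs are invented for illustration, and the text is parsed from a string so no network request is made. `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site serves this file at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/blog/post", "https://example.com/admin/users"):
    verdict = "crawlable" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)

# Sitemap lines are collected separately from the crawl rules
print(parser.site_maps())
```

Note that robots.txt controls crawling only. Whether a page is indexed is a separate matter, which is why the table lists canonical tags and status codes as their own rows.
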

### Performance & Accessibility

| Factor                 | Why It Matters                |
| ---------------------- | ----------------------------- |
| Page speed             | User satisfaction             |
| Mobile-friendly design | Mobile-first indexing         |
| Clean URLs             | Crawl clarity                 |
| Semantic HTML          | Accessibility & understanding |

---

## 4. Content SEO Principles

### Page-Level Elements

| Element          | Principle                    |
| ---------------- | ---------------------------- |
| Title tag        | Clear topic + intent         |
| Meta description | Click relevance, not ranking |
| H1               | Page’s primary subject       |
| Headings         | Logical structure            |
| Alt text         | Accessibility and context    |
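
The page-level elements in this table can be pulled out of rendered HTML with the standard `html.parser` module. The sketch below is an illustration, not the checker script's method (which uses string and regex matching): the names `PageElementCollector` and `audit` are hypothetical, and the rules "exactly one h1" and "no skipped heading levels" are assumed readings of "primary subject" and "logical structure".

```python
from html.parser import HTMLParser


class PageElementCollector(HTMLParser):
    """Collect the title, meta description, heading levels, and images lacking alt."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.heading_levels = []
        self.images_without_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.heading_levels.append(int(tag[1]))
        elif tag == "img" and "alt" not in attr:
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> list:
    """Return human-readable findings for one HTML document."""
    collector = PageElementCollector()
    collector.feed(html)
    collector.close()
    levels = collector.heading_levels
    findings = []
    if not collector.title.strip():
        findings.append("missing title")
    if not collector.description:
        findings.append("missing meta description")
    if levels.count(1) != 1:  # assumption: one h1 per page
        findings.append("expected exactly one h1")
    # A jump such as h1 -> h3 skips a level in the outline
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            findings.append(f"heading level jumps from h{prev} to h{cur}")
    if collector.images_without_alt:
        findings.append(f"{collector.images_without_alt} image(s) without alt")
    return findings


sample = """<html><head><title>Pricing</title></head>
<body><h1>Pricing</h1><h3>Plans</h3><img src="a.png"></body></html>"""
print(audit(sample))
```

A real parser avoids some false matches that substring checks produce, but it only works on HTML. JSX and TSX sources would need a different approach.
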

### Content Quality Signals

| Dimension   | What Search Engines Look For |
| ----------- | ---------------------------- |
| Depth       | Fully answers the query      |
| Originality | Adds unique value            |
| Accuracy    | Factually correct            |
| Clarity     | Easy to understand           |
| Usefulness  | Satisfies intent             |

---

## 5. Structured Data (Schema)

Structured data helps search engines **understand meaning**, not boost rankings directly.

| Type           | Purpose                |
| -------------- | ---------------------- |
| Article        | Content classification |
| Organization   | Entity identity        |
| Person         | Author information     |
| FAQPage        | Q&A clarity            |
| Product        | Commerce details       |
| Review         | Ratings context        |
| BreadcrumbList | Site structure         |

> Schema enables eligibility for rich results but does not guarantee them.
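
Structured data is commonly embedded as JSON-LD inside a `<script type="application/ld+json">` element. As a rough sketch, an `Article` description with a nested `Person` author could be generated as follows. The helper name `article_json_ld` and the sample values are hypothetical, and only a few schema.org properties are shown.

```python
import json


def article_json_ld(headline: str, author_name: str, date_published: str) -> str:
    """Build a schema.org Article description wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    payload = json.dumps(data, indent=2)
    # Escape "</" so text inside the data cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Sample values are invented for illustration
print(article_json_ld("How Core Web Vitals Work", "Jane Doe", "2024-01-15"))
```
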

---

## 6. AI-Assisted Content Principles

Search engines evaluate **output quality**, not authorship method.

### Effective Use

- AI as a drafting or research assistant
- Human review for accuracy and clarity
- Original insights and synthesis
- Clear accountability

### Risky Use

- Publishing unedited AI output
- Factual errors or hallucinations
- Thin or duplicated content
- Keyword-driven text with no value

---

## 7. Relative Importance of SEO Factors

There is **no fixed ranking factor order**.
However, when competing pages are similar, importance tends to follow this pattern:

| Relative Weight | Factor                      |
| --------------- | --------------------------- |
| Highest         | Content relevance & quality |
| High            | Authority & trust signals   |
| Medium          | Page experience (CWV, UX)   |
| Medium          | Mobile optimization         |
| Baseline        | Technical accessibility     |

> Technical SEO enables ranking; content quality earns it.

---

## 8. Measurement & Evaluation

SEO fundamentals should be validated using **multiple signals**, not single metrics.

| Area        | What to Observe            |
| ----------- | -------------------------- |
| Visibility  | Indexed pages, impressions |
| Engagement  | Click-through, dwell time  |
| Performance | CWV field data             |
| Coverage    | Indexing status            |
| Authority   | Mentions and links         |

---

> **Key Principle:**
> Sustainable SEO is built on _useful content_, _technical clarity_, and _trust over time_.
> There are no permanent shortcuts.

# seo_checker.py

```python
#!/usr/bin/env python3
"""
SEO Checker - Search Engine Optimization Audit
Checks HTML/JSX/TSX pages for SEO best practices.

PURPOSE:
    - Verify meta tags, titles, descriptions
    - Check Open Graph tags for social sharing
    - Validate heading hierarchy (flags multiple H1 tags)
    - Check image accessibility (alt attributes)

WHAT IT CHECKS:
    - HTML files (actual web pages)
    - JSX/TSX files (React page components)
    - Only files that are likely PUBLIC pages

Usage:
    python seo_checker.py <project_path>
"""
import sys
import json
import re
from pathlib import Path
from datetime import datetime

# Fix Windows console encoding
try:
    sys.stdout.reconfigure(encoding='utf-8', errors='replace')
except Exception:
    # stdout may have been replaced by an object without reconfigure()
    pass


# Directories to skip
SKIP_DIRS = {
    'node_modules', '.next', 'dist', 'build', '.git', '.github',
    '__pycache__', '.vscode', '.idea', 'coverage', 'test', 'tests',
    '__tests__', 'spec', 'docs', 'documentation', 'examples'
}

# Files to skip (not pages)
SKIP_PATTERNS = [
    'config', 'setup', 'util', 'helper', 'hook', 'context', 'store',
    'service', 'api', 'lib', 'constant', 'type', 'interface', 'mock',
    '.test.', '.spec.', '_test.', '_spec.'
]


def is_page_file(file_path: Path) -> bool:
    """Check if this file is likely a public-facing page."""
    name = file_path.name.lower()
    stem = file_path.stem.lower()

    # Skip utility/config files
    if any(skip in name for skip in SKIP_PATTERNS):
        return False

    # Check path - files in specific directories are likely pages
    parts = [p.lower() for p in file_path.parts]
    page_dirs = ['pages', 'app', 'routes', 'views', 'screens']

    if any(d in parts for d in page_dirs):
        return True

    # Filename indicators for pages
    page_names = ['page', 'index', 'home', 'about', 'contact', 'blog',
                  'post', 'article', 'product', 'landing', 'layout']

    if any(p in stem for p in page_names):
        return True

    # HTML files are usually pages
    if file_path.suffix.lower() in ['.html', '.htm']:
        return True

    return False


def find_pages(project_path: Path) -> list:
    """Find page files to check."""
    patterns = ['**/*.html', '**/*.htm', '**/*.jsx', '**/*.tsx']

    files = []
    for pattern in patterns:
        for f in project_path.glob(pattern):
            # Use the path relative to the project root so that directory
            # names above the project (e.g. ~/docs/site) do not affect matching
            rel = f.relative_to(project_path)

            # Skip excluded directories
            if any(skip in rel.parts for skip in SKIP_DIRS):
                continue

            # Check if it's likely a page
            if is_page_file(rel):
                files.append(f)

    return files[:50]  # Limit to 50 files


def check_page(file_path: Path) -> dict:
    """Check a single page for SEO issues."""
    issues = []

    try:
        content = file_path.read_text(encoding='utf-8', errors='ignore')
    except Exception as e:
        return {"file": str(file_path.name), "issues": [f"Error: {e}"]}

    lowered = content.lower()

    # Detect if this is a layout/template file (Head component or <head> element).
    # Requiring whitespace or ">" after "<head" keeps <header> from matching.
    is_layout = 'Head>' in content or re.search(r'<head[\s>]', lowered) is not None

    # 1. Title tag (a Head component or a title= prop is leniently accepted)
    has_title = '<title' in lowered or 'title=' in content or 'Head>' in content
    if not has_title and is_layout:
        issues.append("Missing <title> tag")

    # 2. Meta description
    has_description = 'name="description"' in lowered or "name='description'" in lowered
    if not has_description and is_layout:
        issues.append("Missing meta description")

    # 3. Open Graph tags: look for a quoted value starting with "og:" so that
    # unrelated text such as "blog:" is not counted
    has_og = re.search(r'["\']og:', content, re.I) is not None
    if not has_og and is_layout:
        issues.append("Missing Open Graph tags")

    # 4. Heading hierarchy - multiple H1s
    h1_matches = re.findall(r'<h1[^>]*>', content, re.I)
    if len(h1_matches) > 1:
        issues.append(f"Multiple H1 tags ({len(h1_matches)})")

    # 5. Images without alt (only the first problem image is reported)
    img_pattern = r'<img[^>]+>'
    imgs = re.findall(img_pattern, content, re.I)
    for img in imgs:
        if 'alt=' not in img.lower():
            issues.append("Image missing alt attribute")
            break
        if 'alt=""' in img or "alt=''" in img:
            issues.append("Image has empty alt attribute")
            break

    # 6. Check for canonical link (nice to have)
    # has_canonical = 'rel="canonical"' in content.lower()

    return {
        "file": str(file_path.name),
        "issues": issues
    }


def main():
    project_path = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()

    print(f"\n{'='*60}")
    print("  SEO CHECKER - Search Engine Optimization Audit")
    print(f"{'='*60}")
    print(f"Project: {project_path}")
    print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("-"*60)

    # Find pages
    pages = find_pages(project_path)

    if not pages:
        print("\n[!] No page files found.")
        print("    Looking for: HTML, JSX, TSX in pages/app/routes directories")
        output = {"script": "seo_checker", "files_checked": 0, "passed": True}
        print("\n" + json.dumps(output, indent=2))
        sys.exit(0)

    print(f"Found {len(pages)} page files to analyze\n")

    # Check each page
    all_issues = []
    for f in pages:
        result = check_page(f)
        if result["issues"]:
            all_issues.append(result)

    # Summary
    print("=" * 60)
    print("SEO ANALYSIS RESULTS")
    print("=" * 60)

    if all_issues:
        # Group by issue type
        issue_counts = {}
        for item in all_issues:
            for issue in item["issues"]:
                issue_counts[issue] = issue_counts.get(issue, 0) + 1

        print("\nIssue Summary:")
        for issue, count in sorted(issue_counts.items(), key=lambda x: -x[1]):
            print(f"  [{count}] {issue}")

        print(f"\nAffected files ({len(all_issues)}):")
        for item in all_issues[:5]:
            print(f"  - {item['file']}")
        if len(all_issues) > 5:
            print(f"  ... and {len(all_issues) - 5} more")
    else:
        print("\n[OK] No SEO issues found!")

    total_issues = sum(len(item["issues"]) for item in all_issues)
    passed = total_issues == 0

    output = {
        "script": "seo_checker",
        "project": str(project_path),
        "files_checked": len(pages),
        "files_with_issues": len(all_issues),
        "issues_found": total_issues,
        "passed": passed
    }

    print("\n" + json.dumps(output, indent=2))

    # Exit code 1 signals that at least one issue was found
    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()
```
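
Because the script prints a human-readable report followed by the JSON summary, a caller that wants the summary has to separate the two. One possible approach is sketched below. The helper name `extract_summary` and the captured text are hypothetical, and the sketch relies on the summary being a flat object printed with `indent=2`, so that its opening brace sits alone on a line.

```python
import json


def extract_summary(stdout: str) -> dict:
    """Return the JSON object that ends the checker's stdout."""
    lines = stdout.rstrip().splitlines()
    # The last line that is exactly "{" marks the start of the summary;
    # max() raises ValueError if no such line exists
    start = max(i for i, line in enumerate(lines) if line == "{")
    return json.loads("\n".join(lines[start:]))


# Hypothetical captured output, shaped like what main() prints
captured = """\
============================================================
SEO ANALYSIS RESULTS
============================================================

[OK] No SEO issues found!

{
  "script": "seo_checker",
  "project": "/tmp/site",
  "files_checked": 3,
  "files_with_issues": 0,
  "issues_found": 0,
  "passed": true
}
"""
summary = extract_summary(captured)
print(summary["passed"], summary["files_checked"])  # True 3
```

In practice the text could come from `subprocess.run([sys.executable, "seo_checker.py", path], capture_output=True, text=True).stdout`. The script also exits with status 1 when issues are found, so the return code alone is enough for a pass/fail gate.
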