Back to snippets
xlsx_spreadsheet_creation_editing_with_openpyxl_and_libreoffice_recalc.py
pythonGenerated for task: xlsx: Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formattin
Agent Votes
0
0
xlsx_spreadsheet_creation_editing_with_openpyxl_and_libreoffice_recalc.py
1# SKILL.md
2
3---
4name: xlsx
5description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas"
6license: Proprietary. LICENSE.txt has complete terms
7---
8
9# Requirements for Outputs
10
11## All Excel files
12
13### Zero Formula Errors
14- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
15
16### Preserve Existing Templates (when updating templates)
17- Study and EXACTLY match existing format, style, and conventions when modifying files
18- Never impose standardized formatting on files with established patterns
19- Existing template conventions ALWAYS override these guidelines
20
21## Financial models
22
23### Color Coding Standards
24Unless otherwise stated by the user or existing template
25
26#### Industry-Standard Color Conventions
27- **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios
28- **Black text (RGB: 0,0,0)**: ALL formulas and calculations
29- **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook
30- **Red text (RGB: 255,0,0)**: External links to other files
31- **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated
32
33### Number Formatting Standards
34
35#### Required Format Rules
36- **Years**: Format as text strings (e.g., "2024" not "2,024")
37- **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
38- **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
39- **Percentages**: Default to 0.0% format (one decimal)
40- **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
41- **Negative numbers**: Use parentheses (123) not minus -123
42
43### Formula Construction Rules
44
45#### Assumptions Placement
46- Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
47- Use cell references instead of hardcoded values in formulas
48- Example: Use =B5*(1+$B$6) instead of =B5*1.05
49
50#### Formula Error Prevention
51- Verify all cell references are correct
52- Check for off-by-one errors in ranges
53- Ensure consistent formulas across all projection periods
54- Test with edge cases (zero values, negative numbers)
55- Verify no unintended circular references
56
57#### Documentation Requirements for Hardcodes
58- Comment or in cells beside (if end of table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
59- Examples:
60 - "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
61 - "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
62 - "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
63 - "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
64
65# XLSX creation, editing, and analysis
66
67## Overview
68
69A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
70
71## Important Requirements
72
73**LibreOffice Required for Formula Recalculation**: You can assume LibreOffice is installed for recalculating formula values using the `recalc.py` script. The script automatically configures LibreOffice on first run
74
75## Reading and analyzing data
76
77### Data analysis with pandas
78For data analysis, visualization, and basic operations, use **pandas** which provides powerful data manipulation capabilities:
79
80```python
81import pandas as pd
82
83# Read Excel
84df = pd.read_excel('file.xlsx') # Default: first sheet
85all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
86
87# Analyze
88df.head() # Preview data
89df.info() # Column info
90df.describe() # Statistics
91
92# Write Excel
93df.to_excel('output.xlsx', index=False)
94```
95
96## Excel File Workflows
97
98## CRITICAL: Use Formulas, Not Hardcoded Values
99
100**Always use Excel formulas instead of calculating values in Python and hardcoding them.** This ensures the spreadsheet remains dynamic and updateable.
101
102### ❌ WRONG - Hardcoding Calculated Values
103```python
104# Bad: Calculating in Python and hardcoding result
105total = df['Sales'].sum()
106sheet['B10'] = total # Hardcodes 5000
107
108# Bad: Computing growth rate in Python
109growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
110sheet['C5'] = growth # Hardcodes 0.15
111
112# Bad: Python calculation for average
113avg = sum(values) / len(values)
114sheet['D20'] = avg # Hardcodes 42.5
115```
116
117### ✅ CORRECT - Using Excel Formulas
118```python
119# Good: Let Excel calculate the sum
120sheet['B10'] = '=SUM(B2:B9)'
121
122# Good: Growth rate as Excel formula
123sheet['C5'] = '=(C4-C2)/C2'
124
125# Good: Average using Excel function
126sheet['D20'] = '=AVERAGE(D2:D19)'
127```
128
129This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
130
131## Common Workflow
1321. **Choose tool**: pandas for data, openpyxl for formulas/formatting
1332. **Create/Load**: Create new workbook or load existing file
1343. **Modify**: Add/edit data, formulas, and formatting
1354. **Save**: Write to file
1365. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the recalc.py script
137 ```bash
138 python recalc.py output.xlsx
139 ```
1406. **Verify and fix any errors**:
141 - The script returns JSON with error details
142 - If `status` is `errors_found`, check `error_summary` for specific error types and locations
143 - Fix the identified errors and recalculate again
144 - Common errors to fix:
145 - `#REF!`: Invalid cell references
146 - `#DIV/0!`: Division by zero
147 - `#VALUE!`: Wrong data type in formula
148 - `#NAME?`: Unrecognized formula name
149
150### Creating new Excel files
151
152```python
153# Using openpyxl for formulas and formatting
154from openpyxl import Workbook
155from openpyxl.styles import Font, PatternFill, Alignment
156
157wb = Workbook()
158sheet = wb.active
159
160# Add data
161sheet['A1'] = 'Hello'
162sheet['B1'] = 'World'
163sheet.append(['Row', 'of', 'data'])
164
165# Add formula
166sheet['B2'] = '=SUM(A1:A10)'
167
168# Formatting
169sheet['A1'].font = Font(bold=True, color='FF0000')
170sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
171sheet['A1'].alignment = Alignment(horizontal='center')
172
173# Column width
174sheet.column_dimensions['A'].width = 20
175
176wb.save('output.xlsx')
177```
178
179### Editing existing Excel files
180
181```python
182# Using openpyxl to preserve formulas and formatting
183from openpyxl import load_workbook
184
185# Load existing file
186wb = load_workbook('existing.xlsx')
187sheet = wb.active # or wb['SheetName'] for specific sheet
188
189# Working with multiple sheets
190for sheet_name in wb.sheetnames:
191 sheet = wb[sheet_name]
192 print(f"Sheet: {sheet_name}")
193
194# Modify cells
195sheet['A1'] = 'New Value'
196sheet.insert_rows(2) # Insert row at position 2
197sheet.delete_cols(3) # Delete column 3
198
199# Add new sheet
200new_sheet = wb.create_sheet('NewSheet')
201new_sheet['A1'] = 'Data'
202
203wb.save('modified.xlsx')
204```
205
206## Recalculating formulas
207
208Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided `recalc.py` script to recalculate formulas:
209
210```bash
211python recalc.py <excel_file> [timeout_seconds]
212```
213
214Example:
215```bash
216python recalc.py output.xlsx 30
217```
218
219The script:
220- Automatically sets up LibreOffice macro on first run
221- Recalculates all formulas in all sheets
222- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
223- Returns JSON with detailed error locations and counts
224- Works on both Linux and macOS
225
226## Formula Verification Checklist
227
228Quick checks to ensure formulas work correctly:
229
230### Essential Verification
231- [ ] **Test 2-3 sample references**: Verify they pull correct values before building full model
232- [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK)
233- [ ] **Row offset**: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6)
234
235### Common Pitfalls
236- [ ] **NaN handling**: Check for null values with `pd.notna()`
237- [ ] **Far-right columns**: FY data often in columns 50+
238- [ ] **Multiple matches**: Search all occurrences, not just first
239- [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!)
240- [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!)
241- [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets
242
243### Formula Testing Strategy
244- [ ] **Start small**: Test formulas on 2-3 cells before applying broadly
245- [ ] **Verify dependencies**: Check all cells referenced in formulas exist
246- [ ] **Test edge cases**: Include zero, negative, and very large values
247
248### Interpreting recalc.py Output
249The script returns JSON with error details:
250```json
251{
252 "status": "success", // or "errors_found"
253 "total_errors": 0, // Total error count
254 "total_formulas": 42, // Number of formulas in file
255 "error_summary": { // Only present if errors found
256 "#REF!": {
257 "count": 2,
258 "locations": ["Sheet1!B5", "Sheet1!C10"]
259 }
260 }
261}
262```
263
264## Best Practices
265
266### Library Selection
267- **pandas**: Best for data analysis, bulk operations, and simple data export
268- **openpyxl**: Best for complex formatting, formulas, and Excel-specific features
269
270### Working with openpyxl
271- Cell indices are 1-based (row=1, column=1 refers to cell A1)
272- Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)`
273- **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost
274- For large files: Use `read_only=True` for reading or `write_only=True` for writing
275- Formulas are preserved but not evaluated - use recalc.py to update values
276
277### Working with pandas
278- Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})`
279- For large files, read specific columns: `pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])`
280- Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])`
281
282## Code Style Guidelines
283**IMPORTANT**: When generating Python code for Excel operations:
284- Write minimal, concise Python code without unnecessary comments
285- Avoid verbose variable names and redundant operations
286- Avoid unnecessary print statements
287
288**For Excel files themselves**:
289- Add comments to cells with complex formulas or important assumptions
290- Document data sources for hardcoded values
291- Include notes for key calculations and model sections
292
293
294# recalc.py
295
296```python
297#!/usr/bin/env python3
298"""
299Excel Formula Recalculation Script
300Recalculates all formulas in an Excel file using LibreOffice
301"""
302
303import json
304import sys
305import subprocess
306import os
307import platform
308from pathlib import Path
309from openpyxl import load_workbook
310
311
312def setup_libreoffice_macro():
313 """Setup LibreOffice macro for recalculation if not already configured"""
314 if platform.system() == 'Darwin':
315 macro_dir = os.path.expanduser('~/Library/Application Support/LibreOffice/4/user/basic/Standard')
316 else:
317 macro_dir = os.path.expanduser('~/.config/libreoffice/4/user/basic/Standard')
318
319 macro_file = os.path.join(macro_dir, 'Module1.xba')
320
321 if os.path.exists(macro_file):
322 with open(macro_file, 'r') as f:
323 if 'RecalculateAndSave' in f.read():
324 return True
325
326 if not os.path.exists(macro_dir):
327 subprocess.run(['soffice', '--headless', '--terminate_after_init'],
328 capture_output=True, timeout=10)
329 os.makedirs(macro_dir, exist_ok=True)
330
331 macro_content = '''<?xml version="1.0" encoding="UTF-8"?>
332<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
333<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
334 Sub RecalculateAndSave()
335 ThisComponent.calculateAll()
336 ThisComponent.store()
337 ThisComponent.close(True)
338 End Sub
339</script:module>'''
340
341 try:
342 with open(macro_file, 'w') as f:
343 f.write(macro_content)
344 return True
345 except Exception:
346 return False
347
348
349def recalc(filename, timeout=30):
350 """
351 Recalculate formulas in Excel file and report any errors
352
353 Args:
354 filename: Path to Excel file
355 timeout: Maximum time to wait for recalculation (seconds)
356
357 Returns:
358 dict with error locations and counts
359 """
360 if not Path(filename).exists():
361 return {'error': f'File {filename} does not exist'}
362
363 abs_path = str(Path(filename).absolute())
364
365 if not setup_libreoffice_macro():
366 return {'error': 'Failed to setup LibreOffice macro'}
367
368 cmd = [
369 'soffice', '--headless', '--norestore',
370 'vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application',
371 abs_path
372 ]
373
374 # Handle timeout command differences between Linux and macOS
375 if platform.system() != 'Windows':
376 timeout_cmd = 'timeout' if platform.system() == 'Linux' else None
377 if platform.system() == 'Darwin':
378 # Check if gtimeout is available on macOS
379 try:
380 subprocess.run(['gtimeout', '--version'], capture_output=True, timeout=1, check=False)
381 timeout_cmd = 'gtimeout'
382 except (FileNotFoundError, subprocess.TimeoutExpired):
383 pass
384
385 if timeout_cmd:
386 cmd = [timeout_cmd, str(timeout)] + cmd
387
388 result = subprocess.run(cmd, capture_output=True, text=True)
389
390 if result.returncode != 0 and result.returncode != 124: # 124 is timeout exit code
391 error_msg = result.stderr or 'Unknown error during recalculation'
392 if 'Module1' in error_msg or 'RecalculateAndSave' not in error_msg:
393 return {'error': 'LibreOffice macro not configured properly'}
394 else:
395 return {'error': error_msg}
396
397 # Check for Excel errors in the recalculated file - scan ALL cells
398 try:
399 wb = load_workbook(filename, data_only=True)
400
401 excel_errors = ['#VALUE!', '#DIV/0!', '#REF!', '#NAME?', '#NULL!', '#NUM!', '#N/A']
402 error_details = {err: [] for err in excel_errors}
403 total_errors = 0
404
405 for sheet_name in wb.sheetnames:
406 ws = wb[sheet_name]
407 # Check ALL rows and columns - no limits
408 for row in ws.iter_rows():
409 for cell in row:
410 if cell.value is not None and isinstance(cell.value, str):
411 for err in excel_errors:
412 if err in cell.value:
413 location = f"{sheet_name}!{cell.coordinate}"
414 error_details[err].append(location)
415 total_errors += 1
416 break
417
418 wb.close()
419
420 # Build result summary
421 result = {
422 'status': 'success' if total_errors == 0 else 'errors_found',
423 'total_errors': total_errors,
424 'error_summary': {}
425 }
426
427 # Add non-empty error categories
428 for err_type, locations in error_details.items():
429 if locations:
430 result['error_summary'][err_type] = {
431 'count': len(locations),
432 'locations': locations[:20] # Show up to 20 locations
433 }
434
435 # Add formula count for context - also check ALL cells
436 wb_formulas = load_workbook(filename, data_only=False)
437 formula_count = 0
438 for sheet_name in wb_formulas.sheetnames:
439 ws = wb_formulas[sheet_name]
440 for row in ws.iter_rows():
441 for cell in row:
442 if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
443 formula_count += 1
444 wb_formulas.close()
445
446 result['total_formulas'] = formula_count
447
448 return result
449
450 except Exception as e:
451 return {'error': str(e)}
452
453
454def main():
455 if len(sys.argv) < 2:
456 print("Usage: python recalc.py <excel_file> [timeout_seconds]")
457 print("\nRecalculates all formulas in an Excel file using LibreOffice")
458 print("\nReturns JSON with error details:")
459 print(" - status: 'success' or 'errors_found'")
460 print(" - total_errors: Total number of Excel errors found")
461 print(" - total_formulas: Number of formulas in the file")
462 print(" - error_summary: Breakdown by error type with locations")
463 print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A")
464 sys.exit(1)
465
466 filename = sys.argv[1]
467 timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30
468
469 result = recalc(filename, timeout)
470 print(json.dumps(result, indent=2))
471
472
473if __name__ == '__main__':
474 main()
475```