homelab-codex-ws/codex_context.yaml

SESSION_STATE:
  meta:
    goal: "Maintain compressed lossless session memory in ./codex_context.yaml"
    environment:
      cwd: "/home/oskar/projects/homelab-codex-ws"
      shell: "zsh"
      date: "2026-05-03"
      tz: "Europe/Warsaw"
  systems:
    S1:
      name: "session_state"
      file: "./codex_context.yaml"
      format: "YAML"
      root: "SESSION_STATE"
      ops:
        save: "overwrite after every meaningful change/decision"
        load: "on startup if file exists"
        export: "print file content only"
        import: "load user-provided YAML"
      constraints:
        - "lossless"
        - "compressed"
        - "valid_yaml"
        - "no_fluff"
        - "dedupe"
        - "use_ids"
        - "never_delete_unless_explicit"
        - "no_confirm_on_save"
    S2:
      name: "saturn_tailscale_llm_check"
      obs:
        O1: "SATURN hostname=saturn; ts_ipv4=100.121.168.72."
        O2: "tailscale status: piha=100.108.208.3 active relay:waw; solaria=100.100.231.104 listed; DNS health warning."
        O3: "tailscale ping piha: DERP(waw) 230/33/47ms; no direct; exit=1."
        O4: "tailscale ping solaria: DERP(waw) 223/66/32ms; no direct; exit=1."
        O5: "direct curl 100.100.231.104:11434/api/tags: run1 http=200 total=0.323280s connect=0.273345s size=690; run2 http=200 total=0.118377s connect=0.064582s size=690."
        O6: "gateway curl 100.108.208.3:8080/api/tags: run1 exit=7 http=000 total=0.247810s; run2 exit=7 http=000 total=0.063145s."
        O7: "direct response models: deepseek-coder:latest, deepcoder:14b."
  configs:
    CFG1:
      name: "local_model_gateway"
      base_url: "http://piha:8080"
      preflight: "GET /"
      routes:
        coding: "/api/code"
        general: "/api/chat"
      body:
        prompt: "<task>"
        stream: false
      constraints:
        - "use_piha_only"
        - "never_call_solaria_direct"
        - "never_call_localhost_direct"
        - "retry_once_on_failure"
        - "report_endpoint_summary_errors"
      output:
        - "endpoint_used"
        - "result_summary"
        - "errors"
  decisions:
    D1: "No prior codex_context.yaml existed; initialized state file."
    D2: "User requested commit; include current repo changes: ./codex_context.yaml, ./.gitignore, ./codex_context."
    D3: "Git commit created with message: Add session context state."
    D4: "User requested SATURN network verification: Tailscale active, piha/solaria reachable, test direct LLM 100.100.231.104:11434 and gateway 100.108.208.3:8080; no remote modifications."
    D5: "Created ./start-codex.sh launcher to start Codex with embedded SESSION_STATE policy prompt and auto-load ./codex_context.yaml when present."
    D6: "Startup 2026-04-21: loaded user-provided SESSION_STATE as authoritative memory; retained prior entries."
    D7: "Gateway policy set: use http://piha:8080 only; coding->POST /api/code; general->POST /api/chat; preflight GET / before tasks; retry once on failure."
    D8: "Startup 2026-04-22: loaded provided SESSION_STATE, verified disk state parity, refreshed meta.environment.date, overwrote ./codex_context.yaml."
    D9: "Created ./ollama_client.py: minimal Python Ollama client using POST http://localhost:11434/api/chat, model=deepseek-coder, stream=false, ask(prompt)->message.content, with inline test call."
    D10: "Updated ./ollama_client.py for reliability: urlopen timeout=10, try/except guards for HTTPError, URLError, JSONDecodeError, invalid response shape, fallback Exception; errors return 'ERROR: <message>'."
    D11: "Created ./deploy_agent.py: imports ask from ollama_client; generate_compose(service)->strict YAML-only prompt; propagates 'ERROR:' responses; inline test generate_compose('nginx')."
    D12: "User requested git commit on 2026-04-22; commit scope includes ./codex_context.yaml, ./ollama_client.py, ./deploy_agent.py, ./start-codex.sh."
    D13: "Git commit created on 2026-04-22: 4cf42fc 'Add local Ollama automation scripts'."
    D14: "Updated ./deploy_agent.py: added PyYAML validation, requires top-level services key, retries invalid output up to 2 times with corrective prompt, returns 'ERROR: invalid docker-compose' after exhaustion."
    D15: "Extended ./deploy_agent.py with deploy_service(service): generates compose, writes ./deployments/<service[-n]>/docker-compose.yml without overwriting existing directories, runs 'docker compose up -d' via subprocess, returns DEPLOYED or ERROR."
    D16: "Updated ./deploy_agent.py with get_service_status(path), post-deploy 'docker compose ps' verification requiring 'Up', error outputs including ps output when available, and pre-deploy 'docker ps' port-80 check that adds prompt note 'Use a different port than 80'."
    D17: "User requested git commit on 2026-04-22; commit scope includes ./deploy_agent.py and ./codex_context.yaml for deployment status and safety updates."
    D18: "Git commit created on 2026-04-22: 0abe9cb 'Improve deploy agent safety checks'."
    D19: "Updated ./deploy_agent.py to use local LLM for one bounded deployment-failure retry: capture service/error/status, request corrected YAML only, replace docker-compose.yml, retry once, then return final error plus last status if still failing."
    D20: "User requested git commit on 2026-04-22; commit scope includes ./deploy_agent.py and ./codex_context.yaml for one-shot LLM-assisted deployment failure recovery."
    D21: "Git commit created on 2026-04-22: 185a866 'Add LLM-assisted deploy retry'."
    D22: "Updated ./deploy_agent.py failure analysis to collect 'docker compose ps -q' container IDs, fetch per-container 'docker logs --tail=50', cap combined logs at 2000 chars, and include logs in the single-retry LLM correction prompt."
    D23: "Fixed malformed duplicate function header introduced during D22 patch; deploy_agent.py function structure restored."
    D24: "Updated deploy_agent.py status validation: deployment success now requires status containing 'Up' and not containing 'unhealthy' case-insensitively."
    D25: "User reiterated file-only output expectation after status-validation request; no code change beyond D24."
    D26: "User requested git commit on 2026-04-22; commit scope includes ./deploy_agent.py and ./codex_context.yaml for log-analysis and status-validation updates."
    D27: "Git commit created on 2026-04-22: 72290cd 'Improve deploy failure analysis'."
    D28: "Updated deploy_agent.py second-failure path to return 'ESCALATE_TO_CODEX' with formatted debug block containing service, error, status, and logs instead of returning plain ERROR."
    D29: "User requested git commit on 2026-04-22; commit scope includes ./deploy_agent.py and ./codex_context.yaml for Codex escalation-path update."
    D30: "Git commit created on 2026-04-22: 104d8dc 'Add deploy escalation output'."
    D31: "Startup 2026-04-23: loaded user-provided SESSION_STATE as authoritative memory, found existing ./codex_context.yaml, refreshed meta.environment.date, overwrote state file."
    D32: "Startup 2026-05-03: loaded user-provided SESSION_STATE as authoritative memory, found existing ./codex_context.yaml, refreshed meta.environment.date, overwrote state file."
    D33: "Updated ./ollama_client.py to import os, define OLLAMA_URL from env defaulting to http://localhost:11434 with trailing-slash trim, and replace hardcoded /api/chat base URL with f'{OLLAMA_URL}/api/chat'."
    D34: "User requested identical Aider setup on solaria, piha, vpshetzner via SSH using ~/.ssh/config; per-host flow: install uv if missing, ensure ~/.local/bin PATH in ~/.zshrc, install aider-chat with uv tool install --python 3.12, ensure OLLAMA_API_BASE export in ~/.zshrc, source ~/.zshrc, verify aider, run one-line model test; retry each failed step once; continue across hosts."
    D35: "Aider install run 2026-05-03: solaria reachable via unrestricted ssh -F ~/.ssh/config; installed aider-chat with uv on remote Python 3.12, ensured ~/.zshrc contains PATH export for ~/.local/bin and OLLAMA_API_BASE=http://100.100.231.104:11434; verify: which aider=/home/oskar/.local/bin/aider, version=aider 0.86.2."
    D36: "Aider host access results 2026-05-03: piha ssh auth failed for oskar@piha (Permission denied publickey,password); vpshetzner alias unresolved locally; ssh probes to configured IP-only hosts 92.43.115.112 and 92.43.115.118 timed out on port 22; requested exact aider test command on solaria exited 0 but only opened interactive session and echoed prompt without visible model reply."
    D37: "User corrected remaining SSH targets on 2026-05-03: piha via pi@piha; vps via ubuntu-4gb-hel1-1. Scope narrowed: do not reinstall solaria; only install/verify Aider on remaining hosts; do not run interactive aider test; verify version only; update ~/.zshrc and/or ~/.bashrc idempotently."
    D38: "Aider retry run 2026-05-03 succeeded on both corrected targets. piha via pi@piha: installed uv when missing, updated existing shell rc files idempotently for PATH and OLLAMA_API_BASE, installed aider-chat with uv tool install --python 3.12, verify=aider 0.86.2. VPS via ubuntu-4gb-hel1-1: same actions, verify=aider 0.86.2."
    D39: "Shared context bootstrap update 2026-05-03: start-codex.sh now runs from repo root, prints that it is loading ./codex_context.yaml, and injects the required initial instruction 'Before doing any task, read codex_context.yaml and treat it as shared project memory.' before existing SESSION_STATE bootstrap content."
    D40: "Created ./start-aider.sh and ./update-context.md on 2026-05-03. start-aider.sh runs from repo root, defaults OLLAMA_API_BASE to http://100.100.231.104:11434, uses model ollama/deepseek-coder:latest, and attaches ./codex_context.yaml via aider --read after confirming read-only support from local aider help. update-context.md documents shared context rules for Codex and Aider; scripts set executable."
    D41: "Startup 2026-05-03: read existing ./codex_context.yaml before task work, verified parity with user-provided SESSION_STATE, retained state, overwrote file."
    D42: "Aider is installed as a local coding assistant, but current local Ollama models are not reliable enough for context-file editing."
  todos:
    T1: "For all future meaningful changes/decisions, update and overwrite ./codex_context.yaml."
    T2: "DONE: Commit current changes."
    T3: "DONE: Tailscale active."
    T4: "DONE: piha and solaria reachable via DERP(waw); direct TS path not established."
    T5: "DONE: direct vs gateway /api/tags measured."
    T6: "DONE: Add local launcher script for Codex session memory bootstrap."
    T7: "DONE: Add minimal local Ollama Python client."
    T8: "DONE: Harden local Ollama Python client error handling."
    T9: "DONE: Add compose-generation agent using local LLM client."
    T10: "DONE: Commit local Ollama automation scripts."
    T11: "DONE: Add docker-compose YAML validation and retry logic."
    T12: "DONE: Add automatic service deployment workflow."
    T13: "DONE: Add deployment status verification and basic port-80 safety check."
    T14: "DONE: Commit deploy agent safety/status updates."
    T15: "DONE: Add one-shot LLM-assisted deployment failure recovery."
    T16: "DONE: Commit LLM-assisted deploy retry changes."
    T17: "DONE: Add bounded container log analysis to deploy failure recovery."
    T18: "DONE: Tighten deploy status validation against unhealthy containers."
    T19: "DONE: Commit deploy failure analysis and status validation updates."
    T20: "DONE: Add Codex escalation output on second deployment failure."
    T21: "DONE: Commit deploy escalation output changes."
    T22: "DONE: Retry Aider setup on remaining hosts using corrected SSH targets pi@piha and ubuntu-4gb-hel1-1; both verified at aider 0.86.2."
    T23: "DONE: Add shared Codex/Aider context bootstrap scripts and update-context protocol doc."
    T24: "Use Codex for codex_context.yaml updates; use Aider only for simple code edits until a better local model/edit format is validated."
  issues:
    I1: "Tailscale DNS health warning: configured DNS servers unreachable."
    I2: "Preferred gateway path unavailable: 100.108.208.3:8080 connection failed."
    I3: "Prior direct solaria/gateway-IP checks remain historical only; current policy forbids direct solaria/localhost use."
    I4: "SSH access mismatch vs user expectation: ~/.ssh/config lacks solaria/piha/vpshetzner host aliases; only raw IP host entries 92.43.115.112 and 92.43.115.118 exist."
    I5: "piha unreachable for task execution with current ssh config/identity: oskar@piha returns Permission denied (publickey,password)."
    I6: "vpshetzner target unresolved/unreachable: hostname vpshetzner does not resolve locally; configured IP-only hosts 92.43.115.112 and 92.43.115.118 timed out on port 22."