Apr 2, 2026
The Ops Layer That Keeps Your OpenClaw Agents Alive
Your OpenClaw agents die overnight and you don't know why. I built an ops layer with health checks, auto-repair, and a watchdog that fixes the four most common failures.
By Cathryn Lavery
Here’s what nobody tells you about running OpenClaw agents: the agents aren’t the hard part.
The gateway goes down. An update resets a config field you didn’t know existed. Your exec approval rule looks right but some named agent entry is silently shadowing the wildcard, so your agents are stuck waiting for /approve all night. You find out because the agents have gone quiet — not because anything announced itself.
I built openclaw-ops to handle this. It’s the operations layer I install in every OpenClaw environment I run. Health checks, auto-repair scripts, a watchdog that restarts the gateway if it dies at 3am, and update triage that tells you exactly what broke after a version bump.
Here’s what it actually solves.
The Four Things That Kill Your OpenClaw Agents (Usually Overnight)
1. The Gateway Goes Down
This one’s obvious once it happens — nothing works. But the why varies:
- Port conflict blocking startup
- auth: "none" was removed in v2026.1.29, so upgrading killed the gateway immediately with no useful error
- Discord WebSocket disconnects that left a stuck typing indicator (v2026.2.24)
The watchdog script handles this. It pings the gateway every 5 minutes and restarts it if it’s down. If three restarts fail in 15 minutes, it fires a macOS notification and stops trying. You can’t auto-fix everything, but you can stop silent loops and know when to intervene.
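To make the "three failures in 15 minutes" rule concrete, here is a minimal sketch of how that window could be tracked. It is not the actual watchdog.sh: the state file name, the variable names, and the three helper functions are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical state file: one epoch timestamp per failed restart attempt.
FAILURE_LOG="${FAILURE_LOG:-$HOME/.openclaw/logs/watchdog-failures.log}"
WINDOW_SECONDS=900   # 15 minutes
MAX_FAILURES=3

record_failure() {
  # Append the current (or supplied) epoch time to the failure log.
  local now="${1:-$(date +%s)}"
  mkdir -p "$(dirname "$FAILURE_LOG")"
  echo "$now" >> "$FAILURE_LOG"
}

recent_failures() {
  # Count logged failures that are newer than the window cutoff.
  local now="${1:-$(date +%s)}"
  local cutoff=$(( now - WINDOW_SECONDS ))
  [ -f "$FAILURE_LOG" ] || { echo 0; return; }
  awk -v cutoff="$cutoff" '$1 > cutoff { n++ } END { print n + 0 }' "$FAILURE_LOG"
}

should_escalate() {
  # Succeeds (exit 0) once the failure count inside the window hits the limit.
  [ "$(recent_failures "$1")" -ge "$MAX_FAILURES" ]
}
```

A watchdog loop would call record_failure after each failed restart and check should_escalate before trying again. Old failures age out of the window on their own, so one bad night doesn't leave the watchdog permanently tripped.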
2. Exec Approvals Break After Updates
This is the most common post-update breakage, and it’s subtle.
You set a wildcard rule that should allow all exec commands. An update resets tools.exec.ask and tools.exec.security to defaults. Your agents start sending /approve requests for every command. But here’s the part that gets people: named agent entries with empty allowlists silently shadow the * wildcard. So even after you fix the global rule, specific agents keep getting blocked.
Both layers have to be correct. The heal script checks both and fixes them.
3. Cron Jobs Go Silent
Cron jobs auto-disable after consecutive errors. There’s no notification. You just notice your scheduled agents stopped doing their thing, and it might be days before you catch it.
4. Session Files Bloat Past 10MB
Agents in a rapid-fire loop can push session files past 10MB. At that point they look like they're running but are only spinning: 0 tokens, empty content. The heal script identifies and clears these dead sessions.
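If you want to spot oversized session files by hand, a rough sketch looks like this. It only lists candidates and deletes nothing, and it is not what heal.sh does internally. The function name is made up, and you pass in whichever directory holds your session files, since that location is an assumption about your install.

```bash
#!/usr/bin/env bash
SIZE_LIMIT_BYTES=$(( 10 * 1024 * 1024 ))   # the 10MB threshold mentioned above

find_bloated_sessions() {
  # Print every regular file under the given directory that exceeds the limit.
  # The "c" suffix makes find compare raw bytes on both GNU and BSD/macOS find.
  local session_dir="$1"
  find "$session_dir" -type f -size +"${SIZE_LIMIT_BYTES}c" -print
}
```

Run it as find_bloated_sessions /path/to/sessions and review the output before removing anything. A large file is a hint, not proof, that the session is dead.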
What’s in the Repo
Scripts you run from your shell:
- heal.sh: one-shot auto-fix for the most common gateway issues. Run it first whenever something feels wrong.
- check-update.sh: detects version changes and explains what config broke and why. Run this after every OpenClaw update.
- watchdog.sh: runs every 5 minutes, restarts gateway if down, escalates after 3 failures.
- watchdog-install.sh: installs the watchdog as a macOS LaunchAgent so it survives reboots.
- health-check.sh: declarative URL/process health checks for gateway-adjacent dependencies.
- security-scan.sh: config hardening and credential exposure scan with redacted findings.
- skill-audit.sh: static audit for third-party skills before you install them from ClawHub.
As a Claude skill:
Load /openclaw-ops and your AI does the triage: checks gateway health, auth, exec approvals, cron jobs, channels, and sessions, then explains what’s broken and fixes it.
Security Stuff Worth Knowing
Running security-scan.sh scores your config hardening 0-100 with specific fixes. It also checks for:
- config.get leaking unredacted secrets via sourceConfig
- Credential patterns leaked into ~/.openclaw/ files or wrong file permissions
- Third-party ClawHub skills with hardcoded secrets, suspicious network calls, or prompt injection
The skill-audit catches the third-party skill issues before you install them. Worth running before you pull anything from ClawHub.
On the update side: if you’re running OpenClaw below v2026.2.12, upgrade now. That version fixed CVE-2026-25253 (one-click RCE via token leakage) plus 40+ SSRF, path traversal, and prompt injection issues. The check-update.sh --fix flag handles the config migration.
Setup
Requires OpenClaw v2026.2.12 or later.
# Clone into your skills folder
git clone https://github.com/cathrynlavery/openclaw-ops.git ~/.openclaw/skills/openclaw-ops
cd ~/.openclaw/skills/openclaw-ops
# Fix whatever is currently broken
bash scripts/heal.sh
# Install the always-on watchdog (macOS)
bash scripts/watchdog-install.sh
# Check if a recent update broke your config
bash scripts/check-update.sh
For Linux, the watchdog works via cron instead of LaunchAgent:
*/5 * * * * bash /path/to/scripts/watchdog.sh >> ~/.openclaw/logs/watchdog.log 2>&1
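If you script the Linux install, it helps to make it idempotent so that re-running setup doesn't stack duplicate entries. The helper below is a sketch of one way to do that. ensure_cron_line is a hypothetical name, not something shipped in the repo.

```bash
#!/usr/bin/env bash
ensure_cron_line() {
  # Reads an existing crontab on stdin and prints it back,
  # appending the given entry only if that exact line is missing.
  local entry="$1" existing
  existing="$(cat)"
  if printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$existing"
  elif [ -z "$existing" ]; then
    printf '%s\n' "$entry"
  else
    printf '%s\n%s\n' "$existing" "$entry"
  fi
}

# Usage (this one does modify your crontab, so it is left commented out):
# ENTRY='*/5 * * * * bash /path/to/scripts/watchdog.sh >> ~/.openclaw/logs/watchdog.log 2>&1'
# crontab -l 2>/dev/null | ensure_cron_line "$ENTRY" | crontab -
```

The function only transforms text, so you can pipe a saved crontab through it and inspect the result before handing it to crontab.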
The Watchdog Escalation Tiers
I set it up in three layers:
- Tier 1: HTTP ping every 5 minutes via LaunchAgent
- Tier 2: Gateway restart + heal.sh if simple restart fails
- Tier 3: macOS notification after 3 failed attempts in 15 minutes; requires manual intervention
The goal is: the system handles what it can, tells you what it can’t, and doesn’t spam you with false alarms. Most overnight failures resolve at Tier 2.
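The tiers can be sketched as a single pass that the scheduler runs every 5 minutes. This is an illustration of the flow, not the shipped watchdog.sh. The four hook functions, GATEWAY_HEALTH_URL, RESTART_CMD, and the notification text are all placeholders I've assumed. The caller passes in how many attempts have already failed in the last 15 minutes.

```bash
#!/usr/bin/env bash
# Placeholder hooks: replace with whatever really pings, restarts, heals, alerts.
ping_gateway()    { curl -fsS --max-time 5 "${GATEWAY_HEALTH_URL:?set GATEWAY_HEALTH_URL}" >/dev/null; }
# Left unquoted on purpose so "cmd arg1 arg2" splits into a command plus arguments.
restart_gateway() { ${RESTART_CMD:?set RESTART_CMD}; }
run_heal()        { bash scripts/heal.sh; }
notify_operator() { osascript -e 'display notification "Gateway restart failed" with title "OpenClaw watchdog"'; }

watchdog_pass() {
  local failed_attempts="${1:-0}"   # failures already seen in the last 15 minutes

  # Tier 1: the gateway answers, nothing to do.
  if ping_gateway; then echo "healthy"; return 0; fi

  # Tier 2: try a plain restart first, then heal.sh plus another restart.
  if restart_gateway && ping_gateway; then echo "restarted"; return 0; fi
  if run_heal && restart_gateway && ping_gateway; then echo "healed"; return 0; fi

  # Tier 3: this attempt failed too; alert once the limit is reached.
  if [ $(( failed_attempts + 1 )) -ge 3 ]; then
    notify_operator
    echo "escalated"
    return 2
  fi
  echo "failed"
  return 1
}
```

Keeping the hooks as separate functions means you can stub them out and dry-run the escalation logic without touching a live gateway.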
One Note on the Health Checks
If you run health-check.sh right after an OpenClaw update or gateway restart, it can fail immediately — some process targets require minimum uptime (like 300 seconds) before reporting healthy. That’s expected. Lower the threshold during smoke tests, then restore it once you’re in steady-state.
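To see why a freshly restarted process fails a minimum-uptime check, here is a rough sketch of that kind of test. It is not how health-check.sh is implemented. Both function names are invented, and the 300-second default just mirrors the example above. It parses the elapsed-time column from ps, which prints [[dd-]hh:]mm:ss on both macOS and Linux.

```bash
#!/usr/bin/env bash
etime_to_seconds() {
  # Convert a ps "etime" value such as "05:00" or "1-02:03:04" to seconds.
  local etime="${1//[[:space:]]/}" days=0 hours=0 minutes=0 seconds=0
  local parts
  if [[ "$etime" == *-* ]]; then
    days="${etime%%-*}"
    etime="${etime#*-}"
  fi
  IFS=: read -r -a parts <<< "$etime"
  case "${#parts[@]}" in
    2) minutes="${parts[0]}"; seconds="${parts[1]}" ;;
    3) hours="${parts[0]}"; minutes="${parts[1]}"; seconds="${parts[2]}" ;;
    *) return 1 ;;
  esac
  # 10# forces base 10 so values like "08" are not read as invalid octal.
  echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

process_uptime_ok() {
  # Succeeds only if the process has been alive for at least min_uptime seconds.
  local pid="$1" min_uptime="${2:-300}" etime
  etime="$(ps -o etime= -p "$pid")" || return 1
  [ "$(etime_to_seconds "$etime")" -ge "$min_uptime" ]
}
```

Calling process_uptime_ok "$pid" 10 during a smoke test and process_uptime_ok "$pid" 300 in steady state is the same lower-then-restore pattern described above.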
This is the ops layer I wish someone had handed me when I started running OpenClaw agents. It doesn’t make your agents smarter. It keeps them running.
GitHub: cathrynlavery/openclaw-ops
Written by
Cathryn Lavery
Cathryn built and sold BestSelf, bought it back from private equity, and still runs it. She writes Little Might so she doesn't have to keep these lessons in her head.