| name | process-guardian |
| description | Manage long-running background processes (bots, scrapers, monitors, servers) with reliable detached execution, automatic health monitoring, auto-restart on failure, and proactive alerts. Use when launching any process that should survive session disconnects, when checking process health, or when a background task keeps dying silently. |
Process Guardian
The Problem
When AI agents launch background processes via exec & or nohup, those processes are tied to the parent session. When the session ends (timeout, context switch, cleanup), child processes receive SIGTERM and die silently. Nobody knows until someone manually checks.
The Rule
ALL long-running processes MUST go through this framework. No exceptions.
- ❌
python script.py &
- ❌
nohup python script.py &
- ❌
exec with background: true
- ✅
bash scripts/managed-process.sh register <name> <cmd> then start <name>
Quick Start
bash scripts/managed-process.sh register my-bot "python3 /path/to/bot.py" 480
bash scripts/managed-process.sh start my-bot
bash scripts/managed-process.sh status
Commands
| Command | Usage | Description |
|---|
register | register <name> <command> [duration_min] | Define a managed process. Duration 0 = indefinite. |
start | start <name> | Launch registered process (fully detached). |
stop | stop <name> | Graceful shutdown via SIGTERM. |
restart | restart <name> | Stop then start. |
status | status [name] | Show all processes or one specific. |
healthcheck | healthcheck | Check all processes, restart dead ones, send alerts. |
Setup
Run the install script to configure cron-based health monitoring:
bash scripts/install.sh
This adds a cron job running healthcheck every 5 minutes.
How It Works
Detached Execution
Processes launch via setsid + nohup + disown, giving them:
- Own session ID (SID) — not tied to any terminal
- PPID=1 (init) — survives parent death
- Immune to SIGHUP/session cleanup
Health Monitoring
Cron runs healthcheck every 5 minutes:
- Checks each registered process is alive (PID exists)
- If dead: checks if it completed normally (ran ≥80% of expected duration)
- If premature death: auto-restarts and writes alert flag
- Alert cooldown: 15 min between alerts for same issue (no spam)
Alert Integration
When a process dies, the healthcheck writes:
/tmp/process_monitor_alert.flag — trigger file
/tmp/process_monitor_alert.txt — alert message
Configure your agent's heartbeat to check these files and forward alerts.
Process Registry
All processes are tracked in .process-registry.json:
{
"my-bot": {
"command": "python3 /path/to/bot.py",
"duration_min": 480,
"auto_restart": true,
"max_restarts": 5,
"restart_cooldown": 30
}
}
If a process isn't in the registry, it doesn't exist.
Signal Handling Best Practice
Your scripts should log signals, not silently exit:
import signal
from datetime import datetime
def handler(signum, frame):
sig_name = signal.Signals(signum).name
print(f"⚠️ SIGNAL: {sig_name} at {datetime.now()}", flush=True)
global shutdown
shutdown = True
signal.signal(signal.SIGTERM, handler)
signal.signal(signal.SIGINT, handler)
Never call sys.exit(0) in signal handlers — it makes crashes look like clean exits, preventing watchdogs from restarting.