Plan: EADDRINUSE on restart (#15)

Summary

Add a PM2 ecosystem config with restart_delay, max_restarts, and min_uptime to prevent EADDRINUSE failures after crash restarts. Update the deploy workflow to use the config file, and add a post-restart port-availability check script that can alert on-call if port 8998 remains unreachable.

Files

| File | Action | Description |
|---|---|---|
| `ecosystem.config.cjs` | create | PM2 process config with `restart_delay`, `max_restarts`, `min_uptime` |
| `.github/workflows/deploy.yml` | modify | Use `pm2 startOrRestart ecosystem.config.cjs` instead of `pm2 restart viewerv2-backend`; add post-restart port monitoring step |
| `scripts/check-port.sh` | create | Port-availability check script: polls port 8998 after restart, exits non-zero (alertable) if still unreachable after 30s |

Steps

1. Create `ecosystem.config.cjs` at the repo root with `restart_delay: 3000`, `max_restarts: 5`, `min_uptime: 5000`, `script` pointing to `dist/main.js`, and env vars loaded from `.env`. PM2 does not read `.env` on its own, so the config file has to load it explicitly.
2. Create `scripts/check-port.sh`, which polls `localhost:8998/health` for up to 30 seconds, prints status on each attempt, and exits 1 if the port is still unreachable. The non-zero exit makes it suitable for triggering on-call alerts.
3. Update the "Restart service" step in `.github/workflows/deploy.yml` to use `pm2 startOrRestart ecosystem.config.cjs`, so the config is applied on every deploy.
4. Add a "Post-restart port monitoring" step in `deploy.yml` that runs `scripts/check-port.sh` to verify port availability after the restart.
5. Run `npm run lint` and `npm run test` to verify no regressions.
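
The polling logic in step 2 could look roughly like the sketch below. It is not the final script: the function names `probe` and `wait_for_port`, the use of `curl`, the 2-second per-request timeout, and the 1-second polling interval are all assumptions made for illustration. Only the URL, the 30-second budget, and the exit codes come from the plan.

```shell
#!/usr/bin/env bash
# Sketch of scripts/check-port.sh (function names and curl flags are assumptions).

# Single probe: succeeds only if the health endpoint answers with a 2xx/3xx status.
probe() {
  curl --silent --fail --max-time 2 --output /dev/null "$1"
}

# Poll the URL until it responds or the time budget runs out.
#   $1 = URL, $2 = timeout in seconds, $3 = seconds between attempts
# Returns 0 on the first successful probe, 1 if the budget is exhausted.
wait_for_port() {
  local url="$1" timeout="$2" interval="$3" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if probe "$url"; then
      echo "OK: $url responded after ${elapsed}s"
      return 0
    fi
    echo "waiting: $url not reachable yet (${elapsed}s/${timeout}s)"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "FAIL: $url still unreachable after ${timeout}s" >&2
  return 1
}
```

The script itself would end with a call such as `wait_for_port "http://localhost:8998/health" 30 1`, so that the function's return value becomes the script's exit status and a failing check fails the deploy step. Elapsed time is counted in sleep intervals only, so the real wall-clock budget can be somewhat longer than 30 seconds when probes themselves are slow.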

Verification

- `kill -9 <pid>` on the PM2-managed process results in a clean restart with no `EADDRINUSE` in `pm2 logs`.
- The process accepts requests within 10 seconds of the restart.
- `scripts/check-port.sh` exits 0 on a healthy restart and exits 1 when the port is unreachable after 30s.
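
The first two checks can be run by hand on the VPS with something like the sketch below. The helper names `has_eaddrinuse` and `verify_crash_restart` are hypothetical, and the 200-line log window is an arbitrary choice; the process name `viewerv2-backend`, port 8998, the `/health` path, and the 10-second limit are taken from the plan.

```shell
#!/usr/bin/env bash
# Manual verification sketch (helper names are hypothetical).
APP="viewerv2-backend"

# Reads log text on stdin; succeeds if it mentions EADDRINUSE.
has_eaddrinuse() {
  grep -q "EADDRINUSE"
}

# Kill the managed process, wait, then check health and recent logs.
verify_crash_restart() {
  local pid
  pid="$(pm2 pid "$APP")"
  echo "killing $APP (pid $pid)"
  kill -9 "$pid"

  # 10s is the acceptance limit from the plan (3s restart_delay + startup).
  sleep 10
  if ! curl --silent --fail --max-time 2 --output /dev/null \
      "http://localhost:8998/health"; then
    echo "FAIL: not accepting requests 10s after the kill" >&2
    return 1
  fi

  # Caveat: the window may include EADDRINUSE lines from before the kill.
  if pm2 logs "$APP" --lines 200 --nostream 2>&1 | has_eaddrinuse; then
    echo "FAIL: EADDRINUSE found in recent logs" >&2
    return 1
  fi
  echo "PASS: clean restart"
}
```

Calling `verify_crash_restart` after sourcing the file runs the whole check. Because the log window can contain entries from earlier crashes, a failure on the log check is worth confirming against the timestamps before treating it as a regression.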

Risks

- Switching from `pm2 restart` to `pm2 startOrRestart` with an ecosystem file requires the config file to be present on the VPS. The deploy workflow already pulls the latest master, so this is handled automatically.
- The 3-second `restart_delay` adds about 3s to recovery time after each crash, an acceptable tradeoff against EADDRINUSE restart loops.
