Recovery

Statehouse is designed to recover automatically from crashes and restarts.

Automatic Recovery

When the daemon starts:

Opens RocksDB database
Loads latest snapshot (if exists)
Replays events since snapshot
Resumes normal operation

No manual intervention required.

Recovery Time

Recovery time depends on:

Factor	Impact
Snapshot age	More events to replay = slower
Event count	Linear in number of events
Storage speed	SSD recommended

With default settings (snapshot every 1000 commits):

Typical recovery: < 1 second
Worst case: few seconds

Verifying Recovery

Check logs after restart:

INFO  statehoused starting
INFO  Loading snapshot: snapshot-12345.json
INFO  Replaying 147 events since snapshot
INFO  Recovery complete, current commit_ts: 12492
INFO  Listening on 0.0.0.0:50051

Data Integrity

Statehouse guarantees:

Committed transactions survive restart
No partial transactions visible
Event order preserved
Deterministic replay

These invariants hold through:

Process crash
Kill -9
Power failure (with fsync enabled)

Fsync Configuration

For durability:

# Recommended for production
export STATEHOUSE_FSYNC_ON_COMMIT=1

With fsync enabled:

Each commit is durable before acknowledgment
Survives power failure
Slight performance cost

With fsync disabled:

Faster commits
May lose recent commits on power failure
Acceptable for development

Crash During Write

If crash occurs during a write:

State	Outcome
Before commit	Transaction lost (as expected)
During commit	Transaction atomic - either complete or absent
After commit	Transaction durable

This is the "no partial transactions visible" invariant.

Corrupt Data

If RocksDB detects corruption:

Daemon logs error and exits
Restore from backup
Or: delete data directory and start fresh

Corruption is rare with fsync enabled and healthy storage.

Manual Recovery Steps

If automatic recovery fails:

# Check logs for error details
journalctl -u statehoused -n 100

# Try with debug logging
RUST_LOG=debug ./statehoused

# As last resort, reset data
rm -rf /var/lib/statehouse
./statehoused

Testing Recovery

Verify recovery works:

# Write some data
./test-write.py

# Simulate crash
kill -9 $(pgrep statehoused)

# Restart
./statehoused

# Verify data
./test-read.py

Include this in your operational testing.

Automatic Recovery​

Recovery Time​

Verifying Recovery​

Data Integrity​

Fsync Configuration​

Crash During Write​

Corrupt Data​

Manual Recovery Steps​

Testing Recovery​