Alert notifications
This is a generic page about reporting all kinds of misbehaviour from a server.
This is a draft, to be implemented :-)
Data collection
What to filter for what kind of alert?
Mail alerts
- Syslog -> Logcheck
- We should also send by mail at least everything we report via Jabber/SMS
Jabber/SMS alerts
- Hardware failures
  - temperature, fans
  - RAID
- Software problems
  - HD capacity
  - CPU load at 100% for more than X minutes
    The easiest way is to take the third field of /proc/loadavg, which is the load average over the last 15 minutes; here with 2 CPUs:
    awk '$3 > 2 {print "alert"}' /proc/loadavg
  - network load > X for more than Y minutes
  - Exim load > X mails sent per minute
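
The CPU load check above hard-codes the threshold for 2 CPUs. A minimal sketch of the same test with the CPU count passed in as a parameter could look like the following. The function name check_load is hypothetical (it is not part of any existing tool), and the sketch assumes that "load average above the number of CPUs" is an acceptable stand-in for "CPU load 100%".

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not an existing tool):
# reads one /proc/loadavg-style line on stdin and prints "alert" when the
# 15-minute load average (third field) is above the CPU count given as $1.
check_load() {
    awk -v cpus="$1" '$3 + 0 > cpus + 0 { print "alert" }'
}

# Example call on a real machine (nproc comes from GNU coreutils):
#   check_load "$(nproc)" < /proc/loadavg
```

Reading the load line from stdin keeps the check easy to try out with sample values before it is wired into cron or a Jabber/SMS sender.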