Vserver watchdogs

From YobiWiki

Latest revision as of 21:35, 24 November 2010

Monitoring listening TCP/UDP processes

mynetstat

#!/bin/bash 

# Copyright: 2007 Philippe Teuwen <phil at teuwen dot org>
# License:   GPLv3 or later
#
# Description:
#   Watch for modifications in the list of listening TCP/UDP processes across all vservers
#   Examples: mysql, imapproxy,.. crashes
#             IP binding broken after package upgrade
#             new suspicious listening process, etc
#
# Usage:
#   mynetstat [update|diff|diff+update] options
#
#   mynetstat
#       Gives a clean list of listening processes in which vserver and with which binding
#   mynetstat update
#       Creates a snapshot in a logfile
#   mynetstat diff
#       Shows diffs compared to the snapshot
#       This is the one expected to be called by cron on the host
#   mynetstat diff+update
#       Same as diff but takes a new snapshot in case of changes
#       so cron doesn't send repetitively alerts
#   Options:
#   You can use numeric options of netstat: [--numeric|-n] [--numeric-hosts] [--numeric-ports]
# Changelog:
#   v1.0

###### Adapt this one to your needs:        
LOGFILE="$(dirname $0)/mynetstat.snapshot"
####################################

POSTUPDATE=false
case "$1" in
    "update")
        shift
        CMD="cat > $LOGFILE"
    ;;
    "diff")
        shift
        CMD="diff -U0 $LOGFILE -"
    ;;
    "diff+update")
        shift
        CMD="diff -U0 $LOGFILE -"
        POSTUPDATE=true
    ;;
    *)
        CMD="cat"
esac

# The remaining arguments are the numeric options handed over to netstat
NUMERIC="$*"

function FILTER() {
    /usr/bin/awk -v vs=$i '
        /^(tcp|udp)/{
            if ($7=="")
                $7=$6;
            sub(/^[0-9]+\//,"",$7);
            printf "%-20s%-10s%-30s%s\n", vs, $1, $4, $7
        }'
}
(
for i in $(/usr/sbin/vserver-stat |awk '/^[0-9]/{print $8}'); do
    /usr/sbin/vserver $i exec /bin/netstat --ip -lp $NUMERIC|FILTER
done
i=HOST
/bin/netstat --ip -lp $NUMERIC|FILTER
) |sort -b | eval $CMD

[ $? -eq 1 ] && $POSTUPDATE && exec $0 update $*

/etc/cron.d/watch-vservers

MAILTO=root
*/15 * * * *    root /home/vservers/tools/watch/mynetstat diff
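
The snapshot-and-diff mechanism behind mynetstat can be tried in isolation. The sketch below is not part of the original script: the function name check_against_snapshot and the sample lines are made up for illustration. It only relies on diff exiting with status 1 when its inputs differ, which is what the script tests with $? for diff+update.

```shell
#!/bin/bash
# Hypothetical helper: compare stdin with a snapshot file.
# Prints the unified diff without context lines and returns diff's status:
# 0 = identical, 1 = changes found, 2 = trouble (e.g. missing snapshot).
check_against_snapshot() {
    local snapshot=$1
    diff -U0 "$snapshot" -
}

# Made-up snapshot content standing in for the real mynetstat output
snapshot=$(mktemp)
printf '%s\n' 'HOST tcp *:ssh sshd' 'mx tcp *:smtp exim4' > "$snapshot"

# Same content: nothing printed, status 0
status=0
printf '%s\n' 'HOST tcp *:ssh sshd' 'mx tcp *:smtp exim4' |
    check_against_snapshot "$snapshot" || status=$?
echo "unchanged: status $status"

# One listener vanished: diff prints a "-" line, status 1
status=0
printf '%s\n' 'HOST tcp *:ssh sshd' |
    check_against_snapshot "$snapshot" || status=$?
echo "changed: status $status"

rm -f "$snapshot"
```

Since cron only sends a mail when a job prints something, an unchanged list stays silent.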

Monitoring running processes

myps

#!/bin/bash 

# Copyright: 2007 Philippe Teuwen <phil at teuwen dot org>
# License: GPLv3 or later
#
# Description:
#   Watch for modifications in the list of running processes across all vservers
#
# Usage:
#   myps
#       Gives a clean list of the processes and of the vserver they run in
#   myps update
#       Creates a snapshot in a logfile
#       Beware that transient processes, such as those run by this script itself, will be there!
#   myps update-filtered
#       Same but a bit smarter: it filters out processes of the day and those matching
#       the extra rules written in EXTRAFILTER
#   myps missing
#       Shows missing processes, i.e. those not running but present in the reference snapshot
#       This is the one expected to be called by cron on the host
#   myps new
#       Shows processes running but not present in the reference snapshot
#       Beware that transient processes, such as those run by this script itself, will be there!
#   myps new-filtered
#       Same but a bit smarter: it filters out processes of the day and those matching
#       the extra rules written in EXTRAFILTER
#       Note that you could miss processes forking at an abnormal rate as they would be "too young"
#       This one could usefully be called by cron too
#   myps missing+new-filtered
#       Guess what... (missing, then new-filtered)
# Changelog:
#   v1.0

###### Adapt this one to your needs:        
LOGFILE="$(dirname $0)/myps.snapshot"
EXTRAFILTER="$(dirname $0)/myps-filter.conf"
####################################

# Default passthrough filter:
FILTER="cat"
case "$1" in
    "update")
        shift
        CMD="cat > $LOGFILE && echo \"Done. You can clean manually the snapshot from transient processes\""
    ;;
    "update-filtered")
        shift
        # Filter out processes of today:
        FILTER="egrep -v \"[ ]+ +[0-9]\""
        # Filter out lines matching those in EXTRAFILTER
        CMD="cat |grep -v -f $EXTRAFILTER > $LOGFILE && echo \"Done. You can clean manually the snapshot from transient processes\""
    ;;
    "missing")
        shift
        CMD="diff -U0 $LOGFILE - |sed \"1,2d\" |grep \"^-\""
    ;;
    "new")
        shift
        CMD="diff -U0 $LOGFILE - |sed \"1,2d\" |grep \"^+\""
    ;;
    "new-filtered")
        shift
        # Filter out processes of today:
        FILTER="egrep -v \"[ ]+ +[0-9]\""
        # Filter out lines matching those in EXTRAFILTER
        CMD="diff -U0 $LOGFILE - |sed \"1,2d\" |grep \"^+\"|grep -v -f $EXTRAFILTER"
    ;;
    "missing+new-filtered")
        shift
        $0 missing $*
        exec $0 new-filtered $*
    ;;
    *)
        CMD="cat"
    ;;
esac

# "command" lists each process with its arguments; to keep only the command name, change "command" to "comm"
/usr/sbin/vps ax -o pid,start_time,command|\
    sed "1d;s/^[ 0-9]\\+//"|\
    eval $FILTER|\
    cut -c-14,21-|\
    sort|\
    uniq|\
    eval $CMD

Some explanation:
myps-filter.conf contains the patterns, one per line, that grep -v -f uses to clean the output. List there the processes that may or may not be running at any given time, so you don't want to hear about them.
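
As an illustration of how such a filter file works, here is a small sketch. The patterns and the process list are invented for the example and are not the author's actual myps-filter.conf; the helper name drop_expected is hypothetical too.

```shell
#!/bin/bash
# Hypothetical filter rules: one basic regular expression per line
filter=$(mktemp)
cat > "$filter" <<'EOF'
sshd: .*@pts
/usr/sbin/cron
logrotate
EOF

# Hypothetical helper: drop every input line matching a rule of the file $1
drop_expected() {
    grep -v -f "$1"
}

# Made-up process list standing in for the real vps output
printf '%s\n' \
    '/usr/sbin/apache2 -k start' \
    'sshd: phil@pts/0' \
    '/usr/sbin/cron' \
    'mysqld --user=mysql' |
    drop_expected "$filter"
# Only the apache2 and mysqld lines are printed

rm -f "$filter"
```

Take care not to leave an empty line in the file: an empty pattern matches every line, so grep -v would then hide everything.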

/etc/cron.d/watch-vservers

MAILTO=root
57 * * * *    root /home/vservers/tools/watch/myps missing+new-filtered

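The job runs once per hour at minute 57, so among others at 23:57 and 0:57. Running that late in the hour keeps short the window during which a process less than one hour old already counts as "of yesterday" and therefore escapes the "processes of the day" filter.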

Monitoring and killing CPU-hungry logins

In some situations my CPU was burning like hell for days.
The culprit was the vcontext/login process created by the command "vserver XX enter".
If that process gets disconnected from its terminal, it eats all the CPU :-(
I could reproduce the problem as follows:

  • ssh to the server as normal user
  • sudo su -
  • vserver XX enter
  • from another terminal, kill -9 one of the 2 su processes ("su -" or "-su")
  • quit the current terminal

Now from another terminal you can see that

 vps ax|grep " login"

shows a "login" process not attached to any tty (cf. the "?"), and top shows it using 100% CPU.

Actually the real process is called vcontext, as reported by top, pgrep and /proc/XXX/status, but its command line was overwritten with "login\0\0\0\0\0\0\0\0\0\0\0...". Commands like vps or pgrep -f read /proc/XXX/cmdline and therefore show the name "login".
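
To see both names of a process side by side, something like the following sketch can be used on a Linux host. The function name show_names is made up; it only reads the two /proc files mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: print the kernel's process name and the command line of a PID
show_names() {
    local pid=$1
    # "Name:" in status is the executable name as kept by the kernel
    awk '/^Name:/ {print "name:    " $2}' "/proc/$pid/status"
    # cmdline is NUL-separated and can be overwritten by the process itself
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
}

# Demonstration on the current shell
show_names $$
```

For the runaway process described here, the first line would read vcontext and the second one login.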

So I wrote a script and installed it in a cron to kill any such dangling process:

# Select vcontext processes without a tty ("?"), in state R or S,
# that have accumulated at least one minute of CPU time
pids=$(/usr/sbin/vps axc|\
    egrep ' \? +[RS] +[1-9][0-9]*:[0-9]+ +vcontext'|\
    sed 's/^ *\([0-9]\+\) .*$/\1/')
[ "$pids" != "" ] && \
    echo "Killing dangling login" && \
    kill $pids

And /etc/cron.d/watch-vservers:

MAILTO=root
*/5 * * * *     root /home/vservers/tools/kill-vcontext

This also shows something weird. Processes in vservers don't appear with ps ax; fine, they appear with vps ax.
But if I want to kill one of them from the host:

vkill 29825
vkill: vc_ctx_kill(): No such process
chcontext --ctx 1 kill 29825
<no error, but nothing happens either, because ctx 1 can watch everything but can only watch!>
kill 29825
<works!!>

It seems that on newer versions (2.6.18?) such a direct kill no longer works; use vkill --xid <vserver1> <pid> instead.

Monitoring and restarting daemons

I have a big syslog-ng aggregator that also dumps into mysql, cf. Syslog.
But sometimes during or after cron.daily (logrotate?), syslog-ng dies.
I experienced similar problems with apache2 at the weekly logrotate and with denyhosts.
So I decided to monitor all my daemons and to try to restart them automatically when there is a problem.

mywatchdog

#!/bin/bash

# Copyright: Philippe Teuwen <phil at teuwen dot org>
# License: GPLv3 or later
#
# Usage:
#   mywatchdog <vserver> <space-separated list of daemons>
# Checks in vserver if such daemons are running otherwise it tries to launch them with /etc/init.d/<daemon> start
# Special vserver MAIN refers to the host
#
# Note that <daemon> must be the name of the script in /etc/init.d/
# and must be usable as a pattern to find the process otherwise add a specific filter below

VSERVER=$1
shift

for i in $*; do 
    FILTER=${i}
    # Special daemons to tweak as the process has another name
    # -f greps on the full command line otherwise filter applies only on the process name
    [ "${i}" == "backuppc" ] && FILTER=BackupPC
    [ "${i}" == "courier-authdaemon" ] && FILTER=authdaemond
    [ "${i}" == "courier-imap" ] && FILTER=couriertcpd
    [ "${i}" == "denyhosts" ] && FILTER="-f denyhosts"
    [ "${i}" == "openbsd-inetd" ] && FILTER=inetd
    [ "${i}" == "ozyman" ] && FILTER="-f nomce.pl"
    [ "${i}" == "postgresql-8.1" ] && FILTER=postmaster
    [ "${i}" == "spamassassin" ] && FILTER=spamd
    [ "${i}" == "sysklogd" ] && FILTER=syslogd
    [ "${i}" == "uml-utilities" ] && FILTER=uml_switch
    [ "${i}" == "zope2.9" ] && FILTER="-f zope2.9"
    if [ "$VSERVER" == "MAIN" ]; then
        if [ ! -e "/etc/init.d/${i}" ]; then
            echo "Error in using mywatchdog script, /etc/init.d/${i} does not exist in MAIN!"
            exit 1
        fi
        pgrep ${FILTER}\|logrotate\|apt-get >/dev/null || \
        { 
            echo "Error! ${i}@${VSERVER} seems to be down! I try to restart it..." && \
            /etc/init.d/${i} start; 
        }
    else
        if [ ! -e "/home/vservers/${VSERVER}/etc/init.d/${i}" ]; then
            echo "Error in using mywatchdog script, /etc/init.d/${i} does not exist in ${VSERVER}!"
            exit 1
        fi
        /usr/sbin/chcontext --ctx ${VSERVER} pgrep ${FILTER}\|logrotate\|apt-get >/dev/null || \
        { 
            echo "Error! ${i}@${VSERVER} seems to be down! I try to restart it..." && \
            /usr/sbin/vserver ${VSERVER} exec /etc/init.d/${i} start; 
        }
    fi
done
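
The chain of tests that maps an init script name to a pgrep pattern could also be written as a case statement. This is only an alternative sketch, not the author's code: the function name pattern_for is hypothetical and just a few of the mappings above are reproduced.

```shell
#!/bin/bash
# Hypothetical helper: map an init script name to the arguments given to pgrep
pattern_for() {
    case "$1" in
        backuppc)            echo "BackupPC" ;;
        courier-authdaemon)  echo "authdaemond" ;;
        courier-imap)        echo "couriertcpd" ;;
        denyhosts)           echo "-f denyhosts" ;;   # -f: match on the full command line
        sysklogd)            echo "syslogd" ;;
        *)                   echo "$1" ;;             # default: process named like the init script
    esac
}

for d in sysklogd denyhosts apache2; do
    printf '%-10s -> pgrep %s\n' "$d" "$(pattern_for "$d")"
done
```

Inside the loop of mywatchdog, FILTER=$(pattern_for "$i") would then replace the list of tests.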

/etc/cron.d/watch-vservers

I added to my /etc/cron.d/watch-vservers entries similar to

*/15 * * * *    root : "WD MAIN"     ;/home/vservers/tools/watch/mywatchdog MAIN          acpid atop exim4 munin-node ssh denyhosts sysklogd udev xinetd uml-utilities
*/15 * * * *    root : "WD MX"       ;/home/vservers/tools/watch/mywatchdog mx            syslog2mysql courier-authdaemon courier-imap cron exim4 fetchmail munin-node spamassassin ssh denyhosts syslog-ng xinetd
*/15 * * * *    root : "WD NS0"      ;/home/vservers/tools/watch/mywatchdog ns0           cron dnsmasq sysklogd
*/15 * * * *    root : "WD PUBLIC"   ;/home/vservers/tools/watch/mywatchdog public        apache2 cron exim4 munin-node sysklogd
*/15 * * * *    root : "WD SQL"      ;/home/vservers/tools/watch/mywatchdog sql           cron munin-node mysql sysklogd
[...]

The "WD VSERVER" strings are the first thing to appear in the mail subjects, which helps me to identify problems quickly.
I don't monitor cron on MAIN: if it's dead, well, the watchdogs won't be called :-/

Note that if you use logcheck, you will get security warnings because our calls contain "deny" when we want to check "denyhosts", so you may need to add something like this to /etc/logcheck/violations.ignore.d/local-logcheck-watchdogs

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ /USR/SBIN/CRON\[[0-9]+\]: \(root\) CMD \(.*mywatchdog.*denyhosts.*\)$
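
Before installing such a rule, it can be checked against a log line with grep -E. The sketch below uses an invented syslog line (hostname, PID and date are hypothetical) and assumes a grep that understands \w, as GNU grep does.

```shell
#!/bin/bash
# The ignore rule from above, as an extended regular expression
pattern='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ /USR/SBIN/CRON\[[0-9]+\]: \(root\) CMD \(.*mywatchdog.*denyhosts.*\)$'

# Hypothetical syslog line produced by one of the watchdog cron entries
sample='Nov 24 21:30:01 myhost /USR/SBIN/CRON[12345]: (root) CMD (: "WD MX" ;/home/vservers/tools/watch/mywatchdog mx cron denyhosts syslog-ng)'

# Hypothetical helper: succeeds when the line $1 is matched by the rule
rule_matches() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

if rule_matches "$sample"; then
    echo "line is ignored by the rule"
else
    echo "line would still be reported"
fi
```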

Monitoring imapproxy

Imapproxy runs in one vserver, the one with apache2 and squirrelmail, and the IMAP server runs in another vserver, the mail server.
Because both vservers race at boot time, imapproxy can fail to start if courier is not there yet. So I only restart imapproxy when it is down and courier is up:

*/15 * * * *    root /usr/sbin/chcontext --ctx vserverX pgrep imapproxy >/dev/null|| { /usr/sbin/chcontext --ctx vserverY pgrep couriertcp >/dev/null && { echo "Error! Imapproxy@vserverX seems to be down! I try to restart it..." && /usr/sbin/vserver vserverX exec /etc/init.d/imapproxy start; } }
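
The logic of that one-liner, restart a service only when it is down while its dependency is up, may be easier to read as a small function. This is a sketch under assumptions: the function name restart_if_dependency_up is made up, and the three commands are passed as strings so that the demonstration can run with true and false instead of the real chcontext and vserver calls.

```shell
#!/bin/bash
# Hypothetical helper.
# $1: command that succeeds when the service is running
# $2: command that succeeds when the dependency is running
# $3: command that restarts the service
restart_if_dependency_up() {
    local check_service=$1 check_dependency=$2 restart=$3
    if eval "$check_service" >/dev/null; then
        return 0    # service is fine, nothing to do
    fi
    if ! eval "$check_dependency" >/dev/null; then
        return 0    # dependency is down as well: a restart would fail again
    fi
    echo "Error! Service seems to be down! I try to restart it..."
    eval "$restart"
}

# Demonstration with stand-in commands: service down, dependency up
restart_if_dependency_up false true 'echo "(restart command would run here)"'

# With the commands of the cron entry above it would be called as:
# restart_if_dependency_up \
#     "/usr/sbin/chcontext --ctx vserverX pgrep imapproxy" \
#     "/usr/sbin/chcontext --ctx vserverY pgrep couriertcp" \
#     "/usr/sbin/vserver vserverX exec /etc/init.d/imapproxy start"
```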