Difference between revisions of "Vserver watchdogs"

From YobiWiki
Revision as of 20:59, 17 February 2007

Monitoring listening TCP/UDP processes

mynetstat

#!/bin/bash 

# Copyright: 2007 Philippe Teuwen <phil at teuwen dot org>
# License:   GPL
#
# Description:
#   Watch for modifications in the list of listening TCP/UDP processes across all vservers
#   Examples: mysql, imapproxy,.. crashes
#             IP binding broken after package upgrade
#             new suspicious listening process, etc
#
# Usage:
#   mynetstat [update|diff|diff+update] options
#
#   mynetstat
#       Gives a clean list of listening processes in which vserver and with which binding
#   mynetstat update
#       Creates a snapshot in a logfile
#   mynetstat diff
#       Shows diffs compared to the snapshot
#       This is the one expected to be called by cron on the host
#   mynetstat diff+update
#       Same as diff but takes a new snapshot in case of changes
#       so cron doesn't send repetitively alerts
#   Options:
#   You can use numeric options of netstat: [--numeric|-n] [--numeric-hosts] [--numeric-ports]
# Changelog:
#   v1.0

###### Adapt this one to your needs:        
LOGFILE="$(dirname $0)/mynetstat.log"
####################################

POSTUPDATE=false
case "$1" in
    "update")
        shift
        CMD="cat > $LOGFILE"
    ;;
    "diff")
        shift
        CMD="diff -U0 $LOGFILE -"
    ;;
    "diff+update")
        shift
        CMD="diff -U0 $LOGFILE -"
        POSTUPDATE=true
    ;;
    *)
        CMD="cat"
esac

/usr/sbin/chcontext --ctx 1 netstat --ip -lp $* |\
    awk '\
        FNR>2 {
            printf $1 "\t%-23s\t", $4;
            if ($7 == "")
                {$7=$6};
            sub(/\/.*/,"",$7);
            system("/usr/sbin/vps -p " $7 " -o pid,comm|sed \"1d;s/^[ 0-9]\\+//\"")
        }'|sort -bk 3 | eval $CMD
# To also get the arguments of the command, replace "pid,comm" with "pid,command"

[ $? -eq 1 ] && $POSTUPDATE && exec $0 update $*

/etc/cron.d/watch-vservers

MAILTO=root
*/15 * * * *    root /home/vservers/tools/watch/mynetstat diff
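
The diff and diff+update modes rely on the exit status of diff: 0 when the listing equals the snapshot, 1 when it differs, 2 on error. Below is a minimal stand-alone sketch of that logic with fake listings. The function name, the temporary snapshot file and the sample lines are assumptions made for illustration. Unlike mynetstat, which runs itself again with "update", the sketch simply reuses the listing it has already read.

```shell
#!/bin/sh
# Stand-alone illustration of the diff+update logic; all data below is fake.
snapshot=$(mktemp)

# Compare a listing read from stdin with the snapshot and refresh the
# snapshot when they differ. Returns the exit status of diff.
diff_and_update() {
    current=$(cat)
    printf '%s\n' "$current" | diff -U0 "$snapshot" -
    status=$?
    # diff exits with 1 only when the two inputs differ
    if [ "$status" -eq 1 ]; then
        printf '%s\n' "$current" > "$snapshot"
    fi
    return "$status"
}

# Feed a listing to diff_and_update and say what happened
report() {
    if printf '%s\n' "$1" | diff_and_update; then
        echo "unchanged"
    else
        echo "changed, snapshot refreshed"
    fi
}

old='tcp 0.0.0.0:22 sshd'
new='tcp 0.0.0.0:22 sshd
tcp 0.0.0.0:3306 mysqld'

printf '%s\n' "$old" > "$snapshot"   # what "mynetstat update" would store
report "$old"   # same listing as the snapshot
report "$new"   # a new listener appeared: diff output, snapshot refreshed
report "$new"   # same change again: silent, so cron sends no second mail
```

Since cron only mails when a job prints something, refreshing the snapshot after a change is what keeps the alert from being repeated every 15 minutes.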

Monitoring CPU-hungry logins

In some situations my CPU was burning like hell for days.
The culprit was the vcontext/login process started by the command "vserver XX enter".
If that process gets disconnected from its terminal, it eats all the CPU :-(
I could reproduce the problem as follows:

  • ssh to the server as normal user
  • sudo su -
  • vserver XX enter
  • from another term, kill -9 one of the 2 su processes ("su -" or "-su")
  • quit the current terminal

Now from another terminal you can see that

 vps ax|grep " login"

shows a "login" process not attached to any tty (cf. the "?"), and top shows it running at 100% CPU.

Actually the real process is called vcontext, as shown by top, pgrep and /proc/XXX/status, but its command line was overwritten with "login\0\0\0\0\0\0\0\0\0\0\0...", as you can see in /proc/XXX/cmdline. That is why commands like vps or pgrep -f show the name "login".
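
To see both names side by side, something like the following sketch can be used on any Linux box. The function name show_names is made up for the example, and it is run here on the current shell rather than on a real vcontext process.

```shell
#!/bin/sh
# Print the kernel-side process name and the command line of a PID.
show_names() {
    pid=$1
    # "Name:" in /proc/PID/status is the executable name kept by the kernel
    name=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$pid/status")
    # /proc/PID/cmdline is NUL-separated and can be overwritten by the process itself
    cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    printf 'name=%s\ncmdline=%s\n' "$name" "$cmdline"
}

# Demo on the current shell; replace $$ with the PID of the suspicious process
show_names "$$"
```

For a dangling login, the first line would be expected to say vcontext and the second one login.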

So I wrote a script and installed it in cron to kill any such dangling process:

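# PIDs of vcontext processes without a tty ("?"), running or sleeping,
# with non-zero minutes of CPU time. The "?" must be escaped: in an
# extended regex a bare "?" is a quantifier, not a literal character.
pids=$(/usr/sbin/vps axc|\
    grep -E ' \? +[RS] +[1-9][0-9]*:[0-9]+ +vcontext'|\
    sed 's/^ *\([0-9]\+\) .*$/\1/')
[ "$pids" != "" ] && \
    echo "Killing dangling login" && \
    kill $pids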
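
The filter can be tried without touching any real process by feeding it canned lines. The sample below is hypothetical: the column layout (PID, context, TTY, state, time, command) is assumed, not copied from a real "vps axc" run, and dangling_pids is a name invented for the sketch. The "?" is escaped because in an extended regular expression a bare "?" is a quantifier and would make the preceding space optional instead of matching the tty column.

```shell
#!/bin/sh
# Hypothetical sample of "vps axc" output; the column layout is an assumption
sample='29825 49153 vserverXX   ?        R    4213:07 vcontext
29901 49153 vserverXX   pts/3    S       0:00 vcontext
29950 49153 vserverXX   ?        S       0:00 vcontext
 1234     0 MAIN        ?        S      12:02 cron'

# Keep only tty-less vcontext lines with non-zero minutes of CPU time
# and print their PID (first column)
dangling_pids() {
    grep -E ' \? +[RS] +[1-9][0-9]*:[0-9]+ +vcontext' |
        sed -E 's/^ *([0-9]+) .*$/\1/'
}

printf '%s\n' "$sample" | dangling_pids
```

Only the first sample line is selected: the second has a tty, the third has used no CPU yet, and the fourth is not a vcontext process.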

And /etc/cron.d/watch-vservers:

MAILTO=root
*/5 * * * *     root /home/vservers/tools/kill-vcontext

This also shows something weird: processes in vservers don't appear with ps ax (OK, that's expected), they appear with vps ax.
Now if I want to kill one of them from the host:

vkill 29825
vkill: vc_ctx_kill(): No such process
chcontext --ctx 1 kill 29825
<no error, but nothing happens either, because ctx 1 can see everything but can only watch!>
kill 29825
<works!!>

It seems that on newer versions such a direct kill no longer works, and the kill has to go through chcontext first.

Monitoring syslog-ng

I have a big syslog-ng aggregator that also dumps into MySQL, cf. Syslog.

But sometimes during or after cron.daily (logrotate?), syslog-ng dies, so I added this to my /etc/cron.d/watch-vservers:

*/15 * * * *    root /usr/sbin/chcontext --ctx vserverX pgrep syslog-ng\|logrotate >/dev/null|| { echo "Error! Syslog-ng@vserverX seems to be down! I try to restart it..." && /usr/sbin/vserver vserverX exec /etc/init.d/syslog-ng start; }

Monitoring imapproxy

Imapproxy runs in one vserver (the one with apache2 and squirrelmail), while the IMAP server runs in another vserver, the mail server.
Because both vservers race at boot time, imapproxy can fail to start if courier is not up yet, so imapproxy is restarted only when it is down and courier is running:

*/15 * * * *    root /usr/sbin/chcontext --ctx vserverX pgrep imapproxy >/dev/null|| { /usr/sbin/chcontext --ctx vserverY pgrep couriertcp >/dev/null && { echo "Error! Imapproxy@vserverX seems to be down! I try to restart it..." && /usr/sbin/vserver vserverX exec /etc/init.d/imapproxy start; } }
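
The nested "|| { ... && { ...; } }" of that cron line is easier to follow when spelled out. Here is a small sketch of the same control flow, where stub functions replace the real chcontext/pgrep checks and the vserver restart. All function and variable names are invented for the example.

```shell
#!/bin/sh
# Stubs standing in for the real checks and restart; names are hypothetical
imapproxy_running() { [ "$IMAPPROXY" = up ]; }
courier_running()   { [ "$COURIER" = up ]; }
restart_imapproxy() { echo "restarting imapproxy"; }

# Same shape as the cron line: act only if imapproxy is down AND courier is up
watch_imapproxy() {
    imapproxy_running || {
        courier_running && {
            echo "Error! Imapproxy seems to be down! I try to restart it..."
            restart_imapproxy
        }
    }
}

# Walk through the three interesting combinations
for state in "up up" "down up" "down down"; do
    set -- $state
    IMAPPROXY=$1
    COURIER=$2
    echo "imapproxy=$IMAPPROXY courier=$COURIER:"
    watch_imapproxy || true
done
```

Only the "down up" case prints the error and restarts. When courier is down too, nothing is printed, so cron sends no mail and the check is retried on the next run.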