Have you ever had a process that dies on occasion? I hate that situation, and I prefer to fix the software rather than set up a monitor that restarts the process when it dies. Lately, however, I’ve run into a case where the dying process has defied every fix I’ve tried. I think it may be a hardware-related issue, but I haven’t tracked down the cause yet. Anyhow, I read an email on the Provo Linux User Group list in which the poster referred to PS-Watcher. I thought I’d give it a try for kicks.
After installing the program and reading through the documentation, I found that PS-Watcher is really quite nice. In addition to monitoring the output of the ps command, you can add custom actions that run at the beginning or end of each monitoring cycle ($PROLOG and $EPILOG). You can also trigger actions based on the number of processes, memory size, and a few other useful metrics.
For most situations where you want to monitor a process and take action, I think PS-Watcher will probably do the job nicely. After all this, however, I decided that what I really wanted was a little script that does a custom restart of my particular web server when the test URL isn’t responding properly. I decided to simply run it at a scheduled interval with cron. I’ve placed the script below for all to glean information from or make fun of, as appropriate. Feel free to offer additional tips, as I don’t claim to be a “Bash Jedi Master”. The following script sends a request to the web server and checks the response for a string that tells us the server is working properly.
#!/bin/bash
user="<the user my process is running under>"
port="<the port>"
okresp="^OK$" # I configured a test URL that returns OK if the server is up and running right.
# make a simple HTTP request to send
req="GET /lbuptest HTTP/1.0
"
# send it using netcat
resp=$(echo "$req" | nc localhost "$port")
# test for the ok string
ok=0
echo "$resp" | grep -q "$okresp" && ok="1"
# you could really place whatever actions you want here.
if [[ $ok != "1" ]]; then
    /etc/init.d/<my process init script> restart
fi
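One thing that can trip up the check above is that HTTP responses often use CRLF line endings, so a body line may arrive as "OK" followed by a carriage return and never match ^OK$. A slightly more defensive version of the test might look like the sketch below. The function name response_ok and the sample response are made up for illustration; they are not part of my original script.

```shell
#!/bin/bash
# Sketch only: response_ok is a hypothetical helper, not part of the original script.
# It succeeds when the raw HTTP response contains a line that is exactly "OK".
response_ok() {
    # Strip carriage returns so CRLF line endings cannot defeat the ^OK$ anchors,
    # then search quietly (-q) for a line consisting solely of OK.
    printf '%s\n' "$1" | tr -d '\r' | grep -q '^OK$'
}

# Illustrative response text (an assumption, not captured from a real server).
sample=$'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nOK\r\n'

if response_ok "$sample"; then
    echo "server looks healthy"
else
    echo "server needs a restart"
fi
```

Note that the status line "HTTP/1.0 200 OK" does not count as a match, because the pattern is anchored to the whole line. To run a check like this from cron every five minutes, a crontab entry along the lines of `*/5 * * * * /usr/local/bin/check_web.sh` should do, where the script path is just a placeholder.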
The process I’m having trouble with is a TurboGears web application. I don’t think this is a Python problem, however. As I mentioned before, I suspect a hardware problem, since it only happens on this one server. Either way, if you found this page while searching for TurboGears information, you might also be interested in my TurboGears Init Scripts.