I’ve been working to track down why one of my web apps seems to hang in the middle of the night when no one is using it. When it does, its CPU usage shoots up to 80-100% of a core and it stops responding to requests.
I have this app set up as a systemd service, but apparently either systemd can’t stop it, or the watchdog messages are still getting through even though the rest of the app is unresponsive.
Anyway… What I did recently was create a bash script that runs on the machine once a minute via a cron job. It checks the process’s CPU usage and, if it’s above a certain threshold, kills and restarts the process. I’m providing it as a template for anyone running into these types of issues…
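```shell
#!/bin/bash
# Kill and restart a process whose CPU usage exceeds a threshold.
# Meant to run from root's crontab once a minute, e.g. (path and name are placeholders):
#   * * * * * /usr/local/bin/cpucheck.sh -p myapp -c 80
processname=""
maxcpu=80
scriptname=$(basename "$0")

usage() {
    local exitcode=0
    if [ -n "$1" ]; then
        echo "$1"
        echo ""
        exitcode=1
    fi
    echo "Usage: $scriptname -p processname [ -c max_cpu ]"
    echo "-p process to check"
    echo "-c max cpu threshold for killing the process (default 80)"
    echo "-h show this text"
    exit "$exitcode"
}

while getopts ":p:c:h" opt; do
    case $opt in
        h) usage ;;
        p) processname="$OPTARG" ;;
        c) maxcpu="$OPTARG" ;;
        :) usage "Error: Option -$OPTARG requires an argument" ;;
        \?) usage "Error: Invalid option -$OPTARG" ;;
    esac
done

if [ -z "$processname" ]; then
    usage "You must specify the process name"
fi

# accept integers or decimals, and reject zero
numre='^[0-9]+([.][0-9]+)?$'
if ! [[ $maxcpu =~ $numre ]] || (( $(echo "$maxcpu <= 0" | bc -l) )); then
    usage "Max CPU must be a number greater than zero"
fi

# get the process id (-s returns a single pid even if several processes match)
appPid=$(pidof -s "$processname")
if [ -z "$appPid" ]; then
    exit 0
fi

# check the cpu usage of the process
# note: ps reports %cpu averaged over the process's whole lifetime, not its current usage
cpu=$(ps -p "$appPid" -o %cpu=)
[ -n "$cpu" ] || exit 0    # the process exited in the meantime

# ...and kill it if it's above the threshold (which we take to mean it's hung)
if (( $(echo "$cpu > $maxcpu" | bc -l) )); then
    echo "killing $processname because its cpu usage is $cpu, which is greater than our threshold of $maxcpu"
    kill -9 "$appPid"
    # assumes the systemd unit has the same name as the process (requires root)
    systemctl restart "$processname"
fi
```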
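One caveat with this approach: `ps -o %cpu` is CPU time divided by the time the process has been running, so for a service that has been up for days, a fresh hang raises that number only slowly and may take hours to cross 80%. A minimal sketch of measuring usage over a short window instead, assuming Linux `/proc` and that utime + stime (fields 14 and 15 of `/proc/<pid>/stat`, in clock ticks) is the figure you want; `proc_ticks` and `cpu_over_interval` are hypothetical helper names:

```shell
# Sketch only: assumes Linux procfs. proc_ticks and cpu_over_interval are
# hypothetical helpers, not part of any existing tool.

# Print utime + stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks.
# These counters include all threads of the process.
proc_ticks() {
    [ -r "/proc/$1/stat" ] || return 1
    local stat rest
    stat=$(< "/proc/$1/stat") || return 1
    # Field 2 (comm) is in parentheses and may contain spaces, so drop
    # everything up to the last ") "; field 3 (state) becomes index 0.
    rest=${stat##*") "}
    local -a f
    read -r -a f <<< "$rest"
    echo $(( f[11] + f[12] ))    # index 11 = field 14, index 12 = field 15
}

# Print integer CPU usage (percent of one core; can exceed 100 for
# multi-threaded processes) measured over INTERVAL whole seconds.
cpu_over_interval() {
    local pid=$1 interval=${2:-5}
    local hz t1 t2
    hz=$(getconf CLK_TCK 2>/dev/null || echo 100)    # ticks per second
    t1=$(proc_ticks "$pid") || return 1
    sleep "$interval"
    t2=$(proc_ticks "$pid") || return 1
    echo $(( (t2 - t1) * 100 / (hz * interval) ))
}
```

To try it, paste the two functions into the script above and replace the `ps` line with `cpu=$(cpu_over_interval "$appPid" 5)`; the existing `bc` comparison against `$maxcpu` still works because the function prints a plain integer. The interval must be a whole number of seconds since the arithmetic is done in bash, and a 5-second window keeps each cron run well under a minute.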
Hi Greg, what kind of server is this? I used to see this behavior on Linux servers and even came up with the same “restart at CPU%” solution (we used CloudWatch).
However, I haven’t seen the issue since making sure all of my Linux servers are configured with a swapfile. Not once in the three years since I learned this trick (I don’t recall how I came across it). I’m confident enough now that I no longer use those CloudWatch CPU% triggers.
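For anyone who wants to try the swapfile suggestion, a minimal sketch (Linux only) that first checks whether any swap is active by counting the entries in `/proc/swaps`, whose first line is a column header. `swap_area_count` is a hypothetical helper name, and the commented commands show one common way to add a swapfile; the 2G size is only an example, not a figure from this thread:

```shell
# Count active swap areas (swapfiles or partitions) listed in /proc/swaps.
# The first line of /proc/swaps is a header, so it is subtracted.
swap_area_count() {
    local lines
    [ -r /proc/swaps ] || return 1
    lines=$(wc -l < /proc/swaps)
    echo $(( lines - 1 ))
}

if count=$(swap_area_count) && [ "$count" -gt 0 ]; then
    echo "swap is active ($count area(s))"
else
    echo "no swap configured"
    # One common way to add a swapfile (run as root; 2G is an example size):
    #   fallocate -l 2G /swapfile   # dd from /dev/zero is the portable alternative
    #   chmod 600 /swapfile
    #   mkswap /swapfile
    #   swapon /swapfile
    #   echo '/swapfile none swap sw 0 0' >> /etc/fstab   # keep it across reboots
fi
```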
Any use of the native Xojo MySQL plugin? Any use of threads?
In a private thread I created (necessary because of some confidential implementation details) about a Web app built with 2024r4.1 (2024r4.2 builds had the same problem), Ricardo indicated that both of these, the native Xojo MySQL plugin and thread usage, might cause the web framework to start “misbehaving”.
In my case, the app simply became 100% unresponsive (with the CPU idle, not spiking), but I can imagine the behavior varies by OS and by the point in code execution at which the framework “breaks”.
The only workaround I found (for 2024r4.1) was to use the MBS SQL plugin for accessing MySQL. This 100% “fixed” my problem.
If you’re trying to track things down, threads and use of native Xojo database plugins might be a good place to start?