Monitor resource overruns in OpenVZ

March 18, 2009 by alex

OpenVZ is a great virtualization tool for Linux servers. I’ve repeatedly run up against various resource limits, which can sometimes lead to really weird errors like ‘cannot allocate memory’ when you do something awful like ls -l. I cooked up the following script to keep a log file of times when a server overruns its bounds. I can then either raise the limits, or try to correlate the overrun with whatever else was going on at that time.

The script examines the /proc/user_beancounters file and prints any ‘failcnt’ values which have changed since the last run. Set up a cron job to run every minute (or whatever frequency you like), and redirect the output to get an easy failure log:

    * * * * * /usr/local/bin/beancount.py >> /var/log/beancount.log

The script remembers the failcnt values from the previous run, but it doesn’t track how long it’s been since that run, which you’d need for calculating failures/second or some other rate. I’m working on the assumption that these failures should be fairly rare, and that you’re most interested in the fact that they’re happening at all.
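To make the idea concrete, here is a minimal Python sketch of the approach described above. It is not the original beancount.py. It assumes the usual /proc/user_beancounters layout (a version line, a header row, then rows of resource, held, maxheld, barrier, limit, failcnt, with the container ID prefixed on the first row of each group). The state file path and the function names are made up for illustration.

```python
#!/usr/bin/env python3
import json
import os
import sys
import time

BEANCOUNTERS = "/proc/user_beancounters"
STATE_FILE = "/var/tmp/beancount.state"  # hypothetical location, pick your own


def parse_failcnts(text):
    """Return {"uid:resource": failcnt} parsed from user_beancounters text."""
    counts = {}
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header row.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        # Expect: resource held maxheld barrier limit failcnt
        if uid is None or len(fields) != 6 or not fields[5].isdigit():
            continue
        counts[f"{uid}:{fields[0]}"] = int(fields[5])
    return counts


def changed(previous, current):
    """Return {key: (old, new)} for every failcnt that differs from last run.

    Keys missing from the previous run are compared against 0, so the first
    run reports every counter that is already non-zero.
    """
    return {
        key: (previous.get(key, 0), value)
        for key, value in current.items()
        if value != previous.get(key, 0)
    }


def main():
    if not os.path.exists(BEANCOUNTERS):
        print(f"{BEANCOUNTERS} not found; is this an OpenVZ system?", file=sys.stderr)
        return 1
    with open(BEANCOUNTERS) as f:
        current = parse_failcnts(f.read())
    try:
        with open(STATE_FILE) as f:
            previous = json.load(f)
    except (OSError, ValueError):
        previous = {}  # no usable state yet, treat as a first run
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for key, (old, new) in sorted(changed(previous, current).items()):
        print(f"{stamp} {key} failcnt {old} -> {new}")
    with open(STATE_FILE, "w") as f:
        json.dump(current, f)
    return 0


if __name__ == "__main__":
    main()
```

Nothing is printed when no counter has moved, so the log only grows when something actually failed.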

Some possible enhancements:

  • Script this to run from the host OS (via vzctl enter) rather than the guest.
  • Add syslog or other logging facilities in addition to stdout.
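The syslog idea could look something like the sketch below, which uses Python’s standard syslog module. The ‘beancount’ ident, the daemon facility, the warning priority and both helper names are my own guesses for illustration, not anything the script does today.

```python
import syslog

# Tag messages as "beancount" and include the PID; the facility is a guess.
syslog.openlog("beancount", syslog.LOG_PID, syslog.LOG_DAEMON)


def format_overrun(uid, resource, old, new):
    """Build one log line describing a failcnt change."""
    return f"uid={uid} resource={resource} failcnt {old} -> {new}"


def report(line):
    """Write the line to stdout (for the cron redirect) and to syslog."""
    print(line)
    syslog.syslog(syslog.LOG_WARNING, line)
```

Calling report(format_overrun("101", "kmemsize", 1, 3)) keeps the existing stdout log working and sends the same line to whatever syslog daemon the system runs.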

Update 6/14/2009

I got a patch idea for this script from a co-worker, and decided it was time to move it into version control. I chose GitHub since I’m interested in learning a bit more about git and how it works.

http://github.com/alexdean/beancounter/tree/master

☙ ☙ ☙