Understanding why a MOSH or a Physical Server is under heavy load
When you find that a MOSH or physical server is under heavy load, it is often because a single web site or user is getting slammed.
Here are some tricks to find that user.
Munin
See our Munin page to learn how to view graphs that show resource usage over time.
Resource Hog
The resourcehog script outputs the top CPU, disk and memory users going back up to three days.
By default, it shows a summary of usage over the last hour.
Run resourcehog -h to get the full usage:
    usage: resourcehog [-h] [--include-root] [--include-system] [--munin]
                       [--resource {cpu,read,write,rss}] [--limit LIMIT]
                       [--include-commands] [--quiet] [--debug] [--since SINCE]
                       [--until UNTIL]

    Report on resource usage by user.

    optional arguments:
      -h, --help            show this help message and exit
      --include-root        Include root user in the report results
      --include-system      Include root user in the report results
      --munin               Instead of printing a summary of usage, output in a
                            format suitable for munin.
      --resource {cpu,read,write,rss}
                            Limit to a particular resource, repeat as needed
      --limit LIMIT         Limit results to this number
      --include-commands    Instead of grouping by user, group by user and command
      --quiet               Print less extraneous information
      --debug               Print out sql statements used and other usefuld ebug
                            information
      --since SINCE         Limit results to entries after this data
      --until UNTIL         Limit results to entries before this data
How does it work?
There are two Python scripts.
resourcehog-collector
This script, written in Python, runs under systemd and monitors system usage via pidstat. Every 5 minutes it writes the results to a SQLite database located on the tmpfs at /run/resourcehog/rh.db.
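As a rough illustration of the collection step, the sketch below sums CPU usage per user from pidstat-style output. It is not the actual resourcehog-collector code: the sample text, the function name cpu_by_user and the column layout are assumptions made for the example, and real pidstat output differs between sysstat versions.

```python
from collections import defaultdict

# Illustrative sample only: real pidstat columns vary between sysstat versions.
SAMPLE = """\
10:00:01     USER       PID    %usr %system    %CPU   CPU  Command
10:00:01     alice     1234   12.00    3.00   15.00     1  php-fpm
10:00:01     bob       2345    1.50    0.50    2.00     0  python3
10:00:01     alice     1240    4.00    1.00    5.00     2  php-fpm
"""


def cpu_by_user(text):
    """Sum the %CPU column per user, locating columns by header name."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]
    user_col = header.index("USER")
    cpu_col = header.index("%CPU")
    totals = defaultdict(float)
    for fields in lines[1:]:
        totals[fields[user_col]] += float(fields[cpu_col])
    return dict(totals)


totals = cpu_by_user(SAMPLE)
# Print users from highest to lowest total CPU.
for user, cpu in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{user:10s} {cpu:6.2f}")
```

Looking up columns by header name rather than by fixed position keeps the sketch tolerant of extra columns, which is useful because the exact set of pidstat columns is not guaranteed.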
resourcehog
This script queries the database for you and prints out the results in an easy-to-read format.
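To make the reporting step concrete, here is a minimal sketch of the kind of query such a script might run. The table name samples, its columns and the function top_cpu_users are hypothetical, since the real layout of rh.db is not described here; an in-memory database stands in for the real file. The since, until and limit parameters loosely mirror the --since, --until and --limit options.

```python
import sqlite3

# Hypothetical schema; the real layout of /run/resourcehog/rh.db may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts INTEGER, user TEXT, cpu REAL, rss INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [
        (1000, "alice", 20.0, 300),
        (1300, "alice", 35.0, 320),
        (1300, "bob", 2.0, 150),
        (5000, "carol", 90.0, 800),  # outside the window queried below
    ],
)


def top_cpu_users(conn, since, until, limit=10):
    """Return (user, total_cpu) rows for the window, highest usage first."""
    return conn.execute(
        "SELECT user, SUM(cpu) AS total FROM samples "
        "WHERE ts >= ? AND ts <= ? GROUP BY user "
        "ORDER BY total DESC LIMIT ?",
        (since, until, limit),
    ).fetchall()


rows = top_cpu_users(conn, since=1000, until=2000, limit=5)
for user, total in rows:
    print(f"{user:10s} {total:6.1f}")
```

Grouping by user and summing over a time window is what lets a single busy account stand out, even when its load is spread across many short-lived processes.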
sysstat
Another useful tool enabled on all MOSHes is sysstat. It can provide a recent history of resource usage on the server. This information helps you determine whether a resource constraint has only appeared recently or has been ongoing for some time.
It collects data via a cron job that runs sa1. To view the data, run:
sar
To see the data from yesterday:
sar -1
The sysstat commands won't break down resource usage on a per-user basis.