
Version 3 (modified by Jamie McClelland, 12 months ago)


Understanding why a MOSH or a physical server is under heavy load

When a MOSH or physical server is under heavy load, the cause is often a single website or user that is getting slammed.

Here are some tricks to find that user.

Munin

See our Munin page for how to view graphs of resource usage over time.

Resource Hog

The resourcehog script outputs the top CPU, disk and memory users, with history going back up to three days.

By default, it shows a summary of usage over the last hour.

Run resourcehog -h to get the full usage:

usage: resourcehog [-h] [--include-root] [--include-system] [--munin]
                   [--resource {cpu,read,write,rss}] [--limit LIMIT]
                   [--include-commands] [--quiet] [--debug] [--since SINCE]
                   [--until UNTIL]

Report on resource usage by user.

optional arguments:
  -h, --help            show this help message and exit
  --include-root        Include root user in the report results
  --include-system      Include system users in the report results
  --munin               Instead of printing a summary of usage, output in a
                        format suitable for munin.
  --resource {cpu,read,write,rss}
                        Limit to a particular resource, repeat as needed
  --limit LIMIT         Limit results to this number
  --include-commands    Instead of grouping by user, group by user and command
  --quiet               Print less extraneous information
  --debug               Print out sql statements used and other useful debug
                        information
  --since SINCE         Limit results to entries after this date
  --until UNTIL         Limit results to entries before this date
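As described above, the report covers the last hour by default and can reach back at most three days. The Python sketch below shows one way such a time window could be resolved from --since and --until. It assumes ISO 8601 timestamps and a helper named resolve_window; both are hypothetical and not taken from the resourcehog source.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Assumed defaults, taken from the prose on this page rather than the source:
DEFAULT_WINDOW = timedelta(hours=1)   # report covers the last hour by default
MAX_HISTORY = timedelta(days=3)       # no data older than about three days


def resolve_window(since: Optional[str], until: Optional[str],
                   now: datetime) -> Tuple[datetime, datetime]:
    """Turn optional ISO 8601 --since/--until strings into a (start, end) pair."""
    end = datetime.fromisoformat(until) if until else now
    start = datetime.fromisoformat(since) if since else end - DEFAULT_WINDOW
    # Clamp the start: asking for more than the retained history gains nothing.
    start = max(start, now - MAX_HISTORY)
    if start >= end:
        raise ValueError("--since must be earlier than --until")
    return start, end
```

For example, resolve_window(None, None, now) gives the hour ending at now, while a --since value from two weeks ago is pulled forward to three days before now.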

How does it work?

There are two Python scripts.

resourcehog-collector

This script runs as a systemd service. It monitors system usage with pidstat and, every 5 minutes, writes the results to an SQLite database on the tmpfs at /run/resourcehog/rh.db. Because the database lives on a tmpfs, the collected history does not survive a reboot.

resourcehog

This script queries the database for you and prints the results in an easy-to-read format.
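Here is a sketch of the kind of query a reporting script could run against a hypothetical samples table (columns ts, user, command, cpu, rss_kb, read_kbs, write_kbs): rank users, or user/command pairs as with --include-commands, by one resource over a time window, leaving out root unless asked. The table layout, the function name top_users and the aggregation (a plain SUM over the samples in the window) are assumptions, not necessarily what resourcehog does.

```python
import sqlite3
from typing import List, Tuple

# Maps --resource choices onto columns of the hypothetical samples table.
RESOURCE_COLUMNS = {"cpu": "cpu", "read": "read_kbs", "write": "write_kbs", "rss": "rss_kb"}


def top_users(conn: sqlite3.Connection, resource: str, start: int, end: int,
              limit: int = 10, include_root: bool = False,
              by_command: bool = False) -> List[Tuple]:
    """Rank users (or user/command pairs) by summed usage in [start, end)."""
    # Look the column up in a fixed map so raw input never reaches the SQL text.
    column = RESOURCE_COLUMNS[resource]
    group = "user, command" if by_command else "user"
    sql = (f"SELECT {group}, SUM({column}) AS total FROM samples "
           "WHERE ts >= ? AND ts < ?")
    if not include_root:
        sql += " AND user != 'root'"
    sql += f" GROUP BY {group} ORDER BY total DESC LIMIT ?"
    return conn.execute(sql, (start, end, limit)).fetchall()
```

Window bounds and the limit are passed as bound parameters, while the column and grouping come only from fixed strings, which keeps the query safe from injection.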

sysstat

Another useful tool enabled on all MOSHes is sysstat. It provides a recent history of resource usage on the server, which helps you determine whether a resource constraint began only recently or has been ongoing for some time.

It collects data via a cron job that runs sa1. To view the data, run:

sar

To see the data from yesterday:

sar -1
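To check whether a CPU constraint is new, you can compare today's averages with yesterday's. The Python sketch below runs sar -u (adding -1 for yesterday) and reads the Average: line. It assumes a sysstat version that accepts the -1 day offset shown above; the function names are hypothetical. Keep in mind that "today" only covers the samples recorded so far.

```python
import os
import subprocess
from typing import Dict, List, Optional


def parse_average(text: str) -> Dict[str, float]:
    """Extract the percentage columns from the "Average:" line of `sar -u` output."""
    header: Optional[List[str]] = None
    for line in text.splitlines():
        fields = line.split()
        if "%idle" in fields:
            header = fields  # e.g. ["00:00:01", "CPU", "%user", ..., "%idle"]
        elif header and fields and fields[0] == "Average:":
            # The time column and "Average:" line up, so pair the remaining columns.
            return {name: float(value)
                    for name, value in zip(header[1:], fields[1:])
                    if name.startswith("%")}
    raise ValueError("no Average: line found in sar output")


def sar_cpu_average(days_ago: int = 0) -> Dict[str, float]:
    """Run `sar -u` (optionally N days back) and return its CPU averages."""
    args = ["sar", "-u"]
    if days_ago:
        args.append(f"-{days_ago}")
    # LC_ALL=C gives 24-hour times (one column) and an English "Average:" label.
    out = subprocess.run(args, capture_output=True, text=True, check=True,
                         env={**os.environ, "LC_ALL": "C"}).stdout
    return parse_average(out)


def compare_days() -> None:
    """Print yesterday's and today's CPU averages side by side."""
    today, yesterday = sar_cpu_average(0), sar_cpu_average(1)
    for key in ("%user", "%system", "%iowait", "%idle"):
        print(f"{key:>8}  yesterday {yesterday[key]:6.2f}  today {today[key]:6.2f}")
```

A %idle that is much lower today than yesterday suggests the load is recent; similar numbers on both days point to an ongoing problem.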

The sysstat commands won't break down resource usage on a per-user basis.
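If you need a quick per-user view without resourcehog, one rough option is to total ps output by user. This is a hypothetical helper, not part of our tooling, and it assumes procps ps. Be aware that ps reports %CPU averaged over each process's lifetime, so it is coarser than the interval-based numbers pidstat gives resourcehog.

```python
import os
import subprocess
from collections import defaultdict
from typing import Dict, List, Tuple


def per_user_usage(ps_output: str) -> List[Tuple[str, float, int]]:
    """Sum %CPU and RSS (KiB) per user from `ps -eo user,pcpu,rss` lines."""
    cpu: Dict[str, float] = defaultdict(float)
    rss: Dict[str, int] = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        user, pcpu, rss_kb = fields
        cpu[user] += float(pcpu)
        rss[user] += int(rss_kb)
    # Heaviest CPU users first; round to hide float noise from the additions.
    return sorted(((u, round(cpu[u], 2), rss[u]) for u in cpu),
                  key=lambda row: row[1], reverse=True)


def snapshot() -> List[Tuple[str, float, int]]:
    """Take one per-user snapshot of the processes running right now."""
    # user:32 widens the column so long user names are not truncated.
    out = subprocess.run(["ps", "-eo", "user:32,pcpu,rss", "--no-headers"],
                         capture_output=True, text=True, check=True,
                         env={**os.environ, "LC_ALL": "C"}).stdout
    return per_user_usage(out)
```

For example, snapshot()[:5] returns the five heaviest CPU users as (user, %CPU, RSS in KiB) tuples.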