= Understanding why a MOSH or a Physical Server is under heavy load =

When you find that a MOSH or physical server is under heavy load, it is often because a single web site or user is getting slammed. Here are some tricks to find that user.

== Munin ==

See our [wiki:munin munin page] for how to see graphs that show resource usage over time.

== Resource Hog ==

The `resourcehog` script outputs the top CPU, disk and memory users going back up to three days. By default, it shows a summary of usage over the last hour.

Run `resourcehog -h` to get the full usage:

{{{
usage: resourcehog [-h] [--include-root] [--include-system] [--munin]
                   [--resource {cpu,read,write,rss}] [--limit LIMIT]
                   [--include-commands] [--quiet] [--debug] [--since SINCE]
                   [--until UNTIL]

Report on resource usage by user.

optional arguments:
  -h, --help            show this help message and exit
  --include-root        Include root user in the report results
  --include-system      Include root user in the report results
  --munin               Instead of printing a summary of usage, output in a
                        format suitable for munin.
  --resource {cpu,read,write,rss}
                        Limit to a particular resource, repeat as needed
  --limit LIMIT         Limit results to this number
  --include-commands    Instead of grouping by user, group by user and command
  --quiet               Print less extraneous information
  --debug               Print out sql statements used and other useful debug
                        information
  --since SINCE         Limit results to entries after this date
  --until UNTIL         Limit results to entries before this date
}}}

=== How does it work? ===

There are two Python scripts.

==== resourcehog-collector ====

This script runs under systemd. It monitors system usage via `pidstat` and writes the results every 5 minutes to a sqlite database located on the tmpfs at /run/resourcehog/rh.db.

==== resourcehog ====

This script queries the database for you and prints the results in an easy-to-read format.

== sysstat ==

Another useful tool enabled on all MOSHes is `sysstat`.
It can provide a recent history of resource usage on the server. This information helps you determine if a resource constraint has only been happening recently, or has been on-going for some time. It collects data via a cron job that runs `sa1`.

To view the data, run:

{{{
sar
}}}

To see the data from yesterday:

{{{
sar -1
}}}

The `sysstat` commands won't break down resource usage on a per-user basis.
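
Since `sar` shows when the load rose but not who caused it, one way to use it is to find the busiest interval first and then look up the user with `resourcehog`. The following is only a minimal sketch: it assumes the usual `sar -u` layout, in which the system-wide rows are labelled `all` and `%idle` is the last column. `busiest_sample` is a hypothetical helper name, not part of sysstat.

```shell
# busiest_sample: read `sar -u` output on stdin and print the system-wide
# sample with the lowest %idle value (i.e. the busiest interval).
# Assumption: rows for all CPUs combined contain " all " and the last
# column is %idle. The "Average:" summary row is skipped.
busiest_sample() {
    awk '
        $1 != "Average:" && / all / && $NF ~ /^[0-9.]+$/ {
            if (!found || $NF + 0 < min) {
                min = $NF + 0   # lowest %idle seen so far
                line = $0       # remember the whole sample row
                found = 1
            }
        }
        END { if (found) print line }
    '
}
```

For example, `LC_ALL=C sar -u | busiest_sample` prints the busiest interval recorded today, and `LC_ALL=C sar -u -1 | busiest_sample` does the same for yesterday (`LC_ALL=C` keeps the decimal separator a dot). With the time in hand, `resourcehog --resource cpu` can show which users were consuming CPU, provided the interval falls within the three days it covers.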