Version 2 (modified by Jamie McClelland, 16 years ago)

Procedure during Server Crisis

Sometimes our servers will stop responding, experience extremely high loads, or behave unpredictably in other ways.

Our shared servers now have scripts installed to help analyze and record what is going on so you don't have to remember all the right commands.

At the moment, these scripts are only installed on our shared servers (malcolm, chavez, mandela, viewsic, etc.).

If a shared server experiences unpredictable behavior, please do the following:

  • Become root on the machine
    sudo -i
    
  • Run the analyze-server script
    mf-analyze-server
    

The mf-analyze-server script generates a lot of files that you can read to help determine what the problem is. In addition, you may want to use the scripts installed in the /usr/local/sbin directory, in particular the suite of mf-check- scripts. They show which IP addresses are accessing the server and can be useful in determining whether a single IP is causing a disproportionate amount of trouble.
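
If you want to cross-check by hand which IP addresses are hitting the server hardest, the idea can be sketched roughly as follows. This is not the implementation of the mf-check- scripts. The function name top_ips is hypothetical, and the sketch assumes an access log in which the client IP is the first whitespace-separated field on each line.

```shell
# Hypothetical helper (not one of the mf-check- scripts):
# count requests per client IP in an access log and list the busiest ones.
# Assumption: the client IP is the first whitespace-separated field of each line.
top_ips() {
    log_file="$1"        # path to the access log to inspect
    limit="${2:-10}"     # how many IPs to show (defaults to 10)

    # Extract the first field, count identical values,
    # then sort by count in descending order and keep the top entries.
    awk '{ print $1 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$limit"
}
```

Calling it as `top_ips /path/to/access.log 5` (substitute the real log location for that placeholder path) would print the five most frequent IPs, each preceded by its request count. A single IP with a count far above the rest is worth a closer look.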