Version 5 (modified by Ross, 6 years ago)

--

Restart Failed Server

From time to time our virtual servers exceed their allocated memory and fail with OOM (out of memory) errors. When that happens, it is usually necessary to perform a hard reset of the affected virtual server.

How to reset or reboot a virtual server depends on the virtualization technology in use by the server.

The first step is to check out our SVN repository of servers.

Then, examine the "available-servers" directory. You should see a directory for every server we are running.

Inside each directory, there's a file called "virtualization".

The virtualization file will say either: vserver, xen, or kvm.

In addition, there's a symlink called "host" which will point to the host server for this virtual server. For all servers, you will need to ssh into the host server to perform a reset.
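The lookup described above can be scripted. The following is a minimal sketch, not an official tool: the function name server_info is hypothetical, the path to the checkout is whatever you pass in, and it assumes that the name of the directory the "host" symlink points to is the name of the host server.

```shell
# Sketch: report the virtualization type and host for one server.
# Usage: server_info PATH_TO_AVAILABLE_SERVERS SERVER_NAME
# Assumption: the basename of the "host" symlink target is the host's name.
server_info() {
    servers_dir=$1
    server=$2
    dir="$servers_dir/$server"

    if [ ! -d "$dir" ]; then
        echo "no such server directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/virtualization" ]; then
        echo "missing virtualization file in $dir" >&2
        return 1
    fi
    if [ ! -L "$dir/host" ]; then
        echo "missing host symlink in $dir" >&2
        return 1
    fi

    # The "virtualization" file holds one word: vserver, xen, or kvm.
    virt=$(tr -d '[:space:]' < "$dir/virtualization")
    # The "host" symlink points to the host server for this virtual server.
    host=$(basename "$(readlink "$dir/host")")

    echo "$server: virtualization=$virt host=$host"
}
```

Called with the path to your checkout of "available-servers" and a server name, it prints one line of the form NAME: virtualization=TYPE host=HOST, which tells you which section below applies and which host to ssh into.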

Important Note: If the host server is sontag or gramsci, you may be prompted to enter an encryption passphrase. You will need this passphrase to start the virtual server, because sontag and gramsci do not have their base disks encrypted. If the host server is fred, you may be prompted to enter a passphrase, but you can hit enter to continue (fred does have its base disks encrypted). All other hosts have their base disks encrypted, so there is no need to decrypt a virtual server to start it.

xen

  • Unplug the server:
    xm destroy <server-name>
    
  • Restart the server:
    xm create -c <server-name>
    

Important note: if you are restarting a xen server on sontag and you are prompted for a cryptsetup passphrase, you must look it up via keyringer. If you are restarting a xen server on fred and you are prompted for a cryptsetup passphrase, there is no passphrase; just keep hitting enter until the prompt stops appearing. You get the prompt because each virtual server uses the initramfs from the host server (fred), which does have a cryptsetup passphrase. The virtual servers themselves do not.

kvm

From root@HOST.mayfirst.org, run:

  • unplug the server:
    sv down <server-name>
    
  • restart:
    sv up <server-name>
    

In some situations, the above commands will fail. If they do, you can try this:

  • remove the guest
    update-service --remove /etc/sv/kvm/GUESTNAME
    
  • re-add the guest
    update-service --add /etc/sv/kvm/GUESTNAME
    

Should these commands fail as well, you can take a more drastic step and kill the kvm process itself (be careful). Here's how:

ps -eFH | grep GUESTNAME

ps will give you output that lists the kvm process in question. You want to find a line like this (marx is used in this example):

marx       614   531 99 1118262 4130180 2 2012 ?       106751-23:47:16         /usr/bin/kvm -drive file=/dev/mapper/vg_bolivar0-marx,if=virtio,id=hda,boot=on,format=raw -M pc -enable-kvm -nodefaults -nographic -name marx -m 4G -boot c -chardev socket,id=monitor,path=/home/marx/vms/marx/monitor.socket,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -usb -device virtio-balloon-pci,id=balloon0,bus=pci.0 -chardev socket,id=serial0,path=/home/marx/vms/marx/console.socket,server -device isa-serial,chardev=serial0 -smp 1,maxcpus=8 -device virtio-net-pci,vlan=0,id=net0,mac=02:00:00:00:00:0b,bus=pci.0 -net tap,ifname=tap10,script=no,downscript=no,vlan=0,name=hostnet0
  • Then kill the process:
    kill PID
    

Where PID is the first number of the output (in the example above it is 614).
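To avoid picking the PID out of the ps output by eye, the same lookup can be sketched with awk. This is only an illustration: the function name kvm_pid is hypothetical, and it assumes the guest was started by /usr/bin/kvm with a "-name GUESTNAME" argument, as in the example line above.

```shell
# Sketch: print the PID (second column of `ps -eF` output) of the kvm
# process that was started with "-name GUESTNAME".
# Reads ps output on stdin. Usage: ps -eFH | kvm_pid GUESTNAME
kvm_pid() {
    guest=$1
    awk -v guest="$guest" '
        /\/usr\/bin\/kvm/ {
            # Look for the "-name" argument followed by the guest name.
            for (i = 1; i < NF; i++)
                if ($i == "-name" && $(i + 1) == guest) { print $2; exit }
        }'
}
```

Run on the host as ps -eFH | kvm_pid marx; for the example line above it prints 614. If it prints nothing, no matching kvm process was found, and there is nothing to kill.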

  • Next restart the server with:
    sv up GUESTNAME
    
  • If this fails, you may need to add the server with:
    update-service --add /etc/sv/kvm/GUESTNAME
    

vserver

  • shutdown
    vserver <server-name> stop
    
  • start
    vserver <server-name> start
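
As a quick reference, the basic reset commands for all three virtualization types can be summarized in one place. The sketch below only prints the commands listed on this page so you can review them before running them as root on the host; the function name reset_commands is hypothetical, and it deliberately leaves out the kvm fallback steps (update-service and kill).

```shell
# Sketch: print the basic reset commands for a server, given the contents
# of its "virtualization" file. Nothing is executed; the commands are only
# echoed for review.
# Usage: reset_commands VIRTUALIZATION SERVER_NAME
reset_commands() {
    virt=$1
    server=$2
    case "$virt" in
        xen)
            echo "xm destroy $server"
            echo "xm create -c $server"
            ;;
        kvm)
            echo "sv down $server"
            echo "sv up $server"
            ;;
        vserver)
            echo "vserver $server stop"
            echo "vserver $server start"
            ;;
        *)
            echo "unknown virtualization type: $virt" >&2
            return 1
            ;;
    esac
}
```

For example, reset_commands kvm marx prints the sv down and sv up lines for marx. Printing rather than executing is a deliberate choice here, since a hard reset is disruptive and the xen case may also prompt for a passphrase.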