= Kiyoshi recovery 2010-02 =

This page documents the planned steps for the Kiyoshi disk recovery (see #2828).

== Monday Night 9:00 pm ==

 * Boot into the debirf image (which allows us full access to the underlying filesystems):
  * Configure debirf for networking (IP: 209.51.171.182/27, Gateway: 209.51.171.161, netmask: 255.255.255.224)
  * Prepare the disks so we can access them:
   * Initialize the RAID arrays
{{{
# incomplete!!
for foo in 0 1 2 ;do mknod /dev/md$foo b 9 $foo ; done
mdadm --assemble /dev/md1 /dev/sda2
mdadm --assemble /dev/md2 /dev/sdb2
}}}
   * Decrypt the RAID arrays
{{{
cryptsetup luksOpen /dev/md1 md1_crypt
cryptsetup luksOpen /dev/md2 md2_crypt
}}}
   * Scan for logical volumes
{{{
vgscan --mknodes
vgchange -aly
}}}
  * Move all logical volumes to sdc2
   * Find logical volumes with:
{{{
lvs
}}}
   * Check which physical volumes each logical volume is on:
{{{
lvdisplay -m vg_kiyoshi0/<logical-volume-name>
}}}
   * Move with:
{{{
pvmove --verbose --name vg_kiyoshi0/<logical-volume-name> <path/to/old/volume> /dev/sdc2
}}}
  * Ensure /dev/sdb is no longer in use (pvdisplay should report Allocated PE as 0)
{{{
pvdisplay /dev/mapper/md2_crypt
}}}
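
With many logical volumes, typing one pvmove per volume is error-prone. The following is a minimal sketch, not a tested part of this plan: it reads lvs output on stdin and prints (without running) one pvmove command for each logical volume that still has extents on the old physical volume. The helper name gen_pvmove is made up for this page, and the sketch assumes the devices column looks like /dev/mapper/md2_crypt(0).

```shell
# Sketch only: print (do not run) one pvmove command per logical volume
# that still has extents on the old physical volume.
# Assumed input on stdin: the output of
#   lvs --noheadings -o lv_name,devices vg_kiyoshi0
# i.e. lines such as "  testy /dev/mapper/md2_crypt(0)".
# gen_pvmove is a hypothetical helper name, not an LVM command.
gen_pvmove() {
    old_pv=$1   # e.g. /dev/mapper/md2_crypt
    new_pv=$2   # e.g. /dev/sdc2
    awk -v old="$old_pv" -v new="$new_pv" '
        {
            # The devices column may hold several comma-separated entries,
            # each with a trailing "(start-extent)" that we strip.
            n = split($2, devs, ",")
            for (i = 1; i <= n; i++) {
                dev = devs[i]
                sub(/\([0-9]+\)$/, "", dev)
                # Print each logical volume only once, even if it has
                # several segments on the old physical volume.
                if (dev == old && !seen[$1]++) {
                    printf "pvmove --verbose --name vg_kiyoshi0/%s %s %s\n", $1, old, new
                }
            }
        }'
}

# Usage (review the printed commands before running any of them):
#   lvs --noheadings -o lv_name,devices vg_kiyoshi0 | gen_pvmove /dev/mapper/md2_crypt /dev/sdc2
```

Printing the commands instead of executing them keeps a human in the loop, which seems appropriate for a recovery.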

== Evaluation point ==

If the transfer to sdc goes quickly and smoothly, then this is an acceptable stopping point. We can restart with all data coming from sdc and expect a stable (although not redundant) system for Tuesday.

If the transfer is going really slowly or we have doubts it can finish by tomorrow, we should stop. The last thing we want to do is move all the data to sdc and, when it finally completes at 7:00 am, realize the performance is even worse.
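
To make the "can it finish by tomorrow" call less of a guess, the remaining time can be roughly estimated from how long the move has been running and the percentage pvmove reports as done. This is only a sketch that assumes the move progresses at a constant rate, which may not hold on a failing disk. The function name eta_seconds is hypothetical.

```shell
# eta_seconds ELAPSED_SECONDS PERCENT_DONE
# Prints the estimated number of seconds remaining, assuming linear progress.
# Hypothetical helper; the percentage is read off pvmove's progress output
# by hand and rounded to a whole number.
eta_seconds() {
    elapsed=$1
    pct=$2
    if [ "$pct" -le 0 ] || [ "$pct" -gt 100 ]; then
        echo "percent must be between 1 and 100" >&2
        return 1
    fi
    # remaining = elapsed * (work left / work done)
    echo $(( elapsed * (100 - pct) / pct ))
}

# Example: 30 minutes in and 25% done leaves about 90 minutes.
#   eta_seconds 1800 25    # prints 5400
```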

== Monday night or Tuesday during the day ==

  * Set up benchmarking to test performance on sdb prior to our change
   * Install postmark
{{{
aptitude install postmark
}}}
   * Move testy partition to sdb
{{{
pvmove --verbose --name vg_kiyoshi0/testy /dev/sdc2 /dev/mapper/md2_crypt
}}}
   * Create a file system
{{{
mkfs -t ext3 /dev/mapper/vg_kiyoshi0-testy
}}}
   * Mount it
{{{
mount /dev/mapper/vg_kiyoshi0-testy /mnt
}}}
   * Create a file called postmark.conf:
{{{
set location /mnt/
set seed 12345678
set read 1024
set write 1024
set buffering false
set transactions 4096
set size 512 2048
set number 51115
run
quit
}}}
   * Run postmark:
{{{
postmark postmark.conf
}}}
  * If you get "Error: Cannot open /mnt/123 for writing", reduce the "set number" value in postmark.conf
  * It should output something like:
{{{
guest@chicken:~$ postmark postmark.conf 
PostMark v1.51 : 8/14/01
Reading configuration from file 'postmark.conf'
Creating files...Done
Performing transactions...........Done
Deleting files...Done
Time:
	24 seconds total
	11 seconds of transactions (372 per second)

Files:
	53225 created (2217 per second)
		Creation alone: 51115 files (8519 per second)
		Mixed with transactions: 2110 files (191 per second)
	2061 read (187 per second)
	2035 appended (185 per second)
	53225 deleted (2217 per second)
		Deletion alone: 51239 files (7319 per second)
		Mixed with transactions: 1986 files (180 per second)

Data:
	2.54 megabytes read (108.33 kilobytes per second)
	65.72 megabytes written (2.74 megabytes per second)
guest@chicken:~$
}}}
  * De-commission /dev/sdb
   * Move testy partition back
{{{
umount /mnt
pvmove --verbose --name vg_kiyoshi0/testy /dev/mapper/md2_crypt /dev/sdc2
}}}
   * Remove as physical volume
{{{
vgreduce vg_kiyoshi0 /dev/mapper/md2_crypt
pvremove /dev/mapper/md2_crypt
}}}
   * Unmap crypto layer:
{{{
cryptsetup luksClose md2_crypt 
}}}
   * Remove from RAID
{{{
mdadm --fail /dev/md2 /dev/sdb2
### shouldn't this remove /dev/md2 entirely?
mdadm --fail /dev/md0 /dev/sdb1
mdadm --remove /dev/md0 /dev/sdb1
}}}
  * Properly re-partition
   * Set up partitions
{{{
# following http://article.gmane.org/gmane.linux.utilities.util-linux-ng/2955
parted /dev/sdb
# do we want gpt??
(parted) mklabel gpt
(parted) unit s
### dkg thinks we should not go all the way to the end; rather, we should leave a bit of free space
### this is because we don't know if the other disk is exactly the same size or not.
### so we should change -1 to something several sectors in from the end.
(parted) mkpart primary ext2 40 -1
# parted will complain about end location, ignore
(parted) quit
}}}
   * Re-add to RAID arrays: this step recreates md2 (the array that backs LVM) on the new partition and adds sdb1 back to the boot partition RAID array (md0)
{{{
mdadm --create /dev/md2 --level=mirror -n 2 /dev/sdb2 missing
mdadm --add /dev/md0 /dev/sdb1
}}}
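  * Add crypto layer to md2 and return it to the volume group
{{{
cryptsetup luksFormat /dev/md2
cryptsetup luksOpen /dev/md2 md2_crypt
pvcreate /dev/mapper/md2_crypt
vgextend vg_kiyoshi0 /dev/mapper/md2_crypt
}}}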
  * Test that the new partitioning really does read and write faster.
   * Move testy partition back and mount
{{{
pvmove --verbose --name vg_kiyoshi0/testy /dev/sdc2 /dev/mapper/md2_crypt
mount /dev/mapper/vg_kiyoshi0-testy /mnt
}}}
   * Run postmark and compare with earlier test results
{{{
postmark postmark.conf
}}}
  * Move logical volumes back from sdc (see above)
  * Restart vservers
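
Comparing the two postmark runs by eye is easy to get wrong late at night. As a rough sketch, assuming each run's output was saved to a file and has the same layout as the sample output above, the transactions-per-second figure can be pulled out and compared. The function names postmark_tps and compare_tps and the file names are hypothetical.

```shell
# postmark_tps FILE
# Print the transactions-per-second figure from saved postmark output,
# i.e. the number in a line like "11 seconds of transactions (372 per second)".
postmark_tps() {
    sed -n 's/.*seconds of transactions (\([0-9][0-9]*\) per second).*/\1/p' "$1"
}

# compare_tps BEFORE_FILE AFTER_FILE
# Print both figures; succeed only if the second run was faster.
compare_tps() {
    before=$(postmark_tps "$1")
    after=$(postmark_tps "$2")
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "could not find a transactions line in one of the files" >&2
        return 1
    fi
    echo "before: $before per second, after: $after per second"
    [ "$after" -gt "$before" ]
}

# Usage, with hypothetical file names:
#   postmark postmark.conf > before.txt    # on the old partitioning
#   postmark postmark.conf > after.txt     # after re-partitioning
#   compare_tps before.txt after.txt && echo "new layout is faster"
```

Transactions per second is only one of the numbers postmark reports; the read and write throughput lines are worth comparing as well.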

This is another stopping place.

== Tuesday/Wednesday night ==

 * Fail sda on all RAID arrays it is part of.
 * Take down the machine
 * Replace sda disk with new disk
 * Start machine
 * Create partition table on sda matching sdb
 * Add sda partitions back to RAID
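
Since the replacement sda may not be exactly the same size as sdb, one way to choose a partition end that fits on both disks is to take the smaller disk, leave some space free at the end, and round down to an aligned boundary. The sketch below only does the arithmetic. The function name safe_end_sector is hypothetical, and the defaults (2048 sectors reserved, 8-sector alignment to match the start at sector 40) are assumptions, not decisions made in this plan.

```shell
# safe_end_sector SECTORS_A SECTORS_B [RESERVE] [ALIGN]
# Print an inclusive end sector that fits on the smaller of two disks,
# leaves RESERVE sectors unused at the end, and ends just before an
# ALIGN-sector boundary. Sizes are in 512-byte sectors, as printed by
# "blockdev --getsz /dev/sdX".
# Hypothetical helper; the default values are assumptions.
safe_end_sector() {
    a=$1
    b=$2
    reserve=${3:-2048}
    align=${4:-8}
    if [ "$a" -lt "$b" ]; then min=$a; else min=$b; fi
    # Round (min - reserve) down to a multiple of align, then step back one
    # sector so the partition ends on the last sector before that boundary.
    echo $(( (min - reserve) / align * align - 1 ))
}

# Usage:
#   safe_end_sector "$(blockdev --getsz /dev/sda)" "$(blockdev --getsz /dev/sdb)"
# The printed number would replace the -1 in the mkpart command.
```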