Kiyoshi recovery 2010-02

This page documents the planned steps for the Kiyoshi disk recovery (see #2828).

Monday Night 9:00 pm

  • Boot into the debirf image (which allows us full access to the underlying filesystems):
    • Configure debirf for networking (IP: 209.51.171.182/27, Gateway: 209.51.171.161, netmask: 255.255.255.224)
    • Prepare the disks so we can access them:
      • Initialize the RAID arrays
        # incomplete!!
        for foo in 0 1 2 ;do mknod /dev/md$foo b 9 $foo ; done
        mdadm --assemble /dev/md1 /dev/sda2
        mdadm --assemble /dev/md2 /dev/sdb2
        
      • Decrypt the RAID arrays
        cryptsetup luksOpen /dev/md1 md1_crypt
        cryptsetup luksOpen /dev/md2 md2_crypt
        
      • Scan for logical volumes
        vgscan --mknodes
        vgchange -aly
        
    • Move all logical volumes to sdc2
      • Find logical volumes with:
        lvs
        
      • Examine to figure out which ones are on which physical volumes
        lvdisplay -m vg_kiyoshi0/<logical-volume-name>
        
      • Move with:
        pvmove --verbose --name vg_kiyoshi0/<logical-volume-name> <path/to/old/volume> /dev/sdc2
        
    • Ensure /dev/sdb is no longer in use
      pvdisplay /dev/mapper/md2_crypt
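
The final check above can be made mechanical by asking LVM how many extents are still allocated on each physical volume and refusing to continue unless the old one reports zero. This is only a sketch: pv_is_free is a helper name made up for this page, and it assumes the installed LVM2 supports the pvs report fields pv_name and pv_pe_alloc_count.

```shell
# pv_is_free PV
# Reads lines of "<pv_name> <allocated_extents>" on stdin and succeeds only
# if PV is listed and has zero allocated extents.
# (Hypothetical helper, not part of LVM.)
pv_is_free() {
    awk -v pv="$1" '
        $1 == pv { found = 1; if ($2 + 0 != 0) busy = 1 }
        END { if (found && !busy) exit 0; exit 1 }
    '
}

# Intended use on the real system (assumed invocation, not run here):
#   pvs --noheadings -o pv_name,pv_pe_alloc_count \
#       | pv_is_free /dev/mapper/md2_crypt && echo "md2_crypt holds no extents"
```

A physical volume that does not appear in the input at all is treated as a failure, so a typo in the device name cannot be mistaken for "free".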
      

Evaluation point

If the transfer to sdc goes quickly and smoothly, then this is an acceptable stopping point. We can restart with all data coming from sdc and expect a stable (although not redundant) system for Tuesday.

If the transfer is going really slowly or we have doubts it can finish by tomorrow, we should stop. The last thing we want to do is move all the data to sdc and when it finally completes at 7:00 am, realize the performance is even worse.
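
To judge whether a move can finish in time, a rough estimate can be extrapolated from the percentage pvmove has reported so far. The sketch below assumes the copy rate stays constant, which may not hold on a failing disk; eta_seconds is a hypothetical helper, and the percentage is read off the pvmove progress output by hand.

```shell
# eta_seconds ELAPSED_SECONDS PERCENT_DONE
# Prints the estimated remaining seconds by linear extrapolation,
# or "unknown" if no progress has been made yet.
eta_seconds() {
    awk -v t="$1" -v p="$2" 'BEGIN {
        if (p <= 0) { print "unknown"; exit }
        printf "%d\n", t * (100 - p) / p
    }'
}

# Example: 25% moved after one hour
eta_seconds 3600 25    # prints 10800, i.e. three more hours
```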

Monday night or Tuesday during the day

  • Set up benchmarking to test performance on sdb prior to our change
    • Install postmark
      aptitude install postmark
      
    • Move testy partition to sdb
      pvmove --verbose --name vg_kiyoshi0/testy /dev/sdc2 /dev/mapper/md2_crypt
      
    • Create a file system
      mkfs -t ext3 /dev/mapper/vg_kiyoshi0-testy
      
    • Mount it
      mount /dev/mapper/vg_kiyoshi0-testy /mnt
      
    • Create a file called postmark.conf:
      set location /mnt/
      set seed 12345678
      set read 1024
      set write 1024
      set buffering false
      set transactions 4096
      set size 512 2048
      set number 51115
      run
      quit
      
    • Run postmark:
      postmark postmark.conf
      
  • If you get "Error: Cannot open /mnt/123 for writing", lower the value of "set number" in postmark.conf
  • It should output something like:
    guest@chicken:~$ postmark postmark.conf 
    PostMark v1.51 : 8/14/01
    Reading configuration from file 'postmark.conf'
    Creating files...Done
    Performing transactions...........Done
    Deleting files...Done
    Time:
    	24 seconds total
    	11 seconds of transactions (372 per second)
    
    Files:
    	53225 created (2217 per second)
    		Creation alone: 51115 files (8519 per second)
    		Mixed with transactions: 2110 files (191 per second)
    	2061 read (187 per second)
    	2035 appended (185 per second)
    	53225 deleted (2217 per second)
    		Deletion alone: 51239 files (7319 per second)
    		Mixed with transactions: 1986 files (180 per second)
    
    Data:
    	2.54 megabytes read (108.33 kilobytes per second)
    	65.72 megabytes written (2.74 megabytes per second)
    guest@chicken:~$
    
  • De-commission /dev/sdb
    • Move testy partition back
      umount /mnt
      pvmove --verbose --name vg_kiyoshi0/testy /dev/mapper/md2_crypt /dev/sdc2
      
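    • Remove as physical volume
      # pvremove refuses a PV that still belongs to a volume group, so take it out first
      vgreduce vg_kiyoshi0 /dev/mapper/md2_crypt
      pvremove /dev/mapper/md2_crypt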
      
    • Unmap crypto layer:
      cryptsetup luksClose md2_crypt 
      
    • Remove from RAID
      mdadm --fail /dev/md2 /dev/sdb2 
      ### shouldn't this remove /dev/md2 entirely?
      mdadm --fail /dev/md0 /dev/sdb1 
      
  • Properly re-partition
    • Set up partitions
      # following http://article.gmane.org/gmane.linux.utilities.util-linux-ng/2955
      parted /dev/sdb
      # do we want gpt??
      (parted) mklabel gpt
      (parted) unit s
      ### dkg thinks we should not go all the way to the end; rather, we should leave a bit of free space
      ### this is because we don't know if the other disk is exactly the same size or not.
      ### so we should change -1 to something several sectors in from the end.
      (parted) mkpart primary ext2 40 -1
      # parted will complain about end location, ignore
      (parted) quit
      
    • Re-add to RAID arrays: this step re-creates the RAID array that backs LVM (md2) and adds the disk to the boot partition RAID array (md0)
      mdadm --create /dev/md2 --level=mirror --raid-devices=2 /dev/sdb2 missing
      mdadm --add /dev/md0 /dev/sdb1
      
  • Add crypto layer to md2
    cryptsetup luksFormat /dev/md2
    cryptsetup luksOpen /dev/md2 md2_crypt
    # the new md2_crypt is blank, so make it a physical volume in vg_kiyoshi0 again
    pvcreate /dev/mapper/md2_crypt
    vgextend vg_kiyoshi0 /dev/mapper/md2_crypt
    
  • Test that the re-partitioned disk really does read and write faster.
    • Move test partition back and mount
      pvmove --verbose --name vg_kiyoshi0/testy /dev/sdc2 /dev/mapper/md2_crypt
      mount /dev/mapper/vg_kiyoshi0-testy /mnt
      
  • Run postmark and compare with earlier test results
    postmark postmark.conf
    
  • Move logical volumes back from sdc (see above)
  • Restart vservers
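
Comparing the two postmark runs by eye is error-prone, so the transactions-per-second figure can be pulled out of each saved output and turned into a percentage change. This is a sketch only: postmark_tps and tps_change_percent are hypothetical helper names, the file names before.txt and after.txt are illustrative, and the parsing assumes the "seconds of transactions (N per second)" line looks like the PostMark v1.51 sample shown earlier on this page.

```shell
# postmark_tps
# Reads postmark output on stdin and prints the transactions-per-second figure.
postmark_tps() {
    sed -n 's/.*seconds of transactions (\([0-9][0-9]*\) per second).*/\1/p'
}

# tps_change_percent BEFORE AFTER
# Prints the relative change from BEFORE to AFTER as a percentage (one decimal).
tps_change_percent() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# Intended use (assumed file names):
#   postmark postmark.conf > before.txt    # on the old partition layout
#   postmark postmark.conf > after.txt     # after re-partitioning
#   tps_change_percent "$(postmark_tps < before.txt)" "$(postmark_tps < after.txt)"
```

A positive result means the re-partitioned disk handled more transactions per second than before.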

This is another stopping place

Tuesday/Wednesday night

  • Fail sda on all raids it is a part of.
  • Take down the machine
  • Replace sda disk with new disk
  • Start machine
  • Create partition table on sda matching sdb
  • Add sda partitions back to RAID
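
After the sda partitions are added back, the arrays stay degraded until the resync finishes. A small filter over /proc/mdstat can list the arrays that are still missing a member. This is a sketch: degraded_arrays is a hypothetical helper, and it assumes the usual md status map in which "U" marks a working member and "_" a missing one (for example "[U_]").

```shell
# degraded_arrays
# Reads /proc/mdstat-style text on stdin and prints the name of every array
# whose member map (e.g. "[U_]") still shows a missing device.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }
        /\[[U_]*_[U_]*\]/    { print name }
    '
}

# Intended use on the real system:
#   degraded_arrays < /proc/mdstat
```

No output means every array listed has all of its members present.
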
Last modified on Feb 9, 2010, 2:58:28 AM