Opened 4 years ago

Closed 4 years ago

Last modified 7 months ago

#11870 closed Bug/Something is broken (fixed)

Add solid state drives to some servers

Reported by: Jamie McClelland Owned by: Jamie McClelland
Priority: Medium Component: Tech
Keywords: Cc:
Sensitive: no

Description

We are in the process of investigating how to restrict disk i/o on a per-KVM-guest basis in #11856.

While this ability is important if we want to ensure more consistent disk i/o speeds, I think it won't actually speed up disk i/o for guests that really need it.

So, I think we should also investigate whether we can speed up disk i/o by adding SSDs to our existing servers.

Attachments (1)

install-ssd-trays.pdf (874.4 KB) - added by Jamie McClelland 4 years ago.


Change History (22)

comment:1 Changed 4 years ago by Jamie McClelland

I'm starting with wiwa, to see whether the motherboard will support it.

We still have to answer the questions:

  • I'm assuming we would want at least two SSDs in RAID1, encrypted, and then added to a volume group (the same approach we use for regular disks). However, I'm not sure if there are any special considerations we should make when using SSDs in this setup.
  • I think we would allocate it to guests as a second hard drive (e.g. hdb) via kvm. I suspect our best use would be for the SSD to take over the mysql partition.

I've added "tps" to our resource hog scripts so we can count transactions per second on physical servers (so far, just wiwa has the code) and on guests (so far malcolm and june).

I'm hoping to figure out both:

  • Which guests on wiwa are good candidates for the flash drives
  • Which devices on the MOSHes are the best candidates (I'm hoping to demonstrate that the mysql partition gets most of the disk operations).
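For reference, a tps figure along these lines can be approximated from /proc/diskstats, where field 4 counts completed reads and field 8 counts completed writes. This is a minimal sketch, not the actual resource hog code: the function names `io_count` and `tps` are hypothetical, and it treats "transactions" as completed reads plus writes per second.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the actual resource hog scripts.

# io_count DEV: read /proc/diskstats-formatted lines on stdin and print the
# total completed I/Os for DEV (field 4 = reads completed,
# field 8 = writes completed).
io_count() {
  awk -v d="$1" '$3 == d { print $4 + $8 }'
}

# tps BEFORE AFTER SECONDS: integer transactions per second between two samples.
tps() {
  echo $(( ($2 - $1) / $3 ))
}

# Example on a live system, sampling sda over 60 seconds:
#   a=$(io_count sda < /proc/diskstats); sleep 60
#   b=$(io_count sda < /proc/diskstats); tps "$a" "$b" 60
```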

We'll see...

Last edited 4 years ago by Jamie McClelland

comment:3 Changed 4 years ago by Daniel Kahn Gillmor

that post is confusing. why would you try to use fstrim on an encrypted block device?

also, the resource hog scripts are themselves potentially resource hogs, since they write to disk. Have you considered trying to minimize their use of disk throughput?

comment:4 Changed 4 years ago by Jamie McClelland

I didn't post that link for the fstrim part, just as an example of someone using a raid/crypt/lvm approach similar to ours with an SSD.

The resource hog scripts read entirely from proc and only write 4 very small files once a minute (and then another 4 files once an hour to consolidate) - so I don't expect them to have a big impact on disk i/o.

There are a few other scripts that measure disk usage by members on the MOSHes, which are more i/o-intensive, but they all run via ionice.
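Running a measurement job under the idle i/o scheduling class looks roughly like the sketch below. The wrapper name `run_low_io` is hypothetical; `ionice -c 3` (idle class) and `-t` (run the command even if the priority cannot be set) are standard util-linux options. The idle class only has an effect with i/o schedulers that honor priorities, such as CFQ or BFQ.

```shell
#!/usr/bin/env bash
# run_low_io CMD [ARGS...]: run a command with idle i/o priority and the
# lowest CPU priority. Hypothetical wrapper; falls back to nice alone if
# ionice is not installed.
run_low_io() {
  if command -v ionice >/dev/null 2>&1; then
    # -c 3: idle class; -t: still run the command if setting the class fails
    ionice -c 3 -t nice -n 19 "$@"
  else
    nice -n 19 "$@"
  fi
}

# Example (hypothetical path): run_low_io du -sk /home/members
```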

comment:5 in reply to:  3 Changed 4 years ago by Chris Thompson

Whoops - relocated comment to #11855

Last edited 4 years ago by Chris Thompson

Changed 4 years ago by Jamie McClelland

Attachment: install-ssd-trays.pdf added

comment:6 Changed 4 years ago by Jamie McClelland

I just attached instructions for how to install the SSD trays I ordered into wiwa. It appears that it will require removing the cover.

comment:7 Changed 4 years ago by Jamie McClelland

The pricing for SSD cards is all over the place, ranging from as low as $150 for about 500GB to nearly $1,000.

I've been narrowing the search by focusing on SSDs designed for data center/enterprise use and rated for heavy write workloads.

Although it is a bit more expensive, I'm considering the Samsung SM863 for $289. It's 480GB and has a good review. Here is the manufacturer's page.

I plan to get two and put them in a RAID1 array.
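As a rough comparison of the prices quoted above, cost per gigabyte can be computed with awk. `cost_per_gb` is a hypothetical helper. With two drives in RAID1, usable capacity is that of one drive, so the effective cost per usable GB doubles.

```shell
#!/usr/bin/env bash
# cost_per_gb PRICE GB: print price per gigabyte, rounded to cents.
# Hypothetical helper for comparing quotes.
cost_per_gb() {
  awk -v p="$1" -v g="$2" 'BEGIN { printf "%.2f\n", p / g }'
}

cost_per_gb 150 500            # budget drive: 0.30
cost_per_gb 289 480            # one SM863: 0.60
cost_per_gb $((2 * 289)) 480   # RAID1 pair, 480GB usable: 1.20
```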

Last edited 4 years ago by Jamie McClelland

comment:8 Changed 4 years ago by Jamie McClelland

Unfortunately, my latest attempt to install the drives has failed.

But, I do know what we need.

There is one power source available. I had a Molex cable that fit and provided two connectors for the SSDs, but... instead of a 12-inch cable I need a 24-inch cable. Also, the power connectors for the SSDs need to be flat, not L-shaped.

The SATA cable connectors are in the middle of the server, so the two cables I brought were long enough, but they were flat on one end and L-shaped on the other. We need flat on both ends.

Lastly, the server manufacturer sent us two trays: one to fit in the DVD slot and one to fit in the back. Neither one really works (we don't have a DVD slot). However, if we had two of the back trays, we could stack them in the back, and that should do the trick.

comment:9 Changed 4 years ago by Steve Revilak

I don't know if this would help at all, but there are adapters that allow you to put a pair of 2.5" drives in a 3.5" tray.

A few examples (which just happen to be the first ones I found on the fine web)

comment:10 Changed 4 years ago by Jamie McClelland

Yeah, those would be perfect but... all of our 3.5" trays are taken :(.

So... I've just ordered:

comment:11 Changed 4 years ago by JaimeV

Owner: set to Jamie McClelland
Status: new → assigned

What is the latest update on this, Jamie?

comment:12 Changed 4 years ago by Jamie McClelland

I just released a service advisory for attempt number 3 tomorrow night. Let's hope the third time's a charm.

comment:14 Changed 4 years ago by Jamie McClelland

So far, my conclusions based on reading all of this are:

  • The security problems with TRIM (ability to identify disk as encrypted and identify the filesystem type) are not serious enough to outweigh the benefits.
  • We should enable TRIM on all layers
  • We should run fstrim regularly rather than rely on the discard mount option in fstab

Nonetheless, there is still some risk (https://www.archlinux.org/news/data-corruption-on-software-raid-0-when-discard-is-used/).
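A periodic trim job in place of the discard mount option could be as simple as the sketch below, run weekly from cron. The function name and the practice of listing mountpoints explicitly are assumptions; `fstrim --verbose MOUNTPOINT` is the standard util-linux invocation, and newer util-linux releases also offer `fstrim --all` to trim every mounted filesystem that supports it.

```shell
#!/usr/bin/env bash
# trim_mounts MOUNTPOINT...: trim each mounted filesystem, continuing past
# failures, and return non-zero if any trim failed. Hypothetical helper.
trim_mounts() {
  local rc=0 mp
  for mp in "$@"; do
    fstrim --verbose "$mp" || rc=1
  done
  return "$rc"
}

# Example weekly cron job (hypothetical mountpoint): trim_mounts /var/lib/mysql
```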

Informative quote from the asalor blog:

How to active TRIM on Linux? The first thing to know is that TRIM should be enabled on all I/O abstraction layers. This means that if you have an ext4 partition on top of LVM, which in turn is on top of an encrypted volume with LUKS/dm-crypt, then you must enable support for TRIM in these three layers: The filesystem, LVM and dm-crypt. There is no point in enabling it at the filesystem level if you don’t enable it also on the other layers. The TRIM command should be translated from one layer to another until reaching the SSD.
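One way to confirm that TRIM actually reaches every layer is to look at each block device's maximum discard size: `lsblk -rno NAME,DISC-MAX` reports `0B` for a device that does not pass discards down (for example, a dm-crypt mapping opened without discards allowed). The filter below is a hypothetical helper that prints the offending devices; on this stack it would be checked against the md, dm-crypt and LVM devices.

```shell
#!/usr/bin/env bash
# no_discard: read "NAME DISC-MAX" pairs (as printed by
# `lsblk -rno NAME,DISC-MAX`) on stdin and print devices whose maximum
# discard size is zero, i.e. layers that will drop TRIM requests.
# Hypothetical helper.
no_discard() {
  awk '$2 == "0B" || $2 == "0" { print $1 }'
}

# Example on the host: lsblk -rno NAME,DISC-MAX | no_discard
```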

Last edited 4 years ago by Jamie McClelland

comment:15 Changed 4 years ago by Jamie McClelland

I updated puppet to have an $ssd variable when defining a physical server that is false by default. When true, it sets issue_discards = 1 in /etc/lvm/lvm.conf. I set this for wiwa and ran it.
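To check that the Puppet change landed, the active `issue_discards` value can be read back from lvm.conf. The helper below is a hypothetical sketch that skips commented-out lines; recent LVM versions can also report it directly with `lvmconfig devices/issue_discards`. Note that this setting governs the discards LVM itself sends when logical volumes are removed or shrunk; filesystem TRIM passes through LVM's linear mappings either way.

```shell
#!/usr/bin/env bash
# issue_discards_value [FILE]: print the active issue_discards setting from an
# lvm.conf-style file (default /etc/lvm/lvm.conf), ignoring lines that start
# with a comment marker. Hypothetical helper.
issue_discards_value() {
  local conf=${1:-/etc/lvm/lvm.conf}
  sed -n 's/^[[:space:]]*issue_discards[[:space:]]*=[[:space:]]*\([0-9]\).*/\1/p' "$conf"
}

# Example on the host: [ "$(issue_discards_value)" = 1 ] && echo "discards enabled"
```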

I created a single partition on each disk with:

0 wiwa:/etc/lvm# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) unit s mkpart main 8192 -196608                                       
(parted) p                                                                
Model: ATA SAMSUNG MZ7KM480 (scsi)
Disk /dev/sdb: 937703088s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start  End         Size        File system  Name  Flags
 1      8192s  937506480s  937498289s               main

(parted) 
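The numbers in this parted output can be checked by hand: an end of `-196608` counts back from the last sector of the disk, leaving about 96 MiB unpartitioned at the end, and a start of 8192 sectors is 4 MiB in with 512-byte sectors. The hypothetical helper below reproduces the End and Size columns for this disk and for the 10 GiB guest disk partitioned the same way later in this ticket.

```shell
#!/usr/bin/env bash
# part_geometry TOTAL START TAIL: given the disk size in sectors, the start
# sector, and the negative end offset passed to mkpart (as a positive
# number), print "START END SIZE" in sectors as parted reports them.
# Hypothetical helper.
part_geometry() {
  local total=$1 start=$2 tail=$3
  local end=$(( total - tail ))       # -N counts back from the disk's end (-1s is the last sector)
  local size=$(( end - start + 1 ))   # start and end sectors are both inclusive
  echo "$start $end $size"
}

part_geometry 937703088 8192 196608   # wiwa SSD:  8192 937506480 937498289
part_geometry 20971520 8192 196608    # jacobs LV: 8192 20774912 20766721
```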

Then I created a RAID1 array:

mdadm --create --raid-devices=2 --level=1 --metadata=1.0 --verbose /dev/md2 /dev/sda1 /dev/sdb1

Then, based on the output of:

mdadm --examine --scan

I pasted the following into /etc/mdadm/mdadm.conf:

ARRAY /dev/md/2  metadata=1.0 UUID=70345fc6:661e67ce:715261cf:84112f10 name=wiwa:2
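Because appending the scan output again would duplicate ARRAY lines that are already in the file, a guard like the hypothetical helper below only appends a line whose UUID is not yet present. It is a sketch of one way to make this step safe to repeat, not part of the procedure above.

```shell
#!/usr/bin/env bash
# append_array_line CONF LINE: append an mdadm ARRAY line to CONF unless an
# entry with the same UUID is already there. Hypothetical helper.
append_array_line() {
  local conf=$1 line=$2 uuid
  uuid=$(printf '%s\n' "$line" | grep -o 'UUID=[^ ]*') || return 1
  if [ -f "$conf" ] && grep -qF -- "$uuid" "$conf"; then
    return 0  # already recorded, nothing to do
  fi
  printf '%s\n' "$line" >> "$conf"
}

# Example on the host:
#   mdadm --examine --scan | grep 'name=wiwa:2' | while read -r l; do
#     append_array_line /etc/mdadm/mdadm.conf "$l"; done
```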

The next steps are:

  • Set up encryption
  • Add encrypted device as physical volume for LVM
  • Update initramfs so device will be decrypted on boot
  • Create volume group: vg_wiwa1
  • Create logical volume for first guest server
  • Update kvm-manager with latest patch from #12096
  • Allocate logical volume to first test guest server
  • Reboot guest
  • Format device
  • Ensure we can extend it at a later time
  • Start using it

comment:16 Changed 4 years ago by Jamie McClelland

I completed these steps with:

0 wiwa:/etc/lvm# cryptsetup luksFormat /dev/md2 

WARNING!
========
This will overwrite data on /dev/md2 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase: 
Verify passphrase: 
0 wiwa:/etc/lvm# cryptsetup --allow-discards luksOpen /dev/md2 md2_crypt
Enter passphrase for /dev/md2: 
0 wiwa:/etc/lvm# blkid /dev/md2 
/dev/md2: UUID="b360ef4f-84ea-457c-8f86-a7f94b1f9277" TYPE="crypto_LUKS"
0 wiwa:/etc/lvm# echo md2_crypt UUID=b360ef4f-84ea-457c-8f86-a7f94b1f9277 none luks,discard >> /etc/crypttab 
0 wiwa:/etc/lvm# cat /etc/crypttab 
# <target name> <source device>         <key file>      <options>
md1_crypt UUID=ae7a55a5-cc91-4064-8e5f-06eb293188a2 none luks
md2_crypt UUID=b360ef4f-84ea-457c-8f86-a7f94b1f9277 none luks,discard
0 wiwa:/etc/lvm#  pvcreate /dev/mapper/md2_crypt 
  Physical volume "/dev/mapper/md2_crypt" successfully created
0 wiwa:/etc/lvm# vgcreate vg_wiwa1 /dev/mapper/md2_crypt 
  Volume group "vg_wiwa1" successfully created
0 wiwa:/etc/lvm#
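The crypttab entry above can also be generated without copying the UUID by hand, since `blkid -s UUID -o value DEVICE` prints just the UUID. The helper below is hypothetical; the `discard` option in the last column is what lets TRIM pass through dm-crypt after a reboot, matching `--allow-discards` on the manual luksOpen.

```shell
#!/usr/bin/env bash
# crypttab_line NAME UUID [OPTIONS]: print an /etc/crypttab entry with no key
# file (passphrase prompt at boot). OPTIONS defaults to "luks,discard".
# Hypothetical helper.
crypttab_line() {
  local name=$1 uuid=$2 opts=${3:-luks,discard}
  printf '%s UUID=%s none %s\n' "$name" "$uuid" "$opts"
}

# Example on the host:
#   crypttab_line md2_crypt "$(blkid -s UUID -o value /dev/md2)" >> /etc/crypttab
```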

comment:17 Changed 4 years ago by Jamie McClelland

Create logical volume for jacobs:

0 wiwa:/etc/lvm# lvcreate --size 10GB --name jacobs vg_wiwa1
  Logical volume "jacobs" created
0 wiwa:/etc/lvm#

comment:18 Changed 4 years ago by Jamie McClelland

I had forgotten to grant access to jacobs on the host:

1 wiwa:/etc/sv/kvm/jacobs# ls -l /dev/mapper/vg_wiwa1-jacobs 
lrwxrwxrwx 1 root root 8 Sep 23 15:16 /dev/mapper/vg_wiwa1-jacobs -> ../dm-28
0 wiwa:/etc/sv/kvm/jacobs# chgrp jacobs /dev/dm-28
0 wiwa:/etc/sv/kvm/jacobs#
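Resolving the /dev/mapper symlink directly avoids looking up the dm-N number by hand, which can differ between boots. The helper name is hypothetical. A group change on a /dev node generally does not survive a reboot, because udev recreates the node, so it would need to be reapplied at boot unless kvm-manager or a udev rule already takes care of it.

```shell
#!/usr/bin/env bash
# grant_guest_disk GROUP PATH: give GROUP ownership of the device node that
# PATH (e.g. /dev/mapper/vg_wiwa1-jacobs) ultimately points to.
# Hypothetical helper.
grant_guest_disk() {
  local group=$1 target
  target=$(readlink -f -- "$2") || return 1
  chgrp -- "$group" "$target"
}

# Example on the host: grant_guest_disk jacobs /dev/mapper/vg_wiwa1-jacobs
```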

I rebooted jacobs, and it now has a new disk:

0 jacobs:~# cat /proc/partitions 
major minor  #blocks  name

   8       16   10485760 sdb
   8        0  524288000 sda
   8        1     248832 sda1
   8        2  524037120 sda2
 254        0    3997696 dm-0
 254        1     499712 dm-1
 254        2     999424 dm-2
 254        3    4997120 dm-3
 254        4    4997120 dm-4
 254        5   20971520 dm-5
0 jacobs:~#
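The #blocks column in /proc/partitions is in 1 KiB units, so sdb's 10485760 blocks is exactly 10 GiB, matching the 10GB logical volume created on the host, or 20971520 sectors of 512 bytes. A hypothetical helper to pull that figure out:

```shell
#!/usr/bin/env bash
# part_kib DEV: print the size of DEV in KiB from /proc/partitions-formatted
# input on stdin (columns: major minor #blocks name). Hypothetical helper.
part_kib() {
  awk -v d="$1" '$4 == d { print $3 }'
}

kib=$(printf '   8       16   10485760 sdb\n' | part_kib sdb)
echo "$(( kib / 1024 / 1024 )) GiB, $(( kib * 2 )) sectors"   # 10 GiB, 20971520 sectors
```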

First, I created a single partition:

0 jacobs:~# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt                                                      
(parted) unit s mkpart main 8192 -196608                                  
(parted) p                                                                
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sdb: 20971520s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start  End        Size       File system  Name  Flags
 1      8192s  20774912s  20766721s               main

(parted) quit                                                             
Information: You may need to update /etc/fstab.

0 jacobs:~#

I'm not sure this step is absolutely necessary; I could have created a filesystem directly on the device. However, adding a single partition ensures the filesystem starts on a good, even sector boundary and seems to leave more options for future uses of the device.
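Whether a start sector lands on an even boundary is simple integer arithmetic: with 512-byte sectors, a start of 8192 sectors is 4 MiB, a multiple of the 1 MiB boundary commonly used for SSD alignment. The helper is hypothetical and assumes the 512B logical sector size that parted reports for this disk.

```shell
#!/usr/bin/env bash
# is_aligned START_SECTOR SECTOR_BYTES ALIGN_BYTES: succeed if the partition
# start falls on an ALIGN_BYTES boundary. Hypothetical helper.
is_aligned() {
  (( ($1 * $2) % $3 == 0 ))
}

if is_aligned 8192 512 $((1024 * 1024)); then echo "8192s is 1 MiB aligned"; fi
```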

I'm also intentionally not adding this partition to an LVM volume group inside the guest, because we have a very specific use for it, we can always extend it via the host, and I don't want to add any unnecessary disk layers that could slow things down.

comment:19 Changed 4 years ago by Jamie McClelland

Resolution: fixed
Status: assigned → closed

jacobs has been rebooted and is running with the new solid state drives as its mysql partition.

I just added a wiki page documenting how to do this so I'm closing this ticket.

We may want to open new tickets to run this process on existing wiwa guests and also to repeat on a different host.

comment:20 Changed 14 months ago by updater

Sensitive: set

Changed to sensitive as part of leadership decision to make all tickets sensitive.

comment:21 Changed 7 months ago by Jamie McClelland

Sensitive: unset
