Cloud Testing: Storage Failover

As you’d expect, we’ve been extensively testing the failover and high availability features, as they’re one of the key selling points of our Cloud Platform. Our main area of concern has of course been data storage – without data or disk, there’s not much point in having compute power.

In terms of storage availability, initially we will have a pair of SAN SUs (Storage Area Network Storage Units) with 15k RPM SAS drives. Each SU has redundant PSUs and fans, dual quad-core CPUs, 32GB of RAM for cache, and boots from an SSD. Storage is configured equally over both SUs in a round-robin fashion, which balances the load over the two SUs and maximises performance – so for half of the virtual machine instances SAN SU1 will be the primary, and for the other half SAN SU2 will be the primary. Each SU is also configured as a mirror for the other SU’s volumes, so if a failure should ever occur – say SU1 fails and your storage is primary on SU1 – then SU2 will start serving your storage to you.
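
To illustrate the round-robin layout, the primary/mirror assignment ends up looking something like this (instance names are just examples):

  VM instance    Primary SU    Mirror SU
  vm-0001        SAN SU1       SAN SU2
  vm-0002        SAN SU2       SAN SU1
  vm-0003        SAN SU1       SAN SU2
  vm-0004        SAN SU2       SAN SU1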

In our testing so far we’ve seen anywhere from zero to a maximum of two seconds of impact in a failover situation, depending on the exact nature of the failure. Whilst ideally we’d like to bring this down to zero impact for all failure types, it then becomes a delicate balance between false positives (where the system thinks something has failed because it takes fractionally longer to respond than normal) and detecting actual failures – if we start detecting lots of failures that aren’t real, it affects the stability of the system as it flip-flops between failure and recovery, which is far worse than a second or two of actual pause in disk i/o (note: you shouldn’t see disk i/o fail – as it is queued, it will just pause momentarily). In a maintenance situation we can take out an SU without any impact to your service at all :)
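
To illustrate that balance, here’s a minimal sketch of heartbeat-based failure detection – purely illustrative, not our actual failover logic, and “san-su1” is a made-up hostname. The idea is to require several consecutive missed heartbeats before declaring an SU failed, so one fractionally slow reply doesn’t trigger a false positive:

  # Purely illustrative sketch - not our production failover code.
  missed=0
  while true; do
      if ping -c 1 -W 1 san-su1 > /dev/null 2>&1; then
          missed=0                    # SU answered, reset the counter
      else
          missed=$((missed + 1))      # another missed heartbeat
      fi
      # The threshold is the trade-off: lower = faster failover,
      # higher = fewer false positives and less flip-flopping.
      if [ "$missed" -ge 3 ]; then
          echo "SU1 considered failed, promoting its mirror to primary"
          break
      fi
      sleep 1
  done

Tuning that threshold (and the heartbeat interval) is exactly the balancing act described above.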

Overall the initial SAN consists of:

  • Multiple SAN SUs mirroring data for each other
  • Multiple network switches

Each SAN SU consists of:

  • Dual Quad Core CPU
  • 32GB RAM
  • SSD for Storage OS
  • Enterprise SAS 15k RPM Drives
  • RAID-10 (Disk Mirroring + Striping)
  • N+1 Redundant PSU – Fed from two separate power feeds
  • Multiple connections to multiple switches

What all this boils down to is that each SU is highly redundant on its own, as well as being very fast. We then add another SAN SU which mirrors its data, giving even more redundancy in the system as well as increased throughput. What it also means is that we’ll never be the cheapest for disk space – for every 1GB of disk space available on the system we have to provision 4GB of space, spread over 4 drives: RAID-10 inside the SUs, then mirrored between the SUs. For reference we are using Seagate and Hitachi 15k RPM SAS drives in 450GB capacity – considerably more expensive per GB than SATA drives, but worth every penny for the performance and reliability :)
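
As a quick worked example of that 4:1 ratio (the drive count here is purely illustrative):

  16 x 450GB drives (8 per SU)  = 7,200GB raw
  RAID-10 inside each SU        = 3,600GB
  Mirrored between the two SUs  = 1,800GB usable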

As you’d expect from us, we’re also looking at what changes can be made to see if we can bring all failover situations down to zero impact – but we’ll be doing this in our lab, and it will likely appear in future revisions of our cloud hosting platform. We’re always looking to improve :)

Cloud Testing: Disk

I know quite a few of you are following the development of our new cloud hosting platform closely, so here are some very early results from some brief disk testing. First up we have the standard Linux hdparm – nothing too strenuous, but it does give a quick idea of disk performance:

/dev/sda1:
 Timing cached reads:   23004 MB in  1.99 seconds = 11575.07 MB/sec
 Timing buffered disk reads:  336 MB in  3.02 seconds = 111.37 MB/sec
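
For reference, those figures come from hdparm’s built-in timing tests; the equivalent command on your own machine would be something along these lines (the device name will of course differ):

  hdparm -tT /dev/sda1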

As you can see we’re getting 111MB/s – not bad for an initial test, and something confirmed by Bonnie++, a far more strenuous disk test:

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
karl-test3.sheff 4G 61756  79 121058  17 50688   1 51968  54 111809   0  5428   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 45927  71 315487  99  2746   2 44193  69 392805 100  2221   2

Bonnie++ backs up our initial numbers from hdparm, which is nice to see – and does so without using 100% CPU for either block reads or writes.
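
If you’d like to run the same sort of test yourself, a typical Bonnie++ invocation looks something like this – the directory, file size and user are just examples, not necessarily what we used:

  bonnie++ -d /mnt/santest -s 4096 -u root

(-d is the directory to test in, -s is the test file size in MiB – ideally at least twice the machine’s RAM so caching doesn’t skew the numbers – and -u lets Bonnie++ drop privileges when run as root.)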

These are very preliminary numbers – we’ve not even got multipath running to the SANs yet, or the HA going; in theory we could get 4x those numbers with both of those in place. We’ve also not got all the disks running on the SAN for these tests – in fact that’s only running off of 4 disks, whereas in production each SAN will have 8 disks in the SAN head end plus at least one 16-bay disk tray as well.

We’ll have more numbers as the testing progresses – and if there’s anything you’d like us to test, please do let us know.