As you’d expect, we’ve been extensively testing the failover and high availability features, as they’re one of the key selling points of our Cloud Platform. Our main area of concern has of course been data storage – without data or disk, there’s not much point in having compute power.
For storage availability we will initially have a pair of SAN SUs (Storage Area Network Storage Units) with 15k RPM SAS drives. Each SU has redundant PSUs and fans, dual quad-core CPUs and 32GB of RAM for cache, and boots from an SSD. Storage is configured equally over both SUs in a round-robin fashion, which balances the load over the two SUs and maximises performance: for half of the virtual machine instances SAN SU1 is the primary, and for the other half SAN SU2 is. Each SU is also configured as a mirror for the other SU’s volumes, so if SU1 fails and your storage is primary on SU1, then SU2 will start serving your storage to you.
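To illustrate the idea, here’s a minimal sketch of round-robin primary placement with mirror failover. This is purely illustrative – the SU names and the failed-set bookkeeping are made up for the example, not our actual controller code:

```python
# Illustrative sketch: round-robin primary placement across two SUs,
# with the other SU acting as the mirror if the primary fails.
SUS = ["SU1", "SU2"]

def primary_for(instance_id):
    """Round-robin: even-numbered instances land on SU1, odd on SU2."""
    return SUS[instance_id % len(SUS)]

def serving_su(instance_id, failed):
    """Serve from the primary SU, or fail over to its mirror."""
    primary = primary_for(instance_id)
    if primary not in failed:
        return primary
    mirror = SUS[(SUS.index(primary) + 1) % len(SUS)]
    if mirror in failed:
        raise RuntimeError("both SUs down - storage unavailable")
    return mirror

# Normal operation: load is split evenly across the two SUs.
print(serving_su(0, failed=set()))   # instance 0's primary is SU1
print(serving_su(1, failed=set()))   # instance 1's primary is SU2

# SU1 fails: its instances are served by the mirror, SU2.
print(serving_su(0, failed={"SU1"}))
```

The nice property of this layout is that both SUs do useful work all the time, rather than one sitting idle as a cold standby.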
In our testing so far we’ve seen anywhere from zero to a maximum of two seconds of impact in a failover situation, depending on the exact nature of the failure. Ideally we’d like to bring this down to zero for all failure types, but it then becomes a delicate balance between false positives (where the system thinks something has failed because it takes fractionally longer to respond than normal) and detecting actual failures. If we start detecting lots of failures that aren’t real, it affects the stability of the system as it flips and flops between failure and recovery – which is far worse than a second or two of actual pause in disk I/O. (Note: you shouldn’t see disk I/O fail; as it is queued, it will just pause momentarily.) In a maintenance situation we can take out an SU without any impact to your service at all :)
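One common way to strike that balance is to require several consecutive missed health checks before declaring a failure, so a single fractionally slow response doesn’t trigger a spurious failover. A rough sketch – the threshold values here are hypothetical, not our actual monitoring settings:

```python
# Hypothetical failure detector: only declare an SU failed after several
# consecutive slow or missed health checks, to avoid flip-flopping between
# failure and recovery on a single slow response.
TIMEOUT_MS = 500        # a response slower than this counts as a miss
MISSES_TO_FAIL = 3      # consecutive misses before declaring failure

def detect_failure(response_times_ms):
    """Return the 1-based check number at which failure is declared, or None."""
    consecutive_misses = 0
    for i, rt in enumerate(response_times_ms, start=1):
        if rt is None or rt > TIMEOUT_MS:
            consecutive_misses += 1
            if consecutive_misses >= MISSES_TO_FAIL:
                return i
        else:
            consecutive_misses = 0  # one good response resets the count
    return None

# A single slow response is tolerated (no failover)...
print(detect_failure([10, 900, 12, 11]))        # None
# ...but sustained misses trigger failover after three check intervals.
print(detect_failure([10, None, None, None]))   # 4
```

The cost of this approach is exactly the pause described above: detection takes a few check intervals, which is where the second or two of paused I/O comes from.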
Overall the initial SAN consists of:
- Multiple SAN SUs mirroring data for each other
- Multiple network switches
Each SAN SU consists of:
- Dual Quad Core CPU
- 32GB RAM
- SSD for Storage OS
- Enterprise SAS 15k RPM Drives
- RAID-10 (Disk Mirroring + Striping)
- N+1 Redundant PSU – Fed from two separate power feeds
- Multiple connections to multiple switches
What all this boils down to is that each SU is highly redundant on its own, as well as being very fast. We then add another SAN SU which mirrors data for it, giving even more redundancy in the system, as well as increased throughput. What it also means is that we’ll never be the cheapest for disk space: for every 1GB of disk space available on the system we have to provision 4GB of raw space, spread over 4 drives – RAID-10 inside the SUs, then mirrored between the SUs. For reference we are using Seagate and Hitachi 15k RPM SAS drives in 450GB capacity – considerably more expensive per GB than SATA drives, but worth every penny for the performance and reliability :)
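The 4:1 raw-to-usable ratio simply comes from stacking two layers of 2x redundancy. As a quick sanity check (the drive counts below are illustrative, not our exact production configuration):

```python
# Usable capacity after RAID-10 inside each SU, then mirroring between SUs.
# Each layer of 2x redundancy halves the usable space: 4GB raw per 1GB usable.
def usable_gb(drives_per_su, drive_gb, num_sus=2):
    raw = drives_per_su * drive_gb * num_sus
    after_raid10 = raw / 2          # RAID-10 mirrors within each SU
    after_su_mirror = after_raid10 / 2  # the SUs then mirror each other
    return after_su_mirror

# e.g. two SUs each with 8 x 450GB 15k SAS drives:
# 7200GB of raw disk, but only 1800GB usable - the 4:1 overhead.
print(usable_gb(drives_per_su=8, drive_gb=450))  # 1800.0
```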
As you’d expect from us, we’re also looking at what changes can be made to see if we can bring all failover situations down to zero impact – but we’ll be doing this in our lab, and it will likely appear in future revisions of our cloud hosting platform. We’re always looking to improve :)
I know quite a few of you are following the development of our new cloud hosting platform closely, so here are some very initial results from some brief disk testing. First up we have the standard Linux hdparm – nothing too strenuous, but it does give a quick idea of disk performance:
Timing cached reads: 23004 MB in 1.99 seconds = 11575.07 MB/sec
Timing buffered disk reads: 336 MB in 3.02 seconds = 111.37 MB/sec
As you can see we’re getting 111MB/s – not bad for an initial test, and something confirmed by Bonnie++, a far more strenuous disk test:
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
karl-test3.sheff 4G 61756  79 121058 17  50688  1  51968 54 111809  0  5428  0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 45927 71 315487  99   2746  2  44193 69 392805 100  2221  2
Bonnie backs up our initial numbers from hdparm, which is nice to see – and does so without using 100% CPU for either reads or writes.
These are very preliminary numbers – we’ve not even got multipath running to the SANs yet, or the HA going; in theory we could get 4x those numbers with both of those items up and running. We also don’t have all the disks running on the SAN for these tests – in fact it’s only running off 4 disks. In production each SAN will have 8 disks in the SAN head end plus at least one 16-bay disk tray as well.
We’ll have more numbers as the testing progresses, also if there is anything you’d like us to test then please do let us know.
As promised, we’ve got some pictures of some of the hardware we’re using in our cloud hosting platform, which will be used to support our business class web hosting as well as provide cloud based solutions to you. I apologise for some of the pictures – even with 5MP the iPhone still isn’t quite the great photo-taking tool it should be for the money.
First up we have some factory fresh ECC memory – 192GB to be precise:
Next up we have one of our SAN head end boxes, probably the most important component in the whole of the cloud platform:
Inside the SAN head end boxes we’re using Adaptec 5805 and 5085 SAS RAID cards – these provide us with 8 internal SAS ports, as well as two x12 SAS expansion ports for connecting up disk trays. Once we’re done testing we’ll be adding disk trays with up to 24 x 450GB 15k SAS drives per tray.
The next most important components and the ones that will actually run the cloud computing are the hypervisor boxes, here you can see two of them next to each other (minus CPUs):
Just in case anything should go wrong, we have our backup NAS system. I don’t have a picture of the 1U head end box, but we do have pictures of the 24-bay disk trays that we’ll be using for it:
That’s all for now. When we get a minute we’ll get some pictures of all the kit racked up for you, and maybe even some video (we know a lot of you like flashy lights :))
…Simon from INX-Gaming. Congratulations to Simon, who won a free bottle of Champagne for coming up with the most novel use for the stack of equipment we recently bought for our new Cloud Hosting Platform.
There will be more chances to win a bottle of Champagne over coming months – so keep your eyes on our blog, and also keep an eye out for more information about our new Cloud Hosting Platform, which you’ll be able to benefit from automatically if you’re a user of our business class shared hosting, as we’ll be moving all accounts over to it once it goes live.
This week we’ve been getting the hardware ready to test our planned new cloud hosting platform, and as you can imagine, it’s not been your average shopping list. So far this week we’ve purchased:
- 408GB of RAM
- 200.48GHz of CPU
- 240GB of Solid State Disk (SSD) drives
- 6TB of 15k RPM Enterprise SAS drives
- 6TB of SATA drives
- 8 x Server Chassis for Hypervisors
- 3 x Server Chassis for Storage
- 1 x 24 Bay Disk Tray
- 3 x 8 Internal Port Adaptec RAID Cards
- 3 x 8 External Port Adaptec RAID Cards
- 8 x Dual Port Intel Network Cards
- 3 x Quad Port Intel Network Cards
So, not your average shopping list by some margin, and the bank manager’s face is looking a bit grim right now as well. It’s shaping up to be a fun couple of weeks building and testing all this kit – although I’m just wondering if I can sneak some of it out for a new PC :)
Once our new cloud platform is up and running we’ll be migrating all of our business class web hosting over to it – so you’ll benefit from our extensive investment in our cloud hosting platform even if you’re not utilising it directly, just by making use of our business web hosting service from as little as £50 per year.
We’ll post some updates and pictures when it all arrives, for those of you who we know like to see lots of hardware. In the meantime I look forward to all the interesting suggestions we’ll no doubt get for what else we could use all the hardware for – the most interesting posted to the comments by this time next week gets a bottle of Champagne.