DataCore’s recently published SPC-1 numbers have been making quite the splash in the industry. If you missed them, here’s a link to their article about them. The reactions from the storage industry have ranged from outrage, distrust, and disregard to full fan-boy amazement. Which is the proper response? Well, we are a consulting company, so you guessed it … it depends. This post cuts through the sensationalism to say what you can ACTUALLY take away.
First, the results: there are three results currently published (and they were published in this order), all of which are pretty interesting, but for different reasons. They are as follows: single node with hyper-convergence (SPC-1 Full Disclosure), a pair of high availability nodes with hyper-convergence (SPC-1 Full Disclosure), and a pair of high availability nodes with separate storage (SPC-1 Full Disclosure). Here’s a link to the full top-ten list on the SPC site.
Single Node with Hyper-Convergence
Let’s start with the first test, a single node with hyper-convergence. This was met with a lot of (valid) criticism. Most importantly, it was a single node. If anything goes awry in that one node, all of the storage goes with it. The data isn’t duplicated anywhere else; it’s all on its lonesome. That is, obviously, not good for anyone’s production. It’s no worse than, say, a single Hyper-V host with internal storage, so there IS a use case for it, but that’s generally not what people looking at SPC benchmarks are after. The key point of this test is that it shows that with a reasonably moderate server (dual 14-core processors, 544 GB of RAM, 16 SSDs and eight 15k drives) DataCore can push 459,000 IOPS at a shockingly low 0.08 ms. That’s about 29,000 IOPS per SSD, which isn’t bad, especially considering the mirroring taking place, but the response time is incredible at that throughput.
The key thing to keep in mind with this configuration is that there is no external connectivity or ability to survive one box dying. This test was really set up purely to be a shot across the bow, not a real-world scenario.
Two Nodes with Hyper-Convergence
Moving on to two nodes hyper-converged … This test also drew a lot of criticism, again because the workload was running on the storage itself, thereby eliminating transport latency (Fibre Channel or iSCSI). That being said, hyper-converged storage is a big thing now, with growing markets and many different competitors.
The most noteworthy thing, though, is that the data is made highly available across two boxes: if either box dies, or if any component within a box dies, the data is still available. That instantly makes this a much more viable real-world scenario. The hardware is larger in this case, though: two 18-core CPUs and 768 GB of RAM per controller. While significantly larger than the server above, this is still nothing exotic or unreasonable to procure. Interestingly, this configuration has only 18 SSDs and four 15k drives per controller (36 and eight in total, respectively) but generated 1,200,000 IOPS at 0.09 ms. With this test, they’re now up to about 33,000 IOPS per SSD, excluding the IO necessary for mirroring, which is a significant improvement over the first.
This is a much more likely scenario than the one above, but there is one thing to keep in mind: the ONLY redundancy in this scenario is between the two DataCore controllers. There isn’t drive redundancy within each controller. That, again, pulls the scenario as tested a bit further from the real world, but it is still a good example. Making the SSDs redundant would add cost, though not an astronomical amount, so it’s easy to apply this to a real-world situation.
Two Mirrored Nodes
Finally, the last SPC-1 benchmark is two mirrored DataCore nodes with the SPC-1 workloads hosted externally. This is the closest to a normal real-world SAN configuration. The hardware is bigger still: two 18-core processors and 1.5 TB of RAM, with a total of 36 SSDs (480 GB each) and seven 15k drives. However, despite the hardware being only an incremental bump from the previous test, the performance went up to 5,120,000 IOPS at 0.12 ms of latency. That means we’re averaging 71,111 IOPS per SSD (excluding mirror IO), a HUGE leap from the previous tests.
This test, more than any other, truly shows how parallel IO is able to push serious input/output. Similarly to the previous test, this one does not have any redundancy WITHIN a single DataCore node, so a loss of any drive would cause that whole side to go offline.
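The per-SSD averages quoted for the three tests are simple divisions, and they can be checked with a short Python sketch. The figures below are the ones given in this post; the function name and table layout are just for illustration. Note that the 71,111 figure for the third test matches a divisor of 72 rather than 36, which would fit 36 SSDs per node; that reading is an inference from the arithmetic, not something taken from the full disclosure report.

```python
# Figures as quoted in this post: (label, SPC-1 IOPS, total SSD count as stated).
RESULTS = [
    ("single node, hyper-converged", 459_000, 16),
    ("two nodes, hyper-converged", 1_200_000, 36),
    ("two mirrored nodes, external hosts", 5_120_000, 36),
]


def iops_per_ssd(iops: int, ssd_count: int) -> float:
    """Average IOPS per SSD; ignores the spinning drives and mirror IO."""
    return iops / ssd_count


for label, iops, ssds in RESULTS:
    print(f"{label}: {iops_per_ssd(iops, ssds):,.0f} IOPS per SSD")

# Assumption: if the third test actually used 36 SSDs per node (72 in total),
# the average comes out at the 71,111 quoted in the text.
print(f"third test with 72 SSDs: {iops_per_ssd(5_120_000, 72):,.0f} IOPS per SSD")
```

Either way the third configuration is well ahead of the first two on a per-drive basis, which is the point the comparison is making.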
Making Sense of the Results
So, looking holistically at all three of these tests, what is there to take from them? First, a point of clarification: almost every vendor tunes its deployment for SPC-1. Do most DataCore nodes have 1.5 TB of RAM? No. 512 GB? Several do. 256 GB? Not unreasonable at all. Does the entire SPC workload fit in cache? No. There are still cache misses, and there are still writes that have to be serviced quickly. The fact that DataCore was able to keep such an insanely low average response time in spite of these shows that the system isn’t just cheating by running purely out of RAM.
Further, if your production workload would fit in RAM, there’s no reason not to put that much RAM in your DataCore nodes. For comparison, one of the other highest-IOPS vendors had 4 TB of DRAM cache and 512 SSDs to get 3,000,000 IOPS. That’s 60 percent more RAM cache, and seven times as many SSDs, for 60 percent of the performance of the DataCore solution. DataCore is also achieving that performance while doing fully synchronous mirroring between two separate storage arrays, which means that an entire array could die and the system would survive. All of this says there is some black magic (parallel IO) in their software that produces far higher performance numbers than this kind of hardware normally delivers.
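To put the vendor comparison in per-drive terms, here is a rough Python sketch using only the numbers quoted above. The constant names are made up for illustration, and the DataCore per-SSD value is the 71,111 average as quoted in this post, not an independently measured figure.

```python
# Numbers as quoted in this post; names are illustrative only.
DATACORE_IOPS = 5_120_000
DATACORE_IOPS_PER_SSD = 71_111  # per-SSD average quoted for the mirrored test
OTHER_IOPS = 3_000_000
OTHER_SSDS = 512

# Share of DataCore's throughput that the other vendor reached.
relative_performance = OTHER_IOPS / DATACORE_IOPS

# What each of the other vendor's SSDs contributed on average.
other_iops_per_ssd = OTHER_IOPS / OTHER_SSDS

# How many times more IOPS each DataCore SSD delivered, on these figures.
per_ssd_advantage = DATACORE_IOPS_PER_SSD / other_iops_per_ssd

print(f"other vendor reached {relative_performance:.0%} of DataCore's IOPS")
print(f"other vendor averaged {other_iops_per_ssd:,.0f} IOPS per SSD")
print(f"DataCore per-SSD advantage: about {per_ssd_advantage:.0f}x")
```

On these figures the other vendor lands at a bit under 60 percent of the throughput while getting roughly a twelfth of the work out of each SSD.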
It’s not all roses though; there are some concerns here.
In a real-world scenario, we don’t want a single DataCore node to lack redundancy (as the full disclosure report reveals these do). Normally, we run some form of RAID within each system’s pools, which increases cost and lowers performance somewhat (depending on the type of RAID). Further, the drives in these MD arrays are laid out to optimize SAS expander bandwidth, which actually sacrifices a lot of capacity and density. The OS and swap/page drives are also not redundant within these systems. Finally, the UPSes in these reports are a bit small to give the system time to de-stage the RAM.
All of that being said, fixing these concerns isn’t a massive cost in most situations. While the cost per IO would increase somewhat, these configurations are closer to a real-world solution than most.
What we’ve shown is that while the tests may not be perfectly “real world,” the results are still very applicable. The flexible nature and extreme performance capability of DataCore make it an excellent choice for your storage ecosystem.