Customers frequently come to us for help when they are building a new virtualization cluster, or upgrading an existing one, with what appears to be a very straightforward question: “How big should each virtual host be? Should I go with fewer but larger hosts, or more, smaller hosts?” The answer, hopefully unsurprisingly, is that it depends.
To start with, let’s level-set on one thing: hardware is incredibly powerful now. In a single 1U server you can get 44 cores and 3 TB of RAM. A lot of smaller environments could run their entire workload on one 1U box.
So how do you pick where you land on that spectrum? You have choices from four cores all the way up to 22 per socket, and from as little as 8 GB up to 3 TB of RAM … so what do you use to run your environment?
There are a few things to keep in mind:
Processor Speed and Core Count

There are still some applications in the world that are single threaded. Sometimes that’s due to age; sometimes the application simply doesn’t scale out well. Sometimes it’s a single-threaded operation within an otherwise scaled-out application (a bad SQL query, for example). When you’re dealing with single-threaded workloads, a much faster (in GHz) core will complete the work much sooner than a higher core count will. This means you may sometimes choose a lower-core-count processor in order to get more speed out of each core.
Those big 22-core processors aren’t the fastest on a per-core basis. 2.2 GHz is a far cry from the 3.2-3.6 GHz we used to get out of processors a few years ago … or is it? If you look at IPC (instructions per clock: how much a processor actually accomplishes each cycle), every new generation of processor is around three to five percent more efficient, which is to say that at the same clock speed it gets more done. Compare, for example, an E5-2699 v3 to an E5-2699 v4. The v3 is 2.3 GHz and the v4 is 2.2 GHz, so you would assume the 2.3 is faster, right? But once you account for the generational IPC gain, the 2.2 GHz v4 runs about the same as a 2.3 GHz part from the previous generation. Factor in the v4’s extra cores over the v3, and it’s actually MUCH faster.
This is why, if you take an old processor from eight years ago clocked at 3.2 GHz, it will be beaten every time by a substantially lower-clocked modern processor, excluding very specific situations (none of which I can actually think of).
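As a back-of-the-envelope sketch of how those IPC gains compound, here is one way to compute an “effective” clock speed. The 4%-per-generation figure is an illustrative assumption drawn from the three-to-five-percent range above, not a measured benchmark:

```python
def effective_clock(base_ghz, generations_newer, ipc_gain_per_gen=0.04):
    """Scale a clock speed by compounded per-generation IPC improvements."""
    return base_ghz * (1 + ipc_gain_per_gen) ** generations_newer

# E5-2699 v3 at 2.3 GHz vs. E5-2699 v4 at 2.2 GHz, one generation newer:
v3 = effective_clock(2.3, 0)   # 2.30 "effective" GHz
v4 = effective_clock(2.2, 1)   # ~2.29 "effective" GHz -- roughly even per core
print(f"v3: {v3:.2f} GHz equivalent, v4: {v4:.2f} GHz equivalent")
```

Roughly even per core, and the v4 brings more cores on top of that.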
So what’s the real answer here? Balance it out. Figure out how many VMs you’re going to run, figure out how many cores they need, identify your oversubscription ratio, and then size the box appropriately, striking a balance between GHz and core count. Also make sure every host in the cluster can run your largest VM; if one or two VMs are enormous, make sure your host sizing accommodates them.
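That sizing arithmetic can be sketched in a few lines. The VM counts, vCPU sizes, 4:1 oversubscription ratio, and candidate host sizes below are all hypothetical placeholders, not recommendations:

```python
import math

vm_vcpus = [4] * 60 + [8] * 10 + [32]  # 71 hypothetical VMs, one enormous
oversub = 4.0                          # vCPU-to-physical-core ratio

pcores_needed = math.ceil(sum(vm_vcpus) / oversub)

for cores_per_host in (16, 28, 44):    # candidate host sizes
    hosts = math.ceil(pcores_needed / cores_per_host)
    fits_biggest = cores_per_host >= max(vm_vcpus)  # can one host run the 32-vCPU VM?
    print(f"{cores_per_host}-core hosts: {hosts} needed, "
          f"largest VM fits: {fits_biggest}")
```

Note how the 32-vCPU VM rules out the smaller host sizes entirely, regardless of aggregate capacity.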
Memory

You must have enough memory to run your workload. When choosing RAM, select the stick size that gives you the most gigabytes per dollar. For example, a single 32 GB stick is currently cheaper than two 16 GB sticks, but a single 64 GB stick costs much more than two 32 GB sticks, and a single 128 GB stick costs triple what four 32 GB sticks do. In other words, calculate how dense you really need each server to be. There are some performance differences between memory configurations, but realistically, in modern systems, they aren’t a huge concern for most workloads.
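A quick dollars-per-gigabyte comparison makes that trade-off concrete. The prices below are made-up placeholders; plug in current quotes from your vendor:

```python
stick_prices = {16: 90, 32: 160, 64: 400, 128: 1400}  # hypothetical USD per stick

cost_per_gb = {size: price / size for size, price in stick_prices.items()}

for size in sorted(cost_per_gb):
    print(f"{size:>4} GB stick: ${cost_per_gb[size]:.2f}/GB")

best = min(cost_per_gb, key=cost_per_gb.get)
print(f"Best $/GB at these prices: the {best} GB stick")
```

With these placeholder prices the 32 GB stick wins, which is the sweet spot the paragraph above describes; rerun with real quotes before buying.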
Networking

How are you going to connect all of these VMs to the rest of your network? 1 Gbit? 10 Gbit? Figure out how much throughput all of those VMs will realistically generate, and then put enough bandwidth in each server for them to function. If your network is only 1 Gbit, you may not be able to build hosts as big, simply because too many VMs would land on one host for the network to keep up, or you may have to add a LOT more 1 Gbit ports. More hosts normally means more ports used: do you have enough network ports for eight hosts? Or should you size back down to four larger hosts? By the way, all of these same questions apply to storage connectivity as well.
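Here is a minimal sketch of that bandwidth check. The per-VM throughput, NIC counts, and 70% usable-headroom factor are assumptions for illustration:

```python
def nics_sufficient(vm_count, avg_mbit_per_vm, nic_gbit, nic_count, headroom=0.7):
    """True if expected VM traffic fits in the usable NIC bandwidth (Mbit/s)."""
    demand = vm_count * avg_mbit_per_vm            # Mbit/s the VMs generate
    usable = nic_gbit * 1000 * nic_count * headroom  # Mbit/s after headroom
    return demand <= usable

# 40 VMs averaging 50 Mbit/s each on one host:
print(nics_sufficient(40, 50, nic_gbit=10, nic_count=2))  # True: fits on 2x10 Gbit
print(nics_sufficient(40, 50, nic_gbit=1, nic_count=2))   # False: 2x1 Gbit is too small
```

The same shape of check works for storage connectivity: swap in HBA or iSCSI link speeds and per-VM IO throughput.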
Power and Cooling
When you run multiple hosts, you duplicate per-host overhead: power supplies, motherboards, OS drives, extra chips on the board, and so on. Normally this is pretty negligible, but it can add up at large scale. What does add up are actual power cables and ports. Can your PDUs and UPS accommodate the number of servers you are building out? Do you need extra cooling because of how dense your hosts are in the rack?
Failure Domains

How much of your environment goes down if a single host dies? If you have eight hosts and one dies, an eighth of your processing power goes away, and, presumably, an eighth of your VMs hard-crash and power on elsewhere. If you have just two huge servers, then half your processing power is lost, and half your VMs may hard-crash and power up elsewhere. One of those is much more disruptive than the other.
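The failure-domain arithmetic is simple enough to sketch; the 60% average utilization is a hypothetical figure. A survivor load over 100% means the remaining hosts cannot actually absorb the restarted VMs:

```python
def single_host_failure(host_count, avg_utilization):
    """Fraction of capacity lost, and survivor load after VMs restart elsewhere."""
    lost = 1 / host_count
    survivor_load = avg_utilization * host_count / (host_count - 1)
    return lost, survivor_load

for hosts in (2, 8):
    lost, load = single_host_failure(hosts, avg_utilization=0.6)
    print(f"{hosts} hosts: {lost:.1%} of capacity lost, "
          f"survivors would run at {load:.0%}")
```

With two hosts at 60% utilization, the lone survivor would need to run at 120%, so the failed host's VMs simply don't fit; with eight hosts the survivors land under 70%.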
Licensing

Ah, the wonderful topic of licensing that we all love so much. A lot of virtualization software is licensed per CPU socket: VMware, Microsoft (not for long, though), Veeam, Turbonomic, etc. If you have two dual-socket servers with very high core counts, you only have to buy four sockets of each of these technologies. If you have eight dual-socket servers with low core counts, you have to buy 16 sockets of each. This adds up very quickly, to the point where the software costs substantially more than the hardware does. Software also carries large recurring yearly support costs, whereas hardware maintenance is normally cheap. This is another reason to lean toward fewer hosts.
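To see how fast that multiplies, here is the per-socket math with a hypothetical combined price per socket across the whole software stack (substitute your real quotes):

```python
def license_cost(hosts, sockets_per_host, price_per_socket):
    """Total per-socket licensing spend for a cluster shape."""
    return hosts * sockets_per_host * price_per_socket

price = 6000  # hypothetical combined $/socket across the software stack

two_big     = license_cost(2, 2, price)  # 4 sockets licensed
eight_small = license_cost(8, 2, price)  # 16 sockets licensed
print(f"2 big hosts:   ${two_big:,}")
print(f"8 small hosts: ${eight_small:,}")
print(f"Delta:         ${eight_small - two_big:,}")
```

And unlike the hardware, that delta recurs in some form every year through support renewals.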
| Item | Big Hosts | Small Hosts |
| --- | --- | --- |
| Processor clock speed | Lower with high core count | Higher with low core count |
| Processing density | More processing per U | Less processing per U |
| Memory | Possibly higher $/GB | Possibly lower $/GB |
| Connectivity: bandwidth | Possible performance bottleneck based on bandwidth | Easier to service lower VM density on lower bandwidth |
| Connectivity: port density | Can consolidate to fewer ports and save expensive switch real estate | More ports needed for more physical boxes (plus overhead ports like management) |
| Power | Less power overhead with fewer hosts | More power overhead with more hosts |
| Cooling | Denser cooling requirements for big servers | Easier to distribute heat load and eliminate hotspots |
| Failure domains | More goes down if a host dies | Smaller impact to the environment |
| Licensing | Per-socket-licensed software cheaper with fewer sockets | Per-socket-licensed software quickly costs more than the hardware |
Don’t read the chart by simply counting which column wins more rows; some of these factors weigh a lot more than others, licensing, for example. One of our customers runs so much software that is licensed per CPU socket that the difference between two big hosts and four small hosts is $50,000 just in licensing. To get around some of that, you can buy hosts with a single CPU, but not all software lets you license a single socket per box; some (Microsoft, for example) have a two-socket minimum. So for every environment, you have to think through the items above and figure out which approach makes more sense … or call seasoned consultants who do this every day.