Since virtualization encompasses so much in your environment, there are a lot of small ways it can go wrong. When we wrote our first installment, we realized there were quite a few more that we wanted to point out.
1. Not separating different network traffic
You can easily assign different port groups to keep each type of traffic (production, vMotion, and so on) on its own network. We typically do this through VLANs or separate physical interfaces and/or switches.
There are a variety of performance and security reasons to keep them separated. For example, vMotion traffic is very demanding when in use, and you wouldn’t want it to hinder performance on your production network. We’ve seen people accidentally DDoS their production network because vMotion sucked up all the network resources, and usually we just have to wait until the migrations finish before everything comes back online.
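As a rough illustration, a check like the one below can flag traffic types that share a VLAN. It is only a sketch: the `port_groups` inventory, its field names, and the VLAN IDs are hypothetical, and it looks at VLAN IDs alone, so it will not notice traffic that is already separated by physical interfaces or switches.

```python
from collections import defaultdict

def shared_vlans(port_groups):
    """Return {vlan_id: [traffic types]} for VLANs carrying more than one traffic type."""
    by_vlan = defaultdict(set)
    for pg in port_groups:
        by_vlan[pg["vlan"]].add(pg["traffic"])
    return {vlan: sorted(kinds) for vlan, kinds in by_vlan.items() if len(kinds) > 1}

# Hypothetical inventory; in practice this would be exported from your hypervisor tooling.
port_groups = [
    {"name": "pg-prod", "vlan": 10, "traffic": "production"},
    {"name": "pg-vmotion", "vlan": 10, "traffic": "vmotion"},
    {"name": "pg-mgmt", "vlan": 30, "traffic": "management"},
]

print(shared_vlans(port_groups))  # {10: ['production', 'vmotion']}
```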
2. Not using hardware features when available
Performance is typically better when utilizing features within your hardware instead of adding them in a software layer. Software has to boot up and load and uses host overhead to do it. And, if something is going awry, it’s easier to troubleshoot hardware than it is software.
3. Only doing one VM per datastore
There are maximum limits on the number of datastores per host, and you could hit them much quicker if you assign one datastore per VM. This can also waste a lot of storage since you’re carving it up for each VM. It’s also much more difficult to manage a high number of datastores. Less is more.
We frequently see this done in the name of disk queuing and capacity management. However, Hyper-V and VMware both have management tools now that make handling these much easier.
That said, if you have a smaller environment and use SAN-based replication, you may still want to go this route.
4. Placing any redundant virtual machines on same datastore or host
If the datastore goes offline, both nodes in your HA cluster can go down. If your redundant VMs are both on the same datastore, they’ll both go down, defeating their purpose. Using tools like DRS in VMware or policies in Hyper-V will help you prevent this from happening, even inadvertently.
For example, don’t put all your domain controllers on the same datastore or host. You can lose authentication and DNS when it goes down and it tanks your whole environment.
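To make the idea concrete, here is a minimal sketch that finds redundant VMs sharing a host or datastore. The inventory list, the `role` field, and the VM, host, and datastore names are all hypothetical; it only reports on placement and is not a substitute for DRS rules or Hyper-V policies.

```python
from collections import defaultdict

def colocated(vms, key):
    """Group VMs by (role, placement) and return the groups with more than one member."""
    groups = defaultdict(list)
    for vm in vms:
        groups[(vm["role"], vm[key])].append(vm["name"])
    return {k: names for k, names in groups.items() if len(names) > 1}

# Hypothetical inventory: two domain controllers on separate hosts but one datastore.
vms = [
    {"name": "dc01", "role": "domain-controller", "host": "esx1", "datastore": "ds1"},
    {"name": "dc02", "role": "domain-controller", "host": "esx2", "datastore": "ds1"},
    {"name": "web01", "role": "web", "host": "esx1", "datastore": "ds2"},
]

print(colocated(vms, "host"))       # {}
print(colocated(vms, "datastore"))  # {('domain-controller', 'ds1'): ['dc01', 'dc02']}
```

In this made-up example the domain controllers pass the host check but would both go down with `ds1`.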
5. Not accounting for snapshot storage space requirements
This one can cause the aforementioned problem. If you run out of space — potentially caused by a snapshot-based virtual backup tool like Veeam — it can take your datastore offline, which would then take all your VMs that are on it offline as well. Hello, outage.
As a rule of thumb, we usually tell people to size the datastore at least 20 percent larger than the total size of the virtual machines on it to account for these snapshots. For example, if you’ve got one terabyte of VMs, we’ll recommend you have at least 1.2 terabytes of space available. Additionally, you should fully understand your workloads, and subsequently your data change rate, in order to properly calculate the storage needed to accommodate snapshots.
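The rule of thumb works out as simple arithmetic. The sketch below assumes the 20 percent figure as a default; the function name and the `headroom` parameter are ours for illustration, and a workload with a high change rate may well need more.

```python
def datastore_size_tb(vm_total_tb, headroom=0.20):
    """Minimum datastore size in TB: total VM size plus snapshot headroom."""
    if vm_total_tb < 0 or headroom < 0:
        raise ValueError("sizes and headroom must be non-negative")
    return vm_total_tb * (1 + headroom)

# One terabyte of VMs with the default 20 percent headroom.
print(f"{datastore_size_tb(1.0):.2f} TB")  # 1.20 TB
```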
6. Not setting up NTP properly
Network Time Protocol (NTP) needs to point to a time source outside your virtual environment, such as a physical domain controller, or outside of your organization altogether, like pool.ntp.org, in order to keep the hosts in time sync. The virtual machines within a host typically pull their time from the host, so if the host is misconfigured, the VMs will be too. PLEASE DO NOT point your host to a virtual domain controller for NTP. This can cause time drift, which is a lot less cool than it sounds, and it’s a major pain to troubleshoot.
Everything in your environment should point to the same time source. Time drift and time misalignment can wreak havoc on your domain functions, like AD replication.
7. Misconfiguring default gateway
When routing doesn’t work, lots of stuff doesn’t work. It’s common to want to create an isolated management network for security purposes, but you have to be careful when doing so because that management network can take control of the routing for all your networks. If you lock it down, then it can’t reach the other networks and traffic will get lost.
8. Giving iSCSI its own gateway
It’s not recommended to route iSCSI. It should stay on layer 2 only, since the additional hops it takes through firewalls and/or routers on layer 3 cause problems, on top of adding a lot of overhead to your layer 3 network.
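One quick sanity check is whether the initiator and the target sit in the same subnet, so no gateway is needed between them. This sketch uses Python's standard `ipaddress` module; the addresses are made up, and sharing a subnet is only a proxy for true layer 2 adjacency.

```python
import ipaddress

def same_subnet(initiator_if, target_ip):
    """True if target_ip falls inside the subnet of the initiator interface (CIDR form)."""
    iface = ipaddress.ip_interface(initiator_if)
    return ipaddress.ip_address(target_ip) in iface.network

# Hypothetical iSCSI interface and targets.
print(same_subnet("10.10.50.11/24", "10.10.50.200"))  # True
print(same_subnet("10.10.50.11/24", "10.10.60.200"))  # False
```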
9. Using legacy virtual adapters
A lot of machines default to the legacy virtual adapter since it’s the most compatible with all the OSes in your environment. However, you lose performance and tuning capabilities by utilizing this adapter, since it’s built to completely emulate a physical adapter. Hyper-V is smart enough to prevent you from using it unless your OS requires it.
Also, if you need 10 GbE capabilities, the VMXNET3 adapter in VMware or the Network Adapter in Hyper-V will give you those capabilities, but you would have to do some additional configuration to get that throughput.
10. Using external DNS
Please do not point your virtualization hosts to external DNS servers, such as Google’s 8.8.8.8. Point your DNS to internal DNS servers that you control, preferably DNS servers that are outside your virtual environment to prevent any additional problems if your hosts go down. So, if you have a physical domain controller, make that your primary DNS server. For example, if all your VMs are down and you’re trying to bring vCenter back up, it doesn’t function properly without DNS. You have to bring up your domain controllers and DNS first. Not knowing where those servers are can add precious time to your stressful outage window.
If you’re wondering how to prevent DNS issues if your physical domain controller goes down, there are some tricks you can do within your virtual environment to add some redundancy.
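A simple audit can catch hosts pointed at public resolvers. The sketch below treats any non-private address as external, which is an assumption on our part: a private address does not prove you control the server, and the host list here is hypothetical.

```python
import ipaddress

def external_dns_servers(servers):
    """Return the DNS server addresses that are not in private address space."""
    return [s for s in servers if not ipaddress.ip_address(s).is_private]

# Hypothetical host configuration: one internal server and Google's public resolver.
print(external_dns_servers(["10.0.0.10", "8.8.8.8"]))  # ['8.8.8.8']
```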
11. Not thinking through your virtual disk type
Thin provisioning at both the SAN and hypervisor level causes additional management overhead, as the capacity has to be watched at both the front-end and the back-end. It is also very hard to figure out how much storage is actually in use at any time with both sides masking capacities. It is recommended to either do it on the front-end or on the back-end, but not both.
For more information, check out our blog comparing virtual disk types.
12. Losing sight of ROI
Remember why you virtualized to start with: to consolidate and save money and time. We see a lot of server creep, which is an unnecessary number of servers that builds up because it’s so much easier to just provision them. This drains resources, requires extra licensing, and adds layers of complexity. Natural growth makes sense, of course, and certain applications require isolated VMs or servers for maintenance reasons, but it’s important to think carefully about the most efficient use of your virtual servers before provisioning a new one.
We wouldn’t be IT guys if we didn’t throw out the disclaimer that this all “depends” on a lot of factors. The rules above go for most typical environments and workloads, but there are edge cases to everything that might require special configuration.