In recent years, Dell has come a long way in the networking department. The days of shuddering in disgust at the sight of a Dell PowerConnect in the datacenter are over. Since Dell acquired Force10 some time ago, its networking game has stepped up considerably.
We’ve had a lot of success deploying Dell’s N-Series switching gear for our clients. These switches boast very good performance and a solid feature set including OSPF, PoE, QoS, 10GbE, stacking, and so on. All of this makes them a strong alternative to other, pricier options …
Enter the Dell S-Series. This is a step up the ladder and is geared for the datacenter. It also boasts all the features you could ask for in a datacenter switch.
A Dell S-Series in the Wild
Over the summer we installed a pair of Dell S4128-ON switches: 10GbE, Dell’s new OS10, and Layer 3 functionality. The first stage of the project was to integrate new networking, storage, and hypervisors into the network. The new equipment was to connect to the old equipment long enough to allow us to migrate all the VMs to the new environment.
Phase two is where the wheels came off. After migrating everything to the new environment, we started to see issues. Our client called and reported that all network services were down. We could not even connect to the Dell switches remotely. Once the colocation facility’s staff connected a crash cart to the switch and established a support session, we started to dig in.
Specifically, we saw the following behavior:
- The default route was no longer in the routing table despite being in the config. Even worse, we were not able to add or remove the default route. We ended up adding a new default route to a completely different interface. Only then were we able to remove and re-add our real default route.
- ARP was not passing over our channel group interfaces from our VMware hosts.
- Spanning tree was not able to use 802.1Q tagging to differentiate between VLANs.
- Editing VLAN assignments on interfaces caused the CLI to crash.
It turns out that ALL of these issues were related to the firmware (10.3.0E) running on the switches. Full disclosure: 10.3.0E was the most recent firmware available when we installed the switches, but newer releases followed. This is why you should always pay attention to update bulletins.
When we upgraded to firmware version 10.4.2.1, all issues were resolved.
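Keeping an eye on bulletins is easier when you can quickly flag switches running something older than a known-good release. The sketch below compares version strings like the two mentioned above. It is a minimal illustration, not a Dell tool: the parsing rule (keep the leading digits of each dot-separated field and ignore trailing letters such as the "E" in 10.3.0E) is our own assumption about the format, and the function names are hypothetical.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a firmware string like '10.3.0E' or '10.4.2.1' into a tuple of ints.

    Assumption: each dot-separated field starts with digits; any trailing
    letters (such as the 'E' in '10.3.0E') are ignored for ordering purposes.
    """
    parts = []
    for field in version.strip().split("."):
        match = re.match(r"\d+", field)
        if match is None:
            raise ValueError(f"unrecognized version field {field!r} in {version!r}")
        parts.append(int(match.group()))
    return tuple(parts)


def is_outdated(running: str, minimum: str) -> bool:
    """Return True if the running firmware is older than the minimum version."""
    a, b = parse_version(running), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '10.4.2' compares equal to '10.4.2.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    # Hypothetical inventory: switch name -> firmware it reports.
    inventory = {"s4128-a": "10.3.0E", "s4128-b": "10.4.2.1"}
    for name, running in inventory.items():
        if is_outdated(running, "10.4.2.1"):
            print(f"{name} is running {running}; review the latest bulletins")
```

In practice the running version would come from the switch itself and the minimum from the vendor's release notes; the point is simply to make "are we behind?" a routine check rather than something discovered during an outage.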
This is disappointing because the firmware should have been upgraded sooner. Hindsight is always 20/20, but the purpose of these systems is to support the business. That can make it difficult to get the proper maintenance windows to apply updates or perform other tasks. Let’s face it, some businesses must run 24/7/365. Every second that systems are down, planned or unplanned, translates to dollars out the window.
We often come into an environment where firmware, updates, etc. are all severely out of date. When we ask about performing these basic maintenance tasks, the answer is typically: “We can’t afford to be down,” or, “These systems have to stay up all the time.” In the eyes of the business, maintenance is a guaranteed loser.
Ultimately, IT is a function that supports the business. It’s on us to keep the systems running and minimize outages. If your systems require 24/7 uptime but cannot undergo basic maintenance without bringing business to a halt, there’s a problem: your systems lack redundant infrastructure. There is no such thing as an individual component that can have zero downtime. With redundancy in place, however, a business can keep running while its systems are properly maintained.
If you have questions about the best procedure to apply patches and updates or need help figuring out a way to drastically minimize your downtime for these types of projects, we’re here to help.