Updates are one of the most trying things in IT, but they need to be a priority. The ramifications of not updating software and hardware are simply too extreme to ignore.
No matter how many updates you run, you are ALWAYS behind. The thing about all of these patches and updates, though, is that they truly are vital at every level of the environment, and you have to figure out the best way for your organization to test and apply them regularly.
I’ve seen old versions of RAID card firmware cause all configuration to be randomly lost, BIOS versions that prevent a server from rebooting, VMware bugs that prevent VMs from ever being powered on again, storage bugs that corrupt data, switch bugs that cause random reboots, firewall bugs that open backdoors… I could go on. Every update is vital, and they’re all a constantly moving target.
We all either have a server in our environment or know of one out there that has spawned a similar conversation:
“This thing hasn’t been updated in six years! Why not?”
“It’s running like a champ, we’re not going to touch it.”
And this all seems wonderful…until something breaks.
Then the manufacturer tells you that your version of the software is no longer supported and you HAVE to do an emergency upgrade; or the hardware is so out of date that a replacement part won’t work with the old BIOS. One customer took this hands-off approach and then had a RAID card lose its configuration to a bug that had been patched four years earlier. It ran great, so they didn’t touch it… and we had to retag all the RAID sets on a 15-disk JBOD through trial and error because no one quite remembered the layout.
What Should Regular Updating Look Like?
For smaller organizations, this usually means applying updates to production every one to three months and hoping nothing breaks. There usually isn’t much testing involved.
For larger organizations, full test environments are set up for new patches and updates to make sure they won’t affect production. Large companies invest a lot of time and money in these test environments because one bad patch could cost them millions.
Test environments are great and are how things should be done. The rub is that testing often takes so long that there is a large delay between a patch being released and being applied; or the test environment doesn’t represent real-world variables properly and problems still slip into production. In short, there’s no perfect solution, but patches HAVE to be applied reasonably frequently.
Many companies have a good update system for Windows and associated products, but then miss all the surrounding ecosystems. Think about all the pieces that make up each system:
- Hardware firmware
- Microsoft-based applications
- Line-of-business applications
- Ancillary applications
Almost every product offers a way to centralize and automate updates. Hardware vendors offer distributed update methods for their products, anti-virus products update themselves from a central console, and tools like Microsoft SCCM or Dell KACE can help patch other software and drivers. All of these products, of course, require their own care and feeding to be successful.
Keep this list in mind and create update policies and procedures for all of it. It may be that anti-virus updates go out daily, while Windows patches go out weekly and the rest of the items are updated quarterly. For small companies, this will almost certainly be a manual procedure, while large companies HAVE to have automation. Regardless of your design, the key is to make sure it happens regularly. Even quarterly updates will prevent a large number of issues and will spare you the truly terrifying update: the multi-hop.
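One way to keep those cadences from living only in someone’s head is to write them down as data and check against them. A minimal sketch in Python, assuming illustrative component names and day counts (your policy will differ):

```python
from datetime import date, timedelta

# Hypothetical cadence policy: maximum days between updates per component
# class, mirroring the example above (AV daily, Windows weekly, rest quarterly).
CADENCE_DAYS = {
    "anti-virus": 1,
    "windows-patches": 7,
    "hardware-firmware": 90,
    "line-of-business-apps": 90,
}

def overdue_components(last_patched: dict, today: date) -> list:
    """Return the component classes whose last patch date exceeds the cadence."""
    overdue = []
    for component, cadence in CADENCE_DAYS.items():
        if today - last_patched[component] > timedelta(days=cadence):
            overdue.append(component)
    return sorted(overdue)
```

Even a tiny report like this, run on a schedule, turns “we should probably patch that” into a concrete, reviewable list.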
I recently had a customer who needed to go from version 3 to version 10 of a software product. It required eight separate major updates, each with its own special needs and requirements, spanning six years. When those updates were new, everyone knew how to install them; six years on, it turned into a five-day series of hops and testing.
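Under the hood, a multi-hop upgrade is a path-finding problem over the vendor’s supported-upgrade matrix. As a rough sketch (the version numbers and allowed hops below are hypothetical, not any real product’s matrix), a breadth-first search finds the shortest supported chain:

```python
from collections import deque

def upgrade_path(supported_hops: dict, current: int, target: int) -> list:
    """Breadth-first search for the shortest chain of supported upgrades.

    `supported_hops` maps a version to the versions it can upgrade to
    directly. Returns the full version sequence, or [] if no path exists.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in supported_hops.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []  # no supported upgrade path
```

With a straight chain the answer is trivially every intermediate version; real matrices sometimes allow skips, which is exactly why planning the route before you start hopping pays off.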
Don’t Ignore Networking and Storage
Don’t forget the rest of your environment either. Networking and storage equipment are two of the biggest victims of the “it’s running fine, why would I update it?” attitude. Many admins of these systems pride themselves on uptimes measured in hundreds or thousands of days. The problem is that no device is perfect (shocker, right?). As more and more complexity is added to these devices, there are more undiscovered problems waiting to break and need patching.
Nothing should be exempted from patching.
Don’t Blame the Devs
In their defense, the job of modern developers is rough. They have to constantly add new features and functions to their products, continue to support all of their current functionality, turn the software around quickly enough to stay competitive, and validate that it will work in a nearly infinite number of environment permutations. This is why the most stable software is often also the most expensive and comes with the most restrictive hardware options: those restrictions are what let the vendor guarantee that it WILL work, without question.
It’s not that developers are lazier now; programs are infinitely more complicated than they were even a few years ago. Windows 95 fit onto 13 floppy disks, totaling 19 megabytes. The video driver I just downloaded was 285 megabytes. The world is getting more complicated, which forces developers of even simple applications to deal with hundreds of permutations of how their product may be run. This is the big reason that software used to “run fine without all these huge bugs”: computers were rarer and people did a lot less with them. It was a simpler time.
Unfortunately, it’s now necessary to stay as up to date as possible.