I recently came across a neglected IT system. The software itself was up to date and patched, but the processes and management of this system had fallen away. The explanation I received was a story that I hadn’t heard before, but one that is used frequently to describe such a phenomenon in a business. The story goes like this…
A newly married couple was preparing a roast beef dinner. The wife cut the ends off the roast and placed it in the pan. The husband asked her why she was doing that. She explained that it was the way her mother had always done it. The next day, she rang up her mother, who explained that her own mother had done it that way when she was a little girl. When they asked the grandmother about it, she explained that the roasting pan she used to use was too small to fit a roast in, so she cut the ends off.
Clearly, the point of the story is to show how outdated processes can be a hindrance, or at least wasteful, if they go unchecked. When it comes to IT processes though, what can the harm be? Well, it turns out the harm can be considerable.
The rest of this is a little technical, so hang on!
Active Directory Replication
The particular system that brought this story up was Active Directory. While working with a company, I heard a complaint that when computers lost their domain membership, Site Support couldn’t delete the computer object from the domain and add it back with the same name. There would always be a conflict that prevented it, despite the computer object having been deleted from AD. As a result, their domain was riddled with computer01, computer01a, computer01b, etc.
Having been through several Microsoft Active Directory Healthchecks over the years, I thought this sounded like the convergence time (how long a change takes to reach every domain controller) was too high across the domain. My assumption was that the computer object deletion wasn’t replicating to every domain controller before a new computer was joined to the domain. I found an excellent PowerShell script to test my theory.
After running the script, I noticed that it was taking 15 minutes to replicate an object across the domain. In my own tests with manually created objects, it appeared to be upwards of 45 minutes! Furthermore, most of that time was spent just reaching one particular site. I started my investigation.
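To make the measurement concrete, here is a minimal Python sketch of the arithmetic behind a convergence test. It is not the PowerShell script mentioned above. It assumes you have already recorded, for each domain controller, how many seconds after creation the test object first appeared there; the domain controller names and timings are invented for illustration.

```python
def convergence_report(first_seen_seconds):
    """Summarize a convergence test.

    first_seen_seconds maps a domain controller name to the number of
    seconds after creation at which the test object first appeared on it.
    The domain has converged once the slowest controller has the object.
    """
    if not first_seen_seconds:
        raise ValueError("no observations recorded")
    slowest_dc = max(first_seen_seconds, key=first_seen_seconds.get)
    return {
        "convergence_seconds": first_seen_seconds[slowest_dc],
        "slowest_dc": slowest_dc,
    }


# Hypothetical observations: one remote site lags far behind the rest.
observations = {
    "DC-HQ-01": 0,
    "DC-HQ-02": 15,
    "DC-BRANCH-01": 180,
    "DC-REMOTE-01": 900,
}

report = convergence_report(observations)
print(f"Converged in {report['convergence_seconds']} s; "
      f"slowest: {report['slowest_dc']}")
```

Reporting the slowest controller alongside the total is the useful part: it points straight at the site where the investigation should start.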
I found that the Site Links had been neglected and, in their place, manual connections had been created between the domain controllers. So, I dug deeper and saw that IP subnets were incorrectly configured and that domain controllers in 3 different physical locations were all in 1 particular site. When I asked about this, it seemed this had been part of some legacy process.
The first thing I did was fix the Site Links to make sure replication was going where I wanted it to go. I also enabled change notification on all the Site Links, so that domain controllers notify their partners in other sites of changes instead of waiting for the next scheduled replication interval. If your network can handle this, I highly recommend it.
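For anyone wondering what enabling change notification actually changes: to the best of my knowledge it is a single bit (value 1) in the `options` attribute of the site link object. The Python sketch below only shows the bit arithmetic on that value. It does not talk to Active Directory, and the function names are my own, so treat it as an illustration rather than a tool.

```python
# Bit values of the site link "options" attribute, as I understand them.
USE_NOTIFY = 0x1            # change notification between sites
TWOWAY_SYNC = 0x2           # two-way synchronization
DISABLE_COMPRESSION = 0x4   # turn off inter-site compression


def enable_change_notification(options):
    """Return the options value with the change-notification bit set.

    An unset attribute (None) is treated as 0; other bits are preserved.
    """
    return (options or 0) | USE_NOTIFY


def has_change_notification(options):
    """Report whether the change-notification bit is set."""
    return bool((options or 0) & USE_NOTIFY)


print(enable_change_notification(None))   # attribute was not set
print(enable_change_notification(4))      # compression already disabled
print(has_change_notification(5))
```

Using a bitwise OR rather than overwriting the value matters when another option, such as disabled compression, is already set on the link.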
After that, I split the domain controllers from the 3 physical locations out into 3 separate sites, added the IP subnets accordingly and configured the new Site Links between them.
Now that things were looking better, I waited 15 minutes for everything to replicate around. I then logged into each domain controller and deleted the manual connections that had been created. I tried to do this (as much as possible) in pairs so that when I kicked off the KCC (Knowledge Consistency Checker), it would find the new pair and create the connection, which it did.
I gave the domain around an hour to flesh out the new connections. One side note: there were 2 legacy 2003 domain controllers, and I noticed that the 2 sites containing them were having problems automatically creating all of their connections. These domain controllers were set to be retired, so I isolated them using their own IPs as the subnet, and then the connections were created properly.
I let everything settle down for around an hour and then ran the convergence test again. It was down to 43 seconds! No more sequentially named computers needed to be created!
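A quick way to spot this kind of misconfiguration is to compare the site each domain controller is assigned to with the site its IP address maps to. The Python sketch below does that with the standard `ipaddress` module. The subnets, site names and domain controller inventory are all hypothetical, and it assumes the most specific matching subnet wins.

```python
import ipaddress

# Hypothetical subnet-to-site assignments, one site per physical location.
SITE_SUBNETS = {
    "10.1.0.0/16": "Site-A",
    "10.2.0.0/16": "Site-B",
    "10.3.0.0/16": "Site-C",
}


def site_for_ip(ip, site_subnets=SITE_SUBNETS):
    """Return the site whose subnet contains ip (longest prefix wins)."""
    address = ipaddress.ip_address(ip)
    best = None  # (network, site) of the most specific match so far
    for cidr, site in site_subnets.items():
        network = ipaddress.ip_network(cidr)
        if address in network and (
            best is None or network.prefixlen > best[0].prefixlen
        ):
            best = (network, site)
    return best[1] if best else None


def misplaced_dcs(dcs, site_subnets=SITE_SUBNETS):
    """List (name, assigned_site, expected_site) for mismatched DCs."""
    problems = []
    for name, (ip, assigned_site) in dcs.items():
        expected = site_for_ip(ip, site_subnets)
        if expected != assigned_site:
            problems.append((name, assigned_site, expected))
    return problems


# Hypothetical inventory: three locations, all lumped into Site-A.
dcs = {
    "DC-A-01": ("10.1.0.10", "Site-A"),
    "DC-B-01": ("10.2.0.10", "Site-A"),
    "DC-C-01": ("10.3.0.10", "Site-A"),
}

problems = misplaced_dcs(dcs)
for name, assigned, expected in problems:
    print(f"{name}: assigned to {assigned}, subnet says {expected}")
```

A domain controller whose address matches no subnet at all is reported with an expected site of `None`, which is worth investigating in its own right.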
Group Policies
Now that replication was fixed, it was time to check out Group Policies. What I found there was mind-boggling. There were at least 8 DNS-based group policies on site-based computer OUs that did the exact same thing. Several of these referenced legacy VBS scripts that no longer existed on the NETLOGON share.
There were also other Group Policies that referenced VBS scripts that no longer existed, as well as policies with no settings configured at all. I found computer policies with User Configuration enabled, but no user settings. There were also domain-level policies with security settings configured to apply to only 1 user or a few computers (like Domain Controllers).
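The checks described above are easy to automate once the policies have been exported into some structured form. The Python sketch below assumes a made-up summary format (the dictionary keys, policy names and script path are all hypothetical) and flags missing scripts, empty policies, User Configuration enabled with nothing in it, and policies with identical settings.

```python
import os
from collections import defaultdict


def audit_gpos(gpos, script_exists=os.path.exists):
    """Flag common problems in a list of GPO summaries (hypothetical format).

    Each GPO is a dict with the keys: name, scripts (list of paths),
    settings (dict of computer settings), user_settings (dict) and
    user_config_enabled (bool).
    """
    findings = {"missing_scripts": [], "empty": [],
                "unused_user_config": [], "duplicates": []}
    by_settings = defaultdict(list)
    for gpo in gpos:
        for path in gpo["scripts"]:
            if not script_exists(path):
                findings["missing_scripts"].append((gpo["name"], path))
        if not (gpo["settings"] or gpo["user_settings"] or gpo["scripts"]):
            findings["empty"].append(gpo["name"])
        elif gpo["user_config_enabled"] and not gpo["user_settings"]:
            findings["unused_user_config"].append(gpo["name"])
        if gpo["settings"]:
            # Policies with identical computer settings share the same key.
            by_settings[tuple(sorted(gpo["settings"].items()))].append(gpo["name"])
    findings["duplicates"] = [n for n in by_settings.values() if len(n) > 1]
    return findings


# Hypothetical export: two identical DNS policies and one empty policy.
dns = {"dns_servers": "10.1.0.10,10.2.0.10"}
gpos = [
    {"name": "DNS - Site A", "scripts": [r"\\example.local\NETLOGON\setdns.vbs"],
     "settings": dict(dns), "user_settings": {}, "user_config_enabled": True},
    {"name": "DNS - Site B", "scripts": [],
     "settings": dict(dns), "user_settings": {}, "user_config_enabled": False},
    {"name": "Placeholder", "scripts": [],
     "settings": {}, "user_settings": {}, "user_config_enabled": True},
]

# Pretend the share is empty so the example does not touch the file system.
existing_scripts = set()
findings = audit_gpos(gpos, script_exists=lambda path: path in existing_scripts)
for kind, items in findings.items():
    print(kind, items)
```

Passing the existence check in as a parameter keeps the sketch runnable anywhere; against a real share you would leave it at the default `os.path.exists`.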
Where to start? Well, I consolidated the DNS policies down into one and removed the reference to the offending VBS script. I also combined several other computer-based (not site-based) GPOs into one and disabled User Configuration processing on it. This would speed up GPO processing in general.
I also merged the top-level domain policy that was only applying to some Domain Controllers (because new ones hadn’t been added) into the Default Domain Controllers Policy. I then merged another policy that was linked to the Domain Controllers OU into the same policy.
I moved the top-level policies that only applied to one user or computer (for testing) further down, so that nobody could accidentally turn them on for everyone. This should also speed things up, because most users and computers would no longer even see these policies.
All told, this whole exercise took about 4 hours, but the benefits were massive. Local errors were reduced, login times decreased and simplicity was restored! The changes were communicated to the support teams so that the legacy processes could be removed or updated.
No more Roast Beef!
Active Directory is a great example of something that can be so easy to manage that its support falls by the wayside. It can be passed around to people who know just enough to keep it running, and who then perpetuate legacy issues over and over. This served as a good lesson that can be applied to any IT system. It is easy to assume legacy processes are still relevant enough to support the environment, but what does it take to give the users a good experience? In this case, it was buckling down for a day and sorting things out. That wasn’t so bad!