March 09, 2004

Another one of dem days

Everyone has a bad day at work from time to time. I swear that I've had a bad 4 months.

Today, I got a call at 7:30am from our news department. Apparently, people were having trouble logging into their machines. This isn't the first time this has happened. The issue stems from having two WINS servers: one for the old address range we've been moving away from, and one for the new address range we're moving to. In the past, we've gone into safe mode and manually added the WINS servers to the networking properties, which fixed the issue. Well, it happened again on a machine that had already been fixed.

I've been holding off on doing anything with the WINS servers until we get our new server infrastructure in place. I'd spoken with corporate about the issue, and they recommended going ahead and switching over the existing servers so that we could eliminate the problem. So I decided that the only way to get around this problem once and for all was to go ahead and make the move.

I came into work about 45 minutes early, which I considered more than enough time to change 3 IP addresses and any DNS, DHCP, and WINS configurations. Initially, everything went off without a hitch. I re-IP'ed the domain server and pointed it to the new WINS address. After a reboot, the machine came back up and I logged in with no problems. Next, I changed the print server, which was also acting as the old WINS server. I stopped all of its WINS services, set them not to start at boot, re-IP'ed the machine, and rebooted it. Again, I was able to log in with no problems. Finally, I did the same thing with our e-mail server, changing the IP addresses and the like, making sure that DNS was pointing to the new address, etc.

Before I rebooted the e-mail server, I went into the WINS and DNS server and made sure that everything was pointing at the correct addresses. I removed all old DNS and WINS references and replaced the DNS references with the new IP addresses of all the machines. I then restarted the e-mail server and left the room, heading back to the news department. I told users to reboot and to log back in to make sure that everything was working properly.

Problems popped up immediately.

Problem #1: No one in news could print.
The sales printers were unaffected by the change, but for some reason the printers in news, along with several other printers throughout the building, could no longer print. It turned out that these printers couldn't talk to the print server on the new IP address range, so I gave the print server boxes they were connected to IPs in the new address range and pointed them to the new WINS server. Problem resolved in about 20 minutes total.

Problem #2: People started telling me that they couldn't get into e-mail.
Someone in news said to me, "Microsoft isn't working." Since this was extremely vague (it could've been anything, if you think about it), I stopped what I was doing and went to their machine. Whenever someone tried to log into Outlook, a box would pop up asking for the user name, password, and domain name. I tried logging the user in several times, with no luck. I was able to get Outlook to see the Exchange server only once, using the administrator password.

I logged into the e-mail server remotely and noticed that an error message had popped up. It seems that on the last reboot, some services had failed to start. I opened the Event Viewer and looked at the error logs, which reported that the machine hadn't logged into the domain properly, so the services didn't start.

I went back to the server room and, sure enough, none of the Exchange services were running. I rebooted the machine, a common fix for NT servers that are being screwy, and logged back in. Again, I got an error saying that the services weren't running, along with another error saying that the domain controller couldn't be contacted.

Great. So I went into the networking properties, thinking that I had typed an IP address in wrong. Nope. Everything was entered correctly, so I opted to re-join the domain. With the domain already selected, I typed in the administrator account and password and waited, only to get another error message saying that the domain couldn't be found.

At this point, I was in a panic. It was 8:20am, and everyone would be coming into work in roughly 10 minutes. I called corporate and spoke with the corporate IT manager, asking him to relay a message to the Exchange guru that I needed help. It would be another 40 minutes before she got in, so I started going through other machines to make sure that IP addresses were correct, that WINS and DNS entries matched what they were supposed to be, and anything else I could think of.

Eventually, I got a call from the Exchange guru, and she logged into the e-mail server. She and I started going through different scenarios, each leading to a dead end. We worked for close to 4 hours total, trying different machines, reconfiguring DNS and WINS, using LMHOSTS files, and everything else you can think of. Nothing worked.

Finally, we came to the conclusion that either the master browser on one of the domain controllers had become corrupt, or another machine in the building had taken over as master browser and was chatting up the network, keeping other machines from authenticating. We tried turning off the PDC and leaving the BDC up, then restarting the e-mail server. No good; it still couldn't find the domain. We then took down the BDC, left the PDC up, and restarted the e-mail server. Still no good.

When we tried to get NetBIOS to talk to the PDC, we noticed that it wasn't communicating properly. Desperate, we tried one last thing before breaking out the network tools: promoting the backup domain controller to primary. We made the change, rebooted the e-mail server, and tried to add the e-mail server back to the domain.

Success.

Now, I'm the kind of person who believes in luck, fate, and good/bad karma. This machine somehow knew that it was going to be replaced, and decided to fuck up. I'm sure of it. There's no other reason why a simple IP address change would cause this machine to fart out on us like it did. Well ... all I have to say to this machine is ... FUCK YOU!!!! I can't wait to take a fucking sledgehammer to this piece of shit and destroy the fuck out of it. Think of how those guys in Office Space destroyed that printer, and multiply the damage by 10. You won't recognize this piece of shit when I'm through with it. It's been a headache all along, and destroying it will do me good.

There ... I'm done.

Posted by ed at March 9, 2004 07:04 PM
