An Article from Aaron's Article Archive
When Good Upgrades Go Bad
Thursday, 22 January 2004 7:08 PM MST
Web Site News
Monday night, I shut down my personal web site and other services running on my server at my house in preparation to upgrade from FreeBSD 5.1 to FreeBSD 5.2. On Tuesday my server spent its CPU cycles building version 5.2.
Sometime later, I rebooted to 5.2 and things looked good, or nearly so. My mail system refused to start, so there must have been something between FreeBSD 5.1 and 5.2 that freaked Postfix out.
Once running version 5.2, I then started rebuilding various software packages, including Postfix, so that I'd have the latest stable versions and so that I wouldn't run into any more unexpected weirdness like I had with Postfix.
Wednesday morning after I got up, I restarted the server again. Trouble! My software RAID-5 array failed to come up, the vinum system crashing my kernel. Eeek!
I went into panic mode and booted to single-user mode (my root partition is NOT on a vinum partition, and so was accessible). I typed vinum start on the command line to load vinum and bring my RAID-5 volume up. Crash!
Next reboot, instead of doing vinum start, I slowed down and ran vinum interactively, then read the config in from each disk one at a time. Vinum seemed to decide that my RAID-5 plex was crashed, even though I knew it wasn't. Ick.
I went through many contortions trying to get my RAID-5 volume back up, since it has lots of my personal data I've accumulated through the years on it (and it has this web site too), including lots of my digital photographs, etc.--I really didn't want to lose it. I even stooped to removing the old vinum configuration, and reconfiguring the volume, plex, and subdisks carefully by hand, forcing them into the down state, then forcing them up when I was ready.
Through it all, something in either the 5.2 kernel or the 5.2 vinum command-line tool just kept freaking the set-up out. Sometimes I'd get errors that wouldn't let me write the configuration back to disk. *Grrrrrrr!* I was frustrated—very!
At one point, I decided to double-check my FreeBSD slice partition information (with disklabel). Whoa! My vinum partitions were there, but listed as unused instead of vinum as I expected.
All this time, I was using ed as my editor, since my preferred vi was on my inaccessible /usr filesystem. It was mostly trial and error, since I don't know ed at all.
After resetting my vinum partitions (using something like bsdlabel -R ad0s1 ad0s1.file after having dumped the existing config to file and having edited it), I tried reconstructing my plex and volume by hand again. No go! I just got the same errors as before.
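As a rough illustration of the label edit described above, here is a small Python sketch that flips the fstype column of chosen partitions in a dumped label file from unused to vinum. The function name, the sample label text, and its sizes are all made up for the example, and the column layout is only an assumption about what a dumped label looks like; it is not a copy of the real label from this server.

```python
def set_fstype(label_text, partitions, new_type="vinum"):
    """Return label_text with the fstype column of the given partitions replaced.

    Assumes partition lines look like "e: <size> <offset> <fstype> ...".
    """
    out = []
    for line in label_text.splitlines():
        fields = line.split()
        is_partition_line = (
            len(fields) >= 4
            and fields[0].endswith(":")
            and fields[0].rstrip(":") in partitions
        )
        if is_partition_line:
            fields[3] = new_type  # fourth column is the fstype
            line = "  " + " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical label dump; the sizes are invented for illustration.
SAMPLE = """\
# /dev/ad0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 234441585        0    unused        0     0
  e: 234441585        0    unused        0     0
"""

fixed = set_fstype(SAMPLE, {"e"})
print(fixed)
```

The edited text would then be saved to a file and written back with the bsdlabel -R command mentioned above. Only the partitions named explicitly are touched, so the whole-slice c entry stays as it was.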
This morning, I decided to try the only thing I could think of that I hadn't yet tried. I would attempt to boot from my backup root partition, which had my old 5.1 kernel and root-partition binaries on it, and try one last time to bring my RAID-5 array back online.
If this failed, I would either lose my data forever, or have to spend some big $$$ to build myself a new hardware-based RAID-5 array large enough to store disk images from my four 120 gigabyte drives, then hope that over the course of weeks or months I could decipher how vinum works well enough to reconstruct my data by hand.
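To put numbers on that, here is a back-of-the-envelope Python sketch of the sizes involved, assuming four identical drives of exactly 120 decimal gigabytes each (the real drive and partition sizes aren't given here, so treat the results as approximate). RAID-5 spends one drive's worth of space on parity, so the usable space is the size of the remaining drives.

```python
GB = 10**9   # decimal gigabyte, as used on drive labels
GIB = 2**30  # binary gigabyte, as many tools report sizes


def raid5_usable_bytes(drive_count, drive_bytes):
    """Usable bytes in a RAID-5 set: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


drive_count = 4
drive_bytes = 120 * GB  # assumed nominal size

raw_bytes = drive_count * drive_bytes  # space needed to image every drive
usable_bytes = raid5_usable_bytes(drive_count, drive_bytes)

print("raw images:", raw_bytes // GB, "GB")
print("usable:", usable_bytes // GB, "GB")
print("usable in binary units:", round(usable_bytes / GIB, 1), "GiB")
```

Under those assumptions, imaging all four drives takes about 480 GB, while the array itself holds roughly 360 GB, or about 335 GiB in binary units. That is in the neighborhood of the 333 gigabyte array size mentioned further down.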
Hallelujah! It appeared to work! With FreeBSD 5.1's vinum, when I configured things by hand and then brought my volume, plex and subdisks to the up state, I got no errors. Then, when I saved the configuration back to disk, vinum didn't crash like it was wont to do with my 5.2 kernel.
But the true test was to run vinum's parity checker, fsck the filesystem, then mount it. I started the parity checker. It would take quite a long time to scan the entire 333 gigabyte array. Sweet! There weren't any errors in the first 1% of the scan. So far so good...
I left the machine scanning away while I did other things, then later in the day returned to see that it had completed without finding any parity errors. What a sweet and hopeful sight that was to see! The filesystem check went more quickly and found no errors either.
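For anyone wondering what a parity scan is actually checking, here is a minimal Python sketch of the general RAID-5 idea: the parity block in each stripe is the XOR of the data blocks, so XORing every block in a healthy stripe gives all zeros. This is a toy model with invented function names and tiny blocks, not vinum's actual code or on-disk layout.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (the XOR of the data)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def stripe_is_consistent(stripe):
    """A stripe passes the check when all of its blocks XOR to zero."""
    return not any(xor_blocks(stripe))


def rebuild_block(stripe, missing_index):
    """Recover one lost block by XORing all the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# Four "disks": three data blocks plus one parity block per stripe.
stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
print(stripe_is_consistent(stripe))

damaged = list(stripe)
damaged[1] = b"eXgh"  # simulate a corrupted block on one disk
print(stripe_is_consistent(damaged))
```

The same XOR relationship is what lets a RAID-5 array survive the loss of one disk: rebuild_block recomputes any single missing block from the others. A clean scan across the whole array means every stripe satisfied that check.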
You don't know how happy I was. I was elated as I ran mount /usr and then browsed my data. It was all there. What a relief!
This afternoon, I set my machine busy downloading sources for FreeBSD 5.1 so I could reinstall 5.1 cleanly from sources, then rebuild the various pieces of server software (like Apache for running this web site) and other utilities I like to use.
Tonight, things are at last mostly back to normal, web, database, and e-mail reinstalled. I've still got some final clean-up, and I need to reinstall Samba so I can share my files with my Windows boxes.
After this incident, I don't know if I encountered a bug in vinum, or FreeBSD 5.2, or just a problem more specific to my compiler settings in /etc/make.conf. Likewise, I didn't collect any crash data, so I can't really submit a bug report. But since I want to keep my data intact, I don't think I'll be messing around with 5.2 for a while. I'll stick with 5.1.
Some of you might wonder why in the world I'm running 5.x in the first place, since it's the 4.x line (4.9 RELEASE currently) that FreeBSD recommends for production use. Well, I only run 5.1 because I was crazy enough last year to try out 5.x. And once 5.1 seemed stable, it just didn't seem worth the trouble to try to backtrack to 4.x.
But I have learned my lesson with my personal data. I'll definitely be far more conservative in all future upgrades of my personal system. I mean, I wouldn't try things like this on my employer's production boxes, so why on earth would I experiment with my own system when I don't want to lose my personal data stored there?
Now others of you might wonder why I don't have something in place for doing backups. Hey, I'd love to be able to back things up, but after having spent so much on this RAID-5 array, and having collected more than can fit on my external USB/Firewire 120 GB drive, I don't really have a good way to back this stuff up. Someday, I hope to, though...
And that is how my good upgrade went bad.
Copyright © 1993-2012 - Aaron D. Gifford - All Rights Reserved