Raid Management: Know Whats Really Going On
In today’s hosting environment it is common place for servers to have hardware based raid cards but what is not common place is having a reliable method for checking the status of the raid arrays. Few would question the value to data integrity by making use of raid technology but very few organizations and businesses implement the tools required to proactively maintain raid arrays, they simply hope for a DC tech to hear a raid alarm and assume the technician will handle the failure. The reality is very different, data centers are loud and increasingly server-dense so hearing a raid alarm let alone pin-pointing the server with the alarm going off, is a daunting task. I remember more than a few times where I found myself with a paper towel tube to my ear listening server to server to try find that troubled box with the annoying alarm going off. This is not how servers should be managed.
As server administrators or web host operators, it is your responsibility, your duty, to have tools in place that can proactively monitor the status of raid arrays and alert you when an array becomes degraded. That way you can have a paper trail of sorts when something has went wrong, submit a ticket to your data center technicians and have the situation corrected before a degraded array from a single disk failure turns into a multi-disk failure and failed array with data loss.
I have created and been using for sometime a script that can query the status of raid controllers from Areca, 3Ware & MegaRaid. The MegaRaid support is mostly intended for Dell PowerEdge PERC cards, however it should work for most MegaRaid based controllers, ymmv though.
The principle is very simple, the package contains the proprietary command line tools from Areca, 3Ware & MegaRaid that can query the status of respective controllers and then an accompanied ‘check’ script handles determining what raid controller is on the system and then runs the appropriate tool in order to get the raid status and if it is degraded or in any state other than a consistent one, it will dispatch an alert to a configured e-mail address.
Download and extract the package:
# wget http://www.rfxn.com/downloads/raid_check_pub.tar.gz # tar xvfz raid_check_pub.tar.gz
The package will extract to raid_check/, you should place this under /root/ as the check script expects to be run from /root/raid_check/. If you wish to change the path then please modify ‘raid_check/check’.
With the package now setup under /root/raid_check/ you need to modify the ‘raid_check/check’ script to set an email address that alerts are going to get sent too. Once this is done you should symlink the check script to cron.daily so that raid failures will be picked up on daily cron runs, you may change this to cron.hourly if so desired.
# ln -s /root/raid_check/check /etc/cron.daily/raid_check
That’s it, you can give the check script a run to see if things are working. If there is a failure or inconsistency detected then it will be shown on console in addition to the email alert being sent. If everything is OK and there is no issues detected, then no output will be presented.
# sh /root/raid_check/check
Tip: You can check if your server has a raid card by running the following command:
# cat /proc/scsi/scsi | grep Vendor
If you see Vendors listed as ATA followed by hard drive model names (i.e: WDC, HD etc…) then your servers disks are directly connected and there is no raid controller present. If on the other hand you see vendor names such as Areca, AMC, 3ware, or MegaRaid then you have a hardware raid controller.
|Print article||This entry was posted by Ryan M. on February 14, 2011 at 1:59 am, and is filed under HowTo, My Blog. Follow any responses to this post through RSS 2.0. You can leave a response or trackback from your own site.|
No trackbacks yet.
about 2 years ago - 2 comments
The much awaited for 1.4 release of Linux Malware Detect is here! In this release there is quite literally something for everyone, from massive performance gains to FreeBSD support and everything in between . For those who wish to dive straight into it, you can run the -d or –update-ver option to update your install More >
about 2 years ago - 5 comments
New technologies, new toys — Oh how I love getting my hands dirty with them. Today I am going to have a look at ATA Over Ethernet (AoE) as an alternative solution to NFS in the role of a NAS/SAN implementation. We will look at both the server side vblade setup and the client side More >
about 2 years ago - 1 comment
Today marks the release of LMD 1.3.7, which is a minor release update that fixes a few bugs and is also the final 1.x release before version 2.0 as described in the LMD: one year later blog post. The bug list for LMD has remained very small over the last 6 months and this release More >
about 2 years ago - 7 comments
With my move back to Canada behind me and adjusting to some new routines with life, its about time to get back into the mix with the projects. Though things have been slow the last couple of months, it has not stopped me from making sure regular and prompt malware updates are released. Today, we More >
about 3 years ago - No comments
In LMD 1.3.3 there was allot of changes, 29 to be exact, that made LMD much more robust and especially the monitoring component, much more usable. If that release was about making good things better, then this release is about bringing loose ends together. I spent a couple of days running LMD through its paces More >
about 3 years ago - No comments
In my last post, I reflected on the last 7-8 years of projects here at rfxn.com, in doing so I also dug up some statistics on project downloads. I not only did this for my own curiosity but to prioritize the mile long to do list I have for the projects, based on downloads. One More >
about 3 years ago - 3 comments
Today I have released Linux Malware Detect (LMD) 1.3, the first public stable release of my malware detection tool. The documentation is a little thin but the details are on the project page and the README file should fill you in on anything you need to know, otherwise you can post a comment on the More >
about 3 years ago - 4 comments
Today I have put up a new release of BFD, version 1.4, that addresses an unsanitized variable issue that is used on the command line. This is a serious issue and should be treated as such, if you currently have BFD installed I would encourage you to update it immediately, the install.sh script in the More >
about 3 years ago - 7 comments
Recently I started to tackle a load problem on one of my personal sites, the issue was that of a poorly written but exceedingly MySQL heavy application and the load it would induce on the SQL server when 400-500 people were hammering the site at once. Further compounding this was Apache’s horrible ability to gracefully More >
about 3 years ago - No comments
Anyone who has ever used SSH key-pairs to access more than a couple of servers (or hundreds in my case), will tell you they are an invaluable convenience. It is a natural progression and very common usage that SSH key-pairs are coupled with other common tasks or tools, where having a pass phrase attached to More >