ATA Over Ethernet: As an Alternative

New technologies, new toys — Oh how I love getting my hands dirty with them. Today I am going to have a look at ATA Over Ethernet (AoE) as an alternative solution to NFS in the role of a NAS/SAN implementation. We will look at both the server side vblade setup and the client side AoE kernel module along with a practical deployment setup which includes a convenience script I developed to make vbladed slightly less of a nuisance to maintain.

First things first though, what exactly is ATA Over Ethernet? Straight off the wikipedia page, here are the important parts that describe AoE best:

"ATA over Ethernet (AoE) is a network protocol developed by the Brantley Coile Company, designed for simple, high-performance access of SATA storage devices over Ethernet networks. It is used to build storage area networks (SANs) with low-cost, standard technologies.
AoE runs on layer 2 Ethernet, it does not use internet protocol (IP), so it cannot be accessed over the Internet or other IP networks. In this regard it is more comparable to Fibre Channel over Ethernet.
SATA (and older PATA) hard drives use the Advanced Technology Attachment (ATA) protocol to issue commands, such as read, write, and status. AoE encapsulates those commands inside Ethernet frames and lets them travel over an Ethernet network instead of a SATA or 40-pin ribbon cable. By using an AoE driver, the host operating system is able to access a remote disk as if it were directly attached."

OK, of note here is that AoE is an ATA implementation over Ethernet, being layer 2 it is a dumb protocol with no knowledge of the TCP/IP stack, as such it can only communicate in the simplest of ways inside a switched network (its packets cant be routed between multiple networks). As such, AoE is ideal when used on a private network or better yet a network dedicated to SAN (Storage Area Network), it can however be used on a public facing network as so long as the hosts in the AoE network are all within the same switched segment of the network (More info here on routable AoE).

That all said, what makes AoE a viable alternative to NFS? Well in the role of storage access in its simplest capacity, NFS is just bloated and adds a significant amount of overhead and complexity to something that deserves to be simple. Further, NFS is woefully inadequate at maintaining the level of reliability required when you are, for example, exporting an entire file system to another device for the purpose of high-availability usage such as a /home extension or MySQL file system. Personally, I am slightly biased as I hate NFS; I use it, but only for a lack of often anything better to meet the role in exporting file systems and directory trees across networks. Although it does what its supposed to just fine, more often than not you can get woken up at 4AM with the most mysterious and sudden of NFS issues that are notorious for being mind numbing to resolve. It is for this simple reason — NFS’s lack of reliability, that sent me searching for a simple, scalable and reliable alternative. AoE has managed to meet two of these three points — simple and reliable, while coming up short on the scalable side, more on that in a bit.

There are two components of an AoE setup, the server side storage device that will run vblade and the client side that will access the exported storage using the AoE kernel module under Linux. I should note that although the vblade server package is for Linux, the client side drivers are available for Windows, OS X, FreeBSD and more; in Linux the AoE kernel module is part of the mainline kernel.

The server you choose to run vblade can be any device that you want to export files or devices on, there is little in the way of requirements as vblade is a pretty slim package and doesn’t consume much in the way of resources other than CPU. For a modest environment where you plan to export to no more than 10-15 clients, a dual core system with 2GB RAM is more than sufficient for the vblade server. For my deployment, I run vblade on a quad core Xeon 3.0Ghz, 6GB RAM and 9TB Raid5 array that exports to 54 client servers. More on my setup later when we review scalability but for now lets jump right into the vblade setup and usage.

Lets go ahead and grab the vblade package, compile and install it:

# wget
# tar xvfz vblade-20.tgz
# cd vblade-20
# make && make install
install vblade /usr/sbin/
install vbladed /usr/sbin/
install vblade.8 /usr/share/man/man8/

There is no compile time configure script or any other real configuration required, vblade installs straight into /usr/sbin and is an overall painless process. The simplicity of the vblade package comes at a cost, in that there is no support for a configuration file to control multiple vblade instances, making things slightly tedious. This should not detract from the use of vblade, it is a mature and reliable package but one with a very simple approach that does little in the way other than what it is supposed to do.

To make life easier for myself, I created a wrapper of the sorts to add support for a configuration file along with limited error checking and some command line conveniences — we’ll grab the wrapper and default config template as follows:

# wget
# wget
# mv vbladed.conf /etc/
# mv vbladectl /usr/sbin
# chmod 640 /etc/vbladed.conf
# chmod 750 /usr/sbin/vbladectl
# ln -s /usr/sbin/vbladectl /etc/init.d/vbladed
# chkconfig --level 2345 on

You will note, that we enabled vblade to start on boot through init, although the wrapper is not technically an init script, it does support being called from init and managed through chkconfig for convenience. Lets look at the configuration file /etc/vbladed.conf then we’ll review the vbladectl usage after that:

# vbladed export configuration file

# unique shelf identifier for this vblade server
SHELF="0"     # must be numeric 0-254, default 0

# 0 /data/server.img FF:FF:FF:FF:FF:FF eth1 server

The configuration file is pretty straight forward, the SHELF variable only matters if you intend to run multiple vblade servers on the same network, if that is the case then this value must be unique to each vblade server or you will run into client side conflicts of being unable to distinguish between vblade servers. The export definitions follow in the format of “AOESLOT FILE MAC IFACE ALIAS” which the below breaks down further:
AOESLOT is a per-client identifier for EACH exported file or device to the SAME client; in other words if you configure multiple exports to the same client server then this value needs to be unique for each.
FILE is the full path to a device or file you want to export, this can be an unformatted raw device such as /dev/sdb, a preformatted partition such as /dev/sdb5 or a loopback image such as /data/server.img.
MAC is the MAC address of the client-side interface that is attached to the network you intend AoE traffic to move over; more appropriately, it is the interface connected to your private network on the client server
IFACE is the server-side interface that can reach the client-side interface you defined the MAC address for; more appropriately, it is the interface connected to your private network on the vblade server
ALIAS is a reference alias for each configuration entry, this must be unique to each vbladed.conf definition

For the purpose of this article, we will go ahead and create a loopback image, format it and export it for a client server called apollo, then we will review how to import the file system onto the apollo server using the AoE kernel module. First, lets create our image:

# dd if=/dev/zero of=/home/apollo.img bs=1 count=0 seek=10G
# yes | mkfs.ext3 /home/apollo.img

This will create a sparse, zero filled file, meaning it will be 0bytes on disk and allocate space, up to 10G, as data is stored to it. There is a slight performance hit to this as the image file must grow itself as data is written, this however is made up for in improved efficiency of space usage. To create an image that preallocates space on the disk you would run ‘# dd if=/dev/zero of=/home/apollo.img bs=1M count=10000‘, be patient as this will take some time to complete, then format it as described above.

Now that we have the image/device we want to export, we need to add the definition for it into the vbladed.conf file, to do so we need to note the MAC address of the interface on apollo that will communicate with the vblade server, in our case this is a private interface eth1 but in your setup it can be a public facing interface if needed — just make sure its within the same subnet as the vblade server.

[root@apollo ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:E6:D3:ED:E5
          inet addr:  Bcast:  Mask:
    ... truncated ...

We now have the client side MAC address (00:16:E6:D3:ED:E5) and we have the device/file we want to export (/home/apollo.img), and we also know the private network interface on our vblade server is eth1 as well, so we can create the vbladed.conf definition:

0 /home/apollo.img 00:16:E6:D3:ED:E51 eth1 apollo

That should be appended into the bottom of /etc/vbladed.conf, then we are ready to start the vblade instance for the configuration we’ve added. The vbladectl wrapper includes start, stop and restart flags which also accept an optional alias for performing actions against only a specific vblade instance, run vbladectl with no options for usage help. Time to start the vblade instance for apollo as follows:

# /usr/sbin/vbladectl start apollo
started vbladed for apollo (pid:16320 file:/home/apollo.img iface:eth1 mac:00:16:E6:D3:ED:E51)
( you could also just pass the start option without an alias to start instances for all entries in vbladectl.conf )

The default behavior for vblade also sends log data to the kernel log, typically /var/log/messages on most systems, so tailing the log will produce the following logs if all is normal:

# tail /var/log/messages
Apr  3 16:49:25 backup5 vbladed: started vbladed for apollo (pid:16320 file:/home/apollo.img iface:eth1 mac:00:16:E6:D3:ED:E51)
Apr  3 16:49:24 backup5 vbladed: pid 16320: e0.0, 419430400 sectors O_RDWR

The important part there is the ‘vbladed: pid 16320: e0.0, 419430400 sectors O_RDWR’ entry in the log as this comes from vblade itself, the other log entry comes from the wrapper. This log entry tells us that vbladed forked off successfully and that it has exported our data for the defined server as e0.0 (etherdrive shelf 0 slot 0), you’ll see the significance of this shortly.

We are now ready to move over to our client server, apollo, and import our new AoE file system. This is an easy task and if you are running a current Fedora / RHEL (CentOS) based distribution, you’ll find the AoE kernel module already included. The module is also part of the mainline kernel so if you are using a custom kernel, please be sure to enable the corresponding config option (CONFIG_ATA_OVER_ETH).

There is really no right way to load a kernel module, you can either use modprobe which I recommend or you can use insmod on the modules full path, which is a matter of preference. Let’s first verify the module exists, which modprobe does for us but for the sake of this article and familiarity, we will check (remember you’re running this on the client server, i.e apollo):

# find /lib/modules/$(uname -r)/ -name "aoe.ko"

There we have it, the module returned fine, listing the full path to it. If you did not get anything back this may be that you are running a custom kernel by your own choosing, and need to configure the CONFIG_ATA_OVER_ETH option. It may also be that your data center provider or a software vendor installed a custom kernel without this feature and you should contact them requesting it. As an alternative, you could download the etherdrive sources for the AoE kernel module on the coraid website and compile it against your kernel, this requires your kernel build sources or on RHEL based systems the kernel-headers package.

That said, we will now load the module using modprobe, the preferred method:

# /sbin/modprobe aoe
( or you can run /sbin/insmod MODULE-PATH )

If everything went OK, then modprobe will generate no output and you can verify the module is loaded as follows:

# lsmod | grep aoe
aoe                    60385  1

When the AoE module is loaded it will start listening for broadcast traffic from AoE on all available interfaces, a very passive process. If you have done everything correct then the module will quickly detect the exported device/file from the vblade server and inform you in the kernel log along with creating the appropriate /dev/etherd/ device file. Let’s verify this by checking the log and then checking the /dev/etherd path:

# tail /var/log/messages
Apr  4 17:13:02 apollo kernel: aoe: aoe_init: AoE v22i initialised.
Apr  4 17:13:02 apollo kernel: aoe: 003048761643 e0.0 v4014 has 419430400 sectors
Apr  4 17:13:02 apollo kernel:  etherd/e0.0: unknown partition table
# ls /dev/etherd/

If for some reason you do not see the log entries described above along with no e0.0 device file under /dev/ethered, this may be a misconfiguration on the vblade server, perhaps you got the interface or mac address in vbladed.conf wrong? Double check all values. If you opted to try run things over a public facing interface, the issue may be that your network VLAN’s each server (which is fairly common), in that case you may need to request that all your hardware be part of the same VLAN or the provisioning of a private switch and private links for your hardware.

Assuming that things went good, that you see the appropriate log entries and the e0.0 device file under /dev/etherd/, we are ready to mount the file system, we will mount it as /mnt/aoe for the purpose of this article:

# mkdir /mnt/aoe
# mount /dev/etherd/e0.0 /mnt/aoe
# df -h /mnt/aoe
Filesystem            Size  Used Avail Use% Mounted on
/dev/etherd/e0.0      5G   36M  4.9G  0% /mnt/aoe

You may run into an issue of unrecognized file system on the device, though the file system we created on it, on the vblade server, should show through. If it does not, simply run an ‘mkfs.ext3 /dev/etherd/e0.0’ on it and you will be all set. There is no hard set rule on creating the file system on the vblade server, you could just export raw images and devices then partition/format file systems on them on a per-client basis as you require it.

The only thing that is left is to set our new file system on apollo to load at boot time, the simplest way to do this is to append a couple of lines to /etc/rc.local as follows:

/sbin/modprobe aoe
sleep 5 ; mount /dev/etherd/e0.0 /mnt/aoe -onoatime

The rc.local script is run at boot time after all other services have started, so if you are loading a file system used for mysql, user home data or similar you will probably want to also add a line after the mount to restart said services. You’ll also notice two things about the entries we added to rc.local; The first is the sleep delay before the mount, this allows the aoe kernel module to complete its discovery process for AoE file systems before we try to mount it. Then, we are using the noatime option on the mount command, which disables the updating of the last access time on files during read/write operations. This is important because traditionally whenever a file is read from disk, it causes a write operation back to disk to update the atime attribute on the file, so disabling atime usage can greatly reduce i/o calls (effectively in half for reads), which is especially significant for networked file systems.

I have had an overall good experience with AoE so far, it is incredibly simple and very reliable as an implementation. The only issue I have seen is the scalability of it and I attribute this more to the vblade server package than AoE as a protocol. There appears to be a degradation in I/O throughput performance for exported file systems that is in-line with the number of (instances) file systems you export on the same physical server. The best usage example of this is that in my environment I run vblade on one server with exports to 54 servers, the throughput when there is 1-10 instances running averages about 51MB/s (408Mbit), as that increases though to 54 instances, the throughput per client server drops drastically to an average of 14MB/s (112Mbit). This is a very sharp decrease in performance, one that makes the viability of vblade in much larger of a setup questionable.

I do need to caution that this issue may be environment specific as speaking to other vblade users has produced mixed feedback, some do not experience this kind of performance loss while others do. I will also note that I run vblade on a second storage device, on the same private network as the 54 instance vblade server, and this second storage device has only 4 instances running with an average throughput of 71MB/s (568Mbit). So the conclusion you draw from this is up to you, at the end of the day I am more than happy with the implementation as a whole and can accept the loss of performance for the larger implementation in the name of reliability and simplicity.

Data Integrity: AIDE for Host Based Intrusion Detection

It used to be all the talk, everyone knew it, accepted it but few did anything about it and still even today, very few do anything about it. What is it? Data Integrity. But it is not in the form of how we usually look at data integrity; it is not backups, raid management or similar — it is host based intrusion detection.

What is host based intrusion detection (hIDS)? In it simplest form it is basically the monitoring of a file system for added, deleted or modified content, for the purpose of intrusion detection and (post) compromise forensic analysis. At one time hIDS was a very popular topic with allot of emphasis pushed on it from the security community and although it still is an area of religious focus for some, it is generally a very under utilized part of a well rounded security and management policy. Note how I said management policy there also, as hIDS is not just about intrusion detection but can also play a vital role in day-to-day operations of any organization by providing “change monitoring” capabilities. This can play out in many scenarios but the simplest being that it allows you to track changes to file systems made through regular administration tasks such as software installations, updates or more importantly administrative mistakes. Though the topic of change monitoring can be a whole article in of itself, hIDS to me is vitally important in both respects as an intrusion detection AND change monitoring resource.

I can not beat around the fact that even myself, over the years, have let hIDS fall to the wayside, I used to be the biggest fan of tripwire and would use it on everything. However, over time tripwire became a time consuming, bloated and difficult tool to manage, it is also tediously slow and would cause very undesirable loads on larger systems. This made for hIDS falling out of my regular security and management habits which in turn had a way about sneaking up on me and biting me in the butt whenever a system got compromised or an administrator would make a “oopsy” on a server.

A few years back I experimented with a tool called AIDE (advanced intrusion detection environment), at the time it was the new kid on the block but showed incredible potential with a very simplified configuration approach, fast database build times and reasonably modest resources usage — by tripwire standards it was exactly what I was looking for, simple and fast. AIDE has since grown up a bit, many of the small issues I used to have with it are now fixed and it is now available in the package management for most major distributions including FreeBSD, Ubuntu, Fedora & RHEL (CentOS).

The configuration and deployment scenario we are going to look at today is one that is suitable for web and application servers but really can be broadly applied to just about any system. We are going to slightly sacrifice some monitoring attributes from files on the system in the name of increasing performance and usability while maintaining a complete picture of added, deleted and modified files. So, let’s jump right on in….

The first task is we need to install AIDE, for the purpose of this article I am assuming you are using Fedora or an RHEL based OS (i.e: CentOS), so please refer to your distributions package management or download and compile the sources at if a binary version is not available for you.


# yum install -y aide


The binary default installation paths for AIDE place the configuration at /etc/aide.conf , executable at /usr/sbin/aide and databases at /var/lib/aide/. The obviously important part being the configuration file so lets get a handle on that for the moment. The configuration defaults are a little loud, intensive and in my opinion will overwhelm anyone who has never used hIDS before; even for myself the defaults were just too much. That said, we are going to backup the default configuration for reference purposes and download my own custom aide.conf:


# cp /etc/aide.conf /etc/aide.conf.default
# wget -O /etc/aide.conf
# chmod 600 /etc/aide.conf


This configuration was created for a WHM/Cpanel server, it is however generalized in nature and can apply to almost any server but will require modification to keep noise to a minimum. Now I stress that fact, noise — hIDS reports can get very loud if you do not tune them and that can lead to them being ignored as a nuisance but more on that later. Lets take a look at the configuration file we just downloaded and I will attempt to break it down for you by each section:

# nano -w /etc/aide.conf
( or your preferred editor *ahem vi* )

The first 10 or so lines of the file declare the output and database paths for AIDE, they should not be edited, the first parts we want to look at follow:

# Whether to gzip the output to database

# Verbose level of message output - Default 5

These options speak for themselves; do we want to gzip the output databases? No, we do not as our management script that we will run from cron and look at later is going to take care of that for us. Next is is the verbosity level (0-255 — less to more) which defaults at 5. The verbosity is fine left at the default, you can lower it to 2 if you want strictly add/delete/modified info in the reports with NO EXTENDED information on what attributes were modified on files (i.e: user, group, permissions, size, md5) — suitable maybe for a very simplified change management policy. If set to 20 then reports will be exceedingly detailed in item-by-item change information and reports can become massive — so I recommend leaving it at the default of 5 for the best balance of detail and noise reduction.

Next the configuration file lists, in comments, the supported attributes that can be monitored on files and paths and then our default monitoring rules of what attributes we will actually use; this list shows the depth of AIDE and should be reviewed in brief for at least a fundamental understanding of what you are working with:

# These are the default rules.
#p:     permissions
#i:     inode:
#n:     number of links
#u:     user
#g:     group
#s:     size
#b:     block count
#m:     mtime
#a:     atime
#c:     ctime
#S:     check for growing size
#md5:    md5 checksum
#sha1:   sha1 checksum
#rmd160: rmd160 checksum
#tiger:  tiger checksum
#haval:  haval checksum
#gost:   gost checksum
#crc32:  crc32 checksum
#E:     Empty group
#>:     Growing logfile p+u+g+i+n+S

# You can create custom rules like this.

LOG = p+u+g
DIR = p+u+g+md5

The important parts here that we will be using, and can be seen from the custom rules, are p,u,g,s,md5 for permissions, user, group, size and md5 hashes. How does this work in our interest? The basics of permission, user, and group are fundamentals we would always want to be notified of changes on, as really, those are attributes that shouldn’t ever change without an administrator doing so intentionally (i.e: /etc/shadow gets set 666). Then there is size and md5 which will tell us that a file has been modified, though we are not specifically tracking mtime (modified time), it is not strictly needed as md5 will tell us when even a single bit has changed in a file and mtime is an easily forged attribute (although feel free to add m to the R= list for mtime tracking if you desire it).

Then we have the paths to be monitored which you’ll note we are not monitoring on the top level ‘/’ itself but instead a specific list. Although you can monitor from the top level, it is not recommended on very large servers, if you do choose to monitor from the top level then be sure to add ‘!/home’ and other heavily modified user paths into your ignore list (covered next), especially if you have a shared hosted environment. Keep in mind, this is not about monitoring every single user level change but rather the integrity at the system (root) or critical application/content level.

/etc    NORMAL
/boot   NORMAL
/bin    NORMAL
/sbin   NORMAL
/lib    NORMAL
/opt    NORMAL
/usr    NORMAL
/root   NORMAL
/var    NORMAL
/var/log      LOG

## monitoring /home can create excessive run-time delays
# /home   DIR

As mentioned above, monitoring of /home is not the best of ideas, especially on larger servers with hundreds of users. The exception to this rule is smaller servers that are task oriented towards mission critical sites or applications. In these situations, such as my employers and even my own web server that have no other task than to host a few sites, monitoring of /home can be invaluable in detecting intrusions in your web site and web applications. This is especially true if you run billing, support forums, help desks and similar web applications on a single server dedicated to your businesses corporate web presence. So, the take away here is — monitor /home sparingly and evaluate it on a case-by-case basis.

Now, onto our ignore list which is as simple as it gets — any paths that are not subject to monitoring for whatever reason, be it too heavily modified or just administratively not suitable to be reported on.


Generally speaking, you do want to limit the paths ignored as every ignored path is a potential area that an attacker can store malicious software. That said though, we are trying to strike a balance in our reports that alert us to intrusions while still being reasonable enough in length to be regularly reviewed. The important thing to remember is although an attacker can hide content in these ignored paths, to effectively compromise or backdoor a server, the attacker needs to replace and modify a broad set of binaries and logs on the server, which will stand out clearly in our reports. Nevertheless, remove any paths from the ignore list that may not apply to your environment or add too it as appropriate.

That’s it for the configuration side of AIDE, hopefully you found it straight forward and not too overwhelming, if you did then google tripwire and you’ll thank me later 😉

The next part of our AIDE installation is the management and reporting component. The approach we will be taking is using a management script executed through cron daily, weekly or monthly to perform maintenance tasks and generate reports, which can optionally be emailed. The maintenance consists of compressing and rotating our old AIDE databases and logs to time stamped backups along with deleting data that has aged past a certain point.

# wget -O /etc/cron.weekly/aide
# chmod 755 /etc/cron.weekly/aide

The default for this article will be to run AIDE on a weekly basis, this is what I recommend as I have found that daily creates too many reports that become a burden to check and monthly creates reports that are far too large and noisy — weekly strikes the right balance in report size and frequency. The cron has two variables in it that can be modified for email and max age of databases/logs, so go ahead and open /etc/cron.weekly/aide with your preferred editor and modify them as you see fit.

# email address for reports

# max age of logs and databases in hours
# default 2160 = 90 days

The e-mail address variable can be left blank to not send any emails, if you choose this then reports can be manually viewed at /var/lib/aide/aide.log and are rotated into time stamped backups after each execution (i.e: aide.log.20110315-162841). The maxage variable, in hours, is the frequency at which aide logs and databases will be deleted, which I think 90 days is a reasonable length of retention. However, I strongly recommend for a number of reasons that you make sure /var/lib/aide is included in your remote backups so that if you ever need it, you can pull in older databases for compromise or change analysis across a wider time range than the default last-execution comparison reports.

Although it is not needed, you can go ahead and give the cron job a first run, or simply wait till the end of the week. Let’s assume your like me though and want to play with your new toy 🙂 We will run it through the time command so you can get an idea of how long execution will take in your environment, might also be a good idea to open a second console and top it to see what the resource hit is like for you — typically all CPU but the script runs AIDE as nice 19 which is the lowest system priority meaning other processes can use CPU before AIDE if they request it.


# time sh /etc/cron.weekly/aide


Let it run, it may take anywhere from 10 to 60 minutes depending on the servers specs and amount of data, for very large servers, especially if you choose to monitor /home, do not be surprised with run times beyond 60 minutes. Once completed check your email or the /var/lib/aide/aide.log file for your first report and that’s it, you are all set.

Two small warnings about report output, the first is that when you perform software updates or your control panel (i.e: WHM/Cpanel) does so automatically, you can obviously expect to see a very loud report generated. You can optionally force the database to regenerate when you run server updates by executing ‘/usr/sbin/aide –init’ and this will keep the next report nice and clean. The second warning is that sometimes the first report can be exceedingly noisy with all kinds of attribute warnings, if this happens give the cron script (/etc/cron.weekly/aide) a second run and you should receive a nice clean report free of warnings and noise.

For convenience, I have also made a small installer script that will take care of everything above in my defaults and install AIDE/cron script for you, suitable for use on additional servers after you’ve run through this on your first server.

# wget
# sh install_aide "[email protected]"

I hope AIDE proves to be as useful for you as it has been for me, hIDS is a critical component in any security and management policy and you should take the time to tweak the configuration for your specific environment. If you find the reports are too noisy then please ignore paths that are problematic before you ditch AIDE; if you give AIDE a chance it will be good to you and one day it may very well save you in a compromise or administrative “oops” situation.

Raid Management: Know Whats Really Going On

In today’s hosting environment it is common place for servers to have hardware based raid cards but what is not common place is having a reliable method for checking the status of the raid arrays. Few would question the value to data integrity by making use of raid technology but very few organizations and businesses implement the tools required to proactively maintain raid arrays, they simply hope for a DC tech to hear a raid alarm and assume the technician will handle the failure. The reality is very different, data centers are loud and increasingly server-dense so hearing a raid alarm let alone pin-pointing the server with the alarm going off, is a daunting task. I remember more than a few times where I found myself with a paper towel tube to my ear listening server to server to try find that troubled box with the annoying alarm going off. This is not how servers should be managed.

As server administrators or web host operators, it is your responsibility, your duty, to have tools in place that can proactively monitor the status of raid arrays and alert you when an array becomes degraded. That way you can have a paper trail of sorts when something has went wrong, submit a ticket to your data center technicians and have the situation corrected before a degraded array from a single disk failure turns into a multi-disk failure and failed array with data loss.

I have created and been using for sometime a script that can query the status of raid controllers from Areca, 3Ware & MegaRaid. The MegaRaid support is mostly intended for Dell PowerEdge PERC cards, however it should work for most MegaRaid based controllers, ymmv though.

The principle is very simple, the package contains the proprietary command line tools from Areca, 3Ware & MegaRaid that can query the status of respective controllers and then an accompanied ‘check’ script handles determining what raid controller is on the system and then runs the appropriate tool in order to get the raid status and if it is degraded or in any state other than a consistent one, it will dispatch an alert to a configured e-mail address.

Download and extract the package:

# wget
# tar xvfz raid_check_pub.tar.gz

The package will extract to raid_check/, you should place this under /root/ as the check script expects to be run from /root/raid_check/. If you wish to change the path then please modify ‘raid_check/check’.

With the package now setup under /root/raid_check/ you need to modify the ‘raid_check/check’ script to set an email address that alerts are going to get sent too. Once this is done you should symlink the check script to cron.daily so that raid failures will be picked up on daily cron runs, you may change this to cron.hourly if so desired.

# ln -s /root/raid_check/check /etc/cron.daily/raid_check

That’s it, you can give the check script a run to see if things are working. If there is a failure or inconsistency detected then it will be shown on console in addition to the email alert being sent. If everything is OK and there is no issues detected, then no output will be presented.

# sh /root/raid_check/check

Tip: You can check if your server has a raid card by running the following command:

# cat /proc/scsi/scsi  | grep Vendor

If you see Vendors listed as ATA followed by hard drive model names (i.e: WDC, HD etc…) then your servers disks are directly connected and there is no raid controller present. If on the other hand you see vendor names such as Areca, AMC, 3ware, or MegaRaid then you have a hardware raid controller.

Nginx: Caching Proxy

Recently I started to tackle a load problem on one of my personal sites, the issue was that of a poorly written but exceedingly MySQL heavy application and the load it would induce on the SQL server when 400-500 people were hammering the site at once. Further compounding this was Apache’s horrible ability to gracefully handle excessive requests on object heavy pages (i.e: images). This left me with a site that was almost unusable during peak hours — or worse — would crash the MySQL server and take Apache with it by frenzied F5ing from users.

I went through all the usual rituals in an effort to better the situation, from PHP APC then Eaccelerator, to mod_proxy+mod_cache, to tuning Apache timeouts/prefork settings and adjusting MySQL cache/buffer options. The extreme was setting up a MySQL replication cluster with MySQL-Proxy doing RW splitting/load balancing across the cluster and memcached, but this quickly turned into a beast to manage and memcached was eating memory at phenomenal rates.

Although I did improve things a bit, I had done so at the expense of vastly increased hardware demand and complexity. However, the site was still choking during peak hours and in a situation where switching applications and/or getting it reprogrammed is not at all an option, I had to start thinking outside the box or more to the point, outside Apache.

I have experience with lighttpd and pound reverse proxy, they are both phenomenal applications but neither directly handles caching in a graceful fashion (in pounds case not at all). This is when I took a look a nginx which to date I had never tried but heard many great things about. I fired up a new Xen guest running CentOS 5.4, 2GB RAM & 2 CPU cores….. an hour later I had nginx installed, configured and proxy-caching traffic for the site in question.

The impact was immediate and significant — the SQL server loads dropped from an average of 4-5 down to 0.5-1.0 and the web server loads were near non-existent from previously being on the brink of crashing every afternoon.

Enough with my ramblings, lets get into nginx. You can download the latest release from and although I could not find a binary version of it, compiling was straight forward with no real issues.

First up we need to satisfy some requirements for the configure options we will be using, I encourage you to look at ‘./configure –help’ list of available options as there are some nice features at your disposal.

yum install -y zlib zlib-devel openssl-devel gd gd-devel pcre pcre-devel

Once the above packages are installed we are good to go with downloading and compiling the latest version of nginx:

tar xvfz nginx-0.8.36.tar.gz
cd nginx-0.8.36/
./configure --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_image_filter_module --with-http_gzip_static_module
make && make install

This will install nginx into ‘/usr/local/nginx’, if you would like to relocate it you can use ‘–prefix=/path’ on the configure options. The path layout for nginx is very straight forward, for the purpose of this post we are assuming the defaults:

[root@atlas ~]# ls /usr/local/nginx
conf  fastcgi_temp  html  logs  sbin

[root@atlas nginx]# cd /usr/local/nginx

[root@atlas nginx]# ls conf/
fastcgi.conf  fastcgi.conf.default  fastcgi_params  fastcgi_params.default  koi-utf  koi-win  mime.types  mime.types.default  nginx.conf  nginx.conf.default  win-utf

The layout will be very familiar to anyone that has worked with Apache and true to that, nginx breaks the configuration down into a global set of options and then the individual web site virtual host options. The ‘conf/’ folder might look a little intimidating but you only need to be concerned with the nginx.conf file which we are going to go ahead and overwrite, a copy of the defaults is already saved for you as nginx.conf.default.

My nginx configuration file is available at, be sure to rename it to nginx.conf or copy the contents listed below into ‘conf/nginx.conf’:

user  nobody nobody;

worker_processes     4;
worker_rlimit_nofile 8192;

pid /var/run/;

events {
  worker_connections 2048;

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status  $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/nginx_access.log  main;
    error_log  logs/nginx_error.log debug;

    server_names_hash_bucket_size 64;
    sendfile on;
    tcp_nopush     on;
    tcp_nodelay    off;
    keepalive_timeout  30;

    gzip  on;
    gzip_comp_level 9;
    gzip_proxied any;

    proxy_buffering on;
    proxy_cache_path /usr/local/nginx/proxy levels=1:2 keys_zone=one:15m inactive=7d max_size=1000m;
    proxy_buffer_size 4k;
    proxy_buffers 100 8k;
    proxy_connect_timeout      60;
    proxy_send_timeout         60;
    proxy_read_timeout         60;

    include /usr/local/nginx/vhosts/*.conf;

Lets take a moment to review some of the more important options in nginx.conf before we move along…

user nobody nobody;
If you are running this on a server with an apache install or other software using the user ‘nobody’, it might be wise to create a user specifically for nginx (i.e: useradd nginx -d /usr/local/nginx -s /bin/false)

worker_processes 4;
This should reflect the number of CPU cores which you can find out by running ‘cat /proc/cpuinfo | grep processor‘ — I recommend a setting of at least 2 but no more than 6, nginx is VERY efficient.

proxy_cache_path /usr/local/nginx/proxy … inactive=7d max_size=1000m;
The ‘inactive’ option is the maximum age of content in the cache path and the ‘max_size’ is the maximum on disk size of the cache path. If you are serving up lots of object heavy content such as images, you are going to want to increase this.

proxy_send|read_timeout 60;
These timeout values are important, if you run any scripts through admin interfaces or other maintenance URL’s, these values will cause the proxy to time them out — that said increase them to sane values as appropriate, anything more than 300 is probably excessive and you should consider running such tasks from cronjobs.

Apache style MaxClients
Finally, maximum amount of connections, or MaxClients, that nginx can accept is determined by worker_processes * worker_connections/2 (2 fd per session) = 8192 MaxClients in our configuration.

Moving along we need to create two paths that we defined in our configuration, the first is the content caching folder and the second is where we will create our vhosts.

mkdir /usr/local/nginx/proxy /usr/local/nginx/vhosts /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp  /usr/local/nginx/proxy_temp

chown nobody.nobody /usr/local/nginx/proxy /usr/local/nginx/vhosts /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp  /usr/local/nginx/proxy_temp

Lets go ahead and get our initial vhosts file created, my template is available from and should be saved to ‘/usr/local/nginx/vhosts/’, the contents of which are as follows:

server {
    listen 80;
    server_name alias;

    access_log  logs/myforums.com_access.log  main;
    error_log  logs/myforums.com_error.log debug;

    location / {
        proxy_redirect     off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

        proxy_cache               one;
        proxy_cache_key         backend$request_uri;
        proxy_cache_valid       200 301 302 20m;
        proxy_cache_valid       404 1m;
        proxy_cache_valid       any 15m;
        proxy_cache_use_stale   error timeout invalid_header updating;

    location /admin {
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

The obvious changes you want to make are ‘’ to whatever domain you are serving, you can append multiple aliases to the server_name string such as ‘server_name alias alias;‘. Now, lets take a look at some of the important options in the vhosts configuration:

listen 80;
This is the port which nginx will listen on for this vhost, by default unless you specify an IP address with it, you will bind port 80 on all local IP’s for nginx — you can limit this by setting the value as ‘listen;‘.

Here we are telling nginx where to find our content aka the backend server, this should be an IP and it is also important to not forget setting the ‘proxy_set_header Host’ option so that the backend server knows what vhost to serve.

This allows us to define cache times based on HTTP status codes for our content, for 99% of traffic it will fall under the ‘200 301 302 20m’ value. If you are running allot of dynamic content you may want to lower this from 20m to 10m or 5m, any lower defeats the purpose of caching. The ‘404 1m’ value ensures that not found pages are not stored for long in case you are updating the site/have a temporary error but also prevent 404’s from choking up the backend server. Then the ‘any 15m’ value grabs all other content and caches it for 15m, again if you are running a very dynamic site you may want to lower this.

When the cache has stale content, that is content which has expired but not yet been updated, nginx can serve this content in the event errors are encountered. Here we are telling nginx to serve stale cache data if there is an error/timeout/invalid header talking to the backend servers or if another nginx worker process is busy updating the cache. This is really useful in the event your web server crashes, as to clients they will receive data from the cache.

location /admin
With this location statement we are telling nginx to take all requests to ‘’ and pass it off directly to our backend server with no further interaction — no caching.

That’s it! You can start nginx by running ‘/usr/local/nginx/sbin/nginx’, it should not generate any errors if you did everything right! To start nginx on boot you can append the command into ‘/etc/rc.local’. All you have to do now is point the respective domain DNS records to the IP of the server running nginx and it will start proxy-caching for you. If you wanted to run nginx on the same host as your Apache server you could set Apache to listen on port 8080 and then adjust the ‘proxy_pass’ options accordingly as ‘proxy_pass;’.

Extended Usage:
If you wanted to have nginx serve static content instead of Apache, since it is so horrible at it, we need to declare a new location option in our vhosts/*.conf file. We have two options here, we can either point nginx to a local path with our static content or have nginx cache our static content then retain it for longer periods of time — the later is far simpler.

Serve static content from a local path:

        location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
            root   /home/myuser/public_html;
            expires 1d;

In the above, we are telling nginx that our static content is located at ‘/home/myuser/public_html’, paths must be relative!! When a user requests ‘’, nginx will look for it at ‘/home/myuser/public_html/img/flyingpigs.jpg’. The expires option can have values in seconds, minutes, hours or days — if you have allot of dynamic images on your site then you might consider an option like 2h or 30m, anything lower defeats the purpose. Using this method has a slight performance benefit over the cache option below.

Serve static content from cache:

        location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
             proxy_cache_valid 200 301 302 120m;
             expires 2d;
             proxy_cache one;

With this setup we are telling nginx to cache our static content just like we did with the parent site itself, except that we are defining an extended time period for which the content is valid/cached. The time values are, content is valid for 2h (nginx updates cache) and every 2 days the content expires (client browsers cache expires and requests again). Using this method is simple and does not require copying static content to a dedicated nginx host.

We can also do load balancing very easily with nginx, this is done by setting an alias for a group of servers, we then define this alias in place of addresses in our ‘proxy_pass’ settings. In the ‘upstream’ option shown below, we want to list all of our web servers that load should be distributed across:

  upstream my_server_group {
    server weight=1;
    server weight=2 max_fails=3  fail_timeout=30s;
    server weight=2;

This must be placed in the ‘http { }’ section of the ‘conf/nginx.conf’ file, then the server group can be used in any vhost. To do this we would replace ‘proxy_pass;’ with ‘proxy_pass http://my_server_group;’. The requests will be distributed across the server group in a round-robin fashion with respect to the weighted values, if any. If a request to one of the servers fails, nginx will try the next server until it finds a working server. In the event no working servers can be found, nginx will fall back to stale cache data and ultimately an error if that’s not available.

This has turned into a longer post than I had planned but oh well, I hope it proves to be useful. If you need any help on the configuration options, please check out, it covers just about everything one could need.

Although I noted this nginx setup is deployed on a Xen guest (CentOS 5.4, 2GB RAM & 2 CPU cores), it proved to be so efficient, that these specs were overkill for it. You could easily run nginx on a 1GB guest with a single core, a recycled server or locally on the Apache server. I should also mention that I took apart the MySQL replication cluster and am now running with a single MySQL server without issue — down from 4.

IRSYNC & Limiting Passwordless SSH Keys

Anyone who has ever used SSH key-pairs to access more than a couple of servers (or hundreds in my case), will tell you they are an invaluable convenience. It is a natural progression and very common usage that SSH key-pairs are coupled with other common tasks or tools, where having a pass phrase attached to the key would be counter-intuitive to the task automation. So, what do we do despite our better judgment? We create key-pairs with absolutely no pass phrase. The implications are abundantly obvious, if the private key ever gets lost or stolen, any accounts that have the key-pair associated to it can be instantly compromised.

In the case of my recently released project Incremental Rsync (IRSYNC), one of the implementation hurdles at work was to have servers backup using a secure medium. This is easily handled with rsync’s -e option to have data transferred over ssh using a key-pair but then the obvious issue comes up that what if a client server ever gets compromised? Then the backup account on the backup server can be compromised (please don’t use root!@#!@#) allowing for backups to be deleted or worse yet data to be stolen for every server that backups to said server/account.

A solution to this is to limit the commands that can be executed over SSH by a specific public key, though this is not a perfect way to mitigate the threat it does go a long way to help. For my backup server implementation I have setup the user ‘irsync’ on the backup server, this account has the usual ‘~irsync/.ssh/authorized_keys’ file where I place the public key. Where things differ is that you prefix a script path in front of the public key that is used to interpret commands sent over ssh, which looks something like this:

command="/data/irsync/" ssh-dss AAAAB3NzaC1kc3MAAAC......87JVNLJ5nhaK1A== irsync@irsync

The ‘’ script is basically a simple interpreter, it looks at the commands being passed over ssh and either allows them or denies them with some logging thrown in for auditing purposes. The script can be downloaded from: Please take note to edit the scripts ‘log_file=’ value to an appropriate path, usually the base backup path or user homedir.

An example of in play would be as follows, first the client side view then the logs from $log_file:

root@praxis [~]# ssh -i /usr/local/irsync/ssh/id_dsa irsync@buserver3 "rm -rf /some/path"
sshval(13156): ssh command rejected from rm -rf /some/path

root@praxis [~]# ssh -i /usr/local/irsync/ssh/id_dsa irsync@buserver3
sshval(13403): interactive shell rejected from

May 04 11:36:15 buserver3 sshval(13156): ssh command rejected from rm -rf /some/path
May 04 11:40:03 buserver3 sshval(13403): interactive shell rejected from

On the flip side when a command is authorized, it gets recorded into the $log_file as follows:

May 04 05:29:08 buserver3 sshval(29993): ssh command accepted from rsync --server -lHogDtprx --timeout=600 --delete-excluded --ignore-errors --numeric-ids . /data/irsync/

Take note that if you do choose to use with irsync, you will need to create your own script to manage the snapshots as internally irsync uses the find command, piping results to xargs and rm which will not be authorized by (for good reason!). This is actually a very simple task, although all your snapshots will have to use the same rotation age (whatever).


for i in `ls $bkpath | grep snaps`; do
find $wd -maxdepth 1 -mtime +14 -type d | xargs rm -rf

You can save this to /root/, chmod 750 it and run it as a daily cronjob by linking it into /etc/cron.daily/ (ln -s /root/ /etc/cron.daily/) or you can add an entry into /etc/crontab as follows:

02 4 * * * root /root/ >> /dev/null 2>&1

Although I detailed the use of in the context of backups with irsync, this could easily be adapted to any usage when you want to restrict the commands executed over ssh with key pairs. You could even create your own script in perl or whatever floats your boat and use that instead — if you happen to go that route, please share with me what you created in the comments or by e-mail to ryan <at>