Linux Malware Detect v1.3.6: Loose Ends

In LMD 1.3.3 there was allot of changes, 29 to be exact, that made LMD much more robust and especially the monitoring component, much more usable. If that release was about making good things better, then this release is about bringing loose ends together. I spent a couple of days running LMD through its paces along with having many people help me test it and during that process, we brought allot of little things to the surface that needed fixing or revising.

In total, there has been 31 changes, fixes or new additions to LMD since that 1.3.3 release on the 15th, most of these changes were completed days ago but I wanted to take the time to make sure they were working as intended and that no more bugs/issues came to the surface. At the moment, since releasing LMD on the 11th, there has been a total of 1349 downloads, so to say that there is plenty of opportunity for bug reports would be understated. I am comfortable in saying that the changes from 1.3.3 to 1.3.6 are stable, reliable and working as intended.

The version changes aside for the moment, there has also been a mountain of user submitted files with the –checkout feature, I processed many of those yesterday and earlier last week for a total of 71 new signatures for the week. Those signatures will have automatically been updated to your install through the cron.daily run of –update, or you can run it yourself if you do not use the default cronjob.

So, what of significance has changed since 1.3.3? The biggest changes are that there is now a -d|–update-ver feature that performs a version update check and if a new version of LMD is available, it will install it. This feature does both a version number check and hashes the main LMD files checking for differences with the server side files, when one of the two checks fails, an update is forced. The version update is not automatically run for a number of reasons that I am to lazy to explain, just think about it a bit. All session and quarantine data is migrated on update.

Most of the other changes are fixes and improvements on existing features, especially the monitoring component which of the 31 changes since 1.3.3, 17 of them are all within the monitoring component. There has also been a few changes to the README file to reflect some minor usage changes, to clarify better some usage of the monitoring service and to explain some new ignore options.

That is all from me, changelog is below, enjoy.

Project Page: http://www.rfxn.com/projects/linux-malware-detect/

Change Log v1.3.3 => v1.3.6:
[Fix] session data gets recreated if it disappears during scan
[Fix] tlog now handles data that logged between 0bytes and first wake cycle
[Fix] monitor_check now properly handles CREATE,ISDIR events
[Change] –alert-daily|weekly alerts have been changed similar to manual alerts
[Fix] cleaner was not properly running on monitor_check calls to scan files
[Fix] quar_suspend was not properly running on monitor_check calls to quar()
[Change] monitor tracker files now pass through trim_log to avoid oversizing
[Fix] monitor_check now properly handles path names with spaces
[Fix] monitor_check was throwing nx file/directory error for monitor.pid
[Fix] older bash versions were having trouble with the [[ =~ ]] regexp search
[Change] set all script files from shebang/bin/sh to shebang/bin/bash
[Change] –alert-daily|weekly will now only send alerts if hits were found
[New] -d|–update-ver now compares file hashes to determine update status
[Fix] suspend events were not properly being added to monitor alerts
[Change] all alerts have had spacing changes to make them more readable
[Fix] signature names now properly list for daily|weekly alerts hit list
[Fix] monitor_check will now recursive monitor newly created directories
[New] monitor daily|weekly alerts now save as a pseudo scan report with SCANID
[Fix] monitor reports now generate properly when quar_hits=0
[Fix] cleaner function was not properly executing under certain conditions
[Change] additional error checking/output added to the cleaner function
[Change] default status output of scans changed for better performance
[New] added ignore_intofiy for ignoring paths from the monitor service
[Change] updated ignore section of README
[Fix] backreference errors kicking from scan_stage1 function
[New] -d|–update-ver option added to update installed version from rfxn.com
[Change] updated short and long usage output for update-ver usage
[Fix] -k|–kill-monitor now properly kills only the inotifywait/monitor pid’s
[Fix] monitor_cycle function now correctly stores its pid in the pidfile
[Fix] files with multiple events in the same waking cycle are only scanned once
[Change] install.sh now symlinks maldet executable to /usr/local/sbin/lmd

Let The Rewrites Begin: New Life For PRM

In my last post, I reflected on the last 7-8 years of projects here at rfxn.com, in doing so I also dug up some statistics on project downloads. I not only did this for my own curiosity but to prioritize the mile long to do list I have for the projects, based on downloads. One of the revealing things was just exactly what people are downloading, in particular that projects like LES , PRM & SIM are still very popular download destinations on the site.

Although a new incarnation of APF & BFD are on the agenda, I thought I would work up to those by first knocking off rewrites of some of the smaller projects, starting this off is PRM. This is a project originally written in December of 2003 and although it has stood the test of time by doing exactly what it was intended for and doing it reliably, it was starting to show its age in a number of ways, especially the not-so-intuitive logic and less-than-appealing documentation.

Today I have put out PRM v1.0.6, a ground-up rewrite of just about everything in the project, simplified logic, oodles of new features and one of the biggest problem areas over the years, far better ignore options to control exactly what PRM does along with detailed documentation.

Enough said, check the changelog for for summary of changes and the README for details on the new usage.

Project Page: http://www.rfxn.com/projects/process-resource-monitor/
Current Release:
http://www.rfxn.com/downloads/prm-current.tar.gz
http://www.rfxn.com/appdocs/README.prm
http://www.rfxn.com/appdocs/CHANGELOG.prm

Better Late Than Never: Linux Malware Detect 1.3

Today I have released Linux Malware Detect (LMD) 1.3, the first public stable release of my malware detection tool. The documentation is a little thin but the details are on the project page and the README file should fill you in on anything you need to know, otherwise you can post a comment on the bottom of the project page and I will assist where possible. Input on feature ideas, bugs and malware data is always welcome, see the –help options on LMD for the checkout feature to upload malware data to rfxn.com.

In October I detailed the concepts behind the then to-be-released LMD in a post, though allot has changed since then in how LMD operates, the jist of the post is still on point.

To those (unfortunate?) enough to ride in on the closed testing, it certainly was a long road and I thank everyone that over time submitted new malware data, bug reports and feature ideas. To say this is the most banged-in release of one of my projects would be understated and I hope it shows in the end product.

So, What has changed since the first incarnations of LMD? Well first is that I ditched the whole “chunked hash” concept for a simpler HEX based pattern matching feature to find malware variants which has proved far more accurate and easier to manage. Though I can see some scaling issues with the current implementation of the HEX scanner as the signature set grows, this is something I do expect to resolve in a future release. The basic MD5 hashed scanning is still the stage-1 scanning component and then the HEX scanner picks up as a stage-2 scanner if no MD5 hit was found.

The kernel based inotify real-time file creation/modification monitoring has been reworked and now more gracefully handles users of any type in addition to monitoring the /dev/shm, /var/tmp and /tmp paths on execution of the monitoring component. Also changed is that the scanner will now batch through new/changed files every 30 seconds for the sake of efficiency but this can easily be modified in the internals.conf down to as low as a 1 second iteration on the scanning of new/changed files.

The quarantine queue now stores files original path, owner and mode to facilitate a –restore feature that allows any file to be restored to its original path with owner and file modes restored as well. This can be used to recover false-positive hits or to restore files after you have cleaned malware from within its contents (default quarantining of malware is now also disabled, see conf.maldet).

The final notable change is that there is now a quarantine suspend account feature, the owning user account (UID>=500) can optionally be Cpanel suspended or have its shell set to /bin/false on non-cpanel systems (configurable in conf.maldet). When Cpanel users are suspended, they will have a comment attached to it with the ‘maldet –report SCANID’ value so you can easily call up the report that suspended the user.

There has been many more changes to LMD but I certainly can not list them all, give it a spin and let me know how it goes, happy malware hunting!

BFD 1.4: Important Security Fix

Today I have put up a new release of BFD, version 1.4, that addresses an unsanitized variable issue that is used on the command line. This is a serious issue and should be treated as such, if you currently have BFD installed I would encourage you to update it immediately, the install.sh script in the BFD package will retain all your options and tracking data so the update process is painless.

Current Release:
http://www.rfxn.com/downloads/bfd-current.tar.gz

Change Log:
[Fix] properly sanitized vars passed to the command line
[Fix] ignore.hosts is now updated with system addresses on each bfd run
[Note] thanks to [email protected] for invaluable input and pointers

wget http://www.rfxn.com/downloads/bfd-current.tar.gz
tar xvfz bfd-current.tar.gz
cd bfd-1.4/
./install.sh

Although this issue has many mitigating factors that lessen the severity of the potential impact it is nevertheless very serious and best to opt on the side of caution. I need to extend a special thanks to Jeff Petersen of webhostsecurity.com for identifying this issue in a very professional fashion and offering technical input.

Nginx: Caching Proxy

Recently I started to tackle a load problem on one of my personal sites, the issue was that of a poorly written but exceedingly MySQL heavy application and the load it would induce on the SQL server when 400-500 people were hammering the site at once. Further compounding this was Apache’s horrible ability to gracefully handle excessive requests on object heavy pages (i.e: images). This left me with a site that was almost unusable during peak hours — or worse — would crash the MySQL server and take Apache with it by frenzied F5ing from users.

I went through all the usual rituals in an effort to better the situation, from PHP APC then Eaccelerator, to mod_proxy+mod_cache, to tuning Apache timeouts/prefork settings and adjusting MySQL cache/buffer options. The extreme was setting up a MySQL replication cluster with MySQL-Proxy doing RW splitting/load balancing across the cluster and memcached, but this quickly turned into a beast to manage and memcached was eating memory at phenomenal rates.

Although I did improve things a bit, I had done so at the expense of vastly increased hardware demand and complexity. However, the site was still choking during peak hours and in a situation where switching applications and/or getting it reprogrammed is not at all an option, I had to start thinking outside the box or more to the point, outside Apache.

I have experience with lighttpd and pound reverse proxy, they are both phenomenal applications but neither directly handles caching in a graceful fashion (in pounds case not at all). This is when I took a look a nginx which to date I had never tried but heard many great things about. I fired up a new Xen guest running CentOS 5.4, 2GB RAM & 2 CPU cores….. an hour later I had nginx installed, configured and proxy-caching traffic for the site in question.

The impact was immediate and significant — the SQL server loads dropped from an average of 4-5 down to 0.5-1.0 and the web server loads were near non-existent from previously being on the brink of crashing every afternoon.

Enough with my ramblings, lets get into nginx. You can download the latest release from http://nginx.org and although I could not find a binary version of it, compiling was straight forward with no real issues.

First up we need to satisfy some requirements for the configure options we will be using, I encourage you to look at ‘./configure –help’ list of available options as there are some nice features at your disposal.

yum install -y zlib zlib-devel openssl-devel gd gd-devel pcre pcre-devel

Once the above packages are installed we are good to go with downloading and compiling the latest version of nginx:

wget http://nginx.org/download/nginx-0.8.36.tar.gz
tar xvfz nginx-0.8.36.tar.gz
cd nginx-0.8.36/
./configure --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_image_filter_module --with-http_gzip_static_module
make && make install

This will install nginx into ‘/usr/local/nginx’, if you would like to relocate it you can use ‘–prefix=/path’ on the configure options. The path layout for nginx is very straight forward, for the purpose of this post we are assuming the defaults:

[root@atlas ~]# ls /usr/local/nginx
conf  fastcgi_temp  html  logs  sbin

[root@atlas nginx]# cd /usr/local/nginx

[root@atlas nginx]# ls conf/
fastcgi.conf  fastcgi.conf.default  fastcgi_params  fastcgi_params.default  koi-utf  koi-win  mime.types  mime.types.default  nginx.conf  nginx.conf.default  win-utf

The layout will be very familiar to anyone that has worked with Apache and true to that, nginx breaks the configuration down into a global set of options and then the individual web site virtual host options. The ‘conf/’ folder might look a little intimidating but you only need to be concerned with the nginx.conf file which we are going to go ahead and overwrite, a copy of the defaults is already saved for you as nginx.conf.default.

My nginx configuration file is available at http://www.rfxn.com/downloads/nginx.conf.atlas, be sure to rename it to nginx.conf or copy the contents listed below into ‘conf/nginx.conf’:

user  nobody nobody;

worker_processes     4;
worker_rlimit_nofile 8192;

pid /var/run/nginx.pid;

events {
  worker_connections 2048;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status  $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/nginx_access.log  main;
    error_log  logs/nginx_error.log debug;

    server_names_hash_bucket_size 64;
    sendfile on;
    tcp_nopush     on;
    tcp_nodelay    off;
    keepalive_timeout  30;

    gzip  on;
    gzip_comp_level 9;
    gzip_proxied any;

    proxy_buffering on;
    proxy_cache_path /usr/local/nginx/proxy levels=1:2 keys_zone=one:15m inactive=7d max_size=1000m;
    proxy_buffer_size 4k;
    proxy_buffers 100 8k;
    proxy_connect_timeout      60;
    proxy_send_timeout         60;
    proxy_read_timeout         60;

    include /usr/local/nginx/vhosts/*.conf;
}

Lets take a moment to review some of the more important options in nginx.conf before we move along…

user nobody nobody;
If you are running this on a server with an apache install or other software using the user ‘nobody’, it might be wise to create a user specifically for nginx (i.e: useradd nginx -d /usr/local/nginx -s /bin/false)

worker_processes 4;
This should reflect the number of CPU cores which you can find out by running ‘cat /proc/cpuinfo | grep processor‘ — I recommend a setting of at least 2 but no more than 6, nginx is VERY efficient.

proxy_cache_path /usr/local/nginx/proxy … inactive=7d max_size=1000m;
The ‘inactive’ option is the maximum age of content in the cache path and the ‘max_size’ is the maximum on disk size of the cache path. If you are serving up lots of object heavy content such as images, you are going to want to increase this.

proxy_send|read_timeout 60;
These timeout values are important, if you run any scripts through admin interfaces or other maintenance URL’s, these values will cause the proxy to time them out — that said increase them to sane values as appropriate, anything more than 300 is probably excessive and you should consider running such tasks from cronjobs.

Apache style MaxClients
Finally, maximum amount of connections, or MaxClients, that nginx can accept is determined by worker_processes * worker_connections/2 (2 fd per session) = 8192 MaxClients in our configuration.

Moving along we need to create two paths that we defined in our configuration, the first is the content caching folder and the second is where we will create our vhosts.

mkdir /usr/local/nginx/proxy /usr/local/nginx/vhosts /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp  /usr/local/nginx/proxy_temp

chown nobody.nobody /usr/local/nginx/proxy /usr/local/nginx/vhosts /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp  /usr/local/nginx/proxy_temp

Lets go ahead and get our initial vhosts file created, my template is available from http://www.rfxn.com/downloads/nginx.vhost.conf and should be saved to ‘/usr/local/nginx/vhosts/myforums.com.conf’, the contents of which are as follows:

server {
    listen 80;
    server_name myforums.com alias www.myforuns.com;

    access_log  logs/myforums.com_access.log  main;
    error_log  logs/myforums.com_error.log debug;

    location / {
        proxy_pass http://10.10.6.230;
        proxy_redirect     off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;


        proxy_cache               one;
        proxy_cache_key         backend$request_uri;
        proxy_cache_valid       200 301 302 20m;
        proxy_cache_valid       404 1m;
        proxy_cache_valid       any 15m;
        proxy_cache_use_stale   error timeout invalid_header updating;
    }

    location /admin {
        proxy_pass http://10.10.6.230;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    }
}

The obvious changes you want to make are ‘myforums.com’ to whatever domain you are serving, you can append multiple aliases to the server_name string such as ‘server_name domain.com alias www.domain.com alias sub.domain.com;‘. Now, lets take a look at some of the important options in the vhosts configuration:

listen 80;
This is the port which nginx will listen on for this vhost, by default unless you specify an IP address with it, you will bind port 80 on all local IP’s for nginx — you can limit this by setting the value as ‘listen 10.10.3.5:80;‘.

proxy_pass http://10.10.6.230;
Here we are telling nginx where to find our content aka the backend server, this should be an IP and it is also important to not forget setting the ‘proxy_set_header Host’ option so that the backend server knows what vhost to serve.

proxy_cache_valid
This allows us to define cache times based on HTTP status codes for our content, for 99% of traffic it will fall under the ‘200 301 302 20m’ value. If you are running allot of dynamic content you may want to lower this from 20m to 10m or 5m, any lower defeats the purpose of caching. The ‘404 1m’ value ensures that not found pages are not stored for long in case you are updating the site/have a temporary error but also prevent 404’s from choking up the backend server. Then the ‘any 15m’ value grabs all other content and caches it for 15m, again if you are running a very dynamic site you may want to lower this.

proxy_cache_use_stale
When the cache has stale content, that is content which has expired but not yet been updated, nginx can serve this content in the event errors are encountered. Here we are telling nginx to serve stale cache data if there is an error/timeout/invalid header talking to the backend servers or if another nginx worker process is busy updating the cache. This is really useful in the event your web server crashes, as to clients they will receive data from the cache.

location /admin
With this location statement we are telling nginx to take all requests to ‘http://myforums.com/admin’ and pass it off directly to our backend server with no further interaction — no caching.

That’s it! You can start nginx by running ‘/usr/local/nginx/sbin/nginx’, it should not generate any errors if you did everything right! To start nginx on boot you can append the command into ‘/etc/rc.local’. All you have to do now is point the respective domain DNS records to the IP of the server running nginx and it will start proxy-caching for you. If you wanted to run nginx on the same host as your Apache server you could set Apache to listen on port 8080 and then adjust the ‘proxy_pass’ options accordingly as ‘proxy_pass http://127.0.0.1:8080;’.

Extended Usage:
If you wanted to have nginx serve static content instead of Apache, since it is so horrible at it, we need to declare a new location option in our vhosts/*.conf file. We have two options here, we can either point nginx to a local path with our static content or have nginx cache our static content then retain it for longer periods of time — the later is far simpler.

Serve static content from a local path:

        location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
            root   /home/myuser/public_html;
            expires 1d;
        }

In the above, we are telling nginx that our static content is located at ‘/home/myuser/public_html’, paths must be relative!! When a user requests ‘http://www.mydomain.com/img/flyingpigs.jpg’, nginx will look for it at ‘/home/myuser/public_html/img/flyingpigs.jpg’. The expires option can have values in seconds, minutes, hours or days — if you have allot of dynamic images on your site then you might consider an option like 2h or 30m, anything lower defeats the purpose. Using this method has a slight performance benefit over the cache option below.

Serve static content from cache:

        location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
             proxy_cache_valid 200 301 302 120m;
             expires 2d;
             proxy_pass http://10.10.6.230;
             proxy_cache one;
        }

With this setup we are telling nginx to cache our static content just like we did with the parent site itself, except that we are defining an extended time period for which the content is valid/cached. The time values are, content is valid for 2h (nginx updates cache) and every 2 days the content expires (client browsers cache expires and requests again). Using this method is simple and does not require copying static content to a dedicated nginx host.

We can also do load balancing very easily with nginx, this is done by setting an alias for a group of servers, we then define this alias in place of addresses in our ‘proxy_pass’ settings. In the ‘upstream’ option shown below, we want to list all of our web servers that load should be distributed across:

  upstream my_server_group {
    server 10.10.6.230:8000 weight=1;
    server 10.10.6.231:8000 weight=2 max_fails=3  fail_timeout=30s;
    server 10.10.6.15:8080 weight=2;
    server 10.10.6.17:8081
  }

This must be placed in the ‘http { }’ section of the ‘conf/nginx.conf’ file, then the server group can be used in any vhost. To do this we would replace ‘proxy_pass http://208.76.83.135;’ with ‘proxy_pass http://my_server_group;’. The requests will be distributed across the server group in a round-robin fashion with respect to the weighted values, if any. If a request to one of the servers fails, nginx will try the next server until it finds a working server. In the event no working servers can be found, nginx will fall back to stale cache data and ultimately an error if that’s not available.

Conclusion:
This has turned into a longer post than I had planned but oh well, I hope it proves to be useful. If you need any help on the configuration options, please check out http://wiki.nginx.org, it covers just about everything one could need.

Although I noted this nginx setup is deployed on a Xen guest (CentOS 5.4, 2GB RAM & 2 CPU cores), it proved to be so efficient, that these specs were overkill for it. You could easily run nginx on a 1GB guest with a single core, a recycled server or locally on the Apache server. I should also mention that I took apart the MySQL replication cluster and am now running with a single MySQL server without issue — down from 4.