LMD 1.3.9: Quietly Awesome

It has been a busy couple of weeks for the LMD project, with lots of late nights and sleepless days behind me, and I can say I am a ‘little’ happier with where things are in the project now 🙂

This release has no major feature changes or additions other than a change to the default hexdepth used to scan malware, which has increased from 15,736 to 61,440 (1024*60). This enables LMD to better detect threats that it was having a little difficulty with due to the byte size of some malware. At the moment there is no byte-offset feature that would allow us to create more targeted hex signatures, which kind of flies in the face of my goal of improving performance, but it is what it is and for the moment things are O.K. With the new hexdepth value, the miss rate on valid malware that rules already exist for is below 1%, and I can live with that until I put together a new scanner engine/logic. You can apply this update to your installations by using the -d|--update-ver flags.
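To make the effect of the depth change concrete, here is a minimal sketch. It is not LMD's scanner; it assumes the depth counts bytes from the start of the file and that a hex signature is a plain hex string. The names HEX_DEPTH, OLD_HEX_DEPTH and matches_hex_signature are made up for illustration.

```python
# Illustrative only: assumes hexdepth counts leading bytes of the file.
HEX_DEPTH = 1024 * 60      # 61,440, the new default
OLD_HEX_DEPTH = 15736      # the previous default


def matches_hex_signature(data: bytes, hex_pattern: str, depth: int = HEX_DEPTH) -> bool:
    """Return True if the hex pattern occurs within the first `depth` bytes."""
    needle = bytes.fromhex(hex_pattern)
    return needle in data[:depth]


# A payload that sits about 20,000 bytes into a file is missed at the old
# depth and caught at the new one.
sample = b"A" * 20000 + b"eval(base64_decode("
pattern = b"eval(base64_decode(".hex()
missed_before = not matches_hex_signature(sample, pattern, OLD_HEX_DEPTH)
found_now = matches_hex_signature(sample, pattern)
```

The trade-off is the one described above: a larger depth means more bytes to search per file, which costs scan time on large file sets.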

In light of performance concerns with the current incarnation of LMD, and because it has been requested, I felt it prudent to create a set of signatures compatible with ClamAV. This allows those who wish to do so to leverage the LMD signature set with ClamAV’s very impressive scanner performance. Although there are performance concerns with LMD on large file sets, it does need to be said that for day-to-day operations of the cron-initiated or inotify real-time scans, there are little to no performance issues. This is strictly a situation where, if you choose to scan entire home directory trees where file lists exceed tens or hundreds of thousands, you now have an option to take advantage of our signatures within a scanner engine that can handle those large file sets.

The LMD-converted ClamAV signatures ship as part of the current release of LMD, are stored at /usr/local/maldetect/sigs/ and are named rfxn.ndb and rfxn.hdb. The ClamAV signatures are not updated by -u|--update; they are static files placed inside the release package when it is rebuilt nightly with new signatures. As such, the latest versions of the rfxn.ndb|hdb files are always available at the following URLs (these are updated whenever the LMD base signatures are updated, typically daily):
http://www.rfxn.com/downloads/rfxn.ndb
http://www.rfxn.com/downloads/rfxn.hdb

To make use of these signatures in ClamAV using the clamscan command, you would run a command similar to the following:
clamscan -d /usr/local/maldetect/sigs/rfxn.ndb -d /usr/local/maldetect/sigs/rfxn.hdb -d /var/clamav/ --infected -r /path/to/scan

The -d options specify the virus databases to use. When you use the -d option, clamscan uses only those databases for virus scans and ignores the default ClamAV virus database, so we also pass -d /var/clamav/ to include all of the default ClamAV signatures in our scans. The --infected option displays only those files that are found to be infected, and the -r option is for recursive scanning (descending directory trees).
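If you want to drive the same scan from a script, the command can be assembled as an argument list. This is only a sketch: the helper name build_clamscan_args is made up, and the default paths are the ones quoted in this post, which may differ on your system.

```python
def build_clamscan_args(scan_path,
                        sig_dir="/usr/local/maldetect/sigs",
                        clamav_db="/var/clamav/"):
    """Build the clamscan argument list described above."""
    return [
        "clamscan",
        "-d", f"{sig_dir}/rfxn.ndb",   # LMD hex signatures in ClamAV format
        "-d", f"{sig_dir}/rfxn.hdb",   # LMD MD5 signatures in ClamAV format
        "-d", clamav_db,               # re-add the default ClamAV databases
        "--infected",                  # print infected files only
        "-r",                          # recurse into subdirectories
        scan_path,
    ]


args = build_clamscan_args("/path/to/scan")
```

The list can be handed to subprocess.run(args) as-is; passing a list rather than a single string avoids shell quoting problems with unusual path names.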

That all said, the real guts of the recent changes have been in the signatures themselves. In the last two weeks we have gone from 6,083 to 6,769 signatures, an increase of 686, one of the largest single-month signature updates to date. A great many of these signatures have come from the submissions queue, which ended up so backlogged that there were 2,317 items pending review. Currently I have managed to get the queue cut down to 1,381 and I have committed myself to eliminating the backlog by the end of the month or shortly thereafter (ya, we’ve heard that before, right?). It should be understood that reviewing the submissions queue is an exceedingly tedious task, as every file needs to be manually reviewed and assessed on its merits as malware (or deleted); then hex match patterns are created, the malware is classified and hashed, and it is inserted into the signatures list. As painful a process as it is at times, I do enjoy it; it is just very time consuming, so please show patience when you submit malware with the checkout option.

Finally, I have a lot in store for LMD going forward. As always, time is the biggest factor, but be assured the project will continue to grow and improve in the best interest of detecting malware on your servers. Likewise, for any who have followed previous blogs, the dailythreats website is still a work in progress and will complement the project once it is released in the very near future. As always, thank you to everyone who uses my work and please consider a donation whenever possible.

Did You Know? Stats:
Active Installations (Unique IP daily update queries): 5,903
Total Downloads (Project to date): 32,749
Total Malware Signatures: 6,769
rfxn.com Malware Repository: 21,569 files / 1.6G data
Tracked Malware URLs: 18,304
Tracked IRC C&C Botnets: 421

LMD 1.3.7: Milestones, Fixes & Signature Updates

Today marks the release of LMD 1.3.7, which is a minor release update that fixes a few bugs and is also the final 1.x release before version 2.0 as described in the LMD: one year later blog post. The bug list for LMD has remained very small over the last 6 months and this release reflects that by fixing the current outstanding bugs.

Changes 1.3.6 => 1.3.7:
[Fix] package ownership at some point got set to uid 501 instead of root
[Fix] daily cronjob now checks ps output for inotifywait proc instead of pidof
[Fix] monitor mode users would exit prematurely if a user home path did not exist
[Fix] a file hijacking race condition existed with quarantine mode restore function
[Fix] inotify max_user_instances value was being set to a value that would cause inotifywait to fail

Thanks go out to Mark McKinstry of Nexcess.net for assistance tracking down and fixing the issue with inotifywait reporting on some systems that inotify support did not exist in the kernel when it actually did; this was an issue with the value maldet was setting for inotify max_user_instances. Thanks also go out to Jeff Patersen from webhostsecurity.com for identifying the file hijacking race condition and bringing it to my attention. This issue had the potential, under certain circumstances, to allow a user to gain access to root-owned files in user-readable paths. These fixes on their own are reason enough for all users to update. The -d|--update-ver command switches will take care of all update business, so there is no reason not to update (e.g. # maldet -d).

Today I have also put up a small set of signature updates on top of the regular daily queue processing; this includes 25 HEX signatures for various items in the review queue as well as associated file hashes. This brings the project to over 5,000 signatures, a milestone that has been a long time coming and one that sets this project apart from all other malware projects in the Linux world. Even the top tier AV vendors and the open source project ClamAV lack the depth of malware signatures that LMD brings to the Linux community. At the moment, the project is growing by an average of 14 signatures per day, and I still need to finish processing a review queue of over 1,300 user submissions.

We can also celebrate another milestone this month: passing 3,000 confirmed installations of LMD (3,241 as of this writing). We can determine this by counting the number of unique IP addresses (servers) that check in daily to the rfxn.com server for signature updates. The total downloads of LMD sit at 12,952 to date, which is roughly where we expect it to be, having had 3 major releases (minor releases don’t get much attention) that most users would have installed or updated to.
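The unique-IP count works roughly as in the sketch below. The input shape (date, IP pairs) and the function name unique_ips_per_day are assumptions for illustration; the actual rfxn.com log format is not shown here, and the sample addresses and dates are invented, taken from the documentation-only 192.0.2.0/24 range.

```python
from collections import defaultdict


def unique_ips_per_day(checkins):
    """Map each day to the number of distinct IPs that checked in that day."""
    seen = defaultdict(set)
    for day, ip in checkins:
        seen[day].add(ip)
    return {day: len(ips) for day, ips in seen.items()}


# Invented sample data: one server checks in twice on the same day.
sample_checkins = [
    ("2010-12-01", "192.0.2.10"),
    ("2010-12-01", "192.0.2.10"),
    ("2010-12-01", "192.0.2.11"),
    ("2010-12-02", "192.0.2.10"),
]
daily_installs = unique_ips_per_day(sample_checkins)
```

Repeat check-ins from the same address count once per day, which is why this number is a better estimate of active installations than raw download or request totals.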

As a holiday gift to all LMD users, I am making it my goal to have all pending items in the review queue processed and signatures created by the end of December, so keep your eyes open and I’ll make a post when that has been completed.

LMD: One Year Later

With my move back to Canada behind me and some new routines to adjust to, it’s about time to get back into the mix with the projects. Though things have been slow the last couple of months, it has not stopped me from making sure regular and prompt malware updates are released.

Today, we reflect on the first year of Linux Malware Detect, which was released in a very infantile beta release about a year ago. The project has evolved in a lot of ways from its original goals, and it has certainly changed in every way for the better. What was originally to be a closed project, relegated to mostly internal work-related needs, ended up, like most of my projects, morphing into a public release. The first release saw the world with fewer than 200 signatures, no reliable signature update method, manual upgrade options and very flawed scanning and detection methods (v 0.7<). Now we sit at version 1.3.6 with 4,813 signatures, a scanning method that, though it still needs some work, is far superior to what was originally in place, and a detection routine based on solid md5 hashes and hex signatures. We have cleaner rules that can clean some nasty injected malware, a fully functional quarantine and restore system, a reporting system, real-time file-based monitoring, an integrated signature updater and version updater, and a vibrant community of users that regularly submit malware for review. Yes, LMD has grown up!

The most grown-up part of LMD has to be how signatures are handled and how processing them is now an almost entirely automated process; this was detailed a little more in Signature Updates & Threat Database, posted in September. The key part here, though, is “almost entirely automated”: every day that the processing scripts run to bring in new malware, there are always a number of files that can’t be processed automatically, and these are moved to a manual review queue. With how busy life has been the last couple of months, the review queue has slowly risen to 1,097 files pending review. This queue is at the top of my list for tackling over the next couple of days and weeks. It’s a lot of work to review that much malware, but it will get done. Many of the files to review are actual user submissions, so if you did submit something and find it’s still not detected by LMD, this would be why :).

There is still a lot on the to-do list for LMD going forward, and with the upcoming release of version 2.0 we will see some changes in how LMD does business. The first, and to me the biggest, will be optional usage statistics, which will allow users to have LMD report anonymized statistics back to rfxn.com. These statistics will show us which malware hits are found on your servers, which in turn helps focus which types of malware threats are prioritized in the daily processing queue for hashing & review. The statistics will also help create informative profiles on the soon-to-be-released dailythreats.org web site about how maldet is used and what the most prevalent threats in the wild are.

Other additions to LMD 2.0 will be a refined scanner that will provide greater speed with large file sets (50k – 1M+ files), an ability to fork scans to the background, better and more predictable logging format for 3rd party processing of LMD log data, redesigned reporting system, full BSD support, ability to create custom signatures from the LMD command line, expanded cleaner rules, wildcard support for exclude paths, a number of security and bug refinements and as always, more signatures.

If you have any feature requests for LMD 2.0, go ahead and post them as a comment and I will make sure they get added to the list. Thank you to everyone who continues to support rfxn.com projects through donations, feedback and by just using & spreading the word about the projects. I look forward to another year of LMD and seeing it become the premier malware detection tool for Linux and all Unix variant OSes.

Signature Updates & Threat Database

It has been a very active month. Those who pay attention to the signatures as they are released might have noticed a sudden spike about two weeks ago, from 2,500’ish to the now 4,425 mark. The vast majority of these signatures were put up in MD5 format, as a great many are variants of “known” malware and were extracted by processing historical threat data for the last 90 days, sorted by unique hashes, from clean-mx.de. I also did some leg work on my processing scripts, which now handle base64 and gzip decoding of POST payloads from IPS data; this is generating a marked increase in new malware and known malware variants. Together, this has added 1,806 MD5 and 31 HEX signatures in the last 45 days, bringing us to the current mark of 4,425 (2,808 MD5 / 1,617 HEX) total signatures.

In addition to the above, the daily processing scripts have been rewritten and combined into a single task on the processing server; this has brought together what were previously 9 different scripts into a single, streamlined and much more efficient task. Things got to the point of 9 different scripts updating various elements of the back end processing server because the LMD project developed very fluidly over the last year: every time I had a new idea or added a new feature, I in turn created a new script to support it. Over time this naturally was not sustainable, and what we have now is exactly that: sustainable.

For those interested, here is the output report generated and sent to my inbox at the end of each daily malware update task:

started daily malware update tasks at 2010-09-13 00:09:35
running daily malware fetch... finished in 710s
running daily ftp malware fetch... finished in 6s
regenerating signatures from daily malware HEX hits... finished in 95s
propagating signature files... finished in 2s
generating sqlfeed data... finished in 88s
running mysql inserts for sqlfeed on praxis... finished in 42s
syncing & updating malware source data (master-urls.dat).... finished in 27s
syncing & updating irc c&c nets... finished in 15s
rebuilding maldetect-current... finished in 3s
pushing maldetect-current and signatures to web... finished in 4s
completed daily malware update tasks at 2010-09-13 00:26:05 (990s)
processed 156 malware url's
retrieved 40 malware files
extracted and hashed 16 new signatures
extracted 59 new irc c&c networks
queued 24 unknown files for review
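
For anyone who wants to track where the time goes in a report like the one above, here is a small sketch that pulls the per-step durations out of it. The regular expression and the function name step_durations are my own assumptions based only on the line format shown; the REPORT text is copied from the output above.

```python
import re

REPORT = """\
started daily malware update tasks at 2010-09-13 00:09:35
running daily malware fetch... finished in 710s
running daily ftp malware fetch... finished in 6s
regenerating signatures from daily malware HEX hits... finished in 95s
propagating signature files... finished in 2s
generating sqlfeed data... finished in 88s
running mysql inserts for sqlfeed on praxis... finished in 42s
syncing & updating malware source data (master-urls.dat).... finished in 27s
syncing & updating irc c&c nets... finished in 15s
rebuilding maldetect-current... finished in 3s
pushing maldetect-current and signatures to web... finished in 4s
completed daily malware update tasks at 2010-09-13 00:26:05 (990s)
"""

# A step line is "<description>... finished in <N>s" (three or more dots).
STEP_RE = re.compile(r"^(?P<step>.+?)\.{3,}\s*finished in (?P<secs>\d+)s$")


def step_durations(report):
    """Return (step description, seconds) for every timed step line."""
    steps = []
    for line in report.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            steps.append((match.group("step"), int(match.group("secs"))))
    return steps


steps = step_durations(REPORT)
slowest = max(steps, key=lambda item: item[1])
```

On this report the slowest step is the daily malware fetch, which accounts for most of the run time.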

An important part of streamlining the daily update tasks was also rewriting some of the basic processing scripts to better log and store information on malware sources; such information includes date, source url, file md5, sig name, top level domain, online state, ip, asn, netowner and more. All malware is also now processed through an IRC extraction script that checks for irc server details in malware files and adds them to an irc command & control list with details such as date, source file md5, source file sig name, irc server, irc port, irc chan, online state, ip, asn, netowner and more. The “online state” fields in both the malware source and IRC c&c databases are backed by active checks: for the malware source database this simply verifies that a URL is still active and/or the domain still resolves, while for the IRC c&c database a bot connects to the irc network and verifies that the network and channels are online & populated. All irc users, host masks and a sampling period of channel activity are also recorded from each active IRC c&c network. At this time this information is not included in the database, as a lot of it requires sanitizing: many IRC c&c networks don’t mask connecting hosts, and the channel activity reveals exceedingly sensitive information about actively vulnerable web sites and servers. This is something I am working on adding, but it’s a difficult task so it will take some time. The malware signatures database has also been populated but requires a little more work, mainly adding meta data to describe each signature in a format that is longer than the single-word descriptions included in the signature naming scheme.

Together, the malware signature database, the malware source database and the IRC C&C networks database will all tie together into a single threat portal to be released in the next couple of weeks (I hope), allowing seamless correlation between data in all 3 databases. For example, one could query all malware sourced from a specific IP, ASN or netowner, find all the source URLs for a specific malware’s MD5 signature, or query the signature database to find more information on a specific signature. There are a great many options that will be available for reviewing, cross referencing and exporting data from the databases.
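To give a feel for the kind of cross-referencing described above, here is a sketch using an in-memory SQLite database. The schema, table names, column names and every row of data are invented for illustration; the real databases carry more fields (date, online state, netowner and so on) and their layout is not described here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE signatures (md5 TEXT PRIMARY KEY, sig_name TEXT);
        CREATE TABLE sources (url TEXT, md5 TEXT, ip TEXT, asn INTEGER);
        CREATE TABLE irc_nets (server TEXT, port INTEGER, chan TEXT, md5 TEXT);
        """
    )
    # Invented sample rows; none of this is real rfxn.com data.
    db.executemany("INSERT INTO signatures VALUES (?, ?)", [
        ("0" * 32, "{MD5}perl.ircbot.example.1"),
        ("1" * 32, "{MD5}php.mailer.example.2"),
    ])
    db.executemany("INSERT INTO sources VALUES (?, ?, ?, ?)", [
        ("http://a.example/x.txt", "0" * 32, "192.0.2.10", 64496),
        ("http://b.example/y.txt", "0" * 32, "192.0.2.20", 64497),
        ("http://c.example/z.txt", "1" * 32, "192.0.2.10", 64496),
    ])
    db.executemany("INSERT INTO irc_nets VALUES (?, ?, ?, ?)", [
        ("irc.example", 6667, "#demo", "0" * 32),
    ])
    return db


def urls_for_md5(db, md5):
    """All source URLs recorded for one file hash."""
    rows = db.execute("SELECT url FROM sources WHERE md5 = ? ORDER BY url", (md5,))
    return [row[0] for row in rows]


def signatures_from_asn(db, asn):
    """All signature names for malware sourced from one ASN."""
    rows = db.execute(
        "SELECT DISTINCT s.sig_name FROM sources AS src "
        "JOIN signatures AS s ON s.md5 = src.md5 "
        "WHERE src.asn = ? ORDER BY s.sig_name",
        (asn,),
    )
    return [row[0] for row in rows]


def irc_nets_for_signature(db, sig_name):
    """IRC C&C networks extracted from files matching one signature."""
    rows = db.execute(
        "SELECT n.server, n.port, n.chan FROM irc_nets AS n "
        "JOIN signatures AS s ON s.md5 = n.md5 WHERE s.sig_name = ?",
        (sig_name,),
    )
    return rows.fetchall()


db = build_demo_db()
```

The file MD5 is used as the join key here because it is the one field listed for both the malware source records and the IRC records; whether the real portal joins on it is an assumption.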

These databases are all already completed, active and receiving updates; all that is left for me to do is create the front end that will find its home on http://www.dailythreats.com. The signature database, as expected, has 4,526 entries, the malware source database has 7,859 entries and the IRC C&C database has 386 entries. There are currently 511 files pending review in the malware queue. There have been 3,592 malware files reviewed in the last 45 days, of which 1,806 were unique; the 511 files in the queue represent files that could not be auto-hashed against a known threat or variant threat from HEX pattern matches.

The biggest pitfall of all these changes has been the explosion in the review queue that I must contend with daily. It has started to back up on me as I am in the middle of moving from Michigan to Montreal, but as soon as I am done with my move in a couple of weeks, I plan to get that queue under control and work on some more back end scripts to help streamline its processing slightly.

Well, that’s it for now. Keep an eye out for details to come on the dailythreats.com site; it’s going to be exciting 🙂

Understanding Signatures

The signature naming scheme for LMD is a little confusing and something I’ve received more than a few questions about, especially about what the *.unclassed signatures mean. The naming scheme (to me) is straightforward and breaks down as follows:

{SIG_FORMAT}lang/vector.type.name.ID#

The ‘SIG_FORMAT’ is either HEX or MD5, reflecting the internal format of the signature; the ‘lang/vector’ is the language or attack vector of the malware; ‘type’ is a short descriptive field for what the malware does (e.g. ircbot, mailer, injection etc.); ‘name’ is a short descriptive name unique to the piece of malware; and ‘ID#’ is the internal signature ID number.

What some people appear confused about is signatures such as ‘{HEX}base64.inject.unclassed.7’ that use the term “unclassed” for the name field. Essentially, unclassed signatures represent a group of malware samples that are not necessarily distinct from each other but that follow the same attack vector, such as base64 encoded scripts. There are hundreds of these scripts, and in encoded form it doesn’t really matter what they do; we are detecting the encoded format, not the decoded, so they get lumped together. In other instances, I will throw some malware into an unclassed group when it is very new and I have not yet had time to process it into its own classification. For example, web.malware.unclassed is a dumping ground for a lot of newly submitted malware, which I have reviewed and confirmed IS MALWARE but have not yet classified or determined to be a variant of an existing malware classification.
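For those who process scan reports with their own scripts, the naming scheme can be split apart as in the sketch below. The regular expression is my own reading of the scheme above, not something LMD ships; it assumes the vector and type fields contain no dots and that the ID is the trailing run of digits. The names Signature, parse_signature and is_unclassed are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    sig_format: str   # "HEX" or "MD5"
    vector: str       # language or attack vector
    kind: str         # the 'type' field, e.g. ircbot, mailer, inject
    name: str         # descriptive name, or "unclassed"
    sig_id: int       # internal signature ID number


SIG_RE = re.compile(
    r"^\{(?P<fmt>HEX|MD5)\}"
    r"(?P<vector>[^.]+)\.(?P<kind>[^.]+)\.(?P<name>.+)\.(?P<id>\d+)$"
)


def parse_signature(text: str) -> Optional[Signature]:
    """Split a full signature name into its fields, or return None."""
    match = SIG_RE.match(text.strip())
    if match is None:
        return None
    return Signature(match["fmt"], match["vector"], match["kind"],
                     match["name"], int(match["id"]))


def is_unclassed(sig: Signature) -> bool:
    """True for signatures that belong to an unclassed group."""
    return sig.name == "unclassed"


example = parse_signature("{HEX}base64.inject.unclassed.7")
```

A group name written without the format prefix and ID, such as web.malware.unclassed, is not a full signature name and comes back as None.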

It needs to be understood that the processing of malware is mostly a manual task. Though some elements of it are automated, the actual review of each malware file is done by hand to remove the chance of false positives, keeping LMD accurate and reliable. As such, not all malware makes it into a classification group right away; the important part is that malware is reviewed, verified and has signatures generated for it in a timely fashion. I process malware daily from the network edge IPS system at work, from user-submitted files and from various malware news groups / web sites, and the priority is getting the signatures up for in-the-wild threats. The signature name/classification serves informative purposes; yes, it is important, but not as important as the actual verification and signature generation.