Signature Updates & Threat Database

It has been a very active month for those that pay attention to the signatures as they are released, you might have noticed a sudden spike about two weeks ago in signatures from 2,500’ish to the now 4,425 mark. The vast majority of these signatures were put up in MD5 format as a great many are variants of “known” malware and were extracted through processing historical threat data for the last 90 days, sorted by unique hashes, from I also did some leg work in my processing scripts which has allowed them to handle base64 and gzip decoding of POST payloads from IPS data which is generating a marked increase in new malware and known malware variants. Together, this has added 1806 MD5 and 31 HEX signatures in the last 45 days bringing us to the current mark of 4425 (2808 MD5 / 1617 HEX) total signatures.

In addition to the above, the daily processing scripts have been rewritten and combined into a single task on the processing server, this has brought together what was previously 9 different scripts into a single, streamlined and much more efficient task. The reason that things got to the point where there was 9 different scripts to update various elements of the back end processing server is that the LMD project developed very fluidly over the last year, meaning that every time I had a new idea or added a new feature, I in turn created a new script to support the idea/feature — over time this naturally was not sustainable and now what we have is exactly that — sustainable.

For those interested, here is the output report generated and sent to my inbox at the end of each daily malware update task:

started daily malware update tasks at 2010-09-13 00:09:35
running daily malware fetch... finished in 710s
running daily ftp malware fetch... finished in 6s
regenerating signatures from daily malware HEX hits... finished in 95s
propagating signature files... finished in 2s
generating sqlfeed data... finished in 88s
running mysql inserts for sqlfeed on praxis... finished in 42s
syncing & updating malware source data (master-urls.dat).... finished in 27s
syncing & updating irc c&c nets... finished in 15s
rebuilding maldetect-current... finished in 3s
pushing maldetect-current and signatures to web... finished in 4s
completed daily malware update tasks at 2010-09-13 00:26:05 (990s)
processed 156 malware url's
retrieved 40 malware files
extracted and hashed 16 new signatures
extracted 59 new irc c&c networks
queued 24 unknown files for review

An important part to streamlining the daily update tasks was also in rewriting some of the basic processing scripts to better log and store information on malware sources, such information includes date, source url, file md5, sig name, top level domain, online state, ip, asn, netowner and more. All malware is also now processed through an IRC extraction script that checks for irc server details in malware files and adds it to a irc command & control list with details such as date, source file md5, source file sig name, irc server, irc port, irc chan, online state, ip, asn, netowner and more. The “online state” fields in both the malware source and IRC c&c databases perform active checks, for the malware source this is simply verifies a URL is still active and/or domain still resolves, for the IRC c&c database this is a bot that manually connects to the irc network and verifies the network and channels are online & populated. All irc users, host masks and a sampling period of channel activity is also recorded from each active IRC c&c network, this information at this time is not included in the database as allot of it requires sanitizing as many IRC c&c networks dont mask connecting hosts and the channel activity reveals exceedingly sensitive information about actively vulnerable web sites and servers, this is something I am working on adding but its a difficult task so it will take some time. The malware signatures database has also been populated but requires a little more work, mainly adding meta data to describe each signature in a format that is longer than the single-word descriptions included in the signature naming scheme.

Together, the malware signature database, the malware source database and the IRC C&C networks database will all tie together into a single threat portal to be released in the next couple of weeks (I hope) allowing correlation between data in all 3 databases seamlessly. For example one could query all malware sourced from a specific IP, ASN or Netowner or you could find all the source URL’s for a specific malwares MD5 signature, or you could query the signature database to find more information on a specific signature, etc… there are a great many options that will be available for reviewing, cross referencing and exporting data from the databases.

These databases are all already completed, active and receiving updates, all that is left for me to do is create the front end that will find its home on The signature database, as expected, has 4,526 entries, the malware source database has 7,859 entries and the IRC C&C database has 386 entries. There is currently 511 files pending review in the malware queue, there has been 3,592 malware files reviewed in the last 45 days, of those 1,806 were unique files and the 511 files in queue for review represent files that could not be auto-hashed against a known threat or variant threat from HEX pattern matches.

The biggest pitfall of all these changes has been the explosion in the review queue that I must tend with daily, it has started to back up on me as I am in the middle of moving from Michigan to Montreal but as soon as I am done with my move in a couple of weeks, I plan to get that queue under control and work on some more back end scripts to help streamline its processing slightly.

Well that’s it for now, keep an eye out for details to come on the site, its going to be exciting 🙂