Linux Malware Detection

[ UPDATE: Linux Malware Detect has been released ]
For the last few weeks I have been working on a new project for malware detection on Linux web servers; it is already in use at work in a pre-release version and has shown phenomenal promise.

Right to it, some background… On a daily basis the network I manage receives a large number of attacks; most of these are web-based abuses of common web application vulnerabilities that inject or upload an array of malware onto servers, such as phishing content, defacement tools, privilege escalation exploits, and IRC C&C bots. All of these actions are typically logged and recorded by our network-edge Snort setup, which got me thinking: if we started to catalog some of the injected malware, I could hash it and then detect it on servers.

Now, some might be thinking – “network edge IDS? why not convert it to IPS and stop the attacks right away?” – and though this is something I am actually in the process of doing, there is a much larger problem, and that is content encoding. A lot of malware attacks these days arrive as base64- and gzip-encoded data payloads, which Snort – or any other IDS/IPS product, for that matter – is currently NOT capable of decoding without fancy transparent proxy setups that are out of scope for standard network-edge intrusion detection/prevention.
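
To give an idea of what that decoding involves, here is a rough Python sketch of a layered decode step; the helper name and the base64-then-gzip ordering are purely illustrative, not a description of how any IDS handles it:

```python
import base64
import gzip

def decode_payload(raw: bytes) -> bytes:
    """Best-effort decode of a captured attack payload that may be
    base64- and/or gzip-encoded (illustrative helper, not a real API)."""
    data = raw
    # Attackers often wrap the real payload in base64 first.
    try:
        data = base64.b64decode(data, validate=True)
    except Exception:
        pass  # not base64, keep the raw bytes
    # A gzip stream starts with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data
```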

So, this brings us to a host-based solution for malware detection, which as it turns out is not so easy: there are no sites that track malware specifically targeting web applications, and the ones that do exist focus primarily on Windows-based malware; utterly useless. To address this shortcoming, I have essentially written a set of tools that extracts the payload data of attacks from specific IDS events (decoding it if needed) and saves/downloads the content attackers are trying to inject. I process this data for false positives every couple of days, then create MD5-hashed definitions of the malware for the detection tool. The hashes are compiled in two ways: the first is a straight MD5 hash of the data, and the second is a set of hashes of “chunked” elements of the data in specific increments and formats, so as to detect commonly occurring malware code inside otherwise unique files and content types.
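
Roughly speaking, the two definition types could be generated like this; the chunk size below is a placeholder, since the real increments and formats are tuned to the malware being cataloged:

```python
import hashlib

CHUNK_SIZE = 1024  # placeholder increment; the real tool tunes this

def full_md5(path: str) -> str:
    """Straight MD5 of the whole file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):
            h.update(block)
    return h.hexdigest()

def chunked_md5s(path: str, chunk_size: int = CHUNK_SIZE) -> list:
    """MD5 of fixed-size chunks, so a known malicious fragment can be
    matched inside an otherwise unique file."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hashes.append(hashlib.md5(chunk).hexdigest())
    return hashes
```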

The scanner portion of the malware detection tool comes in three varieties: the first is a standard “scan all” feature that scans an entire defined path; the second is a “scan recent” feature that scans a path for content created in the last X days (e.g. /home/*/public_html content created in the last 7 days); and the third is a real-time monitoring service component that uses the Linux inotify() kernel feature to detect file create/move/modify operations and scan content immediately as it is created under user web paths (default /home[2]/user/public_html).
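
The “scan recent” idea boils down to a filtered walk of the filesystem; a minimal sketch (using mtime as a stand-in for creation time, since Linux does not generally expose a creation timestamp):

```python
import os
import time

def scan_recent(root: str, days: int = 7):
    """Yield files under `root` modified in the last `days` days."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    yield path
            except OSError:
                continue  # file vanished mid-walk
```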

The malware hit management is a very simple anti-virus-like quarantine system that moves offending files to ‘INSTALL_PATH/quarantine/’ and logs the exact source path and destination file name in the quarantine locker, in case you need to restore any data due to false positives (though this should never happen, since we are using hashed detection). In addition, the quarantine function can optionally search the process table for running tasks that contain the file name of the offending malware and kick off a kill -9 against them.
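
In sketch form, the quarantine step looks something like the following; the paths are placeholders and the process matching is my own illustration (kill -9 is SIGKILL):

```python
import os
import shutil
import signal
import subprocess
import time

QUARANTINE_DIR = "/path/to/install/quarantine"  # stands in for INSTALL_PATH/quarantine/

def quarantine(path: str, kill_procs: bool = False) -> str:
    """Move an offending file into quarantine, log source/destination
    so it can be restored, and optionally kill matching processes."""
    os.makedirs(QUARANTINE_DIR, exist_ok=True)
    dest = os.path.join(QUARANTINE_DIR,
                        "%s.%d" % (os.path.basename(path), int(time.time())))
    shutil.move(path, dest)
    with open(os.path.join(QUARANTINE_DIR, "quarantine.log"), "a") as log:
        log.write("%s => %s\n" % (path, dest))
    if kill_procs:
        # pgrep -f matches against the full command line, so any running
        # task that references the malware's file name gets SIGKILLed.
        out = subprocess.run(["pgrep", "-f", os.path.basename(path)],
                             capture_output=True, text=True)
        for pid in out.stdout.split():
            os.kill(int(pid), signal.SIGKILL)
    return dest
```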

The event management is handled in two ways. For manual user-invoked scans from cron/command line, emails are dispatched directly with the scan results, including quarantine details – nothing really fancy here. The monitor component that uses inotify(), on the other hand, has the potential to generate a lot of quarantine events in rapid succession, so a standard email on every hit isn’t appropriate. Instead, we have a daily cron job that runs an internal option in the malware detect tool to read ONLY new lines from a quarantine hit list and dispatch a daily event summary if any quarantine hits are found. Since we only read new lines from the hit list, we avoid repetitive daily alerts for events we already know about, and we retain the hit list as an “all-time” record that can later be used to derive trending data / phone-home features for global trending.
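
The “only new lines” trick is simple to sketch: remember the byte offset you last read to, and seek there next time. The paths below are placeholders:

```python
import os

HIT_LIST = "/path/to/install/quarantine.hits"  # placeholder path
OFFSET_FILE = HIT_LIST + ".offset"

def new_hits() -> list:
    """Return only the hit-list lines added since the last run."""
    if not os.path.exists(HIT_LIST):
        return []
    offset = 0
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            offset = int(f.read().strip() or 0)
    with open(HIT_LIST) as f:
        f.seek(offset)
        lines = f.readlines()
        new_offset = f.tell()
    with open(OFFSET_FILE, "w") as f:
        f.write(str(new_offset))
    return lines
```

A daily cron job would then email the output of new_hits() only when it is non-empty.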

Finally, the project also contains an internal update function that checks for new hashes as part of the daily cron task, along with a simple check feature that determines whether inotify()-based monitoring is running; if it is not, it kicks off a scan of /home[2]/user/public_html for content created in the last 48 hours.
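
That fallback check might look like this; matching on “inotifywait” is just my guess at how the monitor process would show up in the process table:

```python
import subprocess

def monitor_running() -> bool:
    """Return True if an inotify-based monitor appears to be running."""
    return subprocess.run(["pgrep", "-f", "inotifywait"],
                          capture_output=True).returncode == 0

if __name__ == "__main__":
    if not monitor_running():
        # Fall back to a sweep of recently created content, e.g. the
        # scan_recent() sketch above with days=2 (48 hours).
        print("monitor not running; scanning last 48h of content")
```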