Tracking & Killing Bot Networks

In a previous blog post I discussed how one of the more enjoyable parts of my day-to-day malware rituals is tracking and killing command and control (C&C) bot networks. Recently I have begun automating this process a bit. I have created a series of scripts that extract IRC servers, port numbers and channels from malware as it comes in and then check whether each IRC server is still online. A custom bot then logs into the server, queries the active channels and determines how many zombies are active on the network. If an IRC server turns out to be active with zombies connected, it is reported to the abuse address listed in the whois information for the server’s IP address.
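The "is the IRC server still online" step can be approximated with a plain TCP connect. This is only a minimal sketch and not the actual scripts; the helper name irc_server_online and the 5 second timeout are assumptions made for illustration.

```python
import socket


def irc_server_online(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a successful connect only shows that something is
    listening, not that it is an IRC daemon or that zombies are connected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

A server that passes this check would still need the bot login and channel query described above before it is worth reporting.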

The automation of this process is something I have had on my todo list for a little while, but I finally stopped procrastinating and got it done. The real advantage of having it automated is that I can easily generate a tangible set of information that shows how many bot networks are present in the malware I process daily, weekly and monthly, how many of those networks are still active and, more importantly, how many of those networks still have active zombies connected. Likewise, as I’ve discussed previously, I am working on a threat portal, and having the IRC C&C data processing automated will make it easier to put that information on the portal and integrate it into the aggregate threat feed that the portal will offer for route/firewall/DNSBL drops.

Here are some statistics on IRC command and control networks as seen in the malware processed by me in the last 30 days:
Total Processed Malware (30d): 607
Total IRC C&C Servers: 251
Total Online IRC C&C Servers (as of 08/17/10): 118
Total Online IRC C&C Servers with Active Zombie Hosts: 30
Total Zombies Observed on Online IRC C&C Servers: 1,679 (about 56 on average per server with active zombies)
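For anyone who wants the ratios behind these totals, the arithmetic is quick to reproduce. The function and key names below are made up for this sketch; only the four input numbers come from the statistics above.

```python
def cnc_ratios(total, online, with_zombies, zombies):
    """Derive percentages and an average from the raw 30 day counts."""
    return {
        # share of all extracted C&C servers that still answer
        "online_pct": round(100 * online / total, 1),
        # share of online servers that have zombies connected
        "with_zombies_pct": round(100 * with_zombies / online, 1),
        # average zombies per server that has any zombies at all
        "avg_zombies": round(zombies / with_zombies, 1),
    }


stats = cnc_ratios(total=251, online=118, with_zombies=30, zombies=1679)
print(stats)
```

So roughly 47% of the servers are still online, about a quarter of those have zombies connected, and the average works out to just under 56 zombies per active server.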

There are some notable observations. Out of the 251 IRC C&C servers noted, only 118 are still online. Of those 118, 64 utilize free DNS naming services and/or dynamic DNS services, while the other 54 create C&C channels on established public IRC networks or use the DNS name of a compromised host running an IRC server. Almost every one of the 133 now inactive IRC servers was referenced by IP address within the host malware script; a small minority used DNS names of compromised hosts.

It goes without saying that public DNS services and dynamic DNS services give attackers the flexibility to quickly recover a C&C server and its participating zombies in the event of the host server being shut down. Further, a number of the more mature IRC C&C bots will periodically retry the connection when disconnected from the host C&C server, further increasing the attacker’s chance of fully recovering the zombie network.

PHP is also becoming more common as a language of choice for C&C bot agents, though Perl agents are still vastly more popular. The LMD project has currently classified 44 unique C&C bot agents comprising 286 agent scripts/binaries: 14 classes (38 scripts) are PHP based, 21 classes (213 scripts) are Perl based and 9 classes (35 scripts/binaries) are Other (C/Ruby/Java).

Currently an average of 6 bot networks are abuse reported per day, and of those only about 2-3 per day ever receive any form of followup and/or shutdown of the host running the network. That is a rate of 50% at best, which is abysmal to say the least. When the threat management portal goes up in the coming weeks, these networks will find themselves at the top of the threat feed and planted squarely on the front page of the portal. We might not be able to shut them down, but we sure can filter them off our networks.

ATF v2: Weighted Threats

When I first introduced you all to the Aggregate Threat Feed back in May, it was a much smaller feed with very simple ambitions: pulling together threat data at work from our network edge and host based firewalls and aggregating the data into a usable feed. The intention was that as an attacker exposes themselves more on the network through invasive scans and attacks, they would quickly climb up the threat feed and end up banned proactively. Though this did and still does happen in a way, a problem was introduced when more and more data started to come in from the network edge and quickly outweighed the data from the hosts.

The threat feed used to be sorted by number of events. For the network edge IPS, the events correlated with actual signature events on the network edge, so these could number from 50 events for an SNMP community scan to thousands of events for an SSH scan. Then you have the host based firewall events (mostly brute force attacks). These events are correlated into the feed by the occurrence of an attacker’s address across unique servers, so if an address made brute force attempts against 11 servers, it would show in the feed as 11 events.

The problem that developed here is that the network edge IPS is far noisier than the host based firewalls. You would end up with hundreds of IPs from the network edge with thousands of events each, while the host based firewalls, even though they also represent hundreds of attacking IPs, produced FAR lower event counts because those counts reflect the unique servers each IP attacked. This meant that often the top 50 or 100 items in the threat feed were all IPS events. Those are quite valid events, but the host based events carried more threat significance than some of the IPS events. The host events were simply being washed out of the top 100 on the list by the sheer volume of IPS events (who really wants to import 300 addresses from a threat feed, let alone even 100?).

So, what I decided to do was add a weighted field to the database that is based on unique targets for each attacking IP. This weighted field is the new sort method for the feed, and it works something like this. If the IPS picks up an attacker hammering five servers with an SQL injection exploit, that attacking IP ends up in the threat feed with a weight of 5. If we then have an attacker that runs brute force attacks on 30 servers, that attacking IP ends up in the threat feed with a weight of 30. The end result is that the threat feed is populated with the highest weighted attackers at the top, so attackers who are more aggressive across unique targets quickly end up at the top of the list. This allows the feed to better protect the devices/hosts it is used on from a developing attack before the attacker reaches that device/host on the network.
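To make the weighting concrete, here is a small sketch of the idea. It is not the real database logic: the build_feed function, the tuple layout, the placeholder addresses and the tie-break on event count are all assumptions made for the example.

```python
from collections import defaultdict


def build_feed(events):
    """events: iterable of (attacker_ip, source, target_server) tuples."""
    counts = defaultdict(int)    # raw event count per attacker
    targets = defaultdict(set)   # unique servers hit per attacker
    for attacker, _source, target in events:
        counts[attacker] += 1
        targets[attacker].add(target)
    feed = [
        {"ip": ip, "events": counts[ip], "weight": len(targets[ip])}
        for ip in counts
    ]
    # Highest weight first; breaking ties on event count is an assumption.
    feed.sort(key=lambda e: (e["weight"], e["events"]), reverse=True)
    return feed


# A noisy IPS attacker: 1000 events spread over only 5 servers.
events = [("198.51.100.7", "ips", f"srv{i % 5}") for i in range(1000)]
# A brute forcer seen by the host firewalls on 30 different servers.
events += [("203.0.113.9", "fw", f"srv{i}") for i in range(30)]

feed = build_feed(events)
for entry in feed:
    print(entry["ip"], entry["events"], entry["weight"])
```

Sorted by events alone, the IPS attacker would sit on top with 1000 events. Sorted by weight, the brute forcer with a weight of 30 comes first and the IPS attacker follows with a weight of 5.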

Drop Format:

List Format (fields: IP | SERVICE | EVENTS | WEIGHT):
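If you want to consume the list format in a script, a parser could look something like the sketch below. It assumes one entry per line with the four fields separated by a pipe character, which is a guess based on the field description above; the FeedEntry type, the parse_list_line helper and the sample line are made up for illustration.

```python
from typing import NamedTuple


class FeedEntry(NamedTuple):
    ip: str
    service: str
    events: int
    weight: int


def parse_list_line(line):
    """Parse one 'IP | SERVICE | EVENTS | WEIGHT' line (assumed delimiter: '|')."""
    ip, service, events, weight = (field.strip() for field in line.split("|"))
    return FeedEntry(ip, service, int(events), int(weight))


# Made-up sample line using a documentation address, not real feed data.
entry = parse_list_line("192.0.2.10 | ssh | 412 | 30")
print(entry)
```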

(ATF) Aggregate Threat Feed


For my first post back into things in a while (a long while), I thought I would introduce everyone to the sexyness that I’ve called the Aggregate Threat Feed, or ATF for short. This feed is derived from threat data at work, namely our network edge IPS (a custom snort implementation, another post on that later) and aggregated firewall data from 250+ servers, mostly brute force/invasive scan attack addresses.

There really is nothing terribly fancy about this. The data is presented in a drop list format that is updated every 15 minutes, with an optional variable to adjust the number of addresses returned (defaults to 100):

The entries in the list are sorted on the database side by highest event count first. You can optionally view the source and event count for entries in the list, but this is considered strictly for review purposes (it wouldn’t be of much other use). Take note that the maximum value for ‘top’ is 300.
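The ‘top’ handling boils down to a sort and a capped slice. The sketch below is an assumption about how that could work rather than the server side code: the function name, the row layout and the choice to clamp out of range values (instead of rejecting them) are all mine.

```python
DEFAULT_TOP = 100
MAX_TOP = 300


def top_addresses(rows, top=DEFAULT_TOP):
    """rows: iterable of (ip, source, events). Return up to `top` IPs by event count."""
    # Clamping into 1..MAX_TOP is an assumption; the real feed may reject bad values.
    top = max(1, min(int(top), MAX_TOP))
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [ip for ip, _source, _events in ranked[:top]]


# 400 placeholder rows with event counts 0..399.
rows = [(f"10.0.{i // 256}.{i % 256}", "ips", i) for i in range(400)]
print(len(top_addresses(rows)))            # default of 100 entries
print(len(top_addresses(rows, top=1000)))  # capped at the maximum of 300
```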

The review data looks something like the following:

----------------------
ips 227
ips 202
fw 176
ips 130
fw 125

This is pretty basic to understand. The distinction to note, however, is that event numbers for IPS source data can be 50 events against 1 or 20 servers, whereas the event count for fw sourced data typically reflects unique servers. So an address sourced from fw data with 200 events actually hit 200 servers.

The next release of APF, due in the coming months, will feature many changes, among them the inclusion of ATF as part of the new feed subscription feature. Further, users will have the option to enable reporting to the server, which allows their own block data to be included in the ATF database. As more installations opt in to this feature, data aggregation will reflect a more global threat landscape that truly represents the users of APF (currently active installations based on those fetching the reserved.networks list daily: 46,921).

Also on the agenda is a simple ATF landing page that presents statistical data and some fancy graphs/charts (I will probably use the Google API because I’m lazy like that). It will allow users to better visualize threats included in the feed and see details on the actual events that caused an address to end up in it (snort events, firewall triggers, etc.).