Subtle “Bugs” in Perl Regex and LongTail Botnets

Wowza, I just learned a valuable lesson in Perl Regex.

For some reason my botnet analysis script was pulling IP addresses into big_botnet that had no visible connection to it (no matching md5sum).

As seen in a prior post, I gather IP addresses together, and then use the md5sums of the attacks to try to add more IP addresses to the botnet.

The code I was using was this:


$ip=$_ ; # $ip is a line from the botnet definition file
open (SUMDATA, "/var/www/html/honey/attacks/sum2.data");
while (<SUMDATA>){
$line=$_;
if (/$ip/){

And the file it is searching through looks like this:


a71ac83fc03ca530136e2adb4e175f48 62.4.9.24.shepherd.1-2015.02.04.12.57.09
ad53ba7b3ea9d177559bcea56fc44448 62.4.9.2.shepherd.1-2015.01.18.15.11.05
f06669ab489d368448e6238460ba060f 62.4.9.2.shepherd.3-2015.01.21.12.23.03
f06669ab489d368448e6238460ba060f 62.4.9.2.shepherd.4-2015.01.22.13.59.13
02d62d45952d93d1dab97aedb7443df5 43.229.53.25.edu_c.573-2015.08.17.04.38.58
7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.134.kippo2.28-2015.05.24.17.02.42
7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.137.edu_c.22-2015.05.24.20.02.27
7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.148.kippo2may.27-2015.05.24.17.02.55
7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.156.edub.16-2015.05.24.19.59.58
31fb7e10045de0476964c6af769d465a 62.4.9.2.shepherd.2-2015.01.19.14.50.50
62d7d6a8d9360c4bab7a7c46277b459e 62.4.9.24.shepherd.2-2015.02.08.01.29.47
7556cd86e6aa22a6b9f171fcf05687cb 62.4.9.2.shepherd.5-2015.01.23.12.40.52
7556cd86e6aa22a6b9f171fcf05687cb 62.4.9.2.shepherd.6-2015.01.30.01.39.15
11604da37fe8e63e252aa255a4119e05 62.4.9.24.shepherd.3-2015.02.10.07.56.53

See the problem?

Nope? OK, here it is. Searching for an IP address of 62.4.9.2 finds not only 62.4.9.2, but also 62.4.9.24.

AND

An unescaped “.” in a regex means “match any single character”, so the pattern also matches inside the MD5 checksums:


7fcba4c6bba56214f9c2473ab2b471f8
            ^^ ^ ^ ^

So this explains why my botnet script is pulling in weird hosts.
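
To make the failure concrete, here is a minimal sketch that runs the unescaped pattern against three lines copied from the excerpt above. The array and variable names are mine, not from the real script, and it searches an in-memory list instead of opening sum2.data.

```perl
use strict;
use warnings;

# Three lines copied from the sum2.data excerpt above.
my @lines = (
    'ad53ba7b3ea9d177559bcea56fc44448 62.4.9.2.shepherd.1-2015.01.18.15.11.05',
    'a71ac83fc03ca530136e2adb4e175f48 62.4.9.24.shepherd.1-2015.02.04.12.57.09',
    '7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.134.kippo2.28-2015.05.24.17.02.42',
);

my $ip = '62.4.9.2';

# Unescaped interpolation: every "." in $ip matches any single character.
my @matched = grep { /$ip/ } @lines;

# All three lines come back: the real hit, the 62.4.9.24 line,
# and the line whose md5sum contains "6214f9c2".
print "$_\n" for @matched;
```

Only the first line is a real hit for 62.4.9.2; the other two are the false positives described above.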

My code now looks like this:


if (/\Q$ip.\E/){

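The \Q ... \E pair works much like the “-F” (fixed strings) option in grep: everything between them, including the dots inside $ip, is matched literally. The trailing “.” (also literal, since it sits inside the \Q ... \E) makes sure that I don’t match extra digits in the last part of the IP address, so 62.4.9.2 no longer matches 62.4.9.24.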
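
Here is a small sketch of the fixed pattern against the same kind of data. The fourth input line is hypothetical (it is not in my real file) and is there only to show one remaining edge case: the fix pins down the right-hand side of the IP but not the left. The stricter variant assumes the IP always follows the single space after the md5sum, as it does in the excerpt above.

```perl
use strict;
use warnings;

my @lines = (
    'ad53ba7b3ea9d177559bcea56fc44448 62.4.9.2.shepherd.1-2015.01.18.15.11.05',
    'a71ac83fc03ca530136e2adb4e175f48 62.4.9.24.shepherd.1-2015.02.04.12.57.09',
    '7fcba4c6bba56214f9c2473ab2b471f8 43.229.52.134.kippo2.28-2015.05.24.17.02.42',
    # Hypothetical line, not from the real file, to show the left-hand edge case.
    '00000000000000000000000000000000 162.4.9.2.example.1-2015.01.01.00.00.00',
);

my $ip = '62.4.9.2';

# The fix from this post: the IP and the trailing dot are matched literally.
my @fixed = grep { /\Q$ip.\E/ } @lines;

# Stricter variant (an assumption about the file layout): also require the
# space between the md5sum and the IP, so "162.4.9.2." is not accepted.
my @strict = grep { / \Q$ip.\E/ } @lines;

printf "fixed:  %d line(s)\n", scalar @fixed;
printf "strict: %d line(s)\n", scalar @strict;
```

With the quoted pattern, the md5sum and the 62.4.9.24 line drop out. The hypothetical 162.4.9.2 line still gets through until the leading space is added.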

Now it’s time to start making botnets from scratch!