Kippo and LongTail

The batch integration of Kippo text logs is moving along, but has pointed out some annoying bugs in my rebuild code. The actual integration of the data into my log files is done (and mostly tested). The problem is that I have been creating lists of IP addresses, Usernames, and Passwords based on “ALL” the old data, and then using those files (passwords.all, for instance) to determine what passwords are new for that day.

Sooooo, I need to nuke those files, and then iterate through the days in order and recreate those files to get new passwords for that day. This is also important for my Trends analysis.

Thankfully I already had “rebuild” code in place, so I just need to do the analysis in that section of code.
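The day-by-day rebuild can be sketched as a loop over the days in date order. This is only a minimal sketch, not the actual LongTail rebuild code. The directory layout, the per-day file names (YYYY-MM-DD.passwords), the .new output files, and the function name rebuild_new_passwords are all assumptions made for illustration. Only the passwords.all name comes from the text above.

```shell
#!/bin/sh
# Sketch only (hypothetical layout): $1 is a directory holding one file per
# day named YYYY-MM-DD.passwords, with one password per line.
rebuild_new_passwords() {
    dir=$1
    all="$dir/passwords.all"
    : > "$all"                      # nuke the old cumulative list
    # ISO dates sort chronologically, so the glob walks the days in order.
    for day in "$dir"/*.passwords; do
        [ -e "$day" ] || continue
        new="${day%.passwords}.new"
        # Keep today's unique passwords that are NOT yet in the cumulative
        # list: -F fixed strings, -x whole-line match, -v invert the match.
        sort -u "$day" | grep -Fxv -f "$all" > "$new" || true
        # Only now does today's data become "old" data for the next day.
        cat "$new" >> "$all"
    done
}
```

The point of the ordering is that each day is compared only against the days before it, so a password counts as new on the first day it was ever seen.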

Easy, but non trivial (I hope).

Version 2.0 of LongTail will handle batch AND live feeds from Kippo.


sshPsycho Red, Yellow, and Green: How are they determined?

  1. Could you please tell how you identify IP address belonging to ‘yellow’ and ‘green’ types? Because we also have such honeypot, and we are interested in extracting ‘yellow’ and ‘green’ type IP addresses.

Short answer here:

RED attacks

I am identifying the yellow and green types by analyzing the incoming attack patterns for similarities to attacks that came from the 4 sshPsycho IP address ranges. These are the “Gold” standard. Just to be clear, those IP address ranges are the following four class C subnets: 103.41.124, 103.41.125, 43.255.190, 43.255.191. These are the Red portions of the bar chart.

YELLOW attacks

The yellow portions of the bar chart are what I am calling “Friends of sshPsycho”.  These are IP addresses that have used the exact same attack pattern for at least one attack as an attack that came from sshPsycho.

Another way of saying this is that I have analyzed the data from the attacks to create what I call an “Attack Pattern”.  I have then broken these attack patterns into individual files on my server.  I can then compare ALL of the attack patterns to easily determine which ones are exactly the same.  sshPsycho used the same exact attack patterns multiple times in their attacks.

So, if I see the same attack pattern from a different host, I know that whoever is running the attack from that server has received the tools and the dictionaries they are using from sshPsycho.  I am calling those IP addresses “Friends of sshPsycho” since I do not KNOW if they are the same people or not.  (My feeling is that they are the same group of hackers.)
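One way to find attack patterns that are exactly the same is to hash every pattern file and group the files by hash. This is a sketch under assumptions: the .pattern file extension, the one-file-per-attack layout, and the function name find_shared_patterns are hypothetical, and file names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch only: print one line per group of byte-identical pattern files.
# Assumes (hypothetically) that $1 is a directory of *.pattern files.
find_shared_patterns() {
    dir=$1
    # sha256sum prints "<hash>  <file>"; sorting puts equal hashes together.
    sha256sum "$dir"/*.pattern | sort |
    awk '{
            count[$1]++
            files[$1] = (files[$1] == "" ? $2 : files[$1] " " $2)
         }
         END {
            # Only hashes seen more than once are shared attack patterns.
            for (h in count) if (count[h] > 1) print files[h]
         }'
}
```

If one file in a printed group came from a Red address range and another came from somewhere else, the second address would be a “Friends of sshPsycho” candidate.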

GREEN attacks

Green attacks are what I call “Associates of sshPsycho”.  In English this implies a more distant relationship.  To clarify, in English the closeness of a relationship goes from Self -> Spouse -> Family -> Friends -> Associates.

I have analyzed the sshPsycho attacks and have determined several characteristics that only appear in their attacks.  I have also characterized their typical attack patterns.

Specifically, I have seen the passwords “XXXXXX” and “YYYYYY” come almost exclusively from sshPsycho. I have never seen those passwords come from any place besides China and Hong Kong, and they come mostly from sshPsycho IP addresses. (I am not specifically mentioning WHICH two passwords they are, just to make it a little harder for the bad guys to figure out which two passwords to stop using, but it’s also kind of obvious if you look at the data hard enough.)

When I see those passwords, AND it is an attack against ONLY root, I list them as “Associates”.  Coincidentally these have only come from China.
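The rule above (a marker password seen, AND every attempt aimed at root) can be sketched as a small awk filter. The input format (one “username password” pair per line) and the function name is_associate are assumptions for illustration, and the two marker passwords stay masked as XXXXXX and YYYYYY, exactly as in the text.

```shell
#!/bin/sh
# Sketch only: exit 0 if the attack on stdin looks like an "Associate".
# stdin: one attempt per line as "<username> <password>" (assumed format).
# $1 and $2: the two marker passwords.
is_associate() {
    awk -v p1="$1" -v p2="$2" '
        $1 != "root"          { nonroot = 1 }   # any non-root account disqualifies
        $2 == p1 || $2 == p2  { marker = 1 }    # at least one marker password seen
        END { if (marker && !nonroot) exit 0; exit 1 }'
}
```

Usage would look like `is_associate XXXXXX YYYYYY < attack.txt && echo green`, where attack.txt is a hypothetical file holding one attack.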

As sshPsycho was being blocked from their main servers, I saw the amount of blue go up in my charts significantly.  This indicates to me that they are still active but using new IP addresses to attack from.  By coloring this set of attacks as green, I was able to show this belief that they are somehow connected to sshPsycho.

BLUE attacks

There is still a higher than normal number of blue attacks.  Those are attacks from IP addresses that either belong to hackers other than sshPsycho, or have not yet shown they are associated with sshPsycho.

As these attacks continue I believe I will be able to move more of the blue IP addresses into the green or yellow sections of the chart.


That’s too hard to explain in a blog entry.  I am (slowly) working on a paper to discuss this issue.  I basically group all attacks against a single host that are closely connected in time and call that an attack pattern.  When there is too much of a gap between attacks, I close the attack pattern and start a new one.

Based on the number of exactly the same attack patterns that I have seen from sshPsycho from multiple IP addresses, I believe it works.
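The “close the pattern when the gap is too big” idea can be sketched in a few lines of awk. The input format (epoch seconds in the first column, already sorted by time, all from one attacker against one host), the function name split_attack_patterns, and the 180 second gap used in the example are all assumptions; the text does not say what gap is actually used.

```shell
#!/bin/sh
# Sketch only: prefix each attempt with an attack-pattern number.
# stdin: "<epoch_seconds> <username> <password>", sorted by time (assumed).
# $1: the largest gap, in seconds, allowed inside one attack pattern.
split_attack_patterns() {
    awk -v gap="$1" '
        # The first line, or a silence longer than the gap, starts a new pattern.
        NR == 1 || $1 - last > gap { pattern++ }
        { print pattern, $0; last = $1 }'
}

# Example with a hypothetical 180 second gap:
printf '100 root a\n110 root b\n900 root c\n' | split_attack_patterns 180
# prints:
# 1 100 root a
# 1 110 root b
# 2 900 root c
```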

Making a Dirt Simple SSH HoneyPot

So you want to run your own ssh honeypot but don’t know where to start? That’s OK, I didn’t either. What I did know was that in the battle between my paranoia, honed over many years, and my at times insatiable curiosity, curiosity won! But I still don’t want to let the script-kiddies onto my honeypot. So while I want to know what’s out there, I want to be as safe as possible.

One of the first rules of running a honeypot is “Start Simple” (as seen in Forensics Analysis of a Compromised Honeypot). This is so you can start getting data as soon as possible, without opening yourself up to the “Great Unknown” of hackers and script-kiddies. The more you open yourself up to gather data, the greater the chance that you might miss something and leave yourself wide-open to the script-kiddie with the magic “Silver Bullet”. Right now I’m interested in accounts, passwords, and the IP addresses they are coming from. This is a fairly standard starting point in the world of honeypots.

There seem to be four methodologies for running honeypots:

  1. Ridiculously high interaction: these are wide open servers, where it’s a real server with backdoors left open intentionally and monitored through the network to see what is being done to it. I’ve seen this on the web, but I can’t find a link now that I’m looking for one.

  2. High interaction servers where the server looks and acts like a real system, but is heavily monitored to make sure the hackers don’t get full control, and all their actions are logged. These are more “real” than Kippo, but have external firewalls blocking outbound attacks. Again, I’ve seen this on the web but can’t find a link now.

  3. Medium interaction servers that look like real servers but aren’t, and while they look like a live system, they don’t really do anything. This is like the Kippo honeypot.

  4. Low interaction servers, such as custom-built ssh honeypots, or OpenSSH code modified as described in hacking-sshd-for-a-pass_file.

For my purposes, I chose a low interaction server and modified the current OpenSSH code myself (openssh-6.7p1.tar.gz). I wanted the data, but I wasn’t willing to take a lot of risk getting it. What I basically did was to set my REAL sshd to run at an obscenely high port number, so I could still get in, and have my customized sshd running on port 22.

As a side note, while there are lots of instructions out there on doing this, I’m documenting how “I” did it, with my viewpoint on what I did, and how I did it, and I’m including some other details that they’ve left out.

Setting up my “REAL” sshd

To set up my real sshd, I edited /etc/ssh/sshd_config and set the following variables:

# Not the real port, but a really high port number
Port <BIGPORT>

#These are the defaults anyways, but I wanted to MAKE SURE!
SyslogFacility AUTH
LogLevel INFO

#This is the default anyways, but I wanted to MAKE SURE!
PermitRootLogin no

# Set so only I can login from a few IP addresses
AllowUsers myaccount@* myaccount@localhost

Then I did a

service sshd restart

and checked that I could still ssh into my server, and that it was properly logged into /var/log/secure.

Setting up my “FAKE” sshd

I chose to use the “real” OpenSSH instead of any of the “fake” ssh daemons because …

I downloaded the code into /usr/local/source/openssh.

I then edited auth-passwd.c to add #include "canohost.h" so I can get the IP address, added a logit line, and added a “return 0” so that it always thinks the password was wrong. Please note that I added the word PassLog in the logit line to make it easier for me to grep out the account/password lines from the log file.

The context diff is below.

[wedaa@localhost openssh-6.7p1-22]$ diff -c3 auth-passwd.c-hacked auth-passwd.c.orig
*** auth-passwd.c-hacked 2015-03-27 11:12:05.767799071 -0400
--- auth-passwd.c.orig 2014-07-18 00:11:25.000000000 -0400
*** 55,67 ****
#include "auth.h"
#include "auth-options.h"

- /* Added by ERICW so we can do IP lookups*/
- #include "canohost.h"

extern Buffer loginmsg;
extern ServerOptions options;

extern login_cap_t *lc;
--- 55,63 ----
*** 87,100 ****
struct passwd * pw = authctxt->pw;
int result, ok = authctxt->valid;

- /* ERICW ADDED logit */
- logit("IP: %s PassLog: Username: %s Password: %s", get_remote_ipaddr(), authctxt->user, password);

- /* ERICW ADDED return 0 so the password ALWAYS fails */
- return 0;

#if defined(USE_SHADOW) && defined(HAS_SHADOW_EXPIRE)
static int expire_checked = 0;
--- 83,88 ----
*** 226,229 ****
strcmp(encrypted_password, pw_password) == 0;

--- 214,216 ----

Then I did a ./configure, make, and a make install. By default OpenSSH installs under /usr/local (the daemon in /usr/local/sbin and its configuration in /usr/local/etc), so I wasn’t worried about killing my /sbin/sshd file. I then renamed /usr/local/sbin/sshd to /usr/local/sbin/sshd-22 and /usr/local/etc/sshd_config to /usr/local/etc/sshd_config_22. I did this so that I could also recompile the code and create another sshd_config file for listening on port 2222, and to make sure that the logging indicated which ssh daemon and port was being targeted.

I checked /usr/local/etc/sshd_config_22 to make sure it was still set to Port 22, and set the logging to the same as my real sshd.

SyslogFacility AUTH
LogLevel INFO

I also made sure to disable publickey logins (for the time being), by setting RSAAuthentication and PubkeyAuthentication to no.

RSAAuthentication no
PubkeyAuthentication no

As a side note, it took me a few tries to actually get the code to compile since I was missing some packages on my honeypot server. I had to add

yum install wget
yum groupinstall 'Development Tools'
yum install zlib
yum install zlib-devel
yum install openssl-devel libssh-devel

At this point it’s ready to go. I started it manually with

/usr/local/sbin/sshd-22 -f /usr/local/etc/sshd_config_22

and then did an ssh to port 22. The connection worked, but my account and password failed to log me in, as they should (remember I put the “return 0;” line into the code). I then checked my log to make sure it was there. Then I deleted the log so my password wasn’t in the log file anymore.
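Checking the log for the new entries can be done by pulling out the PassLog lines with grep and sed. This sketch assumes the log line ends with exactly the text produced by the logit format string in the diff above. The function name extract_passlog is hypothetical, and the sample log line in the example (timestamp, host name, process tag, address) is made up.

```shell
#!/bin/sh
# Sketch only: print "ip username password" for each PassLog line on stdin.
# Relies on the format "IP: %s PassLog: Username: %s Password: %s".
extract_passlog() {
    grep 'PassLog:' |
    sed -r 's/.*IP: ([^ ]+) PassLog: Username: ([^ ]*) Password: (.*)$/\1 \2 \3/'
}

# Made-up sample line, using an address from the documentation range:
echo 'Mar 27 11:15:01 honeypot sshd-22[1234]: IP: 192.0.2.10 PassLog: Username: root Password: admin123' |
    extract_passlog
# prints: 192.0.2.10 root admin123
```

Attackers control both the username and the password, so either one can contain spaces or other odd characters. A real parser would need to be more careful than this one.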

Since my system still supports the file /etc/rc.local, I added this line to the end of that file as well. That way when my server reboots, it starts up the honeypot sshd as well.

/usr/local/sbin/sshd-22 -f /usr/local/etc/sshd_config_22

Setting up iptables for my honeypot

Then I had to set up my firewall to allow ports 22, 2222, AND <BIGPORT> into my server.

# Accept inbound ssh
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2222 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport <BIGPORT> -j ACCEPT

Telling SELinux about the new ssh ports

You also need to run the following selinux commands if you have selinux installed.

semanage port -a -t ssh_port_t -p tcp 2222
semanage port -a -t ssh_port_t -p tcp <BIGPORT>
semanage port -l | grep ssh # Shows ssh ports

Setting up my router for my honeypot

Then I had to set up my router to allow ports 22, 2222, AND <BIGPORT> into my server.

Doing all this the easy way…

I have a script which will do most of this for you.

Lessons learned from running my honeypot

So… I’ve been running my ssh/http honeypot for almost a month now. While
I’m not really ready to release all my reports to the intar-webs yet,
there are some lessons that are readily apparent. And sadly, these
are the same lessons everyone else has been screaming about for the
last dozen years.

SSH lessons

  1. Don’t allow root to ssh into your server. Make sure your
    /etc/ssh/sshd_config has “PermitRootLogin no” set.

  2. Don’t use stupid passwords. Passwords like “password”,
    “admin”, and “123456” and their assorted variations are the
    top passwords that ssh brute force attacks try.

  3. Longer passwords are better than shorter passwords. Well over
    95% of the passwords tried are 8 characters or less.

  4. Don’t keep the default passwords for any software you install.
    Looking at Google for the passwords tried shows that many of them
    are default passwords for one piece of software or another.

  5. Don’t keep the default passwords for any hardware (including
    routers). They keep trying “admin” accounts with the password
    “admin” which was a default for older home routers.
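A figure like the “95% are 8 characters or less” number in lesson 3 can be checked against a list of attempted passwords with a few lines of awk. This is a sketch, not the report code behind the number above. The one-password-per-line input and the function name short_password_percent are assumptions.

```shell
#!/bin/sh
# Sketch only: percentage of passwords on stdin that are at most $1
# characters long (default 8). One password per line (assumed format).
short_password_percent() {
    awk -v max="${1:-8}" '
        { total++; if (length($0) <= max) short++ }
        END { if (total) printf "%.1f\n", 100 * short / total }'
}

printf 'admin\n123456\npassword\ncorrecthorsebattery\n' | short_password_percent 8
# prints: 75.0
```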

Webserver lessons

  1. Patch bash! They keep trying ShellShock attacks against my
    honeypot, so they must be succeeding enough of the time to make
    it worth their while.

  2. Patch PHP! The second most common attack is against old PHP.

  3. Don’t install things into their default directories on the
    webserver. Most attacks are against default scripts that get
    put into the cgi-bin directory. Even renaming cgi-bin to CGI-BIN
    and changing your httpd.conf file to reflect that change eliminates
    more than 95% of the attacks. Close after that are phpMyAdmin,
    webtools, ccbill, cgibin (no dash), and /mail. Rename those
    directories and even if you’re vulnerable, they probably won’t
    find you quickly.

What are they trying to run?

  1. Number one thing they are trying to run is the Atrix IRC worm.
    This is an IRC bot that lets them attack other servers, AND can
    give them the ability to run commands on your server as whatever
    UID is running httpd.

  2. bash. yes, bash. There’s an option in bash to open a network
    connection to the outside world. This lets them telnet to whatever
    port they decided to use and have a bash shell on your server to
    run whatever they want to.

  3. ssh brute force scripts. These try to login to other servers
    and then report the successful attempts back to another server.

  4. ONE instance of a rootkit. Why? I have no idea. I assume
    the other attacks are precursors to downloading and running a
    rootkit on your server.

IP Address Obfuscation and modification with sed

I’m working on a logfile analyzer for a honeypot. One of the things
I’m interested in is copying the report files to a public website so
that others can see it too.

But, there are some potential privacy issues involved. Since I’m doing
an analysis on where the attacks are coming from, and reporting on them,
I don’t think I want to share the exact IP address that the ssh probes
came from. So how do I do that? Well, I could use Perl, but the analyzer
is a “Big-Ass Shell Script” so I want to minimize how often I run Perl.
The IP addresses are hidden in other lines of text so I can’t use Awk (that
would be too simple). So I have to use Sed.

For the record, I’m running this on Linux, Fedora Core 20 to be exact.
Thinking this would be easy was a mistake. It took me almost an hour
of mucking about before I had a working sed expression. Once I figured
out I NEEDED to use the “-r” option (which is “use extended regular
expressions in the script”) then things finally started falling into
place.

The following sed expression returns the IP address just as it came in.

echo |\
sed -r 's/([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})/\1\2\3\4/'

And the output is

And THIS sed expression replaces the second octet with the word “FOO”.
Please note the “.” after the word “FOO”. That’s part of what gets
substituted in.

echo |\
sed -r 's/([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})/\1FOO.\3\4/'

And the output is


And in this example, I am reversing the IP address.

echo |\
sed -r 's/([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})/\4\3\2\1/'

And the output is (Please note that we have a trailing “.” at the end of the
IP address. I leave removing the trailing “.” as an exercise for the reader.)
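Here is a runnable sketch of the same idea wrapped in two small functions. The sample address 192.0.2.10 comes from the documentation range and is not from my logs, and the function names replace_second_octet and reverse_octets are made up for this example. The second function is one possible answer to the exercise: it keeps the dots OUT of the remembered patterns and writes them back in explicitly, so no trailing “.” is left over.

```shell
#!/bin/sh
# Sketch only: both functions read text on stdin and rewrite the first
# thing on each line that looks like a dotted-quad IP address.

# Replace the second octet with the word FOO (same expression as above).
replace_second_octet() {
    sed -r 's/([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3}\.)([0-9]{1,3})/\1FOO.\3\4/'
}

# Reverse the octets. The dots are outside the groups here.
reverse_octets() {
    sed -r 's/([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/\4.\3.\2.\1/'
}

echo 'probe from 192.0.2.10 port 22' | replace_second_octet
# prints: probe from 192.FOO.2.10 port 22

echo '192.0.2.10' | reverse_octets
# prints: 10.2.0.192
```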

Soooooo, What’s the sed expression really doing? Let’s break it apart
into different lines so it’s easier to understand.

Start sed using extended regular expressions

sed -r

Single quote to start the expression, and “s” says to do a substitution.

's

The start of the search expression.

/

This is the first “remembered” pattern. The open parenthesis and the
close parenthesis mark the start and end of the remembered pattern.
The “[0-9]” means any one character between 0 and 9. The “{1,3}”
means the PRIOR pattern 1, 2, or 3 times only. This means “x” doesn’t
match, but “1”, “11”, and “111” match. The “\.” means literally a single
period. It’s backslashed to mean a period. Without the backslash, a
single period means “match any single character”.

([0-9]{1,3}\.)

This is the second “remembered” pattern.

([0-9]{1,3}\.)

This is the third “remembered” pattern.

([0-9]{1,3}\.)

This is the fourth “remembered” pattern. Please note there is NO trailing
“.” character.

([0-9]{1,3})

Print the first “remembered” pattern. (The leading slash ends the search
expression and starts the replacement.)

/\1

Print the second “remembered” pattern.

\2

Print the third “remembered” pattern.

\3

Print the fourth “remembered” pattern.

\4

And finally, a final slash and a single quote to show the end of
the sed expression.

/'

So what can we do with this? In the expression, instead of printing
all four remembered patterns, we can print other things by replacing
the “\#” with something else. So instead of “\1\2\3\4”, we could have
“\1\2\3127” which would print out the first three octets followed by
“127”. Patterns are SINGLE digits (1 through 9), so \3127 doesn’t mean
the 3,127th pattern, but means print the third pattern (\3), followed by
the other text.

What are the problems with this expression? Well, it doesn’t explicitly
deal with true IP addresses. A true IP address goes from 0.0.0.0 to 255.255.255.255. This pattern I made goes from 0.0.0.0 to 999.999.999.999.
For what I need to do, this is close enough.
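If “close enough” ever stops being close enough, each octet can be limited to 0 through 255 with a longer alternation. This is a sketch, not something the analyzer uses. The variable name octet and the function name is_ipv4 are made up for this example.

```shell
#!/bin/sh
# Sketch only: a strict octet is 250-255, 200-249, 100-199, or 0-99.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Exit 0 if $1 is a dotted quad with every octet in the range 0..255.
# grep: -E extended regex, -q quiet, -x the whole line must match.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eqx "$octet\.$octet\.$octet\.$octet"
}

is_ipv4 192.0.2.10      && echo "192.0.2.10 is valid"
is_ipv4 999.999.999.999 || echo "999.999.999.999 is rejected"
```

The trade-off is readability: the strict version is about four times as long, which is why the loose {1,3} pattern is often good enough for log scrubbing.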

Now, I need to obfuscate URLs. The same deal applies.

echo "" | sed -r 's/(http:\/\/..+)(.+)/http:\/\/HIDDEN\/\2/'



And I did it again with FTP.

echo "" | sed -r 's/(ftp:\/\/..+)(.+)/ftp:\/\/HIDDEN\/\2/'
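A variation that hides only the host name and keeps the path can be sketched as follows. Be aware that a greedy “.+” in the first group swallows everything except the last character of the line, so this sketch uses “[^/ ]+” (anything up to the next slash or space) instead. The function name hide_url_host and the example.com URLs are made up, handling https as well is an addition, and “#” is used as the sed delimiter so the slashes don’t need backslashes.

```shell
#!/bin/sh
# Sketch only: replace the host part of every http, https, or ftp URL
# on stdin with the word HIDDEN, leaving the path alone.
hide_url_host() {
    sed -r 's#(https?|ftp)://[^/ ]+#\1://HIDDEN#g'
}

echo 'wget http://example.com/cgi-bin/test.cgi' | hide_url_host
# prints: wget http://HIDDEN/cgi-bin/test.cgi

echo 'ftp://example.org/pub/file.tar.gz' | hide_url_host
# prints: ftp://HIDDEN/pub/file.tar.gz
```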



Thanks to which was a great
help in remembering how to use sed.