Adventures of an SSH Honeypot
In order to gather data about how SSH servers are attacked, I decided to build my own SSH honeypot.
COVID-19 has caused a major shift in how enterprises perform their day-to-day operations, with a large pool of them switching to remote work. Some enterprises have gone further and made remote work an acceptable standard should the employee desire it. Naturally, this has caused an uptick in phishing emails and log-in difficulties.
When I read in April that the World Health Organization’s staff members’ emails were leaked, it slowly dawned on me how ripe this situation is for exploitation. How are malicious actors taking advantage of this situation - and to what extent?
That question proved to be too broad, due to the presence of phishing emails, malware, security misconfigurations and even disinformation. I needed to narrow it down. So, I decided to focus on just one protocol: SSH.
Introduction
Here’s an extremely simplified version of what the protocol is and how it authenticates users. Secure Shell (SSH) is an application layer protocol designed to provide remote log-in capabilities to users and administrators, so that they can work on machines spread throughout their workplaces or even across continents.
Now, SSH authentication essentially comes down to three methods (the specification defines a few more, such as host-based authentication, which I’ll skip here):
- None
- Password
- Public Key
The ‘None’ authentication method does exactly what it says - no authentication is required to gain access. Simply type in the username and IP address, and if the username exists - you’re in!
The ‘Password’ authentication method goes one step further. It will ask for the above two requirements and then prompt you for the password. If it is a match - the system will grant you access.
Finally, the ‘Public Key’ authentication method uses public-key cryptography to authenticate the user. Here, instead of a password, you prove that you hold the private key associated with your account: your client signs the authentication request with the private key, which itself never leaves your machine. If the signature verifies against the public key that is stored on the server, the server will grant the user access to the system.
Implementation
As a student, I do not have access to production servers and databases. Thus, in order to gather data about how SSH servers are attacked, I decided to build my own.
Using the Python standard library and the paramiko module, I was able to create a simple SSH server that authenticated users using passwords and public keys.
I chose to leave out the ‘None’ authentication method, as knowing a random IP address alone does not prove that you are a user of the system in question. Security through obscurity is a valid layer, but in the case of the ‘None’ authentication method it would end up being the core security mechanism - which would be catastrophic.
Keeping this in mind, I have focused on using the ‘Password’ and ‘Public Key’ authentication methods. In addition to logging which methods are used, I have also logged the following parameters: IP address, usernames, passwords, and SSH clients used for authentication.
In addition to paramiko's own logging utility, I have also used the logging module, which is part of the Python standard library. Finally, I have used Numbers for visualising the data.
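As a rough illustration of the logging described above, here is a minimal sketch that writes one authentication attempt per line as JSON using the standard logging module. It is not the code from the repository: the logger name, the log_attempt helper and the field names are all hypothetical, and in a paramiko-based server the values would typically be gathered in callbacks such as ServerInterface.check_auth_password.

```python
import json
import logging

# Hypothetical logger name; the actual script may be organised differently.
logger = logging.getLogger("honeypot.auth")


def log_attempt(ip, username, method, secret, client_version):
    """Emit one authentication attempt as a single JSON line."""
    record = {
        "ip": ip,
        "username": username,
        "method": method,          # "password" or "publickey"
        "secret": secret,          # the password tried, or a key fingerprint
        "client": client_version,  # the client's SSH version banner
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


if __name__ == "__main__":
    # Log to stderr here; a real deployment would more likely log to a file.
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # Illustrative values only (203.0.113.0/24 is a documentation range).
    log_attempt("203.0.113.7", "admin", "password", "admin", "SSH-2.0-libssh_0.9.5")
```

Keeping each attempt on its own JSON line makes the log easy to load into a spreadsheet or a script later on.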
Note: If you are deploying this in the cloud, please be aware that costs can rise dramatically. For this project, I used Amazon Web Services (AWS) to deploy the honeypot. To avoid overstepping the free tier limits, the script has its own fail-safe measure to shut down automatically after a preset amount of time has passed.
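One possible way to build such a fail-safe is a timer that flips a stop flag which the server loop checks between connections. The sketch below assumes that design; arm_failsafe, serve_until_stopped and the handle_one callback are hypothetical names, not the ones used in the actual script.

```python
import threading


def arm_failsafe(max_runtime_seconds, stop_event):
    """Set stop_event once max_runtime_seconds have elapsed."""
    timer = threading.Timer(max_runtime_seconds, stop_event.set)
    timer.daemon = True  # never keep the process alive just for the timer
    timer.start()
    return timer


def serve_until_stopped(stop_event, handle_one, poll_interval=1.0):
    """Call handle_one() repeatedly until stop_event is set."""
    handled = 0
    while not stop_event.is_set():
        handle_one()
        handled += 1
        # Sleep, but wake up early if the fail-safe fires in the meantime.
        stop_event.wait(poll_interval)
    return handled
```

A script would create a threading.Event, call arm_failsafe with its runtime budget (say, six hours), and pass the same event to serve_until_stopped together with whatever function accepts a single connection.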
Analysis
First of all, let’s talk about usernames. The usernames I saw were in line with the most common usernames present in SecLists; the graph below shows the top 10 most common usernames I found in my honeypot.
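A top-10 list like this one can be computed with collections.Counter. The sketch below assumes each logged attempt is a dict with a "username" key, which is a hypothetical record layout, and the sample data is made up for illustration.

```python
from collections import Counter


def top_usernames(attempts, n=10):
    """Return the n most frequently tried usernames with their counts."""
    counts = Counter(attempt["username"] for attempt in attempts)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample records, not real honeypot data.
    sample = [
        {"username": "root"},
        {"username": "admin"},
        {"username": "root"},
        {"username": "cirros"},
        {"username": "root"},
        {"username": "admin"},
    ]
    for name, count in top_usernames(sample, n=3):
        print(f"{name}: {count}")
```

The same function works for passwords by counting a different key.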
As a consumer, one thing that you are assured of (yet have no control over) is security. When data breaches occur and customer data is sold, think about how easy it would be to correlate your usernames to your real identity - what story would the data tell? Thus, when it comes to usernames, I suggest that unless there is a need to publicly declare your identity, you use a random username. LastPass provides a random username generator that you can check out here.
In my username data, an interesting find was 'cirros', which after a quick Google search turns out to be a "minimal Linux distribution that was designed for use as a test image on clouds such as OpenStack Compute".
Nice.
Next up, we'll take a look at passwords.
Again, passwords were found to be in line with SecLists. It looks like admin is still common enough to be exploited, which is a bit disheartening to see. This has served as yet another reminder about the necessity of strong passwords (regardless of your password management process). Another piece of advice is something straight out of the 1Password playbook - who says security questions have to be answered with words? Use long passwords instead! This way, resetting a password becomes just as tough as cracking one - and the attack no longer scales for threat actors.
Two passwords stood out, namely ‘gocubsgo’ and ‘cubs win:)’. Guess somebody is a baseball fan, although there are better ways to support your favorite team!
The next parameter we'll look at is the most interesting: IPs!
134 unique IPs were observed from China, 46 from the U.S., 22 from the Netherlands, 19 from France and 11 from Russia. This falls in line with reported behavior, although I noticed that none of the connections were bombarding the server with requests - even when successive connections were attempted, they maintained a 2-minute gap between each attempt.
This is probably to slip past basic firewall rate-limiting rules, which means that both security teams and threat actors are aware of how firewall configurations can be tuned - an endeavor that I believe will help us secure systems over the long run.
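The spacing between attempts can be checked by grouping timestamps per IP and measuring the gaps between consecutive ones. This is a minimal sketch that assumes attempts are available as (ip, datetime) pairs; gaps_by_ip is a hypothetical helper and the timestamps are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def gaps_by_ip(attempts):
    """Map each IP to the gaps, in seconds, between its successive attempts."""
    times = defaultdict(list)
    for ip, timestamp in attempts:
        times[ip].append(timestamp)

    gaps = {}
    for ip, stamps in times.items():
        stamps.sort()
        gaps[ip] = [
            (later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps


if __name__ == "__main__":
    # Invented timestamps; the IPs come from documentation ranges.
    sample = [
        ("203.0.113.7", datetime(2020, 6, 1, 10, 0, 0)),
        ("203.0.113.7", datetime(2020, 6, 1, 10, 2, 0)),
        ("198.51.100.9", datetime(2020, 6, 1, 10, 1, 0)),
        ("203.0.113.7", datetime(2020, 6, 1, 10, 4, 5)),
    ]
    for ip, gaps in gaps_by_ip(sample).items():
        print(ip, gaps)
```

An IP whose gaps never drop below roughly 120 seconds matches the pattern described above.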
However, connections by themselves do not indicate malevolence. To gain a better understanding of malicious behavior, I used abuseipdb.com's API to verify maliciousness.
On fetching their data, I found that all of these IPs - the 134 from China, 46 from the U.S., 22 from the Netherlands, 19 from France and 11 from Russia - were marked as malicious with an abuseConfidenceScore of 100 (out of 100). Yikes.
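For readers who want to repeat the lookup, here is a minimal sketch using only the standard library. The endpoint URL, the Key header and the ipAddress and maxAgeInDays parameters reflect my understanding of AbuseIPDB's v2 "check" endpoint and should be verified against their documentation; the function names are hypothetical, and this is not necessarily how the original script queried the API.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of AbuseIPDB's v2 API; confirm against the official docs.
API_URL = "https://api.abuseipdb.com/api/v2/check"


def build_check_request(ip, api_key, max_age_days=90):
    """Build (but do not send) a request asking about a single IP."""
    query = urllib.parse.urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )


def confidence_score(response_body):
    """Extract abuseConfidenceScore from a JSON response body."""
    return json.loads(response_body)["data"]["abuseConfidenceScore"]


def check_ip(ip, api_key):
    """Send the request and return the score (needs network and a valid key)."""
    request = build_check_request(ip, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return confidence_score(response.read())
```

Splitting request building and response parsing from the network call keeps both halves testable without an API key.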
However, I'd like to state one thing very clearly: the region in which an IP is registered does not imply that the malicious actor resides in that region as well. Attribution is tough, and relying solely on IP addresses to determine nefarious activity is a poor practice.
Lastly, let's take a look at the SSH clients that were used to connect to the server.
It was unsurprising to see that PuTTY clients actually beat libssh/OpenSSH clients, as most host systems tend to be Windows machines. libssh and OpenSSH did not lag far behind, though, and libssh seems to be more popular than OpenSSH, with 284 connections coming from libssh as opposed to 26 from OpenSSH. Also, the Go programming language seems to be picking up traction in a variety of areas, including networking, as 21 connections were made using a Go SSH client.
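Client counts like these can be derived from the identification string each SSH client sends when it connects, which has the form SSH-protoversion-softwareversion followed by optional comments. The sketch below groups such strings into coarse families; client_family and the family list are hypothetical, and the sample banners are illustrative rather than taken from the dataset.

```python
from collections import Counter

# Checked in order, so "libssh2" must come before "libssh".
FAMILIES = ("putty", "libssh2", "libssh", "openssh")


def client_family(banner):
    """Reduce an SSH identification string to a coarse client family."""
    parts = banner.strip().split("-", 2)
    if len(parts) < 3 or parts[0] != "SSH":
        return "unknown"
    # Drop any trailing comments, then compare case-insensitively.
    software = parts[2].split(" ", 1)[0].lower()
    if software == "go":
        return "go"
    for family in FAMILIES:
        if software.startswith(family):
            return family
    return "other"


if __name__ == "__main__":
    # Illustrative banners only.
    banners = [
        "SSH-2.0-PuTTY_Release_0.70",
        "SSH-2.0-libssh_0.9.5",
        "SSH-2.0-OpenSSH_7.4p1 Debian-10",
        "SSH-2.0-Go",
    ]
    print(Counter(client_family(b) for b in banners))
```

Keep in mind that the banner is supplied by the client, so it can be spoofed and should be read as a hint rather than proof of the software in use.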
Transparency
In the hope that my efforts will be useful in some manner, I have uploaded all of the code to GitHub. You can fetch the dataset from there as well. I will update the data to the extent that Amazon’s free tier allows me to - so expect monthly updates! If you have any constructive criticism about how I could have improved this project, please feel free to message me on Twitter or issue a pull request on GitHub.
I’d love to hear your ideas on how to expand this!
Conclusion
Researching honeypots, building a small one myself and deploying it proved to be a very fulfilling task. The pandemic has certainly shown me that you can make a difference even if you are staying indoors (and even if you are just a student) - if business operations aren't stopping, neither should our learning!
P.S. I am aware of the fact that I could have used a customized configuration of sshd - but where’s the fun in that? 😁