Leveraging Breach Data for OSINT
Unearth the secrets of leveraging data breaches for OSINT investigations in this comprehensive cybersecurity article, packed with insights on real-world implications and essential strategies for staying vigilant in the constantly shifting digital landscape.
Breach Data
According to the Australian Government, "A data breach happens when personal information is accessed, disclosed without authorization, or is lost." Once that data is out in the open, either sold for a profit or just shared online, that data is then known as "breach data." Not all data breaches contain the same type of data.
Some data breaches would only have an email and a password hash, while others would have IP addresses, last login times, plain-text passwords, etc. It depends on the breach and the industry of the company that was breached. For example, if you look at healthcare companies whose data has been exposed, you are more likely to find Social Security numbers in the data. However, this may not be the case if an online gaming company has its data breached.
A common misconception about breach data is that all of it results from threat actors. That is not the case. Systems can be misconfigured, allowing anybody with internet access to view private files. Companies themselves can also send data to the wrong parties, leading to a data breach.
Now that I have given a bit of background on what data breaches are, I will go into detail about how to leverage breach data to learn more about a target/subject while doing an OSINT investigation.
Leveraging Breaches
I use two websites that can tell you if an email address has been part of a data breach: https://haveibeenpwned.com/ and https://dehashed.com/. I like haveibeenpwned (HIBP) because you do not have to log in to use the search feature, so I will be using HIBP for the rest of this tutorial. To use either site, you will need the email address of the target/subject. I get this by leveraging people-search websites. I start with only a first name, last name, and city, then pivot to as much information as possible.
From just one website, I was able to get their address, potential phone numbers, potential associates, and email addresses. Now that we have the email address, we can use HIBP to see if this email was part of any breaches:
We can now see that the user was involved in two data breaches. If you search online for the StockX breach, you will likely come across a website called breached.co (currently). It emerged as the successor to "RaidForums." This website is a marketplace where people buy and sell data breaches. The following is the listing for the StockX breach:
The website generates revenue by charging users to download data breaches. I have never paid for a data breach nor do I plan to do so in the future. There are ways to search on Google, DuckDuckGo, and other search engines where you can find breaches for free, whether on pastebin-like or shady websites. This technique is known as Dorking (e.g., Google Dorking). For StockX, I managed to locate the data on a shady website:
Neither I nor the Secjuice team vouch for any of these websites. There is always a risk of these files being malicious and potentially containing trojans. Please practice proper safety and security measures while browsing online.
With the breach downloaded, we now search for the email we have for the target/subject:
In the previous image, I have concealed some data because it contains someone's personal information. One of the fields was a hash of the password.
A hash is the output of a hashing algorithm run on data. Hashes are commonly used for password storage: the website takes your password, runs it through the algorithm, and saves the resulting hash with your account. Every time you try to log in afterward, the website runs the text you enter through the same algorithm and compares the output to the stored hash. If they match, you can log in. Two popular algorithms, MD5 and SHA-1, are considered "weak" for password storage because they are fast to compute, which makes them easy to brute force.
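The store-and-compare flow described above can be sketched in a few lines of Python with the standard library's hashlib. This is only an illustration: the function names and the sample password are made up, and it assumes a bare, unsalted MD5 like the one found in many older breaches. The last helper hints at why fast, unsalted hashes are weak, since every guess can be hashed and compared very cheaply.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the hex MD5 digest of a password (unsalted, as in many old breaches)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def verify(attempt: str, stored_hash: str) -> bool:
    """Hash the login attempt and compare it with the hash saved at signup."""
    return md5_hex(attempt) == stored_hash.lower()


def dictionary_lookup(stored_hash, candidates):
    """Return the first candidate whose MD5 matches, or None (hypothetical helper)."""
    for candidate in candidates:
        if verify(candidate, stored_hash):
            return candidate
    return None


stored = md5_hex("password")  # what the site would save at signup
print(stored)
print(verify("password", stored))
print(dictionary_lookup(stored, ["letmein", "password"]))
```

Tools such as hashcat and John the Ripper do the same kind of guessing, only far faster and with much larger wordlists and rule sets.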
I used an online website, hashes.com, to check if they had cracked this hash before:
Since the password was stored using a weak hashing algorithm, MD5, the website easily cracked it. You could always use hashcat (https://hashcat.net/hashcat/) or John the Ripper (https://www.openwall.com/john/) to crack these hashes locally. However, be aware that these processes can be CPU/GPU intensive. I now have the password. You can search for that password in other breaches to see whether the user has different email addresses or usernames. In this instance, the StockX breach revealed their account username and a family member's name associated with the account. I used this information to gather more details. I had no luck with the account name, so I proceeded with the family member's name:
I was able to find two more accounts belonging to the same person. In addition, I found new usernames, email addresses, and password hashes for this person. I entered all the hashes I had found so far into the same website mentioned earlier to see if the website had already cracked these hashes and stored them in their cache. As it turns out, they had:
Three of the passwords shared a common word at the beginning, but just had the end tweaked a bit. I now have more proof that these accounts belong to the same person.
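One way to make the "common word" observation less subjective is to compute the shared prefix of the recovered passwords. The sketch below uses os.path.commonprefix, which, despite its name, compares any list of strings character by character. The sample passwords are invented placeholders rather than data from the breach, and the min_length threshold is an arbitrary assumption to filter out trivial overlaps.

```python
from os.path import commonprefix


def shared_prefix(passwords, min_length=4):
    """Return the common leading substring if it is at least min_length long, else ""."""
    prefix = commonprefix(list(passwords))
    return prefix if len(prefix) >= min_length else ""


# Invented examples of one base word with tweaked endings
samples = ["Sunshine1", "Sunshine22", "Sunshine!"]
print(shared_prefix(samples))
```

A long shared prefix is supporting evidence, not proof. Common base words can show up across unrelated people.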
This is only the data I found from one breach. You can keep going down the rabbit hole by using this information to search other breach data. In addition, with the new emails discovered, you can input them into HIBP and check if they were part of other breaches as well. I wanted to show a brief overview of how breach data can be leveraged to uncover new information. Eventually, you will exhaust all the information you can gather about a person using people search websites. When that happens, I believe the next step is to utilize breach data for further insight into the target or subject.
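Checking a batch of newly discovered emails by hand gets tedious, so here is a rough sketch of doing it against the HIBP API. To the best of my knowledge, unlike the website search, the v3 breachedaccount endpoint requires an API key sent in a hibp-api-key header, along with a user-agent. Verify this against the official API documentation before relying on it. The key, the user-agent string, and the function names are placeholders chosen for this example.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a lookup request for one email address."""
    account = urllib.parse.quote(email.strip(), safe="")
    url = API_ROOT + account + "?truncateResponse=true"
    headers = {
        "hibp-api-key": api_key,             # placeholder; supply your own key
        "user-agent": "osint-notes-sketch",  # arbitrary name chosen for this example
    }
    return urllib.request.Request(url, headers=headers)


def breach_names(response_body: str) -> list:
    """Extract breach names from the JSON array the API returns."""
    return [entry["Name"] for entry in json.loads(response_body)]


def lookup(email: str, api_key: str) -> list:
    """Send the request; a 404 is treated as 'no breaches found'."""
    try:
        with urllib.request.urlopen(build_request(email, api_key), timeout=30) as resp:
            return breach_names(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise
```

If you loop over several addresses with lookup, pause between calls, since the API is rate limited.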
Creating your own Breach Collection
I learned about being proactive with privacy from Michael Bazzell. His books describe tasks you can complete ahead of breaches and leaks to lessen the impact when your data eventually ends up in one. I use the same logic for data breaches. There is nothing wrong with searching for a data breach, downloading it, and then using it for your research. However, hunting down and downloading data breaches takes time. That is why I suggest being proactive and building your own collection: if the target or subject's data is in one of the breaches you already have, you save that time. I recommend starting with the breaches listed in the "Largest breaches" section of HIBP:
I believe these are useful due to their size. There will always be niche breaches that your target/subject is a part of, but the largest breaches give you a good starting point for your investigations. Make sure to back up your collection, and keep in mind that it will take up more and more space as it grows. I highly recommend the 3-2-1 backup strategy from Seagate at: https://www.seagate.com/blog/what-is-a-3-2-1-backup-strategy/. This will make sure that you are still able to function even if you lose one copy of your breach collection.
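Once you have a local collection, searching it can be as simple as scanning every file for the email address. The following is a minimal sketch that assumes the breaches are stored as plain-text files (CSV, SQL dumps, and so on) under one folder. The function name and folder layout are my own assumptions, not a standard tool.

```python
from pathlib import Path


def search_collection(root, needle):
    """Yield (file, line_number, line) for each line containing needle, ignoring case."""
    needle = needle.lower()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going when a dump has odd encodings
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path, number, line.rstrip("\r\n")
```

Calling list(search_collection("breaches", "user@example.com")) would return every matching line along with the file it came from, which tells you which breach to dig into. Reading line by line keeps memory use low even for very large dumps, though a dedicated tool such as ripgrep will be faster on big collections.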
Hope this helps with your future OSINT research!