Identifying Phishing Pages with Shodan
Methodology
In an effort to proactively identify phishing pages on the internet, I took to Shodan and started crafting queries to identify pages targeting Microsoft and Google. As with everything else in security, there is no silver bullet. That is to say, there is no single search that identifies them all, so I will be showcasing various searches.
As for how the searches were crafted, I looked at the official login pages and created queries that look for slight variations of the official one.
The images below are from login.microsoftonline.com at the time of writing. There are a few things we can key off of to identify phishing pages:
- “Login” and “Log in” do not appear on the official page, but both are common on login pages
- “Microsoft” is not in the page title
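The two observations above can be turned into a quick local check on a page title. This is only a sketch: the function name and the keyword lists are my own assumptions for illustration, and the second example title is just one written in the “Sign in” style of the official page.

```python
import re

# Assumed keyword lists for illustration; tune them to the brand you care about.
BRAND_WORDS = ("microsoft",)
LOGIN_WORDS = ("login", "log in")


def title_is_suspicious(title: str) -> bool:
    """Flag titles that name the brand AND say "login" / "log in".

    The official page does neither, so a title doing both is worth a look.
    """
    normalized = re.sub(r"\s+", " ", title).strip().lower()
    has_brand = any(word in normalized for word in BRAND_WORDS)
    has_login = any(word in normalized for word in LOGIN_WORDS)
    return has_brand and has_login


print(title_is_suspicious("Microsoft Office Login"))   # True
print(title_is_suspicious("Sign in to your account"))  # False
```

A match here is a lead, not a verdict; every hit still needs to be confirmed some other way.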
Breaking Down the Searches
Wrong Page Title (Microsoft)
http.title:Microsoft http.title:Login port:80,443 "200 OK" -http.html:"with Microsoft"
- http.title:Microsoft http.title:Login looks for any website with both words (Microsoft and Login) anywhere in the title
- port:80,443 limits the results to websites hosted on the standard HTTP(S) ports
- "200 OK" attempts to limit the results to valid landing pages by filtering out redirects and error pages
- -http.html:"with Microsoft" attempts to exclude websites that use Microsoft for their authentication (such as a “Sign in with Microsoft” button)
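Since the same pieces get reused for other brands, it can help to assemble the query from its parts. The helper below is a hypothetical convenience of my own (the name build_query and its parameters are not part of Shodan); it only builds the search string and does not contact Shodan.

```python
def build_query(title_terms, required_text=None, excluded_html=None, ports=(80, 443)):
    """Assemble a Shodan search string from the parts described above."""
    # One http.title filter per term; all of them must match.
    parts = [f"http.title:{term}" for term in title_terms]
    # Restrict to the given ports, in the order supplied.
    parts.append("port:" + ",".join(str(port) for port in ports))
    # Quoted free text that must appear in the banner, e.g. "200 OK".
    if required_text:
        parts.append(f'"{required_text}"')
    # Negated HTML filter to drop legitimate third-party login buttons.
    if excluded_html:
        parts.append(f'-http.html:"{excluded_html}"')
    return " ".join(parts)


print(build_query(["Microsoft", "Login"], "200 OK", "with Microsoft"))
# http.title:Microsoft http.title:Login port:80,443 "200 OK" -http.html:"with Microsoft"
```

Note the straight double quotes: curly quotes pasted from a word processor or web page are not the same characters.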
We see IP 52.14.178[.]143 with a few notable features:
- Wrong site title
- The domain tries to look related to Office365 despite being under ddns[.]net
Putting this IP in URLScan confirms our suspicions. I checked the subdomains Shodan listed using https://centralops.net/co/DomainDossier.aspx; at the time of analysis, none of them still existed.
On the second page of the results, we see another login page with the same title and a domain of wwwmtb3[.]com
I confirmed the IP as a phishing page using URLScan:
I also confirmed that its domain, as seen in Shodan, is still live and is a phishing domain.
Another IP found with the same title and confirmed the same way (not showing to avoid repetition): 40.115.92[.]247
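The indicators in this section are written “defanged” (with [.] in place of a dot) so nobody clicks them by accident. Before pasting one into a lookup tool it has to be converted back. Below is a small sketch of both directions; the function names are my own, and defang only brackets the last dot to match the style used in this post.

```python
def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 192.0.2[.]10 back into usable form."""
    return indicator.replace("[.]", ".")


def defang(indicator: str) -> str:
    """Bracket the last dot of an IP or domain so it is not clickable."""
    if "[.]" in indicator:
        return indicator  # already defanged, leave it alone
    head, sep, tail = indicator.rpartition(".")
    return f"{head}[.]{tail}" if sep else indicator


print(defang("192.0.2.10"))    # 192.0.2[.]10
print(refang("192.0.2[.]10"))  # 192.0.2.10
```

192.0.2.10 is a documentation address used as a placeholder; swap in the indicators above when actually submitting them.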
Wrong Page Title (Google)
Google’s login page also says “Sign in” instead of login, so we can use the same type of search.
http.title:Google,Gmail http.title:Login port:443,80 -http.html:"with Google"
I am adding a Google search to demonstrate that this works with more than just Microsoft. The query logic is pretty much the same, except for adding ,Gmail in the http.title field. The comma is treated as an “or”, so the search looks for any site title with Google or Gmail in it.
After submitting most of them to URLScan we get a potential hit.
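Clicking through result pages by hand gets tedious, so here is a rough sketch of running the same kind of query against Shodan's REST search endpoint using only the standard library. The helper names are my own, the API key is something you supply, and I am assuming the usual response layout (a matches list whose entries carry ip_str, port, hostnames and an http object). Depending on your Shodan plan, filtered searches may consume query credits.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.shodan.io/shodan/host/search"


def search_url(api_key: str, query: str, page: int = 1) -> str:
    """Build the request URL; urlencode takes care of quotes and spaces."""
    params = urllib.parse.urlencode({"key": api_key, "query": query, "page": page})
    return f"{API_URL}?{params}"


def summarize(match: dict) -> dict:
    """Keep only the fields that matter when triaging a result."""
    http = match.get("http") or {}
    return {
        "ip": match.get("ip_str"),
        "port": match.get("port"),
        "hostnames": match.get("hostnames", []),
        "title": http.get("title"),
    }


def search(api_key: str, query: str, page: int = 1) -> list:
    """Fetch one page of results and return a summary per match."""
    with urllib.request.urlopen(search_url(api_key, query, page), timeout=30) as resp:
        payload = json.load(resp)
    return [summarize(match) for match in payload.get("matches", [])]
```

Calling search(key, 'http.title:Google,Gmail http.title:Login port:443,80 -http.html:"with Google"') would then return a list of small dictionaries that can be reviewed or submitted to URLScan one by one.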
Favicon (Google)
Favicons are the little icons that are in the corner of the browser tab. Some phishing pages may use them to appear more legitimate but that also means we can find those pages if we know the hash for the image.
Identify the Hash
The official accounts.google.com favicon is the “G” logo:
Now we need to find the hash for it. There are tools on GitHub to do this, but we will look at the raw data on Shodan. We start by searching for the domain accounts.google.com.
Click on the Raw Data view. There should be a field called favicon in the JSON data; find it and note its hash value. (It is probably easiest to click on Expand All and CTRL+F for “favicon”.)
We can then click on the hash to search for anything with that same favicon and start filtering from there.
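If you have the same raw JSON saved locally (for example from the API), the hash can also be pulled out with a few lines of Python instead of CTRL+F. This sketch assumes the layout I saw in the raw data: a data list of banners, where HTTP banners carry an http object with a favicon entry. The sample record is made up for illustration, apart from the hash value used in this post.

```python
import json


def favicon_hashes(host: dict) -> set:
    """Collect every favicon hash found in a host's banners."""
    hashes = set()
    for banner in host.get("data", []):
        # Non-HTTP banners (SSH, etc.) have no "http" key at all.
        favicon = (banner.get("http") or {}).get("favicon") or {}
        if "hash" in favicon:
            hashes.add(favicon["hash"])
    return hashes


# Hypothetical, trimmed-down host record for illustration only.
sample = json.loads("""
{
  "ip_str": "192.0.2.10",
  "data": [
    {"port": 443, "http": {"title": "Sign in",
                           "favicon": {"hash": 708578229,
                                       "location": "https://example.test/favicon.ico"}}},
    {"port": 22}
  ]
}
""")

print(favicon_hashes(sample))  # {708578229}
```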
We end up with the following search with one potential hit.
http.favicon.hash:708578229 port:80,443 200 http.title:Login,Signin
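I read the hash straight out of Shodan, but the GitHub tools mentioned earlier generally compute it locally from the icon file. As far as I know, the value is the signed 32-bit MurmurHash3 of the base64-encoded favicon bytes (base64 with line breaks, as Python's base64.encodebytes produces). The sketch below implements that in pure Python so it has no dependencies; the function names are my own, and in practice the mmh3 package is the more usual choice.

```python
import base64

MASK = 0xFFFFFFFF


def _rotl(value: int, bits: int) -> int:
    """Rotate a 32-bit value left."""
    return ((value << bits) | (value >> (32 - bits))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as a signed 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    aligned = len(data) & ~0x3  # length of the part made of whole 4-byte blocks

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(0, aligned, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (_rotl((k * c1) & MASK, 15) * c2) & MASK
        h = (_rotl(h ^ k, 13) * 5 + 0xE6546B64) & MASK

    # Tail: the remaining 1-3 bytes, if any.
    tail = data[aligned:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl((k * c1) & MASK, 15) * c2) & MASK
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data) & MASK
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def favicon_hash(icon_bytes: bytes) -> int:
    """Hash raw favicon bytes in the way described in the lead-in."""
    return murmur3_32(base64.encodebytes(icon_bytes))


def favicon_query(icon_bytes: bytes) -> str:
    """Build the Shodan filter for a given icon."""
    return f"http.favicon.hash:{favicon_hash(icon_bytes)}"
```

With a favicon saved to disk, favicon_query(open("favicon.ico", "rb").read()) gives the first term of a search like the one above; it is worth checking the result against the hash Shodan shows for the same site before relying on it.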
Closing Notes
I have barely scratched the surface here. Shodan allows you to filter on many parts of the HTTP/HTML data and phishing pages obviously vary widely in sophistication.
Shodan scans the internet once a week, so I hope this helps someone identify phishing threats in near real time.
There are some notable instances where these techniques will likely not find the phishing page(s):
- The phishing page is not the landing page (there appears to be an exception to this if the landing page redirects to the phishing page).
- The web server requires specific HTTP parameters to direct you to the phishing page.
Lastly, in my research on this, I submitted roughly 50 IPs/domains to URLScan and concluded 6 were likely to be phishing; some of those were already known by VirusTotal, some were not. I am not the first to document the use of Shodan for finding phishing pages; I am definitely no expert, just teaching myself threat intelligence and sharing some things I learn along the way. Thanks for reading!