PhishTank is a free community website where users and security vendors submit and share phishing data. It does a great job of collecting phishing data from the community around the world. However, it has some shortcomings: it lacks up-to-date statistics, and many phishing websites are untagged or misclassified (that is, they are missing data about the targeted brand: Google, Microsoft, Bank of America, etc.).
We decided to put our technology to the test and use some of Google's data warehouse and analysis tools. Here is an illustration of the overall technology:
Data & results
Here are the input data so you can reproduce the results:
Of the 25,000 URLs in the database (ordered from newest to oldest), we analyzed about 11,000. Only 6,600 of those are unique, as PhishTank data has a lot of duplicates.
About 50% of them were offline, but we still managed to detect and tag 1,800 phishing sites (1,100 unique, out of the 6,600 unique URLs, of which about 3,000 were offline).
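The counts above depend on how duplicates are identified, which this post does not spell out. As a minimal sketch, assuming duplicates are URLs that match after lowercasing the scheme and host and dropping the fragment, the totals could be computed along these lines. The function names and the `(url, is_online)` record shape are hypothetical and are not part of our engine or of the PhishTank feed format.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host and drop the fragment (assumed dedup rule)."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def summarize(records):
    """Count total, unique, and offline-unique URLs.

    `records` is an iterable of (url, is_online) pairs (hypothetical shape).
    When a URL appears more than once, the status of its first occurrence is kept.
    """
    first_status = {}
    total = 0
    for url, is_online in records:
        total += 1
        first_status.setdefault(normalize_url(url), is_online)
    offline_unique = sum(1 for online in first_status.values() if not online)
    return {
        "total": total,
        "unique": len(first_status),
        "offline_unique": offline_unique,
    }
```

A stricter or looser normalization rule (for example, ignoring query strings) would change the number of unique URLs, so the figures are only comparable when the same rule is used.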
Here are the results:
- Publicly shared BigQuery dataset (you must have a Google Cloud account with BigQuery enabled). You can run your own queries against this dataset (Google will charge you for that).
- Publicly shared Google Data Studio report. You can create new reports against the BigQuery dataset mentioned above (Data Studio is free, but you are charged for the queries it makes).
A few disclaimers:
- Our engine is still in beta and does not detect all brands yet, so the results are a little biased, although I think they give a pretty good overview of current phishing trends.