In this post, we will quickly go over what the Certificate Transparency log is and how we used BigQuery to search for malicious domains that try to spoof a business’s domain (and how you can do the same).
Certificate Transparency Log
The Certificate Transparency (CT) log is an open project to help fight bad CAs (Certificate Authorities), stolen certificates, misissued certificates and phishing attacks. At its core, it’s a distributed append-only log that all certificate authorities are required to report to whenever they issue a certificate. This gives business owners and web providers the ability to monitor every certificate that’s issued. For example, if someone tries to obtain a certificate for “google-very-secure-domain.com”, Google can get alerted and have the domain taken off the web before the attacker launches a phishing attack. But what if a CA is not reporting to the log? Well, in the very near future browsers will block, or show a warning for, every site whose certificate is not in the log. This puts the attacker in a catch-22: if they or their CA do not report to the log, browsers block their website, and if the CA does report the new certificate to the log, they have very little time to launch the attack before someone takes their website down.
You can read more about the Certificate Transparency Log here.
Certificate Transparency Log Search
When we tried to interact with the CT log, we found it hard to work with and limited in its search capabilities, as its API is designed for browsers rather than for security analysts.
BigQuery to The Rescue
We wrote a small serverless :) program that syncs four Google logs (argon2018 through argon2021) to BigQuery every day. Here is the link to the public dataset: https://bigquery.cloud.google.com/dataset/phish-ai-production:phish_ai_ct_log
In this example, we will hunt for phishing websites that pretend to be Bank of America. We composed the following query:
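As a rough illustration of the kind of query involved (a sketch, not necessarily the exact query we ran), the Python helper below builds a BigQuery Standard SQL string. The table name `certificates` and the column names `dns_name`, `issuer` and `not_before` are assumptions made for this example; check the dataset's actual schema before running it.

```python
import re

# Assumption: the table name is hypothetical; only the project and dataset
# IDs come from the public dataset link above.
ASSUMED_TABLE = "phish-ai-production.phish_ai_ct_log.certificates"


def build_brand_query(brand: str, legit_domain: str, days: int = 7) -> str:
    """Build a query listing recent certificates whose DNS name contains
    `brand` but is neither `legit_domain` nor one of its subdomains."""
    brand = brand.lower()
    legit = legit_domain.lower()
    # The values are interpolated into the SQL text, so accept only plain
    # hostname characters. With the BigQuery client, query parameters are
    # the safer choice.
    if not re.fullmatch(r"[a-z0-9-]+", brand):
        raise ValueError("brand may contain only letters, digits and '-'")
    if not re.fullmatch(r"[a-z0-9.-]+", legit):
        raise ValueError("legit_domain is not a plain hostname")
    if days < 1:
        raise ValueError("days must be at least 1")
    # Column names (dns_name, issuer, not_before) are assumed, see above.
    return (
        "SELECT dns_name, issuer, not_before\n"
        f"FROM `{ASSUMED_TABLE}`\n"
        f"WHERE LOWER(dns_name) LIKE '%{brand}%'\n"
        f"  AND LOWER(dns_name) != '{legit}'\n"
        f"  AND NOT ENDS_WITH(LOWER(dns_name), '.{legit}')\n"
        f"  AND not_before >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)\n"
        "ORDER BY not_before DESC"
    )


if __name__ == "__main__":
    print(build_brand_query("bankofamerica", "bankofamerica.com", days=1))
```

The printed SQL can be pasted into the BigQuery console once the table and column names are adjusted to the real schema.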
Let’s look at the results:
We see a certificate that was issued yesterday (by Let’s Encrypt, of course) and looks suspicious: bankofamerica.com.auth-verification-process.com. When we visited the website, we saw that it was indeed a phishing site, and by the time we published this post it had been reported and blocked by Chrome.
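What makes this name suspicious is that the real domain appears only as a prefix of a hostname registered under a different domain. A minimal sketch of that check in Python follows; the function name and the simple substring rule are our own illustration, and a production check would also want to handle look-alike spellings.

```python
def looks_like_impersonation(hostname: str, legit_domain: str, brand: str) -> bool:
    """Return True if `hostname` mentions `brand` but is neither
    `legit_domain` itself nor one of its subdomains."""
    host = hostname.lower().rstrip(".")
    if host.startswith("*."):
        # Wildcard certificates: compare the part after "*."
        host = host[2:]
    legit = legit_domain.lower()
    # The real domain and its subdomains are legitimate.
    if host == legit or host.endswith("." + legit):
        return False
    # Anything else that contains the brand name is worth a look.
    return brand.lower() in host


if __name__ == "__main__":
    for name in (
        "bankofamerica.com.auth-verification-process.com",
        "www.bankofamerica.com",
    ):
        print(name, looks_like_impersonation(name, "bankofamerica.com", "bankofamerica"))
```

The first name is flagged because it ends in auth-verification-process.com, while www.bankofamerica.com is treated as legitimate.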
Here are some interesting stats that we can build with Google Data Studio or simply by running different queries:
We can see that 90% of all certificates are issued by Let’s Encrypt, which makes sense because it’s free and easy.
What’s Next – Searching & Monitoring
You can set up a Google Cloud account, connect the BigQuery public dataset to your account and set up a cron job that runs queries on a daily basis and alerts you to certificates impersonating your website (the first 1TB of queries is free on Google Cloud). Or, if you prefer an “as a service” option, you can use our app: https://app.phish.ai/search-certificate
The monitoring option is currently available only in closed beta, so please reach out. However, we will soon deploy a public beta. Unfortunately, the number of searches is limited on the unpaid plan, since running BigQuery queries costs us money, so you have two options here: subscribe to our paid plan or run queries directly on top of BigQuery (which will also cost you money). You can also use some of the code snippets available in our repo to set up your own monitoring.
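If you go the do-it-yourself route, the daily job could look roughly like the sketch below. It assumes the query results have already been fetched as a list of dictionaries with `dns_name`, `issuer` and `not_before` keys; those key names, and the `daily_alerts` function itself, are hypothetical and not taken from our repo.

```python
from datetime import datetime, timedelta, timezone


def daily_alerts(rows, brand, legit_domain, now=None):
    """Return alert messages for certificates issued in the last 24 hours
    whose name contains `brand` but is outside `legit_domain`.

    Each row is assumed to be a dict with 'dns_name', 'issuer' and a
    timezone-aware 'not_before' datetime (hypothetical field names)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=1)
    legit = legit_domain.lower()
    brand = brand.lower()
    alerts = []
    for row in rows:
        name = row["dns_name"].lower()
        is_ours = name == legit or name.endswith("." + legit)
        if brand in name and not is_ours and row["not_before"] >= cutoff:
            alerts.append(f"Suspicious certificate: {name} (issuer: {row['issuer']})")
    return alerts


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample_now = datetime(2018, 6, 2, 12, 0, tzinfo=timezone.utc)
    sample_rows = [
        {
            "dns_name": "bankofamerica.com.auth-verification-process.com",
            "issuer": "Let's Encrypt",
            "not_before": sample_now - timedelta(hours=5),
        },
        {
            "dns_name": "www.bankofamerica.com",
            "issuer": "Example CA",
            "not_before": sample_now - timedelta(hours=2),
        },
    ]
    for message in daily_alerts(sample_rows, "bankofamerica", "bankofamerica.com", now=sample_now):
        print(message)
```

A cron entry can run this once a day and send the returned messages to email or chat instead of printing them.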
We are looking forward to broader and more open use of the Certificate Transparency log, and to seeing what the community will come up with on top of this great data initiative.