Many websites have recently reported spikes in spam traffic. One that appears to be affecting a lot of sites is a jump in traffic coming from Warsaw, Poland, mostly from a small group of referring domains.
GA4 automatically excludes traffic from bots and spiders that identify themselves as such, but does not do a good job of filtering out spam traffic that is not self-identifying. In most cases, these bots are not trying to do any harm to your site, apart from adding unwanted load to your web server. But they are polluting your Google Analytics metrics, making it hard to separate the signal from the noise.
So, what can you do about it?
First, decide whether a surge or spike in traffic really is a bot. To do this, you need to isolate the anomalous traffic in reporting and decide whether it looks human. You are looking for a dimension, or combination of dimensions, that shows unexpected or unreasonable growth and likely has a very low engagement rate, though the latter is not always true: sometimes spam traffic will generate metrics that look very much like human behavior. If you have a high enough volume of ecommerce purchases or lead form submissions, a lack of these metrics might also help you identify non-human traffic; if not, you may have to make a call based on instinct, e.g., “do I really believe that half of my visitors come from Ashburn, Virginia?”
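If you export a report to work with the numbers outside GA4, the "unexpected growth plus very low engagement" check can be roughed out in a few lines. This is only a sketch: the rows, the city names, the two-period layout and both thresholds are made-up assumptions for illustration, not GA4 defaults, and the result is a list of candidates to inspect rather than a verdict.

```python
from collections import defaultdict

# Hypothetical export rows: (city, period, sessions, engaged_sessions).
ROWS = [
    ("Warsaw", "previous", 40, 22),
    ("Warsaw", "current", 2400, 36),
    ("Chicago", "previous", 900, 540),
    ("Chicago", "current", 950, 580),
    ("Denver", "previous", 300, 190),
    ("Denver", "current", 330, 200),
]


def flag_suspicious(rows, min_growth=3.0, max_engagement_rate=0.10):
    """Return dimension values whose sessions grew sharply while engagement stayed very low.

    Both thresholds are illustrative assumptions; tune them to your own traffic.
    """
    # totals[value][period] holds [sessions, engaged_sessions]
    totals = defaultdict(lambda: {"previous": [0, 0], "current": [0, 0]})
    for value, period, sessions, engaged in rows:
        totals[value][period][0] += sessions
        totals[value][period][1] += engaged

    flagged = []
    for value, periods in totals.items():
        prev_sessions = periods["previous"][0]
        cur_sessions, cur_engaged = periods["current"]
        if cur_sessions == 0:
            continue
        # Treat a missing previous period as 1 session to avoid dividing by zero.
        growth = cur_sessions / max(prev_sessions, 1)
        engagement_rate = cur_engaged / cur_sessions
        if growth >= min_growth and engagement_rate <= max_engagement_rate:
            flagged.append((value, round(growth, 1), round(engagement_rate, 3)))
    return flagged


suspects = flag_suspicious(ROWS)
print(suspects)
```

As noted above, some spam traffic mimics human engagement, so a check like this will miss those cases and still needs the judgment call.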
Below are the reports and dimensions we tend to look at when we are doing this type of diagnosis.
*The dimensions with asterisks often work better when combined with other dimensions – for example, combining screen resolution with browser may reveal suspicious traffic in a way that the individual dimensions do not. You can add a secondary dimension to any of the above reports to report on two dimensions, but you’ll need to use Explorations or Looker Studio in order to analyze more than two dimensions at the same time.
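To see why a combination can expose what the individual dimensions hide, here is a small sketch using invented numbers (the browsers, resolutions and session counts are all hypothetical). It computes engagement rate, meaning engaged sessions divided by sessions, for each dimension on its own and then for the pair.

```python
from collections import defaultdict

# Hypothetical export rows: (browser, screen_resolution, sessions, engaged_sessions).
ROWS = [
    ("Chrome", "1920x1080", 500, 300),
    ("Chrome", "1280x720", 200, 4),    # the made-up bot traffic
    ("Safari", "1280x720", 150, 90),
    ("Safari", "390x844", 250, 150),
    ("Firefox", "1920x1080", 100, 60),
]


def engagement_rate_by(rows, key):
    """Aggregate sessions and engaged sessions by key(row), then return the rate per key."""
    sessions = defaultdict(int)
    engaged = defaultdict(int)
    for row in rows:
        k = key(row)
        sessions[k] += row[2]
        engaged[k] += row[3]
    return {k: round(engaged[k] / sessions[k], 3) for k in sessions}


by_browser = engagement_rate_by(ROWS, lambda r: r[0])
by_resolution = engagement_rate_by(ROWS, lambda r: r[1])
by_pair = engagement_rate_by(ROWS, lambda r: (r[0], r[1]))

print(by_browser)
print(by_resolution)
print(by_pair)
```

In this invented data set, Chrome alone and 1280x720 alone only look somewhat weaker than their peers because genuine sessions dilute the bot traffic, while the Chrome plus 1280x720 pair shows an engagement rate close to zero.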
In regular reporting, your only option is to add an exclude filter each time you want to view a report with spam traffic removed. Unfortunately, you can’t save filters, so you have to recreate the filter every time you view a report. It is possible to customize a report and save it with a filter applied, but if you do that you won’t be able to apply additional filters to the report.
To the right is an example of a filter that removes the spam traffic from the report screenshotted above. This example excludes traffic that we know to be spam, but sometimes there is no way to exclude spam traffic without also excluding a little bit of genuine traffic. For example, we recently dealt with a bot with the following characteristics:
The latter two are obviously very common for human visitors, and there are also circumstances where GA can’t identify the location of a real visitor, but because of the volume of bot traffic, it was worthwhile to apply the filter, even though some real humans would be excluded.
In Explorations, you can create a segment that excludes the dimensions associated with spam traffic and apply it to multiple report tabs in the same Exploration.
And in Looker Studio, you can create a filter and apply it at the chart, page or dashboard level. Applying it at the dashboard level is typically the best option, so you don’t have to remember to apply it each time.
Each of these methods removes bot traffic from reporting, but you may be wondering, “what if I want to prevent it from showing up in GA4 in the first place?”
The only built-in mechanism for doing this is to filter traffic based on IP address(es). It is also possible to prevent GA4 tags from firing in Google Tag Manager (GTM) based on certain traffic attributes. Or you can set a value for the traffic_type parameter in GTM that flags the traffic as spam, then filter it in GA4. Setting up the latter two approaches requires proficiency with GTM and possibly some custom JavaScript, depending on the combination of dimensions you are filtering on. The specific setup varies based on the attributes in question and is beyond the scope of this article.
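For the built-in IP-based option, it can help to check which addresses a range would actually cover before you rely on it. The sketch below uses Python's standard ipaddress module to test addresses against ranges in CIDR notation. The ranges shown are reserved documentation addresses standing in for real spam sources, and the function name is hypothetical. It only illustrates how CIDR matching works and is not a description of how GA4 evaluates its rules internally.

```python
import ipaddress

# Placeholder ranges from the blocks reserved for documentation (RFC 5737);
# substitute the ranges you have actually identified as spam sources.
SPAM_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_in_ranges(ip, cidr_ranges):
    """Return True if the IP address falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


for candidate in ["203.0.113.45", "198.51.100.17", "192.0.2.10"]:
    print(candidate, is_in_ranges(candidate, SPAM_RANGES))
```

A /24 range covers 256 addresses and a /32 covers exactly one, so a wider range catches more spam but is also more likely to sweep in genuine visitors.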
The actual feature in GA4 is called an “Internal traffic filter”, but it should have been named “IP address filter”, since it is filtering by IP address, internal or otherwise. Google’s documentation describes this pretty well, so I won’t walk through each step here, but a couple of words of advice:
Thanks for reading, and good luck bot hunting! It’s a skill we all may need to lean on a bit more in the future 😉