Removing bot traffic from GA4 reports

Many websites have recently reported spikes in spam traffic. One that appears to be affecting a lot of sites is a jump in traffic coming from Warsaw, Poland, mostly from a small group of referring domains.

report with spam

GA4 automatically excludes traffic from bots and spiders that identify themselves as such, but does not do a good job of filtering out spam traffic that is not self-identifying. In most cases, these bots are not trying to do any harm to your site, apart from adding unwanted load to your web server. But they are polluting your Google Analytics metrics, making it hard to separate the signal from the noise. 

So, what can you do about it?

Identifying bot traffic

First, decide whether a surge or spike in traffic really is a bot. To do this, you need to isolate the anomalous traffic in reporting and decide whether it looks human. You are looking for a dimension or combination of dimensions that shows unexpected or unreasonable growth and likely has a very low engagement rate, though the latter is not always true: sometimes spam traffic will generate metrics that look very much like human behavior. If you have a high enough volume of ecommerce purchases or lead form submissions, a lack of these conversions might also help you identify non-human traffic. If not, you may have to make a call based on instinct, e.g., “do I really believe that half of my visitors come from Ashburn, Virginia?”

Below are the reports and dimensions we tend to look at when we are doing this type of diagnosis.

  • Acquisition > Traffic acquisition – set your primary dimension to Session source / medium: look for a source you’ve never seen before that doesn’t make sense. Also take note if there’s been a big increase in ‘(direct) / (none)’ traffic. The latter is not useful for filtering traffic since you get plenty of real direct traffic, but bot traffic often has no attributable source.
  • User attributes > Demographic details – set your primary dimension to Country: look for countries outside of your market area – in particular, countries whose primary language is different from the language(s) of your site.
  • User attributes > Demographic details – set your primary dimension to City: look for relatively low-population cities with disproportionately high traffic volume.
  • Tech > Tech details – set your primary dimension to Browser*: this one is not very useful by itself, but often bot traffic will be identifiable by a specific combination of browser, OS and screen resolution.
  • Tech > Tech details – set your primary dimension to OS with version*: per the previous bullet, this dimension can be helpful in combination with others. You may also see high volumes of traffic from old OS versions that look suspicious, for example Windows 7 and 8.
  • Tech > Tech details – set your primary dimension to Screen resolution*: for some reason, bots often show up with a 800×600 screen resolution or other resolutions that were common 20 years ago.
  • Engagement > Landing page – it is most common for bots to request your home page, which is not a useful differentiator, but sometimes they make high-volume requests of a page that doesn’t even exist on your site, which is a dead giveaway. For example, a few years back, there was a bot that requested the page path /bottraffic.live on millions of websites.
  • IP address – this one is a bit tricky, since GA4 neither reports on nor stores visitors’ IP addresses. But it is possible to block traffic from specific IP addresses in GA4, and bot traffic often comes from a narrow range of IP addresses. Your hosting provider or CDN may provide reporting that includes visitors’ IP addresses, or you may be able to access web server log files that include them.

*The dimensions with asterisks often work better when combined with other dimensions – for example, combining screen resolution with browser may reveal suspicious traffic in a way that the individual dimensions do not. You can add a secondary dimension to any of the above reports to report on two dimensions, but you’ll need to use Explorations or Looker Studio in order to analyze more than two dimensions at the same time.
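To make the idea of combining dimensions concrete, here is a minimal Python sketch. It assumes you have exported session-level rows (for example from an Exploration) into a list of dictionaries. The field names, the sample rows and the thresholds are all hypothetical and only illustrate the approach; they are not GA4 export column names.

```python
from collections import defaultdict

# Hypothetical exported rows, one dict per session. The field names are
# assumptions made for this sketch.
sessions = (
    [{"browser": "Chrome", "os": "Windows 7", "resolution": "800x600", "engaged": False}] * 4
    + [{"browser": "Chrome", "os": "Windows 10", "resolution": "1920x1080", "engaged": True}] * 2
    + [{"browser": "Chrome", "os": "Windows 10", "resolution": "1920x1080", "engaged": False}]
    + [{"browser": "Safari", "os": "iOS", "resolution": "390x844", "engaged": True}]
)


def suspicious_combinations(rows, min_sessions=3, max_engagement_rate=0.1):
    """Return (browser, os, resolution) combinations that have at least
    `min_sessions` sessions and an engagement rate at or below
    `max_engagement_rate`. Both thresholds are arbitrary illustrations."""
    totals = defaultdict(int)
    engaged = defaultdict(int)
    for row in rows:
        key = (row["browser"], row["os"], row["resolution"])
        totals[key] += 1
        if row["engaged"]:
            engaged[key] += 1

    flagged = []
    for key, count in totals.items():
        rate = engaged[key] / count
        if count >= min_sessions and rate <= max_engagement_rate:
            flagged.append((key, count, rate))
    # Largest groups first, since volume is what pollutes the reports.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


for combo, count, rate in suspicious_combinations(sessions):
    print(combo, count, f"{rate:.0%}")
```

With this sample data, only the Chrome / Windows 7 / 800x600 combination is flagged. Chrome on Windows 10 also has three sessions, but its engagement rate is well above the cutoff, which is the point of looking at the combination rather than at any single dimension.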

Removing bot traffic from GA4 reports

filter dialog

In regular reporting, your only option is to add an exclude filter each time you want to view a report with spam traffic removed. Unfortunately, you can’t save filters, so you have to recreate the filter every time you view a report. It is possible to customize a report and save it with a filter applied, but if you do that you won’t be able to apply additional filters to the report.

The filter dialog shown above is an example of a filter that removes the spam traffic from the report screenshotted earlier. This example excludes traffic that we know to be spam, but sometimes there is no way to exclude spam traffic without also excluding a little bit of genuine traffic. For example, we recently dealt with a bot with the following characteristics:

  • Country = (not set)
  • Browser = Chrome
  • OS = Windows 10

The latter two are obviously very common for human visitors, and there are also circumstances where GA can’t identify the location of a real visitor, but because of the volume of bot traffic, it was worthwhile to apply the filter, even though some real humans would be excluded.
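The trade-off can be sketched in a few lines of Python. This assumes the three conditions are combined with AND, so a session is excluded only when all of them match. The sample sessions and the `is_bot` label are hypothetical: in real data you never have that label, which is exactly why some genuine visitors get caught.

```python
def make_session(country, browser, os_name, is_bot):
    """Build one hypothetical session row for this sketch."""
    return {"country": country, "browser": browser, "os": os_name, "is_bot": is_bot}


# Hypothetical sample: three bot sessions, one real visitor whose location
# could not be determined, and two ordinary real visitors.
sessions = [
    make_session("(not set)", "Chrome", "Windows 10", True),
    make_session("(not set)", "Chrome", "Windows 10", True),
    make_session("(not set)", "Chrome", "Windows 10", True),
    make_session("(not set)", "Chrome", "Windows 10", False),
    make_session("United States", "Chrome", "Windows 10", False),
    make_session("(not set)", "Safari", "iOS", False),
]


def is_excluded(session):
    """True only when all three filter conditions match (logical AND)."""
    return (
        session["country"] == "(not set)"
        and session["browser"] == "Chrome"
        and session["os"] == "Windows 10"
    )


kept = [s for s in sessions if not is_excluded(s)]
excluded = [s for s in sessions if is_excluded(s)]
lost_humans = [s for s in excluded if not s["is_bot"]]

print(f"kept {len(kept)} sessions, excluded {len(excluded)}, "
      f"of which {len(lost_humans)} were real visitors")
```

Each condition on its own would match a lot of real traffic. Requiring all three together keeps the collateral damage small, though not zero.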

In Explorations, you can create a segment that excludes the dimensions associated with spam traffic and apply it to multiple report tabs in the same Exploration.

And in Looker Studio, you can create a filter and apply it at the chart, page or dashboard level. The dashboard level is typically the best option, so you don’t have to remember to apply it each time.

Each of these methods removes bot traffic from reporting, but you may be wondering, “what if I want to prevent it from showing up in GA4 in the first place?” 

The only built-in mechanism for doing this is to filter traffic based on IP address(es). It is also possible to prevent GA4 tags from firing in Google Tag Manager (GTM) based on certain traffic attributes. Or you can set a value for the traffic_type parameter in GTM that flags the traffic as spam, then filter it out in GA4. Setting up the latter two approaches requires proficiency with GTM and possibly some custom JavaScript, depending on the combination of dimensions you are filtering on. The specific setup varies based on the attributes in question and is beyond the scope of this article.

Creating an IP-based traffic filter

The actual feature in GA4 is called an “Internal traffic filter”, but it should have been named “IP address filter”, since it is filtering by IP address, internal or otherwise. Google’s documentation describes this pretty well, so I won’t walk through each step here, but a couple of words of advice:

  • Use an IP address range instead of a single IP address. It is fairly common for a router to be configured with a block of addresses and assign them dynamically to nodes in its network. So, a web server that has the address 192.0.2.34 one day might have 192.0.2.45 the next. I typically block a range of 256 addresses, which in this case would be done by specifying 192.0.2.0/24. (An explanation of how this notation works.)
    This does run the risk of blocking non-spam traffic from other addresses in the block, but that risk is fairly small.
  • Use a meaningful name for traffic_type – this will help you out a lot if you add more filters down the road. The value defaults to “internal”, but there is no reason not to give it a more descriptive name, for example “grets_bot”. 
  • Don’t forget to create a data filter after you’ve defined the “internal” traffic. This is described in Step 2 in Google’s documentation, but I find it counterintuitive that the setup takes place in two different places in the GA4 admin UI, and I missed this step the first few times I set one up.
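If the /24 notation is unfamiliar, Python’s standard `ipaddress` module is a convenient way to check what a block covers before you enter it in GA4. The sketch below uses the example addresses from the first bullet. The `enclosing_block` helper and the list of log addresses are hypothetical, added only for illustration.

```python
import ipaddress
from collections import Counter

# The block from the example above: a /24 fixes the first 24 bits of the
# address and leaves the last 8 bits free, which gives 256 addresses.
block = ipaddress.ip_network("192.0.2.0/24")
print(block.num_addresses)   # 256
print(block[0], block[-1])   # 192.0.2.0 192.0.2.255

# Both addresses from the example fall inside the same block.
print(ipaddress.ip_address("192.0.2.34") in block)  # True
print(ipaddress.ip_address("192.0.2.45") in block)  # True


def enclosing_block(address, prefix=24):
    """Hypothetical helper: return the block of the given prefix length
    that contains `address`."""
    # strict=False lets us pass a host address instead of a network address.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


# Hypothetical addresses pulled from a server log, grouped by /24 block to
# see whether the suspicious traffic clusters in a narrow range.
log_addresses = ["192.0.2.34", "192.0.2.45", "192.0.2.200", "198.51.100.7"]
hits_per_block = Counter(str(enclosing_block(a)) for a in log_addresses)
print(hits_per_block.most_common())
```

Grouping log addresses this way is one rough way to decide which range to block; whether a /24 is the right size for a given bot is a judgment call.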

Thanks for reading, and good luck bot hunting! It’s a skill we all may need to lean on a bit more in the future 😉
