We’re going to start with the mechanics of web tracking, which is important background and context for tracking and reporting on website visitor behavior in Google Analytics. Note that GA can also track mobile app behavior, but we are not going to discuss the specifics of that here.
What Happens When You Open a Web Page in a Browser?
Put another way, the web “page” you see in your browser is not a page at all; it is a collection of files that combine to create what you see in front of you.
If the website uses Google Analytics or some other form of web tracking, tracking tags are one of the things included in the source code of the page, and they create a record of your visit. As you browse the site, tracking tags continue to fire, sending additional bits of information to Google Analytics with each tracked interaction. This information includes, but is not limited to:
- A user identifier
- The Google Analytics ID of the site
- Your IP address
- Your browser type and version (which also indicates whether you are on a mobile device)
- The page you requested
- The previous page you visited, if you clicked on a link to get to the page
Every HTTP request also includes a variety of other fields, detailed here: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields .
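To make the list above more concrete, here is a minimal Python sketch of how such fields could be packed into a single tracking request. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `dr`) follow the older Universal Analytics Measurement Protocol; the function name, tracking ID, client ID, and page values are made-up placeholders. A real tag gathers these values in the browser rather than in Python, and the IP address and browser details typically travel with the HTTP request itself rather than in the payload.

```python
from urllib.parse import urlencode

def build_pageview_hit(tracking_id, client_id, page_path, referrer=None):
    """Return the query string for one hypothetical pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the Google Analytics ID of the site
        "cid": client_id,    # the user identifier, normally read from a cookie
        "t": "pageview",     # the type of interaction being recorded
        "dp": page_path,     # the page that was requested
    }
    if referrer:
        # Only present when the visitor followed a link to get here.
        params["dr"] = referrer
    return urlencode(params)

# Placeholder values, for illustration only.
hit = build_pageview_hit("UA-XXXXX-Y", "555.1234", "/gallery", "https://www.xyz.com/")
print(hit)
```

Each tracked interaction produces one such message, which is why the amount of data grows with every page view.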
Most tracking systems use a cookie to keep track of who you are between page views and visits.
A cookie is a small piece of text that a web server (mostly analogous to a website) asks your browser to store. When you make another request to the same web server, the cookie is included with the request. There is a common misconception that cookies can expose your private information to malicious websites. This is not true: the information in a cookie is only available to the site that set it. In other words, if you visit xyz.com, xyz.com only gets to see cookies set by xyz.com.
Tracking tags often set their own cookies. So, for example, if you visit xyz.com, you might get an xyz.com cookie and a facebook.com cookie that gets set by a Facebook tracking tag. The Google Analytics tag also sets cookies, but it sets them on the website’s own domain rather than on a Google domain.
Most of the time, a cookie is used to associate a visitor with a user account or a set of preferences. For example, if you visit amazon.com, Amazon identifies you with a cookie and presents personalized content. In this case, the cookie stores a unique identifier, but no personal information. Web developers do sometimes store personal information in cookies, but this is considered bad practice. Major players such as Google and Facebook never store personal information in cookies.
Below is an example of the contents of an actual cookie.
The domain name in the URL of a web page you request is the part that comes after the “://” and before the first slash or question mark. It is looked up through a domain name server, which maps the domain name to an IP address. store.xyz.com and www.xyz.com are two different domain names. “store” and “www” in this case are referred to as third-level domains or sub-domains, because they are the third part of the domain name, counting from the right. A cookie is set on a domain, either the second-level domain (xyz.com) or a sub-domain below it (store.xyz.com).
If a cookie is set on xyz.com, both www.xyz.com and store.xyz.com will receive the cookie when the user makes a request, since both are part of xyz.com. This matters for GA tracking, because GA sets a user-identifier cookie on the second-level domain by default. As long as users stay within xyz.com, they are tracked as a single user. If they jump to another domain, xyz.shopify.com, for example, GA will treat them as a new user with a new session unless configuration changes are made in the Google Analytics tag to link the two domains. These configuration changes are referred to as cross-domain tracking.
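The domain rule described here can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser implements it: the function name `cookie_is_sent` is hypothetical, and real browsers also take the cookie's path, expiry, and security flags into account.

```python
from urllib.parse import urlsplit

def cookie_is_sent(cookie_domain, url):
    """Return True if a cookie set on cookie_domain would accompany a request to url."""
    host = urlsplit(url).hostname or ""
    cookie_domain = cookie_domain.lstrip(".").lower()
    # The host must be the cookie's domain itself or a sub-domain of it.
    return host == cookie_domain or host.endswith("." + cookie_domain)

for url in ("https://www.xyz.com/",
            "https://store.xyz.com/cart",
            "https://xyz.shopify.com/"):
    print(url, cookie_is_sent("xyz.com", url))
# https://www.xyz.com/ True
# https://store.xyz.com/cart True
# https://xyz.shopify.com/ False
```

The last result is the reason cross-domain tracking needs extra configuration: the identifier stored in the xyz.com cookie never reaches xyz.shopify.com on its own.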
An IP (Internet Protocol) address is a group of numbers, or numbers and letters, that specifies a location on the Internet. Anything that travels over the Internet, such as a web page request or a sent email, gets routed to an IP address. Most of the time, an IP address can also be linked to a physical location, though often the location can only be known generally.
An example of an IP address: 18.104.22.168
Type “what is my IP address” in Google to see your IP address.
The address you see may look very different from the example above, as a longer string of numbers and letters separated by colons. This is because there are two IP address systems currently in use. The longer form uses the newer protocol, IPv6, which allows for far more addresses on the Internet than the older IPv4.
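If you want to check which of the two systems an address belongs to, Python's standard `ipaddress` module can tell you. In this small sketch, the first address is the example given above; the second, `2001:db8::1`, is an illustrative IPv6 address from the range reserved for documentation, not a real visitor address.

```python
import ipaddress

for text in ("18.104.22.168", "2001:db8::1"):
    address = ipaddress.ip_address(text)
    print(text, "-> IPv" + str(address.version))
# 18.104.22.168 -> IPv4
# 2001:db8::1 -> IPv6
```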
If you are connecting through a business, ISP or mobile carrier, your IP address is usually assigned dynamically, and a website cannot trace it back to you personally. In other words, websites can’t know who you are just from your IP address. They may, however, know where you work, and they can generally tell what ISP or mobile carrier you use.
Paths and Querystrings
In addition to a domain name, a URL includes a path and a querystring.
The path is the portion of the URL that comes after the domain name, and is ostensibly the path to a document on the web server. Technically, the path doesn’t always point to a different document, but the idea is that the path points to a unique page of content, however it is served.
The querystring (AKA query parameters or URL parameters) is the set of parameters that come after the question mark in the URL. It passes variables to a web page/application. It is formed as name/value pairs.
In the example below, there are two querystring parameters: type, and disposition. This example might describe a scenario where a user has selected the filters “tabby” and “crazy” when viewing cat videos on the gallery page of catvideo.com. Don’t get too excited, this is just an example and not a real site.
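A URL matching this description can be taken apart with Python's standard library, as in the sketch below. The exact URL is an assumption reconstructed from the description above: the `https` scheme, the `www` sub-domain, and the `/gallery` path are guesses, and only the two parameter names and values come from the text.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, reconstructed from the description in the text.
url = "https://www.catvideo.com/gallery?type=tabby&disposition=crazy"

parts = urlsplit(url)
print(parts.hostname)         # www.catvideo.com
print(parts.path)             # /gallery
print(parse_qs(parts.query))  # {'type': ['tabby'], 'disposition': ['crazy']}
```

Each value comes back as a list because a querystring is allowed to repeat the same parameter name more than once.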
The querystring is an important component of tracking, because most tracking systems have the ability to collect specific querystring parameters, if included. Google Analytics, for example, supports several parameters that can be used to track the source of traffic, called UTM parameters. We’ll cover these in more detail later, but in summary you can add UTM parameters to the URL you assign to an ad you create to identify visitors that click on that ad and visit your website.
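As a rough illustration of the idea, the sketch below appends three commonly used UTM parameters (`utm_source`, `utm_medium`, `utm_campaign`) to a landing page URL. The helper name `add_utm`, the landing page URL, and the campaign values are all made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qsl

def add_utm(url, source, medium, campaign):
    """Return url with UTM parameters appended, keeping any existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)  # existing name/value pairs, in order
    query += [
        ("utm_source", source),      # where the traffic comes from
        ("utm_medium", medium),      # the marketing channel
        ("utm_campaign", campaign),  # the campaign name
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_utm("https://www.xyz.com/landing?color=blue",
              "newsletter", "email", "spring_sale"))
# https://www.xyz.com/landing?color=blue&utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale
```

The resulting URL is what you would assign to the ad or email link, so that visits arriving through it can be attributed to that source.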
Tracking Interactions Other Than Page Views
Other page interactions can be tracked as well, such as form submits or video views. In each case, it is a matter of getting the tag to fire when the interaction happens, and customizing the tag to provide information about the interaction.
There are tag management tools that make tag configuration and customization much easier, and may eliminate the need to write custom code. Google Tag Manager is free, and streamlines the process of setting up GA and other tags.
What This Means for Marketers
So far we have described the mechanics of how web analytics data is collected and sent to an analytics system such as GA. To recap, tracking tags send a series of messages about the characteristics and activity of each visitor to a website. The analytics system then aggregates the data into various data structures for reporting on a variety of dimensions and metrics. A dimension is a characteristic of a visitor or group of visitors, and a metric is a statistic that describes something about the behavior of a visitor or group of visitors.
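The relationship between dimensions and metrics can be sketched with a toy aggregation. This is only an illustration of the concept, not how GA stores or processes data; the hit records, field names, and the `report_by` function are all invented for the example. Here "device" is the dimension, and page views and users are the metrics.

```python
from collections import defaultdict

# Made-up hits, one per tracked page view.
hits = [
    {"client_id": "A", "device": "mobile", "page": "/"},
    {"client_id": "A", "device": "mobile", "page": "/gallery"},
    {"client_id": "B", "device": "desktop", "page": "/"},
    {"client_id": "C", "device": "mobile", "page": "/contact"},
]

def report_by(dimension, hits):
    """Group hits by a dimension and compute two metrics for each group."""
    pageviews = defaultdict(int)
    users = defaultdict(set)
    for hit in hits:
        key = hit[dimension]
        pageviews[key] += 1               # metric: number of page views
        users[key].add(hit["client_id"])  # metric: distinct user identifiers
    return {key: {"pageviews": pageviews[key], "users": len(users[key])}
            for key in pageviews}

print(report_by("device", hits))
# {'mobile': {'pageviews': 3, 'users': 2}, 'desktop': {'pageviews': 1, 'users': 1}}
```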
Because a visitor is usually identified by means of a cookie, which is associated with a browser and a website, the visitor will appear as two people if he or she switches browsers or devices. This highlights a critical limitation of web analytics systems. According to the Internet Advertising Bureau, 78% of internet users browse with both mobile and desktop devices, and device usage is getting ever more fragmented with personal assistants, tablets, and so on. Some analytics systems have capabilities to track people across devices, but none are completely effective. This means you have a limited view of how individual visitors behave on your website over time, which is a significant drawback when it comes to analyzing marketing data.
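A tiny worked example shows how this inflates user counts. The names and cookie identifiers below are invented; the point is only that an analytics system counts cookies, not people.

```python
# Made-up visits as (person, cookie identifier) pairs.
# Dana uses a phone and a laptop; Lee uses one laptop twice.
visits = [
    ("Dana", "cookie-phone-1"),
    ("Dana", "cookie-laptop-7"),
    ("Lee", "cookie-laptop-3"),
    ("Lee", "cookie-laptop-3"),
]

real_people = len({person for person, _ in visits})
reported_users = len({cookie for _, cookie in visits})

print(real_people, reported_users)  # 2 3
```

Two people show up as three users, and nothing in the collected data reveals that two of those cookies belong to the same person.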
|Things that can be tracked|Things that can’t be tracked|
|---|---|
|You can track a lot of data about individual user sessions on a website, including where people come from, where they are, and what they do on your site.|You can only track what you can tag. In other words, you can’t track users when they go elsewhere on the web.|
|You can see visitors’ behavior when they leave and come back, as long as they are using the same browser on the same device.|You can’t track personally identifiable information (PII).|
|You can track how people interact with specific content/pages on your site.|You can’t reliably track people over time who visit your site using more than one browser or device.|
|You can measure how many purchases or contact form submits happen.| |
People in the world of digital marketing often talk as if we are getting closer to perfect knowledge of customer behavior online, but in many respects, we are actually getting farther away.
Factors that inhibit understanding of customer behavior:
- Google is trying to keep users on Google.com by showing more information directly on the search results page. About half of Google searches don’t result in a click to another website. Facebook is also discouraging clicks away from Facebook.
- Amazon is almost completely opaque when it comes to user behavior.
- Browsing behavior is spread across more and more devices, making it harder to track.
- Increased focus and regulation regarding privacy is making it harder for user behavior to be stored and shared from one site or service to another.
- People are using ad-blocking tools, and browsers are adding features that let users prevent themselves from being tracked.
- Fraudulent and/or bot traffic pollutes the user data we can collect.
If we reconcile ourselves to the fact that we have at best partial knowledge of users over time, we can still use GA to form a more accurate picture, even if it is a bit blurry in places. This brings to mind the inimitable words of Donald Rumsfeld:
“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.”
In the world of Google Analytics, the known knowns are things like what pages get viewed, and how many people complete a lead form. Known unknowns include the “whys” behind visitor behavior, and personal characteristics of visitors. Unknown unknowns are by definition unknowable, but certainly include nearly everything a website visitor does before and after a visit.