Data sampling is a hot topic in Google Analytics. Especially for websites with huge data sets (think of ecommerce sites with hundreds or even thousands of products), the free version of Google Analytics often lets us see only a fraction of the data.
What is sampled data?
Data sampling is a statistical technique for identifying trends, insights or patterns in a larger data set by analysing only part of it. In practice, sampled data is a small subset of your total data that is used to get a glimpse of the full data set.
Google's Universal Analytics uses data sampling to return reports faster. Default reports are generally unsampled; sampling applies to custom (or ad hoc) reports, which includes default reports once you add a segment, filter or secondary dimension. When data sampling happens, GA analyses only a fraction of the sessions and extrapolates the result to the entire population.
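To make the "fraction scaled up to the population" idea concrete, here is a minimal sketch in Python. It does not reproduce Google's actual sampling algorithm; the function name `estimate_total_from_sample`, the toy revenue data and the 10% sample size are all assumptions made for illustration.

```python
import random


def estimate_total_from_sample(session_revenues, sample_size, seed=0):
    """Estimate total revenue from a simple random sample of sessions.

    Returns (estimated_total, sampling_rate). Illustrative only: this is
    plain random sampling, not Google's actual sampling method.
    """
    population = len(session_revenues)
    if sample_size >= population:
        # Nothing to sample: every session is analysed, so the total is exact.
        return float(sum(session_revenues)), 1.0

    rng = random.Random(seed)
    sample = rng.sample(session_revenues, sample_size)
    sampling_rate = sample_size / population
    # Scale the sampled total up to the size of the full population.
    return sum(sample) / sampling_rate, sampling_rate


# Toy data set (assumption): 100,000 sessions, about 2% of which convert.
data_rng = random.Random(42)
revenues = [
    data_rng.uniform(20, 200) if data_rng.random() < 0.02 else 0.0
    for _ in range(100_000)
]

exact_total = sum(revenues)
estimated_total, rate = estimate_total_from_sample(revenues, sample_size=10_000)

print(f"Sampling rate:     {rate:.0%}")
print(f"Exact revenue:     {exact_total:,.2f}")
print(f"Estimated revenue: {estimated_total:,.2f}")
```

The estimate usually lands close to the exact total, but it is still an estimate: the gap depends on which sessions happen to end up in the sample.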
The threshold is 500k sessions for Google Analytics Standard (the free version of GA) and 100M sessions for GA360, counted over the date range of your report. However, at Semetis we have seen sampling kick in below these thresholds as a report gets more complex, for example when adding dimensions, filters or segments.
This may not sound like a big issue at first, but the bigger your data set becomes, the bigger the discrepancy between your sampled data and your exact data can become. When you need to report exact figures for metrics like Revenue, this becomes problematic.
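A rough back-of-the-envelope sketch shows why larger data sets are hit harder. It assumes that the sample is capped at roughly the threshold, so the share of sessions actually analysed shrinks as the data set grows. The function name `approximate_sampling_rate` and that capping assumption are ours, and the real rate also depends on report complexity.

```python
# Thresholds quoted in this article, in sessions per report date range.
SAMPLING_THRESHOLDS = {
    "standard": 500_000,    # Google Analytics Standard (free)
    "ga360": 100_000_000,   # GA360 (paid)
}


def approximate_sampling_rate(sessions_in_range, edition="standard"):
    """Approximate share of sessions analysed for an ad hoc report.

    Assumption: the sample is capped at roughly the threshold, so the
    rate is threshold / sessions once the threshold is exceeded.
    """
    threshold = SAMPLING_THRESHOLDS[edition]
    if sessions_in_range <= threshold:
        return 1.0  # below the threshold: no sampling expected
    return threshold / sessions_in_range


for sessions in (400_000, 2_000_000, 5_000_000):
    rate = approximate_sampling_rate(sessions)
    print(f"{sessions:>9,} sessions -> about {rate:.0%} of sessions analysed")
```

Under this assumption, each analysed session stands in for more and more unseen sessions as traffic grows, which is where the discrepancy comes from.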
How to avoid data sampling
In practice, there are two ways of avoiding data sampling:
- You go for GA360, the paid version, and you practically never face data sampling.
- You use third-party tools that work around data sampling.
These third-party tools (like Supermetrics or Funnel.io) break your data set down into smaller parts. Supermetrics splits your query into separate "subqueries", while Funnel.io continuously imports your most recent data.
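The "subquery" idea can be sketched as splitting one long date range into shorter ones and adding up the results. This is only an illustration of the principle, not how Supermetrics is implemented. `split_date_range`, `run_in_subqueries` and the `fetch_sessions` callable are hypothetical names, and the fake fetch function below stands in for a real API call.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days):
    """Split [start, end] (inclusive) into consecutive chunks of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")

    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def run_in_subqueries(start, end, max_days, fetch_sessions):
    """Run one hypothetical query per chunk and sum the results."""
    return sum(fetch_sessions(s, e) for s, e in split_date_range(start, end, max_days))


def fake_fetch_sessions(chunk_start, chunk_end):
    """Stand-in for a real API call (assumption): 1,000 sessions per day."""
    return ((chunk_end - chunk_start).days + 1) * 1_000


chunks = split_date_range(date(2021, 1, 1), date(2021, 1, 31), max_days=7)
total = run_in_subqueries(date(2021, 1, 1), date(2021, 1, 31), 7, fake_fetch_sessions)

for chunk_start, chunk_end in chunks:
    print(chunk_start, "to", chunk_end)
print("Total sessions:", total)
```

Summing works for additive metrics such as sessions or revenue. Metrics that count unique things, such as users, cannot simply be added across chunks, because the same user can appear in several of them.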
Unsampled data in DataStudio
Most of the time, when using Google Analytics in DataStudio, we use a Google Connector. This is the native integration of all Google tools with DataStudio. This native integration also exists with Google Ads, DV360, Campaign Manager etc.
When it comes to data sampling, Google DataStudio is subject to the same limits as Google Analytics is in-platform. DataStudio can also "show" when data is sampled. You can read more about that here.
But, as explained above, third-party tools can work around that. The most straightforward way is to use the Supermetrics add-on for Google Sheets, where you can tick a box to avoid data sampling in your query.
For Google DataStudio, there is a Supermetrics Google Analytics connector. When adding Google Analytics to your report as a data source through this connector, you'll see an option to avoid data sampling, just like in Google Sheets.
You can even change this setting for each chart you add to the report, under "Parameters".
Bonus: via the Supermetrics connector you can also add some reports you aren't able to add via the Google Connector, like Multi-Channel Funnels to report on Assisted Conversions.