The launch of GA4 has been a bumpy ride. GA4 is a significant departure from the previous version of Google Analytics, Universal Analytics. It introduces new features, a new data model and new reporting, all of which come with a steep learning curve for users. Additionally, some users have reported technical issues with GA4, such as data discrepancies, inaccurate tracking, and delays in data processing. These issues have caused frustration and concern among users, who rely on accurate data to make informed decisions. One of these issues is cardinality.
What is cardinality?
Cardinality is the number of unique values that are assigned to a dimension. For example, a dimension that only ever takes the values desktop, mobile and tablet has a cardinality of 3. GA4's data modeling is based on an event-driven data model that assigns a unique identifier to each event that is sent to GA4. If a particular data field or column has high cardinality (i.e., a large number of unique values), this can result in a large number of unique event IDs, which can cause delays in data processing and lead to data discrepancies.
To cope with high-cardinality dimensions, Google groups the excess dimension values into the (other) row. This row has been dreaded among marketers and data analysts for a while now. For example, if the "Item name" data field of a GA4 property has high cardinality because it contains every unique item on an ecommerce website, this can result in a large number of unique event IDs being generated.
In the example below, you can see how this looks in practice: the number of unique event IDs being generated for the events Items viewed and Items added to cart is too high for GA4 to process, which results in some values for the Item name dimension being compressed into the (other) row. This is suboptimal for accurate reporting.
Item name dimension compressed into (other) row due to number of unique event IDs being too high.
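To make the idea concrete, here is a minimal Python sketch of what cardinality means and how rolling values up into an (other) row loses detail. It is an illustration only: the function names, the sample item names and the row limit of 2 are assumptions made for this example, and the roll-up logic is a simplification rather than Google's actual implementation or limits.

```python
from collections import Counter


def cardinality(values):
    """Return the number of unique values seen for a dimension."""
    return len(set(values))


def rollup_to_other(values, row_limit):
    """Keep the `row_limit` most frequent values and fold the rest into "(other)".

    Simplified illustration; `row_limit` is a made-up threshold, not a GA4 limit.
    """
    counts = Counter(values)
    if len(counts) <= row_limit:
        return dict(counts)
    kept = dict(counts.most_common(row_limit))
    # Everything that did not make the cut ends up in a single row.
    kept["(other)"] = sum(counts.values()) - sum(kept.values())
    return kept


# Hypothetical "Item name" values collected from view events.
item_names = [
    "Blue shirt", "Blue shirt", "Blue shirt",
    "Red hat", "Red hat",
    "Green socks", "Yellow scarf", "Black belt",
]

print(cardinality(item_names))  # 5 unique item names
report = rollup_to_other(item_names, row_limit=2)
print(report)  # {'Blue shirt': 3, 'Red hat': 2, '(other)': 3}
```

The three least-viewed items can no longer be told apart in the resulting report, which is exactly the problem with the (other) row.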
What are the solutions?
Given the outrage over cardinality, it is clear that compressing results into a single row is one of the biggest issues GA4 is facing. So far, Google has not found a proper way of addressing high cardinality within its standard reports. Before looking for a solution, let’s have a look at GA4’s 4 key reporting surfaces:
1. Standard reports: GA4 has several standard reports that can be accessed from the left-hand navigation menu in the interface. These reports include Acquisition, Engagement, Monetization, Retention, Demographics and Tech.
These are the reports that are generally affected by cardinality. If you are looking to overcome cardinality specifically, other reporting surfaces are available:
2. Explorations: An exploration can be created directly from standard reports by opening the data quality icon and clicking Create an exploration.
Creating an exploration directly from the Standard Reports in GA4.
This option creates an exploration with the same query applied at the lowest sampling rate. However, data sampling can still occur in Explorations, which brings us back to one of the core issues we had in Universal Analytics.
3. API: You can of course also choose to send your data to a dashboarding solution like Looker Studio. However, sending data via the API doesn’t seem to be the best solution to overcome cardinality, as your data might be blocked by a Google Analytics API quota. Supermetrics has written an interesting article on how to overcome this.
So it seems that while Explorations and exporting your data via the Google Analytics API might solve your cardinality issues, they fail to deliver on the fundamental promise of any analytics tool: precise data that enables users to make well-informed decisions.
This brings us to the last reporting surface GA4 has to offer, without having to move to a 360 license:
4. BigQuery: Linking your GA4 property to BigQuery through the BigQuery Export and querying the same data there seems to be the most future-proof way of avoiding cardinality so far. GA4 data in BigQuery is not affected by data sampling or cardinality because it is stored in a different way. Data is stored in a raw format, which means that each individual event is saved as a separate record, rather than being aggregated into pre-defined tables.
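As a rough sketch of what requesting the same data in BigQuery could look like, the Python snippet below builds a query that counts item views and add-to-carts per item name from the raw export tables. Several things here are assumptions for illustration: the project and dataset name (`my-project.analytics_123456`) is hypothetical, the helper names are made up, and counting `view_item` and `add_to_cart` events per item is only an approximation of the Items viewed and Items added to cart metrics in the GA4 interface. Running the query requires the `google-cloud-bigquery` package and valid credentials.

```python
import re
from datetime import datetime


def _check_date(value):
    """Ensure the date is in the YYYYMMDD form used by the export table suffix."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # raises ValueError for impossible dates
    return value


def build_item_report_query(dataset, start_date, end_date):
    """Build a per-item report query against the raw GA4 export tables.

    `dataset` is "<project>.<dataset>"; the value used below is hypothetical.
    """
    if not re.fullmatch(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+", dataset):
        raise ValueError(f"unexpected dataset identifier: {dataset!r}")
    start, end = _check_date(start_date), _check_date(end_date)
    return f"""
SELECT
  item.item_name AS item_name,
  COUNTIF(event_name = 'view_item') AS items_viewed,
  COUNTIF(event_name = 'add_to_cart') AS items_added_to_cart
FROM `{dataset}.events_*`, UNNEST(items) AS item
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
  AND event_name IN ('view_item', 'add_to_cart')
GROUP BY item_name
ORDER BY items_viewed DESC
"""


def run_query(sql):
    """Run the query; needs google-cloud-bigquery and credentials to be set up."""
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client()
    return list(client.query(sql).result())


sql = build_item_report_query("my-project.analytics_123456", "20230101", "20230131")
print(sql)
```

Because every event is stored as its own record, the result contains one row per item name, however many there are, and nothing is folded into an (other) row.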
In conclusion, while GA4 is a free tool, it does come with data processing limitations that may hinder your ability to analyze your data in the way that you want. To achieve an unsampled dataset without cardinality limitations, you may need to consider using a cloud-based data warehouse like BigQuery or investing in a 360 license, both of which may come with additional costs.