New GEMIUS Methodology in a world without 3rd party cookies

In implementing the gemiusAudience study (known in Poland as Mediapanel), we have repeatedly faced the challenge of measuring the number of Real Internet Users based on the identifiers that browsers allowed us to store in cookies. From the very beginning, our methodology has included elements that correct for phenomena such as the use of several devices by one person, the sharing of one device by several people, and cookie deletion. The discontinuation of support for 3rd party cookies (TPC) by browsers such as Firefox and Edge, and Chrome's disabling of this functionality, is another methodological challenge.

Our response to the "world without cookies" is a new version of the gemiusAudience methodology:
JAR (Joint Apocalypse Response) - a comprehensive solution that will preserve the continuity and quality of gemiusAudience results even after the much-heralded "Apocalypse" has occurred. The new method also makes it possible to estimate the number of Real Users for other browsers that disabled support for 3rd party cookies earlier than Chrome.

How to re-estimate the number of Real Users?

The gemiusAudience study is based on a hybrid methodology, the two main components of which are the Site-centric Audit and the Panel Survey. In both areas, 3rd party identifiers were the source of user information. The new reality without 3rd party identifiers poses two challenges:

  1. How to estimate the number of Real Users for domains audited based on site-centric data?
  2. How to recruit and monitor panelists in cookie-based research panels to determine the socio-demographic profile of Internet users?

Site-centric Audit and the Panel Survey

The number of Real Users (RU) is a key metric indicating audience size. It can be estimated from survey samples or from full (census) measurement, i.e., measurement covering the entire population. The latter approach eliminates statistical error, which, for medium-sized or small publishers or ad campaigns, can make sample-based results statistically insignificant. That is why, since 2004, the Mediapanel study has included a component for publishers called Site-centric (Audit), which makes it possible to accurately measure every page view and every user contact with a publisher's site.

The "Real User" algorithm used to estimate the value of the RU from the Audit data consists of two components:

  1. The Browser Instance Number ("BN") estimation component, which, based on the collected identifiers, after eliminating cookie deletion and taking into account non-cookie traffic, determines with how many different browser instances users have visited a domain.
  2. An estimator of the number of users who used these browsers.

RU values are calculated for the following segments:

  • Group of domains
  • Domain/Application/Audio/Video Player
  • Service on domain

The change in Chrome's support for 3rd party cookies splits the audited portion of the Internet in the measurement into as many ID spaces as there are domains/apps in the study. This makes it necessary to adjust the algorithms that count duplicate users between domains within the ownership group. In addition, the transition in our method to 1st party identifiers forced a change in the method of correcting for cookie deletion, because the erasability of 1st party identifiers has significantly different characteristics than that of 3rd party identifiers.

In the new version of the Browser Number component, the algorithm has been divided into 4 phases of calculation:

  • For one domain in one day
  • For a group of domains in one day
  • For a domain over a longer period, for example a month
  • For a group of domains over a longer period, for example a month

For a single domain on a single day, we switched the calculation from 3rd party to 1st party identifiers, taking into account the difference in characteristics between these two sets.

[Figure: New version of the Browser Number component]

In order to eliminate duplicate coverage within a group of domains, we have developed a method that, based on a designated set of characteristic IP addresses and the Browser Number (BN) values counted for each domain at the earlier stage, determines the total value of Real Users for the ownership group. The BN estimator for a group of domains determines, for each domain in the group, the percentage of its browsers that complements the rest of the group, and assembles this information into a single BN value for the group.
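
The assembly step can be illustrated with a minimal sketch. The article does not publish the actual estimator, so everything here is an assumption made for illustration: the function name `group_browser_number`, the idea that each domain's complement is expressed as a share between 0 and 1 of its browsers not already counted elsewhere in the group, and the example figures.

```python
from typing import Mapping


def group_browser_number(
    bn: Mapping[str, float],
    complement: Mapping[str, float],
) -> float:
    """Assemble per-domain BN values into one BN value for the group.

    bn         -- Browser Number already estimated for each domain (hypothetical input)
    complement -- assumed share (0..1) of each domain's browsers that is
                  NOT duplicated by other domains in the group
    """
    total = 0.0
    for domain, value in bn.items():
        share = complement[domain]
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"complement for {domain} must be in [0, 1]")
        # Only the non-duplicated part of the domain adds to the group total.
        total += value * share
    return total


# Illustrative numbers only, not measured data.
bn = {"a.example": 1000.0, "b.example": 600.0, "c.example": 400.0}
complement = {"a.example": 1.0, "b.example": 0.5, "c.example": 0.25}
print(group_browser_number(bn, complement))  # 1400.0
```

In this sketch the naive sum of the three domains would be 2000 browsers, while the group value is 1400 because part of the traffic on the second and third domain is treated as already counted.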

When estimating the number of browsers for a domain over a month, we needed to solve the problem of the erasability of 1st party cookie (FPC) identifiers over time. The longer the period, the higher the probability that a given identifier will be erased and replaced with a new value. We adapted the previously developed model to the erasability characteristics of 1st party identifiers.
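
To see why the length of the period matters, consider a deliberately simplified sketch. It is not the Gemius model: it assumes a constant daily probability that a browser's identifier is erased and that every browser is active on every day of the period. Under those assumptions a browser produces on average 1 + (days - 1) × p identifiers, so the raw identifier count has to be divided by that factor. The function names and figures are hypothetical.

```python
def expected_ids_per_browser(days: int, daily_deletion_prob: float) -> float:
    """Average number of 1st party IDs one browser generates in `days` days,
    assuming a constant daily deletion probability (a simplifying assumption)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if not 0.0 <= daily_deletion_prob <= 1.0:
        raise ValueError("daily_deletion_prob must be in [0, 1]")
    # One ID on the first day, plus a new one after every deletion.
    return 1.0 + (days - 1) * daily_deletion_prob


def estimate_browsers(observed_ids: float, days: int, daily_deletion_prob: float) -> float:
    """Correct the raw identifier count for identifier deletion."""
    return observed_ids / expected_ids_per_browser(days, daily_deletion_prob)


# Illustrative numbers only: 15,800 IDs seen in 30 days, 2% daily deletion.
print(round(estimate_browsers(15_800, 30, 0.02)))  # 10000
```

For a single day the correction factor is 1, which matches the observation that the deletion problem grows with the length of the period.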

The final stage of audience size calculations for the ownership group is a composite of the duplication estimation method and the elimination of identifier deletion. The audit data prepared in this way are passed on to the further stages of modeling the data from the research samples as reference values, which reduce statistical error and enable precise analysis of reach and contact frequency even for small domains or campaigns.

Changes in the panel survey

The Mediapanel survey is based on 3 types of research panels:

  1. Cookie Panel - the most numerous, but covering only audited domains
  2. Software Panel (PC, Tablet and Smartphone) - covering all domains and mobile applications, but significantly smaller than the Cookie Panel; requires installation of the Gemius measurement application
  3. Hardware Panel - a sample of people equipped with Gemius meters that measure Internet, television and radio use as well as contact with outdoor advertising media (traditional OOH and DOOH)

The Cookie Panel, closely related to the Site-centric Audit, consists of randomly drawn individuals who completed recruitment surveys displayed while they were using the sites of publishers included in the Audit. Such a person remained a panelist until, for whatever reason, he or she deleted the 3rd party ID through which we linked his or her activities on different domains.

The change in Chrome forced us to revise our approach to creating and maintaining the Cookie Panel. The first step was to adapt the display of recruitment surveys so that they rely on 1st party IDs. As the survey is assigned to a 1st party ID, the recorded activities of the person who completed it come only from the domain on which the panelist was recruited. In order to attribute to the panelist his or her measured activities on the other audited domains, we created a model for combining 1st party IDs that come from the same device but from different domains.

For this purpose, we created a classifier that estimates the probability that two 1st party identifiers originate from a single device, based on the distribution over time of their appearances in different characteristic subnets (IP addresses). Comparing identifiers pairwise in this way allows us to determine the desired probability. The classifier is built as a learning model whose training data comes from our research panels (Software and Hardware).
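
A toy version of such a pairwise score is sketched below. It is only an illustration of the idea, not the production classifier: the single feature (Jaccard overlap of the subnet and hour buckets in which each identifier was seen), the logistic form, and the `weight` and `bias` values are all assumptions. In a real setting the parameters would be learned from panel data in which the true device is known.

```python
import math
from typing import Iterable, Tuple

# A sighting is (subnet, hour bucket); both fields are hypothetical encodings.
Sighting = Tuple[str, int]


def cooccurrence(a: Iterable[Sighting], b: Iterable[Sighting]) -> float:
    """Jaccard overlap of the (subnet, hour) buckets of two identifiers."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def same_device_probability(
    a: Iterable[Sighting],
    b: Iterable[Sighting],
    weight: float = 8.0,   # assumed value, would be fitted on panel data
    bias: float = -4.0,    # assumed value, would be fitted on panel data
) -> float:
    """Logistic score that two 1st party IDs come from the same device."""
    z = weight * cooccurrence(a, b) + bias
    return 1.0 / (1.0 + math.exp(-z))


id_on_domain_x = [("203.0.113", 8), ("203.0.113", 20), ("198.51.100", 13)]
id_on_domain_y = [("203.0.113", 8), ("203.0.113", 20), ("198.51.100", 13)]
id_on_domain_z = [("192.0.2", 3)]

print(same_device_probability(id_on_domain_x, id_on_domain_y) > 0.9)  # True
print(same_device_probability(id_on_domain_x, id_on_domain_z) < 0.1)  # True
```

Identifiers that keep appearing in the same subnets at the same times score high, while identifiers that never share a subnet and time bucket score low.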

We verified the effectiveness of the classifier with current data from Chrome, for which 3rd party identifiers are still available. In the future, evaluation of the quality of the reproduced identifiers will be based on data from our research panels equipped with Software and Hardware meters. This situation underscores how important a role good quality research panels will play in the future.

Real User ID - linked 1st party identifiers derived from a given user browser instance

Being paired on the same device is a transitive relation: if A and B come from the same device, and B and C do as well, we conclude that A and C also come from the same device. This property allows different 1st party identifiers to be combined into a single set without requiring a high probability for every pair. The more domains are covered by Mediapanel's audit, the more often this property can be used.

Such a collection of linked 1st party IDs is called the Community of Real User 1st Party IDs (CRUD). By giving each CRUD an ID, we create a Real User ID, a probabilistic ID that has the characteristics of a 3rd party ID.
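
Grouping identifiers through a transitive relation can be sketched with a standard union-find (disjoint-set) structure. This is a generic technique chosen here for illustration, not necessarily the one Gemius uses; the function name `build_real_user_ids`, the input format (pairs already accepted as "same device") and the integer IDs are assumptions.

```python
from typing import Dict, Iterable, Tuple


def build_real_user_ids(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each 1st party ID to the ID of the linked set it belongs to.

    `pairs` holds identifier pairs already judged to come from one device.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    set_ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for fpid in list(parent):
        root = find(fpid)
        # Each distinct root gets the next free integer as its set ID.
        result[fpid] = set_ids.setdefault(root, len(set_ids) + 1)
    return result


# A-B and B-C are linked, so A and C end up in the same set without a direct pair.
print(build_real_user_ids([("A", "B"), ("B", "C"), ("D", "E")]))
# {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

Because only a chain of links is needed, a pair such as A and C never has to pass the classifier threshold on its own.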

For the gemiusAudience study, we select only those Real User IDs that have known demographics (a completed recruitment survey), i.e. those that belong to one of our thousands of cookie panelists. In this way, we maintain the Cookie Panel as a valid source of data for the Mediapanel study.

Maintaining the high quality of the survey results

Adjusting the algorithms that estimate the number of browsers and developing a method that rebuilds the Cookie Panel allow us to maintain the existing quality of Mediapanel results. Despite the withdrawal of 3rd party cookies from the Chrome browser, we are still able to provide information about the Internet audience thanks to our research panels, the ubiquity of the audit and our experience in data modeling.