When it comes to testing products, innovations, communications or packaging, most research agencies, and their clients, swear by norms databases. This is a practice that has gone on for decades.
Likewise, at PRS IN VIVO, we developed our own norms databases. Since the early 1970s, we have meticulously recorded the performance KPIs for the thousands of tests we have conducted, country by country and category by category. Why? First, because it yields great insights, but also to answer the inevitable question, "How is my pack/product/innovation performing against the PRS IN VIVO database?"
Are all norms databases created equal?
When norms databases are both well-built and reliable, they can be incredibly useful. Whether they are depends on their internal and external validity.
A well-built database is one whose INTERNAL VALIDITY is ensured:
- Clean measurements;
- Executed consistently with controlled samples; and
- Well-classified by type of test and by category.
A reliable database is one with EXTERNAL VALIDITY: the ability of the database to predict what will happen "in real life."
When Norms meet these two criteria, they can be used for many things, such as:
- Conducting transversal analysis and generating deep learnings within categories;
- Learning how to convey messaging in a simple and clear way;
- Reassuring Insights professionals and marketers in their decision making;
- And, most importantly, if their external validity is strong, correctly predicting product/packaging/communication performance before launch.
Understanding internal validity vs. external validity in today's marketplace
While internal validity is generally assured by most of the better research agencies, external validity is less assured, especially in turbulent times when prior norms may not be a good yardstick for today.
For example . . .
How should we value/compare all the data points measured before Covid and before the recent spike in inflation? Consumer habits are changing dramatically, and society is undergoing profound changes. Simply put: is a metric obtained in 2007 still relevant today? What about pre-Covid? Pre-inflation?
Can we say that an innovation that outperforms a series of tests conducted before Covid/inflation will succeed in a market that is dramatically different today, when consumers’ needs and expectations are changing?
Here’s a wonderful anecdote that was shared with us by a new client. Prior to our relationship, their marketing and sales team met with their Insights team to review the results of an important innovation launch. The marketing team asked the research agency: How does purchase intent compare to your norms?
The research agency said: Congratulations, you are in the top quintile!
A salesperson then spoke up: Oh great . . . “in the top quintile” . . . just like that product we launched last year that was a complete flop.
The salesperson’s ultimate measure of success is always "did the product sell well?", and that is the right metric to focus on. The success of an innovation is not its performance against a database. The real marker of a new product’s success is its performance in the market.
Are norms misleading then?
Not necessarily, but before jumping headlong into using them, it is important to ensure their internal validity and, above all, their external validity.
1. Internal Validity
Here are some critical questions to ask your research agency to ensure the internal validity of its Norms. For example:
- Number of cases: With fewer than 30 cases, the variability of the data is often too high to guarantee a stable benchmark. A base of 30 cases allows for a minimum benchmark, but for a true norm, or for more volatile variables, at least 50 cases are recommended;
- Countries/regions: Consumers use scales very differently from one country to another, some being more optimistic than others. Benchmarking a test in Brazil against one in Germany is meaningless and will certainly lead to biased conclusions;
- Category: Consumer engagement measures can differ greatly from category to category;
- Sample: Data points need to be comparable; e.g., you can’t compare brand loyalists against a nationally representative sample;
- Brand strength: A single brand can have different strengths by country. It’s important to know what the norms are made of, as the tested brands and the countries where they were tested will affect the norm levels.
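The first few checks above can be sketched in code. This is only an illustration, assuming a hypothetical `NormRecord` structure and hypothetical helper names; it does not describe how any agency actually stores its norms. The 30 and 50 case thresholds are the ones quoted in the list above.

```python
from dataclasses import dataclass

MIN_CASES_BENCHMARK = 30  # below this, the benchmark is considered unstable
MIN_CASES_NORM = 50       # recommended for a true norm or volatile variables


@dataclass(frozen=True)
class NormRecord:
    """One past test in a norms database (hypothetical structure)."""
    country: str
    category: str
    sample_type: str
    score: float


def comparable_cases(records, country, category, sample_type):
    """Keep only past tests run in the same country, category and sample type."""
    return [
        r for r in records
        if r.country == country
        and r.category == category
        and r.sample_type == sample_type
    ]


def benchmark_status(n_cases):
    """Classify a base size using the 30/50 thresholds quoted in the text."""
    if n_cases < MIN_CASES_BENCHMARK:
        return "insufficient"
    if n_cases < MIN_CASES_NORM:
        return "minimum benchmark"
    return "norm"


# Illustrative, made-up database: 35 Brazilian and 10 German snack tests.
records = (
    [NormRecord("Brazil", "snacks", "national", 60.0 + i) for i in range(35)]
    + [NormRecord("Germany", "snacks", "national", 40.0 + i) for i in range(10)]
)

brazil_cases = comparable_cases(records, "Brazil", "snacks", "national")
germany_cases = comparable_cases(records, "Germany", "snacks", "national")

print(len(brazil_cases), benchmark_status(len(brazil_cases)))
print(len(germany_cases), benchmark_status(len(germany_cases)))
```

Filtering before counting is the point of the sketch: a database with 45 snack tests overall still offers only a minimum benchmark for Brazil and no usable benchmark for Germany.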
2. External Validity
It’s critical to note, however, that good internal validity can be misleading and can give a false impression of reliability. Without good external validity, Norms can be counterproductive and even dangerous because they can lead to very bad outcomes.
In consumer research, external validity is usually synonymous with predictability of results. In other words: will good performance against norms accurately predict what will happen in real life?
Possibly, but it's all about measuring the right KPIs.
Attitudinal KPIs such as purchase intent are notoriously bad at predicting consumers’ real behaviors, yet they are still widely used by most research agencies because they are simple to measure, understand and communicate. As a whole, however, the research industry learned long ago that there is no strong correlation between what consumers say they will buy and what they actually do. Still, some research agencies continue to use purchase intent as the main KPI because they believe they’ve corrected for various biases. But the starting point is wrong.¹ This has been proven time and again.
The truth is that no attitudinal measure is very predictive. Asking consumers to project usage or purchase can provide interesting insights, but such projections are not very reliable for external validity. As a matter of common sense, we obviously cannot expect consumers to know how many times a week in the next 12 months they will buy a product not yet released.
Behavior never lies
To ensure reliable measurements, PRS IN VIVO uses the most predictive KPIs available: actual consumer behavior, not what consumers tell us they would do in a hypothetical context. Our databases are built on tens of thousands of observations of consumers in real-life shopping situations. Our Retail Labs™ and methodologies are rooted in behavioral sciences to observe and track all the actions made by consumers in the most realistic environments. Using real-time eye tracking, live observation and video recording, we carefully analyze how consumers navigate the aisle, the category and the shelf, how they de-select and select products, what they read on pack, and how quickly they make a purchase.
While we have always recognized that this environment isn’t the actual retail environment, real-world sales outcomes have proven this methodology’s success for over five decades.
Hence, when PRS IN VIVO tells you that your product is in the top quintile of our database, the measurement is both robust AND reliable.
Is a well-constructed and reliable database enough?
It's a good first step, but more must be done to assure reliability.
This is where the question of benchmarking comes in. Comparing the performance of an innovation or a packaging design change to previous tests is fine, but we also analyze the products in our norms to determine how many were successful, how long the innovations stayed on the shelf, and whether they reached the sales volume our clients were hoping for.
As the salesperson in the anecdote above rightfully stated, the only real benchmark that matters is market success. That's why at PRS IN VIVO, we always measure the shelf performance of competitors. That way, we can tell you not only how your product is performing against all our norms, but also how it is performing against your category today.
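One way to picture this kind of check is to ask what share of past tests in each quintile of a norms database went on to succeed in the market. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the ten data points are made up, and "succeeded" stands in for whatever success criterion a client uses (sales volume reached, time on shelf, and so on).

```python
def quintile(score, norm_scores):
    """Return 1 (top quintile) to 5 (bottom) for a score within norm_scores."""
    below = sum(s < score for s in norm_scores)
    # Integer arithmetic avoids floating-point edge cases at quintile borders.
    return 5 - min((below * 5) // len(norm_scores), 4)


def success_rate_by_quintile(past_tests):
    """past_tests: list of (score, succeeded_in_market) tuples."""
    scores = [score for score, _ in past_tests]
    counts = {}  # quintile -> (successes, total)
    for score, succeeded in past_tests:
        q = quintile(score, scores)
        wins, total = counts.get(q, (0, 0))
        counts[q] = (wins + int(succeeded), total + 1)
    return {q: wins / total for q, (wins, total) in sorted(counts.items())}


# Made-up example: test scores 1..10 with an invented market outcome each.
past_tests = [
    (1, False), (2, False), (3, False), (4, False), (5, False),
    (6, True), (7, True), (8, True), (9, True), (10, False),
]

rates = success_rate_by_quintile(past_tests)
for q, rate in rates.items():
    print(f"quintile {q}: {rate:.0%} succeeded in market")
```

In this invented data set, only half of the top-quintile tests succeeded in the market, which is the situation the salesperson in the anecdote was describing. A database whose top quintile does not clearly out-sell its lower quintiles has weak external validity, however clean its measurements are.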
Given that today’s economic/political/social environment is not like it was a few years back, not only is this the most reliable measurement available, but it's relevant to everyone in your company: from the marketing manager to the salesperson in the field.
Norms databases are very powerful, but you must be very critical about how they are built, and careful about how they are used.
If you have the choice, always favor approaches that observe real consumer behavior rather than asking consumers to predict hypothetical future behaviors. As much as possible, benchmark yourself against the reality of your market today, not against old tests, some of which date back to before Covid and inflation. Only then can you be really certain that your benchmark is reliable.
To learn more about our unique behavioral science-based, anchored-in-reality approaches, you can reach out to us below.