Revology Analytics

The Merits of Aggregated Data for Demand and Price Elasticity Modeling

Challenging the Norms with Retail Chain-level Data: A Practical, Efficient, and Robust Approach to Predict and Explain Retail Sales

Traditional sales and price elasticity modeling methods in the Retail, CPG, and Distribution industries have relied heavily on detailed store (or warehouse) and product level models to estimate critical factors in pricing decisions like baseline sales or regular and promotional price elasticities.

These approaches have been around for decades. While they have contributed significantly to our understanding of price-volume dynamics in the Retail and Consumer Products industries, they present substantial challenges and limitations.

The primary issue is data access for Distributors and CPGs: disaggregated, store-level data is held by POS data aggregators like Nielsen and IRI, who have traditionally charged a hefty sum for things like Price Elasticity models or Pricing Scenario Analysis tools. Even when companies can access store-level sell-through data, they face issues with data quality, storage, and the complexity of ongoing, large-scale data management.

We propose reconsidering the data we use for Price & Promotional Elasticity modeling and related Demand Models (e.g., Baseline vs. Incremental Sales). Instead of focusing on detailed, disaggregated data, we advocate using aggregated, Retail Chain level (or often Retail Chain and Region level) data. 
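
As a rough illustration of what this roll-up involves, the sketch below collapses store-level rows into one record per Retail Chain and week. The row layout, chain names, and the helper `aggregate_to_chain_week` are hypothetical, chosen only for the example. The one design choice worth noting is that the aggregated price is total dollars divided by total units (a volume-weighted price), not a simple average of store shelf prices.

```python
from collections import defaultdict

# Hypothetical store-level rows: (chain, week, store_id, units, dollars)
store_rows = [
    ("ChainA", "2023-W01", "s1", 100, 250.0),
    ("ChainA", "2023-W01", "s2", 50, 150.0),
    ("ChainA", "2023-W02", "s1", 120, 270.0),
    ("ChainB", "2023-W01", "s9", 80, 240.0),
]


def aggregate_to_chain_week(rows):
    """Roll store-level sales up to one record per (chain, week)."""
    totals = defaultdict(lambda: {"units": 0, "dollars": 0.0})
    for chain, week, _store, units, dollars in rows:
        totals[(chain, week)]["units"] += units
        totals[(chain, week)]["dollars"] += dollars
    # Volume-weighted average price: total dollars / total units.
    return {
        key: {**t, "avg_price": t["dollars"] / t["units"] if t["units"] else None}
        for key, t in totals.items()
    }


chain_week = aggregate_to_chain_week(store_rows)
for key, record in sorted(chain_week.items()):
    print(key, record)
```

The same pattern extends to a Retail Chain and Region key by adding the region to the grouping tuple.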

In terms of modeling techniques, we advocate for adopting modern ML approaches, moving away from the unregularized log-log regressions and mixed effects linear models (and their derivations) that many data providers still rely on.

Our proposal is not new, though it is still not widely accepted in the Retail and CPG communities. Kurt Jetta and Erick Rengifo wrote an excellent paper over a decade ago titled "A Model to Improve the Estimation of Baseline Retail Sales," which supports our point of view on this topic.


Disaggregated Data and Traditional Models: A Critical Review

For decades, the CPG and retail industries have relied extensively on detailed store and product level data for sales and price elasticity modeling. Many predictive models still in use by industry and consultancies have been derived from old-school models like "Scan*Pro" and "PromotionScan," traditionally used by Nielsen and IRI.

These models have contributed significantly to shaping the field of Revenue Growth Management and pricing analytics, providing actionable insights to inform pricing and promotional strategies and marketing tactics.

However, the reliance on these models and the use of disaggregated data presents several limitations:

  1. We often observe sharp increases in modeled baseline sales during Price Promotions, which should not happen. Baseline sales are, by definition, independent of promotional activity and are primarily a function of brand strength, distribution depth/breadth, and seasonality.

  2. Store-level data have high dimensionality and multi-collinearity that are hard to control, even if you use regularization. 

  3. Most small- to mid-size CPGs don't have the budget to shell out $100K-150K on a 3-month Price Elasticity Study using store-level disaggregated data.

  4. Most retail execution for pricing, promotions, and marketing happens at the retail chain level (e.g., Harris Teeter) or the chain-market level (e.g., Kroger Atlanta). Store-level modeling is therefore unnecessary and doesn't align with how execution is managed or how management thinks.


Aggregated Data: An Overlooked Resource

In contrast to the highly granular, store-level data that has dominated the field, we propose a sharp turn towards aggregated retail chain-level data. The same logic applies to the distribution industry, where more aggregated models (e.g., Regional or National) will suffice in place of complex models built at the Distribution Center - Product level.

As mentioned above, modeling demand and price elasticity at a higher, aggregated level aligns with management thinking and accountability, and model accuracy is often as good as or better than that of models built on granular, more complex data.

Furthermore, using aggregated data offers several practical advantages:

  1. It is more readily available and less costly to acquire and manage than disaggregated data.

  2. It allows for greater modeling flexibility, reducing processing time and computational resources. You can build sophisticated price & promotion elasticity models in a matter of hours or a few days vs. the usual one-to-three-month process that has characterized these modeling efforts.

  3. Models and insights are available for smaller CPGs and retailers, who otherwise wouldn't have the budget or the resources to procure complex models.

  4. It allows for complete in-sourcing of all demand and price modeling efforts, building critical, sustainable capabilities for Revenue Growth Management instead of relying on 3rd parties.


Modern Machine Learning Models: A Superior Alternative

Having reconsidered the level of granularity for our modeling data, we now turn to the models we'll use for demand and price elasticity modeling.

We argue for a departure from traditional, unregularized linear models (including mixed effects) towards more modern but fundamental machine learning approaches. We explicitly advocate using log-log models with regularization through methods like ElasticNet, and tree-based models like Random Forest and Gradient Boosting Machines (GBMs).

These models offer many benefits, including improved accuracy, enhanced modeling efficiency, and robustness against overfitting and multi-collinearity.

Regularized regressions like ElasticNet have been recognized for their strong performance, especially in scenarios with high multi-collinearity and dimensionality, which are common in Retail and CPG-type data sets. These methods work by adding penalty terms on the size of the coefficients to the model's loss function, which helps to control model complexity and prevent overfitting. The penalty also helps the model generalize, so it typically performs better on new, unseen data than an unpenalized regression.
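
To make the penalty idea concrete, here is a minimal sketch of an ElasticNet-penalized log-log model fitted by coordinate descent in plain Python. Everything in it is an assumption for illustration: the data is synthetic (a true price elasticity of -2.0 and a promo coefficient of 0.3 are baked in), the function names are hypothetical, and the objective follows the convention documented for scikit-learn's `ElasticNet`, i.e. `(1 / 2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2`. In practice you would most likely reach for a library implementation rather than this loop.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.001, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for an elastic net with an unpenalized intercept."""
    n, p = len(X), len(X[0])
    # Center features and target so the intercept stays outside the penalty.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                others = sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - others)
                z += Xc[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return intercept, w


# Synthetic chain-week data (hypothetical): ln(units) = 8 - 2*ln(price) + 0.3*promo + noise
rng = random.Random(0)
price_points = [1.99, 2.49, 2.99, 3.49, 3.99]
X, y = [], []
for week in range(60):
    price = price_points[week % 5]
    promo = 1.0 if week % 4 == 0 else 0.0
    X.append([math.log(price), promo])
    y.append(8.0 - 2.0 * math.log(price) + 0.3 * promo + rng.gauss(0, 0.05))

intercept, (elasticity, promo_lift) = fit_elastic_net(X, y)
print(f"price elasticity ~ {elasticity:.2f}, promo coefficient ~ {promo_lift:.2f}")
```

Because both sides of the model are in logs, the coefficient on `ln(price)` reads directly as the price elasticity. Raising `alpha` shrinks the coefficients further toward zero, which is the trade-off between stability and bias that the penalty controls.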

Meanwhile, tree-based ensemble models like Random Forest and GBMs are adept at capturing complex, non-linear relationships between variables. Random Forest, for instance, works by building hundreds or thousands of decision trees, each on a random sample of the data, and averaging their individual predictions (analogous to "the wisdom of the crowd").
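
The averaging idea can be sketched in a few lines. The example below is a heavily simplified stand-in for a Random Forest, not the full algorithm: it fits one-split regression trees ("stumps") on bootstrap samples and averages their predictions, leaving out the deeper trees and random feature selection a real Random Forest uses. The data and all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Find the single price threshold that minimizes squared error."""
    best = None
    # Skipping the smallest value keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the sample contained only one distinct price
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=200, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """The ensemble prediction is the average of the individual trees."""
    return sum(predict_stump(t, x) for t in trees) / len(trees)


# Hypothetical chain-week observations: weekly price and unit sales.
prices = [1.99, 2.49, 2.99, 3.49, 3.99] * 8
units = [round(1000 * p ** -2.0) for p in prices]

trees = fit_bagged_stumps(prices, units)
for p in (1.99, 2.99, 3.99):
    print(p, round(predict_forest(trees, p), 1))
```

Each stump on its own is a crude step function, but because the bootstrap samples differ, the stumps split at different prices, and their average traces a smoother, non-linear price-volume curve.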


In-Sourcing Key Revenue Growth Analytics Capabilities

The traditional reliance on 3rd parties, store-level disaggregated data, and traditional regression/econometrics approaches has shaped our understanding of price-volume dynamics. However, these methods are fraught with high costs, data quality issues, storage complexity, and misalignment with practical management strategies.

There is a pressing need for industries like Retail and CPG to rethink their approach and embrace aggregated data for modeling. This shift is not merely about cost efficiency and alignment with price or promotional execution. It's about breaking down barriers that prevent smaller firms from effectively leveraging these valuable insights and advanced capabilities.

This shift should extend to the modeling techniques being used. Modern ML approaches, such as regularized regressions and tree-based models, promise improved accuracy, enhanced modeling efficiency, robustness against overfitting, and the ability to capture complex, non-linear relationships.

However, to truly unlock the potential of this shift, companies should look towards building these capabilities in-house. Outsourcing to 3rd parties can result in significant expenditure and dependency. In contrast, in-sourcing fosters a culture of self-reliance while simultaneously building critical, sustainable capabilities for Revenue Growth Analytics.

In-sourcing also allows companies to better control and understand their models, leading to more customized, flexible, efficient, and relevant price analytics solutions that align closely with their unique business needs and strategies. Furthermore, in-house analytics or data science teams can react more quickly to changes in business needs or market conditions, providing a competitive advantage in today's fast-paced Retail and CPG landscape.

The transition towards aggregated data, modern ML models, and in-house advanced price analytics capabilities may be a departure from traditional industry norms. Still, it signifies a promising path toward a more effective, efficient, and sustainable Revenue Analytics future.

By embracing this change, retailers and CPGs, along with their merchandising, category, and pricing teams, can potentially redefine their competitive positioning and offer superb service and products to their B2C and B2B customers, respectively.