Before deployment, algorithms need to be trained on historical and competitive data. During the learning stage, the model analyzes every variable that impacts sales, such as pricing and traffic.

Once training is over and the algorithm makes accurate predictions that are later confirmed by real results, the model is ready for a pilot and, if the retailer is satisfied with the outcome, for production use.

Very often, retailers' data is incomplete, difficult to extract, ill-structured, or stored in the wrong format. This article covers how machine learning deals with insufficient data in retail.



Causes of Missing Data

  • The format is different from before. New internal systems, new IT solutions, and differences in data collection methods (by day or by transaction) can cause such a discrepancy.
  • The data was initially collected for other purposes, for example, for top management to calculate Category Managers' bonuses. Such data is not suitable for the algorithms.
  • The retailer has not been in the market long enough. As a result, the initial sales are nearly entirely reliant on the site traffic, making it difficult to analyze how prices impacted sales during that time frame.
  • The retailer has sales data for various departments or brands covering only short time periods; algorithms cannot work properly with such mixed sales data.

If the data is incomplete, retailers can either attempt to use everything they have or simulate the missing data.

Working with Existing Data

Retailers have to merge all of the data into one format. Also, if a retailer has already collected some data but then starts adding new data based on other factors, for example, competitive prices, the business needs to wait nearly a year to accumulate enough fresh data.

Another way is to purchase the missing data.



If there is no way to obtain the necessary data, the algorithms can use data modeling methods to simulate it.



Although such models do not make entirely precise predictions and require more training time and more data to be modeled, they are effective and retailers use them extensively.

Lost Data Simulation

The model can use the current data of a specific variable to define potential missing values of other variables. For example, if a retailer's prices and sales history spans the past two years, while their competitor's price history spans only the past year and a half, a simulation can help restore the missing data about the competitive prices.



Businesses use classifiers to resolve such issues. A classifier infers the missing values from the independent variables for which data is available.

There are two ways of “smart” data simulation with the help of classifiers:

The Predictive Model

The existing data is split into two groups:

  • the current data is used as the training set;
  • the missing data is used as the forecast targets.

A binary classifier helps determine whether an event happened, for instance, whether a product was on the shelf. A categorical classifier assigns a product to a specific segment, for example, a price segment.
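A minimal sketch of this predictive-model approach, assuming scikit-learn and entirely synthetic data: a classifier is trained on the rows where the target field is known, then predicts it for the rows where it is missing. The feature names and the on-shelf rule are invented for illustration.

```python
# Predictive-model imputation sketch: the known rows form the train set,
# the rows with missing values become the forecast targets.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Features available for every product: price and daily traffic (synthetic).
X = rng.uniform([1.0, 100.0], [10.0, 1000.0], size=(200, 2))
# Binary flag we want to restore: was the product on the shelf?
on_shelf = (X[:, 0] < 6.0).astype(int)  # invented ground-truth rule

# Pretend the flag is missing for the last 50 rows.
known, missing = slice(0, 150), slice(150, 200)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X[known], on_shelf[known])   # current data = train set
restored = clf.predict(X[missing])   # missing data = forecast target

print(restored[:10])
```

A categorical classifier works the same way; the target simply has more than two classes (for example, several price segments).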

The KNN (k-nearest neighbor) Method

This approach "restores" a missing value based on the "closest" data points: the distance between observations shows how similar they are, and the missing value is filled in from its nearest neighbors.
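A sketch of KNN-based restoration using scikit-learn's `KNNImputer`, with invented numbers: the missing competitor price in the last row is filled in from the two most similar rows.

```python
# KNN imputation sketch: a missing competitor price is restored from
# the k nearest rows, measured by distance over the known columns.
import numpy as np
from sklearn.impute import KNNImputer

# Columns: own price, sales, competitor price (NaN = missing value).
data = np.array([
    [5.0, 120.0, 4.8],
    [5.2, 115.0, 5.0],
    [9.0,  40.0, 8.7],
    [9.1,  38.0, 8.9],
    [5.1, 118.0, np.nan],   # competitor price to restore
])

imputer = KNNImputer(n_neighbors=2)
restored = imputer.fit_transform(data)
print(restored[-1])  # the NaN is replaced by the mean of the 2 closest rows
```

Here the last row is closest to the first two (similar own price and sales), so the imputed competitor price is their average.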

The churn predictor is the best-known example of a classifier; it estimates the probability of customer churn for a retailer or a service company. The five most commonly used classifier types are logistic regression, decision trees, neural networks, boosting, and Random Forest.
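A toy churn predictor built with logistic regression, the first classifier type on that list. The features, the churn rule, and all numbers are synthetic assumptions for illustration only.

```python
# Toy churn classifier: logistic regression outputs a churn probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
days_since_purchase = rng.uniform(1, 365, n)
orders_last_year = rng.integers(1, 30, n)
# Invented rule: long inactivity combined with few orders means churn.
churned = ((days_since_purchase > 180) & (orders_last_year < 10)).astype(int)

X = np.column_stack([days_since_purchase, orders_last_year])
model = LogisticRegression(max_iter=1000).fit(X, churned)

# Compare an inactive, low-frequency customer with an active one.
p_risky = model.predict_proba([[300.0, 2]])[0, 1]
p_loyal = model.predict_proba([[10.0, 25]])[0, 1]
print(f"churn probability, inactive customer: {p_risky:.2f}")
print(f"churn probability, active customer:   {p_loyal:.2f}")
```

The output is a probability rather than a hard label, which is what makes churn predictors useful for ranking customers by risk.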

Once the missing values are predicted, regressors, another type of algorithm, are used to predict sales. They predict not a segment or a probability but a numerical value, which in retail is sales.

Linear and polynomial regressions, neural networks, regression trees, and Random Forest are the most commonly used regressor types.
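A sketch of one regressor from that list, Random Forest, predicting a numerical sales value. The demand formula and features are synthetic assumptions.

```python
# Regressor sketch: unlike a classifier, the output is a number (sales).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
price = rng.uniform(2.0, 10.0, 300)
traffic = rng.uniform(100.0, 1000.0, 300)
# Invented demand: sales fall with price and rise with traffic.
sales = 500 - 30 * price + 0.2 * traffic + rng.normal(0, 10, 300)

X = np.column_stack([price, traffic])
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, sales)

# Predicted sales for a mid-range price and typical traffic.
pred = reg.predict([[4.0, 600.0]])[0]
print(f"predicted sales: {pred:.0f}")
```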

Machine Learning to Work with eCommerce Data

Retailers managing significant amounts of data can use the power of neural networks to suggest stock-ups or prices and boost sales. If the data to train a neural network is insufficient, businesses can use other algorithms that require less data.

For example, if a retailer has the sales history for only 30% of their products, as well as limited traffic and scarce sales, they can use one of three gradient-boosting algorithms to predict sales at the product level: XGBoost, LightGBM, or CatBoost. In this case, an active sales history of no fewer than 150 days is sufficient to suggest optimal pricing. As such an algorithm usually does not factor in the interdependence of prices of various products, it is mostly used for KVI products, while the rest of the assortment can be managed through rule-based pricing.

Retailers use regression to calculate the price elasticity for up to 30 products by adding 3 to 4 variables. Regression also helps to make high-level decisions such as whether or not there is room for a price increase.
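One common way to estimate price elasticity with regression is a log-log fit: the slope of log(sales) on log(price) is the elasticity. The data below is synthetic, generated with a true elasticity of -1.5.

```python
# Price elasticity sketch: OLS on log-transformed price and sales.
import numpy as np

rng = np.random.default_rng(3)
price = rng.uniform(2.0, 10.0, 200)
# Invented demand curve with elasticity -1.5 plus small noise.
sales = 1000 * price ** -1.5 * np.exp(rng.normal(0, 0.05, 200))

# The slope of the log-log regression is the elasticity estimate.
slope, intercept = np.polyfit(np.log(price), np.log(sales), 1)
print(f"estimated elasticity: {slope:.2f}")
```

An elasticity well below -1 would argue against a price increase; close to 0, it suggests there is room for one, which is exactly the kind of high-level decision the article mentions.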

SVM, or support vector machine, regression can be used with a linear or polynomial kernel. The algorithm does not recommend a price that would increase sales or margin but shows the trend.
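A sketch of SVM regression with a linear kernel, on invented data: the fitted coefficient shows the direction and strength of the price-sales trend rather than a recommended price.

```python
# SVR sketch: the linear-kernel coefficient exposes the trend.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
price = rng.uniform(2.0, 10.0, 100).reshape(-1, 1)
# Invented trend: sales drop by about 25 units per unit of price.
sales = 400 - 25 * price.ravel() + rng.normal(0, 8, 100)

svr = SVR(kernel="linear", C=10.0).fit(price, sales)
trend = svr.coef_[0][0]
print(f"trend (sales change per unit of price): {trend:.1f}")
```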

A/B testing is yet another approach that helps overcome the problem of incomplete data. It relies on both mathematics and statistics. Retailers who are new to the market can use it to analyze how ads and prices influence sales.

Conjoint analysis is an example of such an approach. Retailers use it to identify the most effective price-promo-ads combinations based on a small amount of data. It also shows how each of these factors, at its optimal value, contributes to the effectiveness of the combination.
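A simplified conjoint sketch: the contribution (part-worth) of each factor is estimated with a dummy-coded linear regression over a handful of tested profiles. The profiles, scores, and factor names are all invented.

```python
# Conjoint sketch: OLS on dummy-coded profiles yields per-factor part-worths.
import numpy as np

# Each row is a tested combination: [low_price, promo_on, ad_on].
profiles = np.array([
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0],
], dtype=float)
# Invented preference scores for the eight combinations.
scores = np.array([9.0, 7.5, 8.0, 6.5, 6.0, 4.5, 5.0, 3.5])

# OLS with an intercept column; coefficients are the part-worths.
X = np.column_stack([np.ones(len(profiles)), profiles])
partworths, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(dict(zip(["base", "low_price", "promo", "ad"], partworths.round(2))))
```

Eight profiles are enough here to separate the three factors, which illustrates why conjoint analysis works with a small amount of data.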


Retailers use a variety of approaches to restore missing data, recommend prices, or predict sales from small amounts of data: from lost-data simulation, with tools such as logistic regression, decision trees, neural networks, boosting, and Random Forest, to machine learning algorithms that tolerate short sales histories.

Meanwhile, collecting and processing historical data remains the most effective approach, as it allows for training neural networks and thus ensures much more reliable predictions than other methods.