Demand Planning: Time Series Forecasting Models
How to build statistical forecasting models in the retail sector
I would like to give detailed information about the Walmart dataset on Kaggle. My motivation for this project is that a good forecast helps us make better decisions about the future. Hence, I will be talking about how to build forecasting models in the retail sector.
Demand planning is part of the supply chain management process. Predicting future demand is one of the most valuable activities an organization can undertake. A demand plan’s impact is felt throughout the business, from sales and marketing to manufacturing and distribution. When forecasting models are built correctly, demand planning positions the company to deliver superior customer service while meeting its financial objectives.
The failure of a statistical forecasting and planning approach can result in a broad range of issues such as:
- missed deadlines (unsatisfied customers)
- lost credibility (damage to the company's reputation)
- excess inventory (which can lead to waste as well as financial losses)
To reduce the high costs of failure, businesses need to utilize historical sales data and adopt a sophisticated demand planning strategy that leverages data and marketing insights. This strategy starts with building accurate forecasting models. The forecasting methods I used are as follows:
- Moving Average
- Simple Exponential Smoothing
- Double Exponential Smoothing
- Triple Exponential Smoothing
Before jumping into the forecasting models, I performed exploratory data analysis to understand how I could use the data for forecasting.
Data Analysis
The training dataset includes historical weekly sales from February 2010 to November 2012. The table below shows 10 observations from the training dataset.
The dataset includes historical sales data for 45 Walmart stores located in different regions. Each store contains many departments (81 in total), and my objective was to predict the department sales for each store accurately. First, I analyzed the total sales of the departments during 2011. According to the plot in Figure 2, the top 5 departments are dept. 92, dept. 95, dept. 38, dept. 72 and dept. 90. Although there is no detailed information about the departments, based on my few conversations with Walmart associates:
- department 92 — dry grocery,
- department 38 — apparel,
- department 72 — electronics,
- department 95 — beverage and snacks,
- department 90 — dairy products
Afterward, I checked the total sales of the stores during 2011. The plot shows that the top 5 stores are store 4, store 10, store 13, store 14 and store 20.
In addition, I performed a granularity analysis at the store & department level (the line graph in Figure 4 shows weekly sales of department 72 at store 4). The three most prominent holidays are the Super Bowl, Thanksgiving, and Christmas. They fall within the following weeks in the dataset:
- Thanksgiving — 25 November 2010
- Christmas — 25 December 2010
- Super Bowl — 6 February 2011
- Thanksgiving — 25 November 2011
- Christmas — 25 December 2011
- Super Bowl — 5 February 2012
Based on the plot, Thanksgiving and Christmas played a significant role in sales growth (the weeks containing these holidays were the highest points in the dataset). The third-highest point reflects the Super Bowl effect.
Forecasting Approach
To build efficient forecasting models, I needed to answer two questions:
- Which stores and departments play a significant role in terms of business?
- Is the data of each ‘store & dept’ sufficient to build a forecasting model?
According to the store and department plots above, I identified the top departments and stores that contributed 95% of total sales. Then I checked the length of the time-series data for each store & department and realized that the amount of weekly sales history differs from one combination to another. My forecasting model does not produce a forecast for any store & department combination with fewer than 13 weeks of data. Depending on the size of the weekly sales history, I implemented different forecasting methods, as sketched below. For instance, to run the Holt-Winters method, I need enough data to decompose actual demand into level, trend and seasonality components. Hence, I applied the Holt-Winters model only to store & department datasets with at least 104 weeks of data (two full years). Based on the modeling framework, my holdout period is the last 13 weeks for each SKU, and I produced 39 forecast periods for each SKU.
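That routing rule can be summarized in a few lines of Python. This is only an illustrative sketch: the 13-, 52- and 104-week thresholds come from the text, and the returned labels stand for the models described in the following sections.

```python
def choose_model(history_weeks: int) -> str:
    """Pick a forecasting method for a SKU based on the length of its
    weekly sales history (thresholds described in the text)."""
    if history_weeks < 13:
        return "no forecast"           # not enough data to model
    elif history_weeks < 52:
        return "simple exponential"    # level only
    elif history_weeks < 104:
        return "double exponential"    # level + trend
    else:
        return "triple exponential"    # level + trend + seasonality
```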
Afterward, I checked the forecasting lag of each store & dept, since there are data gaps in the forecasting setup. For instance, department 47 in store 2 has a training dataset ending on 20 July 2012, yet its first forecasting period is 14 December 2012 (a 147-day forecasting lag). Data scientists have to be careful and know when their training set ends and the true forecasting period starts. Thus, at this initial step I eliminated the datasets with more than 7 days of forecasting lag (to be reconsidered in a later phase).
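A minimal sketch of that lag check, assuming the Kaggle train.csv and test.csv files with Store, Dept and Date columns:

```python
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["Date"])
test = pd.read_csv("test.csv", parse_dates=["Date"])

# Last observed week per store & department vs. first week to forecast.
last_train = train.groupby(["Store", "Dept"])["Date"].max()
first_forecast = test.groupby(["Store", "Dept"])["Date"].min()

# Keep only combinations whose gap is at most 7 days.
lag_days = (first_forecast - last_train).dt.days
valid = lag_days[lag_days <= 7].index
train = train.set_index(["Store", "Dept"]).loc[valid].reset_index()
```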
Pre-processing
When I checked the dataset, I observed negative and zero sales values; the negative values represent returns from customers. There are 1,258 negative sales values, and 73 observations have no sales in the training dataset. To build a sound forecasting model, I replaced the negative sales of each department at each store with the average of that department's sales at that store. Then, I converted the 'Date' column to a datetime type to work with dates efficiently in pandas. I also created a unique SKU code by combining the store & department. In addition, I removed the attributes which are not essential for statistical forecasting models, such as holidays, promotions, CPI and unemployment; these will be leveraged in future machine learning model development.
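A condensed sketch of these pre-processing steps. The column names (Store, Dept, Date, Weekly_Sales, IsHoliday) follow the Kaggle competition files, and in this sketch the replacement value is computed from each SKU's positive weekly sales:

```python
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["Date"])  # parse dates up front

# Unique SKU code from the store & department combination.
train["SKU"] = train["Store"].astype(str) + "_" + train["Dept"].astype(str)

# Replace negative sales (customer returns) with the average positive
# weekly sales of the same store & department.
positive_mean = (
    train.loc[train["Weekly_Sales"] > 0]
         .groupby("SKU")["Weekly_Sales"]
         .mean()
)
negatives = train["Weekly_Sales"] < 0
train.loc[negatives, "Weekly_Sales"] = train.loc[negatives, "SKU"].map(positive_mean)

# Drop attributes not used by the statistical models (kept aside for later ML work);
# CPI, unemployment and markdowns live in features.csv and are simply not merged here.
train = train.drop(columns=["IsHoliday"], errors="ignore")
```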
Statistical Forecasting Models
Statistical forecasting methods are intuitive to build and implement in demand planning processes. The goal is to isolate patterns in past data. The patterns that arise most often are:
- Trend — Stable increasing or decreasing pattern.
- Seasonality — Pattern repeated at fixed intervals (weekly, monthly, etc.)
- Cycles — Pattern repeated cyclically at variable intervals.
- Randomness — Random pattern with no recognizable structure.
Figure 7 illustrates this idea by visually separating historical data into several components; it shows the historical sales of department 94 in store 4.
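As an illustration of this decomposition, the snippet below is a sketch that splits the same SKU shown in Figure 7 into trend, seasonal and residual components with statsmodels, using a 52-week seasonal period:

```python
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

train = pd.read_csv("train.csv", parse_dates=["Date"])
sales = (
    train[(train["Store"] == 4) & (train["Dept"] == 94)]
    .groupby("Date")["Weekly_Sales"].sum()
    .sort_index()
)

# 52 weekly periods approximate one year of seasonality.
decomposition = seasonal_decompose(sales, model="additive", period=52)
decomposition.plot()
plt.show()
```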
Moreover, each forecasting model has different parameters, such as the moving average window (MA), alpha (level), beta (trend), phi (damped trend) and gamma (seasonality); these are hyper-parameters that need to be tuned on the training set. They are optimized by running the whole job, checking the accuracy, and adjusting them to improve the result. Hence, I applied hyper-parameter tuning by minimizing the loss function (error rate).
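A minimal sketch of such a tuning loop: it scores every parameter combination on a 13-week holdout and keeps the one with the lowest mean absolute error. The forecast_fn signature is an assumption shared by the model sketches later in this article.

```python
import itertools
import numpy as np

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def grid_search(series, forecast_fn, param_grid, holdout=13):
    """Score every parameter combination on the last `holdout` periods
    and return the one with the lowest MAE.
    `forecast_fn(history, horizon, **params)` must return `horizon` values."""
    series = np.asarray(series, dtype=float)
    history, test = series[:-holdout], series[-holdout:]
    best_params, best_error = None, np.inf
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        error = mae(test, forecast_fn(history, holdout, **params))
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```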
1. Moving Average
The first forecasting model I developed is the simplest. The main idea is that future demand is similar to the recent demand observed. With this model, I assume that the forecast is the average of the demand during the last n periods.
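In other words, every future period is forecast as the mean of the last n observations, as in this sketch (compatible with the grid_search helper above):

```python
import numpy as np

def moving_average_forecast(history, horizon, window=4):
    """Flat forecast: every future period equals the mean of the
    last `window` observations."""
    history = np.asarray(history, dtype=float)
    return np.repeat(history[-window:].mean(), horizon)
```

For example, grid_search(history, moving_average_forecast, {"window": range(2, 13)}) would pick the window that minimizes the holdout MAE.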
Based on this formula, I ran a moving average model for SKUs with low sales volume. Since these store and department combinations contributed less to total sales, I decided not to experiment with the later techniques presented in this article. I optimized the MA window for each SKU by maximizing forecasting accuracy. I would like to illustrate one example, which belongs to department 59 at store 30. For this SKU, the optimal MA value is 6, which means the model considers the last 6 periods to predict the next one. Although the moving average model works well on average (accuracy is 75.2%), it has difficulty capturing sharp fluctuations. In other words, it ignores complex relationships in the data.
2. Simple Exponential Smoothing
It is one of the simplest ways to forecast time series data. The model learns from historical sales and then forecasts future demand as its last estimation of the level. The underlying idea of the algorithm is that, at each period, the model learns a bit from the most recent demand observation and remembers a bit of the last forecast it made.
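A sketch of that update rule, where alpha controls how much the level learns from the newest observation and how much it remembers of the previous estimate:

```python
import numpy as np

def simple_exp_smoothing_forecast(history, horizon, alpha=0.5):
    """Simple exponential smoothing:
        level_t = alpha * demand_t + (1 - alpha) * level_{t-1}
    The forecast is the last estimated level, held flat."""
    level = float(history[0])
    for demand in history[1:]:
        level = alpha * demand + (1 - alpha) * level
    return np.repeat(level, horizon)
```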
According to the given formula, I ran a simple exponential model for SKUs whose data size is below 52 weeks. When implementing the model in Python, I optimized the alpha value for each SKU by maximizing its forecasting accuracy. I would like to demonstrate one example, which belongs to department 96 in store 17. The optimal alpha value is 0.6, which means the model learns more from the most recent demand observation than from the last forecast. This resulted in 83.6% forecast accuracy. Even though the simple smoothing model identifies the pattern, there is still a lag between the predicted and actual values.
3. Double Exponential Smoothing
A major issue with simple exponential smoothing is its inability to identify a trend in the data. The trend can be described as the average variation of the time series level between two consecutive periods. (Remember that the level is the average value around which the demand varies over time.) This new model predicts the trend based on a weight, beta (β), that gives more or less importance to the most recent observations.
The fundamental idea is that each component (level and trend) is updated every period based on two pieces of information (see the sketch after this list):
- The most recent observation of this component
- The previous estimation of this component
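A sketch of these update rules (Holt's linear method); initializing the trend from the first two observations is an assumption of this sketch:

```python
import numpy as np

def double_exp_smoothing_forecast(history, horizon, alpha=0.5, beta=0.1):
    """Holt's linear method: level and trend are each updated from the
    most recent observation and their previous estimate."""
    history = np.asarray(history, dtype=float)
    level, trend = history[0], history[1] - history[0]
    for demand in history[1:]:
        previous_level = level
        level = alpha * demand + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Project the last level plus h steps of trend into the future.
    return level + trend * np.arange(1, horizon + 1)
```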
Using the update rules above, I ran a double exponential model for SKUs whose data size is between 52 and 104 weeks. I tuned the alpha and beta values by maximizing forecasting accuracy. I would like to demonstrate one example, which belongs to department 93 at store 16. The optimal parameters for alpha and beta are 0.4 and 0.2, respectively. Although the historical sales fluctuate substantially, the double exponential model works well on average (the forecasting accuracy is 90.7%).
4. Triple Exponential Smoothing
Simple and double exponential smoothing models do not recognize seasonal patterns and thus cannot extrapolate any seasonal behavior into the future. To learn a seasonal pattern, we add a third component to exponential smoothing. First, the level and trend are deseasonalized; then, seasonality factors are added into the general formula.
The fundamental idea is that the forecasting model learns multiplicative seasonal factors that are applied to each period inside a full seasonal cycle. The forecast is then composed of the level (a) plus the damped trend (b), multiplied by the seasonal factor (s). The advantage of the damping factor is that it drastically reduces the impact of the trend on the future forecast: with a damped trend, the model progressively forgets the trend and the forecast settles at a stable level.
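Rather than hand-rolling the triple exponential updates, the sketch below uses statsmodels' ExponentialSmoothing with a damped additive trend and multiplicative 52-week seasonality. The store and department match the example in the next paragraph, and multiplicative seasonality assumes strictly positive sales values:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

train = pd.read_csv("train.csv", parse_dates=["Date"])
sales = (
    train[(train["Store"] == 8) & (train["Dept"] == 94)]
    .groupby("Date")["Weekly_Sales"].sum()
    .sort_index()
)

model = ExponentialSmoothing(
    sales,
    trend="add",            # additive trend component
    damped_trend=True,      # phi: damp the trend for long horizons
    seasonal="mul",         # multiplicative seasonal factors
    seasonal_periods=52,    # one year of weekly seasonality
)
# fit() can optimize alpha, beta, gamma and phi itself; explicit values
# (smoothing_level, smoothing_trend, smoothing_seasonal, damping_trend)
# can also be passed when doing a manual grid search.
fit = model.fit()
forecast = fit.forecast(13)  # 13-week holdout horizon
```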
In light of all the formulas explained above, I ran a multiplicative triple exponential model for SKUs whose data size is above 104 weeks, using a 52-week seasonality length. To maximize forecast accuracy, I applied hyper-parameter tuning to find the optimal alpha, beta, phi and gamma values. I would like to show one sample, which represents department 94 in store 8. For this SKU, the optimal parameter combination is (alpha=0.4, beta=0.2, phi=0.2, gamma=0.2), meaning the model gives relatively more weight to recent observations. The forecasting accuracy for this SKU is 96.45%.
Results
After creating the forecast model, the next step is to quantify its accuracy. MAE% is a very good indicator of forecast accuracy: it considers the average absolute error relative to the average demand, as shown in Figure 17, and therefore gives more insight than other KPIs. The problem with KPIs such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) is that they are not scaled to the average demand.
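A short sketch of this metric; forecast accuracy is then reported as 1 minus MAE%:

```python
import numpy as np

def mae_percentage(actual, forecast):
    """Mean absolute error scaled by average demand (MAE%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast)) / np.mean(actual)

# Example: accuracy = 1 - mae_percentage(holdout_actuals, holdout_forecast)
```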
Based on this formula, I created a final results table that shows the forecasting error and accuracy, along with the associated model, for each store & department combination. The table below is just a sample of the output for the 1,776 store & department combinations.
Finally, I plotted the distribution of forecast accuracy for all store & department combinations. The distribution is negatively skewed, and the forecasting model performs well on average even though there are several outliers. The average accuracy is 88.2% across Walmart stores.
Next Steps
One limitation of the forecasting models explained above is their inability to incorporate external inputs (GDP growth, unemployment rate, weather, etc.). For that reason, I plan to build machine learning algorithms on this dataset that use both external and internal factors such as holidays, promotions, CPI and unemployment. Then, I will compare the statistical forecasting methods with the machine learning models by checking their forecast accuracy.
This was a detailed analysis of statistical forecasting models in retail management. I would like to share my baseline code (not the final version) to help data science candidates. Please leave any comments you have, and let me know if you would like me to explain anything further!