When last month’s corrected sales value (corrected for the event log and for lost sales) is “too high or too low” compared to “what we expect it to be” given the real demand of the preceding months, it is called an outlier.
The purpose of outlier identification is to point out incoherent values that should be double-checked by sales and category managers.
Definition of the expected sales value for a product
To make outlier correction simple yet efficient, an expected sales volume should be calculated for each SKU. This section defines the methodology for doing so and the underlying theory that justifies this choice.
Every product follows its own lifecycle, which means its corrected sales could behave as follows over time:
Source : http://www.quickmba.com/marketing/product/lifecycle/
The corrected sales should therefore follow a predictable pattern, called the model. This assumption also holds locally in time: if the market conditions stay unchanged for a given period, then the corrected sales (referred to simply as sales from now on) will follow a given model over that period.
Any difference between the model and the corrected sales is due to the random aspect of the sale, which will be referred to as the perturbation. Whatever probability distribution it follows, the perturbation has an average close to 0 over time (the more sales data there is, the better this holds) and a given standard deviation.
In this simulation the corrected sales (green triangles) follow the same model for 20 months with an error of plus or minus 50%. The average error is 0.05% (of model value) and the standard deviation is 0.29%.
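A simulation of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the perturbation is drawn from a uniform law (the text does not say which distribution was used), the model is a hypothetical straight line, and the names simulate_sales and relative_errors are invented for this sketch.

```python
import random
import statistics


def simulate_sales(model_values, max_relative_error=0.5, seed=None):
    """Return simulated sales: each model value times (1 + perturbation).

    The perturbation is drawn uniformly from
    [-max_relative_error, +max_relative_error]; the uniform law is an
    assumption made for this sketch only.
    """
    rng = random.Random(seed)
    return [
        value * (1.0 + rng.uniform(-max_relative_error, max_relative_error))
        for value in model_values
    ]


def relative_errors(model_values, sales):
    """Error of each sale, expressed as a fraction of the model value."""
    return [(s - m) / m for m, s in zip(model_values, sales)]


# Hypothetical model: 100 units in the first month, growing by 5 units per month.
model = [100 + 5 * month for month in range(20)]
sales = simulate_sales(model, seed=1)
errors = relative_errors(model, sales)

print("average relative error:", statistics.mean(errors))
print("standard deviation:", statistics.stdev(errors))
```

Running the same functions on a much longer series shows the average relative error moving closer to 0, which is the property the perturbation is assumed to have.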
If the standard deviation of the perturbation is close to (or even higher than) the value of the model (sales in qty) for the last months, then the sales of the next month are unpredictable.
In that case the most naïve forecast, the moving average, is the best way to determine the expected value.
If this is not the case (the perturbation is small enough), then linear regression shall be used to produce the next month’s forecast.
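As a minimal sketch of the moving average, the expected value for the next month is simply the mean of the most recent months. The window of 3 months and the function name moving_average_forecast are assumptions made for illustration; the text does not specify a window length.

```python
def moving_average_forecast(sales, window=3):
    """Expected value for next month: mean of the last `window` sales."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(sales) < window:
        raise ValueError("not enough sales history for this window")
    recent = sales[-window:]
    return sum(recent) / window


# Hypothetical monthly sales quantities for one SKU, oldest first.
history = [12, 30, 7, 25, 14, 21]
expected = moving_average_forecast(history, window=3)
print(expected)  # (25 + 14 + 21) / 3 = 20.0
```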
The reason for this is that any smooth mathematical function, including the model, is locally comparable to a straight line when looked at on a small enough scale. This is the basis of derivatives, and corresponds to the first-degree term of Brook Taylor’s series (theory written in 1715). Since the market conditions change quickly, only a small range of data should be used to capture the trend. It is therefore unnecessary to consider higher-degree terms of the series.
Two representations of the same mathematical function (in blue): the first on a wide scale, the second on a smaller scale. The tangent to this function at the point 13.5 is drawn in red.
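For reference, the Taylor expansion behind this argument can be written as follows. The symbols f (the function) and a (the point around which it is observed, 13.5 in the illustration) are generic notation introduced here, not taken from the original text. Keeping only the first-degree term leaves the equation of the tangent line:

```latex
f(x) = f(a) + f'(a)\,(x - a) + \frac{f''(a)}{2!}\,(x - a)^2 + \dots
\qquad\Longrightarrow\qquad
f(x) \approx f(a) + f'(a)\,(x - a) \quad \text{for } x \text{ close to } a
```

The neglected terms shrink with the square of the distance to a, which is why the approximation is acceptable on a short range of months.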
In a straight-line equation y=mx+b, m is called the slope and b is called the intercept. Linear regression is a mathematical process that finds the best straight-line equation (thus the best pair m, b) fitting a sample of data, and provides a measure of how well the data fits that straight line, r².
Then, if the data fits, the expected sales value is given by the next value of that straight line.
Illustration of the results of a linear regression with Excel
The closer r² is to 1, the more linear the data; the closer it is to 0, the less linear the data.
Illustration of how effectively r² determines whether the data is linear or not.
The key benefit of this method is that, without knowing the model or the perturbation, it is possible, through r², to tell whether the standard deviation of the perturbation is too high AND/OR whether the model itself has changed due to market parameters.
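The selection rule described in this section can be sketched as follows: fit a straight line to the recent months, and use its next value only if the fit is good enough, otherwise fall back to the moving average. The cut-off r2_threshold=0.8 and the window of 3 months are hypothetical values chosen for the example, since the text gives no numeric threshold; the function names are likewise illustrative.

```python
def fit_line(ys):
    """Least-squares line through the points (1, ys[0]), (2, ys[1]), ...

    Returns (m, b, r2): slope, intercept and squared correlation.
    """
    n = len(ys)
    if n < 3:
        raise ValueError("need at least 3 months of sales")
    xs = range(1, n + 1)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    cov = n * sxy - sx * sy
    m = cov / var_x
    b = (sy - m * sx) / n
    if var_y == 0:
        r2 = 1.0  # flat sales: the horizontal line fits exactly
    else:
        r2 = (cov * cov) / (var_x * var_y)
    return m, b, r2


def expected_sales(ys, r2_threshold=0.8, window=3):
    """Return (expected value for next month, name of the method used).

    r2_threshold and window are assumptions of this sketch.
    """
    m, b, r2 = fit_line(ys)
    if r2 >= r2_threshold:
        next_month = len(ys) + 1
        return m * next_month + b, "linear regression"
    recent = ys[-window:]
    return sum(recent) / len(recent), "moving average"


# Hypothetical sales histories, oldest month first.
trending = [10, 12, 14, 16, 18, 20]   # clean upward trend
erratic = [10, 30, 5, 28, 8, 27]      # large perturbation, no usable trend

print(expected_sales(trending))
print(expected_sales(erratic))
```

With these two hypothetical series, the first is forecast from the regression line and the second from the moving average, because its r² falls far below the chosen threshold.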
For a sample of n pairs of values [x, y], x being the month and y being the sales, here are the formulas to calculate m, b and r:
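In their standard least-squares form, these quantities can be written as below. The layout and summation notation are an assumption about presentation; every sum runs over the n pairs, and the fit measure used in this section is the square of r.

```latex
m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
\qquad
b = \frac{\sum y - m \sum x}{n}
\qquad
r = \frac{n \sum xy - \sum x \sum y}
         {\sqrt{\left[\, n \sum x^2 - \left(\sum x\right)^2 \right]
                \left[\, n \sum y^2 - \left(\sum y\right)^2 \right]}}
```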
In the pharmaceutical retail...