Ask not what accuracy your algorithm achieves but what value it can add

Partnering with the marketing team at work earlier this month reminded me of an important lesson in my algorithmic trading. Our external TV agency presented a mid-term performance analysis, and I was tasked with performing due diligence on their methodology. It took me two minutes to realise that they were using a linear regression model. At first I thought it was too simplistic. Then I realised my own naivete, because it was never about accuracy to begin with.

Measuring the ROI of a marketing campaign is an elusive task. A quick metric is to regress the money you spent against the revenue generated in the same period. For example, model y(t) = b0 + b1·x(t), where y is revenue, x is marketing spend, and t is the day. Plot y versus x for a number of days, then draw a best-fit straight line to show the correlation graphically. If you're feeling adventurous, throw in a few other factors you think might influence daily revenue, so that you also have x2(t), x3(t), and so on. This is what the agency did in their model.

One major flaw with using a regression model for this purpose is that it assumes the data points are independent of one another, so that one day's events do not influence another's. This is simply not true in the real world. For example, it might take a few days after someone sees your ad before they click your buy button, or it might take more than a few showings of the ad before people take action. A better model for the first case is y(t) = b·x1(t − a), where a is the delay. For the second case, y(t) = ∑ ba·x1(t − a), summed over the delays a in a set A. The problem is that finding a and A is another estimation problem in and of itself. Luckily, this is conceptually what an artificial neural network does: a series of regression models that take into account the interrelation of the factors and the non-linear effects on the response.
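To make the lagged model concrete, here is a minimal sketch of fitting y(t) = b0 + ∑ ba·x(t − a) by ordinary least squares, assuming the set of delays A is already known. I use Python with NumPy rather than R, and the spend series, lags, and coefficients are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily marketing spend over 120 days (made-up data).
days = 120
spend = rng.uniform(100, 500, size=days)

# Pretend revenue responds to spend at lags of 1, 3, and 5 days, plus noise.
true_lags = [1, 3, 5]
true_betas = [2.0, 1.5, 0.5]
revenue = 1000 + sum(b * np.roll(spend, a) for b, a in zip(true_betas, true_lags))
revenue += rng.normal(0, 50, size=days)

# Design matrix whose columns are an intercept and lagged copies of spend,
# i.e. y(t) = b0 + sum over a in A of b_a * x(t - a).
max_lag = max(true_lags)
X = np.column_stack(
    [np.ones(days - max_lag)]
    + [spend[max_lag - a : days - a] for a in true_lags]
)
y = revenue[max_lag:]

# One least-squares call recovers the intercept and the lag coefficients.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)
```

The catch, as above, is that this only works once you have guessed the lags; searching over candidate lag sets is the extra estimation problem the post is pointing at.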
And so, within the span of a few minutes on our conference call with them, I had convinced myself to try another algorithm when I got the chance. That weekend I wasted a few hours in R tinkering with the data. Then it finally dawned on me: what is the value of this? To calculate an ROI figure for our marketing campaign, is it necessary to spike a machine learning project just so we can be 95% confident in that figure? The justification is further weakened by the fact that ROI is merely one of many metrics available for evaluating a marketing campaign. So the impact of an accurate model versus a throw of the dart might not be worth an extra two weeks of work, when throwing a dart takes only one command in R.
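For comparison, the dart throw really is a one-liner. The post means R's `lm`; an equivalent sketch in Python, with invented spend and revenue figures purely for illustration:

```python
import numpy as np

# Made-up daily spend and same-day revenue (illustrative only).
spend = np.array([120.0, 340.0, 200.0, 410.0, 150.0, 300.0, 260.0, 380.0])
revenue = np.array([900.0, 1600.0, 1200.0, 1900.0, 1000.0, 1500.0, 1350.0, 1750.0])

# The entire "dart throw": one ordinary least-squares line fit.
slope, intercept = np.polyfit(spend, revenue, 1)
```

One line of effort versus two weeks of modelling is exactly the trade-off in question.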