Partnering with the marketing team at work earlier this month reminded
me of an important lesson in my algorithmic trading. Our external TV
agency presented a mid-term performance analysis and I was tasked with
performing due diligence on their analysis methodology. It took me two
minutes to realise that they were using a linear regression model. At
first I thought it was too simplistic. Then I realised my own naivety:
it was never about accuracy to begin with.

Measuring ROI on a marketing campaign is an elusive task. A quick metric
is to perform a
regression on the money you spent versus the revenue generated in the
same time period. For example, let y(t) = x(t), leaving out the fitted
intercept and slope for brevity, where y represents revenue, x is
marketing spending, and t is the day. Plot y against x for a number of
days, then draw a best-fit straight line to show the correlation
graphically. If you're feeling adventurous, throw in a few other factors
that you think might have an important influence on your daily revenue,
so that you have x2(t), x3(t), etc. This is what
the agency did in their model.

One major flaw with using a regression model for this purpose is that it
assumes each data point is independent of the others, so one day's
events do not influence another's. This is simply not true in the real
world. For example, it might take a few days after someone sees your ad
until they click your buy button, or it might take several showings of
the ad before people take action.
A better regression model for the first example is y(t) = x1(t - a),
where 'a' is the delay in days. For the second example,
y(t) = ∑ x1(t - a), summed over every delay 'a' in a vector A. The
problem is that finding 'a' and 'A' is another regression problem in and
of itself. Luckily, this is
conceptually what an Artificial Neural Network does: it chains a series
of regression models that take into account the interrelation of the
factors and the non-linear effects on the response.

And so, within a few minutes of our conference call with them, I had
convinced myself to try another algorithm when I had the chance. That
weekend I wasted a few hours in R tinkering with the data. Then it
finally dawned
on me. What is the value of this? To calculate an ROI figure on our
marketing campaign, is it necessary to spike a machine learning project
just so we can be 95% confident in that figure? The justification is
further weakened by the fact that ROI is merely one of the many metrics
available when evaluating a marketing campaign. So the gain from an
accurate model over a throw of the dart might not be worth an extra two
weeks of work, whereas throwing the dart takes only one command in R.
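
As a rough illustration of the delayed model y(t) = x1(t - a), here is a
minimal sketch in Python (chosen only so that it runs with the standard
library, not because it was used for the analysis). The spend and
revenue figures are invented, the search range of zero to three days is
arbitrary, and the helper names fit_line and fit_with_delay are
hypothetical. It fits an ordinary least-squares line for each candidate
delay and keeps the one with the highest R-squared, a brute-force
stand-in for "finding 'a'" rather than the agency's method or a neural
network.

```python
# Hypothetical daily marketing spend, invented purely for illustration.
spend = [10.0, 20.0, 15.0, 30.0, 25.0, 5.0, 40.0, 35.0]

# Assumption: revenue on day t responds to spend on day t - 2.
# The first two days have no earlier spend, so they sit at a baseline of 50.
TRUE_DELAY = 2
revenue = [50.0] * TRUE_DELAY + [50.0 + 3.0 * s for s in spend[:-TRUE_DELAY]]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def fit_with_delay(x, y, a):
    """Fit y(t) against x(t - a) by pairing each day's y with x from a days earlier."""
    return fit_line(x[: len(x) - a], y[a:])


# Try each candidate delay and keep the one whose line fits best.
candidate_delays = range(0, 4)
best_delay = max(candidate_delays,
                 key=lambda a: fit_with_delay(spend, revenue, a)[2])

for a in candidate_delays:
    slope, intercept, r_squared = fit_with_delay(spend, revenue, a)
    print(f"delay={a}: slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.3f}")
print("best delay:", best_delay)
```

With these made-up numbers the search recovers the two-day delay that
was built into the data. Real campaign data will be far noisier, and
choosing a delay by best fit on the same data invites overfitting, which
is part of why the quick version is often good enough.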