I construct models, not theories

I gave a talk at the 54th Annual Conference on Operational Research last week in Edinburgh. Operational Research is "using advanced analytical methods to help make better decisions" [wiki]. The field has been around since long before data science and business intelligence. After listening to many talks and speaking with academics from marketing to finance to industrial engineering, I find that what we do is quite similar at the 30,000-foot level: using data to solve problems. Yet there is a fundamental difference in our approaches. Operational research is about constructing analytical theories, whereas data science is about constructing models.

One of the talks I recall was by a PhD from Dubai on optimising maintenance scheduling in desalination plants. Desalination plants are big business in the Middle East, where they provide a major source of fresh water. However, components in these plants fail often because of the harsh conditions they work in, and servicing some of them may require bringing the entire plant down for hours. The presenter went on to explain their method of applying a Poisson process to the failure data to optimise maintenance work.
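
To make the Poisson idea concrete, here is a minimal sketch in Python. It is not the presenter's actual formulation, which is not reproduced here. It assumes the simplest case, a homogeneous Poisson process with a constant failure rate, and the failure timestamps, the 1000-hour observation window and the function names are all made up for illustration.

```python
import math


def estimate_failure_rate(failure_times_hours, observation_hours):
    """Maximum-likelihood rate of a homogeneous Poisson process (failures per hour)."""
    if observation_hours <= 0:
        raise ValueError("observation_hours must be positive")
    return len(failure_times_hours) / observation_hours


def prob_failure_within(rate_per_hour, horizon_hours):
    """P(at least one failure within the horizon) = 1 - exp(-rate * horizon)."""
    return 1.0 - math.exp(-rate_per_hour * horizon_hours)


def longest_interval_for_risk(rate_per_hour, max_risk):
    """Longest service interval that keeps the failure probability <= max_risk."""
    if not 0.0 < max_risk < 1.0:
        raise ValueError("max_risk must be strictly between 0 and 1")
    return -math.log(1.0 - max_risk) / rate_per_hour


# Hypothetical data: hours since start at which one component failed,
# observed over a 1000-hour window (assumed numbers, not from the talk).
failures = [120.0, 310.0, 450.0, 790.0, 940.0]
rate = estimate_failure_rate(failures, 1000.0)

print("estimated failures per hour:", rate)
print("P(failure within 100 h):", prob_failure_within(rate, 100.0))
print("service interval for <=10% risk (h):", longest_interval_for_risk(rate, 0.10))
```

Note how little data this needs: a handful of failure timestamps is enough to get a rate and a service interval, which is exactly what makes this kind of approach workable when measurements are sparse.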

Now if it were me, I would add tons of sensors everywhere to increase the frequency and granularity of the data captured, similar to what we do at work for web data. Then the problem practically solves itself, as we'd be able to build predictive models for each and every crucial component. Feeding ongoing data into these predictive models, we could flag high-risk components and service them before they cause trouble.

The problem is that adding sensors to a physical system is not trivial (as I know from my electrical engineering days). The high cost of installing them and the questionable efficacy of the measurements make getting data a challenge. For problems like this, I can see why the traditional scientific approach of using sparse data to support theories is practical.

Yet not everything can be reduced to formulas and solved analytically. As this blog piece on Scientific American points out, science is moving towards solving problems computationally. So too is industry: we've seen examples from Amazon and LinkedIn driving massive sales by modelling their data and enabling a feedback loop with it.

It's a shame that so many companies are poisoning the term Big Data these days by plastering it all over their marketing material to sell products with no substance. There are real strategic advantages to be reaped by companies that do it right.