My talk at OR54 on knowledge discovery with web log data


Web log data contains a wealth of information about online visitors. We have a record of every customer interaction for the millions of visitors coming through each month. The challenge is to analyse this semi-structured, discrete time series dataset to understand the behaviour of our visitors on a personal level. This talk is a case study of how our data team of three leveraged a heterogeneous architecture and agile methodologies to tackle this problem. And we had three months to do it.


My talk on bootstrapping data science in a company

My 5-minute lightning talk on Cascalog

Cascalog is a data processing library for building MapReduce jobs. It makes it a lot simpler to build, for example, distributed strategy backtesters on terabytes of market data. I've been spiking out a data processing project with it at work for the past couple of weeks, so I thought I might as well give a lightning talk about it at our monthly developers' meetup. Here are my presentation slides introducing Cascalog and outlining its features.

The possibilities...