6 ways to filter data for your trading system

The imminent extortion that is metered internet usage here in Canada got me thinking about the large amount of data (at least a few GB a month) that I use for my trading work. In particular, I'd like to talk about methods of filtering time-series price data in this post. There is a tradeoff between reading every tick that comes in and sampling the data at less frequent intervals.

A common way discretionary traders present data is with open-high-low-close-volume (OHLCV) bars of a certain time period. A 5-minute chart looks similar to a 4-hour chart but shows different information, for example. Technically, this is one way of filtering, or presenting, data: sampling at a fixed time interval. The catch with summarizing tick data into 5-minute or 4-hour bars is that you lose detail in exchange for simplicity. However, the reduction in the amount of data is significant. A year's worth of tick data for a single currency pair, for example, runs into the gigabytes, whereas a year of the same data in 5-minute OHLCV bars is merely a few megabytes.

Another factor to consider is that market data are inherently noisy. Would feeding your system each and every tick be useful? Of course, if your system is able to handle the noise, or better yet make use of it, then all the more reason to opt for more detail in the data. But that's another topic in itself. In any case, there are 6 ways to present/filter data that I can think of, listed below; but first, here is what the most familiar one, fixed time-interval bars, looks like in code.
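
This is only a minimal sketch with pandas: it assumes the ticks are already in a file with a timestamp, price and size per row, and the file name and column names are placeholders for whatever your feed actually provides.

```python
import pandas as pd

# Hypothetical tick file with timestamp, price and size columns.
ticks = pd.read_csv("eurusd_ticks.csv",
                    parse_dates=["timestamp"], index_col="timestamp")

# Summarize the ticks into 5-minute OHLCV bars; swap "5min" for "4h"
# (or any other pandas offset alias) to get coarser bars.
bars = ticks["price"].resample("5min").ohlc()
bars["volume"] = ticks["size"].resample("5min").sum()
bars = bars.dropna()
```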

  1. Time interval.
  2. Price interval.
  3. Volume interval (sketched after this list).
  4. Regression modelling.
  5. Frequency filtering (sketched after this list).
  6. Wavelet transform.
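
Items 2 and 3 replace the fixed time interval with a fixed amount of price movement or traded volume, so busy periods produce more bars and quiet periods fewer. Here is a rough sketch of volume bars, closing a bar whenever a chosen amount of volume has traded; the threshold is arbitrary, and price-interval (range) bars can be built the same way by accumulating price movement instead of size.

```python
import pandas as pd

def volume_bars(ticks: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Aggregate ticks into OHLCV bars, each holding roughly
    `threshold` units of traded volume (a sketch, not production code)."""
    bars, prices, volume, start = [], [], 0.0, None
    for ts, row in ticks.iterrows():
        if start is None:
            start = ts
        prices.append(row["price"])
        volume += row["size"]
        if volume >= threshold:
            bars.append({"start": start, "open": prices[0],
                         "high": max(prices), "low": min(prices),
                         "close": prices[-1], "volume": volume})
            prices, volume, start = [], 0.0, None
    return pd.DataFrame(bars).set_index("start")

# e.g. one bar per million units traded (purely illustrative)
# vbars = volume_bars(ticks, threshold=1_000_000)
```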

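Items 4 to 6 take a different view: treat the price series as a noisy signal and smooth or decompose it with a model rather than by re-bucketing ticks. As one example of item 5, here is a sketch of frequency filtering using a SciPy Butterworth low-pass filter on bar closes; the cutoff and filter order are arbitrary choices, not recommendations.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(series: np.ndarray, cutoff: float = 0.1, order: int = 4) -> np.ndarray:
    """Zero-phase low-pass filter. `cutoff` is a fraction of the
    Nyquist frequency (0 < cutoff < 1), picked arbitrarily here."""
    b, a = butter(order, cutoff, btype="low")
    return filtfilt(b, a, series)

# e.g. smooth the closes of the 5-minute bars built earlier
# smooth_close = lowpass(bars["close"].to_numpy())
```

Regression modelling (item 4) and the wavelet transform (item 6) work in the same spirit, either fitting a model through the noise or decomposing the series into components; PyWavelets' `pywt.wavedec` and `pywt.waverec` are one possible starting point for the latter.
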
As you can see, massaging data is not limited to choosing the time period for your OHLCV bars. In fact, it's a whole distinct field with its own IEEE journal, Transactions on Knowledge and Data Engineering. There are probably many other ways to filter data beyond the 6 I've introduced here. This is yet another topic I'd like to learn more about to improve my trading system.