Building a distributed back-tester with Hadoop on Amazon AWS

Testing is arguably the single most important aspect of trading system development. You can’t tell how well an idea works unless you test it out. Testing can also help you identify weaknesses or strengths in your model. The downside to testing is that it takes time to churn through those gigabytes of data.

Backtesting is inherently a sequential process. You feed your tick data into your algorithm and expect some actions in return. You can’t really use fork/join to let other threads steal from the work queue, because each step depends on the results of earlier calculations. More often than not, however, you’re interested in testing many variations of a strategy, and those runs are independent of one another. This is where MapReduce comes into play.

MapReduce is a programming model and software framework introduced by Google. It is inspired by the map and reduce functions that are ubiquitous in functional programming, where they are as common as for-loops are in the Java world.

The map function partitions the input into smaller problems and runs them concurrently, e.g. each variant of the strategy is executed on its own node.

The reduce function takes the results from all the nodes and aggregates them into an output, e.g. the back-test results from each strategy variant.
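As a rough single-machine illustration of the idea (plain Java streams, not Hadoop itself), the sketch below runs a toy back-test for several variants in parallel and then reduces the results to the best one. Everything in it is a hypothetical stand-in: the class name `BacktestMapReduce`, the momentum rule, the `lookback` parameter, and the price series are assumptions made up for the example, not part of any real trading system.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BacktestMapReduce {

    // Hypothetical toy strategy: hold one unit while the latest price is
    // above the price `lookback` steps earlier. The loop is sequential
    // because each step depends on the position chosen in the previous one.
    static double backtest(double[] prices, int lookback) {
        double pnl = 0.0;
        boolean isLong = false;
        for (int t = lookback; t < prices.length; t++) {
            if (isLong) {
                pnl += prices[t] - prices[t - 1]; // profit from the held position
            }
            isLong = prices[t] > prices[t - lookback]; // position for the next step
        }
        return pnl;
    }

    // "Map": each variant is an independent back-test, so they can run concurrently.
    static Map<Integer, Double> runVariants(double[] prices, List<Integer> lookbacks) {
        return lookbacks.parallelStream()
                .collect(Collectors.toMap(lb -> lb, lb -> backtest(prices, lb)));
    }

    // "Reduce": aggregate the per-variant results into one answer,
    // here the lookback with the highest profit.
    static int bestLookback(Map<Integer, Double> results) {
        return results.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow(() -> new IllegalArgumentException("no results"))
                .getKey();
    }

    public static void main(String[] args) {
        double[] prices = {100, 101, 103, 102, 104, 107, 105, 108}; // made-up data
        Map<Integer, Double> results = runVariants(prices, Arrays.asList(1, 2, 3));
        System.out.println("PnL per lookback: " + results);
        System.out.println("Best lookback: " + bestLookback(results));
    }
}
```

The single back-test stays a plain loop, while the variants fan out across threads. On a cluster the same split would presumably apply, with nodes taking the place of threads.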

Having used functional programming for some time now, I find map/reduce very natural. Where my knowledge falls short is in implementing a distributed infrastructure for running these map and reduce steps at a scale well beyond my own multi-core computer.

It just so happens that Amazon AWS offers hosted Hadoop as a service (Elastic MapReduce), and Hadoop is Apache’s framework for MapReduce. Hardware, check. Framework, check. This will be the second system that I’ll be working on in my goal to build a complete trading R&D platform.

Expect some technical discussions in the coming months as I work my way through. Now, where should I start…

2 Comments

  1. aris says:

    I followed your question about intellectual property on JForex support forum (http://www.dukascopy.com/swiss/english/forex/jforex/forum/viewtopic.php?f=84&t=24398&p=30776&hilit=intellectual+property#p30776).

    Apparently, your one-year-old question has been largely ignored by Dukascopy support. As you can see, when compiling in JForex, the system auto-generates a strategy ID to identify your specific strategy. I wonder whether your strategy source code, together with all the related libraries, is secretly uploaded to one of their servers and categorized, in which case your IP would effectively be infringed. It’s not that I don’t trust Dukascopy, it’s probably one of the best, but the lack of an IP protection policy on the JForex platform really makes one uncomfortable using it as a production as well as a testing platform.

    What is your take on IP with regard to JForex and Dukascopy in general, after more than a year or so?

    A distributed platform seems the correct path, as there probably really are incentives for Dukascopy to analyze your strategy and integrate it into its own.

    Looking forward to seeing your reply on this one,
    Thanks,
    aris

  2. Paul says:

    Hi Aris,

    I compile my production strategy into a JAR, then obfuscate it with ProGuard. After that I merely call it from a JForex strategy and compile that wrapper strategy in JForex to a JFX to be run live. It’s an extra step of work but well worth the hassle, since compiling directly in JForex leaves the source exposed.

    Paul
