Event-driven finite state machine for a distributed trading system

One problem I had when building my distributed trading system is managing states asynchronously from multiple triggers. For example, when the alpha engine say buy, it needs confirmation from the position engine to see if it is safe to enter a new position. I could chain one check after another imperatively or via callbacks. However, the underlying constraint is that these triggers:

  1. are resource-intensive to generate,
  2. might need to compose many of them,
  3. not sequential or have one-to-one depencency, and
  4. most importantly, they are in separate programs or different machines

Thus I've opted to abstract this problem out into its own module of the system as an event-driven finite state machine (FSM) to keep track of state transitions. Intimidating term, but my first implementation was just if-else statements to qualify as such. The benefit is that each of my system's components only need to push signals and pull states from a central interface without having to worry about what should it call next or poll anything else to see if the stars are aligned. That drastically simplified development and maintenance.

The responsiblities of my FSM module are to:

  1. listen to all the signals,
  2. figure out all the transitions, and
  3. publish the latest states for the rest of the system.

Handling asynchronous events

I use RabbitMQ as the message transport layer between my system's modules. All I need to do here is to associate an appropriate message handler to each triggering input for the FSM. Here's one example of the event handlers using the Clojure RabbitMQ library, Langohr. The rest of this part are just standard RabbitMQ publish/subscribe stuff.

(defn- event-message-handler [ch {:keys [headers delivery-tag redelivery?]} ^bytes payload]
  (let [{:keys [message-type user-id instrument quantity]} (read-payload payload)]
    (when (= :position-event message-type)
      (-> (get-cached-states user-id)
          (update-position-state instrument quantity)
          (cache-states user-id))))
  (lbc/ack ch delivery-tag))

This is called when a position event is received with information such as user, instrument, and quantity. This handler would thread these information by fetching current states for that user, evaluate next state with input, and then cache the new states for the user.

State transitions

Below is one of my system's state transition diagrams.

state transition example

There are 4 states represented by 4 colours with 4 triggers signalling state transition. The program is expected to handle up to hundreds of independent states concurrently with event triggers coming in a couple times per second.

As I was saying, my first implementation is just a set of if-else methods. For example, an engage trigger would call the engaging method to determine the next state given the implicit input engage and current state.

(defn engaging
  [current]
  (condp = current
    "white" "yellow"
    "yellow" "white"
    "green" "red"
    "red" "green"))

There were a handful of these boilerplate code. So after I deployed my system I came back to refactor them. I've been meaning to give core.logic a try for a while so this seem like a good place to start using it.

Before we can ask the logic solver question we need to define relations. Here I define a transition relation to specify all the state transition definition conveniently in one place.

(defrel transition from input to)
(facts transition [[nil :open :green]
                   [nil :close :white]
                   [:white :engage :yellow]
                   [:white :disengage :white]
                   [:white :open :green]
                   [:white :close :white]
                   [:yellow :engage :white]
                   [:yellow :disengage :white]
                   [:yellow :open :green]
                   [:yellow :close :yellow]
                   [:green :engage :red]
                   [:green :disengage :red]
                   [:green :open :green]
                   [:green :close :yellow]
                   [:red :engage :green]
                   [:red :disengage :red]
                   [:red :open :red]
                   [:red :close :white]])

And the event handler methods are just wrappers for a one-liner logic expression asking the question -- given current stage, cur-state, and input trigger, input, what state can q take to satisfy this constraint?

(defn next-state 
  "Solver for next state"
  [input cur-state]
  (first (run 1 [q] (transition cur-state input q))))

(def colour-clicked (partial next-state :engage))
(def colour-deactivate (partial next-state :disengage))
(defn next-position-colour [cur open?]
  (if open?
    (next-state :open cur)
    (next-state :close cur)))

Not the most illustrative core.logic example but it does the job.

Getting started with core.logic is surprisingly easy. I went through the Primer and tutorial and got this working in one try.

State caching and sharing

Now that the state transition have been taken care of, states are cached and served on Redis for other parts of the system. I use Redis for this because it is fast and easy. Values are stored in edn format instead of something more popular like JSON to maintain data structure through the wire.

This is my first time using edn in production. All inter-process messages in this trading system are edn formatted. It works seamlessly with Clojure by simply using str to write and clojure.edn/read-string to read. Besides my other Clojure components in the system, my trade broker interface is written in Java. My Java program use edn-java to parse and unparse complex Clojure data structures (e.g. nested maps with keywords).

(def pool         (car/make-conn-pool))
(def spec-server1 (car/make-conn-spec))
(defmacro with-car [& body] `(car/with-conn pool spec-server1 ~@body))

(defn get-cached-states
  "Generate edn from database."
  [id]
  (edn/read-string (with-car (car/get (str "states:" id)))))

(defn cache-states [m id]
  (with-car (car/set (str "states:" id) (str m))))

I find coupling edn with Redis is a fantastic choice as it's almost like working with Clojure's native concurrency data structures, such as atom, but also enable external programs to access the data.

Simple and quick

The entire event-driven FSM program is less than 200 lines of Clojure code and took no more than a few hours to do. However, I did give it some pondering time for a few days. I haven't done any benchmark to estimate performance result. So all I can say is that this setup can handle my simplistic use case with barely any load on the server so I'm happy with it.

A few years ago, I would have set a whole bunch of flags to switch states. In fact, that's what I did. The biggest satisfaction here for me isn't the implementation or technologies, it is seeing through the underlying problem at hand and solving it with a common pattern that made my work simpler.

Posted 20 May 2013 in computing.

Post gone viral, 16000 visitors in a day, how many actually read the article?

Edit: A few people have pointed out that my assumption about the Analytics engagement metric might be wrong because single page hit could be counted as zero on engagement time. I'll make an update to this post when smarter people than me on HN can agree on a metric. So I open this problem to Analytics expert, how can I discern readership ratio from Analytics data?

My reminiscing post about my time as an aerospace engineer versus software was on the front page of Hacker News for about 12 hours on Friday. That garnered 16,374 unique visitors to this site on that single day. However, Google Analytics data say that only 975 of those people spent more than 10 seconds here. Given that there's 652 words in that viral post, I doubt anyone can actually read it within that time. If we assume that only people spending more than 10 seconds have meaningfully read the article, it appears that only 6% of traffic are real readers from this Hacker News blitz.

Viral post visitors engagement

Given that my usual stats is above 10%, viral traffic audience is understandably less targeted but isn't abysmal by comparison. However, as a data scientist, I'm obliged to say that as this is an one-off event, we couldn't draw a statistically significant observation from it.

Interestingly, overall traffic the day after on Saturday is back down to 901 visits. And engagement for those spending more than 10 seconds is up at 8.3%. These residual traffic are coming in from domains like Twitter and link sharers.

Posted 12 May 2013 in journal.

How a few screws cost $2000 and a 240GB multinodes cluster cost $50

About ten years ago I was an electrical engineering intern at MDA Space Robotics. They are the company that designed the Canadarm 1 and 2 used on the International Space Station. I still remember meeting the R&D team next door to see their demo of a 3D LIDAR system mounted on a Mars Rover model. It was one of the competing designs to serve as the eyes of the rover. To put this into perspective, this was at a time when a single-beam scanning laser was common on a robot to gauge distances. Seeing a vision system which can generate 3D polygons of the terrain in real-time for navigation was just unbelievable.

As for our mundane electronics team, we were designing new power electronics to upgrade the Canadarm2. Obviously, I didn't actually contribute much as an intern. One project which I had my hands on was building loading circuits to simulate the electrical response of the motors on the Canadarm2. So that we can test the new power electronics with live circuit without having motors spinning in the lab. For my particular role, I didn't do any of that either. What I did was design and build safety housing for these big loading circuits.

Normally, this wouldn't take more than a couple days in a workshop. Not so in a regulated industry. Even though these boxes were only used as a superficial safety mechanism during ground support testing and were never going to be used in production or have anything to do with the actual test itself, we still needed to follow proper engineering guidelines because my mentor told me that it's safer to have a blanket rule for every component than nit-pick what is or isn't regulated.

After designing the housing in no time (it's just a rectangular shell to cover the circuit, how hard can it be?), I sourced a contractor to mold these polycarbonate shells for us. That's the same material hockey masks use because it's transparent and strong. To secure the shell onto the loading circuit, which are about as big as a moving box each, I needed big screws to bolt it on the baseboard. Seeing that we're an electrical team, we didn't have suitable screws for it in the lab. I figured I should just drive down to Home Depot to buy them.

Not so fast. Apparently, as I was technically sourcing in a new component, I couldn't just go down the street and get them. I ended up having to order from one of our approved suppliers and had them shipped to us overnight. Not that I was in any hurry. It's because we did all shipping by courier. And even though I needed just a few screws, the supplier don't do small orders either so I had to order the minimum of a hundred or something. Still, all of that didn't really cost that much. The majority of the cost came from my hours spent in getting technical and administrative approvals for adding this new component into our bill of material.

And so that is how I ended up spending the company around $2000 on a few screws. I never saw an itemised bill for those screws. But I figured that's about right based on hours spent and people's estimated salaries.

This forgotten story from my engineering days came about this week as my colleague Paul and I were spiking out a big data project on Amazon Redshift. On a whim just for the sake of it, we launched a 32 virtual cores, 240GB memory, 32TB storage multi-node cluster with literally just the click of a button. We played with it for a couple hours, did what was needed, and decommissioned the cloud servers. It cost us $45.

What is my point of the stories? Same concept of materialising an idea. Different time, different industry. Diametrically different prototyping experience.

Update: this post generated some heated debates on Hacker News.

Posted 10 May 2013 in journal.

What should I work on next for Cascalog?

Too many things to do, too little time. I figured we can do this in a data-driven way. So here's a poll. Please only submit an entry if you use Cascalog. And only one per person. Let's see if an honour system would work.

Poll result one week later

poll result

Votes Choice
15 Self-contained documentation site, e.g. http://cascalog.quantisan.com (demo address only)
10 Improve and consolidate guides into Github wiki
11 Bring Cascalog to Cascading 2.1/2.2
8 Fix open issues
13 Add features and performance increase, e.g. new logic solver, make use of new Cascading features since 2.0
2 Other

One person voted for integrating other machine learning library into Cascalog and another to isolate system library.

Posted 08 May 2013 in computing.

Securing a fresh Ubuntu server with Fabric tasks

An administrative task that I've done countless number of times is spinning up new servers to host applications like a web dashboard or a trading engine. At uSwitch my colleagues use Puppet and other smart devops tools that I have no idea about to automate our infrastructure. For my own work I just need an easy tool to run some commands over ssh. I chose to use Fabric to automate this repetitive but necessary task of hardening a fresh Ubuntu server. Now that I have this set of Fabric tasks, after I create a new instance, I can just run a single Fabric task and the server would be configured properly for use.

The choice for Fabric is because I've been using Fabric for some time already to perform simple deployment tasks. Then I stumbled on an open-source project that actually configures an Ubuntu server using Fabric. Much of these scripts are based on that source. Automating these tasks saves me a lot of time and ensures consistency of configurations. Perhaps when the need arises, I should look into other tools like Vagrant too.

Here are some basic tasks that I perform on a new Ubuntu 12.04 server and the corresponding Fabric task script.

Create an administrator account

from fabric.api import *
from fabric.contrib.files import append
from fabric.contrib.files import sed
from fabric.contrib.files import exists
from fabric.operations import prompt

def create_admin_account(admin, default_password=None):
    """Create an account for an admin to use to access the server."""
    env.user = "root"

    opts = dict(
        admin=admin,
        default_password=default_password or env.get('default_password') or 'secret',
    )

    # create user
    sudo('egrep %(admin)s /etc/passwd || adduser %(admin)s --disabled-password --gecos ""' % opts)

    # add public key for SSH access
    if not exists('/home/%(admin)s/.ssh' % opts):
        sudo('mkdir /home/%(admin)s/.ssh' % opts)

    opts['pub'] = prompt("Paste %(admin)s's public key: " % opts)
    sudo("echo '%(pub)s' > /home/%(admin)s/.ssh/authorized_keys" % opts)

    # allow this user in sshd_config
    append("/etc/ssh/sshd_config", 'AllowUsers %(admin)s@*' % opts, use_sudo=True)

    # allow sudo for maintenance user by adding it to 'sudo' group
    sudo('gpasswd -a %(admin)s sudo' % opts)

    # set default password for initial login
    sudo('echo "%(admin)s:%(default_password)s" | chpasswd' % opts)

harden ssh server

def harden_sshd():
    """Security harden sshd."""

    # Disable password authentication
    sed('/etc/ssh/sshd_config',
        '#PasswordAuthentication yes',
        'PasswordAuthentication no',
        use_sudo=True)

    # Deny root login
    sed('/etc/ssh/sshd_config',
        'PermitRootLogin yes',
        'PermitRootLogin no',
        use_sudo=True)

    sudo("restart ssh")

setup firewall

def install_ufw(rules=None):
    """Install and configure Uncomplicated Firewall."""
    sudo('apt-get update')
    sudo('apt-get -yq install ufw')
    configure_ufw(rules)

def configure_ufw(rules=None):
    """Configure Uncomplicated Firewall."""
    # reset rules so we start from scratch
    sudo('ufw --force reset')

    rules = rules or env.rules or err("env.rules must be set")
    for rule in rules:
        sudo(rule)

    # re-enable firewall and print rules
    sudo('ufw --force enable')
    sudo('ufw status verbose')

time synchronisation daemon

def set_system_time(timezone=None):
    """Set timezone and install ``ntp`` to keep time accurate."""

    opts = dict(
        timezone=timezone or env.get('timezone') or '/usr/share/zoneinfo/UTC',
    )

    # set timezone
    sudo('cp %(timezone)s /etc/localtime' % opts)

    # install NTP
    sudo('apt-get -yq install ntp')

enable unattended upgrades

def install_unattended_upgrades(email=None):
    """Configure Ubuntu to automatically install security updates."""
    opts = dict(
        email=email or env.get('email') or err('env.email must be set'),
    )

    sudo('apt-get -yq install unattended-upgrades')
    sed('/etc/apt/apt.conf.d/50unattended-upgrades',
        '//Unattended-Upgrade::Mail "root@localhost";',
        'Unattended-Upgrade::Mail "%(email)s";' % opts,
        use_sudo=True)

    sed('/etc/apt/apt.conf.d/20auto-upgrades',
        'APT::Periodic::Update-Package-Lists "0";',
        'APT::Periodic::Update-Package-Lists "1";',
        use_sudo=True)

    append('/etc/apt/apt.conf.d/20auto-upgrades',
           'APT::Periodic::Unattended-Upgrade "1";',
           use_sudo=True)
Posted 06 May 2013 in computing.

Minimal variance asset allocation for Stocks ISA

With interest rate in the UK so pathetically low, I thought I might take some chance by making use of a Stocks ISA account in the UK. The problem though, is that I have no knowledge about the London stock market nor do I have the time to follow it. So I wrote a program to pick some Exchange Traded Funds (ETFs) with a primary goal to minimise risk and opportunity costs.

Here are what I wanted to achieve:

  • only a few trades at most a year
  • less risk than FTSE100
  • more yield than a laddered government bonds portfolio
  • require less than an hour per month of maintenance

Basically, this is a computer-assisted passive investment portfolio.

The first step is to scrape all the ETF symbols from London Stock Exchange on these pages. I use getNodeSet from XML package in R to select the relevant data from the HTML page with XPath.

page <- getURL(url, curl=curl)
tree <- htmlTreeParse(page, useInternalNodes=TRUE)
xpath <- "//table[@class = 'table_dati']/tbody"
node <- getNodeSet(tree, xpath)

This program only considers ETF because this portfolio is to diversify risk and not pick winning stocks. ETFs provide convenient exposure to various asset classes such as equities, bonds, and commodities at low costs.

Next is to scrape profile information for each symbol from Yahoo. We want data such as the fund's expense ratio and asset class category.

url <- paste("http://finance.yahoo.com/q/pr?s=", symbol, "+Profile", sep="")
tree <- htmlTreeParse(url, useInternalNodes=TRUE)
xpath <- "//table[contains(concat(' ', @class, ' '), ' yfnc_datamodoutline1 ')]/tr/td/table"
node <- getNodeSet(tree, xpath)

operation <- tryCatch(readHTMLTable(node[[2]]), error = function(e) NA)
overview <- tryCatch(readHTMLTable(node[[1]]), error = function(e) NA)

Once the funds' fundamental data are fetched, we can do a preliminary screening. I am filtering for:

  1. Actively traded funds,
  2. Sufficient age (3 years), and
  3. Only the best 3 expense ratio efficiency from each class

That last point is particularly important as illustrated in this plot.

London funds expense by category

The above plot couldn't fit in this frame but it shows that expense ratios are all over the place. What matters is that the raw data is available for use.

The plot below is clearer. It shows expense ratio by the fund's issuer. You can see that Vanguard funds generally have the best expense ratio as is commonly known.

London funds expense by issuer

The initial ETF list has 667 funds in 104 categories. The screened list narrows it down to 20 funds in 18 categories. Most that were screened are niche funds such as Islamic Global Equity and regional real estate funds.

Out of that 20 funds, I apply the popular Modern Portfolio Theory to minimise risk using historical quotes data with quantmod's Yahoo data fetcher. Given the expected returns of each asset, er and their covariance matrix, cov.mat, a long-only efficient portfolio weighting of those assets can be solved with quadratic programming like so.

Dmat <- 2*cov.mat
dvec <- rep.int(0, N)
Amat <- cbind(rep(1,N), er, diag(1,N))
bvec <- c(1, target.return, rep(0,N))
result <- solve.QP(Dmat=Dmat,dvec=dvec,Amat=Amat,bvec=bvec,meq=2)

To get these weightings,

IGLT.L   MIDD.L   INXG.L   EQQQ.L   SLXX.L   IBGS.L   IUKP.L   LUK2.L 
0.603023 0.122829 0.116879 0.084122 0.037906 0.017975 0.014051 0.003215

But here's the catch, this mean-variance optimisation approach which I'm using does not work in the real-world. The problem is that it optimises for historical data under simplistic assumptions. For potential improvements on this model, start with this Q&A on StackExchange but be warned that it's a rabbit hole to go down in.

Knowing that I shouldn't trust this model much, I do this a couple times under different scenarios on the efficient frontier and union the top weighted assets from each run as a compensation by sampling.

The result is a suggestion of six ETFs.

Symbol            Name                   category
BRIC.L    ISHARESII 50                BRIC Equity
EQQQ.L   POWERSHS EQQQ US Large-Cap Growth Equity
IGLS.L ISHARESIII 0-5£        GBP Government Bond
INXG.L GBP IDX-LNK GLT  GBP Inflation-Linked Bond
MIDD.L  ISHARESFTSE250          UK Mid-Cap Equity
SLXX.L ISHSIII IBX £CB         GBP Corporate Bond

Out of these I hand picked IGLS.L and MIDD.L for a conservative 80% bonds and 20% equity portfolio. This plot below shows the annualised return versus risk of ISF (FTSE100), an equal-weighted portfolio of the pre-screened 20 ETFs, and this final portfolio of two ETFs. Notice the historic risk of this final portfolio is a third of FTSE100.

Return vs risk

Not surprisingly, what my program derived from scratch is similar to the commonly suggested portfolio balance of bonds, local equities, and emerging market blend. What this program offers is picking out the specific ETFs from the hundreds of ETFs traded on London Stock Exchange for a balanced asset allocation.

The complete R source code for this project is available on Github.

Posted 31 January 2013 in stocks.

continue   →