It’s an open buffet in a small business

One of the few benefits of working for myself is that I don’t need to worry about compatibility with legacy systems. I am free to use whatever open source tools get the job done well. The downside is that there are so many technologies out there that it’s hard to choose the right ones for the job. To give you a sense of what I mean, here are some of the topics that I have either tried or seriously considered in the past year.

Programming language choices are important because they sit at the bottom of a technology stack. Most of the work that I do is built on them. For a long while, I settled on a combination of Java, Python, and R: I prototype small ideas in Python, implement production code in Java, and perform analysis in R. I discussed why to use the right tool for the right task a year ago.

By the end of my previous project, I found that the popular development triplet of Java, Python, and R is not ideal for a solo operation. Since I have more time on my hands now that QTD is trading for me, I am taking a break this summer to expand my knowledge and learn new technologies.

Some of the technologies that I am experimenting with include:

  • an in-memory data store for instantaneous and persistent tick data
  • parallel programming for concurrent processing with no locks
  • mathematically intuitive algorithm implementations using higher-order functions

Don’t mind me as I help myself in this open buffet of technologies.


Data Scraping the Toronto Stock Exchange: Extracting 3,660 companies’ data

One of the tasks that I’ve always wanted to make more efficient in my stock trading is the work of scanning for stocks to trade. One look at my trading strategy posts and you’ll see that I have devised many stock scanning systems in the past few years. The most recent system that I’ve used is one that uses options data to filter stocks. However, it is not automated. So it takes a lot of time to gather and analyze the data. Furthermore, the set of tools that I use is limited to U.S. stocks. Now that I have taken an interest in the Canadian stock market, I can’t seem to find any public tool that I like. Thus, I am biting the bullet now and taking my time to develop a custom system once and for all.

Before we can analyze stock data, we need to extract it first. Where better to go than straight to the source: TMX.com, the website of the parent company of the Toronto Stock Exchange (TSX) and TSX Venture Exchange (TSXV). TMX.com provides a list of publicly traded companies in ten Excel files, divided by sector. Each file contains a number of fundamental company data fields, such as market capitalization and outstanding shares. So step 1 is to extract that data.

This is where I am now. I have attached the source code for an alpha/development release below for anyone interested. It is a working program that scrapes the data from TMX.com’s files, but it’s still a work in progress. That’s why I am calling it version 0.1.

The next milestone is to program a Stocks class to hold, organize, and manage the data for all 3,660 companies. This is an easy task to do by extending the built-in dictionary class in Python. However, I haven’t gotten to that chapter yet in the book I am using to learn scientific programming with Python. I stopped at chapter 8 to work on this project. Chapter 9 covers inheritance and class hierarchies.
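Here is a minimal sketch of what such a Stocks class might look like once it extends the built-in dictionary. It is only an illustration under assumptions: the Stock stand-in below simply mirrors the fields that scrapeXLS passes in (the real finClasses module is not shown), and the method names add, byExchange, and largest, as well as the sample tickers, are made up for this example.

```python
class Stock(object):
    """Hypothetical stand-in for the custom Stock class in finClasses."""

    def __init__(self, symbol, exchange, mktCap=0, osShares=0,
                 sector="", hqRegion=""):
        self.symbol = symbol
        self.exchange = exchange      # "T" for TSX, "V" for TSXV
        self.mktCap = mktCap
        self.osShares = osShares
        self.sector = sector
        self.hqRegion = hqRegion


class Stocks(dict):
    """A dict of symbol -> Stock with a few convenience queries (sketch)."""

    def add(self, stock):
        """Store a Stock under its own ticker symbol."""
        self[stock.symbol] = stock

    def byExchange(self, exch):
        """Return a new Stocks holding only the listings on one exchange."""
        return Stocks((sym, s) for sym, s in self.items()
                      if s.exchange == exch)

    def largest(self, n=10):
        """Return the n Stock objects with the highest market cap."""
        ranked = sorted(self.values(), key=lambda s: s.mktCap, reverse=True)
        return ranked[:n]


# Made-up sample data, only to show how the class would be used
stocks = Stocks()
stocks.add(Stock("AAA", "T", mktCap=500))
stocks.add(Stock("BBB", "V", mktCap=50))
stocks.add(Stock("CCC", "T", mktCap=900))

tsxOnly = stocks.byExchange("T")
topTwo = [s.symbol for s in stocks.largest(2)]
```

Because Stocks is still a dictionary, the dict returned by scrapeXLS could in principle be wrapped directly with Stocks(listingDict), and all the usual dictionary operations keep working.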

The goal of this project is to build an automated data scraping program that pulls TSX and TSXV data from various sources onto my computer. Once I have my data, that’s when the real fun starts.

Regarding the code below, I know that source code is useless for most people. Once the project is complete, I will compile the code into a standalone application and post it on this site. Subscribe to my RSS feed so that you can keep up-to-date with the progress of this project and my other ramblings on trading.

# extractTMX.py
# version: 0.1 alpha release
# revision date: March, 2010
# by Paul, Quantisan.com

"""A data scraping module to extract company listing excel files from TMX.COM"""

import xlrd									# to read Excel file
#import sys
from finClasses import Stock				# custom Stock class

def _verify():
	"""Verification function for a rundown of the module"""
	pass									# copy test block here when finished

def findCol(sheet, key):
	"""Find the column corresponding to header string 'key'"""
	firstRow = sheet.row_values(0)
	for col, header in enumerate(firstRow):
		if key in header: return col			# return first sighting
	raise ValueError("%s is not found!" % key)	# no header matched

def scrapeXLS(book):
	"""Data scraping function for TMX Excel file"""

	listingDict = {}						# dict of ('ticker': market cap)

	for index in range(book.nsheets):
		sh = book.sheet_by_index(index)

		mcCol = findCol(sh, "Market Value")
		assert type(mcCol) is int, "mcCol is a %s" % type(mcCol)
		osCol = findCol(sh, "O/S Shares")
		assert type(osCol) is int, "osCol is a %s" % type(osCol)
		secCol = findCol(sh, "Sector")		# multiple matches but taking first
		assert type(secCol) is int, "secCol is a %s" % type(secCol)
		hqCol = findCol(sh, "HQ\nRegion")
		assert type(hqCol) is int, "hqCol is a %s" % type(hqCol)

		for rx in range(1, sh.nrows):
			sym = str(sh.cell_value(rowx=rx, colx=4))		# symbol

			s = sh.cell_value(rowx=rx, colx=2)	# exchange col.
			if s == "TSX": exch = "T"
			elif s == "TSXV": exch = "V"
			else: raise ValueError("Unknown exchange value: %r" % s)

			mc = sh.cell_value(rowx=rx, colx=mcCol)		# market cap
			# check for empty market cap cell
			mc = int(mc) if type(mc) is float else 0

			os = int(sh.cell_value(rowx=rx, colx=osCol))	# O/S shares
			sec = str(sh.cell_value(rowx=rx, colx=secCol))	# sector
			hq = str(sh.cell_value(rowx=rx, colx=hqCol))	# HQ region

			listingDict[sym] = Stock(symbol=sym,exchange=exch,
				mktCap=mc,osShares=os, sector=sec,hqRegion=hq)
	return listingDict

def fetchFiles(fname):
	"""Read the XLS file names listed in 'fname' and scrape each one"""
	listing = {}
	with open(fname, 'r') as infile:		# text file of XLS file names
		for line in infile:					# 1 file name per line
			line = line.strip()				# strip whitespace and trailing \n
			if not line or line.startswith('#'):
				continue					# skip blank and commented lines

			print("Reading '%s' ..." % line)

			xlsFile = "TMX/" + line			# in TMX directory
			book = xlrd.open_workbook(xlsFile)	# import Excel file
			listing.update(scrapeXLS(book))	# merge the scraped data into dict
	return listing

#if __name__ == '__main__':					# verify block
#	if len(sys.argv) == 2 and sys.argv[1] == 'verify':
#		_verify()

if __name__ == '__main__':					# test block
	listing = fetchFiles('TMX/TMXfiles.txt')

First look at Google App Engine for automated trading and quant analysis on the cloud

I just spent the last few hours looking into Google App Engine to see whether I can use it for trading. Google App Engine (GAE) is a cloud computing development and hosting platform for web applications. GAE is similar to the well-known Amazon EC2 service in that both run your code in the cloud, but the main difference is that GAE is not as flexible as EC2 while being a lot easier to develop on. Think of it as the difference between using C++ to write your own trading platform (EC2) and using EasyLanguage on TradeStation or MQL on MetaTrader (GAE).

Another advantage of using Google App Engine is that it is free! Well, to a certain extent. It is free for a limited monthly usage. However, Google’s free quota is known to be very generous (think Gmail) and this is no exception. Based on the numbers, it looks comparable to a typical $20/month virtual private server (VPS) hosting package. My guess is that you can run a decent personal quant program on it if you keep the resources for yourself (private use). For more information on the free quota, you can read the current GAE quota web page or the Wikipedia page for GAE.

You might be wondering how I spent a few hours just reading about GAE. The answer is that I did some brief research, planning, and assessment to see how GAE can be put to use for quantitative analysis in automated and discretionary trading. Without going through all of my notes and logic, here’s the gist of my conclusion.

For Automated Trading

Since GAE is also an SDK, it has a restricted programming API. You can’t just plug any custom library of your own (e.g. your broker’s API) into your program. That is both its advantage and its disadvantage. Furthermore, GAE only supports HTTP and HTTPS, so you can’t open a socket for your connection either. These limitations won’t be changing anytime soon, as GAE is designed that way for security reasons. Basically, if your broker’s API needs its own library or a socket connection, there’s no way for your web application on GAE to connect with it for trading. So much for that idea.

For Discretionary Trading

Things look brighter for discretionary trading though. A big bonus with using Google App Engine is that it supports the Google Finance API. Google Finance offers free real-time stock quotes. Combine it with the programming capability of the Google App Engine and you have yourself a free number cruncher in the cloud. However, GAE is definitely no match for your personal computer, so why would you go through all that trouble?

One possible scenario in which this could be useful is for traders who are often on the road, or traders such as myself who have a day job. In that case, you can set up your quant analytics on GAE as a private web application and then access it as a web page anywhere from your mobile phone.

One of the obvious limitations of GAE is that it has limited features for scientific computing. For example, it doesn’t support NumPy in the Python SDK. As such, I can’t imagine using GAE for anything more than simple technical analysis for now. I certainly hope that GAE can include more powerful scientific computing capabilities in the near future.
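To give a sense of what "simple technical analysis" without NumPy could mean, here is a rough sketch of a simple moving average written in plain Python, which is the kind of calculation that should not need any scientific library. The function name and the sample prices are made up for illustration, and nothing here is specific to GAE.

```python
def simple_moving_average(prices, window):
    """Return the simple moving average of 'prices' over 'window' periods.

    Uses a running sum so each price is added and removed only once.
    The result has len(prices) - window + 1 values (none if too few prices).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")

    averages = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]   # drop the price leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Made-up closing prices, for illustration only
closes = [1, 2, 3, 4, 5]
sma3 = simple_moving_average(closes, 3)
```

Anything much heavier than this, such as matrix operations or regressions over large data sets, is where the lack of NumPy would start to hurt.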

Until then, I really can’t see Google App Engine offering much help aside from performing simple algorithmic tasks to help with your trading analysis. Their open issues log does say that NumPy is being integrated into GAE though. So that’s a great sign for things to come! I am very hopeful of GAE’s potential as a free cloud server to host my quant analytics on. We’ll just have to wait for now.
