One of the tasks that I've always wanted to make more efficient in my
stock trading is the work of scanning for stocks to trade. One look at
my trading strategy posts and you'll see that I have devised many
stock scanning systems in the past few years. The most recent system
that I've used is one that uses options data to filter stocks.
However, it is not automated. So it takes a lot of time to gather and
analyze the data. Furthermore, the set of tools that I use is limited to
U.S. stocks. Now that I have taken an interest in the Canadian stock
market, I can't seem to find any public tool that I like. Thus, I am
biting the bullet now and taking my time to develop a custom system once
and for all. Before we can analyze stock data, we need to extract them
first. Where better else for that than go straight to the source at
TMX.com, the parent company of Toronto Stock Exchange (TSX) and TSX
Venture Exchange (TSXV). TMX.com provide a list of publicly traded
companies in ten Excel files. The files are divided by sectors. Each
contain a number of fundamental company data, such as market
capitalization and outstanding shares. So step 1 is to extract those
data. This is where I am at now. I attached the source code for an
alpha/developmental release below for anyone interested. It is a working
program to scrape the data from TMX.com's files. But it's still a
work-in-progress. That's why I am calling it a version 0.1. The next
milestone is to program a Stocks class to hold, organize, and manage all
three thousand, six hundred, and sixty companies' data. This is an easy
task to do by extending the built-in dictionary class in Python.
However, I haven't gotten to that chapter yet in my scientific
programming with Python learning book. I stopped at chapter 8 to work on
this project. Chapter 9 is the inheritence and hierarchical material.
The goal of this project is to build an automated data scraping program
for TSX and TSXV data from various sources into my computer. Once I have
my data, that's when the real fun starts. Regarding the code below, I
know that source code is useless for most people. Once the project is
complete, I will compile the code into a standalone application and post
it on this site. Subscribe to my RSS feed so that you can keep
up-to-date with the progress of this project and my other ramblings on
trading. [python] # extractTMX.py # version: 0.1 alpha release #
revision date: March, 2010 # by Paul, Quantisan.com """A data scraping
module to extract company listing excel files from TMX.COM""" import
xlrd # to read Excel file #import sys from finClasses import Stock #
custom Stock class def _verify(): """Verification function for a
rundown of the module""" pass # copy test block here when finished def
findCol(sheet, key): """Find the column corresponding to header string
'key'""" firstRow = sheet.row_values(0) for col in
range(len(firstRow)): if key in firstRow[col]: return col # return
first sighting else: # not found raise ValueError("%s is not found!" %
key) def scrapeXLS(book): """Data scraping function for TMX Excel
file""" listingDict = {} # dict of ('ticker': market cap) for index in
range(book.nsheets): sh = book.sheet_by_index(index) mcCol =
findCol(sh, "Market Value") assert type(mcCol) is int, "mcCol is a %s" %
type(mcCol) osCol = findCol(sh, "O/S Shares") assert type(osCol) is int,
"osCol is a %s" % type(osCol) secCol = findCol(sh, "Sector") # multiple
matches but taking first assert type(secCol) is int, "secCol is a %s" %
type(secCol) hqCol = findCol(sh, "HQ\nRegion") assert type(hqCol) is
int, "hqCol is a %s" % type(hqCol) for rx in range(1, sh.nrows): sym =
str(sh.cell_value(rowx=rx, colx=4)) # symbol s =
sh.cell_value(rowx=rx, colx=2) # exchange col. if s == "TSX": exch =
"T" elif s == "TSXV": exch = "V" else: raise TypeError("Unknown exchange
value") mc = sh.cell_value(rowx=rx, colx=mcCol) # market cap # check
for empty market cap cell mc = int(mc) if type(mc) is float else 0 os =
int(sh.cell_value(rowx=rx, colx=osCol)) # O/S shares sec =
str(sh.cell_value(rowx=rx, colx=secCol)) # sector hq =
str(sh.cell_value(rowx=rx, colx=hqCol)) # HQ region listingDict[sym] =
Stock(symbol=sym,exchange=exch, mktCap=mc,osShares=os,
sector=sec,hqRegion=hq) return listingDict def fetchFiles(fname): infile
= open(fname, 'r') # text file of XLS file names listing = {} for line
in infile: # 1 file name per line if line[0] == '#': continue # skip
commented lines line = line.strip() # strip trailing \n print "Reading
'%s' ..." % line xlsFile = "TMX/" + line # in TMX directory book =
xlrd.open_workbook(xlsFile) # import Excel file
listing.update(scrapeXLS(book)) # append scraped the data to dict
return listing #if __name__ == '__main__': # verify block #
if len(sys.argv) == 2 and sys.argv[1] == 'verify': # _verify() if
__name__ == '__main__': # test block listing =
fetchFiles('TMX/TMXfiles.txt') [/python]