Social media: The fourth dimension in market data

It looks as though the first hedge fund utilizing social media data is about to launch in April. This "Twitter hedge fund" is based on the work of Bollen et al., "Twitter mood predicts the stock market," Journal of Computational Science, 2011.

In more general terms, this is an example of using Natural Language Processing (NLP) to assess the semantics of tweets relating to the stock market. NLP, or more specifically semantic search, has been the holy grail of search engine companies ever since people realized that you can make a boatload of money by providing good search results. Semantic search improves search accuracy by understanding the searcher's intent and the contextual meaning of the query rather than going on the vocabulary alone (see Nature: Quiz-playing computer system could revolutionize research). What Bollen et al. are proposing is a subset of the grander NLP problem, in that they are only concerned with the "mood states" rather than the whole meaning of the tweets. In other words, they are doing sentiment analysis.

NLP is not new. R, for example, has a Natural Language Processing task view. Python has a Natural Language Toolkit. Wikipedia has a good list of NLP toolkits for other languages.

Traders understand that the market is driven by people, and people are ultimately driven by emotions. So far, there hasn't been a method to directly evaluate the emotions of market participants. Would social media provide a glimpse into the emotions of the masses? According to Bollen et al., yes.

At any rate, data is king in quantitative trading. I have no idea whether their particular implementation of social media data mining will work. What is true is that as more and more people publish their lives on the internet, all this non-quantitative information is just waiting to be exploited. Ethics and privacy issues aside, look at what Facebook is doing with its ads as an example. In addition to time, volume, and price, data mining social media may provide us with a fourth dimension to market data.
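
To make the idea of scoring "mood states" a little more concrete, here is a minimal sketch of lexicon-based sentiment scoring in plain Python. It is not the method used by Bollen et al., and it is far cruder than what the NLP toolkits mentioned above offer. The word lists, the function names `mood_score` and `daily_mood`, and the sample tweets are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for this sketch); a real system
# would rely on a curated sentiment lexicon or a trained model.
POSITIVE = {"calm", "happy", "bullish", "confident", "great"}
NEGATIVE = {"worried", "afraid", "bearish", "nervous", "terrible"}


def mood_score(text):
    """Score one tweet in [-1, 1]: (positive - negative) / matched words.

    Returns 0.0 when no word of the tweet appears in the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def daily_mood(tweets):
    """Average the per-tweet scores for each day.

    `tweets` is an iterable of (date_string, text) pairs.
    """
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(mood_score(text))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}


# Made-up sample data, not real tweets.
SAMPLE = [
    ("2011-03-01", "Feeling bullish and confident about tech today"),
    ("2011-03-01", "I am worried about oil prices"),
    ("2011-03-02", "Nervous and afraid, this market looks terrible"),
]

if __name__ == "__main__":
    for day, score in sorted(daily_mood(SAMPLE).items()):
        print(day, score)
```

The result is one number per day that could, in principle, sit next to time, volume, and price as an extra column of market data. Whether such a series has any predictive value is exactly the open question.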