Monday, November 29, 2010

INFORMS Data Mining Competition leaders used Open Source software

The 2010 INFORMS data mining competition recently finished, and the leaders were presented at the 2010 Annual INFORMS Conference. The goal of the competition was to predict short term movements in stock prices. You may recall that IEOR Tools competed in this competition, with not too glamorous results at the end. There was a lot to learn from this competition. First, it seems that short term price movements can be correlated very well with lagged prices. Most of the top leaderboard finishers used future information to determine an appropriate lag in the price movement.
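To make the idea of a lag concrete, here is a minimal sketch in Python with NumPy. It is not the method any of the finishers used, and it does not use the competition data. It builds a synthetic series in which the target follows the price changes three steps later, then scans candidate lags for the strongest correlation. The function names `lagged_correlation` and `best_lag` are my own, made up for illustration.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] for a non-negative lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 <= lag < len(x) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag > 0:
        # Drop the last `lag` values of x and the first `lag` values of y
        # so that x[t - lag] lines up with y[t].
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]


def best_lag(x, y, max_lag):
    """Return the lag with the largest absolute correlation, plus all scores."""
    scores = {lag: lagged_correlation(x, y, lag) for lag in range(max_lag + 1)}
    return max(scores, key=lambda lag: abs(scores[lag])), scores


# Synthetic data (an assumption for illustration, not market data):
# the target echoes the price changes from three steps earlier, plus noise.
rng = np.random.default_rng(0)
n = 500
changes = rng.normal(size=n)
target = np.empty(n)
target[:3] = rng.normal(size=3)
target[3:] = changes[:-3] + 0.1 * rng.normal(size=n - 3)

lag, scores = best_lag(changes, target, max_lag=10)
print("strongest lag:", lag)
```

On real prices the relationship would be far noisier, and a lag chosen this way should be checked on data that was held out from the search.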

The next most interesting thing is that all of the top 3 finishers used free and open source software as their tools for the competition. Two of the leaders used R, and the second place finisher used Python, namely SciPy. This should not be surprising to most people in the analytics community. Open source software has been making inroads for quite a while. The R-Project has been getting a lot of interesting press lately, especially in enterprise business circles. Python is an object-oriented programming language that is growing in popularity. Its popularity seems to be due to its ease of use and how quickly it can be learned and put to work.

The presenters of the 2010 INFORMS data mining competition were kind enough to post the methods of the Top 3 competitors. Each one is an interesting read on how the competitor used open source tools to predict stock price movements.

If you are interested in learning more about R as a tool, I recommend a new book by Luis Torgo, "Data Mining with R: Learning with Case Studies".

This book is one of the first of its kind to show R methodologies with real life applications. I intend to get the book and hope to post a review of it in the near future. I am already hearing good things about it.