Tuesday, June 29, 2010

Kaggle hosting INFORMS 2010 Data Mining Contest

Kaggle is hosting the 2010 INFORMS Data Mining Contest.  The goal of this year's contest is to predict intra-day stock price movements.  All data and submission guidelines are provided on the Kaggle website.  Submitted entries are immediately scored and evaluated with an AUC calculation.  The entry with the leading AUC score at the end of the contest will be honored at the annual INFORMS meeting in Austin, Texas (Nov. 7-10).
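
Since the contest is scored by AUC, it is worth knowing exactly what that metric measures.  Here is a minimal pure-Python sketch (not Kaggle's actual scoring code) that computes AUC as the probability that a randomly chosen positive example is ranked above a randomly chosen negative one:

```python
def auc(labels, scores):
    """AUC = fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair.  O(n^2) for clarity, not speed."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# A perfect ranking scores 1.0; random guessing hovers around 0.5.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

Note that AUC only cares about the ordering of the predicted scores, not their absolute values, which is why probability outputs from a logistic regression can be submitted directly.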

There are already a lot of good discussions of modeling techniques.  Mark started off with a question on OR-Exchange about modeling methods for the INFORMS contest.   Since the target is a binary categorical variable, his preferred method was logistic regression.  Mark provides example R code as collaborative input to the contest.  I followed suit and provided an IEORTools entry using the same logistic regression approach.  I also did some variable analysis using the rpart package in R to develop a decision tree.  After pulling some variables that were not significant, I was able to join Mark on the leaderboard.  The pictured leaderboard is from June 28.
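
For readers who want to experiment along the same lines but outside of R, here is a toy logistic regression fit by batch gradient descent in pure Python (a stand-in for R's glm; the data and learning-rate settings are invented for illustration):

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit w, b so that P(y=1|x) = sigmoid(w.x + b), by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi                      # gradient of the log-loss
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: a single feature cleanly separates the two classes.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(w, b, [0.9]))  # well above 0.5
```

In R the equivalent one-liner is glm(y ~ ., family = binomial), which is presumably what the shared contest code builds on; the point of the sketch is just to show what is happening under the hood.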

There is also some good discussion on the Kaggle website contest forum.  One entrant posted suggestions for possible variables to use in a logistic regression model, which is very beneficial.

I really like seeing this collaborative approach to modeling.  It was one of the qualities I most enjoyed in the Netflix Prize.  I hope Kaggle and INFORMS continue to provide these fun and thought-provoking contests.

Friday, June 25, 2010

U.S. SEC endorses Python to fix financial problems

News from PCWorld mentions that ActivePython, the software distribution from ActiveState, is going to include numerical, scientific, and optimization software in its current software bundle.  The additions are the Python-based NumPy, SciPy, and matplotlib packages.  All of the new software is open source and available for free download.

Apparently this is in anticipation of the new U.S. financial rules from the U.S. Securities and Exchange Commission.  On April 7, 2010 the U.S. S.E.C. proposed new rules for asset-backed securities intended to make those markets run efficiently and fairly.  On the first page of the released documents the S.E.C. mentions the use of Python.  That is a nice shocker to us open source advocates.
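
The context, as reported, is that the proposal envisions describing an asset-backed deal's cash-flow "waterfall" as an actual computer program rather than only as legal prose.  Here is a hypothetical, heavily simplified sketch of what such a waterfall might look like in Python; the tranche names and dollar amounts are invented for illustration, and real deals are vastly more complicated:

```python
def run_waterfall(collections, tranches):
    """Pay tranches in seniority order from a pool's cash collections.
    tranches: list of (name, amount_due); returns payments per tranche."""
    payments = {}
    cash = collections
    for name, due in tranches:          # senior tranches are paid first
        paid = min(cash, due)
        payments[name] = paid
        cash -= paid
    payments["residual"] = cash         # anything left flows to equity
    return payments

# Hypothetical deal: $100 of collections against three tranches.
print(run_waterfall(100.0, [("A", 60.0), ("B", 30.0), ("C", 25.0)]))
# {'A': 60.0, 'B': 30.0, 'C': 10.0, 'residual': 0.0}
```

The appeal of the idea is that an executable specification like this can be tested against scenarios, whereas a prose description cannot.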

Python is a great computing language.  It is really easy to learn compared to other languages such as C.  Perhaps the U.S. S.E.C. thought it would be the best choice because of its ease of use and abundance of software packages.  This is really interesting news, and hopefully we will be hearing more about it in the near future.

Thursday, June 24, 2010

R package for World Bank Data

A little while ago I posted about how the World Bank's data is open to the public for research.  That is apparently just the beginning of what is possible with free access to so many good socio-economic data sets.  The R-chart blog recently posted that an R package has been developed as an API to access the World Bank data.
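
The World Bank exposes its indicators through a plain REST API, which is what packages like this wrap, so the same data is reachable from any language.  A short Python sketch that builds a request URL (the indicator code shown, SP.POP.TOTL for total population, is a real one, but the endpoint layout below is a sketch — check the current API documentation before relying on it):

```python
def worldbank_url(country, indicator, fmt="json"):
    """Build a World Bank API request URL.  The endpoint layout may
    change over time; consult the current API documentation."""
    base = "http://api.worldbank.org"
    return "%s/countries/%s/indicators/%s?format=%s" % (
        base, country, indicator, fmt)

url = worldbank_url("US", "SP.POP.TOTL")
print(url)
# Fetching it is then one more line, e.g. with urllib:
#   import urllib.request; raw = urllib.request.urlopen(url).read()
```

An R package saves you from this plumbing entirely, which is exactly why it lowers the barrier for analysts.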

This opens up a lot more data mining opportunities and could be just the start of some great analytic research.  I'm really looking forward to seeing what some of the great R minds will find with the World Bank data at their fingertips.  Since R is freely available anyway, this pairing makes sense on all sorts of levels.  Happy data mining!

Software for Data Analysis: Programming with R (Statistics and Computing)

Wednesday, June 16, 2010

Analytics and FIFA World Cup

What would the FIFA World Cup be without the prognosticators?  You can be assured that the Analytics community is not far from the scene.  There are plenty of places on the web to find predictions and analytics for the 2010 FIFA World Cup.  Here are some of the places where you can find all of your World Cup analytics interests.

Wayne Winston is posting predictions and rankings on his blog, Mathletics.  If you are a fan of sports and analytics (think Moneyball), then you would love Wayne Winston's blog.   Wayne does predictions for professional and collegiate sports in basketball, football, baseball, and soccer.

Blog posts on AnalyticBridge tell of big financial institutions applying the quantitative techniques behind credit default swaps and collateralized debt obligations to predicting World Cup outcomes.  It is a corporate financial challenge to predict which country will go the farthest in the World Cup.  Let's hope it's not the same models that were used to price mortgage-backed securities a few years back.
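
Whatever one thinks of borrowing models from credit derivatives, the underlying idea, turning team-strength estimates into tournament outcome probabilities, can be sketched with a tiny Monte Carlo simulation.  The team ratings below are invented for illustration and are not anyone's real model:

```python
import random

def sim_knockout(teams, ratings, trials=10000, seed=1):
    """Estimate each team's chance of winning a single-elimination
    bracket, where P(i beats j) = rating_i / (rating_i + rating_j)."""
    rng = random.Random(seed)
    wins = dict.fromkeys(teams, 0)
    for _ in range(trials):
        field = list(teams)
        while len(field) > 1:
            survivors = []
            for a, b in zip(field[::2], field[1::2]):
                p = ratings[a] / float(ratings[a] + ratings[b])
                survivors.append(a if rng.random() < p else b)
            field = survivors
        wins[field[0]] += 1
    return {t: wins[t] / float(trials) for t in teams}

# Made-up ratings for a four-team bracket.
probs = sim_knockout(["BRA", "ESP", "GER", "ARG"],
                     {"BRA": 90, "ESP": 88, "GER": 85, "ARG": 84})
print(probs)
```

The financial-engineering versions differ mainly in where the ratings come from and in modeling correlations between matches, but the simulate-and-count core is the same.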

Spotfire's blog has an entry about providing World Cup data going all the way back to 1930.  TIBCO, Spotfire's parent, is providing analytic data from the World Cup, including all sorts of statistics.  Analysts can get scores, goals, penalties, attendance, and other data points.  The online app that TIBCO provides also has nifty charts to compare different countries' performance.

Tuesday, June 15, 2010

OpenOpt release 0.29

A new OpenOpt Suite release is out. OpenOpt is a free (BSD license), cross-platform (Linux, Windows, Mac, etc.) set of Python modules for numerical optimization, automatic differentiation, and solving systems of linear, nonlinear, and ordinary differential equations. It has been published quarterly since 2007, already has some essential applications, and is expected to become even more popular with Python release 3.3, where dynamic compilation is slated to be implemented.

OpenOpt 0.29:
* Some minor bugfixes
* Some improvements for handling sparse matrices
* Bugfix for problems with nonlinear equality constraints
* Major changes for problems with nConstraints>1

FuncDesigner 0.19:
* Some improvements for automatic differentiation
* New feature: attached constraints
* New feature: oosystem
* Now you can model & solve ODE systems

DerApproximator 0.19:
* Function get_d2
* Add new stencil
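
For readers curious what "automatic differentiation" means in practice, here is a toy forward-mode sketch using dual numbers.  This illustrates the general technique, not FuncDesigner's actual implementation:

```python
class Dual(object):
    """A number carrying a value and its derivative: forward-mode AD."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) exactly by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).der

# d/dx (x*x + 3*x) at x = 2 is 2*2 + 3 = 7.
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Unlike the finite-difference stencils in DerApproximator, this gives derivatives exact to machine precision, which is why AD-based modeling layers are attractive for optimization.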

You can try it online via our Sage-server.
See also: Full Changelog, Future Plans

Tuesday, June 1, 2010

OR-Exchange confession

I have a blogger confession to make about OR-Exchange. I am addicted. I think it's the first Operations Research related social network that has me really hooked. I confess that I check it daily. Yes, you can see that I've earned a silver badge for my continual devotion. Is this OR's Farmville? Well, at least it is for me.

The premise of OR-Exchange is really simple. Think of an Operations Research related question that bugs you, puzzles you, or that you simply want peer feedback on. Shortly after, and I mean shortly, you will be barraged by answers from like-minded individuals. The perfect storm that Web 2.0 wants to fuel.

In my mind the beauty of OR-Exchange is that it is not just any normal social network. This is a social network of peers who understand my issues, problems, and concerns. Maybe it's just my Generation X upbringing that requires instant gratification. Yet I don't need a whole lot of stimulation from other social networks. In fact, I'm pretty much done with most of the others. The online media I keep going back to are the ones associated with my interests, and for me that is OR-Exchange right now.

I love the feedback from the folks at OR-Exchange. Good, bad, or indifferent, it brings perspectives that I often don't get in my own circles. In my present work I don't often get to chat about Operations Research with my co-workers; I'm one of only two employees who have any knowledge of what Operations Research is. I guess that's where it benefits me. I'm hoping it also benefits others like us who otherwise have to wait a year for an INFORMS conference. I am active in my local INFORMS chapter, but most of it is topical speeches and programs. OR-Exchange is more of an outlet, and it has filled a void for me.

I hope the Operations Research community takes to OR-Exchange. I believe there can only be more good as more users come online. Please help preach its worth if you are using it. Perhaps there are many more unanswered questions out there in the Operations Research community.