Wednesday, March 30, 2011

Baseball and Decision Analytics

When it's springtime, baseball season is about to get started.  The 2011 Major League Baseball season gets going on the last day of March.  Baseball is as American as apple pie, and almost every baseball enthusiast has something to say about the game.  Analytics professionals are not far behind when it comes to opinions on baseball.

Baseball is definitely a numbers game.  Mathematicians have been studying baseball for as long as the game itself has been played.  One of the first notable baseball analysts to apply decision analysis was Bill James.  Bill coined the term sabermetrics for the study of baseball analysis, taken from the acronym of the Society for American Baseball Research (SABR).  More recently, baseball decision analysis has found its way into the management offices of Major League Baseball teams.  Popular books such as Moneyball by Michael Lewis and The Extra 2%: How Wall Street Strategies Took a Major League Baseball Team from Worst to First by Jonah Keri have shown how major league management turned poorly performing clubs into championship contenders.  The mathematics behind their decision analysis is best described in Wayne Winston's book Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football.

Baseball decision analysis has grown up since the days of the simple batting average.  Now baseball decision analysis uses techniques such as replacement value.  Value Over Replacement measures the value of a player relative to the run-of-the-mill, readily available player who would replace him at the same position.  Value Over Replacement was made popular by Keith Woolner of Baseball Prospectus.  At first the value, which is usually offensive value, was used to determine how many more runs a player could produce than such a replacement.  Now value over replacement methodologies determine how many wins a player can generate for his team.  One of the best sites for WAR analysis, or Wins Above Replacement, is Fangraphs.  Fangraphs has just about every major baseball statistic available for the baseball enthusiast.  In fact, they even have heat maps for pitch location.  Ready to manage your own team yet?

Pitch location heat map from

Of course all of this decision analysis would not be possible without the numbers.  One of the best places for baseball data is  Just about every data point on baseball can be mined from the site and downloaded.  So if you have a craving to create your own baseball metric or analytics strategy there should be nothing stopping you.
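If you want a starting point for a home-grown metric, the value-over-replacement idea can be sketched in a few lines of Python.  This is only a rough illustration, not the formula used by Baseball Prospectus or Fangraphs.  The class name, the function names, the replacement rate of 0.10 runs per plate appearance, and the sample player are all made up for the example, and the 10 runs per win conversion is just a common rule of thumb.

```python
from dataclasses import dataclass

# Assumption: roughly 10 runs are worth one win (a common rule of thumb).
RUNS_PER_WIN = 10.0


@dataclass
class BatterSeason:
    """One hypothetical batter season (illustrative fields only)."""
    name: str
    runs_created: float      # offensive runs credited to the player
    plate_appearances: int   # playing time


def runs_over_replacement(player: BatterSeason,
                          replacement_runs_per_pa: float) -> float:
    """Runs produced beyond what a replacement would produce in the same playing time."""
    baseline = replacement_runs_per_pa * player.plate_appearances
    return player.runs_created - baseline


def wins_over_replacement(player: BatterSeason,
                          replacement_runs_per_pa: float,
                          runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs over replacement into wins using a runs-per-win factor."""
    return runs_over_replacement(player, replacement_runs_per_pa) / runs_per_win


if __name__ == "__main__":
    # Made-up numbers, not real player data.
    shortstop = BatterSeason("Example Shortstop", runs_created=95.0,
                             plate_appearances=650)
    runs = runs_over_replacement(shortstop, replacement_runs_per_pa=0.10)
    wins = wins_over_replacement(shortstop, replacement_runs_per_pa=0.10)
    print(f"{shortstop.name}: {runs:.1f} runs, {wins:.1f} wins over replacement")
```

The replacement rate is the knob that matters most here.  A real metric would estimate it separately for each position from actual data and would also account for defense and ballpark effects.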

This is another post in the INFORMS Online Blog Challenge.  This month's topic is O.R. and Sports.

Tuesday, March 22, 2011

R again in Google Summer of Code

I'm a big fan of the Google Summer of Code.  It brings great projects together with a learning opportunity for students.  Once again the R Project was selected to be part of the Google Summer of Code in 2011.  Some other notable mathematical and statistical projects selected alongside R include Shogun Machine Learning, SymPy, Gambit, Computational Geometry Algorithms Library, Orange, and Computational Science and Engineering.

The Google Summer of Code has really grown over the years.  I'm glad to see that these open source initiatives really help teach our younger generation. 

Wednesday, March 16, 2011

OpenOpt Suite release 0.33

New release 0.33 of OpenOpt Suite is out:


  • cplex has been connected
  • New global solver interalg with guaranteed precision, a competitor to LGO, BARON, MATLAB's intsolver and Direct (it can also work in inexact mode); it can work with non-Lipschitz and even some discontinuous functions
  • New solver amsg2p for unconstrained medium-scaled NLP and NSP


  • Essential speedup for automatic differentiation when vector-variables are involved, for both dense and sparse cases
  • Solving MINLP is now available
  • Added uncertainty analysis
  • Added interval analysis
  • You can now solve systems of equations with automatic determination of whether the system is linear or nonlinear (subject to the given set of free or fixed variables)
  • FD Funcs min and max can work on lists of oofuns
  • Bugfix for sparse SLE (system of linear equations) that slowed down computations and demanded more memory
  • New oofuns angle, cross
  • Using OpenOpt result(oovars) is now available; also, start points with oovars() can now be assigned more easily

SpaceFuncs (2D, 3D, N-dimensional geometric package with abilities for parametrized calculations, solving systems of geometric equations and numerical optimization with automatic differentiation):

  • Some bugfixes


  • Adjusted for some changes in FuncDesigner

For more details visit