Monday, May 23, 2011

Sports analytics summer blog reading recommendations

The dog days of summer are almost here, and if you are a sports fan they can feel long.  Only baseball and soccer play through the summer in the U.S.  Even if you are a die-hard baseball or soccer fan, the season itself can seem to last forever.  Now is the perfect time to get caught up in the analytics of your favorite spectator sports.  The following are some of my favorite sports analytics blogs and reading material.

FanGraphs is the all-everything baseball numbers website.  It is best known for its complete database of baseball player metrics.  One of my favorite metrics in baseball is WAR, or Wins Above Replacement.  If that is not enough, they even have heat maps of strike zone pitch locations.  Tracking your favorite team has never been more analytically exciting.

Advanced NFL Stats is the best NFL analytics blog out there right now.  Similar to FanGraphs, it has a complete database of NFL offense and defense metrics.  Advanced NFL Stats also does a good job of explaining the numbers behind the measurements.  Analyzing team and player performance in football is no easy task, and this site does an excellent job of both.  Advanced NFL Stats also keeps a database of play-by-play data.

The up-and-comer among NFL analytics blogs is Drive-By Football.  Drive-By does a great job of explaining some of the harder math around determining team and player efficiency.  One of the most interesting features is the Markov Chain Drive calculator, which calculates the likelihood of scoring scenarios drive by drive, hence the name of the website.
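
For readers curious how a Markov chain can turn a drive into scoring likelihoods, here is a minimal sketch in R.  This is not Drive-By Football's actual model.  The three field-position states, the three drive outcomes, and every transition probability below are made-up values chosen only for illustration.  The idea is to treat each drive outcome as an absorbing state and compute the chance of ending in each one from each starting position.

```r
### Hypothetical transient states (field position) and absorbing states (drive outcomes)
transient <- c("Own", "Mid", "Red")
absorbing <- c("TD", "FG", "NoScore")

### Q: made-up probabilities of moving between field-position states on one play
Q <- matrix(c(0.30, 0.35, 0.05,
              0.00, 0.30, 0.35,
              0.00, 0.00, 0.30),
            nrow = 3, byrow = TRUE)

### R: made-up probabilities of the drive ending from each field-position state
R <- matrix(c(0.01, 0.00, 0.29,
              0.03, 0.07, 0.25,
              0.35, 0.25, 0.10),
            nrow = 3, byrow = TRUE)

### Each row of cbind(Q, R) must sum to 1 to be a valid set of transition probabilities
stopifnot(all(abs(rowSums(cbind(Q, R)) - 1) < 1e-9))

### Fundamental matrix: expected number of visits to each transient state
N <- solve(diag(3) - Q)

### Absorption probabilities: chance of each drive outcome from each starting state
B <- N %*% R
dimnames(B) <- list(transient, absorbing)

print(round(B, 3))
```

Each row of B is a probability distribution over how the drive ends given where it starts, so a real calculator would mostly differ in having many more states and probabilities estimated from play-by-play data.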

Wayne Winston blog
This blog's primary focus is on basketball, specifically the NBA.  Wayne Winston is well known as a prolific Operations Research professor.  What you may not know is that Wayne Winston has consulted for the Dallas Mavericks and other sports teams to help improve their franchises.  Wayne talks about other sports from time to time as well.  If you have not read Wayne Winston's book Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football, you are in for an analytical treat.  Wayne analyzes the why and how of measuring efficiency and winning in professional sports.

Tuesday, May 17, 2011

In Memoriam of Dr. Paul Jensen

I received the sad news last week that we lost a colleague in the Operations Research and INFORMS community.  Dr. Paul Jensen passed away peacefully on April 4, 2011.  Dr. Jensen served for a number of years at the University of Texas at Austin as a researcher and a great contributor to the Operations Research community.  As recently as 2007, Dr. Jensen was awarded the INFORMS Prize for the Teaching of OR/MS Practice.

Unfortunately, I did not know Dr. Jensen personally.  I was first introduced to his ORMM website through my graduate courses at SMU.  The ORMM website is a great resource for learning the principles of Operations Research methods.  I was also able to use some of his Excel modeling add-ins in practice to demonstrate optimization problems.

Dr. James Cochran is going to hold a special session in memoriam of Dr. Jensen.  The following message from Dr. Cochran was sent to Dr. Jensen's ORMM mailing list.

Dear friends and colleagues,

Paul was a good friend and colleague.  I know each of us will miss him (as will many other friends throughout the OR community) and each of us is very sorry for the loss suffered by Margaret and the rest of Paul's family.

I will chair a special INFORM-ED sponsored session in Paul's memory at the 2011 INFORMS Conference in Charlotte (November 13-16).  Several of Paul's many friends will speak on his contributions to operations research education and share personal stories and remembrances about Paul.  Margaret and Paul's children will be invited to attend, and I hope each of you will also be able to attend (I'll try to reserve some time at the end of the session during which members of the audience will have an opportunity to share their thoughts).

INFORMS Transactions on Education (the online journal for which I am Editor in Chief) will also publish a special issue devoted to Paul's influence on OR education.  Dave Morton has kindly agreed to edit this special issue, so I am certain it will be a fine tribute to Paul.



Monday, May 16, 2011

Welcome to the Insight Age

We are in the midst of the Insight Age or, in other words, at the end of the Information Age.  This was explained on HPCwire, quoting HP Labs distinguished technologist Parthasarathy Ranganathan.  No longer are we seeking ways to process information.  We are seeking ways to disseminate and draw conclusions from the information we already hold.

Does this sound familiar to anyone in Operations Research?  It should, because this is what Operations Research has been doing for years.  I think I sound like a broken record sometimes.  Yet I guess the story needs to be told again.  But perhaps I'm being a little too snarky.  It could just mean that the Information Age is catching up to the decision science analysts.

The crux of the article is technology meeting the demands of information overload.  Yet that is not how I would define insight.  Insight is drawing conclusions based on the evidence.  The Operations Research analyst will undoubtedly be well prepared for this evolutionary advancement.  I'm sure HP is aware that technology alone will not drive the Insight Age revolution.

I hope we've all seen this new age coming.  The Insight Age is here and is ready to be tackled.  My next inclination is to think about what will define the Insight Age.  The Information Age was defined by the internet, computing power, and globalization.  My prediction is that the Insight Age will be defined by open data and decision science.  Open data is about having no barriers to information.  Data will be freely accessible and easy to disseminate.  Decision science is already here and will make an even bigger impact.  Machine Learning, Artificial Intelligence, and Optimization Algorithms will all be the cogs of the Insight Age mechanism.

Insight Age is such a fitting name.  I'm really liking it the more I think about it.  I'm going to try to remember that in some of my future conversations.

Sunday, May 15, 2011

R Tutorial: Add confidence intervals to dotchart

Recently I was working on a data visualization project.  I wanted to visualize summary statistics by category of the data.  Specifically I wanted to see a simple dispersion of data with confidence intervals for each category of data. 

R is my tool of choice for data visualization.  My audience was a general one, so I didn't want to use boxplots or other density-type visualization methods.  I wanted a simple mean with a 95% interval around it (roughly the mean plus or minus 2 standard deviations).  My method of choice was the dotchart function.  Yet that function is limited to showing the data points and not the dispersion of the data.  So I needed to layer in the confidence intervals.

The great thing about R is that its functions and objects can be layered.  I can create one R object and add to it as I see fit.  This is true of most plotting functions in R.  I knew that I could use the lines function to add lines to an existing plot.  This method worked great for my simple plot and adds another tool to my R toolbox.

Here is the example dotchart with confidence intervals R script using the "mtcars" dataset that is provided with any R installation.

### Create data frame with mean and std dev of mpg for each cylinder count
x <- data.frame(mean=tapply(mtcars$mpg, list(mtcars$cyl), mean),
                sd=tapply(mtcars$mpg, list(mtcars$cyl), sd) )

### Add lower and upper levels of confidence intervals (mean +/- 2 std dev)
x$LL <- x$mean-2*x$sd
x$UL <- x$mean+2*x$sd

### plot dotchart with confidence intervals

title <- "MPG by Num. of Cylinders with 95% Confidence Intervals"

### x-axis limits are rounded outward to the nearest multiple of 10
dotchart(x$mean, labels=rownames(x), col="blue",
         xlim=c(floor(min(x$LL)/10)*10, ceiling(max(x$UL)/10)*10), main=title )

### draw one horizontal line per row, from lower limit to upper limit
for (i in 1:nrow(x)){
    lines(x=c(x$LL[i],x$UL[i]), y=c(i,i))
}

And here is the example of the finished product.
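
One caveat worth adding: the mean plus or minus 2 standard deviations describes the spread of the data, not the uncertainty in the mean itself.  A confidence interval for the mean is usually built from the standard error (the standard deviation divided by the square root of the group size).  Below is a minimal sketch of that variant, assuming a t-based 95% interval is appropriate for each group.  The object names (n, m, s, se, tcrit, ci) are my own choices for this illustration, and I use the segments function in place of the loop only because it accepts whole vectors at once.

```r
### Group size, mean, and std dev of mpg for each cylinder count
n <- tapply(mtcars$mpg, mtcars$cyl, length)
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
s <- tapply(mtcars$mpg, mtcars$cyl, sd)

### Standard error of each group mean
se <- s / sqrt(n)

### t critical value for a two-sided 95% interval, one per group
tcrit <- qt(0.975, df = n - 1)

### Collect the mean and interval limits, one row per cylinder count
ci <- data.frame(mean = as.vector(m),
                 LL   = as.vector(m - tcrit * se),
                 UL   = as.vector(m + tcrit * se),
                 row.names = names(m))

### Plot the means, then layer one horizontal segment per group on top
dotchart(ci$mean, labels = rownames(ci), col = "blue",
         xlim = range(ci$LL, ci$UL),
         main = "Mean MPG by Num. of Cylinders with 95% CI of the Mean")
rows <- seq_len(nrow(ci))
segments(x0 = ci$LL, y0 = rows, x1 = ci$UL, y1 = rows)
```

These intervals come out narrower than the 2 standard deviation bands above, because they shrink as the number of cars in each group grows.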

Tuesday, May 3, 2011

Google funding research to measure regret

According to an article in Mashable, Google is funding Artificial Intelligence research at Tel Aviv University that will help determine whether computers can be taught regret.  My first inclination is to wonder if this is really anything new.  Linear programming itself is all about regret or, in financial terms, opportunity cost.  The article describes the research as being about how to
measure the distance between a desired outcome and the actual outcome, which can be interpreted as “virtual regret.” 
That sounds a lot like mathematical programming to me.  So what is so different about the Tel Aviv University findings?  Apparently what is new is not the algorithms so much as how the data is processed.  Dr. Yishay Mansour explains that they will be using machine learning methodologies to look at all the relevant variables in advance of making informed decisions.  It sounds like this research is more in the realm of understanding large amounts of data and processing it into usable information.
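
To make the idea of regret as a distance between the desired and actual outcome concrete, here is a minimal sketch in R.  It is not the Tel Aviv University method, which the article does not describe in detail.  It only computes one common textbook notion of regret: the payoff of the best single action in hindsight minus the payoff actually received.  The payoff table and the sequence of choices are hypothetical numbers made up for this illustration.

```r
### Hypothetical payoffs: one row per round, one column per available action
payoff <- matrix(c(3, 5, 2,
                   4, 1, 6,
                   2, 7, 3,
                   5, 4, 4),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))

### Hypothetical action that was actually chosen in each round
chosen <- c("A", "A", "C", "B")

### Payoff actually received in each round (row i, column of the chosen action)
actual <- payoff[cbind(seq_along(chosen), match(chosen, colnames(payoff)))]

### Desired outcome: total payoff of the best single action in hindsight
best_fixed <- max(colSums(payoff))

### Regret: gap between the desired outcome and the actual outcome
regret <- best_fixed - sum(actual)
print(regret)
```

A learning algorithm in this setting would try to pick actions so that this gap grows slowly as more rounds are played.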

Big data is a huge problem in the data-rich but information-poor internet environment that we face today.  Organizations handle a lot of data, but they need to know what to do with it.  Today's Operations Research professional should be poised to swoop in and help with this issue.  Organizations are data rich but lack the focus to apply that data to meaningful decision analysis.  I'm hoping that this leads to a big watershed moment for the Operations Research community.