Maximize Productivity with Industrial Engineer and Operations Research Tools: 2010

Friday, December 31, 2010

Video of Joy of Stats by Hans Rosling

The Joy of Stats, narrated by Hans Rosling, was just produced by the BBC and shown to their audience. Hans Rosling, via gapminder.org, was kind enough to post the full hour-long documentary about the joys of statistics. The video is posted on YouTube and is available to anyone.

http://www.gapminder.org/videos/the-joy-of-stats/

Hans Rosling's passion for statistics is infectious. He definitely has a joy about him that persuades the viewer to really enjoy finding new and invigorating ways to explore data. Now for me this is not hard to do as I love data and analysis. Yet for many in the world mathematics, let alone statistics, is considered a universe unto itself that they dare not explore. Hans breaks down that barrier with The Joy of Stats. No matter your educational interests or background, I find it very hard to ignore his plea that statistics is not boring and is even, dare I say it, sexy.

If you are interested in this video as a tribute to statistics you would also enjoy Dr. Robert Lewis's essay on Mathematics. Both of these works explain how a world without the analysis of numbers is a world not worth living in. There is so much to explore in so little time. I am so happy that I decided to pursue a career in Engineering and Operations Research to help the world one datum at a time.

Tuesday, December 28, 2010

Wednesday, December 22, 2010

An essay on Mathematics and education

I just read an essay on Mathematics by Dr. Robert Lewis named "What Math?" This has to be the single best essay I have ever read on Mathematics and why it is so important. I can imagine Dr. Lewis has been peppered with cynical criticism about understanding math just as much as any of us in our analytical profession. Dr. Lewis does a superb job of explaining the essence of why those of us who use math love it so much.

Teaching math to our newer generations is definitely a concern. I really like how Dr. Lewis explains that education is not just about the transfer of information but about understanding the underlying principles of specific knowledge. The parables are a very clever device to relay those principles of math.

I also love how he portrays Math as not just a device for the technologically minded but also for the liberal arts. Dr. Lewis conveys that Math is not merely knowing numbers but the process of finding solutions. My own example is when people ask me how I am so good at math. I usually tell them it's just like learning a language. Once you understand the language and are fluent then you can start applying it in everyday life. Math is a language to learn just as much as a foreign language. It may take some time to learn but it will take a lifetime to master.

I highly recommend reading this essay. I also recommend saving this essay for our future generations, teachers, educators, family members, and friends. This essay can be used to help bridge understanding that may be missing from our own words.

Thursday, December 16, 2010

New OpenOpt/FuncDesigner quarterly release

New OpenOpt and FuncDesigner quarterly release is out: 0.32.

OpenOpt:
* New class: LCP (and related solver)
* New QP solver: qlcp
* New NLP solver: sqlcp
* New large-scale NSP solver gsubg. It still requires lots of improvements (especially for constraints, whose handling is still immature and often fails), but since the solver sometimes already works better than ipopt, algencan and the other competitors it was tried against, I decided to include it in the release.
* Now SOCP can handle Ax <= b constraints (and a bugfix for handling lb <= x <= ub has been committed)
* Some other fixes and improvements

FuncDesigner:
* Add new functions removeAttachedConstraints, min and max
* Systems of nonlinear equations: possibility to assign personal tolerance for an equation
* Some fixes and improvements (especially for automatic differentiation)

See also: Full Changelog, Some Applications, Future Plans

Where to find good data sets

O'Reilly Media has been a big advocate of Open Data and believes that is where a lot of computing is going to be headed in the future. I think they are definitely on to something. Yet the future could be now. There are a lot of opportunities to find good data sources immediately. One of my favorite blogs, O'Reilly Radar, has an article by Edd Dumbill on Where To Find Data. There is plenty of good data available on the internet to download, explore, and mine for new information. These places not only offer great sources of data but many of them offer an API to allow quick and seamless access. Below is a link summary from the article.

Freebase

An all-things graph database. The website focuses on trends of certain cultural and interest topics.

Amazon Public Data Sets

Amazon is probably considered the cloud computing mecca next to Google. Amazon Web Services offers a lot, including storage of public data sets. They offer a huge variety of public data.

Windows Azure Data Marketplace

Surprisingly, Microsoft has a data source built on the Open Data Protocol. This data market offers quite a few data sets of interest.

Yahoo Query Language

YQL is an interesting API that is very similar to SQL. YQL is essentially a language that allows you to grab data from cloud services. This could be very handy for grabbing data quickly and dynamically. YQL can connect to a lot of data sources as well.

Infochimps

Infochimps is a data marketplace and warehouse. They offer to host, sell, and distribute data sets. Some of their data comes at a cost but a lot of their data is free as well. This is an interesting startup and it will be very interesting to follow their growth. Also there is a new Infochimps R package that uses their API to gather and process Infochimps data.

DBpedia

DBpedia is a Wikipedia for data sets. In fact the data itself comes from Wikipedia.

Some other sources not from the article include the World Bank open data and the U.S. Census data.

Sunday, December 12, 2010

Shortest path algorithm solved by ants?

University of Sydney researchers are working on the next great optimization algorithms. You would think they would be hunkered down in the math or computer science departments working with large multi-core processors. Yet Chris Reid and Madeleine Beekman, working with David Sumpter of Uppsala University, are studying how ants solve the shortest path problem. By studying how ants solve complex and dynamic problems such as getting food back to their colony they could unravel some new and innovative ways to solve routing problems. The researchers published their results in the Journal of Experimental Biology.

There have already been some algorithms developed out of studying ants. One family of methods is the Ant Colony Optimisation (ACO) algorithms. Ants solve the complex problem of the shortest path by communicating with other ants in the colony through pheromone trails. Each ant leaves a pheromone trail as a signal to the ants that follow. The stronger the trail, the stronger the "optimal path" signal telling other ants the best way to get to the intended destination.
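The pheromone feedback described above can be illustrated with a small Python sketch. This is a toy, deterministic model of two branches between nest and food, not the researchers' experiment and not a full ACO implementation; the branch lengths, evaporation rate, and function name are assumptions made for illustration.

```python
def simulate_two_branches(short_len=1.0, long_len=2.0, evaporation=0.1, steps=200):
    """Return the final share of pheromone on the 'short' branch.

    Toy model (all parameters are illustrative assumptions):
    - ants choose a branch in proportion to its pheromone level
    - each branch receives a deposit proportional to the ants using it,
      divided by its length (shorter trips mean more deposits per unit time)
    - a fixed fraction of pheromone evaporates every step
    """
    pheromone = {"short": 1.0, "long": 1.0}
    lengths = {"short": short_len, "long": long_len}
    for _ in range(steps):
        total = sum(pheromone.values())  # total before this step's updates
        for branch in pheromone:
            share_of_ants = pheromone[branch] / total
            deposit = share_of_ants / lengths[branch]
            pheromone[branch] = (1.0 - evaporation) * pheromone[branch] + deposit
    return pheromone["short"] / sum(pheromone.values())


if __name__ == "__main__":
    print("share of pheromone on the short branch:", simulate_two_branches())
```

Starting from equal pheromone on both branches, the shorter branch is reinforced slightly more on every step, so the colony's signal drifts toward it. Real ACO algorithms add randomness and work on whole graphs rather than two branches.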

It would be really interesting to find out that the best shortest path algorithm might have been literally under our noses the entire time. This will be an interesting study to follow for the Operations Research community.

Wednesday, December 8, 2010

2 years of blogging with IEOR Tools

I forgot that I was going to mention it, but Nov. 21 officially marked two years of blogging about Industrial Engineering and Operations Research Tools. I have really enjoyed writing about this space and reading all of the contributions. I have no intention of quitting and hope to make many more contributions.

An update to the blog is that I'm starting to contribute Amazon content to the site. Amazon has been a valuable resource for linking to books on the subject matter. I've thought about adding a website that will be a "store" or compilation of some of the better resources with Amazon being a partner. I thought I would bring this up with the readers first to see if this would be a valuable addition to this blog. It would be a clearinghouse or aggregator for all the best tools and resources in Operations Research, Industrial Engineering, Analytics and Data Mining. I'm not sure there is anything like it on the internet besides doing searches in Google or Amazon. I hope the site would have a nice layout that makes it easy to find resources.

Since it is the holiday season I would like to send my warmest regards to all those reading. I thank you so much for your readership. I wish you and your family a safe and happy holiday season.

Tuesday, December 7, 2010

Big Data Logistic Regression with R and ODBC

Recently I've been doing a lot of work with predictive models using logistic regression. Logistic regression is great for determining probable outcomes of a binary dependent (target) variable. R is a great tool for accomplishing this task. Oftentimes I will use the base function glm to develop a model. Yet there are times, due to hardware or software memory restrictions, that the usual glm function is not enough to get the job done.

A great alternative for performing the usual logistic regression analyses on big data is the biglm package. Biglm performs the same regression optimization but processes the data in "chunks". This allows R to perform calculations on smaller pieces of the data at a time without needing to hold the full data set in memory. Biglm also has an interesting option: it can perform calculations not only on imported data frames and text files but also over database connections. This is where the helpful package RODBC comes to the rescue.
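To make the "chunks" idea concrete, here is a minimal sketch in plain Python rather than R. It is not biglm's actual algorithm or API; it only shows that the sums needed for each Newton (iteratively reweighted least squares) step of a logistic regression can be accumulated one chunk at a time. The function names, the single-predictor model, and the tiny data set are all assumptions made for illustration.

```python
import math


def chunked_logistic_fit(chunk_source, n_iter=25):
    """Fit y ~ intercept + slope * x by Newton steps, reading data chunk by chunk.

    chunk_source: zero-argument callable returning a fresh iterable of chunks,
    where each chunk is a list of (x, y) pairs with y in {0, 1}. In practice a
    chunk might be one batch of rows fetched from a database cursor.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Running sums for the gradient (g) and the 2x2 Hessian X'WX (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for chunk in chunk_source():
            for x, y in chunk:
                p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
                w = p * (1.0 - p)
                g0 += y - p
                g1 += (y - p) * x
                h00 += w
                h01 += w * x
                h11 += w * x * x
        # Newton update: solve the 2x2 system H * delta = g by hand.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def make_chunks(rows, size):
    """Return a callable that yields `rows` in consecutive chunks of `size`."""
    return lambda: (rows[i:i + size] for i in range(0, len(rows), size))


# Made-up, non-separable example data: (x, y) pairs.
ROWS = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1), (7, 1)]

if __name__ == "__main__":
    print("all rows at once:", chunked_logistic_fit(make_chunks(ROWS, len(ROWS))))
    print("three rows at a time:", chunked_logistic_fit(make_chunks(ROWS, 3)))
```

Only one chunk is ever held in memory inside the loop, at the cost of re-reading the data once per iteration. The chunk size changes memory use, not the fitted coefficients.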

I have been looking all over the R support lists and blogs in hopes of finding a good tutorial using biglm and RODBC. I was not successful, yet I was able to work out how to do this myself.

INFORMS Data Mining Competition leaders used Open Source software

The 2010 INFORMS data mining competition just recently finished. The leaders were presented at the 2010 Annual INFORMS Conference. The goal of the competition was to determine short term movements in stock prices. You may recall that IEOR Tools competed in this competition with not too glamorous results at the end. There was a lot to learn from this competition. First, it seems that trading price movement can be correlated to lags in prices very well. Most of the top leaderboard finishers used future information to determine an appropriate lag in the price movement.
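As a rough illustration of what "correlated to lags" means, the sketch below computes the correlation between a series and a copy of itself shifted by a given lag. It is written in plain Python, uses a made-up series, and does not reproduce the contest data or any competitor's method; the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lag_correlation(series, lag):
    """Correlation between series[t] and series[t - lag], for lag >= 1."""
    return pearson(series[lag:], series[:-lag])


if __name__ == "__main__":
    # Made-up series that flips sign every step.
    moves = [1, -1, 1, -1, 1, -1, 1]
    for lag in (1, 2):
        print("lag", lag, "correlation:", lag_correlation(moves, lag))
```

Scanning a range of lags and keeping the one with the strongest correlation is one simple way to pick a lag for a predictive model.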

The next most interesting thing is that all of the top 3 finishers used free and open source software as tools for the competition. Two of the leaders used R and the second place finisher used Python, namely SciPy. This should not be surprising to most people in the analytics community. Open source software has been making inroads for quite a while. The R-Project has been getting a lot of interesting press lately, especially in enterprise business circles. Python is an object-oriented programming language that is getting more popular. Python's popularity seems to be due to its ease of use and how quickly it can be learned and implemented.

The presenters of the 2010 INFORMS data mining competition were kind to post the methods of the Top 3 competitors. Each method is an interesting read on how they were able to use the open source tools to get predictive results of stock price movements.

If you are interested in learning more about R as a tool I recommend a new book by Luis Torgo, "Data Mining with R: Learning with Case Studies".

This book is one of the first of its kind to show R methodologies with real life applications. I intend to get the book and hopefully post a review of it in the near future. I am already hearing good things about it.

Wednesday, October 27, 2010

R references for handling Big data

The Dallas R User Group had a meeting over the weekend. One of the discussions was about the memory limitations of R. This is a common subject among the R community and R User Groups. There have been a lot of strides recently in allowing R to stretch its memory limitations. I thought I would compile and share some of the best resources I have found to remedy the big data issue.

CRAN Packages
ff
This package allocates hard disk space to big data vectors.

bigmemory
This package stores large matrices behind pointers to memory allocated outside R's usual workspace, or to a backing file on disk.

Blog Articles
Taking R to the Limit: Parallelism and Big Data

Hitting the Big Data Ceiling Limit in R
While this is not a helpful article for big data it does show some of the issues R currently faces, namely the lack of an "int64" or long long data type.

Enterprise Software
Revolution R Enterprise
Revolution Analytics is creating enterprise software around R to tackle issues of big data, parallelism and threaded computing in order to speed up large data processing and analytics.
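The missing "int64" type mentioned above matters because R's integer type is 32-bit, and larger whole numbers are typically stored as double-precision floats, which hold integers exactly only up to 2^53. The short Python sketch below shows both limits; the helper names and the example ID are hypothetical.

```python
INT32_MAX = 2**31 - 1          # largest value of a 32-bit signed integer
DOUBLE_EXACT_LIMIT = 2**53     # doubles represent every integer up to here


def fits_int32(n):
    """True if n can be stored in a 32-bit signed integer."""
    return -2**31 <= n <= INT32_MAX


def survives_double(n):
    """True if integer n round-trips through a 64-bit float unchanged."""
    return int(float(n)) == n


if __name__ == "__main__":
    record_id = DOUBLE_EXACT_LIMIT + 1   # hypothetical large ID from a database
    print("fits in int32:", fits_int32(record_id))
    print("survives conversion to double:", survives_double(record_id))
```

An ID like this would need a true 64-bit integer type, or storage as a character string, to come through intact.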

Wednesday, October 20, 2010

Friday, October 8, 2010

Data mining competition with R

There is a new data mining competition aimed at predicting preferred data mining tools in R via dataists.com. The concept of the competition is to try to determine which R packages are preferred in the R community via their CRAN package libraries. The developers of this new competition are also in the R community with the NY R Users Group.

I am a user of R and I am also a member of the Dallas R Users Group. As you can imagine I find this competition very interesting because I could benefit greatly from knowing the preferred methods of implementation in the R community. This can also be a very interesting exercise to determine preferred modeling methods. I believe this competition will give an insight into the most common methods to apply statistical computing in the community today.

R is getting a lot of press lately. Revolution Analytics just released the first part of a series of articles on the impacts of R and why it is hot. R as a statistical and optimization tool is really making a play in the business community.

I am a big fan of these data mining competitions, as I've written previously. It is really interesting what you can learn about the world from the competitions, especially if you do not have previous knowledge of the subject matter. I will try to bring up these competitions from time to time. I would be interested to know if any IEOR Tools readers have participated in these competitions and what their experience was.

Tuesday, September 21, 2010

IBM's furious Analytics acquisitions

Is anyone keeping up with IBM and their propensity to obtain Analytics based companies? Let's see if we can do a recap of IBM in the news in the last couple of years.

Acquires COGNOS for $5.0 billion in November 2007
Acquires ILOG for $0.34 billion in July 2008
Acquires SPSS for $1.2 billion in July 2009
Acquires Netezza for $1.7 billion in September 2010

In actuality IBM has spent an estimated $12 billion acquiring 23 analytics-based companies in the last few years. That is quite a leap for a hardware/software/IT company. IEOR Tools has talked about IBM's emergence as an analytics company before with new analytic centers and acquisitions. I think it's safe to assume that IBM is the de facto analytics champion in the world right now.

So what does this mean for Operations Research and its professionals? I believe it means the sky is the limit now. This is a grand opportunity for the Operations Research community. In fact I would even say that if INFORMS does not take advantage of the recent demand for analytics and decision sciences then they are missing the big picture. Jobs should be plentiful in the foreseeable future. There should be plenty of work to keep management happy and help drive value into organizations. This may even be the dawn of a new day for Operations Research and Analytics. There is so much buzz now it will leave a ringing in your ears. Sure I might be a little optimistic but I think any news is good news right now in this economy.

I also believe that IBM is not done. I think IBM is going to evolve even more in the Analytics realm, perhaps getting more involved in the software within Operations Research and statistics. It's just a guess, but who knows: the next move could involve SAS, Matlab, or even contributions to open source projects like R, RapidMiner or Weka. This is an exciting time nonetheless for Operations Research.

Thursday, September 16, 2010

Current Data Mining and Analytics Challenges

I love the Data Mining and Analytics Challenges. There tends to be so much collaboration and open knowledge especially if the challenge has an affiliated forum. There really is so much to learn and the challenges offer a great way to bring all of the resources and knowledge together. Here is a list of the current challenges underway in the Data Mining community.

Kaggle is hosting three competitions. Tourism Forecasting Part One challenges entrants to predict 581 tourism-related time series. Chess Ratings - Elo vs the Rest of the World is trying to determine a chess rating system that is better than the current Elo rating system. The INFORMS Data Mining Contest challenges entrants to predict intra-day stock price movements based on experts' predictions, sector data, and other indicators.
TunedIT is another competition hosting organization. Currently TunedIT is hosting the e-LICO multi-omics prediction challenge with background knowledge on Obstructive Nephropathy. Yes, I had to look it up too.
UC San Diego is hosting the 2010 UC San Diego Data Mining Contest. This is a two-task contest which tries to make predictions from an e-tailer's data on consumer and non-consumer information. The two tasks are a binary predictor and a boolean-transformed predictor.

Wednesday, September 15, 2010

OpenOpt release 0.31

OpenOpt 0.31:

Lots of new NLP, NSP (nonsmooth) and GLP (global) solvers from nlopt have been connected
New LP solver: pclp (very premature, but permissive license (BSD) and pure Python implementation)
Some bugfixes (mostly wrt using sparse matrices) and code cleanup

FuncDesigner 0.21:

New features: Integration, Translator
Some speedup for functions evaluation, automatic differentiation, optimization problems with some fixed variables
New parameter useSparse for optimization probs and automatic differentiation (sometimes autoselect works prematurely)
Some bugfixes (mostly wrt using sparse matrices) and code cleanup

DerApproximator 0.21:

Add parameter exactShape (prevents flattening operations on result)

Welcome to our homepage: http://openopt.org

Wednesday, September 8, 2010

Computer languages and Applied Math

There is no question that computer languages have helped push the envelope for applied mathematics. It is hard to imagine where airline scheduling, supply chain management, or inventory control would be if it were not for all of the great advances in optimization and statistical computing. I have thought a lot about the convergence of computing and Operations Research. In fact I brought up a discussion on the topic on OR-Exchange with the question "Is programming skills a requirement for today's OR practitioner?" You would think with all of the advances in computing that programming would be simpler but that is not the case.

There is an interesting debate in the R-project community about the shortcomings of the R language. Xi'an's Og posted a discussion on R shortcomings re-posted from another blog. The consensus of the R community seems to be that R is an inferior language but has a brilliant library of resources. So where does that leave the practitioner? Does the practitioner need to update their coding skills and develop something better in another computer language? I find it really interesting that some of the first proposed solutions in this debate are to scrap everything and start over.

I don't think this debate is ever going to change. The computer is always going to be a valuable tool for the Operations Research practitioner. The tools we use to complete our daily tasks need to be ubiquitous but also readily available. Let's just say that the slide rule is not going to be making any sort of comeback.

I believe that the Open Source model has a real advantage here over its proprietary counterparts in this debate. The community has a lot of input into Open Source software. It is often called a meritocracy. In the Open Source model the best solutions continue while those that are not fade into obscurity. This is one of the reasons why I advocate Open Source software. In the end I think R is going to be fine. There will be advances, possibly even forks of the software, but there will always be progress. The only limitations seem to be what we can dream.

Friday, August 20, 2010

What did the new PvsNP proof prove?

I normally don't like to blog about mathematical theory. I usually leave that for the smart people and theorists. Yet there is an interesting article out this week from the Science section of the New York Times about the new PvsNP proof from Vinay Deolalikar. The article is not your typical piece about what the mathematicians are working on next. The article instead is about the explosion of activity and dialogue about this proof on the internet and around the world in the mathematical community. The author suggests that the likes of this have not been seen before with these types of theoretical discussions. I would have to agree with the author. I also find it very ironic that the Old Grey Lady is reporting on this, since really the only thing that can be proven, as this article suggests, is that the old media is nearly dead and the new media has supplanted it.

This brings up another interesting thought about how problems are solved now. The mathematical community is closer now than it ever has been. This is the age of online crowd sourcing. If I have a question about Operations Research I go to OR-Exchange. If I am looking for a professional network contact I go to LinkedIn or INFORMS. If I need to read about the interests of the Operations Research communities I will go to their blogs. The convergence of ideas, thoughts, and knowledge is closer now and is only going to get even closer.

This article is one of the reasons why I am such an advocate of open source software with organizations like GNU and COIN-OR. Open source is at its best in bringing thoughts and ideas together to create a quality product. Sure there are licensing issues. This article is a good metaphor in that software licensing is like the "old media". Licensing is trying to catch up with the new technology but there are still a lot of kinks to work out. There are even suggestions now that software patents should be eliminated. I'm not sure what will happen but I do know that open source software is driving a lot of innovation in a much shorter time frame.

So yes I find it ironic that the New York Times is reporting on this proof as if it is new news. Maybe I'm just too close to the subject so I understand it a little better than the rest of the New York Times readers. Yet if you are anywhere near the mathematical world you would have already seen the proof and had your own conjectures. Even if that is the case we can prove now that information and knowledge are faster and easier to obtain than ever.

Monday, August 16, 2010

IEOR Tools Tutorial: Learning XML with R

I have been using a lot of R lately in my work. R is an open source statistical computing platform. Saying R is only used for statistics does not do it justice. I am finding it to be a really powerful statistical and optimization computing platform. There seems to be no task that cannot be accomplished. Lately I've been curious about measuring the performance of my blog and how it compares to other blogs. So I thought I would use this opportunity to show how I did this in R. I want to rank Operations Research blogs using the Alexa ranking system. Unfortunately Alexa does not have a search function for Operations Research blogs so I am going to have to get the information myself using R.

This R tutorial is going to be using the package XML. Packages are used in R to perform specific computational tasks that the base R platform cannot accomplish on its own. There are many different packages that can be loaded into R to handle a wide variety of problems.
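The general idea of pulling rankings out of XML can be sketched as follows. This is a Python illustration using the standard library rather than R's XML package, and the XML layout, blog names, and rank values are invented for the example; they are not Alexa's actual response format or real rankings.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; a real ranking service will use its own layout.
SAMPLE_XML = """<blogs>
  <blog name="Blog A" rank="250000"/>
  <blog name="Blog B" rank="90000"/>
  <blog name="Blog C" rank="1200000"/>
</blogs>"""


def rank_blogs(xml_text):
    """Parse the XML and return (name, rank) pairs, best (lowest) rank first."""
    root = ET.fromstring(xml_text)
    rows = [(blog.get("name"), int(blog.get("rank"))) for blog in root.findall("blog")]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, rank in rank_blogs(SAMPLE_XML):
        print(name, rank)
```

The same three steps (parse the document, pick out the elements and attributes of interest, sort the results) apply whichever XML library is used.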

Favorite Operations Research books from OR-Exchange

A while ago I posted a question on OR-Exchange about some of the favorite Operations Research books that OR-Exchange members like to reference. I was rather pleased with the response. Of course there are great books on the subject of Operations Research. The best part of OR-Exchange is that it allows the community to vote up the favorites. A lot of these books are just plain good to have in your desk drawer or in your study. I have to admit that I have not read all of these books. So this gives me a good excuse to go get them and perhaps offer up some reviews in the future.

So in order of OR-Exchange votes here are the favorite Operations Research books.

1. Applied Mathematical Programming by Bradley, Hax, Magnanti.

Also available at http://web.mit.edu/15.053/www/ but if you like it you might want to give it a purchase.

2. Network Flows: Theory, Algorithms, and Applications by Ahuja, Magnanti, Orlin

3. Linear Programming by Chvatal

4. Model Building in Mathematical Programming by Williams

5. Introduction to Operations Research by Hillier, Lieberman

6. 50 Years of Integer Programming by Juenger, Liebling, Naddef, Nemhauser, Pulleyblank, Reinelt, Rinaldi, Wolsey

7. The Traveling Salesman Problem: A Computational Study by Applegate, Bixby, Chvatal, Cook

8. Tabu Search by Glover, Laguna

9. Prisoner's Dilemma by Poundstone

10. Serious Play by Schrage

11. The Fifth Discipline by Senge

12. The Predictioneer's Game by Bueno de Mesquita

13. Optimization Algorithms for Networks and Graphs by Evans, Minieka

Thursday, August 5, 2010

Kaggle introduces new Chess rating competition

Kaggle, home of the statistics and predictive modeling competitions, is introducing its latest contest, Elo Versus The Rest of The World. The competition is being organized by Jeff Sonas, who is a chess-metrics aficionado himself. Jeff describes his history with rating chess players and why he wanted to start such a competition with Kaggle.

This looks to be a really interesting modeling competition with already more than 40 submissions on the leaderboard. The interesting note about this competition is that the Elo rating system itself is going to be making an appearance on the leaderboard. This means that if no one beats the Elo system then there is no declared winner. It looks like someone has beaten Elo at its own game already, though. Elo will be on the leaderboard as a benchmark to make sure that the competition is proving its worth.
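For readers who have not seen the benchmark being challenged, the standard Elo update can be written in a few lines. This is a Python sketch of the textbook formula, not the competition's scoring code; the K-factor of 32 is a common choice assumed here for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a, rating_b, score_a, k=32):
    """New rating for A after a game; score_a is 1 (win), 0.5 (draw) or 0 (loss).

    k is the K-factor controlling how fast ratings move (32 is an assumption).
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    print("expected score, equal ratings:", expected_score(1500, 1500))
    print("rating after a win:", update_rating(1500, 1500, 1))
```

A competing rating system has to predict game outcomes better than these expected scores do.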

I hope to get a chance to make an appearance on the leaderboard. I am involved in Kaggle's INFORMS 2010 Data Mining contest. I'm barely hanging on to the top 10 in that competition. There are some pretty good models to compete against in that group.

Tuesday, August 3, 2010

New Look for IEOR Tools

I think I'm following the trend of a lot of OR bloggers and updating the look of the site. I'm not sure if it is a more modern Web 2.0 look but at least it's a little more refined. I finally got around to fixing the font on the title. Anyway let me know if you like it.

R IDE for Linux and Gnome

I have been using R in my work recently. I have also been using R at home to do some tinkering. In my work environment I use Windows (none too pleased). I find using the regular R console with Textpad makes for a good Windows development environment. I haven't been able to replicate this at my home. At my home I have Ubuntu as my operating system. I have been searching for a comparable R environment for my home.

That is until now. The statistics blog at Stattler.com did some research on this very topic of R and Ubuntu. They found a plugin for R with the text editor Gedit that works wonders. The plugin is called Rgedit and is very easy to install. Stattler offers simple installation instructions. Stattler also has a great review of the Rgedit plugin. Rgedit is very similar in layout to the usual gedit text editor except it splits the screen into panes for code and R output.

Some of the highlights of the Rgedit plugin include:

Split screen of panes and can be turned on and off
Syntax highlighting specific to the R code
Single line or batch processing of R scripts
Multiple R workspaces can be run
Shortcut keys can be created and customized

This plugin suits my needs just fine for my Ubuntu uses with R. There are many other IDEs for R that you may find suit your needs better. The beauty of open source software is that there never seems to be a shortage of options.

Saturday, July 31, 2010

Eight Data Mining Social Networking Groups

Networking is an essential part of career management for any professional. The relationships we develop can have great impact on our career direction and growth. I tell young professionals all the time that their best asset in career growth is their professional network. I tell them to start early and maintain the network continually. The advent of LinkedIn and other internet social networking sites has made that task easier.

Vincent Granville at AnalyticBridge.com has compiled a list of 8 data mining social networking groups with more than 2000 members. These groups are easy to join as all of them are associated with LinkedIn. If you are looking for anyone in the data mining community, more than likely they will be a member of these groups.

I have found a couple of different jobs through LinkedIn in the past. I have found hiring managers as well as peers that I would potentially be working with closely. I found that to be a great benefit before the interview process. In fact I would even contact some of the peers in the group to get a pre-interview idea of where I would be working and the idiosyncrasies of the organization. The important thing is using your professional network to maximize your career productivity.

Tuesday, July 27, 2010

Audio of Richard Stallman keynote at useR2010

The topic of open source and free software licensing can be a very confusing topic. In the Operations Research world it is no different. There are a lot of players in the mathematical programming software world that are vying for attention that include both proprietary and free software. Insight into the world of free software really requires immersion into using the free software products and finding how it can apply to daily application. Another good way to understand free software is to get it from the founders of the movement. Richard Stallman is considered the father of the free software movement and you can find a lot of good material online based on his work. The useR2010 conference, the annual conference for R project for statistical computing, just completed this past week and the final keynote was given by Richard Stallman.

The R-statistics blog was kind enough to post an audio of the keynote address by Richard Stallman at useR2010. Richard is not your typical stereotype of a computer geek. He may look the part but Richard does not pull any punches in his presentation of the free software movement and its ideology. Richard's talk discusses the history of the free software movement, the GNU General Public License, and his history of dealing with free software.

Why was Richard giving the keynote address when he doesn't have a statistics background? Well, the R statistical computing software platform is licensed under the GPL, the GNU General Public License. R is free to use, distribute, modify and improve, as long as distributed copies and modified versions stay under the same license terms. This is much of what the GPL represents. Listen to the audio by Richard to really understand his passion for free software and what it means to him and the software world.

Wednesday, July 14, 2010

Podcast with Revolution Analytics CEO Norman Nie

Through the Revolutions blog there is a really interesting podcast about R by Internet Evolution Radio interviewing Norman Nie. Norman Nie is the CEO of Revolution Analytics, which I posted about in the past when covering how Revolution Analytics is going to take R commercial.

In this podcast Norman is asked a lot of interesting questions about R and the statistical modeling enterprise in general. They discuss his past with SPSS. They also discuss the advantages of using an Open Source software versus a proprietary platform. The interview gets really interesting when they discuss how statistical data is important to enterprise business and how a lot of organizations get it wrong.

If you are new to R and want to know more about its capabilities this is a great podcast.

Tuesday, July 13, 2010

OpenGamma startup claims Open Source Financial Analytics

According to h-online.com a new startup firm, OpenGamma, is preparing to launch its Open Source Financial Analytics solutions. OpenGamma is a London-based firm that will specialize in risk management and financial markets by providing software architecture. The article quotes the CEO, Kirk Wylie:

"Our goal in building OpenGamma isn't just to build an open source technology" said Wylie, "Our goal is to build the best platform for financial analytics and risk management possible". The platform will be made available under a "commercial friendly open source licence"

OpenGamma is going to provide several solutions based on its Open Architecture software platform. According to their website they will be providing batch risk systems, commercial trading, bespoke trading, and event-driven alert systems. The company's main selling point is that all of their software code will be open. This means that companies will have the flexibility to not only debug but potentially contribute back to the project, in theory.

This sounds like a brave yet prudent business venture. I think OpenGamma could be wildly successful with their Open Architecture platform. They can be especially successful if they allow the financial analytics community to contribute back to their software platforms. I believe we will be seeing more companies and startups like this in the future in Analytics. Perhaps there is an Open Source Operations Research platform on the horizon.

Wednesday, July 7, 2010

Open source solver for Excel

Thanks to a post by Michael Trick we find that the open source solver community has a new platform with Excel. Although it's not a replacement for Solver, OpenSolver does offer a lot of benefits that the existing Excel optimization platform does not provide. OpenSolver is an extender of the existing Solver. You will still need to use Solver to develop the model. Yet OpenSolver can take over from there. Some of the benefits include

COIN-OR CBC optimization engine to perform the calculations
Compatible with existing Solver models
No artificial limits to the size of the problem (huge win here!)

Some of the disadvantages are that it does not solve non-linear models. Also it does not run as a stand-alone plug-in as the current Solver. I believe that OpenSolver is only developed for the Excel 2007 platform. OpenSolver was developed by Andrew Mason and is licensed under GPL. OpenSolver is free to distribute and download.
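
To make the idea of a linear model concrete, here is a minimal sketch in Python of the kind of small problem a spreadsheet optimizer is asked to solve. The numbers are a made-up textbook-style example, and the function name solve_small_lp is hypothetical. It is not how OpenSolver or CBC works internally (CBC is a branch-and-cut engine); it simply checks every corner point of a two-variable problem, which is only practical for tiny bounded models.

```python
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x + a2*y <= b.
# Illustrative data only: maximize 3x + 5y subject to
#   x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
CONSTRAINTS = [
    (1.0, 0.0, 4.0),
    (0.0, 2.0, 12.0),
    (3.0, 2.0, 18.0),
    (-1.0, 0.0, 0.0),   # x >= 0 written as -x <= 0
    (0.0, -1.0, 0.0),   # y >= 0 written as -y <= 0
]
OBJECTIVE = (3.0, 5.0)


def solve_small_lp(constraints, objective, tol=1e-9):
    """Maximize a two-variable linear objective by checking corner points.

    Assumes the feasible region is bounded, so the optimum sits at a
    point where two constraint lines intersect.
    """
    best_point, best_value = None, float("-inf")
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < tol:
            continue  # parallel lines never intersect in a single point
        # Cramer's rule for the intersection of the two constraint lines.
        x = (b * c2 - a2 * d) / det
        y = (a1 * d - b * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(p * x + q * y <= r + tol for p, q, r in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


if __name__ == "__main__":
    point, value = solve_small_lp(CONSTRAINTS, OBJECTIVE)
    print(point, value)
```

Real solvers avoid enumerating corners like this, which is why an engine without artificial size limits matters once the model grows past a handful of variables.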

I'm hoping to give this a try soon. I'm really encouraged by OpenSolver because I always thought the current Solver was very limited. Looking forward to great things from OpenSolver. For other ideas about Open Source solvers with spreadsheets be sure to look at Open Office Calc.

Tuesday, June 29, 2010

Kaggle hosting INFORMS 2010 Data Mining Contest

Kaggle is hosting the 2010 INFORMS Data Mining Contest. The goal of this year's INFORMS Data Mining Contest is to predict intra-day stock price movements. All data and submission guidelines are provided on the Kaggle website. Entries that are submitted are immediately scored and evaluated by an AUC calculation. The leading AUC score at the end of the contest is going to be honored at the annual INFORMS meeting, which is in Austin, Texas (Nov. 7-10).
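
For readers who have not met AUC before, the sketch below shows one common way to compute it in plain Python: the share of positive/negative pairs in which the positive case received the higher score, with ties counted as half. The function name auc_score and the sample numbers are made up for illustration, and this is not Kaggle's scoring code.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 outcomes; scores: predicted scores, higher means
    "more likely to be 1". Ties between a positive and a negative
    count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up labels and scores, not contest data.
    print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

A score of 0.5 is what random guessing earns and 1.0 means every positive was ranked above every negative. The pairwise loop is quadratic, so a rank-based formula is the usual choice on large data sets.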

There are already a lot of good discussions of modeling techniques. Mark started off with a question on OR-Exchange about modeling methods for the INFORMS contest. Since the target is a binary categorical variable his preferred method was Logistic Regression. Mark provides example R code to invite collaborative input to the contest. I followed suit and submitted an IEORTools entry to the contest. I used the same Logistic Regression method. I also did some variable analysis using the rpart package in R to develop a decision tree. After pulling some variables that were not significant I was able to get on the leaderboard with Mark. The pictured leaderboard is as of June 28.
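
As a rough illustration of the method, here is a minimal logistic regression fitted by gradient descent in plain Python. It is a sketch under stated assumptions: the data is synthetic, the function names (fit_logistic, predict_proba) are hypothetical, and it is neither Mark's R code nor my contest entry, both of which used R's built-in modeling functions.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Predicted probability that the target is 1; weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Synthetic one-variable data: larger x tends to go with target 1.
    rows = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
    labels = [0, 0, 1, 0, 1, 1]
    weights = fit_logistic(rows, labels)
    print(weights, predict_proba(weights, [2.0]))
```

The predicted probabilities from a model like this are exactly the kind of scores that an AUC calculation ranks, which is why Logistic Regression is a natural first attempt for a binary target.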

There is also some good discussion on the Kaggle website contest forum. Posted on the forum one entrant suggested possible variables to use in a Logistic Regression model which is very beneficial.

I really like to see this collaborative approach to modeling. This was one of the qualities I really enjoyed in the Netflix Prize. I hope Kaggle and INFORMS continue to provide these fun and thought provoking contests.

Friday, June 25, 2010

U.S. SEC endorses Python to fix financial problems

News from PCWorld mentions that ActivePython, the software distribution from ActiveState, is going to include numerical, scientific, and optimization software with its current software bundle. The numerical and optimization software it is going to include is the Python-based NumPy, SciPy, and matplotlib. All of the new software is open source and available for free download.

Apparently this is in anticipation of the new U.S. financial rules from the U.S. Securities and Exchange Commission. On April 7, 2010 the U.S. S.E.C. proposed new rules for Asset-Backed Securities that are intended to enable the markets to run efficiently and fairly. On the first page of the released documents from the S.E.C. they mention the use of Python. That is a nice shocker to us open source advocates.

Python is a great computing language. It is really easy to learn compared to the other languages such as C. Perhaps the U.S. S.E.C. thought it would be the best choice because of its ease of use and abundance of software packages. This is really interesting news and hopefully we will be hearing more about it in the near future.

Thursday, June 24, 2010

R package for World Bank Data

A little while ago I posted about how the World Bank data is open to the public for research. This apparently is just the beginning of what is possible with having free access to a lot of really good data sets on socio-economic information. R-chart blog just recently posted saying that an R package was developed as an API to access the World Bank data.

This opens up a lot more data mining opportunities and could just be the start of some great analytic research. I'm really looking forward to seeing what some of the great R minds will find with the World Bank data at their fingertips. Since R is freely available anyway this merger makes sense on all sorts of levels. Happy data mining!

Software for Data Analysis: Programming with R (Statistics and Computing)

Wednesday, June 16, 2010

Analytics and FIFA World Cup

What would the FIFA World Cup be without the prognosticators? You can be assured that the Analytics community is not far away from the scene. There are plenty of places on the web to find predictions and analytics of the 2010 FIFA World Cup. Here are some of the places on the web where you can find all of your World Cup analytics interests.

Wayne Winston is posting some predictions and rankings on his blog mathletics. If you are a fan of sports and analytics (i.e. Moneyball) then you would love Wayne Winston's blog. Wayne does predictions for professional and collegiate sports in basketball, football, baseball, and soccer.

Blog posts on AnalyticBridge tell that big financial institutions are using the quantitative financial instruments used in credit swaps and debt obligations to predict World Cup outcomes. It is a corporate financial challenge that is trying to predict which country will go the farthest in the World Cup. Let's hope it's not the same models that were used to predict mortgage backed securities from a few years back.

Spotfire's blog has an entry about providing World Cup data all the way back to 1930. TIBCO, Spotfire's parent, is providing analytic data from the World Cup including all sorts of statistics. Analysts can get scores, goals, penalties, attendance, and other data points. The online app that TIBCO provides also has nifty charts to compare different countries' performance.

Tuesday, June 15, 2010

OpenOpt release 0.29

A new OpenOpt Suite release is out. It is a free (license: BSD) and cross-platform (Linux, Windows, Mac etc) set of Python language modules for numerical optimization, automatic differentiation, solving systems of linear/nonlinear/ordinary differential equations etc. It has been published quarterly since 2007, already has some essential applications, and is expected to become even more popular with Python release 3.3, where dynamic compilation will be implemented.

OpenOpt 0.29:
* Some minor bugfixes
* Some improvements for handling sparse matrices
ralg:
* Bugfix for problems with nonlinear equality constraints
* Major changes for problems with nConstraints>1

FuncDesigner 0.19:
* Some improvements for automatic differentiation
* New feature: attached constraints
* New feature: oosystem
* Now you can model & solve ODE systems

DerApproximator 0.19:
* Function get_d2
* Add new stencil

You can try it online via our Sage-server.
See also: Full Changelog, Future Plans
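
For readers unfamiliar with the two ideas named in the release notes, here is a minimal plain-Python sketch contrasting automatic differentiation (using dual numbers) with a finite-difference stencil. All names here (Dual, derivative, central_difference, cubic) are hypothetical and chosen for illustration; this is not the FuncDesigner or DerApproximator API, and it covers only addition and multiplication.

```python
class Dual:
    """A number value + deriv*eps with eps**2 == 0, used for forward-mode AD."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u*v' + u'*v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def derivative(f, x):
    """Exact derivative of f at x, for f built from + and * only."""
    return f(Dual(x, 1.0)).deriv


def central_difference(f, x, h=1e-5):
    """Two-point central stencil: an approximation with error of order h**2."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def cubic(x):
    return 3 * x * x * x + 2 * x + 1


if __name__ == "__main__":
    print(derivative(cubic, 2.0))          # automatic differentiation
    print(central_difference(cubic, 2.0))  # numerical approximation
```

The trade-off is the usual one: automatic differentiation needs the function written in terms of operations it understands, while a stencil works on any black-box function at the cost of approximation error.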

Tuesday, June 1, 2010

OR-Exchange confession

I have a blogger confession to make about OR-Exchange. I am addicted. I think it's the first Operations Research related social network that has me really hooked. I confess that I check it daily. Yes, you can see that I've earned a silver badge for my continual obedience. Is this OR's Farmville? Well, at least it is for me.

The premise for OR-Exchange is really simple. Think of an Operations Research related question that bugs you, puzzles you, or that you simply want peer feedback on. Shortly after, and I mean shortly, you will be barraged by answers from like-minded individuals. It is the perfect storm that Web 2.0 wants to fuel.

In my mind the beauty of OR-Exchange is that it is not any normal social network. This is a social network of peers that understand my issues, problems and concerns. Maybe it's just my Generation X upbringing that requires instant gratification. Yet I don't need a whole lot of stimulation from other social networks. In fact I'm pretty much done with most of the others. The online media I keep going back to are the ones associated with my interests and for me that is OR-Exchange right now.

I love the feedback from the folks at OR-Exchange. Good, bad or indifferent, it brings perspectives that I often don't get in my circles. In my present work I don't often get to chat about Operations Research with my co-workers. I'm one of two employees that has any knowledge of what Operations Research is. I guess that's where it benefits me. I'm hoping that it benefits others like us who otherwise have to wait a year to go to an INFORMS conference. I am active in my local INFORMS Chapter but most of it is topical speeches and programs. OR-Exchange is more of an outlet which has filled a void for me.

I hope the Operations Research community can take to OR-Exchange. I believe it can only get better as more users come online. Please help preach its worth if you are using it. Perhaps there are many more unanswered questions in the Operations Research community.

Thursday, May 13, 2010

My 5 favorite Operations Research blogs

I do a lot of blogging in my spare time. I especially like to read up on blogs that really interest me. My passion is what I do for a living which is, of course, Operations Research and Industrial Engineering. I am often amazed by the great writing and resources available in the online Operations Research community. So I thought I would share on this blog my 5 favorite blogs that I am usually checking every day. These blogs are not listed in any particular order.

1. Michael Trick's Operations Research blog

You can pretty much argue that Michael is the "Father of Operations Research blogs". Michael does a great job of mixing academia and real world applications of Operations Research. Often times the comment section is worth the read with great contributors to his blog.

2. Thoughts on business, engineering, and higher education by Aurelie Thiele

This blog by Aurelie is probably some of the best writing in the Operations Research blogs. I particularly love the issues that Aurelie presents on a weekly basis. In fact I'm outright jealous of Aurelie's insight. This blog is just a flat out good read.

3. Punk Rock Operations Research

Punk Rock O.R.'s writer Laura McLay is another good Operations Research blogger that mixes academia and real life OR interests. I enjoy Laura's commentary on a lot of issues that you might not expect in mainstream OR applications. I especially like Laura's interests in sports.

4. Sebastian Pokutta's Blog

This blog may not be one of the most popular blogs but I really like Sebastian's Operations Research blog. Maybe it is the fact that I really relate to Sebastian's ideas and his endorsement of open source software in Operations Research. Sebastian finds really good nuggets in the OR world that you don't often see on other blogs.

5. ThinkOR

This blog is perhaps one of the best at writing up real life Operations Research examples. I really enjoy its thoughtfulness and its writing. I especially like ThinkOR's style of sifting through real world problems and laying out possible solutions.

Monday, May 10, 2010

Algorithms and Wall Street

The crazy events of Wall Street last week sent off a huge wave of confusion as to what led to the sudden drop in stock prices. At first it was thought to be a "fat finger" that caused the decline of major stock indexes. Now the focus is on the large trading farms of computers that are said to make trades by specific rules and algorithms. Now there is a question as to what underlying algorithms these computers are trading on. What was thought to be a no-brainer, setting trades at the speed of electrons to make a more efficient market, is now all being thrown into question.

I do not claim to understand the rules or algorithms that are programmed into these trading computers. Wall Street trading is not my area of expertise. I am curious, though, about this overall crisis and how it could be the result of supposed computer rules. The U.S. government is interested also as they are investigating what caused the sudden drop. Can algorithms designed to trade on a whim cause that much market capitalization to drop out so suddenly? There are claims that market values dropped by nearly 100% on long established companies like Accenture.

I'm definitely going to be following this story closely. I'm curious what the SEC is going to find in their investigation. I'm going to reserve my opinions until more facts are brought forth. Perhaps we may never really know what caused this crisis. I would hope that it is something the Operations Research community could learn from. We know that algorithms can be developed to provide great benefits to people and organizations. Yet we hardly ever hear of the times when they can cause great trouble. We can learn from those bad implementations of algorithms. Usually at the heart of it is not so much a bad algorithm but the underlying assumptions of the model. We should know this all too well from the recent mortgage crisis. Perhaps this road to recovery out of this current recession is going to take a lot more time than we thought.

Thursday, May 6, 2010

R has a revolutionary commercial launch

R is going commercial and mainstream thanks to Revolution Analytics. Revolution Analytics, formerly REvolution Computing, is going to take R to the next level in predictive analytics and data mining for enterprise business. Many in the OR blogosphere are reporting on this move as it can mean big changes to the statistical enterprise software market.

Revolution Analytics is going to bridge the academic and business divide by providing solutions to what were considered limitations of R in the past. They will be focusing on software enhancements that will be able to handle larger datasets. There is going to be better use of multi-core processing power. There are also going to be improvements to user interfaces for business analysts.

R is a free and open source software environment for statistical computing and data visualization. I think it is too early to tell what this announcement is going to mean for the statistical enterprise software market. Revolution Analytics has already said that they will be mixing proprietary methods with R. It will be interesting to see how the R community embraces that relationship. R is licensed under the GNU General Public License, which is published by the Free Software Foundation. That is a crowd that does not take too kindly to proprietary software and patents. It will be interesting to follow Revolution Analytics and how they are able to implement their roadmap.

Tuesday, May 4, 2010

Railways improved by mobilizing Operations Research

PhysOrg.com has a great article on the value of Operations Research implemented in the Dutch railways. Improvements were realized in train arrivals, passenger utilization in the cars, and operating profit. Many countries all across Europe were impacted by the improved railway service, including the Netherlands, Germany, and Switzerland among others.

The team that implemented the Operations Research strategies for the railway improvement project is led by Christos Zaroliagis, a professor of Computer Science and Informatics at the University of Patras. Christos was part of the team that earned the 2008 Edelman Prize from INFORMS for "The New Dutch Timetable: The O.R. Revolution." The team of the ARRIVAL project is a consortium of several researchers from many European countries.

This is a great example of Operations Research in practice and how OR continues to improve the operations and lives of organizations. I really like sharing stories like this because I don't feel they often get their due respect. There is a lot of research and planning in the background of a good research project, let alone Operations Research, that does not get noticed.

Wednesday, April 28, 2010

Operations Research courses via Open Courseware Consortium

Open Courseware is making it possible for anyone with an internet connection to find subject knowledge from many different academic institutions. The best part for knowledge seekers is that it is free and open to the public. Perhaps you would like to know about Quantum Physics or English Literature. There is a very good chance that lectures, notes, exams and class references will be available in a particular subject.

Operations Research is no different when it comes to Open Courseware. In fact there is an increasing amount of Operations Research, Management Science, Supply Chain, and Applied Mathematics material available on the internet. The Open Courseware Consortium is one way to find open courses on the internet in Operations Research. The consortium promotes itself through institutional membership, but searching for courses by institution is free to the public. IEOR Tools has featured this in a previous blog post about open courseware.

Tuesday, April 27, 2010

More Analytic Competitions

In a follow-up to a previous IEOR Tools blog post on Predictive Modeling and Recommendation Challenges there is another organization opening up analytic competitions. Kaggle is an organization that is trying to bring together the best prediction modeling and statistical talent through analytic competitions.

http://kaggle.com/

Kaggle is also encouraging organizations to host a competition on their platform. They want to encourage companies to use them to find top notch predictive analysts.

There are two types of competitions promoted by Kaggle. The two kinds are predicting the future and predicting the past. From Kaggle's website...

The platform allows companies, researchers, governments and other organizations to post their problems and have statisticians worldwide compete to predict the future (produce the best forecasts) or predict the past (find the best insights hiding in your data).

The current contest involves European voting.

Kaggle is taking advantage of the Netflix Prize and its success. The hope is that Kaggle can be a platform to bring these competitions together. It will be interesting to follow Kaggle to see if there is success in these open competitions for analytics. The results of the Netflix Prize seem to be a good indication that there will be.

Friday, April 23, 2010

World Bank opens data to the public for open research

A press release this week from the World Bank Group states that the World Bank will open free access to its data. According to the article there are over 2,000 financial, business, health, economic and human development statistics available for free for research.

The World Bank has created a new website to access the free data at data.worldbank.org. Skimming over the Data Catalog shows a great amount of variety in the data sets. There are tables on Global Finance, Education Statistics, Poverty in developing countries, Gender, Business, Debt, Governance just to name a few.

This is very encouraging that the World Bank will offer data openly in this manner. Openness can be a great asset to the research community and help drive improvements and reform where needed. I definitely cheer the World Bank for allowing this data to become public.

Sunday, April 18, 2010

OR-Exchange in the top 50

Per Stack Exchange's directory of sites OR-Exchange has moved into the Top 50. This is good news for those in favor of keeping OR-Exchange a resource for the Operations Research community. There have been a lot of good questions asked this week. Although I threw a proverbial OR question about OR solvers some of the good ones include...

What file format for problem definition is suitable for OR-Exchange?

References for conjecture: Any regression can be translated into a math model

Please keep the questions and answers coming for OR-Exchange. For those that don't know about OR-Exchange I had a recent blog post about promoting this website for the Operations Research community.

Thursday, April 15, 2010

OR-Exchange Needs Your Help

Michael Trick, maintainer of the OR-Exchange site, is finding himself in a predicament. Apparently the owners of Stack Exchange that make OR-Exchange possible are changing their terms of service and giving an ultimatum to the lower trafficked sites. OR-Exchange will have to shut down if it cannot meet the new requirements.

I'm a fan of OR-Exchange. OR-Exchange is a great place to share ideas with the Operations Research community and find answers to questions. OR-Exchange follows the model of digg.com and reddit.com where you can vote up the questions and answers that you find favorable. That means valuable content will always rise to the top for easy dissemination.

Some people may argue that there is already a lot of information on the internet about Operations Research, and that I can't argue against. The value of OR-Exchange is the dynamic collaboration with the Operations Research community. There can be value in posting Q&A topics and getting the community to answer, vote up, and comment.

I urge you to give OR-Exchange a try to help promote this project. Otherwise we might lose a great resource.

Wednesday, April 14, 2010

Math is the New Cool for Employers

While perusing AnalyticBridge, I found that Vincent Granville posted an article from the Wall Street Journal online titled "New Hiring Formula Values Math Pros." The WSJ article states that more and more companies are looking for statistics, data mining, and machine learning experts. Computer Science is waning compared to analytic expertise. From the article...

The most desirable candidates, employers say, can have a variety of experience and educational backgrounds. Companies say specific degrees are less important than a focus on data-mining techniques.

This is definitely a trend I've been seeing. Companies want to see value delivered from their employees instead of just data management. I'm encouraged for the Industrial Engineering and Operations Research field and am looking forward to seeing how it takes off.

Monday, April 12, 2010

Anyone interested in a good RUG?

R User Groups are popping up around the country. In Dallas there is a new R User Group as told by David at the REvolutions blog. If the Larry in the article sounds familiar then you are right! Here is where you can sign up for the RUG in Dallas.

http://tech.groups.yahoo.com/group/Dallas_RUG/

Also in Chicago they are getting things going with their own R User Group. REvolutions blog chimes in as well in an announcement for the windy city R users.

For those that don't know, R Project is a statistical computing environment very similar to S+ and SAS. It is free and open source and contains hundreds of free libraries and packages for statistical, optimization, predictive analytics, and data mining computing.

If you would like to get more involved with R in your region take a look at the REvolutions blog. REvolutions lists R User Groups all around the world. And if one is not in your area go ahead and get one started. It is a great way to network with professionals in your discipline.

Tuesday, April 6, 2010

DataMiningTools.com devoted to sharing data mining resources

DataMiningTools.com is an up and coming website devoted to all things data mining. There are a lot of tutorials, videos, reviews, and recommendations for quality tools of the data mining trade. There is even a feature for Open Source tools which definitely gets my attention.

DataMiningTools.com seems to be to data mining what IEOR Tools is to Industrial Engineering and Operations Research. I really like the presentation of the content. The links are tagged really well and make it easy to find relevant resources for data mining. In the future I hope to feature some of the tutorials from this website.

One area you may want to look at is the R Project tutorials, which have had my interest as of late.

Friday, March 26, 2010

New IEOR Tools contributor - developer of OpenOpt

I would like to welcome Dmitrey, developer of the open source optimization software suite, OpenOpt and maintainer of OpenOpt.org. Dmitrey will contribute from time to time about OpenOpt and software releases.

When I started the IEOR Tools blog I wanted it to be a forum for discussion, evaluation, and peer review of the tools available to the Industrial Engineering and Operations Research community. I believe that the best opportunity for continued development and research of tools of the trade is in the Open Source community. I hope that this blog can be a conduit for continued development and opportunity.

I'm excited for the contributions that Dmitrey will provide with OpenOpt and other open source developments.

Thursday, March 25, 2010

Timetric, startup gearing up for a new statistics presentation platform

TechCrunch is reporting that Timetric has closed seed funding for an innovative way to present statistics online. According to the article, the money is going to be used to help get their statistical platform rolled out. Timetric's claim is that it will be able to present statistics in a more "useful" manner. Timetric is already working with online gurus such as the Guardian and United Business Media.

It seems to me that this may be competing with the Wolfram Alpha platform as another way to present public data on the internet. Yet this is also showing an increasing trend in the importance of data visualization. I'm encouraged to see there are competitors in the data visualization market. It definitely shows that there is a need for managing data and presenting it in a meaningful way. Sounds a lot like Operations Research to me!

I really like the layout of the website. Any website that focuses on visualization should be sharp and refined. Curious to see how Timetric proves out in the future.

Friday, March 19, 2010

R Project in Google Summer of Code 2010

R Project, the open source statistical and mathematical computing environment, is going to be a part of the Google Summer of Code 2010. There is an R Wiki page devoted to topics and projects within the R Project for the Google Summer of Code. The assortment of projects ranges from mathematical and statistical research to computer API and interface work.

The Google Summer of Code is a student internship program that provides stipends to develop free and open source software around the globe. The GSOC has been in existence since 2005 and has allowed thousands of students to work on hundreds of computing projects of interest. A complete list of the GSOC open source organizations can be found on their project site.

Thursday, March 18, 2010

10 Linux Productivity Tools

For anyone who reads this blog it is no mystery that I am an Open Source and Free Software advocate. I have my reasons, which are many, and I have previously posted about them on the IEOR Tools blog. That being said I like to find software for my favorite free operating system, Linux. I know, Linux by itself is not an operating system but the kernel. I'm just referring to the "flavor" of the operating system which uses Linux.

There are some great productivity tools for Linux that can help any Analyst or Engineer. Linux.com does a great job of reviewing some of the best available productivity tools for Linux. I am a big fan of Kontact, Okular, and Kivio. I've mentioned before that Kivio is a great free software diagramming tool.

These applications can help improve productivity in the Linux environment. Often folks feel that Windows corners the market on these types of applications but that is not often the case.

Monday, March 15, 2010

Predictive Modeling/Recommendation Challenges

The absence of the Netflix Prize may leave a gaping hole in some of our leisure activities. Well, maybe not that big of a hole, but at least in some of our thought provoking lapses of time. If you are an avid modeler and really want to stretch your data mining and predictive modeling skills to the limit then there are other ways to get that accomplished.

Here are some other notable predictive modeling contests available compliments of KDnuggets.

http://www.kdnuggets.com/datasets/competitions.html

A notable competition is the Analytics X Prize which aims at solving social problems within our world. The current prize is predicting homicide rates in Philadelphia. A bit morbid but may prove useful to municipalities across the world.

Also Yahoo has a collaborative learning or recommendation prize of their own. The Yahoo Learning to Rank Challenge allows modelers to benchmark their ranking algorithms against the world. You must act quickly because the challenge ends in June 2010.