Friday, December 31, 2010

Video of Joy of Stats by Hans Rosling

The Joy of Stats, narrated by Hans Rosling, was recently produced by the BBC and shown to its audience.  Hans Rosling, via gapminder.org, was kind enough to post the full hour-long documentary about the joy of statistics.  The video is posted on YouTube and is available to anyone.

http://www.gapminder.org/videos/the-joy-of-stats/

Hans Rosling's passion for statistics is infectious.  He has a joy about him that persuades the viewer to genuinely enjoy finding new and invigorating ways to explore data.  For me this is not hard to do, as I love data and analysis.  Yet for many people, mathematics, let alone statistics, is considered a universe all unto its own that they dare not explore.  Hans breaks down that barrier with The Joy of Stats.  No matter your educational interests or background, I find it very hard to ignore his plea that statistics is not boring and is, dare I say it, even sexy.

If you enjoyed this video as a tribute to statistics, you would also enjoy Dr. Robert Lewis's essay on mathematics.  Both works explain how a world without numerical analysis is a world not worth living in.  There is so much to explore in so little time.  I am so happy that I decided to pursue a career in Engineering and Operations Research to help the world one datum at a time.

Tuesday, December 28, 2010

Top IEORTools Blog Articles of 2010

According to my Google Analytics account, these are the top posts by pageviews for 2010.  Since this is an analytics-based website, I thought it would be appropriate to provide some data.  So here is a sort of year in review for IEOR Tools.

1.  Favorite Operations Research books from OR-Exchange

2.  R references for handling Big Data

3.  IEORTools Tutorial: Learning XML with R

4.  My 5 Favorite Operations Research Blogs

5.  Where to find good data sets

A lot of the pages had to do with using the statistical computing software R.  I'm also a contributor to the R-bloggers website so that has a lot to do with the traffic.  I'm excited to see what 2011 will have in store for the OR blogging world.  Happy New Year to the Operations Research community.

Wednesday, December 22, 2010

An essay on Mathematics and education

I just read an essay on mathematics by Dr. Robert Lewis titled "What Math?"  It has to be the single best essay I have ever read on mathematics and why it is so important.  I can imagine Dr. Lewis has been peppered with as much cynical criticism about understanding math as any of us in the analytical professions.  Dr. Lewis does a superb job of explaining why those of us who use math love it so much.

Teaching math to newer generations is definitely a concern.  I really like how Dr. Lewis explains that education is not just about the transfer of information but about understanding the principles underlying specific knowledge.  His parables are a very clever device for relaying those principles of math.

I also love how he portrays math not just as a tool for the technologically minded but also for the liberal arts.  Dr. Lewis conveys that math is not merely knowing numbers but the process of finding solutions.  My own example: people often ask me how I am so good at math.  I usually tell them it's just like learning a language.  Once you understand the language and are fluent, you can start applying it in everyday life.  Math is a language to learn just as much as any foreign language.  It may take some time to learn, but it will take a lifetime to master.

I highly recommend reading this essay.  I also recommend saving it for future generations, teachers, educators, family members, and friends.  It can help bridge the understanding that may be missing from our own words.

Thursday, December 16, 2010

New OpenOpt/FuncDesigner quarterly release

A new OpenOpt and FuncDesigner quarterly release is out: 0.32.

OpenOpt:
* New class: LCP (and related solver)
* New QP solver: qlcp
* New NLP solver: sqlcp
* New large-scale NSP solver gsubg. It still requires many improvements (especially for constraints, whose handling is quite immature and often fails), but since the solver sometimes already works better than ipopt, algencan and the other competitors it was tried against, I decided to include it in the release.
* Now SOCP can handle Ax <= b constraints (and a bugfix for handling lb <= x <= ub has been committed)
* Some other fixes and improvements

FuncDesigner:
* Add new functions removeAttachedConstraints, min and max
* Systems of nonlinear equations: possibility to assign personal tolerance for an equation
* Some fixes and improvements (especially for automatic differentiation)

See also: Full Changelog, Some Applications, Future Plans

Where to find good data sets

O'Reilly Media has been a big advocate of Open Data and believes that is where a lot of computing is headed.  I think they are definitely on to something, yet the future could be now: there are plenty of opportunities to find good data sources immediately.  One of my favorite blogs, O'Reilly Radar, has an article by Edd Dumbill on where to find data.  There is plenty of good data available on the internet to download, explore, and mine for new information.  These places not only offer great sources of data, but many of them also offer an API for quick and seamless access.  Below is a link summary from the article.

Freebase

A graph database of all kinds of things.  The website focuses on trends in certain cultural topics and areas of interest.

Amazon Public Data Sets

Amazon is probably considered the cloud computing mecca, next to Google.  Amazon Web Services offers a lot, including hosting for public data sets, and they offer a huge variety of public data.

Windows Azure Data Marketplace

Surprisingly, Microsoft offers a data source built on the Open Data Protocol.  This data market offers quite a few data sets of interest.

Yahoo Query Language

YQL is an interesting API that is very similar to SQL.  It is essentially a language that lets you grab data from cloud services, which can be very handy for pulling data quickly and dynamically.  YQL can connect to a lot of data sources as well.

Infochimps

Infochimps is a data marketplace and warehouse.  They host, sell, and distribute data sets.  Some of their data comes at a cost, but a lot of it is free as well.  This is an interesting startup, and it will be very interesting to follow their growth.  There is also a new Infochimps R package that uses their API to gather and process Infochimps data.

DBpedia

DBpedia is like a Wikipedia for data sets; in fact, the data itself comes from Wikipedia.


Some other sources not from the article include the World Bank open data and the U.S. Census data.
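For sources that offer a plain CSV download, pulling a file into R is usually a one-liner.  The snippet below is only a minimal sketch: the URL is a hypothetical placeholder, not a real endpoint from any of the sites above.

```r
# Read a remote CSV straight into R. The URL is a hypothetical placeholder;
# substitute the download link from whichever source above you choose.
url <- "http://example.com/some-open-dataset.csv"
dat <- read.csv(url, stringsAsFactors = FALSE)

str(dat)      # inspect the columns before doing any mining
summary(dat)  # quick sanity check of ranges and missing values
```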

Sunday, December 12, 2010

Shortest path algorithm solved by ants?

University of Sydney researchers are working on the next great optimization algorithms.  You would think they would be hunkered down in the math or computer science departments working with large multi-core processors.  Yet Chris Reid and Madeleine Beekman, working with David Sumpter of Uppsala University, are studying how ants solve the shortest path problem.  By studying how ants solve complex and dynamic problems, such as getting food back to their colony, they could unravel some new and innovative ways to solve routing problems.  The researchers published their results in the Journal of Experimental Biology.

Some algorithms have already been developed from studying ants.  One family of methods is Ant Colony Optimisation (ACO).  Ants solve the complex shortest path problem by communicating with other ants in the colony through pheromone trails.  Each ant leaves a pheromone trail as a signal to the ants that follow, and the strength of the trail tells other ants the best way to reach the intended destination.  A toy version of the idea is sketched below.
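This is only a toy sketch of the ant-colony idea in R, not the researchers' algorithm: the little five-node network, the parameters, and the pheromone update rule are all made up for illustration.  Simulated ants walk from a start node to a goal node, preferring short, pheromone-rich edges, and successful ants reinforce the edges they used.

```r
set.seed(42)

# Toy network: adjacency matrix of edge lengths (0 = no edge)
n   <- 5
len <- matrix(0, n, n)
edges <- rbind(c(1, 2, 2), c(2, 5, 2), c(1, 3, 1),
               c(3, 4, 1), c(4, 5, 1), c(2, 3, 3))
for (e in seq_len(nrow(edges))) {
  len[edges[e, 1], edges[e, 2]] <- edges[e, 3]
  len[edges[e, 2], edges[e, 1]] <- edges[e, 3]
}

tau    <- matrix(1, n, n)  # pheromone on each edge
alpha  <- 1                # pheromone influence
beta   <- 2                # "visibility" (1/length) influence
rho    <- 0.5              # evaporation rate
n_ants <- 20
n_iter <- 50
start  <- 1
goal   <- 5

best_len  <- Inf
best_path <- NULL

for (iter in seq_len(n_iter)) {
  paths <- list()
  lens  <- numeric(0)
  for (a in seq_len(n_ants)) {
    path <- start
    repeat {
      cur <- tail(path, 1)
      if (cur == goal) break
      cand <- which(len[cur, ] > 0 & !(seq_len(n) %in% path))
      if (length(cand) == 0) break  # dead end; this ant gives up
      # Move probability combines pheromone strength and edge shortness
      w    <- tau[cur, cand]^alpha * (1 / len[cur, cand])^beta
      nxt  <- if (length(cand) == 1) cand else sample(cand, 1, prob = w)
      path <- c(path, nxt)
    }
    if (tail(path, 1) == goal) {
      steps <- cbind(head(path, -1), tail(path, -1))
      plen  <- sum(len[steps])
      paths[[length(paths) + 1]] <- steps
      lens  <- c(lens, plen)
      if (plen < best_len) {
        best_len  <- plen
        best_path <- path
      }
    }
  }
  # Evaporate old pheromone, then deposit new pheromone on the edges each
  # successful ant used, in proportion to how short its route was
  tau <- (1 - rho) * tau
  for (k in seq_along(paths)) {
    tau[paths[[k]]] <- tau[paths[[k]]] + 1 / lens[k]
  }
}

cat("best path found:", paste(best_path, collapse = " -> "),
    "with length", best_len, "\n")
```

Over the iterations the pheromone tends to concentrate on the shortest route, which is the same self-reinforcing behavior the researchers observe in real colonies.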

It would be really interesting to find out that the best shortest path algorithm might have been literally under our noses the entire time.  This will be an interesting study to follow for the Operations Research community.

Wednesday, December 8, 2010

2 years of blogging with IEOR Tools

I forgot to mention it earlier, but Nov. 21 officially marked two years of blogging about Industrial Engineering and Operations Research tools.  I have really enjoyed writing about this space and reading all of the contributions.  I have no intention of quitting and hope to make many more contributions.

One update to the blog is that I'm starting to add Amazon content to the site.  Amazon has been a valuable resource for linking to books on relevant subject matter.  I've thought about adding a website that would be a "store," or a compilation of some of the better resources, with Amazon as a partner.  I thought I would bring this up with readers first to see whether it would be a valuable addition to this blog.  It would be a clearinghouse or aggregator for the best tools and resources in Operations Research, Industrial Engineering, Analytics, and Data Mining.  I'm not sure there is anything like it on the internet besides doing searches on Google or Amazon.  I hope the site would have a clean layout to help people easily find resources.

Since it is the holiday season, I would like to send my warmest regards to all those reading.  Thank you so much for your readership.  I wish you and your family a safe and happy holiday season.

Tuesday, December 7, 2010

Big Data Logistic Regression with R and ODBC

Recently I've been doing a lot of work with predictive models using logistic regression.  Logistic regression is great for estimating the probability of a binary target variable.  R is a great tool for accomplishing this task, and often I will use the base function glm to develop a model.  Yet there are times, due to hardware or software memory restrictions, when the usual glm function is not enough to get the job done.

A great alternative for performing logistic regression on big data is the biglm package.  biglm performs the same regression optimization but processes the data in "chunks," which allows R to work on smaller pieces of the data at a time without needing a large memory allocation.  Interestingly, biglm can operate not only on imported data frames and text files but also directly over a database connection, which is where the helpful RODBC package comes to the rescue.

I have been looking all over the R support lists and blogs in hopes of finding a good tutorial using biglm and RODBC.  I was not successful, but I was able to work out how to do it myself; a minimal sketch of the approach is below.
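This sketch assumes an ODBC DSN is already configured on the machine, and the DSN name, table name, and column names are all made-up placeholders, so treat it as an outline rather than a finished tutorial.

```r
library(RODBC)   # database connectivity
library(biglm)   # bounded-memory (generalized) linear models

# "ieor_dsn", the table, and the column names below are hypothetical
con <- odbcConnect("ieor_dsn")

# bigglm() reads the table in chunks of rows, so the full data set never
# has to be loaded into memory at once
fit <- bigglm(responded ~ age + income + prior_purchases,
              data      = con,
              tablename = "campaign_history",
              family    = binomial(),
              chunksize = 10000)

summary(fit)
odbcClose(con)
```

The key design point is that the chunked fitting happens inside bigglm(), so the only tuning decision left is the chunksize, which trades memory use against the number of round trips to the database.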