Friday, December 31, 2010
Hans Rosling's passion for statistics is infectious. He has a joy about him that persuades the viewer to enjoy finding new and invigorating ways to explore data. For me this is not hard to do, as I love data and analysis. Yet for many people mathematics, let alone statistics, is a universe unto itself that they dare not explore. Hans breaks down that barrier with The Joy of Stats. No matter your educational interests or background, I find it very hard to ignore his plea that statistics is not boring and is, dare I say it, sexy.
If you are interested in this video as a tribute to statistics you would also enjoy Dr. Robert Lewis's essay on Mathematics. Both of these works explain how a world without the analysis of numbers is a world not worth living in. There is so much to explore in so little time. I am so happy that I decided to pursue a career in Engineering and Operations Research to help the world one datum at a time.
Tuesday, December 28, 2010
1. Favorite Operations Research books from OR-Exchange
2. R references for handling Big Data
3. IEORTools Tutorial: Learning XML with R
4. My 5 Favorite Operations Research Blogs
5. Where to find good data sets
A lot of the pages had to do with using the statistical computing software R. I'm also a contributor to the R-bloggers website so that has a lot to do with the traffic. I'm excited to see what 2011 will have in store for the OR blogging world. Happy New Year to the Operations Research community.
Wednesday, December 22, 2010
Teaching math to our newer generations is definitely a concern. I really like how Dr. Lewis explains that education is not just about the transfer of information but about understanding the underlying principles of specific knowledge. The parables are a very clever device to relay those principles of math.
I also love how he portrays math as not just a device for the technologically minded but also for the liberal arts. Dr. Lewis conveys that math is not merely knowing numbers but the process of finding solutions. My own example is that people often ask me how I am so good at math. I usually tell them it's just like learning a language. Once you understand the language and are fluent, you can start applying it in everyday life. Math is a language to learn just as much as a foreign language is. It may take some time to learn but it will take a lifetime to master.
I highly recommend reading this essay. I also recommend saving this essay for our future generations, teachers, educators, family members, and friends. This essay can be used to help bridge understanding that may be missing from our own words.
Thursday, December 16, 2010
New OpenOpt and FuncDesigner quarterly release is out: 0.32.
* New class: LCP (and related solver)
* New QP solver: qlcp
* New NLP solver: sqlcp
* New large-scale NSP solver gsubg. Currently it still requires lots of improvements (especially for constraints - their handling is still very immature and often fails), but since the solver sometimes already works better than ipopt, algencan and other competitors it was tried with, I decided to include it in the release.
* Now SOCP can handle Ax <= b constraints (and bugfix for handling lb <= x <= ub has been committed)
* Some other fixes and improvements
* Add new functions removeAttachedConstraints, min and max
* Systems of nonlinear equations: possibility to assign personal tolerance for an equation
* Some fixes and improvements (especially for automatic differentiation)
An all-things graph database. The website focuses on trends of certain cultural and interest topics.
Amazon Public Data Sets
Amazon is probably considered the cloud computing mecca, next to Google. Amazon Web Services offers a lot, including storage of public data sets. They offer a huge variety of public data.
Windows Azure Data Marketplace
Surprisingly, Microsoft has an Open Data Protocol data source. This data market offers quite a few interesting data sets.
Yahoo Query Language
YQL is an interesting API that is very similar to SQL. YQL is essentially a language that lets you pull data from cloud services. This could be very handy for grabbing data quickly and dynamically. YQL can connect to a lot of data sources as well.
Infochimps is a data marketplace and warehouse. They host, sell, and distribute data sets. Some of their data comes at a cost but a lot of it is free as well. This is an interesting startup and it will be very interesting to follow their growth. There is also a new Infochimps R package that uses their API to gather and process Infochimps data.
DBpedia is a Wikipedia for data sets. In fact the data itself comes from Wikipedia.
Some other sources not from the article include the World Bank open data and the U.S. Census data.
Sunday, December 12, 2010
Some algorithms have already been developed from studying ants. One family is the Ant Colony Optimisation (ACO) algorithms. Ants solve the complex shortest path problem by communicating with other ants in the colony through pheromone trails. Each ant leaves a pheromone trail as a signal to the ants that follow. The strength of the trail acts as an "optimal path" signal telling other ants the best way to get to the intended destination.
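To make the pheromone idea concrete, here is a minimal sketch of an ACO-style search in Python. Everything in it is an assumption for illustration: the four-node toy network, the parameter values (`alpha`, `beta`, `evaporation`) and the function names are mine, and this is a simplified textbook-style variant rather than any particular published algorithm.

```python
import random

# Hypothetical toy road network: symmetric distances between four nodes.
GRAPH = {
    "A": {"B": 1.0, "C": 2.0, "D": 5.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 2.0, "D": 2.0},
    "D": {"A": 5.0, "B": 1.0, "C": 2.0},
}


def path_cost(graph, path):
    """Total length of a path given as a list of nodes."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def walk(graph, pheromone, source, target, rng, alpha=1.0, beta=2.0):
    """Send one ant from source to target without revisiting nodes."""
    path = [source]
    node = source
    while node != target:
        options = [n for n in graph[node] if n not in path]
        if not options:
            return None  # dead end: this ant is discarded
        # Prefer edges with more pheromone (alpha) and shorter length (beta).
        weights = [
            pheromone[(node, n)] ** alpha * (1.0 / graph[node][n]) ** beta
            for n in options
        ]
        node = rng.choices(options, weights=weights)[0]
        path.append(node)
    return path


def ant_colony(graph, source, target, ants=10, rounds=30, evaporation=0.5, seed=0):
    """Return the best (path, cost) seen over all rounds."""
    rng = random.Random(seed)
    pheromone = {(a, b): 1.0 for a in graph for b in graph[a]}
    best_path, best_cost = None, float("inf")
    for _ in range(rounds):
        walks = [walk(graph, pheromone, source, target, rng) for _ in range(ants)]
        # Old trails fade so that early, poor routes do not dominate forever.
        for edge in pheromone:
            pheromone[edge] *= 1.0 - evaporation
        for path in walks:
            if path is None:
                continue
            cost = path_cost(graph, path)
            # Shorter routes receive a stronger deposit, in both directions.
            for a, b in zip(path, path[1:]):
                pheromone[(a, b)] += 1.0 / cost
                pheromone[(b, a)] += 1.0 / cost
            if cost < best_cost:
                best_path, best_cost = path, cost
    return best_path, best_cost


best_path, best_cost = ant_colony(GRAPH, "A", "D")
print(best_path, best_cost)
```

Because the ants choose routes at random, a run like this is a heuristic: it tends to settle on short routes but does not guarantee the optimum the way Dijkstra's algorithm would.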
It would be really interesting to find out that the best shortest path algorithm might have been literally under our noses the entire time. This will be an interesting study to follow for the Operations Research community.
Wednesday, December 8, 2010
An update to the blog is that I'm starting to contribute Amazon content to the site. Amazon has been a valuable resource for linking books on content matter. I've thought about adding a website that will be a "store" or compilation of some of the better resources with Amazon being a partner. I thought I would bring this up with the readers first to see if this would be a valuable addition to this blog. It would be a clearinghouse or aggregator for all the best tools and resources in Operations Research, Industrial Engineering, Analytics and Data Mining. I'm not sure there is anything like it on the internet besides doing searches in Google or Amazon. I hope the site would have a nice layout that makes resources easy to find.
Since it is the holiday season I would like to send my warmest regards to all those reading. I thank you so much for your readership. I wish you and your family a safe and happy holiday season.
Tuesday, December 7, 2010
A great alternative to performing the usual logistic regression analyses on big data is the biglm package. Biglm performs the same regression optimization but processes the data in "chunks". This allows R to perform calculations on smaller pieces of data without needing to allocate a large amount of memory. Biglm also has an interesting option: it can perform calculations not only on imported data frames and text files but also over a database connection. This is where the helpful RODBC package comes in.
I have been looking all over the R support lists and blogs in hopes of finding a good tutorial using biglm and RODBC. I was not successful, yet I was able to work out how to do this myself.
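The chunking idea itself is easy to illustrate outside of R. Below is a minimal Python sketch that fits a one-predictor linear regression while only ever holding one chunk in memory. It is not biglm's own method and it is linear rather than logistic regression: I have swapped in simple running sums for clarity. The class name, the `read_chunks` helper (a stand-in for a database cursor) and the data are all hypothetical.

```python
class ChunkedLinearFit:
    """Fit y = intercept + slope * x from data that arrives in chunks."""

    def __init__(self):
        # Only these running totals are kept between chunks.
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, chunk):
        """Fold one chunk of (x, y) rows into the running totals."""
        for x, y in chunk:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Closed-form least squares solution from the totals."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope


def read_chunks(rows, size):
    """Stand-in for a database cursor or file reader that yields blocks of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical data that follows y = 1 + 2x exactly.
rows = [(float(x), 1.0 + 2.0 * x) for x in range(1000)]

fit = ChunkedLinearFit()
for chunk in read_chunks(rows, size=100):
    fit.update(chunk)

intercept, slope = fit.coefficients()
print(intercept, slope)
```

The point of the design is that memory use depends on the chunk size, not on the total number of rows, which is the same trade that makes chunked fitting attractive for data sitting in a database.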
Monday, November 29, 2010
The next most interesting thing is that all of the top 3 finishers used free and open source software as tools for the competition. Two of the leaders used R and the second place finisher used Python, namely SciPy. This should not be surprising to most people in the analytics community. Open source software has been making inroads for quite a while. The R-Project has been getting a lot of interesting press lately, especially in enterprise business circles. Python is an object-oriented programming language that is getting more popular. Python's popularity seems to be due to its ease of use and how quickly it can be learned and implemented.
The presenters of the 2010 INFORMS data mining competition were kind enough to post the methods of the top 3 competitors. Each method is an interesting read on how they were able to use open source tools to get predictive results for stock price movements.
If you are interested in learning more about R as a tool I recommend a new book by Luis Torgo, "Data Mining with R: Learning with Case Studies".
This book is one of the first of its kind to show R methodologies with real life applications. I intend to get the book and hopefully post a review of it in the near future. I am already hearing good things about it.
Wednesday, October 27, 2010
This package allocates hard disk space to big data vectors.
This package allocates points to unused memory or points to a swap file.
Taking R to the Limit: Parallelism and Big Data
Hitting the Big Data Ceiling Limit in R
While this is not a helpful article for big data, it does show some of the issues R currently faces, namely the lack of an "int64" or long long data type.
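To show what that ceiling looks like in practice, here is a small Python sketch of the two limits involved. It relies on two facts: R's integer type is a 32-bit signed integer, and larger whole numbers are usually held as doubles, which are only exact up to 2^53. The helper name `fits_int32` is my own invention for illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # largest 32-bit signed integer: 2147483647


def fits_int32(value):
    """Return True if value can be stored in a 32-bit signed integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


print(fits_int32(INT32_MAX))       # the largest value still fits
print(fits_int32(INT32_MAX + 1))   # one more does not

# Doubles can stand in for bigger counts, but only exactly up to 2**53.
EXACT_DOUBLE_LIMIT = 2**53
print(float(EXACT_DOUBLE_LIMIT) == float(EXACT_DOUBLE_LIMIT + 1))
```

The last line prints `True`, meaning two different whole numbers collapse to the same double. That is why a true 64-bit integer type matters for things like very large row counts or identifiers.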
Revolution R Enterprise
Revolution Analytics is creating enterprise software around R to tackle issues of big data, parallelism and threaded computing in order to speed up large data processing and analytics.
Wednesday, October 20, 2010
Introduction and Tutorials
R Tutorial Series and Introduction
Burns Statistics Tutorials
Introductory R Tutorials
Videos about R
Videos on Data Analysis with R
Graphics with R
R Graph Gallery
Friday, October 8, 2010
I am a user of R and I am also a member of the Dallas R Users Group. As you can imagine I find this competition very interesting because I could benefit greatly from knowing the preferred methods of implementation in the R community. This can also be a very interesting exercise to determine preferred modeling methods. I believe this competition will give an insight into the most common methods to apply statistical computing in the community today.
R is getting a lot of press lately. Revolution Analytics just released the first part of a series of articles on the impacts of R and why it is hot. R as a statistical and optimization tool is really making a play in the business community.
I am a big fan of these data mining competitions, as I've written previously. It is really interesting what you can learn about the world from the competitions, especially if you do not have previous knowledge of the subject matter. I will try to bring up these competitions from time to time. I would be interested to know if any IEOR Tools readers have participated in these competitions and what their experience was.
Tuesday, September 21, 2010
- Acquires COGNOS for $5.0 billion in November 2007
- Acquires ILOG for $0.34 billion in July 2008
- Acquires SPSS for $1.2 billion in July 2009
- Acquires Netezza for $1.7 billion in September 2010
So what does this mean for Operations Research and its professionals? I believe it means the sky is the limit now. This is a grand opportunity for the Operations Research community. In fact I would even say that if INFORMS does not take advantage of the recent demand for analytics and decision sciences then they are missing the big picture. Jobs should be plentiful in the foreseeable future. There should be plenty of work to keep management happy and help drive value into organizations. This may even be the dawn of a new day for Operations Research and Analytics. There is so much buzz now it will leave a ringing in your ears. Sure, I might be a little optimistic, but I think any news is good news right now in this economy.
I also believe that IBM is not done. I think IBM is going to evolve even more in the Analytics realm, perhaps getting more involved in the software within Operations Research and statistics. It's just a guess, but who knows whether the next step is SAS, Matlab, or even contributing to open source projects like R, RapidMiner or Weka. This is an exciting time nonetheless for Operations Research.
Thursday, September 16, 2010
- Kaggle is hosting three competitions. Tourism Forecasting part one challenges entrants to predict 581 tourism-related time series. Chess Ratings - Elo vs the rest of the World is trying to determine a chess rating system that is better than the current Elo rating system. INFORMS Data Mining Contest challenges entrants to predict intra-day stock price movements based on experts' predictions, sector data, and other indicators.
- TunedIT is another competition hosting organization. Currently TunedIT is hosting the e-LICO multi-omics prediction challenge with background knowledge on Obstructive Nephropathy. Yes, I had to look it up too.
- UC San Diego is hosting the 2010 UC San Diego Data Mining Contest. This is a two task contest which tries to make predictions from an e-tailer's data on consumer and non-consumer information. The two tasks are a binary predictor and a boolean-transformed predictor.
Wednesday, September 15, 2010
New OpenOpt Suite release is out. This is free (license: BSD) and cross-platform (Linux, Windows, Mac etc) Python language modules for numerical optimization, automatic differentiation, solving systems of linear/nonlinear/ordinary differential equations, interpolation, integration etc.
- Lots of new NLP, NSP (nonsmooth) and GLP (global) solvers from nlopt have been connected
- New LP solver: pclp (very premature, but permissive license (BSD) and pure Python implementation)
- Some bugfixes (mostly wrt using sparse matrices) and code cleanup
- New features: Integration, Translator
- Some speedup for functions evaluation, automatic differentiation, optimization problems with some fixed variables
- New parameter useSparse for optimization probs and automatic differentiation (sometimes autoselect works prematurely)
- Some bugfixes (mostly wrt using sparse matrices) and code cleanup
- Add parameter exactShape (prevents flattening operations on result)
Wednesday, September 8, 2010
There is an interesting debate in the R-project community about the shortcomings of the R language. Xi'an Og posted a discussion on R shortcomings re-posted from another blog. The consensus of the R community seems to be that R is an inferior language but has a brilliant library of resources. So where does that leave the practitioner? Does the practitioner need to update their coding skills and develop something better in another computer language? I find it really interesting that some of the first solutions proposed in this debate are to scrap everything and start over.
I don't think this debate is ever going to change. The computer is always going to be a valuable tool for the Operations Research practitioner. The tools we use to complete our daily tasks need to be ubiquitous but also readily available. Let's just say that the slide rule is not going to be making any sort of comeback.
I believe that the Open Source model has a real advantage here over its proprietary counterparts in this debate. The community has a lot of input into Open Source software. It is often called a meritocracy. In the Open Source model the best solutions continue while the rest fade into obscurity. This is one of the reasons why I advocate Open Source software. In the end I think R is going to be fine. There will be advances, possibly even forks of the software, but there will always be progress. The only limitations seem to be what we can dream up.
Friday, August 20, 2010
This brings up another interesting thought about how problems are solved now. The mathematical community is closer now than it ever has been. This is the age of online crowd sourcing. If I have a question about Operations Research I go to OR-Exchange. If I am looking for a professional network contact I go to LinkedIn or INFORMS. If I need to read about the interests of the Operations Research communities I will go to their blogs. The convergence of ideas, thoughts, and knowledge is closer now and is only going to get closer.
This article is one of the reasons why I am such an advocate of open source software with organizations like GNU and COIN-OR. Open source is at its best when it brings thoughts and ideas together to create a quality product. Sure there are licensing issues. This article is a good metaphor in that software licensing is like the "old media". Licensing is trying to catch up with the new technology but there are still a lot of kinks to work out. There are even suggestions now that software patents should be eliminated. I'm not sure what will happen but I do know that open source software is driving a lot of innovation in a much shorter time frame.
So yes, I find it ironic that the New York Times is reporting on this proof as if it is new news. Maybe I'm just too close to the subject so I understand it a little better than the rest of the New York Times readers. Yet if you are anywhere near the mathematical world you would have already seen the proof and had your own conjectures. Even if that is the case, we can prove now that information and knowledge are faster and easier to obtain than ever.
Monday, August 16, 2010
This R tutorial is going to be using the package XML. Packages are used in R to meet specific computational needs that the base R platform cannot handle on its own. There are many different packages that can be loaded into R to handle a wide variety of problems.
Sunday, August 8, 2010
So in order of OR-Exchange votes here are the favorite Operations Research books.
1. Applied Mathematical Programming by Bradley, Hax, Magnanti.
Also available at http://web.mit.edu/15.053/www/ but if you like it you might want to give it a purchase.
2. Network Flows: Theory, Algorithms, and Applications by Ahuja, Magnanti, Orlin
3. Linear Programming by Chvatal
4. Model Building in Mathematical Programming by Williams
5. Introduction to Operations Research by Hillier, Lieberman
6. 50 Years of Integer Programming by Juenger, Liebling, Naddef, Nemhauser, Pulleyblank, Reinelt, Rinaldi, Wolsey
7. The Traveling Salesman Problem: A Computational Study by Applegate, Bixby, Chvatal, Cook
8. Tabu Search by Glover, Laguna
9. Prisoner's Dilemma by Poundstone
10. Serious Play by Schrage
11. The Fifth Discipline by Senge
12. The Predictioneer's Game by Mesquita
13. Optimization Algorithms for Networks and Graphs by Evans, Minieka
Thursday, August 5, 2010
This looks to be a really interesting modeling competition with already more than 40 submissions on the leaderboard. The interesting note about this competition is that the Elo rating system itself is going to be making an appearance on the leaderboard. This means that if no one beats the Elo system then there is no declared winner. Although it looks like someone has beaten Elo at its own game already. Elo will be on the leaderboard as a benchmark to make sure that the competition is proving its worth.
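For readers who have not met Elo before, here is a small Python sketch of the standard update rule that the benchmark is built on. The K-factor of 32 and the function names are my own assumptions for illustration; the contest does not specify them here.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed to be 32 here) controls how fast ratings move.
    """
    expected_a = expected_score(rating_a, rating_b)
    change = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + change, rating_b - change


# Two equally rated players; A wins.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

A rating system entered in the contest would be judged on how well its expected scores predict actual game results, which is exactly the quantity `expected_score` produces.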
I hope to get a chance to make an appearance on the leaderboard. I am involved in Kaggle's INFORMS 2010 Data Mining contest. I'm barely hanging on to the top 10 in that competition. There are some pretty good models to compete against in that group.
Tuesday, August 3, 2010
That is until now. The statistics blog at Stattler.com did some research on this very topic of R and Ubuntu. They found a plugin for R with the text editor Gedit that works wonders. The plugin is called Rgedit and is very easy to install. Stattler offers simple instructions for installation. Stattler also has a great review of the Rgedit plugin. Rgedit is very similar in layout to the usual gedit text editor except that it splits the screen into panes for code and R output.
Some of the highlights of the Rgedit plugin include:
- Split-screen panes that can be turned on and off
- Syntax highlighting specific to the R code
- Single line or batch processing of R scripts
- Multiple R workspaces can be run
- Shortcut keys can be created and customized
Saturday, July 31, 2010
Networking is an essential part of career management for any professional. The relationships we develop can have great impact on our career direction and growth. I tell young professionals all the time that their best asset in career growth is their professional network. I tell them to start early and maintain the network continually. The advent of LinkedIn and other internet social networking sites has made that task easier.
Vincent Granville at AnalyticBridge.com has compiled a list of 8 data mining social networking groups with more than 2000 members. These groups are easy to join, as all of them are associated with LinkedIn. If you are looking for anyone in the data mining community, more than likely they will be a member of these groups.
I have found a couple of different jobs through LinkedIn in the past. I have found hiring managers as well as peers that I would potentially be working with closely. I found that to be a great benefit before the interview process. In fact I would even contact some of the peers in the group to get a pre-interview idea of where I would be working and the idiosyncrasies of the organization. The important thing is using your professional network to maximize your career productivity.
Tuesday, July 27, 2010
useR 2010 Conference
The R-statistics blog was kind enough to post audio of the keynote address by Richard Stallman at useR 2010. Richard is not your typical stereotype of a computer geek. He may look the part but Richard does not pull any punches in his presentation of the free software movement and its ideology. Richard's talk discusses the history of the free software movement, the GNU General Public License, and his history of dealing with free software.
Why was Richard giving the keynote address when he doesn't have a statistics background? Well, the R statistical computing software platform is licensed under the GPL, the GNU General Public License. R is free to use, distribute, modify and improve, as long as redistributed versions keep the same license terms and credit the original authors. This is much of what the GPL represents. Listen to the audio by Richard to really understand his passion for free software and what it means to him and the software world.
Wednesday, July 14, 2010
In this podcast Norman is asked a lot of interesting questions about R and the statistical modeling enterprise in general. They discuss his past with SPSS. They also discuss the advantages of using an Open Source software versus a proprietary platform. The interview gets really interesting when they discuss how statistical data is important to enterprise business and how a lot of organizations get it wrong.
If you are new to R and want to know more about its capabilities this is a great podcast.
Tuesday, July 13, 2010
"Our goal in building OpenGamma isn't just to build an open source technology" said Wylie, "Our goal is to build the best platform for financial analytics and risk management possible". The platform will be made available under a "commercial friendly open source licence"OpenGamma is going to provide several solutions based on its Open Architecture software platform. According to their website they will be providing batch risk systems, commercial trading, bespoke trading, and event-driven alert systems. The companies main moniker and selling point is that all of their software code with be Open. This means that companies will have the flexibility to not only debug but potential contribute back to the project, in theory.
This sounds like a brave yet prudent business venture. I think OpenGamma could be wildly successful with their Open Architecture platform. They can be especially successful if they allow the financial analytics community to contribute back to their software platforms. I believe we will be seeing more companies and startups like this in the future in Analytics. Perhaps there is an Open Source Operations Research platform on the horizon.
Wednesday, July 7, 2010
- COIN-OR CBC optimization engine to perform the calculations
- Compatible with existing Solver models
- No artificial limits to the size of the problem (huge win here!)
I'm hoping to give this a try soon. I'm really encouraged by OpenSolver because I always thought the current Solver was very limited. Looking forward to great things from OpenSolver. For other ideas about Open Source solvers with spreadsheets be sure to look at Open Office Calc.
Tuesday, June 29, 2010
There are already a lot of good discussions of modeling techniques. Mark started off with a question on OR-Exchange about modeling methods for the INFORMS contest. Since the data has a binary categorical target his preferred method was Logistic Regression. Mark provides example R code as collaborative input to the contest. I followed suit and provided an IEORTools entry to the contest. I used the same Logistic Regression methods. I also did some variable analysis using the rpart package in R to develop a decision tree. After pulling some variables that were not significant I was able to get on the leaderboard with Mark. The pictured leaderboard is as of June 28.
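To give a flavor of the method, here is a minimal sketch of logistic regression for a binary target. It is written in plain Python rather than the R code that Mark and I actually used, and the toy data, learning rate and step count are assumptions for illustration only. Real contest data would have many more variables.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(bias, weights, features):
    """Predicted probability that the target is 1."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))


def fit_logistic(xs, ys, learning_rate=0.1, steps=500):
    """Plain gradient descent on the average log-loss; returns (bias, weights)."""
    n_features = len(xs[0])
    bias = 0.0
    weights = [0.0] * n_features
    for _ in range(steps):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for features, label in zip(xs, ys):
            error = predict(bias, weights, features) - label
            grad_b += error
            for j, value in enumerate(features):
                grad_w[j] += error * value
        bias -= learning_rate * grad_b / len(xs)
        weights = [w - learning_rate * g / len(xs) for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical one-variable data: the target is 1 when the indicator is positive.
xs = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 0, 1, 1, 1]

bias, weights = fit_logistic(xs, ys)
print(predict(bias, weights, [2.0]), predict(bias, weights, [-2.0]))
```

Dropping a variable that is not significant, as I did after the rpart analysis, simply means removing that column from each row of `xs` before fitting.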
On the Kaggle website contest forum, one entrant suggested possible variables to use in a Logistic Regression model, which is very beneficial.
I really like to see this collaborative approach to modeling. This was one of the qualities I really enjoyed in the Netflix Prize. I hope Kaggle and INFORMS continue to provide these fun and thought provoking contests.
Friday, June 25, 2010
Apparently this is in anticipation of the new U.S. financial rules from the U.S. Securities and Exchange Commission. On April 7, 2010 the U.S. S.E.C. proposed new rules for Asset-Backed Securities that will enable the markets to run efficiently and fairly. On the first page of the released documents from the S.E.C. they mention the use of Python. That is a nice shocker to us open source advocates.
Python is a great computing language. It is really easy to learn compared to other languages such as C. Perhaps the U.S. S.E.C. thought it would be the best choice because of its ease of use and abundance of software packages. This is really interesting news and hopefully we will be hearing more about it in the near future.
Thursday, June 24, 2010
This opens up a lot more data mining opportunities and could just be the start of some great analytic research. I'm really looking forward to seeing what some of the great R minds will find with the World Bank data at their fingertips. Since R is freely available anyway this merger makes sense on all sorts of levels. Happy data mining!
Software for Data Analysis: Programming with R (Statistics and Computing)
Wednesday, June 16, 2010
Wayne Winston is posting some predictions and rankings on his blog Mathletics. If you are a fan of sports and analytics (i.e. Moneyball) then you would love Wayne Winston's blog. Wayne does predictions for professional and collegiate sports in basketball, football, baseball, and soccer.
Blog posts on AnalyticBridge tell us that big financial institutions are using the quantitative financial instruments used in credit swaps and debt obligations to predict World Cup outcomes. It is a corporate financial challenge that is trying to predict which country will go the farthest in the World Cup. Let's hope it's not the same models that were used to predict mortgage backed securities a few years back.
Spotfire's blog has an entry about providing World Cup data all the way back to 1930. TIBCO, Spotfire's parent, is providing analytic data from the World Cup including all sorts of statistics. Analysts can get scores, goals, penalties, attendance, and other data points. The online app that TIBCO provides also has nifty charts to compare different countries' performance.
Tuesday, June 15, 2010
New OpenOpt Suite release is out. This is free (license: BSD) and cross-platform (Linux, Windows, Mac etc) Python language modules for numerical optimization, automatic differentiation, solving systems of linear/nonlinear/ordinary differential equations etc. It has been published quarterly since 2007, already has some essential applications and is expected to become even more popular with Python release 3.3, where dynamic compilation will be implemented.
* Some improvements for automatic differentiation
* New feature: attached constraints
* New feature: oosystem
* Now you can model & solve ODE systems
* Function get_d2
* Add new stencil
See also: Full Changelog, Future Plans
Tuesday, June 1, 2010
I have a blogger confession to make about OR-Exchange. I am addicted. I think it's the first Operations Research related social network that has me really hooked. I confess that I check it daily. Yes, you can see that I've earned a silver badge for my continual obedience. Is this OR's Farmville? Well, at least it is for me.
The premise for OR-Exchange is really simple. Think of an Operations Research related question that bugs you, puzzles you, or that you simply want peer feedback on. Shortly after, and I mean shortly, you will be barraged by answers from like minded individuals. It is the perfect storm that Web 2.0 wants to fuel.
In my mind the beauty of OR-Exchange is that it is not any normal social network. This is a social network of peers that understand my issues, problems and concerns. Maybe it's just my Generation X upbringing that requires instant gratification. Yet I don't need a whole lot of stimulation from other social networks. In fact I'm pretty much done with most of the others. The online media I keep going back to are the ones associated with my interests and for me that is OR-Exchange right now.
I love the feedback from the folks at OR-Exchange. Good, bad or indifferent, it brings perspectives that I often don't get in my circles. In my present work I don't often get to chat about Operations Research with my co-workers. I'm one of two employees who have any knowledge of what Operations Research is. I guess that's where it benefits me. I'm hoping that it benefits others like us who otherwise have to wait a year to go to an INFORMS conference. I am active in my local INFORMS Chapter but most of it is topical speeches and programs. OR-Exchange is more of an outlet which has filled a void for me.
I hope the Operations Research community can take to OR-Exchange. I believe there can only be more good as more users come online. Please help preach its worth if you are using it. Perhaps there are many more unanswered questions in the Operations Research community.
Thursday, May 13, 2010
1. Michael Trick's Operations Research blog
You can pretty much argue that Michael is the "Father of Operations Research blogs". Michael does a great job of mixing academia and real world applications of Operations Research. Often times the comment section is worth the read with great contributors to his blog.
2. Thoughts on business, engineering, and higher education by Aurelie Thiele
This blog by Aurelie is probably some of the best writing in the Operations Research blogs. I particularly love the issues that Aurelie presents on a weekly basis. In fact I'm outright jealous of Aurelie's insight. This blog is just a flat out good read.
3. Punk Rock Operations Research
Punk Rock O.R.'s writer Laura McLay is another good Operations Research blogger that mixes academia and real life OR interests. I enjoy Laura's commentary on a lot of issues that you might not expect in mainstream OR applications. I especially like Laura's interests in sports.
4. Sebastian Pokutta's Blog
This blog may not be one of the most popular blogs but I really like Sebastian's Operations Research blog. Maybe it is the fact that I really relate to Sebastian's ideas and his endorsement of open source software in Operations Research. Sebastian finds really good nuggets in the OR world that you don't often see on other blogs.
This blog is perhaps one of the best at writing up real life Operations Research examples. I really enjoy the thoughtfulness and the writing of this blog. I enjoy ThinkOR's style of sifting through real world problems and laying out possible solutions.
Monday, May 10, 2010
The crazy events of Wall Street last week sent off a huge wave of confusion as to the events that led to the sudden drop in stock prices. At first it was thought to be a "fat finger" that cause the decline of major stock indexes. Now the focus is on the large trading farms of computers that are said to make trades by specific rules and algorithms. Now there is a question as to what are the underlying algorithms that these computers are trading. What was thought to be a no brainer of setting trades at the speed of electrons to make a more efficient market is now all being thrown into question.
I do not claim to understand the rules or algorithms that are programmed into these trading computers. Wall Street trading is not my area of expertise. I am curious, though, about this overall crisis and how it could be the result of supposed computer rules. The U.S. government is interested as well and is investigating what caused the sudden drop. Can algorithms set up to trade on a whim cause that much market capitalization to drop out so suddenly? There are claims that market values dropped by nearly 100% on long-established companies like Accenture.
I'm definitely going to be following this story closely. I'm curious what the SEC is going to find in their investigation. I'm going to reserve my opinions until more facts are brought forth. Perhaps we may never really know what caused this crisis. I would hope that it is something the Operations Research community could learn from. We know that algorithms can be developed to provide great benefits to people and organizations. Yet we hardly ever hear of the times when they can cause great trouble. We can learn from those bad implementations of algorithms. Usually at the heart of it is not so much a bad algorithm but the underlying assumptions of the model. We should know this all too well from the recent mortgage crisis. Perhaps the road to recovery out of the current recession is going to take a lot more time than we thought.
Thursday, May 6, 2010
Revolution Analytics is going to bridge the academic and business divide by providing solutions to what were considered limitations of R in the past. They will be focusing on software enhancements that can handle larger datasets. There is going to be better use of multi-core processing power. There are also going to be improvements to user interfaces for business analysts.
R is a free and open source software environment for statistical computing and data visualization. I think it is too early to tell what this announcement is going to mean for the statistical enterprise software market. Revolution Analytics has already said that they will be mixing proprietary methods with R. It will be interesting to see how the R community embraces that relationship. R is licensed under the GNU General Public License, which is supported by the Free Software Foundation. That is a crowd that does not take too kindly to proprietary software and patents. It will be interesting to follow Revolution Analytics and how they are able to implement their roadmap.
Tuesday, May 4, 2010
A great article by PhysOrg.com covers the value of Operations Research as implemented in the Dutch railways. Improvements were realized in train arrivals, passenger utilization of the cars, and operating profit. Many countries all across Europe were impacted by the improved railway service, including the Netherlands, Germany, and Switzerland among others.
The team that implemented the Operations Research strategies for the railway improvement project is led by Christos Zaroliagis, a professor of Computer Science and Informatics at the University of Patras. Christos was part of the team that earned the 2008 Edelman Prize from INFORMS for "The New Dutch Timetable: The O.R. Revolution." The team of the ARRIVAL project is a consortium of several researchers from many European countries.
This is a great example of Operations Research in practice and how OR continues to improve the operations and lives of organizations. I really like sharing stories like this because I don't feel they often get their due respect. There is a lot of research and planning in the background of a good research project, let alone Operations Research, that does not get noticed.
Wednesday, April 28, 2010
Operations Research is no different when it comes to Open Courseware. In fact there is an increasing amount of Operations Research, Management Science, Supply Chain, and Applied Mathematics material available on the internet. The Open Courseware Consortium is one way to find open courses on the internet in Operations Research. The consortium promotes itself through membership, but searching for courses by institution is free to the public. IEOR Tools has featured this in a previous blog post about open courseware.
Tuesday, April 27, 2010
Kaggle is also encouraging organizations to host a competition on their platform. They want to encourage companies to use them to find top notch predictive analysts.
Kaggle promotes two types of competitions: predicting the future and predicting the past. From Kaggle's website...
The platform allows companies, researchers, governments and other organizations to post their problems and have statisticians worldwide compete to predict the future (produce the best forecasts) or predict the past (find the best insights hiding in your data).
The current contest is on European voting.
Kaggle is taking advantage of the Netflix Prize and its success. The hope is that Kaggle can be a platform to bring these competitions together. It will be interesting to follow Kaggle to see if there is success in these open competitions for analytics. The results of the Netflix Prize seem to be a good indication that there will be.
Friday, April 23, 2010
A press release this week from the World Bank Group states that the World Bank will provide free access to its data. According to the article there are over 2,000 financial, business, health, economic and human development statistics available for free for research.
The World Bank has created a new website to access the free data at data.worldbank.org. Skimming over the Data Catalog shows a great amount of variety in the data sets. There are tables on Global Finance, Education Statistics, Poverty in developing countries, Gender, Business, Debt, Governance just to name a few.
It is very encouraging that the World Bank will offer data openly in this manner. Openness can be a great asset to the research community and help drive improvements and reform where needed. I definitely cheer the World Bank for allowing this data to become public.
Sunday, April 18, 2010
What file format for problem definition is suitable for OR-Exchange?
References for conjecture: Any regression can be translated into a math model
Please keep the questions and answers coming for OR-Exchange. For those that don't know about OR-Exchange I had a recent blog post about promoting this website for the Operations Research community.
Thursday, April 15, 2010
I'm a fan of OR-Exchange. OR-Exchange is a great place to share ideas with the Operations Research community and find answers to questions. OR-Exchange follows the model of digg.com and reddit.com, where you can vote up the questions and answers that you find favorable. That means valuable content will always rise to the top for easy dissemination.
Some people may argue that there is already a lot of information on the internet about Operations Research, and that I can't argue against. The value of OR-Exchange is the dynamic collaboration with the Operations Research community. There can be value in posting Q&A topics and getting the community to answer, vote up, and comment.
I urge you to give OR-Exchange a try to help promote this project. Otherwise we might lose a great resource.
Wednesday, April 14, 2010
The most desirable candidates, employers say, can have a variety of experience and educational backgrounds. Companies say specific degrees are less important than a focus on data-mining techniques.
This is definitely a trend I've been seeing. Companies want to see value delivered from their employees instead of just data management. I'm encouraged for the Industrial Engineering and Operations Research field and am looking forward to seeing how it takes off.
Monday, April 12, 2010
R User Groups are popping up around the country. In Dallas there is a new R User Group as told by David at the REvolutions blog. If the Larry in the article sounds familiar then you are right! Here is where you can sign up for the RUG in Dallas.
Also in Chicago they are getting things going with their own R User Group. The REvolutions blog chimes in as well with an announcement for the Windy City R users.
For those that don't know, the R Project is a statistical computing environment very similar to S+ and SAS. It is free and open source and contains hundreds of free libraries and packages for statistics, optimization, predictive analytics, and data mining.
If you would like to get more involved with R in your region take a look at the REvolutions blog. REvolutions lists R User Groups all around the world. And if one is not in your area go ahead and get one started. It is a great way to network with professionals in your discipline.
Tuesday, April 6, 2010
DataMiningTools.com is an up-and-coming website devoted to all things data mining. There are a lot of tutorials, videos, reviews, and recommendations for quality tools of the data mining trade. There is even a feature for Open Source tools which definitely gets my attention.
DataMiningTools.com seems to be to data mining what IEOR Tools is to Industrial Engineering and Operations Research. I really like the presentation of the content. The links are tagged really well and make it easy to find relevant resources for data mining. In the future I hope to feature some of the tutorials from this website.
One area you may want to look at is the R Project tutorials, which have had my interest as of late.
Friday, March 26, 2010
When I started the IEOR Tools blog I wanted it to be a forum for discussion, evaluation, and peer review of the tools available to the Industrial Engineering and Operations Research community. I believe that the best opportunity for continued development and research of tools of the trade is in the Open Source community. I hope that this blog can be a conduit for continued development and opportunity.
I'm excited for the contributions that Dmitrey will provide with OpenOpt and other open source developments.
Thursday, March 25, 2010
It seems to me that this may be competing with the Wolfram Alpha platform. It is another way to present public data on the internet. Yet this is also showing an increased trend in the importance of data visualization. I'm encouraged to see there are competitors in the data visualization market. It definitely shows that there is a need for managing data and presenting it in a meaningful way. Sounds a lot like Operations Research to me!
I really like the layout of the website. Any website that focuses on visualization should be sharp and refined. Curious to see how Timetric proves out in the future.
Friday, March 19, 2010
R Project, the open source statistical and mathematical computing environment, is going to be a part of the Google Summer of Code 2010. There is an R Wiki page devoted to topics and projects within the R Project for the Google Summer of Code. The assortment of projects ranges from mathematical and statistical research to computer APIs and interfaces.
The Google Summer of Code is a student internship program that provides stipends to develop free and open source software around the globe. The GSOC has been in existence since 2005 and has allowed thousands of students to work on hundreds of computing projects of interest. A complete list of the GSOC open source organizations can be found on their project site.
Thursday, March 18, 2010
For anyone who reads this blog it is no mystery that I am an Open Source and Free Software advocate. I have my reasons, which are many, and I have previously posted them on the IEOR Tools blog. That being said I like to find software for my favorite free operating system, Linux. I know, Linux by itself is not an operating system but the kernel. I'm just referring to the "flavor" of the operating system which uses Linux.
There are some great productivity tools for Linux that can help any Analyst or Engineer. Linux.com does a great job of reviewing some of the best available productivity tools for Linux. I am a big fan of Kontact, Okular, and Kivio. I've mentioned before that Kivio is a great free software diagramming tool.
These applications can help improve productivity in the Linux environment. Oftentimes folks feel that Windows corners the market on these types of applications but that is often not the case.
Monday, March 15, 2010
Here are some other notable predictive modeling contests available compliments of KDnuggets.
A notable competition is the Analytics X Prize which aims at solving social problems within our world. The current prize is predicting homicide rates in Philadelphia. A bit morbid but may prove useful to municipalities across the world.
Also Yahoo has a collaborative learning or recommendation prize of their own. Yahoo Learning to Rank Challenge allows modelers to benchmark their ranking algorithms against the world. Must act quick because the challenge ends in June 2010.
Friday, March 12, 2010
improvements made to the recommendation engine made it easier to identify people through supposedly anonymous information. I guess the modeling community gets a +1 for great improvements.
I was a big fan of the Netflix Prize even if it only brought marginal improvements to the actual recommendation system. The shared knowledge and collaborative spirit were impressive. It doesn't sound like Netflix is going to go out on a limb and suggest a new contest. So it looks like this might be the end of the contest. This is very unfortunate because I was hoping this would spark a lot of companies to try these types of contests.
This also brings up a good point about making data anonymous. There are lots of ways to get this done. Please share ways you have made data anonymous in your modeling for business, academic, or client work.
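To get the discussion started, here is one simple approach as a rough Python sketch: replace each user identifier with a keyed hash (HMAC-SHA256) so that records from the same person still line up for modeling without the original id appearing in the data. The function names, the sample rows, and the key are all made up for illustration. Keep in mind this only hides the identifier; the pattern of ratings itself can still give people away, so treat it as a first step and not a guarantee of anonymity.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a keyed hash of user_id (same id and key give the same token)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    # Shorten the hex digest for readability; 16 hex chars = 64 bits.
    return digest.hexdigest()[:16]


def anonymize_ratings(rows, secret_key):
    """Replace the user id in (user_id, item, rating) rows with a pseudonym."""
    return [(pseudonymize(user, secret_key), item, rating)
            for user, item, rating in rows]


# Hypothetical sample data, not taken from any real data set.
ratings = [
    ("alice@example.com", "Movie A", 5),
    ("bob@example.com", "Movie A", 3),
    ("alice@example.com", "Movie B", 4),
]

# Assumption: in practice the key is random and stored apart from the data.
key = b"replace-with-a-random-secret"

anonymized = anonymize_ratings(ratings, key)
for row in anonymized:
    print(row)
```

The key matters here: with a plain unkeyed hash, anyone holding a list of candidate email addresses could hash them and match the tokens back to people.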
Thursday, March 4, 2010
Some noted e-books in mathematics that I found to be interesting
Algorithms by Ian Craw, John Pulham
An Introduction to R by W. N. Venables, D. M. Smith
Engineering Mathematics by Ian Craw, Stuart Dagger, John Pulham
Statistics for Business and Economics by Marcelo Fernandes
Jeromy Anglim's blog is a very interesting resource for statistics, data mining, and R. Be sure to read it for its wonderful insights.