Friday, March 12, 2010

Netflix scraps Netflix Prize II in the wake of lawsuit

Unfortunately for the prediction and mathematical modeling community, Netflix has decided to scrap the sequel to the Netflix Prize. The decision came after a lawsuit was filed against Netflix over public access to its members' ratings data. For the original Netflix Prize, the member ratings data was made anonymous. Yet the lawsuit claims that the predictions built on those ratings have become so good that (from the article):

improvements made to the recommendation engine made it easier to identify people through supposedly anonymous information.
I guess the modeling community gets a +1 for great improvements.

I was a big fan of the Netflix Prize, even if it only brought marginal improvements to the actual recommendation system. The shared knowledge and collaborative spirit were impressive. It doesn't sound like Netflix is going to go out on a limb and propose a new contest, so this may be the end of it. That is very unfortunate, because I was hoping the Netflix Prize would inspire many other companies to run similar contests.

This also raises a good point about making data anonymous. There are lots of ways to do it. Please share how you have anonymized data in your own modeling work, whether for business, academic research, or clients.
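One common technique is pseudonymization: replace each member ID with a keyed hash, so rows from the same person still link together (which a recommender needs) while the real ID stays hidden. Here is a minimal Python sketch of that idea. It is my own illustration, not how Netflix prepared its data; the secret key, the row layout of (member_id, movie_id, rating), and the sample values are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. In practice it would be generated once and kept
# apart from the released data so outsiders cannot recompute the pseudonyms.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(member_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a member ID with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(key, member_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymize_ratings(rows, key: bytes = SECRET_KEY):
    """Return (pseudonym, movie_id, rating) tuples; movie IDs and ratings are unchanged."""
    return [(pseudonymize(member, key), movie, rating) for member, movie, rating in rows]


if __name__ == "__main__":
    # Hypothetical sample ratings: (member_id, movie_id, rating)
    ratings = [
        ("alice@example.com", "movie_42", 5),
        ("bob@example.com", "movie_42", 3),
        ("alice@example.com", "movie_7", 4),
    ]
    for row in anonymize_ratings(ratings):
        print(row)
```

Note that this only hides the IDs. It does nothing about the ratings themselves, and as the lawsuit suggests, a distinctive pattern of ratings can still point back to a person, so pseudonymization is one layer of anonymization rather than a complete solution.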

Thursday, March 4, 2010

Free e-books on Mathematics

Thanks to Jeromy Anglim, who posted on his blog a great link to free mathematics e-books from e-bookdirectory.com. The collection contains quite a few e-books and can be a great resource for refreshing old methods and learning new ones.

Some of the mathematics e-books I found interesting:

Algorithms by Ian Craw, John Pulham
An Introduction to R by W. N. Venables, D. M. Smith
Engineering Mathematics by Ian Craw, Stuart Dagger, John Pulham
Statistics for Business and Economics by Marcelo Fernandes


Jeromy Anglim's blog is a very interesting resource for statistics, data mining, and R. Be sure to read it for its wonderful insights.

Monday, February 22, 2010

Gnumeric 1.10 released

OStatic.com has a review of the new Gnumeric 1.10 release. For those who don't know, Gnumeric is an open source spreadsheet application built specifically for the GNOME desktop environment. Gnumeric can read a wide variety of spreadsheet formats, including Microsoft Excel, Lotus 1-2-3, Applix, SYLK, XBase, OpenOffice, Quattro Pro, DIF, PlanPerfect, and Oleo files. It also supports the OpenDocument Format.

The 1.10 release removes the roughly 65,000-row limit that was common to many spreadsheet programs. From the OStatic review:

The rest is all good news, though. Users will find plenty of improvements in Gnumeric 1.10, including better graphs with new plot types, about 40 new functions, and performance improvements for larger spreadsheets.
Gnumeric can be a great alternative to other spreadsheet programs.

Thursday, February 18, 2010

R Project named in Intelligent Enterprise 2010 Editor's Choice Awards


According to the Revolutions blog, the R Project was named in the Intelligent Enterprise 2010 Editor's Choice Awards and was also listed as one of the twelve companies to watch in Business Intelligence. For those who use R, the increased recognition R is receiving should come as no surprise.

Other Open Source selections from the Intelligent Enterprise 2010 Editor's Choice Awards include Apache (open source web server), Jaspersoft (open source business intelligence), and Eclipse Foundation (open source software development).

Wednesday, February 17, 2010

A review of the website Programming R

Lately I have become deeply involved in the statistics world of R. I recently took a job in online marketing, and I am finding R to be a very useful tool. The first thing I did was look for the tools of the R trade, and I am finding a lot of great R resources on the internet.

One of those online resources is the website Programming R. Programming R appears to be a fairly new website that has only been around for about a year. Its promise is its dedication to R users from beginner to advanced. I find the articles written for Programming R to be concise and well written. Programming R also provides reviews of R books, which are very helpful for beginners.

One interesting section of the Programming R website is devoted to R consultants and R jobs. The R Project itself promotes these, but it's refreshing to see an independent website promoting statistical jobs that focus on R.

There is also a web forum, which is always useful. Unfortunately, for some reason it does not see much activity from its users. Hopefully, as the website grows, the forum will be used more often.

Overall, I really like Programming R for its writing and content. I recommend it as a resource for both new and veteran R users.

Monday, February 1, 2010

When to use Excel and when to use R

There is a great post in O'Reilly's Answers that talks about when to use Excel and when to use R. I have been using a lot of R lately to perform some data analysis and logistic regression. R is a great tool for statistical analysis. R is also free and open source.

I am going to be doing some more blogging about statistics and using R (R-project homepage). R has a huge range of uses, and I'm sure I will only scratch the surface. If you have experience with R, please let me know; I would love to share your experiences.

Tuesday, January 5, 2010

8 Open Source Business Intelligence Tools

Business Intelligence software has become a necessity for data-rich companies. It handles large amounts of data and transforms them into meaningful information. Of course, much of that transformation requires technical know-how in areas such as data mining and operations research. Linux Links has put together a list of the top free and open source Business Intelligence software available. I have linked to some of these before, but the list includes a few I was not aware of.

  1. Pentaho
  2. RapidMiner
  3. JasperReports community edition
  4. iReport
  5. OpenI
  6. BIRT Project
  7. Agata Report
  8. DataVision
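As a rough illustration of what "transforming data into meaningful information" often means in practice, here is a minimal Python sketch that rolls raw sales rows up into totals per region and product, the kind of summary a BI report or dashboard would display. It is not taken from any of the tools above; the transaction rows and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw transaction rows: (region, product, revenue)
transactions = [
    ("East", "Widget", 120.0),
    ("West", "Widget", 80.0),
    ("East", "Gadget", 200.0),
    ("East", "Widget", 60.0),
    ("West", "Gadget", 150.0),
]


def summarize(rows):
    """Roll raw rows up into total revenue and order count per (region, product)."""
    totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
    for region, product, revenue in rows:
        bucket = totals[(region, product)]
        bucket["revenue"] += revenue
        bucket["orders"] += 1
    return dict(totals)


if __name__ == "__main__":
    # Print one summary line per (region, product), sorted for a stable report.
    for (region, product), stats in sorted(summarize(transactions).items()):
        print(f"{region:<5} {product:<7} revenue={stats['revenue']:.2f} orders={stats['orders']}")
```

Real BI suites layer reporting, scheduling, and data warehousing on top of this, but the core step of grouping and aggregating raw records is the same idea.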