Wednesday, October 27, 2010

R references for handling big data

The Dallas R User Group had a meeting over the weekend. One of the topics discussed was R's memory limitations: R normally holds all of its objects in RAM, so a dataset larger than the available memory cannot simply be loaded. This is a common subject in the R community and at R User Group meetings. There have been many strides recently toward letting R work past these limits, so I thought I would compile and share some of the best resources I have found for remedying the big data issue.

CRAN Packages
ff
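This package stores large vectors and arrays in files on disk and maps only the portion currently in use into memory, so the data can be larger than the available RAM.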

bigmemory
This package creates large matrices that R accesses through pointers, with the data held either in shared memory or in a backing file on disk.
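To make the file-backed idea behind ff and bigmemory more concrete, here is a minimal sketch of a disk-backed vector of doubles. It is written in Python with only the standard library, so it is an analogy rather than either package's R API; the class `FileBackedVector` and its methods are hypothetical. The principle is the same: the data live in a file, and the operating system pulls into RAM only the pages that are actually read or written.

```python
import mmap
import os
import struct
import tempfile

# One little-endian 8-byte double per element (R's numeric type is also 8 bytes).
DOUBLE = struct.Struct("<d")


class FileBackedVector:
    """Hypothetical toy disk-backed vector of doubles (not the ff or bigmemory API)."""

    def __init__(self, length, path):
        self.length = length
        self.path = path
        # Reserve length * 8 bytes in a new file; nothing is loaded into RAM here.
        with open(path, "wb") as f:
            f.truncate(length * DOUBLE.size)
        self._file = open(path, "r+b")
        # Memory-map the file: the OS pages in only the regions that are accessed.
        self._map = mmap.mmap(self._file.fileno(), length * DOUBLE.size)

    def __len__(self):
        return self.length

    def _offset(self, i):
        # Translate an element index into a byte offset, with bounds checking.
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * DOUBLE.size

    def __getitem__(self, i):
        return DOUBLE.unpack_from(self._map, self._offset(i))[0]

    def __setitem__(self, i, value):
        DOUBLE.pack_into(self._map, self._offset(i), value)

    def close(self):
        # Write any dirty pages back to the file, then release the mapping.
        self._map.flush()
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vec.bin")
    v = FileBackedVector(1_000_000, path)  # about 8 MB on disk
    v[0] = 3.5
    v[999_999] = -1.0
    print(v[0], v[999_999], os.path.getsize(path))
    v.close()
```

The design trades speed for capacity: every access goes through the file mapping, so random access is slower than with an ordinary in-memory vector, but the vector can be larger than the RAM on the machine.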

Blog Articles
Taking R to the Limit: Parallelism and Big Data

Hitting the Big Data Ceiling Limit in R
While this article does not offer a solution for big data, it does show some of the issues R currently faces, namely the lack of an "int64" (long long) data type. Because vector lengths and indices are stored as 32-bit integers, a single vector is capped at 2^31 - 1 elements.
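The ceiling is easy to quantify. The sketch below is written in Python only because it is a plain arithmetic check; it assumes, as in the R releases current at the time of writing, that a vector's length must fit in a signed 32-bit integer, and the helper names are hypothetical. Since an R matrix is a vector with dimensions, the same cap applies to the total number of cells in a matrix.

```python
# Largest value a signed 32-bit integer can hold; assumed here to be the
# maximum number of elements in a single R vector (or matrix).
INT32_MAX = 2**31 - 1


def max_vector_bytes(bytes_per_element):
    """Memory taken by a vector of the maximum 32-bit length."""
    return INT32_MAX * bytes_per_element


def fits_in_one_vector(n_elements):
    """True if n_elements does not exceed the 32-bit length cap."""
    return n_elements <= INT32_MAX


# R's numeric (double) type uses 8 bytes per element; integer uses 4.
for name, size in [("double", 8), ("integer", 4)]:
    gib = max_vector_bytes(size) / 2**30
    print(f"{name}: {INT32_MAX:,} elements, about {gib:.1f} GiB")

# A 50,000 x 50,000 matrix has 2.5 billion cells, which is over the cap.
print(fits_in_one_vector(50_000 * 50_000))  # False
```

So even on a 64-bit machine with plenty of RAM, a single double vector tops out at roughly 16 GiB, which is one reason packages like ff and bigmemory split data across chunks or files.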

Enterprise Software
Revolution R Enterprise
Revolution Analytics is building enterprise software around R to tackle big data, parallelism, and multithreaded computing in order to speed up large-scale data processing and analytics.

Wednesday, October 20, 2010

Friday, October 8, 2010

Data mining competition with R

There is a new data mining competition from dataists.com aimed at predicting which R packages users prefer. The idea is to determine which packages are favored in the R community from the CRAN packages that users have installed in their libraries. The organizers of the competition are themselves part of the R community through the NY R Users Group.

I use R and am a member of the Dallas R Users Group, so, as you can imagine, I find this competition very interesting: I could benefit greatly from knowing which implementations the R community prefers. It could also be an interesting exercise in identifying preferred modeling methods. I believe the competition will give insight into the most common approaches to statistical computing in the community today.

R has been getting a lot of press lately. Revolution Analytics just released the first part of a series of articles on R's impact and why it is so popular. As a statistics and optimization tool, R is making real inroads in the business community.

As I've written previously, I am a big fan of these data mining competitions. It is remarkable what you can learn about the world from them, especially when you have no prior knowledge of the subject matter. I will try to highlight these competitions from time to time. I would be interested to hear whether any IEOR Tools readers have participated in them and what their experience was.