Wednesday, October 27, 2010

R References for Handling Big Data

The Dallas R User Group had a meeting over the weekend. One of the topics we discussed was R's memory limitations: by default R keeps every object in RAM, so a data set larger than the available memory cannot simply be loaded. This is a common subject in the R community and among R user groups, and there have been a lot of strides recently in letting R stretch past those limits. I thought I would compile and share some of the best resources I have found for dealing with the big data issue.

CRAN Packages
ff
This package stores large data vectors in files on the hard disk instead of in RAM, mapping only the chunk currently being used into memory.
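The idea behind a file-backed vector can be sketched with Python's standard mmap module. This is not ff's API (ff is an R package); FileBackedVector is a hypothetical class, and the sketch assumes fixed-size 8-byte doubles. The data lives in a file, and the operating system pages in only the parts that are actually read or written.

```python
import mmap
import os
import struct
import tempfile


class FileBackedVector:
    """Hypothetical sketch: a vector of float64 values kept in a file, not in RAM."""

    ITEM = struct.Struct("<d")  # one little-endian 8-byte double per element

    def __init__(self, path, length):
        self.length = length
        nbytes = length * self.ITEM.size
        # Create the file and extend it to its full size; new bytes read as zero.
        self._file = open(path, "w+b")
        self._file.truncate(nbytes)
        # Map the file into memory; the OS loads pages only when they are touched.
        self._map = mmap.mmap(self._file.fileno(), nbytes)

    def __len__(self):
        return self.length

    def _offset(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * self.ITEM.size

    def __getitem__(self, i):
        return self.ITEM.unpack_from(self._map, self._offset(i))[0]

    def __setitem__(self, i, value):
        self.ITEM.pack_into(self._map, self._offset(i), value)

    def close(self):
        # Write any dirty pages back to the file, then release the mapping.
        self._map.flush()
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vec.bin")
    v = FileBackedVector(path, 1_000_000)  # backed by an 8 MB file
    v[0] = 3.5
    v[999_999] = -1.0
    print(v[0], v[999_999], v[500])  # 3.5 -1.0 0.0
    v.close()
```

The design choice is the same trade ff makes: random access stays cheap because an element's position in the file is just its index times the element size, while RAM use stays bounded by the pages in use.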

bigmemory
This package allocates large matrices in shared memory and hands R a pointer to them, or backs them with a memory-mapped file on disk.
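A big.matrix, like an ordinary R matrix, holds one element type for every cell, chosen when it is created (for example "double" or "integer"). That is one reason a CSV mixing character and numeric columns does not load cleanly into it. The Python sketch below is an illustration, not bigmemory code: encode_column and to_numeric_matrix are hypothetical helpers showing a common workaround, which is to turn a text column into integer codes (similar in spirit to R factors) and store missing values as NaN, so every column fits one numeric buffer laid out column-major.

```python
import math
from array import array


def encode_column(values):
    """Hypothetical helper: map each distinct string to an integer code, starting at 1.

    Missing entries (None or "") become NaN so they fit in a numeric matrix.
    Returns the codes and the string-to-code mapping needed to decode them later.
    """
    levels = {}
    codes = []
    for v in values:
        if v is None or v == "":
            codes.append(math.nan)
        else:
            # setdefault assigns the next code only the first time a string is seen.
            codes.append(float(levels.setdefault(v, len(levels) + 1)))
    return codes, levels


def to_numeric_matrix(columns):
    """Hypothetical helper: pack equal-length numeric columns into one float64
    buffer in column-major order, the single-type layout of an R matrix."""
    nrow = len(columns[0])
    data = array("d")
    for col in columns:
        if len(col) != nrow:
            raise ValueError("all columns must have the same length")
        data.extend(col)
    return data, nrow, len(columns)


if __name__ == "__main__":
    ids = ["a", "b", "a", ""]
    amount = [1.5, 2.0, math.nan, 4.25]
    codes, levels = encode_column(ids)  # codes [1.0, 2.0, 1.0, nan]
    data, nrow, ncol = to_numeric_matrix([codes, amount])
    # Element (row i, column j) lives at data[j * nrow + i].
    print(data[1 * nrow + 3])  # 4.25
```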

Blog Articles
Taking R to the Limit: Parallelism and Big Data

Hitting the Big Data Ceiling Limit in R
While this article does not offer a solution for big data, it does show some of the issues R currently faces, namely the lack of an "int64" (long long) data type for memory allocation.
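To put the missing 64-bit integer in numbers: R's integer type is a signed 32-bit integer, so its largest value (.Machine$integer.max in R) is 2^31 - 1, and at the time of writing vector lengths were limited to that same value. The short Python calculation below sketches the resulting ceiling, assuming 8-byte doubles; fits_in_32bit_index is a hypothetical helper for illustration only.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer (R's .Machine$integer.max)
BYTES_PER_DOUBLE = 8   # an R numeric value is an IEEE 754 double

max_elements = INT32_MAX
max_bytes = max_elements * BYTES_PER_DOUBLE


def fits_in_32bit_index(n_elements):
    """True if a vector of this length can be indexed by a signed 32-bit integer."""
    return n_elements <= INT32_MAX


if __name__ == "__main__":
    print(f"max elements per vector: {max_elements:,}")          # 2,147,483,647
    print(f"largest double vector: {max_bytes / 2**30:.1f} GiB")  # 16.0 GiB
    # A 50,000 x 50,000 matrix has 2.5 billion cells, more than one such vector can hold.
    print(fits_in_32bit_index(50_000 * 50_000))  # False
```

So even on a machine with plenty of RAM, a single 32-bit-indexed vector tops out at roughly 2.1 billion elements (about 16 GiB of doubles).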

Enterprise Software
Revolution R Enterprise
Revolution Analytics is building enterprise software around R to tackle big data, parallelism, and multithreaded computing, with the aim of speeding up the processing and analysis of large data sets.

5 comments:

Quant said...

I am not happy with bigmemory, to be honest, due to its inability to deal with mixed-type data. For example, read.table() can easily import a CSV with a "character" first column, a "double" second column, a "character" third column, and some missing values, but read.big.matrix() always returns an error. Maybe it is my mistake. Any thoughts? Thanks.

Zach said...

I had no idea that Dallas had an R group. I live in Fort Worth and have been looking for one (I just saw your post through R-bloggers). Definitely good to know. I'll plan on meeting up with you guys at the next meeting.

bbolker said...

What about the 'large memory' section in the High Performance Computing task view on CRAN?

Larry said...

Quant, I agree with your assessment of bigmemory. I wish it could work with more data formats. If it could, it would be a really promising option.

Zach, The next Dallas RUG meeting is this Sat. Dec 4th 10AM at Dallas Univ. Hope to see you there!