Saturday, February 16, 2013

Google Statistician uses R and other programming tools

The Simply Statistics blog has a great interview with Google's Nick Chamandy, who holds a PhD in Statistics.  He explains that R is the main tool, among others, that he uses for his work at Google.  Also of note is the active data science community within Google, which uses R as well as some other interesting tools.  Understandably, they work with a lot of data at Google, and R usually cannot handle data of that size.  They do a lot of ad hoc reduction of the data with tools like MapReduce, Go, and even an R API.  I would love to see how they use the R API to assimilate data.

An interesting insight from the interview is the amount of programming done by the statisticians.  It seems the culture at Google is to foster autonomy and let the modelers build their own data manipulation steps, starting from the raw data.  This requires a skill set that goes beyond statistical analysis tools.

I've found that knowing many tools, such as R, CPLEX, and GLPK, makes me more effective in my work.  Recently I've been learning a lot of SQL using the PostgreSQL platform.  SQL combined with statistical tools like R makes for a very strong combination.  It keeps me agile and lets me take on a wide variety of decision analyses.
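To make the SQL-plus-statistics idea concrete, here is a minimal sketch of the pattern: reduce the raw rows in SQL first, then hand the much smaller result to a statistical tool. It is only an illustration under stated assumptions. Python's built-in sqlite3 module stands in for PostgreSQL, the statistics module stands in for R, and the orders table, its columns, and both function names are hypothetical.

```python
import sqlite3
import statistics


def reduce_by_region(rows):
    """Aggregate raw (region, amount) rows in SQL before any analysis.

    Assumption: an in-memory SQLite database stands in for PostgreSQL,
    and the 'orders' table with its columns is made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # The GROUP BY does the data reduction: one output row per region.
        cur = conn.execute(
            "SELECT region, COUNT(*), SUM(amount) "
            "FROM orders GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


def mean_region_total(reduced):
    """Run a simple statistic on the reduced data (a stand-in for R)."""
    totals = [total for _region, _count, total in reduced]
    return statistics.mean(totals)


if __name__ == "__main__":
    raw = [("east", 10.0), ("east", 20.0), ("west", 5.0)]
    reduced = reduce_by_region(raw)
    print(reduced)
    print(mean_region_total(reduced))
```

The point of the design is that the database does the heavy lifting on the raw data, so the statistical tool only ever sees a summary small enough to fit comfortably in memory.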