Monday, March 27, 2017

Introducing Boardgame Recommendation website built with Shiny

My family and I love boardgames.  We are always on the lookout for new ones that would fit with the ones we already like to play.  So I decided to create a boardgame recommendation website:

https://larrydag.shinyapps.io/boardgame_reco/

I built it using R.  The recommendation engine uses a very simple collaborative filtering algorithm based on correlation scores from other boardgame players' collection lists.  The collections are gathered using the API from BoardgameGeek.com.  It is very much a beta project, as I just wanted to get something built and working.
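To give a feel for what correlation-based collaborative filtering can look like in R, here is a minimal sketch.  It is not the code behind the website.  The ownership matrix, the game list, and the `recommend_games` helper are all made up for illustration, and scoring a game by its average correlation with the games you own is just one simple choice.

```r
# Hypothetical toy data: rows are players, columns are games, 1 = owns the game.
collections <- matrix(
  c(1, 1, 0, 1, 0,
    1, 1, 1, 0, 0,
    0, 1, 1, 0, 1,
    1, 0, 0, 1, 0,
    0, 1, 1, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    paste0("player", 1:5),
    c("Catan", "Carcassonne", "Pandemic", "Ticket to Ride", "Dominion")
  )
)

# Game-to-game similarity: Pearson correlation between ownership columns.
game_cor <- cor(collections)

# Hypothetical helper: rank the games you do not own by their mean
# correlation with the games you do own.
recommend_games <- function(owned, game_cor, n = 3) {
  stopifnot(all(owned %in% colnames(game_cor)))
  candidates <- setdiff(colnames(game_cor), owned)
  scores <- colMeans(game_cor[owned, candidates, drop = FALSE])
  head(sort(scores, decreasing = TRUE), n)
}

recommend_games(c("Pandemic", "Dominion"), game_cor, n = 1)
```

With real collection data a game that everybody (or nobody) owns has zero variance, so `cor()` returns NA for it; those columns would need to be dropped first.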

I also wanted another project to build in Shiny.  I really like how easy it is to publish R projects with Shiny.

Some of the features include:

  • Ability to enter your own collection
  • Get recommendations based on your collection
  • Amazon link to buy a recommended boardgame

It's a work in progress.  There is much to clean up and make more presentable.  Please take a look and offer comments to help improve the website.

Wednesday, November 16, 2016

Microsoft allows trying its new SQL Server for Linux

This is a great time to have tools to do data analysis.  Microsoft is now allowing evaluations of its new SQL Server.  This new SQL Server can now be deployed to a Linux OS environment.  This is big news especially for those that have followed this blog through the years.

The new SQL Server version also carries new analytic tools such as R.  From this TechCrunch announcement:

The new version of SQL Server will include improved support for R Services and a number of new machine learning and deep neural networking features.

These are exciting times indeed.  I hope Microsoft continues down this path of bringing new tools to the market.

Monday, October 17, 2016

Microsoft releases LightGBM

Microsoft has been really increasing its development of tools in the predictive analytics and machine learning space.  Another such tool released recently is LightGBM.  From the GitHub site:

LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Microsoft is definitely increasing their attempts to capitalize on the machine learning and big data movement.  I hope they continue to develop tools such as LightGBM and R with SQL Server.

Wednesday, January 13, 2016

Revolution R is now Microsoft R

There has been a huge shift in the force.  Can you feel it?  Today Microsoft is announcing that Revolution R is now Microsoft R.  This includes the Open R version as well.  Some notable features of Microsoft Open R:


  • Open R 3.2.2 is fully compatible with R 3.2.2
  • Windows, Mac OS X, and Linux support (wow!)
  • Available free to download
  • Multi-threaded math libraries
  • Enterprise version available (for a price)

This is an interesting shift for Microsoft in the analytics space.  Microsoft is flexing its server muscles to show that it can play in the data science field as well.  I'm wondering if the industry is going to shift.  We know that IBM, SAS, and Oracle are pushing forward with cloud analytics.  Microsoft is showing a major commitment to partner with the R community.  I think this is a bold and wise move, as R has shown nothing but growth the last few years.

Friday, December 19, 2014

Statistical Analysis and Data Mining hot on LinkedIn for 2014

Statistical Analysis and Data Mining are considered the hottest skills on LinkedIn for 2014, according to their report from analyzing jobs and recruiters on the LinkedIn website.  I think it's safe to say that they will continue to be hot for 2015.

If you are looking to hone your skills I would suggest looking at ComputerWorld's Beginner's Guide to R.  It looks like a complete tutorial and is indexed rather well.

Monday, September 22, 2014

Interesting high contrast plots in R

I was inspired by this blog post and thought I could do the same thing in R.  Well, I posted the code on Google+.


Here are my results.  Not bad.


Monday, August 4, 2014

Introducing the Shiny App DThiring

Well, it has been a long time since I have written anything on this blog.  I am long overdue.  I've been terribly busy learning new things and getting on with life.  One of the things I have learned is building R applications using Shiny, developed by RStudio.  The folks at RStudio have also created a way to deploy Shiny apps using Shinyapps.io.  Follow the link to DThiring:

http://larrydag.shinyapps.io/dthiring/

I'm a big fan of Data Tau which is a Data Science equivalent to Hacker News.  Like Hacker News, Data Tau has a Who's Hiring comment on the first of every month.  It is a good resource for those looking to see relevant jobs in the Data Science world.  Well someone created a hiring listings aggregator called http://hnhiring.me/.  I decided I wanted to see if I could build a similar application using R and Shiny.

I will be posting the source code for this app to GitHub in the near future.

If you have any ideas on how to improve this application let me know in this comment section.

Saturday, March 15, 2014

OpenOpt 0.53


I'm glad to inform you about new OpenOpt Suite release 0.53:
    Stochastic programming addon now is available for free
    Some minor changes
Regards, Dmitrey.

Sunday, December 15, 2013

OpenOpt Suite release 0.52

I'm glad to inform you about new OpenOpt Suite release 0.52 (2013-Dec-15):
    Minor interalg speedup
    oofun expression
    MATLAB solvers fmincon and fsolve have been connected
    Several MATLAB ODE solvers have been connected
    New ODE solvers, parameters abstol and reltol
    New GLP solver: direct
    Some minor bugfixes and improvements
Regards, Dmitrey.

Sunday, September 15, 2013

New OpenOpt Suite release 0.51


New OpenOpt suite v 0.51 has been released:
  • Some improvements for  FuncDesigner  automatic differentiation and QP
  • FuncDesigner now can model sparse (MI)(QC)QP
  • Octave QP solver has been connected
  • MATLAB solvers linprog (LP), quadprog (QP), lsqlin (LLSP), bintprog (MILP)
  • New NLP solver: knitro
  • Some elements of 2nd order interval analysis, mostly for interalg
  • Some interalg improvements
  • interalg can directly handle (MI)LP and (possibly nonconvex) (MI)(QC)QP
  • New classes: knapsack problem (KSP), bin packing problem (BPP), dominating set problem (DSP)
  • FuncDesigner can model SOCP
  • SpaceFuncs  has been adjusted for recent versions of Python and NumPy
visit http://openopt.org for more details.

Saturday, June 15, 2013

new OpenOpt Suite release 0.50

Hi all,
I'm glad to inform you about new OpenOpt Suite release 0.50 (2013-June-15):

    * interalg (solver with specifiable accuracy) now works many times (sometimes orders) faster on (possibly multidimensional) integration problems (IP) and on some optimization problems
    * Add modeling dense (MI)(QC)QP in FuncDesigner (alpha-version, rendering may work slowly yet)
    * Bugfix for cplex wrapper
    * Some improvements for FuncDesigner interval analysis (and thus interalg)
    * Add FuncDesigner interval analysis for tan in range(-pi/2,pi/2)
    * Some other bugfixes and improvements
    * (Proprietary) FuncDesigner stochastic addon now is available as standalone pyc-file, became available for Python3 as well

Regards, Dmitrey.

Friday, March 15, 2013

OpenOpt Suite release 0.45

I'm glad to inform you about new OpenOpt Suite release 0.45 (2013-March-15):
  * Essential improvements for FuncDesigner interval analysis (thus affect interalg)
  * Temporary workaround for a serious bug in FuncDesigner automatic differentiation kernel due to a bug in some versions of Python or NumPy, may affect optimization problems, including (MI)LP, (MI)NLP, TSP etc
  * Some other minor bugfixes and improvements

Saturday, February 16, 2013

Google Statistician uses R and other programming tools

A great interview on the Simply Statistics blog with Google's Nick Chamandy, PhD in Statistics.  He explains that he mainly uses R, among other tools, to perform his work at Google.  Also of note is the active data science community within Google that uses R as well as some other interesting tools.  They use a lot of data at Google, understandably, and R usually cannot handle the size.  They do a lot of ad hoc reduction of the data with tools like map reduce, Go, and even an R API.  I would love to see how they use the R API to assimilate data.

An interesting insight from the interview is the amount of programming done by the Statisticians.  It seems the culture at Google is to foster autonomy and let the modelers develop their own data manipulation from the raw data.  This requires a broader skillset beyond the statistical analysis tools.

I've found that having knowledge of many tools like R, CPLEX, and GLPK allows me to be more effective in my work.  Recently I've been learning a lot of SQL using the PostgreSQL platform.  SQL combined with statistical tools like R makes for a very strong combination.  I'm very agile in my work and can do a wide variety of decision analyses.

Saturday, December 15, 2012

OpenOpt Suite release 0.43


I'm glad to inform you about new OpenOpt release 0.43 (2012-Dec-15):

    * interalg now can solve SNLE in 2nd mode (parameter dataHandling = "raw", before - only "sorted")
    * Many other improvements for interalg
    * Some improvements for FuncDesigner kernel
    * FuncDesigner ODE now has 3 arguments instead of 4 (backward incompatibility!), e.g. {t: np.linspace(0,1,100)} or mere np.linspace(0,1,100) if your ODE right side is time-independent
    * FuncDesigner stochastic addon  now can handle some problems with gradient-based NLP / NSP solvers
    * Many minor improvements and some bugfixes

Visit  openopt.org  for more details.

Regards, D.


Tuesday, September 25, 2012

Day in the life of a Data Scientist

A great read from the Decomposition blog about the day in the life of a Data Scientist.  I consider myself a Data Scientist by any other name.  The blog article by Sean does a great job of breaking down the essence of making better decisions for the organization you may be involved with.

I've always thought asking good questions is the start of good analysis.  The organization basically doesn't know what it doesn't know.  A good Data Scientist will be like a sleuth looking for clues.  In all honesty that may be the most fun part of being a Data Scientist.

Saturday, September 15, 2012

OpenOpt Suite release 0.42


Hi all,

I'm glad to inform you about new OpenOpt Suite release 0.42 (2012-Sept-15), a free Python-written cross-platform software with primary focus on numerical optimization. Main changes:

*    Some improvements for solver interalg, including handling of categorical variables
*    Some parameters for solver gsubg
*    Speedup objective function for de and pswarm on FuncDesigner models
*    New global (GLP) solver: asa (adaptive simulated annealing)
*    Some new classes for network problems: TSP (traveling salesman problem), STAB (maximum graph stable set), MCP (maximum clique problem)
*    Improvements for FD XOR (and now it can handle many inputs)
*    Solver de has parameter "seed", also, now it works with PyPy
*    Function sign now is available in FuncDesigner
*    FuncDesigner interval analysis (and thus solver interalg) now can handle non-monotone splines of 1st order
*    FuncDesigner now can handle parameter fixedVars as Python dict
*    Now scipy InterpolatedUnivariateSpline is used in FuncDesigner interpolator() instead of UnivariateSpline. This creates backward incompatibility - you can no longer pass the smoothing parameter (s) to interpolator.
*    SpaceFuncs: add Point weight, Disk, Ball and method contains(), bugfix for importing Sphere, some new examples
*    Some improvements (essential speedup, new parameter interpolate for P()) for our (currently commercial) FuncDesigner Stochastic Programming addon
*    Some bugfixes

On our website ( http://openopt.org ) you can vote for the most required OpenOpt Suite development direction(s) (poll has been renewed, previous results are here).

Regards, D.

Monday, September 10, 2012

Upgrade your skill sets with free courses

We are in the midst of the Insight Age.  We have moved beyond capturing data and are now processing information.  Properly processing the large amounts of data requires knowledge and skill sets.  Fortunately there are many ways to develop those skills.

Class Central is a website that provides a complete list of free online courses from some of the most established and prestigious universities in the world.  Websites like these are helping to make the world smaller by providing free and accessible learning resources.

I am a big fan of open courseware.  There are plenty of other places to look for open courses.  The Open Courseware Consortium is a useful resource.  A good metasearch site like OpenCourseWare Finder is valuable as well.

Monday, July 2, 2012

Popularity of R continues

No doubt those that read my blog know that the tools I use to do my Industrial Engineering and Operations Research work rely heavily on the open source side of software.  That is why I try to support as many open source projects as I can, such as COIN-OR, GLPK, and OpenOpt.  One tool that I love for Applied Math and Statistics is the statistical computing platform R.  So it comes as no surprise that I like to see how R is growing in popularity among programmers.

A recent blog post from RedMonk presented the results of a programming language popularity study.  The study ranked popularity using common online sites such as Stack Overflow and GitHub.  These sites draw in a lot of programmers because of their popularity for Q&A and code review.  I was surprised to see that R ranks highly compared to some very prominent programming languages.


Also interesting to note is that the only other "Data Science" type of programming language I could find was Matlab.  As far as I can tell SAS, S, SPSS, and Stata are still rather popular, but apparently not among the programming community.

Friday, June 15, 2012

OpenOpt Suite 0.39

Hi all,

I'm glad to inform you about new OpenOpt release 0.39 (quarterly since 2007).

OpenOpt is free, even for commercial purposes, cross-platform software for mathematical modeling and (mainstream) optimization. Our website has reached 259 visitors daily, which is the same as tomopt.com and ~ 1/3 of gams.com ( details ).

In the new release:
  • interalg (medium-scaled solver with specifiable accuracy abs(f-f*) <= fTol): add categorical variables and general logical constraints, many other improvements
  • Some improvements for automatic differentiation
  • DerApproximator and some OpenOpt/FuncDesigner functionality now works with PyPy (Python with dynamic compilation, some problems are solved several times faster now)
  • New solver lsmr for dense/sparse LLSP (linear least squares)
  • Some bugfixes and some other changes
On our website (openopt.org) you can vote for the most required OpenOpt Suite development direction(s).

Monday, May 21, 2012

National Registry of Exonerations charts with R

According to recent news (dallasnews.com) there is a new release of a public national database for wrongful convictions.  There are plenty of details in the public list including Age, Race, and how the conviction was overturned.  According to the database it seems that most of the convictions were overturned due to DNA evidence.

I thought it would be interesting to plot summaries of the details using the open source statistical computing environment R Project.  The following are the plots from the National Registry of Exonerations database.



Here is the R code used to create the above pie charts.


# National Registry of Exonerations
# pie charts

library(XML)

u <- "http://www.law.umich.edu/special/exoneration/Pages/detaillist.aspx"

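# read every HTML table on the page into a list of data frames
listu <- readHTMLTable(u)

# the indices below match the page layout when this was written:
# table 7 is the listing, row 4 holds the column names, records start at row 24
exondf <- listu[[7]]
data <- exondf[24:nrow(exondf),]
names(data) <- as.character(unlist(exondf[4,]))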

# transform data
data$Age <- droplevels(data$Age)
data$Race <- droplevels(data$Race)
data$State <- droplevels(data$State)
data$Crime <- droplevels(data$Crime)
data$Sentence <- droplevels(data$Sentence)
data$Convicted <- droplevels(data$Convicted)
data$Exonerated <- droplevels(data$Exonerated)

data$AgeCNV <- as.numeric(as.character(data$Age))
data$ConvictedCNV <- as.numeric(as.character(data$Convicted))
data$ExoneratedCNV <- as.numeric(as.character(data$Exonerated))

data$AgeCNV_floor <- floor(data$AgeCNV/10)*10
data$ConfinedYrs <- data$ExoneratedCNV - data$ConvictedCNV
data$ConfinedYrs_floor <- floor(data$ConfinedYrs/5)*5

# plot pie charts

LABELS <- c("10-19","20-29","30-39","40-49","50-59","60-69","")
pie(table(data$AgeCNV_floor), labels=LABELS, main="Age Exonerated")

pie(table(data$Race), main="Race")

pie(tail(sort(table(data$State)),10), main="Top 10 States")

LABELS <- c("0-4","5-9","10-14","15-19","20-24","25-29","30-34","35+")
pie(table(data$ConfinedYrs_floor), labels=LABELS, main="Years Confined")
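One thing to watch in the code above is that the hand-typed LABELS vectors only line up with the pie slices if every bin actually shows up in the data.  A possible alternative, sketched here with made-up ages standing in for data$AgeCNV, is to let cut() build the bins and the labels together.  The names ages, breaks, age_labels, and age_bins are just for this example.

```r
# Hypothetical ages standing in for data$AgeCNV.
ages <- c(17, 22, 25, 31, 38, 44, 52, 67)

# Decade bins [10,20), [20,30), ..., [60,70) with matching "10-19" style labels.
breaks <- seq(10, 70, by = 10)
age_labels <- paste(head(breaks, -1), head(breaks, -1) + 9, sep = "-")

# right = FALSE makes each bin include its lower bound and exclude its upper bound.
age_bins <- cut(ages, breaks = breaks, labels = age_labels, right = FALSE)

# The table carries its own labels, so pie() needs no separate labels argument.
table(age_bins)
pie(table(age_bins), main = "Age")
```

Any value outside the breaks comes back as NA, so the breaks would need to be widened to cover the full range of the real data.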