Recordings of RStudio Webinar Series on Essential Tools for Data Science with R

by Yanchang Zhao

RStudio recently ran a series of live webinars on Essential Tools for Data Science with R, but the schedule was inconvenient for people in other time zones. Fortunately, the recordings are now available online, so you can watch them if you missed the live sessions. Below is a list of the recordings.

1. The Grammar and Graphics of Data Science
– dplyr: a grammar of data manipulation – Hadley Wickham
– ggvis: Interactive graphics in R – Winston Chang
– URL:

2. Reproducible Reporting
– The Next Generation of R Markdown – Jeff Allen
– Knitr Ninja – Yihui Xie
– Packrat – A Dependency Management System for R – J.J. Allaire & Kevin Ushey
– URL:

3. Interactive Reporting
– Embedding Shiny Apps in R Markdown documents – Garrett Grolemund
– Shiny: R made interactive – Joe Cheng
– URL:

R and Data Mining – Examples and Case Studies now in Chinese

My book, R and Data Mining – Examples and Case Studies, now has a Chinese edition, translated by researchers at South China University of Technology and published by China Machine Press in September 2014. It is sold in China only, at a price of RMB 49. If you are in China, this is an opportunity to get a copy of the book at a bargain price.

Details of the book are available at, and its original English version can be bought from Amazon at

Its first 11 chapters can be downloaded for free at, and the R code and data for the book are available at

R and Data Mining Workshop at AusDM 2014, Brisbane, 27 November

There will be a half-day workshop on R and Data Mining at the AusDM 2014 conference in Brisbane on the afternoon of Thursday, 27 November. The workshop will consist of several sessions on data mining with R, including:

  • Introduction to Data Mining with R
  • Association Rule Mining with R
  • Text Mining with R — an Analysis of Twitter Data
  • Regression and Classification with R
  • Data Clustering with R

R code examples will be presented in every session. At the end of each session, attendees will have 10 to 15 minutes to practice with the provided R code on computers.

If you are interested in attending the workshop or AusDM 2014, you can still register for the conference by Wednesday 26 November at The workshop is included in conference registration.

If you cannot attend the conference, you can find the workshop details and download its slides at

Slides of keynote speeches, tutorials and panelist presentations at IEEE Big Data 2014

Slides of the keynote speeches, tutorials and panelist presentations at the 2014 IEEE International Conference on Big Data are available on the conference website at the links below.

(1) Keynote speeches
– Never-Ending Language Learning, Tom Mitchell, E. Fredkin University Professor, Machine Learning Department, Carnegie Mellon University
– Smart Data: How you and I will exploit Big Data for personalized digital health and many other activities, Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis, Wright State University
– Addressing Human Bottlenecks in Big Data, Joseph M. Hellerstein, Chancellor’s Professor of Computer Science, University of California, Berkeley and Trifacta

(2) Tutorials
– Big Data Stream Mining
Presenters: Gianmarco De Francisci Morales, Joao Gama, Albert Bifet, and Wei Fan
– Big ML Software for Modern ML Algorithms
Presenters: Eric P. Xing and Qirong Ho
– Large-scale Heterogeneous Learning in Big Data Analytics
Presenter: Jun Huan
– Big Data Benchmarking
Presenters: Chaitan Baru and Tilmann Rabl

(3) Panel: Big Data Challenges and Opportunities

Free Stanford online course on Statistical Learning (with R) starting on 19 Jan 2015

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). As of January 5, 2014, the PDF of this book is available for free, with the consent of the publisher, on the book website.

Classes Start: Jan 19, 2015
Classes End: Apr 03, 2015
Course Staff: Prof. Trevor Hastie, Prof. Rob Tibshirani
Price: Free

AusDM 2014 Conference Program

The program of the AusDM 2014 conference is now available at It features two keynote talks: one on Learning in Sequential Decision Problems by Prof Peter Bartlett from UC Berkeley, and the other on Making Sense of a Random World through Statistics by Prof Geoff McLachlan from the University of Queensland. It also includes a half-day workshop on R and Data Mining, providing hands-on experience of data mining with R. In addition, there will be 24 presentations of accepted papers, covering machine learning, information retrieval, health & bioinformatics, collaborative filtering & recommendation, clustering, data fusion, record linkage and sensor networks.

See the detailed conference program at and register for the conference at

SBS documentary “The Age of Big Data”

by Yanchang Zhao

“Data is becoming a powerful and most valuable commodity in 21st century. It is leading to scientific insights and new ways of understanding human behaviour. Data can also make you rich. Very rich.”
— SBS documentary “The Age of Big Data”

Last Friday, SBS aired an interesting documentary, “The Age of Big Data”. It presented applications of data mining and big data in crime detection, medicine, financial markets, advertising and astronomy.

It opened with Los Angeles police officers driving a car with a laptop in front of them, which guided them to likely crime hotspots for the next 24 hours, as predicted by data mining models. University researchers had used similar models to predict earthquake aftershocks, and they are now applying such models to predict human behaviour and crime hotspots.

It then showed applications of DNA and genome analysis in medical diagnosis, the prediction of price variations for trading in financial markets, and the decision theory used by NASA to select the best of 35 billion possible Man-to-Mars missions. It also showed how data mining is used in advertising to predict what people might want to buy, sometimes picking up clues before people realize it themselves! It ended with an application in astronomy, where a telescope array is collecting 30 terabytes of data per second to unlock the secrets of the universe.

Although it says nothing about big data technologies such as Hadoop, it is an easy-to-understand introduction to data mining and big data for people who know little or nothing about them, such as your boss, family and friends. It provides an opportunity to educate them and let them know what you are doing.

The video is available on SBSonDemand at You can also find it at

Again, I love the statement given at the very beginning of this post, and am looking forward to getting rich one day. :-)
