Canberra IAPA Seminar – Text Analytics: Natural Language into Big Data – 17 February

Topic: Text Analytics: Natural Language into Big Data
Speaker: Dr. Leif Hanlen, Technology Director at NICTA
Date: Tuesday 17 February
Time: 5.30pm for a 6pm start
Cost: Nil
Where: SAS Offices, 12 Moore Street, Canberra, ACT 2600
Registration URL:

We outline several activities in NICTA relating to understanding and mining free text. Our approach is to develop agile service-focussed solutions that provide insight into large text corpora, and allow end users to incorporate current text documents into standard numerical analysis technologies.

Dr. Leif Hanlen is Technology Director at NICTA, Australia’s largest ICT research centre. Leif is also an adjunct Associate Professor of ICT at the Australian National University and an adjunct Professor of Health at the University of Canberra. He received a BEng (Hons I) in electrical engineering, a BSc (Comp Sci) and a PhD (telecomm) from the University of Newcastle, Australia. His research focusses on applications of machine learning to text processing.

Please feel free to forward this invite to your friends and colleagues who might be interested. Thanks.

Posted in Big Data, Data Mining | 5 Comments

Recordings of RStudio Webinar Series on Essential Tools for Data Science with R

by Yanchang Zhao

RStudio recently ran a series of live webinars on Essential Tools for Data Science with R, but the live sessions were inconvenient for people in other time zones to attend. Fortunately, the recordings have been made available online, so you can watch them if you missed the live webinars. The recordings are listed below.

1. The Grammar and Graphics of Data Science
– dplyr: a grammar of data manipulation – Hadley Wickham
– ggvis: Interactive graphics in R – Winston Chang
– URL:

2. Reproducible Reporting
– The Next Generation of R Markdown – Jeff Allen
– Knitr Ninja – Yihui Xie
– Packrat – A Dependency Management System for R – J.J. Allaire & Kevin Ushey
– URL:

3. Interactive Reporting
– Embedding Shiny Apps in R Markdown documents – Garrett Grolemund
– Shiny: R made interactive – Joe Cheng
– URL:

Posted in R | 4 Comments

R and Data Mining – Examples and Case Studies now in Chinese

My book, R and Data Mining – Examples and Case Studies, now has a Chinese edition, translated by researchers at South China University of Technology and published by China Machine Press in September 2014. It is sold in China only, at a price of RMB 49. If you are in China, this is an opportunity to get a copy of the book at a bargain price.

Details of the book are available at, and its original English version can be bought from Amazon at

Its first 11 chapters can be downloaded for free at, and R code and data for the book are available at

RDataMining book in Chinese

Posted in Data Mining, R | 2 Comments

R and Data Mining Workshop at AusDM 2014, Brisbane, 27 November

There will be a half-day workshop on R and Data Mining at the AusDM 2014 conference in Brisbane on the afternoon of Thursday 27 November. The workshop will consist of several sessions on data mining with R, including:

  • Introduction to Data Mining with R
  • Association Rule Mining with R
  • Text Mining with R — an Analysis of Twitter Data
  • Regression and Classification with R
  • Data Clustering with R

Examples of R code will be presented at all sessions. At the end of every session, attendees will have 10 to 15 minutes to practice with the provided R code on computers.

If you are interested in attending the workshop or AusDM 2014, you can still register for the conference by Wednesday 26 November at The workshop is included in conference registration.

If you cannot attend the conference, you can find the workshop details and download its slides at

Posted in Data Mining, R | Leave a comment

Slides of keynote speeches, tutorials and panelist presentations at IEEE Big Data 2014

Slides of keynote speeches, tutorials and panelist presentations at the 2014 IEEE International Conference on Big Data can be found on the conference website at the links below.

(1) Keynote speech
– Never-Ending Language Learning, Tom Mitchell – E. Fredkin University Professor, Machine Learning Department, Carnegie Mellon University
– Smart Data – How you and I will exploit Big Data for personalized digital health and many other activities, Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis – Wright State University
– Addressing Human Bottlenecks in Big Data, Joseph M. Hellerstein, Chancellor’s Professor of Computer Science, University of California, Berkeley and Trifacta

(2) Tutorials
– Big Data Stream Mining
Presenters: Gianmarco De Francisci Morales, Joao Gama, Albert Bifet, and Wei Fan
– Big ML Software for Modern ML Algorithms
Presenters: Eric P. Xing and Qirong Ho
– Large-scale Heterogeneous Learning in Big Data Analytics
Presenters: Jun Huan
– Big Data Benchmarking
Presenters: Chaitan Baru and Tilmann Rabl

(3) Panel: Big Data Challenges and Opportunities

Posted in Big Data, Data Mining | 1 Comment

Free Stanford online course on Statistical Learning (with R) starting on 19 Jan 2015

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). Since January 5, 2014, the PDF of this book has been available for free, with the consent of the publisher, on the book website.

Classes Start: Jan 19, 2015
Classes End: Apr 03, 2015
Course Staff: Prof. Trevor Hastie, Prof. Rob Tibshirani
Price: Free

Posted in Data Mining, R | 4 Comments

AusDM 2014 Conference Program

The program of the AusDM 2014 conference is now available at It features two keynote talks, one on Learning in Sequential Decision Problems by Prof Peter Bartlett from UC Berkeley, and the other on Making Sense of a Random World through Statistics by Prof Geoff McLachlan from the University of Queensland. It also has a half-day workshop on R and Data Mining, providing hands-on experience in data mining with R. In addition, there will be 24 presentations of accepted papers, covering machine learning, information retrieval, health & bioinformatics, collaborative filtering & recommendation, clustering, data fusion, record linkage and sensor networks.

See detailed conference program at and register for the conference at

Posted in Data Mining, R | 1 Comment