A Coursera course on Machine Learning starts on 16 June

A 10-week course on Machine Learning by Andrew Ng from Stanford University will start on Coursera on 16 June. Below is the course description from Coursera.

The course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI).

The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

See details and join the course at http://www.coursera.org/course/ml


CFP: AusDM 2014 – the 12th Australasian Data Mining Conference

*********************************************************
12th Australasian Data Mining Conference (AusDM 2014)
Brisbane, Australia
27-28 November 2014
http://ausdm14.ausdm.org/
*********************************************************

Data Mining is the art and science of intelligent analysis of (usually big) data sets for meaningful insights. Data mining is actively applied across all industries including defence, medicine, science, finance, customer relationship management, government, insurance, telecommunications, retail and distribution, transportation, and utilities.

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. Since AusDM’02 the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Since 2006, all proceedings have been printed as volumes in the CRPIT series.

This year’s conference, AusDM’14, builds on this tradition of facilitating the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Industry Case Studies; Research Prototypes; Practical Analytics Technology; and Research Student Projects. AusDM’14 will be a meeting place for pushing forward the frontiers of data mining in industry and academia. We have lined up an excellent keynote speaker program.

Publication and topics
======================

We are calling for papers, both research and applications, and from both academia and industry, for publication and presentation at the conference. All papers will go through peer review by a panel of international experts. Accepted papers will be published in an upcoming volume (Data Mining and Analytics 2014) of the Conferences in Research and Practice in Information Technology (CRPIT) series by the Australian Computer Society, which is also held in full text on the ACM Digital Library. The electronic proceedings will be distributed at the conference. For more details on CRPIT please see http://www.crpit.com.

This year we are introducing a new track, “Industry Showcase”, for industry participants to present state-of-the-art analytics projects. These submissions can be in a non-academic style and will be for presentation only. These case studies and data mining experiences will not be included in the conference proceedings.

Please note that at least one author of each accepted paper must register for the conference and present the work.

AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:

- Applications and Case Studies | Lessons and Experiences
- Big Data Analytics
- Biomedical and Health Data Mining
- Business Analytics
- Computational Aspects of Data Mining
- Data Integration, Matching and Linkage
- Data Mining Education
- Data Mining in Security and Surveillance
- Data Preparation, Cleaning and Preprocessing
- Data Stream Mining
- Evaluation of Results and their Communication
- Implementations of Data Mining in Industry
- Integrating Domain Knowledge
- Link, Tree, Graph, Network and Process Mining
- Multimedia Data Mining
- New Data Mining Algorithms
- Professional Challenges in Data Mining
- Privacy-preserving Data Mining
- Social Network and Social Media Mining
- Spatial and Temporal Data Mining
- Text Mining
- Visual Analytics
- Web Mining and Personalization

Submission of papers
====================

We invite three types of submissions for AusDM 2014:

- Research Track:
Normal academic submissions reporting on research progress, with a paper length of between 8 and 12 pages in CRPIT style, as detailed below. For academic submissions we will use a double-blind review process, i.e. paper submissions must NOT include authors' names, affiliations or acknowledgments referring to funding bodies. Self-citing references should also be removed from submitted papers for double-blind reviewing. This information can be added after the review.

- Application Track:
Submissions on specific data mining implementations and experiences in government and industry settings. Submissions in this category can be between 4 and 8 pages in CRPIT style, as detailed below. A committee made up of a mix of academic and industry representatives will review these submissions.

- Industry Showcase:
Submissions in this track are for presentation only. In this track, government and industry participants can present case studies and their experiences without worrying about publication. An extended abstract of up to two pages is required to assess these submissions. A special committee made up of industry representatives will review these submissions.

Paper submissions in Research and Application tracks are required to follow the general format specified for papers in the CRPIT series by the Australian Computer Society. Submission details are available from http://crpit.com/AuthorsSubmitting.html. LaTeX styles and Word templates may be found on this site. LaTeX is the recommended typesetting package.

Electronic submissions must be in PDF only and made through the AusDM’14 Submission Page at https://www.easychair.org/conferences/?conf=ausdm2014.

Important Dates
===============

Submission of abstracts: 28 July 2014
Submission of full papers: 4 August 2014 (midnight PST)
Notification of authors: 22 September 2014
Final version and author registration: 14 October 2014
Conference: 27-28 November 2014

Organising Committee
====================

Program Chairs (Research)
Lin Liu, University of South Australia, Adelaide
Xue Li, University of Queensland, Brisbane, Australia

Program Chairs (Application)
Yanchang Zhao, Department of Immigration & Border Protection, Australia; and RDataMining.com
Kok-Leong Ong, Deakin University, Melbourne

Conference Chairs
Richi Nayak, Queensland University of Technology, Brisbane, Australia
Paul Kennedy, University of Technology, Sydney

Sponsorship Chair
Andrew Stranieri, University of Ballarat, Ballarat

Local Chair
Yue Xu, Brisbane, Australia

Steering Committee Chairs
Simeon Simoff, University of Western Sydney
Graham Williams, Australian Taxation Office

Other Steering Committee Members
Peter Christen, The Australian National University, Canberra
Paul Kennedy, University of Technology, Sydney
Jiuyong Li, University of South Australia, Adelaide
Kok-Leong Ong, Deakin University, Melbourne
John Roddick, Flinders University, Adelaide
Andrew Stranieri, University of Ballarat, Ballarat
Geoff Webb, Monash University, Melbourne

Join us on LinkedIn
===================
http://www.linkedin.com/groups/AusDM-4907891


Multidimensional Scaling (MDS) with R

This page shows Multidimensional Scaling (MDS) with R. It demonstrates MDS with an example of automatically laying out Australian cities based on the distances between them. The layout obtained with MDS is very close to the cities' locations on a map.

First, the distances between 8 cities in Australia are loaded from http://rosetta.reltech.org/TC/v15/Mapping/data/dist-Aus.csv.

dist.au <- read.csv("http://rosetta.reltech.org/TC/v15/Mapping/data/dist-Aus.csv")

Alternatively, we can download the file first and then read it into R from a local drive.

dist.au <- read.csv("dist-Aus.csv")
dist.au
##    X    A   AS    B    D    H    M    P    S
## 1  A    0 1328 1600 2616 1161  653 2130 1161
## 2 AS 1328    0 1962 1289 2463 1889 1991 2026
## 3  B 1600 1962    0 2846 1788 1374 3604  732
## 4  D 2616 1289 2846    0 3734 3146 2652 3146
## 5  H 1161 2463 1788 3734    0  598 3008 1057
## 6  M  653 1889 1374 3146  598    0 2720  713
## 7  P 2130 1991 3604 2652 3008 2720    0 3288
## 8  S 1161 2026  732 3146 1057  713 3288    0
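If neither the URL nor a local copy is available, one workaround (not part of the original post) is to type the table in by hand from the values printed above. The sketch below assumes the CSV layout shown in the output: city acronyms in a column named X, followed by one numeric column per city. The object names `cities`, `d` and `m` are our own.

```r
# Distances between 8 Australian cities, copied from the printed output above.
# Assumption: this mirrors the CSV layout (acronyms in column X, then one column per city).
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- c(   0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
       1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
       1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
       2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
       1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
        653, 1889, 1374, 3146,  598,    0, 2720,  713,
       2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
       1161, 2026,  732, 3146, 1057,  713, 3288,    0)
# fill row by row so each line above becomes one row of the matrix
m <- matrix(d, nrow = 8, byrow = TRUE, dimnames = list(NULL, cities))
dist.au <- data.frame(X = cities, m, stringsAsFactors = FALSE)
dist.au
```

The rest of the post then works unchanged, starting from setting the row names.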

Then we set the first column, which holds the city acronyms, as row names and remove it from the data.

row.names(dist.au) <- dist.au[, 1]
dist.au <- dist.au[, -1]
dist.au
##       A   AS    B    D    H    M    P    S
## A     0 1328 1600 2616 1161  653 2130 1161
## AS 1328    0 1962 1289 2463 1889 1991 2026
## B  1600 1962    0 2846 1788 1374 3604  732
## D  2616 1289 2846    0 3734 3146 2652 3146
## H  1161 2463 1788 3734    0  598 3008 1057
## M   653 1889 1374 3146  598    0 2720  713
## P  2130 1991 3604 2652 3008 2720    0 3288
## S  1161 2026  732 3146 1057  713 3288    0

After that, we run Multidimensional Scaling (MDS) with the function cmdscale() and get the x and y coordinates.

fit <- cmdscale(dist.au, eig = TRUE, k = 2)
x <- fit$points[, 1]
y <- fit$points[, 2]
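For readers curious about what cmdscale() computes, below is a hand-written sketch of classical (Torgerson) MDS: square the distances, double-centre them, and scale the top-k eigenvectors by the square roots of their eigenvalues. The function name `classical_mds` is our own, and the 3 x 4 rectangle is made-up data used only to check that the pairwise distances are recovered.

```r
# Classical (Torgerson) MDS written out by hand, for comparison with cmdscale().
# D: a symmetric matrix of pairwise distances; k: number of output dimensions.
classical_mds <- function(D, k = 2) {
  D <- as.matrix(D)
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)   # centring matrix I - 11'/n
  B <- -0.5 * J %*% (D^2) %*% J        # double-centred squared distances
  e <- eigen(B, symmetric = TRUE)      # eigenvalues in decreasing order
  # scale the top-k eigenvectors by the square roots of their eigenvalues
  pts <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), k)
  rownames(pts) <- rownames(D)
  pts
}

# Toy check: the four corners of a 3 x 4 rectangle. A 2-D MDS should
# reproduce their pairwise distances (up to rotation and reflection).
toy <- rbind(c(0, 0), c(3, 0), c(3, 4), c(0, 4))
D.toy <- as.matrix(dist(toy))
p <- classical_mds(D.toy, k = 2)
all.equal(as.vector(dist(p)), as.vector(dist(toy)))
# [1] TRUE
```

Applied to the city data, `classical_mds(dist.au)` should agree with `fit$points` up to the sign of each column, because eigenvectors are only defined up to sign.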

Then we visualise the result, which shows that the positions of the cities are very close to their relative locations on a map.

plot(x, y, pch = 19, xlim = range(x) + c(0, 600))
city.names <- c("Adelaide", "Alice Springs", "Brisbane", "Darwin", "Hobart", 
    "Melbourne", "Perth", "Sydney")
text(x, y, pos = 4, labels = city.names)

 

Figure: MDS layout of the eight Australian cities.

Flipping both the x- and y-axes moves Darwin and Brisbane to the top (north), which makes the layout easier to compare with a map.

x <- -x
y <- -y
plot(x, y, pch = 19, xlim = range(x) + c(0, 600))
text(x, y, pos = 4, labels = city.names)

 

Figure: MDS layout of the cities after flipping both axes.
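Flipping the axes is legitimate because an MDS layout is only determined up to rotation and reflection: these transformations preserve every pairwise distance, and distances are all that MDS uses. A small sketch on a made-up random layout (not the city data) illustrates this; the helper `rotate` is our own.

```r
# Rotations and reflections leave all pairwise distances unchanged,
# so MDS cannot distinguish between the transformed layouts.
rotate <- function(p, theta) {
  # 2 x 2 rotation matrix, filled column by column
  R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
  p %*% R
}

set.seed(42)
pts <- matrix(rnorm(16), ncol = 2)      # a hypothetical 8-point layout
flipped <- -pts                          # same flip as x <- -x; y <- -y
mirrored <- cbind(-pts[, 1], pts[, 2])   # flip the x-axis only
turned <- rotate(pts, pi / 5)            # an arbitrary rotation

all.equal(as.vector(dist(pts)), as.vector(dist(flipped)))   # TRUE
all.equal(as.vector(dist(pts)), as.vector(dist(mirrored)))  # TRUE
all.equal(as.vector(dist(pts)), as.vector(dist(turned)))    # TRUE
```

Flipping both axes, as done above, is the same as a 180-degree rotation, so the cities keep their relative positions while north ends up at the top.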

MDS is also implemented in the igraph package as layout.mds.

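library(igraph)
# a complete graph with one vertex per city
g <- graph.full(nrow(dist.au))
V(g)$label <- city.names
# MDS layout computed from the distance matrix rather than the graph structure
# (newer igraph versions also provide make_full_graph() and layout_with_mds())
layout <- layout.mds(g, dist = as.matrix(dist.au))
plot(g, layout = layout, vertex.size = 3)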

Figure: MDS layout of the cities produced with igraph.

 


New book release: Data Mining Applications with R

Book title: Data Mining Applications with R
Editors: Yanchang Zhao, Yonghua Cen
Publisher: Elsevier
Publish date: December 2013
ISBN: 978-0-12-411511-8
Length: 514 pages
URL: http://www.rdatamining.com/books/dmar

An edited book titled Data Mining Applications with R was released in December 2013. It features 15 real-world applications of data mining with R.

Book preview on Google Books

R code, data and color figures for the book

Buy the book on
Amazon
Elsevier
Google Books

Below is its table of contents.

  • Foreword
    Graham Williams
  • Chapter 1 Power Grid Data Analysis with R and Hadoop
    Terence Critchlow, Ryan Hafen, Tara Gibson and Kerstin Kleese van Dam
  • Chapter 2 Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization
    Giorgio Maria Di Nunzio and Alessandro Sordoni
  • Chapter 3 Discovery of emergent issues and controversies in Anthropology using text mining, topic modeling and social network analysis of microblog content
    Ben Marwick
  • Chapter 4 Text Mining and Network Analysis of Digital Libraries in R
    Eric Nguyen
  • Chapter 5 Recommendation systems in R
    Saurabh Bhatnagar
  • Chapter 6 Response Modeling in Direct Marketing: A Data Mining Based Approach for Target Selection
    Sadaf Hossein Javaheri, Mohammad Mehdi Sepehri and Babak Teimourpour
  • Chapter 7 Caravan Insurance Policy Customer Profile Modeling with R Mining
    Mukesh Patel and Mudit Gupta
  • Chapter 8 Selecting Best Features for Predicting Bank Loan Default
    Zahra Yazdani, Mohammad Mehdi Sepehri and Babak Teimourpour
  • Chapter 9 A Choquet Integral Toolbox and its Application in Customer’s Preference Analysis
    Huy Quan Vu, Gleb Beliakov and Gang Li
  • Chapter 10 A Real-Time Property Value Index based on Web Data
    Fernando Tusell, Maria Blanca Palacios, María Jesús Bárcena and Patricia Menéndez
  • Chapter 11 Predicting Seabed Hardness Using Random Forest in R
    Jin Li, Justy Siwabessy, Zhi Huang, Maggie Tran and Andrew Heap
  • Chapter 12 Supervised classification of images, applied to plankton samples using R and zooimage
    Kevin Denis and Philippe Grosjean
  • Chapter 13 Crime analyses using R
    Madhav Kumar, Anindya Sengupta and Shreyes Upadhyay
  • Chapter 14 Football Mining with R
    Maurizio Carpita, Marco Sandri, Anna Simonetto and Paola Zuccolotto
  • Chapter 15 Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization
    Emmanuel Herbert, Daniel Migault, Stephane Senecal, Stanislas Francfort and Maryline Laurent

Preview of book Data Mining Applications with R

An edited book titled Data Mining Applications with R will be on the market soon. It features 15 real-world applications of data mining with R. A preview of the book is available on Google Books. R code, data and color figures for the book can be downloaded at RDataMining.com.


Step by step to build my first R Hadoop System

by Yanchang Zhao, RDataMining.com

After reading documents and tutorials on MapReduce and Hadoop and experimenting with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps to achieve that are presented at http://www.rdatamining.com/big-data/rhadoop. I hope this makes it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on a Mac only; some steps might be different on Windows.

Before going through the complex steps, you may want to have a look at what you can get with R and Hadoop. There is a video showing Wordcount MapReduce in R at http://www.youtube.com/watch?v=hSrW0Iwghtw.

If you are interested enough to try R on Hadoop, please follow the steps below, whose details are available at http://www.rdatamining.com/big-data/rhadoop.

1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop
6. What’s Next

Enjoy MapReducing with R!


An excellent introduction to MapReduce and Hadoop

by Yanchang Zhao, RDataMining.com

The lectures in week 3 of a free online course, Introduction to Data Science, give an excellent introduction to MapReduce and Hadoop, and demonstrate with examples how to use MapReduce for various tasks, such as word frequency counting, matrix multiplication, simple social network analysis, and a join operation as in a relational database. There are also interesting comparisons with relational databases. The examples look simple, but they are scalable and can handle really Big Data. The course also introduces NoSQL systems.
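To make the word-counting example concrete, below is an in-memory sketch of the three MapReduce phases in plain R. It is only an illustration under our own assumptions: no Hadoop is involved, the function names `map_words`, `shuffle` and `reduce_counts` are hypothetical, and this is not the RHadoop API. On a real cluster, Hadoop performs the shuffle and runs the map and reduce calls in parallel across machines.

```r
# Map: emit a (word, 1) pair for every word in one line of text.
map_words <- function(line) {
  words <- tolower(unlist(strsplit(line, "[^[:alpha:]]+")))
  words <- words[words != ""]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Shuffle: group the emitted values by key (done by the framework in Hadoop).
shuffle <- function(pairs) {
  keys <- vapply(pairs, `[[`, character(1), "key")
  vals <- lapply(pairs, `[[`, "value")
  split(vals, keys)
}

# Reduce: sum all the counts collected for one word.
reduce_counts <- function(key, values) sum(unlist(values))

docs <- c("the quick brown fox", "The lazy dog", "the fox")
# run the map step on every line and flatten into one list of pairs
pairs <- unlist(lapply(docs, map_words), recursive = FALSE)
groups <- shuffle(pairs)
counts <- mapply(reduce_counts, names(groups), groups)
counts   # "the" appears 3 times, "fox" twice, every other word once
```

Each map call only sees one line and each reduce call only sees one key, which is what allows Hadoop to spread both phases over many machines.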

Although the course has been closed, all lecture videos can be accessed via the “Preview” button on the course page at the above link.

They are definitely worth watching if you want to get some idea about MapReduce and Hadoop.
