New book release: Data Mining Applications with R

Book title: Data Mining Applications with R
Editors: Yanchang Zhao, Yonghua Cen
Publisher: Elsevier
Publish date: December 2013
ISBN: 978-0-12-411511-8
Length: 514 pages
URL: http://www.rdatamining.com/books/dmar

An edited book titled Data Mining Applications with R, featuring 15 real-world applications of data mining with R, was released in December 2013.

Book preview on Google Books

R code, data and color figures for the book

Buy the book on
- Amazon
- Elsevier
- Google Books

Below is its table of contents.

  • Foreword
    Graham Williams
  • Chapter 1 Power Grid Data Analysis with R and Hadoop
    Terence Critchlow, Ryan Hafen, Tara Gibson and Kerstin Kleese van Dam
  • Chapter 2 Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization
    Giorgio Maria Di Nunzio and Alessandro Sordoni
  • Chapter 3 Discovery of emergent issues and controversies in Anthropology using text mining, topic modeling and social network analysis of microblog content
    Ben Marwick
  • Chapter 4 Text Mining and Network Analysis of Digital Libraries in R
    Eric Nguyen
  • Chapter 5 Recommendation systems in R
    Saurabh Bhatnagar
  • Chapter 6 Response Modeling in Direct Marketing: A Data Mining Based Approach for Target Selection
    Sadaf Hossein Javaheri, Mohammad Mehdi Sepehri and Babak Teimourpour
  • Chapter 7 Caravan Insurance Policy Customer Profile Modeling with R Mining
    Mukesh Patel and Mudit Gupta
  • Chapter 8 Selecting Best Features for Predicting Bank Loan Default
    Zahra Yazdani, Mohammad Mehdi Sepehri and Babak Teimourpour
  • Chapter 9 A Choquet Integral Toolbox and its Application in Customer’s Preference Analysis
    Huy Quan Vu, Gleb Beliakov and Gang Li
  • Chapter 10 A Real-Time Property Value Index based on Web Data
    Fernando Tusell, Maria Blanca Palacios, María Jesús Bárcena and Patricia Menéndez
  • Chapter 11 Predicting Seabed Hardness Using Random Forest in R
    Jin Li, Justy Siwabessy, Zhi Huang, Maggie Tran and Andrew Heap
  • Chapter 12 Supervised classification of images, applied to plankton samples using R and zooimage
    Kevin Denis and Philippe Grosjean
  • Chapter 13 Crime analyses using R
    Madhav Kumar, Anindya Sengupta and Shreyes Upadhyay
  • Chapter 14 Football Mining with R
    Maurizio Carpita, Marco Sandri, Anna Simonetto and Paola Zuccolotto
  • Chapter 15 Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization
    Emmanuel Herbert, Daniel Migault, Stephane Senecal, Stanislas Francfort and Maryline Laurent

Preview of book Data Mining Applications with R

An edited book titled Data Mining Applications with R, featuring 15 real-world applications of data mining with R, will be on the market soon. A preview of the book is available on Google Books. R code, data and color figures for the book can be downloaded at RDataMining.com.


Step by step to build my first R Hadoop System

by Yanchang Zhao, RDataMining.com

After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps I followed are presented at http://www.rdatamining.com/tutorials/rhadoop. Hopefully they will make it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on a Mac only, and some steps might be different on Windows.

Before going through the complex steps, you may want to have a look at what you can get with R and Hadoop. There is a video showing Wordcount MapReduce in R at http://www.youtube.com/watch?v=hSrW0Iwghtw.

If you are interested enough to try R on Hadoop, please follow the steps below. Details of each step are available at http://www.rdatamining.com/tutorials/rhadoop.

1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop
6. What’s Next

Enjoy MapReducing with R!


An excellent introduction to MapReduce and Hadoop

by Yanchang Zhao, RDataMining.com

The lectures in week 3 of the free online course Introduction to Data Science give an excellent introduction to MapReduce and Hadoop, and demonstrate with examples how to use MapReduce for various tasks, such as word frequency counting, matrix multiplication, simple social network analysis, and a join operation like the one in a relational database. There are also interesting comparisons with relational databases. The examples look simple, but they are scalable and can handle truly big data. The course also introduces NoSQL systems.
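To give a flavour of two of those tasks, here is a minimal single-machine sketch in Python of word frequency counting and a relational-style join, both written as a map step and a reduce step. It is my own illustration rather than code from the course, and it does not use Hadoop: the `map_reduce` helper, the mapper and reducer names, and the tiny "customer" and "order" tables are all hypothetical and only imitate the map, shuffle and reduce phases in memory.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Imitate the map, shuffle and reduce phases in memory (no Hadoop involved)."""
    groups = defaultdict(list)
    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: values that share a key are grouped together.
            groups[key].append(value)
    # Reduce phase: each key is processed together with all of its values.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


# Word frequency counting
def wc_mapper(line):
    for word in line.lower().split():
        yield word, 1


def wc_reducer(word, counts):
    yield word, sum(counts)


# Join of two tables on a shared key (hypothetical "customer" and "order" rows)
def join_mapper(row):
    table, key, payload = row
    # Tag each value with its table of origin so the reducer can tell them apart.
    yield key, (table, payload)


def join_reducer(key, tagged_values):
    customers = [p for t, p in tagged_values if t == "customer"]
    orders = [p for t, p in tagged_values if t == "order"]
    # Inner join: pair every customer with every order that has the same key.
    for name in customers:
        for item in orders:
            yield key, name, item


lines = ["big data with R", "R and Hadoop", "big data"]
word_counts = map_reduce(lines, wc_mapper, wc_reducer)

rows = [
    ("customer", 1, "Ann"),
    ("customer", 2, "Bob"),
    ("order", 1, "book"),
    ("order", 1, "pen"),
    ("order", 3, "lamp"),
]
joined = map_reduce(rows, join_mapper, join_reducer)

print(word_counts)
print(joined)
```

The point of the pattern is that the mappers and reducers only ever see one record or one key at a time, which is what lets a real framework spread the same logic over many machines.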

Although the course has been closed, all lecture videos can be accessed via the “Preview” button on the course page at the above link.

They are definitely worth watching if you want to get some idea about MapReduce and Hadoop.


CFP: the 11th Australasian Data Mining Conference (AusDM 2013), submission extended to 31 July

*********************************************************************
The 11th Australasian Data Mining Conference (AusDM 2013)
Canberra, Australia, 13-15 November 2013, http://ausdm13.togaware.com
Join us on LinkedIn: http://www.linkedin.com/groups/AusDM-4907891
*********************************************************************

Data mining, the art and science of intelligent analysis of (usually large) data sets for meaningful (and previously unknown) insights, is now being actively applied in industries including defence, medicine, science, financial services, customer analytics, government, insurance, telecommunications, retail and distribution, transportation, and utilities.

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. Since AusDM’02 the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Since 2006, all proceedings have been printed as volumes in the CRPIT series.

This year’s conference, AusDM’13, co-hosted with the Asian Conference on Machine Learning (ACML, http://acml2013.conference.nicta.com.au/), builds on this tradition of facilitating the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Industry Case Studies; Research Prototypes; Practical Analytics Technology; and Research Student Projects. AusDM’13 will be a meeting place for pushing forward the frontiers of data mining in industry and academia.

Publication and topics

We are calling for papers, both research and applications, and from both academia and industry, for presentation at the conference. All papers will go through double-blind peer review by a panel of international experts. Accepted papers will be published in an upcoming volume (Data Mining and Analytics 2013) of the Conferences in Research and Practice in Information Technology (CRPIT) series by the Australian Computer Society. The series is also held in full text in the ACM Digital Library, and the volume will be distributed at the conference. For more details on CRPIT please see http://www.crpit.com. Please note that at least one author of each accepted paper must register for the conference and present their work. Authors of selected papers will be invited to extend them for publication in the Journal of Research and Practice in Information Technology (http://www.jrpit.com).

AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:
- Applications and Case Studies — Lessons and Experiences
- Biomedical and Health Data Mining
- Business Analytics
- Computational Aspects of Data Mining
- Data Integration, Matching and Linkage
- Data Mining Education
- Data Preparation, Cleaning and Preprocessing
- Data Stream Mining
- Evaluation of Results and their Communication
- Implementations of Data Mining in Industry
- Integrating Domain Knowledge
- Link, Graph, Network and Process Mining
- Multimedia Data Mining
- New Data Mining Algorithms
- Professional Challenges in Data Mining
- Privacy-preserving Data Mining
- Spatial and Temporal Data Mining
- Text Mining and Web Mining
- Visual Analytics

Keynote speakers

As is tradition for AusDM, we have lined up an excellent keynote speaker program. Each speaker is a well-known researcher and/or practitioner in data mining and related disciplines. The keynote program provides an opportunity to hear from some of the world’s leaders on what the technology offers and where it is heading.

An international academic keynote presentation will be shared with the ACML conference. The two industry keynotes at AusDM 2013 will be:

- Klaus Felsche, Director Intent Management and Analytics at the Department of Immigration and Citizenship.
Title: TBC

- Dr Paul Wong, Director, Office of Research Excellence, The Australian National University.
Title: TBC (Predictive Network Analytics for Government Research Planning)

Submission of papers

We invite two types of submissions for AusDM 2013:

- Academic submissions: Normal academic submissions reporting on research progress, with a paper length of between 8 and 12 pages in CRPIT style, as detailed below. Academic submissions will go through a double-blind review process, i.e. paper submissions must NOT include authors' names or affiliations (nor acknowledgements referring to funding bodies). Self-citing references should also be removed from the submitted papers for the purpose of double-blind review (they can be added back after the review).

- Industry submissions: Submissions from governments and industry can report on specific data mining implementations and experiences. Submissions in this category can be between 4 and 8 pages in CRPIT style, as detailed below. These submissions do not need to be double-blind. A special committee made up of industry representatives will assess industry submissions.

Paper submissions are required to follow the general format specified for papers in the CRPIT series by the Australian Computer Society. Submission details are available from http://crpit.com/AuthorsSubmitting.html. LaTeX styles and Word templates may be found on this site. LaTeX is the recommended typesetting package.

The electronic submissions must be in PDF only, and made through the AusDM’13 Submission Page, which will be available at http://ausdm13.togaware.com/.

Important Dates

Submission of full papers:              31 July 2013 (midnight PST), extended from 15 July 2013
Notification of authors:                1 September 2013
Final version and author registration:  1 October 2013
Conference:                             13-15 November 2013

Organising Committee

Program Chairs (Academic)
Kok-Leong Ong, Deakin University, Melbourne
Lin Liu, University of South Australia, Adelaide

Program Chair (Industry)
Yanchang Zhao, Department of Immigration & Citizenship, Australia; and RDataMining.com

Conference Chairs
Peter Christen, The Australian National University, Canberra
Paul Kennedy, University of Technology, Sydney

Sponsorship Chair
Andrew Stranieri, University of Ballarat, Ballarat

Steering Committee Chairs
Simeon Simoff, University of Western Sydney
Graham Williams, Australian Taxation Office

Other Steering Committee Members
Peter Christen, The Australian National University, Canberra
Paul Kennedy, University of Technology, Sydney
Jiuyong Li, University of South Australia, Adelaide
Kok-Leong Ong, Deakin University, Melbourne
John Roddick, Flinders University, Adelaide
Andrew Stranieri, University of Ballarat, Ballarat
Geoff Webb, Monash University, Melbourne


Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government

Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government
in conjunction with PAKDD 2013, Gold Coast, Australia, April 14, 2013
http://dmapps2013.rdatamining.com

To attend the workshop, you need to register for PAKDD 2013 at http://pakdd2013.pakdd.org.

DMApps 2013 Workshop Program

8:30 – 8:40    Welcome and Introduction to the Workshop. Dr Warwick Graco and Dr Inna Kolyshkina

8:40 – 9:30    Keynote speech. Behavior Computing: Discovering Complex Behavior Intelligence. Prof. Longbing Cao

9:30 – 10:00   Real-time Television ROI Tracking using Mirrored Experimental Designs. Brendan Kitts

10:00 – 10:30 Coffee Break

10:30 – 11:00  Using Scan-Statistical Correlations for Network Change Analysis. Adriel Cheng, Peter Dickinson

11:00 – 11:30  Predicting High Impact Academic Papers Using Citation Network Features. Daniel McNamara, Paul Wong, Peter Christen and Kee Siong Ng

11:30 – 12:00  Combination of effective machine learning techniques and chemometric analysis for evaluation of Bupleuri Radix through high-performance thin-layer chromatographic. Xiaoping Cheng, Hongmin Cai, Ping He and Runtiao Tian

12:00 – 12:30  An OLAP Server for Sensor Networks using Augmented Statistics Trees. Neil Dunstan

12:30 – 13:00  Indirect information linkage for OSINT through authorship analysis of aliases. Robert Layton, Charles Perez, Babiga Birregah, Paul Watters and Marc Lemercier

13:00 – 14:00 Lunch

14:00 – 14:30 Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity Resolution. Banda Ramadan, Peter Christen, Huizhi Liang, David Hawking and Ross Gayler

14:30 – 15:00  Identifying dominant economic sectors and stock markets: A social network mining approach. Ram Babu Roy and Uttam Sarkar

15:00 – 15:30 Coffee Break

15:30 – 16:00  Ensemble Model of Artificial Neural Networks for Petroleum Reservoir Characterization. Fatai Anifowose, Jane Labadin and Abdulazeez Abdulraheem

16:00 – 16:30  A Comparison of Visualization Data Mining Methods for Kernel Smoothing Techniques for Cox Processes with Application To Spatial Decision Support Systems. David Rohde, Ruth Huang, Jonathan Corcoran and Gentry White

16:30 – 17:00  Parallel Sentiment Polarity Classification Method with Substring Feature Reduction. Ken Zhang and Lin Shang

17:00 – 17:30  On the Evaluation of the Homogeneous Ensembles with CV-passports. Aneesha Bakharia, Vladimir Nikulin and Tian-Hsiang Huang

17:30 – 18:00  Identifying Authoritative and Reliable Contents in Community Question Answering with Domain Knowledge. Lifan Guo and Xiaohua Hu


New book announcement: R and Data Mining – Examples and Case Studies

R and Data Mining: Examples and Case Studies
Author: Yanchang Zhao
Publisher: Academic Press, Elsevier
Publish date: December 2012
ISBN: 978-0-12-396963-7
Length: 256 pages
URL: http://www.rdatamining.com/books/rdm

This book introduces the use of R for data mining through examples and case studies. It contains 1) examples of decision trees, random forest, regression, clustering, outlier detection, time series analysis, association rules, text mining and social network analysis; and 2) three real-world case studies.

Table of Contents and Abstracts:
http://www.rdatamining.com/books/rdm/toc

R Code and Data for the book:
http://www.rdatamining.com/books/rdm/code

Sample pages on Google Books:
http://books.google.com.au/books?id=FEOh08LBD9UC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Buy the book on Amazon:
http://www.amazon.com/Data-Mining-Examples-Case-Studies/dp/0123969638
