R code and data for the book “R and Data Mining: Examples and Case Studies” are now available at http://www.rdatamining.com/books/rdm/code. An online PDF version of the book (the first 11 chapters only) can also be downloaded at http://www.rdatamining.com/docs.

Below are its details and table of contents.

Book title: **R and Data Mining: Examples and Case Studies**

Author: Yanchang Zhao

Publisher: Elsevier

Publish date: December 2012

ISBN: 978-0-12-396963-7

234 pages

URL: http://www.rdatamining.com/books/rdm

**Table of Contents**

1 Introduction

1.1 Data Mining

1.2 R

1.3 Datasets

1.3.1 The Iris Dataset

1.3.2 The Bodyfat Dataset

2 Data Import and Export

2.1 Save and Load R Data

2.2 Import from and Export to .CSV Files

2.3 Import Data from SAS

2.4 Import/Export via ODBC

2.4.1 Read from Databases

2.4.2 Output to and Input from EXCEL Files

3 Data Exploration

3.1 Have a Look at Data

3.2 Explore Individual Variables

3.3 Explore Multiple Variables

3.4 More Explorations

3.5 Save Charts into Files

4 Decision Trees and Random Forest

4.1 Decision Trees with Package party

4.2 Decision Trees with Package rpart

4.3 Random Forest

5 Regression

5.1 Linear Regression

5.2 Logistic Regression

5.3 Generalized Linear Regression

5.4 Non-linear Regression

6 Clustering

6.1 The k-Means Clustering

6.2 The k-Medoids Clustering

6.3 Hierarchical Clustering

6.4 Density-based Clustering

7 Outlier Detection

7.1 Univariate Outlier Detection

7.2 Outlier Detection with LOF

7.3 Outlier Detection by Clustering

7.4 Outlier Detection from Time Series

7.5 Discussions

8 Time Series Analysis and Mining

8.1 Time Series Data in R

8.2 Time Series Decomposition

8.3 Time Series Forecasting

8.4 Time Series Clustering

8.4.1 Dynamic Time Warping

8.4.2 Synthetic Control Chart Time Series Data

8.4.3 Hierarchical Clustering with Euclidean Distance

8.4.4 Hierarchical Clustering with DTW Distance

8.5 Time Series Classification

8.5.1 Classification with Original Data

8.5.2 Classification with Extracted Features

8.5.3 k-NN Classification

8.6 Discussions

8.7 Further Readings

9 Association Rules

9.1 Basics of Association Rules

9.2 The Titanic Dataset

9.3 Association Rule Mining

9.4 Removing Redundancy

9.5 Interpreting Rules

9.6 Visualizing Association Rules

9.7 Discussions and Further Readings

10 Text Mining

10.1 Retrieving Text from Twitter

10.2 Transforming Text

10.3 Stemming Words

10.4 Building a Term-Document Matrix

10.5 Frequent Terms and Associations

10.6 Word Cloud

10.7 Clustering Words

10.8 Clustering Tweets

10.8.1 Clustering Tweets with the k-means Algorithm

10.8.2 Clustering Tweets with the k-medoids Algorithm

10.9 Packages, Further Readings and Discussions

11 Social Network Analysis

11.1 Network of Terms

11.2 Network of Tweets

11.3 Two-Mode Network

11.4 Discussions and Further Readings

12 Case Study I: Analysis and Forecasting of House Price Indices

12.1 Importing HPI Data

12.2 Exploration of HPI Data

12.3 Trend and Seasonal Components of HPI

12.4 HPI Forecasting

12.5 The Estimated Price of a Property

12.6 Discussion

13 Case Study II: Customer Response Prediction and Profit Optimization

13.1 Introduction

13.2 The Data of KDD Cup 1998

13.3 Data Exploration

13.4 Training Decision Trees

13.5 Model Evaluation

13.6 Selecting the Best Tree

13.7 Scoring

13.8 Discussions and Conclusions

14 Case Study III: Predictive Modeling of Big Data with Limited Memory

14.1 Introduction

14.2 Methodology

14.3 Data and Variables

14.4 Random Forest

14.5 Memory Issue

14.6 Train Models on Sample Data

14.7 Build Models with Selected Variables

14.8 Scoring

14.9 Print Rules

14.9.1 Print Rules in Text

14.9.2 Print Rules for Scoring with SAS

14.10 Conclusions and Discussion

15 Online Resources

15.1 R Reference Cards

15.2 R

15.3 Data Mining

15.4 Data Mining with R

15.5 Classification/Prediction with R

15.6 Time Series Analysis with R

15.7 Association Rule Mining with R

15.8 Spatial Data Analysis with R

15.9 Text Mining with R

15.10 Social Network Analysis with R

15.11 Data Cleansing and Transformation with R

15.12 Big Data and Parallel Computing with R

R Reference Card for Data Mining

Bibliography

General Index

Package Index

Function Index

Very interesting. I wrote a post on my blog about it, with a link to your blog post.

An excellent book!

I followed Chap. 10 and it’s great!! I tried my own data. Everything was explained and I could follow it, even though I’m a beginner in R. The only difference was that the term-document matrix only showed me the dimensions, entries, sparsity, maximal term length, and weighting, but not the matrix itself 😦 but it works!!!

However, at the beginning of Chap. 11, the code

> # change it to a Boolean matrix

> termDocMatrix[termDocMatrix>=1] <- 1

produced the error in `[<-.simple_sparse_array`(as.simple_sparse_array(x), …, value = value):

Only numeric subscripting is implemented.

Could you please point me in the right direction?

Thanks for the book!!

SB

Sorry, I forgot that “inspect(termDocumentMatrix)” displays the matrix contents.

Best

S

I also have this problem….

“termDocMatrix” is an ordinary matrix, not a term-document matrix created with package tm.

Referring to section 10.7, “myTdm2” is a term-document matrix, and then it is converted into an ordinary matrix “m2” with as.matrix(). After that, “m2” is used as input in section 11.1 for social network analysis.

> m2 <- as.matrix(myTdm2)

Therefore, to use your own data, you need to convert it with as.matrix() first, before running code for social network analysis in chapter 11.
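The conversion step described above can be sketched in base R as follows. This is a minimal illustration, not the book's code verbatim: the toy matrix stands in for the result of as.matrix(myTdm2), its term names and counts are made up, and the final term-term product is only one assumed way of preparing an ordinary matrix for network analysis.

```r
# Toy term counts standing in for as.matrix(myTdm2): terms in rows, documents in columns.
# The term names and counts are made up for illustration.
m2 <- matrix(c(2, 0, 1,
               0, 3, 1,
               1, 1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("data", "mining", "r"),
                             c("doc1", "doc2", "doc3")))

# With a real term-document matrix from package tm, this step would be:
#   m2 <- as.matrix(myTdm2)
stopifnot(is.matrix(m2))

# Logical subscripting works on an ordinary matrix,
# so the Boolean conversion no longer raises the error.
termDocMatrix <- m2
termDocMatrix[termDocMatrix >= 1] <- 1

# Term-term matrix: each cell counts the documents in which two terms co-occur.
termMatrix <- termDocMatrix %*% t(termDocMatrix)
print(termMatrix)
```

The key point is that the assignment with a logical index is applied to the output of as.matrix(), never to the sparse term-document object itself.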

Excellent book for beginners! I followed Case Study I: Analysis and Forecasting of House Price Indices. Has anyone tried this? Any ideas on how to analyse the property market with other factors not mentioned in the chapter, such as the economic environment, population size, or CPI (Consumer Price Index)?

You would need a different approach for that, such as regression or classification, or even time series forecasting with regression.
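As a rough illustration of the regression route, the sketch below fits a linear model of an index on two external factors. All data is simulated, and the column names (hpi, cpi, population) and coefficients are hypothetical; none of it comes from the book's HPI data.

```r
set.seed(1)
n <- 60  # number of simulated months

# Simulated explanatory variables; the values are synthetic, for illustration only.
cpi        <- 100 + cumsum(rnorm(n, mean = 0.2, sd = 0.1))
population <- 1000 + cumsum(rnorm(n, mean = 1, sd = 0.3))

# Simulated index that depends on both factors plus noise.
hpi <- 50 + 0.8 * cpi + 0.05 * population + rnorm(n, sd = 0.5)

df <- data.frame(hpi = hpi, cpi = cpi, population = population)

# Linear regression of the index on the external factors.
fit <- lm(hpi ~ cpi + population, data = df)
print(summary(fit))

# Predict the index for a new, hypothetical combination of factor values.
new_obs <- data.frame(cpi = 115, population = 1070)
pred <- predict(fit, newdata = new_obs)
```

With real data, trending factors such as CPI and population are often strongly correlated with each other, so the individual coefficients should be read with care.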

Thank you so much for the response! I have just started learning, and the book has been really helpful.

Still on Case Study I: the rows in the data consist of a date (month) and a house price index, so how do I get the prices of houses in all the months? An example was given of a house that was sold at $535,000 in September 2009, which was used to predict the price in the next 2 months, resulting in $616,083. How do I test the correctness of the predicted price ($616,083)? If I knew the price in those months, I could cut back the data and predict for a date where I know the price. That way, I could compare my predicted price with the actual price. Does anyone have an idea of how I can get the house price in each month, and of other ways (such as 1 month compared to the previous 2) of creating different models to compare against my test data, so as to find the best model?

I would be glad if I could get any more details on how to manipulate the house price indices dataset.
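The hold-out idea described above can be sketched on the index series itself, since the data contains index values rather than individual prices. This is a minimal sketch under stated assumptions: the series is synthetic, the ARIMA order is an arbitrary choice rather than the model used in the book, and scale_price() is a hypothetical helper that assumes a price moves in proportion to the index.

```r
# Synthetic monthly index, made up for illustration; replace with the real HPI series.
set.seed(42)
months <- 1:72
hpi <- ts(100 + 0.5 * months + 3 * sin(2 * pi * months / 12) + rnorm(72, sd = 0.5),
          start = c(2005, 1), frequency = 12)

# Cut back the data: keep the last 6 months as test data with known values.
train <- window(hpi, end = c(2010, 6))
test  <- window(hpi, start = c(2010, 7))

# Fit on the training part only; the model order is an assumption.
fit <- arima(train, order = c(1, 1, 0),
             seasonal = list(order = c(0, 1, 0), period = 12))
pred <- predict(fit, n.ahead = length(test))$pred

# Compare forecasts with the held-out values (mean absolute percentage error).
mape <- mean(abs((as.numeric(test) - as.numeric(pred)) / as.numeric(test))) * 100
print(mape)

# Hypothetical helper: scale a known sale price by the ratio of two index values.
scale_price <- function(price, index_at_sale, index_at_target) {
  price * index_at_target / index_at_sale
}
```

Fitting several candidate models on the same training window and comparing their errors on the same test window gives a simple way to choose between them.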