R code and data for book “R and Data Mining: Examples and Case Studies”

The R code and data for the book “R and Data Mining: Examples and Case Studies” are now available at http://www.rdatamining.com/books/rdm/code. An online PDF version of the book (the first 11 chapters only) can also be downloaded at http://www.rdatamining.com/docs.

Below are its details and table of contents.

Book title: R and Data Mining: Examples and Case Studies
Author: Yanchang Zhao
Publisher: Elsevier
Publication date: December 2012
ISBN: 978-0-123-96963-7
234 pages
URL: http://www.rdatamining.com/books/rdm

Table of Contents
1 Introduction
1.1 Data Mining
1.2 R
1.3 Datasets
1.3.1 The Iris Dataset
1.3.2 The Bodyfat Dataset

2 Data Import and Export
2.1 Save and Load R Data
2.2 Import from and Export to .CSV Files
2.3 Import Data from SAS
2.4 Import/Export via ODBC
2.4.1 Read from Databases
2.4.2 Output to and Input from EXCEL Files

3 Data Exploration
3.1 Have a Look at Data
3.2 Explore Individual Variables
3.3 Explore Multiple Variables
3.4 More Explorations
3.5 Save Charts into Files

4 Decision Trees and Random Forest
4.1 Decision Trees with Package party
4.2 Decision Trees with Package rpart
4.3 Random Forest

5 Regression
5.1 Linear Regression
5.2 Logistic Regression
5.3 Generalized Linear Regression
5.4 Non-linear Regression

6 Clustering
6.1 The k-Means Clustering
6.2 The k-Medoids Clustering
6.3 Hierarchical Clustering
6.4 Density-based Clustering

7 Outlier Detection
7.1 Univariate Outlier Detection
7.2 Outlier Detection with LOF
7.3 Outlier Detection by Clustering
7.4 Outlier Detection from Time Series
7.5 Discussions

8 Time Series Analysis and Mining
8.1 Time Series Data in R
8.2 Time Series Decomposition
8.3 Time Series Forecasting
8.4 Time Series Clustering
8.4.1 Dynamic Time Warping
8.4.2 Synthetic Control Chart Time Series Data
8.4.3 Hierarchical Clustering with Euclidean Distance
8.4.4 Hierarchical Clustering with DTW Distance
8.5 Time Series Classification
8.5.1 Classification with Original Data
8.5.2 Classification with Extracted Features
8.5.3 k-NN Classification
8.6 Discussions
8.7 Further Readings

9 Association Rules
9.1 Basics of Association Rules
9.2 The Titanic Dataset
9.3 Association Rule Mining
9.4 Removing Redundancy
9.5 Interpreting Rules
9.6 Visualizing Association Rules
9.7 Discussions and Further Readings

10 Text Mining
10.1 Retrieving Text from Twitter
10.2 Transforming Text
10.3 Stemming Words
10.4 Building a Term-Document Matrix
10.5 Frequent Terms and Associations
10.6 Word Cloud
10.7 Clustering Words
10.8 Clustering Tweets
10.8.1 Clustering Tweets with the k-means Algorithm
10.8.2 Clustering Tweets with the k-medoids Algorithm
10.9 Packages, Further Readings and Discussions

11 Social Network Analysis
11.1 Network of Terms
11.2 Network of Tweets
11.3 Two-Mode Network
11.4 Discussions and Further Readings

12 Case Study I: Analysis and Forecasting of House Price Indices
12.1 Importing HPI Data
12.2 Exploration of HPI Data
12.3 Trend and Seasonal Components of HPI
12.4 HPI Forecasting
12.5 The Estimated Price of a Property
12.6 Discussion

13 Case Study II: Customer Response Prediction and Profit Optimization
13.1 Introduction
13.2 The Data of KDD Cup 1998
13.3 Data Exploration
13.4 Training Decision Trees
13.5 Model Evaluation
13.6 Selecting the Best Tree
13.7 Scoring
13.8 Discussions and Conclusions

14 Case Study III: Predictive Modeling of Big Data with Limited Memory
14.1 Introduction
14.2 Methodology
14.3 Data and Variables
14.4 Random Forest
14.5 Memory Issue
14.6 Train Models on Sample Data
14.7 Build Models with Selected Variables
14.8 Scoring
14.9 Print Rules
14.9.1 Print Rules in Text
14.9.2 Print Rules for Scoring with SAS
14.10 Conclusions and Discussion

15 Online Resources
15.1 R Reference Cards
15.2 R
15.3 Data Mining
15.4 Data Mining with R
15.5 Classification/Prediction with R
15.6 Time Series Analysis with R
15.7 Association Rule Mining with R
15.8 Spatial Data Analysis with R
15.9 Text Mining with R
15.10 Social Network Analysis with R
15.11 Data Cleansing and Transformation with R
15.12 Big Data and Parallel Computing with R

R Reference Card for Data Mining


General Index

Package Index

Function Index

About Yanchang Zhao

I am a data miner, using R for data mining applications. My work on R and data mining: RDataMining.com; Twitter; Group on LinkedIn; and Group on Google.

12 Responses to R code and data for book “R and Data Mining: Examples and Case Studies”

  2. Very interesting. I wrote a post on my blog about it, with a link to your blog post.

  6. Silvia Brambila says:

    An excellent book!
    I followed Chap. 10 and it’s great!! I tried it with my own data. Everything was explained and I could follow it, even though I’m a beginner in R. The only difference was that the term-document matrix only showed me the dimensions, entries, sparsity, maximal term length, and weighting, but not the matrix itself :( but it works!!!

    However, at the beginning of Chap. 11, the code
    > # change it to a Boolean matrix
    > termDocMatrix[termDocMatrix>=1] <- 1

    produced the error `[<-.simple_sparse_array`(as.simple_sparse_array(x), …, value = value):
    Only numeric subscripting is implemented.

    Could you please point me in the right direction?

    Thanks for the book!!


  7. Jason says:

    I also have this problem…

  8. “termDocMatrix” is an ordinary matrix, not a term-document matrix created with package tm.

    Referring to section 10.7, “myTdm2” is a term-document matrix, and then it is converted into an ordinary matrix “m2” with as.matrix(). After that, “m2” is used as input in section 11.1 for social network analysis.
    > m2 <- as.matrix(myTdm2)

    Therefore, to use your own data, you need to convert it with as.matrix() first, before running code for social network analysis in chapter 11.
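
The difference between the two object types can be sketched with a toy example. The documents below are made up for illustration, and the count matrix is built with base R so the sketch runs without package tm; with a real term-document matrix, the conversion step would be the as.matrix() call shown in the reply above. The final co-occurrence step is only one plausible way to prepare input for a network of terms and is an assumption here.

```r
# Toy documents (hypothetical data, not from the book)
docs <- c(d1 = "r data mining r",
          d2 = "data mining examples",
          d3 = "r examples")

# Count how often each term occurs in each document (terms in rows)
tokens <- strsplit(docs, " ", fixed = TRUE)
terms  <- sort(unique(unlist(tokens)))
counts <- sapply(tokens, function(tk) as.vector(table(factor(tk, levels = terms))))
dimnames(counts) <- list(terms, names(docs))

# With a tm object this step would be: m2 <- as.matrix(myTdm2)
m2 <- as.matrix(counts)

# Logical subscripting works because m2 is an ordinary matrix
termDocMatrix <- m2
termDocMatrix[termDocMatrix >= 1] <- 1

# Assumed next step: term-by-term co-occurrence counts
termMatrix <- termDocMatrix %*% t(termDocMatrix)

print(termDocMatrix)
print(termMatrix)
```

Each off-diagonal entry of termMatrix counts the documents in which two terms appear together, which is why the counts are first reduced to 0/1.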

  9. Ahmad says:

    Excellent book for beginners! I followed Case Study I: Analysis and Forecasting of House Price Indices. Has anyone tried this? Any ideas on how to analyse the property market with other factors not mentioned in the chapter, such as the economic environment, population size, or CPI (Consumer Price Index)?

    • You would need a different approach for that, such as regression or classification, or even time series forecasting with regression.
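
As a rough illustration of the regression approach suggested above, the sketch below fits a linear model of a house price index on two additional factors. All data are simulated, and the column names (hpi, cpi, population) and coefficients are hypothetical; nothing here comes from the book's dataset.

```r
set.seed(42)
n <- 120  # ten years of monthly observations (simulated)

# Simulated explanatory factors with a slow upward drift
factors <- data.frame(
  cpi        = 100 + cumsum(rnorm(n, mean = 0.2,  sd = 0.1)),
  population = 20  + cumsum(rnorm(n, mean = 0.02, sd = 0.005))
)

# Simulated index that depends on both factors plus noise
factors$hpi <- 50 + 1.5 * factors$cpi + 10 * factors$population + rnorm(n, sd = 2)

# Linear regression of the index on the extra factors
fit <- lm(hpi ~ cpi + population, data = factors)
print(summary(fit)$coefficients)

# Predicted index for a hypothetical future scenario
scenario <- data.frame(cpi = 130, population = 23)
pred <- predict(fit, newdata = scenario)
print(pred)
```

With real data, the factors would need to be aligned to the same dates as the index before fitting.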

      • Ahmad says:

        Thank you so much for the response! I have just started learning, and the book has been really helpful.

        Still on Case Study I: the rows in the data consist of a date (month) and a house price index, so how do I get the prices of houses for all the months? An example was given of a house that sold for $535,000 in September 2009, which was used to predict the price in the next 2 months, giving $616,083. How do I test the correctness of the predicted price ($616,083)? If I knew the prices in those months, I could cut back the data and predict for a date where I know the price; that way, I could compare my predicted price with the actual price. Any idea how I can get the house price for each month? Also, are there other ways (such as 1 month compared to the previous 2) of creating different models to compare against my test data, so as to find the best model?

        I would be glad to get any more details on how to manipulate the house price indices dataset.
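
One common way to check forecasts of the kind asked about above is a hold-out test: fit a model on the earlier months, forecast the last few, and compare with the known values. The sketch below uses a simulated index and a simple trend-plus-season linear model, which is not necessarily the model used in the book. The last step assumes that a property's price moves in proportion to the index, which is an assumption for illustration only.

```r
set.seed(1)
n <- 60
month <- seq_len(n)

# Simulated monthly index: trend + yearly seasonality + noise
index <- data.frame(
  month  = month,
  hpi    = 100 + 0.8 * month + 3 * sin(2 * pi * month / 12) + rnorm(n, sd = 1),
  season = factor((month - 1) %% 12 + 1)
)

# Hold out the last h months as test data
h <- 6
train <- index[1:(n - h), ]
test  <- index[(n - h + 1):n, ]

# Fit on the training months and forecast the held-out months
fit  <- lm(hpi ~ month + season, data = train)
pred <- predict(fit, newdata = test)

# Mean absolute percentage error on the held-out months
mape <- mean(abs(pred - test$hpi) / test$hpi) * 100
print(mape)

# Assumed proportional scaling of a known sale price by the index ratio
price_then <- 535000                   # sale price quoted in the comment
base_index <- train$hpi[nrow(train)]   # index in the month of the sale
est_price  <- price_then * pred / base_index
print(round(est_price))
```

Several candidate models can be compared by computing the same error measure for each on the same held-out months.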
