Datasets to Practice Your Data Mining

Many datasets are freely available online for research use. Some of them are listed below.

- The R Datasets Package:
There are around 90 datasets available in the package. Most of them are small and easy to feed into functions in R.
List them with the statement below:
> library(help="datasets")

- Frequent Itemset Mining Dataset Repository:
click-stream data, retail market basket data, traffic accident data and web HTML document data (large!).
The website also hosts implementations of many algorithms for frequent itemset and association rule mining.

- ACM KDD Cup:
the annual Data Mining and Knowledge Discovery competition organized by ACM SIGKDD, targeting real-world problems

- UCI KDD Archive:
an online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas

- UCI Machine Learning Repository:
a collection of databases, domain theories, and data generators

- CMU StatLib Datasets Archive

- Time Series Data Library:
a collection of about 800 time series drawn from many different fields

- EconData:
a source of economic time series data from Inforum, at the University of Maryland

- UCR Time Series Data Archive:
data for time series classification and clustering

- GeoDa Center:
a collection of spatial data
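
For the R datasets package mentioned at the top of the list, a quick way to get started is to list what is available and then load one dataset. The snippet below is only a minimal sketch: the variable name `available` and the choice of `iris` are illustrative assumptions, not something prescribed by the package.

```r
# List the datasets shipped with the 'datasets' package as a data frame.
# data(package = ...) returns an object whose $results matrix has the
# columns "Package", "LibPath", "Item" and "Title".
available <- as.data.frame(
  data(package = "datasets")$results[, c("Item", "Title")]
)
head(available)

# Load one of them (iris is used here purely as an illustration)
data(iris)
str(iris)       # structure: variables and their types
summary(iris)   # per-column summary statistics
```

Any other name from the `Item` column can be passed to `data()` in the same way.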

Links to the above datasets are provided on the RDataMining website, and more datasets will be added there later.

Yanchang Zhao
RDataMining: http://www.rdatamining.com
Twitter: http://www.twitter.com/RDataMining
Group on Linkedin: http://group.rdatamining.com
Group on Google: http://group2.rdatamining.com



6 Responses to Datasets to Practice Your Data Mining

  3. Said-Ul-Haq says:

    Sir, is there any blog dataset?
    I need a blog dataset for my thesis; please refer me to a link.

    Thanks in advance.

  5. Natnael says:

    Is there a free dataset for job posting websites like LinkedIn or similar?
