CFP: the 11th Australasian Data Mining Conference (AusDM 2013), submission due 15 July

*********************************************************************
The 11th Australasian Data Mining Conference (AusDM 2013)
Canberra, Australia, 13-15 November 2013, http://ausdm13.togaware.com
Join us on LinkedIn: http://www.linkedin.com/groups/AusDM-4907891
*********************************************************************

Data mining, the art and science of intelligent analysis of (usually large) data sets for meaningful (and previously unknown) insights, is now being actively applied in industries including defence, medicine, science, financial services, customer analytics, government, insurance, telecommunications, retail and distribution, transportation, and utilities.

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. Since AusDM’02 the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Since 2006, all proceedings have been printed as volumes in the CRPIT series.

This year’s conference, AusDM’13, co-hosted with the Asian Conference on Machine Learning (ACML, http://acml2013.conference.nicta.com.au/), builds on this tradition of facilitating the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Industry Case Studies; Research Prototypes; Practical Analytics Technology; and Research Student Projects. AusDM’13 will be a meeting place for pushing forward the frontiers of data mining in industry and academia.

Publication and topics

We are calling for papers, both research and applications, from both academia and industry, for presentation at the conference. All papers will go through double-blind peer review by a panel of international experts. Accepted papers will be published in an upcoming volume (Data Mining and Analytics 2013) of the Conferences in Research and Practice in Information Technology (CRPIT) series by the Australian Computer Society. The series is also held in full text on the ACM Digital Library, and the volume will also be distributed at the conference. For more details on CRPIT please see http://www.crpit.com. Please note that at least one author of each accepted paper must register for the conference and present the work. Authors of selected papers will be invited to extend them for publication in the Journal of Research and Practice in Information Technology (http://www.jrpit.com).

AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:
- Applications and Case Studies — Lessons and Experiences
- Biomedical and Health Data Mining
- Business Analytics
- Computational Aspects of Data Mining
- Data Integration, Matching and Linkage
- Data Mining Education
- Data Preparation, Cleaning and Preprocessing
- Data Stream Mining
- Evaluation of Results and their Communication
- Implementations of Data Mining in Industry
- Integrating Domain Knowledge
- Link, Graph, Network and Process Mining
- Multimedia Data Mining
- New Data Mining Algorithms
- Professional Challenges in Data Mining
- Privacy-preserving Data Mining
- Spatial and Temporal Data Mining
- Text Mining and Web Mining
- Visual Analytics

Keynote speakers

As is tradition for AusDM, we have lined up an excellent keynote speaker program. Each speaker is a well-known researcher and/or practitioner in data mining and related disciplines. The keynote program provides an opportunity to hear from some of the world’s leaders on what the technology offers and where it is heading.

An international academic keynote presentation will be shared with the ACML conference. The two industry keynotes at AusDM 2013 will be:

- Klaus Felsche, Director Intent Management and Analytics at the Department of Immigration and Citizenship.
Title: TBC

- Dr Paul Wong, Director, Office of Research Excellence, The Australian National University.
Title: TBC (Predictive Network Analytics for Government Research Planning)

Submission of papers

We invite two types of submissions for AusDM 2013:

- Academic submissions: Normal academic submissions reporting on research progress, with a paper length of between 8 and 12 pages in CRPIT style, as detailed below. Academic submissions will go through a double-blind review process, i.e. paper submissions must NOT include authors' names or affiliations (or acknowledgements referring to funding bodies). Self-citing references should also be removed from the submitted papers for the purpose of double-blind reviewing (they can be added back after the review).

- Industry submissions: Submissions from government and industry can report on specific data mining implementations and experiences. Submissions in this category can be between 4 and 8 pages in CRPIT style, as detailed below. These submissions do not need to be double-blind. A special committee made up of industry representatives will assess industry submissions.

Paper submissions are required to follow the general format specified for papers in the CRPIT series by the Australian Computer Society. Submission details are available from http://crpit.com/AuthorsSubmitting.html. LaTeX styles and Word templates may be found on this site. LaTeX is the recommended typesetting package.

The electronic submissions must be in PDF only, and made through the AusDM’13 Submission Page, which will be available at http://ausdm13.togaware.com/.

Important Dates

Submission of full papers:              15 July 2013 (midnight PST)
Notification of authors:                1 September 2013
Final version and author registration:  1 October 2013
Conference:                             13-15 November 2013

Organising Committee

Program Chairs (Academic)
Kok-Leong Ong, Deakin University, Melbourne
Lin Liu, University of South Australia, Adelaide

Program Chair (Industry)
Yanchang Zhao, Department of Immigration & Citizenship, Australia; and RDataMining.com

Conference Chairs
Peter Christen, The Australian National University, Canberra
Paul Kennedy, University of Technology, Sydney

Sponsorship Chair
Andrew Stranieri, University of Ballarat, Ballarat

Steering Committee Chairs
Simeon Simoff, University of Western Sydney
Graham Williams, Australian Taxation Office

Other Steering Committee Members
Peter Christen, The Australian National University, Canberra
Paul Kennedy, University of Technology, Sydney
Jiuyong Li, University of South Australia, Adelaide
Kok-Leong Ong, Deakin University, Melbourne
John Roddick, Flinders University, Adelaide
Andrew Stranieri, University of Ballarat, Ballarat
Geoff Webb, Monash University, Melbourne


Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government

Held in conjunction with PAKDD 2013, Gold Coast, Australia, April 14, 2013
http://dmapps2013.rdatamining.com

To attend the workshop, you need to register for PAKDD 2013 at http://pakdd2013.pakdd.org.

DMApps 2013 Workshop Program

8:30 – 8:40    Welcome and Introduction to the Workshop. Dr Warwick Graco and Dr Inna Kolyshkina

8:40 – 9:30    Keynote speech. Behavior Computing: Discovering Complex Behavior Intelligence. Prof. Longbing Cao

9:30 – 10:00   Real-time Television ROI Tracking using Mirrored Experimental Designs. Brendan Kitts

10:00 – 10:30  Coffee Break

10:30 – 11:00  Using Scan-Statistical Correlations for Network Change Analysis. Adriel Cheng, Peter Dickinson

11:00 – 11:30  Predicting High Impact Academic Papers Using Citation Network Features. Daniel McNamara, Paul Wong, Peter Christen and Kee Siong Ng

11:30 – 12:00  Combination of effective machine learning techniques and chemometric analysis for evaluation of Bupleuri Radix through high-performance thin-layer chromatographic. Xiaoping Cheng, Hongmin Cai, Ping He and Runtiao Tian

12:00 – 12:30  An OLAP Server for Sensor Networks using Augmented Statistics Trees. Neil Dunstan

12:30 – 13:00  Indirect information linkage for OSINT through authorship analysis of aliases. Robert Layton, Charles Perez, Babiga Birregah, Paul Watters and Marc Lemercier

13:00 – 14:00  Lunch

14:00 – 14:30  Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity Resolution. Banda Ramadan, Peter Christen, Huizhi Liang, David Hawking and Ross Gayler

14:30 – 15:00  Identifying dominant economic sectors and stock markets: A social network mining approach. Ram Babu Roy and Uttam Sarkar

15:00 – 15:30  Coffee Break

15:30 – 16:00  Ensemble Model of Artificial Neural Networks for Petroleum Reservoir Characterization. Fatai Anifowose, Jane Labadin and Abdulazeez Abdulraheem

16:00 – 16:30  A Comparison of Visualization Data Mining Methods for Kernel Smoothing Techniques for Cox Processes with Application To Spatial Decision Support Systems. David Rohde, Ruth Huang, Jonathan Corcoran and Gentry White

16:30 – 17:00  Parallel Sentiment Polarity Classification Method with Substring Feature Reduction. Ken Zhang and Lin Shang

17:00 – 17:30  On the Evaluation of the Homogeneous Ensembles with CV-passports. Aneesha Bakharia, Vladimir Nikulin and Tian-Hsiang Huang

17:30 – 18:00  Identifying Authoritative and Reliable Contents in Community Question Answering with Domain Knowledge. Lifan Guo and Xiaohua Hu


New book announcement: R and Data Mining – Examples and Case Studies

R and Data Mining: Examples and Case Studies
Author: Yanchang Zhao
Publisher: Academic Press, Elsevier
Publish date: December 2012
ISBN: 978-0-12-396963-7
Length: 256 pages
URL: http://www.rdatamining.com/books/rdm

This book introduces the use of R for data mining through examples and case studies. It contains 1) examples on decision trees, random forest, regression, clustering, outlier detection, time series analysis, association rules, text mining and social network analysis; and 2) three real-world case studies.

Table of Contents and Abstracts:
http://www.rdatamining.com/books/rdm/toc

R Code and Data for the book:
http://www.rdatamining.com/books/rdm/code

Sample pages on Google Books:
http://books.google.com.au/books?id=FEOh08LBD9UC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Buy the book on Amazon:
http://www.amazon.com/Data-Mining-Examples-Case-Studies/dp/0123969638


Two free online courses starting soon: Data Analysis (with R) and Social Network Analysis

Two online courses are starting soon on Coursera, and both are free to register for.

1. Data Analysis (with R)

This is an 8-week online course starting on Jan 22nd 2013 <https://www.coursera.org/course/dataanalysis>.

This course is an applied statistics course focusing on data analysis. The course will begin with an overview of how to organize, perform, and write up data analyses. Then it will cover some of the most popular and widely used statistical methods like linear regression, principal components analysis, cross-validation, and p-values. Instead of focusing on mathematical details, the lectures will be designed to help you apply these techniques to real data using the R statistical programming language, interpret the results, and diagnose potential problems in data analysis.

2. Social Network Analysis

This is a 9-week online course starting on March 4th 2013 <https://www.coursera.org/course/sna>.

This course will use social network analysis, both its theory and computational tools, to make sense of the social and information networks that have been fueled and rendered accessible by the internet.


R code and data for book “R and Data Mining: Examples and Case Studies”

R code and data for the book “R and Data Mining: Examples and Case Studies” are now available at http://www.rdatamining.com/books/rdm/code. An online PDF version of the book (the first 11 chapters only) can also be downloaded at http://www.rdatamining.com/docs.

The book's details and table of contents are given below.

Book title: R and Data Mining: Examples and Case Studies
Author: Yanchang Zhao
Publisher: Elsevier
Publish date: December 2012
ISBN: 978-0-12-396963-7
234 pages
URL: http://www.rdatamining.com/books/rdm

Table of Contents
1 Introduction
1.1 Data Mining
1.2 R
1.3 Datasets
1.3.1 The Iris Dataset
1.3.2 The Bodyfat Dataset

2 Data Import and Export
2.1 Save and Load R Data
2.2 Import from and Export to .CSV Files
2.3 Import Data from SAS
2.4 Import/Export via ODBC
2.4.1 Read from Databases
2.4.2 Output to and Input from EXCEL Files

3 Data Exploration
3.1 Have a Look at Data
3.2 Explore Individual Variables
3.3 Explore Multiple Variables
3.4 More Explorations
3.5 Save Charts into Files

4 Decision Trees and Random Forest
4.1 Decision Trees with Package party
4.2 Decision Trees with Package rpart
4.3 Random Forest

5 Regression
5.1 Linear Regression
5.2 Logistic Regression
5.3 Generalized Linear Regression
5.4 Non-linear Regression

6 Clustering
6.1 The k-Means Clustering
6.2 The k-Medoids Clustering
6.3 Hierarchical Clustering
6.4 Density-based Clustering

7 Outlier Detection
7.1 Univariate Outlier Detection
7.2 Outlier Detection with LOF
7.3 Outlier Detection by Clustering
7.4 Outlier Detection from Time Series
7.5 Discussions

8 Time Series Analysis and Mining
8.1 Time Series Data in R
8.2 Time Series Decomposition
8.3 Time Series Forecasting
8.4 Time Series Clustering
8.4.1 Dynamic Time Warping
8.4.2 Synthetic Control Chart Time Series Data
8.4.3 Hierarchical Clustering with Euclidean Distance
8.4.4 Hierarchical Clustering with DTW Distance
8.5 Time Series Classification
8.5.1 Classification with Original Data
8.5.2 Classification with Extracted Features
8.5.3 k-NN Classification
8.6 Discussions
8.7 Further Readings

9 Association Rules
9.1 Basics of Association Rules
9.2 The Titanic Dataset
9.3 Association Rule Mining
9.4 Removing Redundancy
9.5 Interpreting Rules
9.6 Visualizing Association Rules
9.7 Discussions and Further Readings

10 Text Mining
10.1 Retrieving Text from Twitter
10.2 Transforming Text
10.3 Stemming Words
10.4 Building a Term-Document Matrix
10.5 Frequent Terms and Associations
10.6 Word Cloud
10.7 Clustering Words
10.8 Clustering Tweets
10.8.1 Clustering Tweets with the k-means Algorithm
10.8.2 Clustering Tweets with the k-medoids Algorithm
10.9 Packages, Further Readings and Discussions

11 Social Network Analysis
11.1 Network of Terms
11.2 Network of Tweets
11.3 Two-Mode Network
11.4 Discussions and Further Readings

12 Case Study I: Analysis and Forecasting of House Price Indices
12.1 Importing HPI Data
12.2 Exploration of HPI Data
12.3 Trend and Seasonal Components of HPI
12.4 HPI Forecasting
12.5 The Estimated Price of a Property
12.6 Discussion

13 Case Study II: Customer Response Prediction and Profit Optimization
13.1 Introduction
13.2 The Data of KDD Cup 1998
13.3 Data Exploration
13.4 Training Decision Trees
13.5 Model Evaluation
13.6 Selecting the Best Tree
13.7 Scoring
13.8 Discussions and Conclusions

14 Case Study III: Predictive Modeling of Big Data with Limited Memory
14.1 Introduction
14.2 Methodology
14.3 Data and Variables
14.4 Random Forest
14.5 Memory Issue
14.6 Train Models on Sample Data
14.7 Build Models with Selected Variables
14.8 Scoring
14.9 Print Rules
14.9.1 Print Rules in Text
14.9.2 Print Rules for Scoring with SAS
14.10 Conclusions and Discussion

15 Online Resources
15.1 R Reference Cards
15.2 R
15.3 Data Mining
15.4 Data Mining with R
15.5 Classification/Prediction with R
15.6 Time Series Analysis with R
15.7 Association Rule Mining with R
15.8 Spatial Data Analysis with R
15.9 Text Mining with R
15.10 Social Network Analysis with R
15.11 Data Cleansing and Transformation with R
15.12 Big Data and Parallel Computing with R

R Reference Card for Data Mining

Bibliography

General Index

Package Index

Function Index


CFP: DMApps 2013 – Workshop on Data Mining Applications in Industry and Government, submission due by Jan 6, 2013

CALL FOR PAPERS
DMApps 2013: the International Workshop on Data Mining Applications in Industry & Government
In conjunction with PAKDD 2013, Gold Coast, Australia, April 14-17, 2013
http://dmapps2013.rdatamining.com

The 2013 International Workshop on Data Mining Applications in Industry & Government (DMApps 2013) will provide a platform for industrial data mining practitioners to share knowledge and experience, and a bridge between academia and industry for applying new advanced data mining techniques to industrial applications. The audience will be composed of industrial data mining practitioners, as well as academic researchers who are interested in designing algorithms to meet industrial needs. The workshop will foster collaboration between academia and industry and speed up the transfer of new techniques from academic research to industrial applications.

The workshop focuses on applications of data mining in real-world projects. Topics include, but are not limited to, data mining applications in:
• Finance
• Retail
• Insurance
• Telecommunications
• Crime & Homeland Security
• Stock Market
• Social Welfare
• Social Media
• Medicine and Health
• Education
• Sports
• Transport
• Environment
• Manufacturing
• Government
• Other Fields

Long and Short Papers
There are two types of paper that can be submitted. One is a long paper covering research into real-world data mining applications in industry and government. The other is a short paper from managers and practitioners covering a challenging and informative issue in data mining: what the issue was, how it was managed and what lessons were learned from the activity. The page limit is 12 pages for long papers and 4 pages for short papers. All papers should use a 10pt font size and follow the Springer LNCS/LNAI manuscript submission guidelines (http://www.springer.de/comp/lncs/authors.html). The submission due date is January 6, 2013.

Important Dates
Submission due:           January 6, 2013
Notification to authors:  January 31, 2013
Camera-ready due:         February 15, 2013
Workshop date:            April 14, 2013

Submission Procedure
All papers must be submitted electronically in PDF format at https://www.easychair.org/conferences/?conf=dmapps2013. Each submitted paper will be reviewed by two or three reviewers. Selected outstanding long papers presented at the workshop will be included in the LNCS/LNAI post-proceedings of the PAKDD Workshops, published by Springer.

Attendance
By submitting a paper to the workshop, authors agree that, if the paper is accepted, at least one author will attend the workshop to present it.

Organising Committee
Workshop Chairs

Warwick Graco
Operational Analytics,
Australian Taxation Office
Warwick.Graco@ato.gov.au

Inna Kolyshkina
Chair of the South Australian Chapter
Australian Institute of Analytics Professionals
ikolyshkina@yahoo.com

Program Chairs

Yanchang Zhao
Department of Immigration & Citizenship,
Australia; and RDataMining.com
yanchang@rdatamining.com

Clifton Phua
Data Analytics Department,
Institute for Infocomm Research, Singapore
cwcphua@i2r.a-star.edu.sg


Call for contribution: the RDataMining package – an R package for data mining

Join the RDataMining project to build a comprehensive R package for data mining
http://www.rdatamining.com/package

We have started the RDataMining project on R-Forge to build an R package for data mining. The package will provide various functionalities for data mining, with contributions from many R users. If you have developed or will implement any data mining algorithms in R, please participate in the project to make your work available to R users worldwide.

Background
Although there are many R packages for various data mining functionalities, many more new algorithms are designed and published every year without any R implementation. It is far beyond the capability of a single team, or even several teams, to build packages for all these new data mining algorithms. On the other hand, many R users have developed their own implementations of new data mining algorithms but, unfortunately, use them for their own work only, without sharing them with other R users. The reason could be that they do not know how, or do not have time, to build packages to share their code, or they might think that it is not worth building a package with only one or two functions.

Objective
To foster the development of data mining capability in R and facilitate sharing of data mining code, functions and algorithms among R users, we started this project on R-Forge to collaboratively build an R package for data mining, with contributions from many R users, including ourselves.

How it works
The project works in a way similar to an edited book. We, as organizers, send out a call for participation and invite R users to join this project and contribute their implemented functions and algorithms. The contributed functions will together make up the package.

Function authors will be responsible for the development, maintenance and documentation of their contributed functions. We will put all functions together as one package and also make a manual for the package.

Function authors will be acknowledged as authors of the corresponding functions in the help documentation and manual of the package. We, as the organizers of the package, will be shown as the manager/maintainer of the whole package.

Anyone is free to join or leave the project at any time, and authors can withdraw their contributed functions at any time.

Links
The RDataMining package and project: http://www.rdatamining.com/package
The RDataMining project on R-Forge:  http://package.rdatamining.com or
http://r-forge.r-project.org/projects/rdatamining/

Contact
Yanchang Zhao <yanchang at rdatamining.com>

Join the RDataMining Project, and we will work together to build a comprehensive R package for data mining.
