Seminar of IAPA Canberra Chapter: Big Data in the Social Sciences, Wednesday 29 April

Date: Wednesday 29 April
Time: 5.45pm for a 6pm start
Cost: Nil
Where: SAS Offices, 12 Moore Street, Canberra, ACT 2600
RSVP link:

Join us for our next ACT Chapter Event on the 29th of April. Our guest speaker, Robert Ackland, will present on Big Data in the Social Sciences. Robert will discuss the challenges, give insight into new opportunities, and describe some of the tools that are available for social science big data research.

More about the event: Social scientists are increasingly using large-scale datasets from the Web (e.g. Twitter, WWW hyperlinks, Facebook) to seek answers to long-standing questions about social, economic and political behaviour. For example, social media data have been used to study social inequality, diurnal and seasonal mood changes, and the spread of protest during the Arab Spring. This presentation aims to highlight the methodological challenges and opportunities of big data (in particular, social media data) in empirical social science research.

About Robert Ackland

Robert Ackland is an Associate Professor in the Australian Demographic and Social Research Institute at the Australian National University and leads the Virtual Observatory for the Study of Online Networks (VOSON). His PhD was in economics, focusing on index number theory in the context of cross-country comparisons of income and inequality. Robert has been studying online social and organisational networks since 2002 and his research has been funded by five Australian Research Council grants. His research has appeared in journals such as the Review of Economics and Statistics, Social Networks, Computational Economics, Social Science Computer Review, and the Journal of Social Structure. VOSON was established in 2005, and aims to advance the social science of the Internet by conducting research, developing research tools, and providing research training. The VOSON software for hyperlink network construction and analysis has been publicly available since 2006 and has been used by around 2000 researchers worldwide. Robert established the Social Science of the Internet specialisation in the ANU’s Master of Social Research in 2008, and his book Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age (SAGE) was published in July 2013. He is CEO of Uberlink Corp, which he established to commercialise the VOSON software.

Posted in Big Data, Data Mining

Hadoop and Neo4j

Hadoop is widely used for processing big data, and Neo4j is a popular open-source graph database. When doing social network analysis on big data, a “natural” thought is to use them together. Unfortunately, Neo4j cannot work directly on HDFS or HBase. Is it a good idea to use them together for social network analysis of big data? If so, what are the pros and cons, and how can it be done efficiently? Or should we try other options, such as Hadoop + Giraph or Spark + GraphX? Please share your ideas; all suggestions and experiences will be appreciated. Thanks.

While looking into how Neo4j and Hadoop can work together, I came across the two presentations below, which might be of interest to those who are doing social network analysis of big data.

Serious network analysis using Hadoop and Neo4j

I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop

Posted in Big Data, Data Mining

CFP: 13th Australasian Data Mining Conference (AusDM 2015)

The 13th Australasian Data Mining Conference (AusDM 2015)
Sydney, Australia, 8-9 August 2015
co-located with SIGKDD’15
Join us on LinkedIn:

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries.

Publication and topics
We are calling for papers, both research and applications, and from both academia and industry, for presentation at the conference. All papers will go through double-blind peer review by a panel of international experts. Accepted papers will be published in an upcoming volume (Data Mining and Analytics 2015) of the Conferences in Research and Practice in Information Technology (CRPIT) series by the Australian Computer Society, which is also held in full text on the ACM Digital Library and will be distributed at the conference. Please note that we require at least one author of each accepted paper to register for the conference and present their work. AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges.

Topics of interest include, but are not restricted to:
– Applications and Case Studies – Lessons and Experiences
– Big Data Analytics
– Biomedical and Health Data Mining
– Business Analytics
– Computational Aspects of Data Mining
– Data Integration, Matching and Linkage
– Data Mining Education
– Data Mining in Security and Surveillance
– Data Preparation, Cleaning and Preprocessing
– Data Stream Mining
– Evaluation of Results and their Communication
– Implementations of Data Mining in Industry
– Integrating Domain Knowledge
– Link, Tree, Graph, Network and Process Mining
– Multimedia Data Mining
– New Data Mining Algorithms
– Professional Challenges in Data Mining
– Privacy-preserving Data Mining
– Spatial and Temporal Data Mining
– Text Mining
– Visual Analytics
– Web and Social Network Mining

Submission of papers
We invite two types of submissions for AusDM 2015:

Academic submissions: Normal academic submissions reporting on research progress, with a paper length of between 8 and 12 pages in CRPIT style, as detailed below. Academic submissions will go through a double-blind review process, i.e. paper submissions must NOT include authors' names or affiliations (or acknowledgements referring to funding bodies). Self-citing references should also be removed from the submitted papers for the purpose of double-blind reviewing (they can be added back after the review).

Industry submissions: Submissions from governments and industry can report on specific data mining implementations and experiences. Submissions in this category can be between 4 and 8 pages in CRPIT style, as detailed below. These submissions do not need to be double-blind. A special committee made up of industry representatives will assess industry submissions.

Paper submissions are required to follow the general format specified for papers in the CRPIT series by the Australian Computer Society. Submission details are available from

Important Dates
Submission of full papers: Monday 20 April 2015 (midnight PST)
Notification of authors: Sunday 7 June 2015
Final version and author registration: Sunday 28 June 2015
Conference: 8-9 August 2015

Posted in Data Mining

UIUC free online courses on data mining starting on 9 Feb, lectured by Prof. Jiawei Han et al.

by Yanchang Zhao

A series of free online data mining courses will start on 9 Feb 2015, taught by Prof. Jiawei Han and several other staff at UIUC. Prof. Han is one of the top data mining researchers in the world, and is an author of “Data Mining: Concepts and Techniques”, one of the most popular data mining textbooks. Do not miss the opportunity if you are interested in learning data mining techniques.

Course 1. Pattern Discovery in Data Mining, by Prof. Jiawei Han
Start: 9 Feb 2015
End: 8 Mar 2015

Course 2. Text Retrieval and Search Engines, by Chengxiang Zhai
Start: 16 Mar 2015
End: 12 Apr 2015

Course 3. Cluster Analysis in Data Mining, by Prof. Jiawei Han
Start: 27 Apr 2015
End: 24 May 2015

Course 4. Text Mining and Analytics, by Chengxiang Zhai
Start: 8 Jun 2015
End: 5 Jul 2015

Course 5. Data Visualization, by John C. Hart
Start: 20 Jul 2015
End: 16 Aug 2015

You can join the above five courses for free. However, if you want a verified certificate, you can pay $55 for each individual course, or take the whole Data Mining Specialization, which includes the above five courses and a Capstone project. See details at the

Posted in Data Mining

Free online data mining and machine learning courses by Stanford University

by Yanchang Zhao

Three free online data mining and machine learning courses taught by professors at Stanford University started in the past two weeks, providing excellent opportunities to learn advanced data mining and machine learning techniques. If you are interested, be quick to join while they are still open.

1. Machine Learning
Start: Jan 19, 2015
End: Apr 20, 2015
Instructor: Andrew Ng, Stanford University

2. Mining Massive Datasets
Start: Jan 31, 2015
End: Mar 25, 2015
Instructors: Jure Leskovec, Anand Rajaraman and Jeff Ullman, Stanford University

3. Statistical Learning (with R)
Start: Jan 20, 2015
End: Apr 5, 2015
Instructors: Prof. Trevor Hastie, Prof. Rob Tibshirani, Stanford University

Visit for more news on free online courses and webinars on data mining and analytics.

Posted in Big Data, Data Mining, R

Canberra IAPA Seminar – Text Analytics: Natural Language into Big Data – 17 February

Topic: Text Analytics: Natural Language into Big Data
Speaker: Dr. Leif Hanlen, Technology Director at NICTA
Date: Tuesday 17 February
Time: 5.30pm for a 6pm start
Cost: Nil
Where: SAS Offices, 12 Moore Street, Canberra, ACT 2600
Registration URL:

We outline several activities in NICTA relating to understanding and mining free text. Our approach is to develop agile service-focussed solutions that provide insight into large text corpora, and allow end users to incorporate current text documents into standard numerical analysis technologies.

Dr. Leif Hanlen is Technology Director at NICTA, Australia’s largest ICT research centre. Leif is also an adjunct Associate Professor of ICT at the Australian National University and an adjunct Professor of Health at the University of Canberra. He received a BEng (Hons I) in electrical engineering, BSc (Comp Sci) and PhD (telecomm) from the University of Newcastle, Australia. His research focusses on applications of machine learning to text processing.

Please feel free to forward this invite to your friends and colleagues who might be interested. Thanks.

Posted in Big Data, Data Mining

Recordings of RStudio Webinar Series on Essential Tools for Data Science with R

by Yanchang Zhao

RStudio recently ran a series of live webinars on Essential Tools for Data Science with R, but the sessions were inconvenient for people in other time zones to attend. Fortunately, the recordings have been made available online for anyone who missed the live webinars. Below is a list of recordings.

1. The Grammar and Graphics of Data Science
– dplyr: a grammar of data manipulation – Hadley Wickham
– ggvis: Interactive graphics in R – Winston Chang
– URL:

2. Reproducible Reporting
– The Next Generation of R Markdown – Jeff Allen
– Knitr Ninja – Yihui Xie
– Packrat – A Dependency Management System for R – J.J. Allaire & Kevin Ushey
– URL:

3. Interactive Reporting
– Embedding Shiny Apps in R Markdown documents – Garrett Grolemund
– Shiny: R made interactive – Joe Cheng
– URL:

Posted in R