Coronavirus data analysis with R, tidyverse and ggplot2

Coronavirus data analysis – an analysis of data on the Novel Coronavirus (COVID-19) with R, tidyverse and ggplot2. Download the full analysis reports from the links below.

Coronavirus data analysis – world wide
http://www.rdatamining.com/docs/Coronavirus-data-analysis-world.pdf

Coronavirus data analysis – China
http://www.rdatamining.com/docs/Coronavirus-data-analysis-china.pdf

Coronavirus - cases by country

Posted in R | 7 Comments

An 8-hour course on R and Data Mining

I will run an 8-hour course on R and Data Mining at Black Mountain, CSIRO, Australia on 10 & 13 December 2018.

The course materials, incl. slides, R scripts and datasets, are available at http://www.rdatamining.com/training/course.

Below is the outline of the course.

Part I:
– R Programming: basics of R language and programming, parallel computing, and data import and export
– Data Exploration and Visualisation: summary, stats and various charts
– Regression and Classification: linear regression and logistic regression, decision trees and random forest
– Data Clustering: k-means clustering, k-medoids clustering, hierarchical clustering and density-based clustering

Part II:
– Time Series Analysis: time series decomposition, forecasting, classification and clustering
– Association Rule Mining: mining and selecting interesting association rules, redundancy removal, and rule visualisation
– Text Mining: text mining, word cloud, topic modelling, and sentiment analysis
– Network Analysis and Graph Mining: graph construction, graph query, centrality measures, and graph visualisation
– Big Data: Hadoop, Spark and R

Posted in Data Mining, R

CFP: AusDM 2018, Bathurst, Australia, 28-30 Nov 2018

16th Australasian Data Mining Conference (AusDM 2018)
Bathurst, Australia,
28-30 November 2018 
The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries.
Since AusDM’02 the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Built on this tradition, AusDM’18 will facilitate the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Research Prototypes; Industry Case Studies; Practical Analytics Technology; and Research Student Projects. AusDM’18 will be a meeting place for pushing forward the frontiers of data mining in academia and industry.
Publication and topics
We are calling for papers, both research and applications, and from both academia and industry, for presentation at the conference. All papers will go through double-blind peer review by a panel of international experts. Accepted papers will be published in the AusDM 2018 proceedings by Springer. Selected papers will be invited for submission, in extended form, to a special edition of a Springer journal. Please note that we require at least one author of each accepted paper to register for the conference and present their work. One full registration will cover at most two papers.
AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:
– Applications of Data Mining and Case Studies
– Big Data Analytics
– Biomedical and Health Data Mining
– Business Analytics
– Computational Aspects of Data Mining
– Data Integration, Matching and Linkage
– Data Mining Education
– Data Mining in Security and Surveillance
– Data Preparation, Cleaning and Preprocessing
– Data Stream Mining
– Implementations of Data Mining in Industry
– Integrating Domain Knowledge
– Knowledge Discovery and Presentation
– Link, Tree, Graph, Network and Process Mining
– Multimedia Data Mining
– Mobile Data Mining
– New Data Mining Algorithms
– Privacy-preserving Data Mining
– Spatial and Temporal Data Mining
– Text Mining
– Web and Social Network Mining
Keynote speakers
As is tradition for AusDM, we have lined up an excellent keynote speaker program. Each speaker is a well-known researcher and/or practitioner in data mining and related disciplines. The keynote program provides an opportunity to hear from some of the world’s leaders on what the technology offers and where it is heading.
Submission of papers
We invite three types of submissions for AusDM 2018:
– Academic submissions: Regular academic submissions can be made in the Research Track, reporting on research progress, with a paper length of up to 12 pages. For academic submissions we will use a double-blind review process, i.e. paper submissions must NOT include author names or affiliations (nor acknowledgements referring to funding bodies). Self-citing references should also be removed from the submitted papers for the purpose of double-blind reviewing (they can be added back after the review).
– Industry submissions: Submissions can be made in the Application Track to report on specific data mining implementations and experiences in governments and industry projects. Submissions in this category can be up to 12 pages. These submissions do not need to be double-blinded. A special committee made of industry representatives will assess industry submissions.
– Industry Showcase submissions: Submissions from industry and government on an analytics solution that has raised profits, reduced costs and/or achieved other important policy and/or business outcomes can be made in this track with a one-page abstract only.
Paper submissions are required to follow the general format specified for papers. LaTeX styles and Word templates will be available; LaTeX is the recommended typesetting package.
The electronic submissions must be in PDF only, and made through the AusDM’18 Submission Page.
Important Dates
Paper Submission Closed: Friday 20 July 2018
Authors Notified: Monday 1 October 2018
Camera Ready Submission: Monday 15 October 2018
Preliminary Program Available: Wednesday 31 October 2018
Early Bird Cut-Off Date for Authors: Friday 2 November 2018
Conference Dates: Wednesday 28 – Friday 30 November 2018
Posted in Data Mining | 1 Comment

Vacancy of Research Scientist in Data Analytics, Data61, CSIRO, Sydney

Research Scientist in Data Analytics, Data61, CSIRO, Sydney

  • Undertake innovative research in the area of Data Analytics
  • Use your expertise in machine learning or data mining to solve real-world problems!
  • Join CSIRO Data61’s team, the largest data innovation group in Australia

 

The Position:

At Data61 we are bringing together exceptional people from research and industry to create the largest data innovation group in Australia.  Data61’s Analytics Research Group delivers innovative data analytical solutions for industry, and currently has an opportunity for an experienced research scientist in the field of machine learning and/or data mining.

As the successful candidate you will join an award-winning team and benefit from interactions with Data61’s world-class researchers.  You will perform various data analytics related to infrastructure and asset management as outlined below.

Under the direction of the Principal Research Scientist you will:

  • Develop algorithms, interfaces, and systems for intelligent infrastructure/asset management.
  • Conduct research on data analytics and related topics for infrastructure and asset management.
  • Conduct data pre-processing, analysis, and other necessary tasks related to the research problem.
  • Communicate effectively and respectfully with all staff, clients and suppliers in the interests of good business practice, collaboration and enhancement of CSIRO’s reputation.
  • Work as part of a multi-disciplinary, often regionally dispersed research team, to carry out tasks under limited direction in support of scientific research.
  • Work collaboratively with colleagues within your team, the business unit and across CSIRO, to reach objectives.

 

Location: Eveleigh, NSW

Salary: CSOF5 – AU $95K – $103K plus up to 15.4% superannuation

Tenure: Specified term of 1 year and 6 months

Reference: 46942

Closing date: Applications will remain open until filled.

 

Pre-Requisites:

Education/Qualifications: A doctorate and/or equivalent research experience in a relevant discipline area, such as machine learning or data mining.

Communication: Excellent written and oral communication skills, evidenced by high-level reporting, presentation and negotiation abilities, and the capacity to identify and influence critical stakeholders to gain support for contentious proposals/ideas.

To be successful you will need:

Strong research record evidenced by high quality publications.

Strong motivation for industrial innovation.

Good programming skills.

 

Desirable criteria:

Experience with one or more of the following:

  • Bayesian nonparametrics
  • Nonhomogeneous stochastic processes
  • Large-scale data mining/processing/inference
  • Dynamic processes/networks, or spatial-temporal data
  • Software tool development based on popular platforms such as Spark, Azure, or similar

 

How to Apply: 

Please upload one document only containing both your CV/Resume and cover letter providing enough information relevant to this position to enable the selection panel to determine your suitability.  If your application proceeds to the next stage you may be asked to provide additional information.

Please view the full position details and instructions on how to apply here:

https://jobs.csiro.au/job/Sydney%2C-NSW-Research-scientist/428894300/

 

Who we are:

The Commonwealth Scientific and Industrial Research Organisation (CSIRO)

http://www.csiro.au/

Posted in Uncategorized | 1 Comment

Vacancy of Senior Research Scientist in Data Analytics, Data61, CSIRO, Sydney

Senior Research Scientist

URL: https://jobs.csiro.au/job/Sydney%2C-NSW-Senior-Research-Scientist/428894500/

  • Undertake innovative research in the area of Data Analytics
  • Use your expertise in machine learning or data mining to solve real-world problems!
  • Join CSIRO Data61’s team, the largest data innovation group in Australia

 

The Position

At Data61 we are bringing together exceptional people from research and industry to create the largest data innovation group in Australia.  Data61’s Analytics Research Group delivers innovative data analytical solutions for industry, and currently has an opportunity for an experienced senior research scientist in the field of machine learning and/or data mining.

As the successful candidate you will join an award-winning team and benefit from interactions with Data61’s world-class researchers.  You will perform various data analytics related to infrastructure and asset management as outlined below.

Under the direction of the Principal Research Scientist you will:

  • Develop algorithms, interfaces, and systems for intelligent infrastructure/asset management.
  • Conduct research on data analytics and related topics for infrastructure and asset management.
  • Conduct data pre-processing, analysis, and other necessary tasks related to the research problem.
  • Communicate effectively and respectfully with all staff, clients and suppliers in the interests of good business practice, collaboration and enhancement of CSIRO’s reputation.
  • Work as part of a multi-disciplinary, often regionally dispersed research team, to carry out tasks under limited direction in support of scientific research.
  • Work collaboratively with colleagues within your team, the business unit and across CSIRO, to reach objectives.

 

Location: Eveleigh, NSW

Salary: CSOF6 – AU $109K – $128K plus up to 15.4% superannuation

Tenure: Specified term of 3 years

Reference: 46943

Closing date: Applications will remain open until filled.

 

Pre-Requisites:

Education/Qualifications: A doctorate and/or equivalent research experience in a relevant discipline area, such as machine learning or data mining.

Communication: Excellent written and oral communication skills, evidenced by high-level reporting, presentation and negotiation abilities, and the capacity to identify and influence critical stakeholders to gain support for contentious proposals/ideas.

To be successful you will need:

Experience in conducting research projects either within universities or within an industry based research lab.

Excellent research record evidenced by quality publications in high impact Machine Learning, AI, Data Mining conferences and journals.

Proven track-record of contributions towards industrial innovation.

Experience in (co-)supervising PhD students or research staff in research projects and high-quality publications.

Good programming skills in Python, MATLAB, or a similar language.

A significant record of science innovation and creativity plus the ability to apply well developed research skills to scientific investigations.

Desirable criteria:

Experience with one or more of the following:

  • Bayesian nonparametrics
  • Nonhomogeneous stochastic processes
  • Large-scale data mining/processing/inference
  • Dynamic processes/networks, or spatial-temporal data

How to Apply:

Please upload one document only containing both your CV/Resume and cover letter providing enough information relevant to this position to enable the selection panel to determine your suitability.  If your application proceeds to the next stage you may be asked to provide additional information.

Please view the full position details and instructions on how to apply here:
http://www.csiro.au/~/media/Positions/2017/Data61/46943_Senior_Research_Scienist_Data_Analytics_CSOF6_PD.docx

Who we are:
The Commonwealth Scientific and Industrial Research Organisation (CSIRO)
http://www.csiro.au/

Posted in Data Mining, Uncategorized | 1 Comment

RDataMining Tutorial on Machine Learning with R

I ran a tutorial on Machine Learning with R at the Melbourne Data Science Week in June 2017. It consisted of four sessions:

  • R Programming:
    basics of R language and programming, parallel computing, and data import and export
  • Association Rule Mining with R:
    mining and selecting interesting association rules, redundancy removal, and rule visualisation
  • Text Mining with R:
    text mining, word cloud, topic modelling, and sentiment analysis
  • Social Network Analysis with R:
    graph construction, graph query, centrality measures, and graph visualisation

All materials for the above tutorial, incl. PDF slides, datasets and R scripts, can be downloaded as a single ZIP archive at

http://www.rdatamining.com/training/medascin/MLwR.zip

How to use it:

  1. Decompress the ZIP archive, and you will find the file and folders below:
     • MLwR.Rproj: RStudio project file
     • code: R scripts
     • data: datasets
     • docs: PDF slides
     • figures: charts
  2. Open the “MLwR.Rproj” file with RStudio
  3. Open each PDF slides file (in folder “docs”) and run its corresponding R scripts (in folder “code”) to learn each topic
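
The decompression step can also be done from the R console. Below is a minimal sketch, assuming the archive URL above is still reachable. The helper names `fetch_tutorial` and `check_tutorial` and the target folder `MLwR` are hypothetical and not part of the tutorial materials. Whether the archive wraps its contents in a top-level folder is also an assumption, so the check may need to be pointed at a subfolder.

```r
# Hypothetical helper: download the tutorial ZIP and unpack it into dest_dir.
# The default URL is the one given in the post; dest_dir is an assumed name.
fetch_tutorial <- function(url = "http://www.rdatamining.com/training/medascin/MLwR.zip",
                           dest_dir = "MLwR") {
  zip_path <- file.path(tempdir(), basename(url))
  # mode = "wb" keeps the archive binary-safe on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  unzip(zip_path, exdir = dest_dir)
  invisible(dest_dir)
}

# Hypothetical helper: report which of the entries listed in the post
# are present directly inside dest_dir.
check_tutorial <- function(dest_dir) {
  expected <- c("MLwR.Rproj", "code", "data", "docs", "figures")
  found <- list.files(dest_dir)  # non-recursive listing includes folders
  setNames(expected %in% found, expected)
}
```

Calling `check_tutorial(fetch_tutorial())` returns a named logical vector. If every entry is `FALSE`, the archive probably unpacked into a nested folder, and `check_tutorial()` should be called on that folder instead.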

Detailed instructions for the tutorial are available at

http://www.rdatamining.com/training/medascin

Posted in Uncategorized | 1 Comment

Slides on Association Rule Mining with R

See my latest slides on Association Rule Mining with R at
http://www.rdatamining.com/training/medascin/RDM-slides-association-rule-mining-with-r.pdf

It is one of my tutorials on Machine Learning with R for the Melbourne Data Science Week on 1 June 2017. If you are interested, details can be found at
http://www.datasciencemelbourne.com/medascin2017/session/datamining-applications-with-r/

Posted in Uncategorized | 2 Comments

AusDM 2017: submission deadline extended to 22 May

AusDM 2017 will be a special event this year, held in conjunction with IJCAI in Melbourne. This is a tremendous opportunity to present data mining research from Australia to a wider audience, with collaborative arrangements with IJCAI to invite wider participation.

Submissions are required by 5pm Monday 22 May 2017. Visit http://ausdm17.ausdm.org for details.

Posted in Uncategorized | 1 Comment

Melbourne Data Science Week, 29 May – 2 June 2017

Melbourne Data Science Week
29 May – 2 June 2017

Two sold-out events from 2016 are combining in 2017 to create what will hopefully be a great Data Science-palooza for Melbourne. Learn about applications, data, ideas and the latest tools for data science. Participate in panel sessions and break-time discussions with your colleagues from industry, academia and government. Hear from the datathon winners about how they did it.

For those who want hands-on Data Science training there will be 8 full-day tutorials from Mon-Thu.

I will run a tutorial on Machine Learning with R on 1 June, covering association rules, text mining and social network analysis. See details of the tutorial at http://www.datasciencemelbourne.com/medascin2017/session/datamining-applications-with-r/.

The tutorials are 80% full and will shortly sell out, so reserve your place now at http://www.datasciencemelbourne.com/medascin2017/.
Posted in Data Mining | 1 Comment

Short Course on R and Data Mining, University of Canberra, Fri 7 Oct 2016

Short Course on R and Data Mining

Information Technology and Engineering, University of Canberra

Fees: There are no fees for the short course, but seats are limited to 60, so register early through http://www.meetup.com/CanberraDataSci/events/234168862/

Presenters: Dr Yanchang Zhao (Adjunct Professor, UC), Professor Dharmendra Sharma

Time: 9:30am – 12:30pm, Fri 7 Oct 2016

Room: 2B02 (Building 2, room B02, University of Canberra)

Map and Parking:

http://www.canberra.edu.au/maps/pdf-maps/PARKING-Casual.pdf

Course Outline:

The course will cover R programming, data exploration and visualisation, and data mining with R. It will cover the four topics below in two sessions. Each 1.5-hour session will consist of presentations on two topics, followed by a lab in which students do exercises.

– R Programming and Data Exploration and Visualisation with R

– Regression and Classification with R

– Association Rule Mining with R

– Text Mining with R — an Analysis of Twitter Data

Instructions, prerequisites and slides for the course are or will be available at http://www.rdatamining.com/training/uc.

Posted in Data Mining, R, text mining