Theme 2

Data curation at scale

To build new data curation methods using machine learning, crowd-sourcing and human-in-the-loop techniques, and so achieve data curation at scale.

THEME LEADER

Prof. Gianluca Demartini

Chief Investigator

About the Theme

This theme aims to address challenges related to:

Data Curation
How can ad-hoc/manual, automated and crowd-sourced approaches be improved?

Large Scale Curation
How can human-machine collaboration address limited scalability and the lack of repeatability and verification of outcomes?

Holistic Framework
How can data preparation tasks be improved to enable cost-effective time to value from data, traceability of data transformations, and greater repeatability and trust in data curation processes?

This research theme is led by Associate Professor Gianluca Demartini from The University of Queensland. It focuses on improving manual and automated data curation approaches and on addressing the scalability challenges of hybrid human-machine systems. Research leaders and partners connected to this theme are working to prepare the next generation of data leaders with a comprehensive understanding of efficient and effective data curation approaches. This involves developing automated and crowd-sourced approaches to data curation, as well as solutions that address the limited scalability of data curation and the lack of repeatability and verification of its outcomes, thereby improving time to value from data and trust in data curation processes. Centre researchers and partners working on Theme 2 problems have identified the need for an open-source toolkit that provides proof of concept for cutting-edge research methods and algorithms for data curation at scale. This research will be informed by work in other themes, for example through the data conceptualisations created in Theme 5 – Agility in value creation from data.