The full paper “Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency” by Shaochen Yu, Lei Han, Marta Indulska, Shazia Sadiq and Gianluca Demartini has been accepted at The Web Conference 2023 (WWW 2023).
We propose a novel hybrid human-machine system that leverages crowdsourcing to address syntactic format inconsistencies in an effective and cost-efficient way. We first ask crowd workers to select training examples for our inference algorithm through data selection and result validation. We then use a novel rule-based learning algorithm to infer a regular expression that captures the format consistency issues in a given structured dataset. The inferred regular expression can then be applied to the entire dataset to find further consistency issues, so experts are no longer required to write regular expressions by hand.
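To make the idea concrete, here is a minimal Python sketch of inferring a format pattern from a few selected examples and applying it to a whole column. It is not the algorithm from the paper: the character-class generalisation and the helper names (`infer_pattern`, `infer_regex`, `find_inconsistent`) are assumptions made purely for illustration, and the hand-picked examples stand in for the ones crowd workers would select.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse regex class (illustrative assumption)."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and anything else is kept literally


def infer_pattern(example):
    """Generalise a single example string into a regex fragment."""
    classes = [char_class(c) for c in example]
    parts = []
    for cls, run in groupby(classes):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def infer_regex(examples):
    """Combine the patterns of all examples into one alternation."""
    patterns = sorted({infer_pattern(e) for e in examples})
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def find_inconsistent(column, regex):
    """Return the values that do not fully match the inferred format."""
    return [value for value in column if not regex.fullmatch(value)]


# Examples of the expected format, standing in for crowd-selected ones.
selected_examples = ["2023-01-15", "2022-12-03"]
regex = infer_regex(selected_examples)

column = ["2023-01-15", "15/01/2023", "2021-07-30", "Jan 15 2023"]
print(find_inconsistent(column, regex))
```

In this sketch the two selected dates generalise to a single four-digit, two-digit, two-digit pattern, so the values written in other date formats are flagged as inconsistent.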
Welcome to Catherine Sai (Kate), who has joined CIRES for two months as a visiting PhD researcher from the Technical University of Munich. Kate has a Master’s in Industrial Engineering from Karlsruhe Institute of Technology (KIT) and four years of industry experience, including working as a Cognitive Computing Consultant and Data Scientist.
She is focused on data-driven business process improvement, and her PhD topic focuses on the automatic identification, extraction, and in-depth comparison of process requirements from complex texts with an actual business process. During her time at CIRES, Kate will work with Associate Professor Gianluca Demartini and Professor Shazia Sadiq on a project involving the automated identification of relevant (textual) requirements based on a business process.
Kate’s excited to join forces across research institutions and countries, and is looking forward to collaborating with CIRES partners and researchers across the Centre.
We’d like to welcome Muhammed Elyas Meguellati to CIRES! Elyas is a PhD researcher based at The University of Queensland. He will be working on the Customer Data Stories project in collaboration with our industry partner Allianz Worldwide Partners Australia, partner investigator Mr Shane Downey MPhil, Associate Professor Gianluca Demartini, and Professor Shazia Sadiq.
Elyas has a Master’s degree in applied computing from the University of Malaya, and his research interests include natural language processing and deep learning. Welcome, Elyas!
Welcome back to 2023! We’d like to extend a very warm welcome to our first CIRES Postdoctoral Research Fellow, Junliang Yu!
Junliang is based at The University of Queensland and his work will particularly focus on dealing with data sparsity and noise problems in real-world datasets, and on promoting algorithmic transparency.
During his PhD, he published over 10 peer-reviewed papers at prestigious conferences, including the Conference on Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), IEEE’s International Conference on Data Mining (ICDM), AAAI, and the Conference on Information and Knowledge Management (CIKM), and in journals including IEEE’s Transactions on Knowledge and Data Engineering (TKDE) and the International Journal on Very Large Data Bases (VLDBJ).
Junliang’s research interests are data mining and machine learning, with a particular focus on recommender systems, tiny machine learning, and self-supervised learning. He will work closely with Centre Director Professor Shazia Sadiq and CIRES Chief Investigator Associate Professor Hongzhi Yin.
Junliang is looking forward to building successful collaborative relationships across the Centre and making great contributions to CIRES.