In the research world there is agreement that the social science community must improve its capacity for transparency and replicability, and so help to appraise and promote trust in publicly funded research. Research means evidence, which means data. Research funders and publishers of journal articles now expect researchers to make explicit, and to share, the data sources that underpin their findings.
Sharing data is often the last item on a busy researcher's priority list, whether senior academic or PhD student, and shared data often suffer from being a ‘quick and dirty’ upload. Research data are uploaded to various repositories around the world, run by specialist data centres, universities and journals, and almost every ‘data publisher’ checks the data it acquires in a different way. Data quality is not always rigorously assessed, partly because repository managers may lack the skills to appreciate disciplinary issues or the detail of the data.
Based on a detailed appreciation of what makes a high-quality dataset, which checks can be made and how errors might be flagged, this project aims to pass on that expertise to the research and data publishing communities in two ways: through a tool that assesses quantitative data for known quality issues, and through associated training materials that make the quality assessment of numeric data explicit. QAMyData will offer an easy-to-use tool/service that automatically detects some of the most common problems in numeric data and creates a ‘data health check’. Data can be resubmitted as many times as needed, with identified issues remedied between submissions, until a ‘clean bill of health’ is produced. Clean data receive a clean bill of health certificate and a high-quality codebook/data dictionary, both useful takeaways that demonstrate quality assurance for onward submission to a journal or data repository.
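To give a flavour of what an automated ‘data health check’ might involve, the following is a minimal illustrative sketch in Python. It is not the QAMyData tool or its method: the function name `health_check`, the sample data, and the three checks chosen (missing values, duplicate identifiers and out-of-range numeric values) are all assumptions made for illustration only.

```python
import csv
import io


def health_check(rows, id_field, ranges):
    """Return a list of issue descriptions; an empty list means no issues found.

    Hypothetical helper for illustration only, not part of QAMyData.
    rows     -- iterable of dicts mapping column names to string values
    id_field -- name of the column expected to hold unique identifiers
    ranges   -- dict mapping column name to an allowed (low, high) pair
    """
    issues = []
    seen_ids = set()
    # Rows are numbered from 1, counting data rows only (header excluded).
    for row_no, row in enumerate(rows, start=1):
        # Check 1: duplicate identifiers.
        row_id = row.get(id_field, "")
        if row_id in seen_ids:
            issues.append(f"row {row_no}: duplicate {id_field} '{row_id}'")
        seen_ids.add(row_id)

        # Check 2: missing (blank or absent) values in any column.
        for field, value in row.items():
            if field is None:
                # csv.DictReader stores surplus cells under the key None.
                issues.append(f"row {row_no}: more cells than column headers")
                continue
            if value is None or value.strip() == "":
                issues.append(f"row {row_no}: missing value in '{field}'")

        # Check 3: numeric values outside the allowed range.
        for field, (low, high) in ranges.items():
            raw = (row.get(field) or "").strip()
            if raw == "":
                continue  # nothing to range-check
            try:
                number = float(raw)
            except ValueError:
                issues.append(
                    f"row {row_no}: non-numeric value '{raw}' in '{field}'"
                )
                continue
            if not low <= number <= high:
                issues.append(
                    f"row {row_no}: value {raw} in '{field}' "
                    f"outside [{low}, {high}]"
                )
    return issues


# Made-up sample data containing one instance of each kind of problem.
SAMPLE = """id,age,income
1,34,21000
2,,18000
2,130,25000
3,abc,30000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = health_check(rows, "id", {"age": (0, 120), "income": (0, 1_000_000)})

for issue in report:
    print(issue)
print("clean bill of health" if not report else f"{len(report)} issue(s) found")
```

In this sketch an empty report plays the role of the ‘clean bill of health’: the data would be corrected and the check rerun until no issues remain.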
In summary, the tool will be useful to anyone who has to, or wants to, share their research data or reuse less-than-clean data. The associated training, to be delivered through the UK Data Service, AQMen and NCRM, can help to improve awareness of what makes high-quality data.
The proposed key outputs are:
Our Advisory Board includes a number of high-profile data repositories and publishers, with whom we will work to scope and evaluate the tools and functionality.
Principal Investigator: Louise Corti, UK Data Service
Co-Investigator: Vernon Gayle, AQMen, University of Edinburgh
Funders: National Centre for Research Methods (NCRM), Economic and Social Research Council (ESRC)
Project dates: 8 January 2018 – 8 January 2019 (12 months)