When analysing secondary data, you do not have the in-depth insider understanding that the original data collector would have. Instead, use your critical understanding of how data should be collected for a project such as yours to decide whether the pre-existing data measure up.
Emma Smith (2008) provides an excellent list of issues to think about when approaching a dataset to reanalyse it. The following is based on the questions she asks analysts to consider once the research aims have been established:
- How were the data collected?
Were the data collected by a reputable organisation with trained staff? Were robust and objective procedures used? Professional organisations with permanent staff are more likely to operate standardised and documented procedures.
- What types of questions were used?
When you consider the questions, do they appear to be unbiased and likely to produce reliable outputs? Most of our datasets have a copy of the data collection instrument (most commonly a survey form) in the documentation.
- How relevant are the data to your own research question?
Do the data cover the right topics? Do the questions asked allow you to capture the concept you want? Are the definitions used compatible with your research? It is wise not to assume anything from the variable list alone: check the questions asked in the questionnaire and any other documentation that describes what was done to the data after collection.
- Can you access the data?
Most of the data in the service are available to all, but each catalogue record contains a statement about access conditions, which will tell you whether there are restrictions. It is also wise not to rely on the questionnaire alone: the codebooks supplied with the data will tell you which variables are in the final data and how they have been coded. Some detail may have been restricted to protect respondents' confidentiality.
- What are the sampling strategies and response rates?
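Most statistical procedures assume that the data are a random sample that is representative of the population. Where a study departs from this, for example through clustering, stratification or non-response, you may need particular analytical techniques, such as weighting, to adjust for it. Sample size also affects the precision of results: estimates based on small samples, or on small subgroups within a large sample, are less precise.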
- Who was in the population from which the sample was drawn?
Does your sample, for example, represent all individuals in the UK in 2011, all persons aged 16 or over living in private households in Great Britain in 2014, or all households in England and Wales in 2000? You should be able to find out who was in the target population and the dates of the fieldwork from the catalogue entry.
- Are there missing values?
Missing values may indicate that a question did not work, or that not everyone had data collected in the same way. Either way, if you have a lot of missing values for a key variable you should investigate this.
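As a rough illustration of the missing-values check, the following Python sketch counts the share of missing values for each variable. It is only a sketch under stated assumptions: the missing-value codes (-9, -8, -1), the 30% threshold, the function name `missing_share` and the example rows are all hypothetical. The real codes for any study must be taken from its codebook.

```python
from collections import Counter

# Hypothetical codes a study might use for "not answered", "refused", etc.
# The real codes must come from the study's codebook.
MISSING_CODES = {-9, -8, -1, None}


def missing_share(records, variables, missing_codes=MISSING_CODES):
    """Return the proportion of missing values for each variable."""
    total = len(records)
    if total == 0:
        return {var: 0.0 for var in variables}
    counts = Counter()
    for row in records:
        for var in variables:
            # A variable absent from a row is treated as missing (None).
            if row.get(var) in missing_codes:
                counts[var] += 1
    return {var: counts[var] / total for var in variables}


# Made-up example rows, for illustration only.
records = [
    {"age": 34, "income": 21000},
    {"age": 51, "income": -9},
    {"age": -8, "income": -9},
    {"age": 27, "income": None},
]

shares = missing_share(records, ["age", "income"])
for var, share in shares.items():
    # The 30% threshold is an arbitrary assumption, not a recommended cut-off.
    flag = "investigate" if share > 0.3 else "ok"
    print(f"{var}: {share:.0%} missing ({flag})")
```

A high share for a key variable is a prompt to go back to the documentation, not a verdict on the data: the questionnaire routing may show that the question was only asked of a subset of respondents.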
Answers to these questions can usually be found either in the catalogue record itself or in the study documentation linked from it.