Screening data prior to applying specific statistical analyses is a critical requirement during a
research project. The dissertation is no different. In fact, the methods chapter (chapter 3) of your
dissertation proposal should be a step-by-step plan that addresses the data screening that will be
performed and how the data will be evaluated to make sure that the planned statistical tests are
appropriate given the data obtained. This assignment will allow you to practice your
understanding of these procedures so you will be able to write this section of the dissertation
proposal.
In an essay of 250-500 words, describe data screening. Address the following items:
How does data screening differ from data cleaning?
What part of data screening checks for the actual number of responses you have for each
variable?
How is data screening related to the assumptions of the statistic you will choose to analyze your
data?
Why are descriptive statistics a part of data screening?
What kinds of plots are useful in data screening?
What can you do if your data do not meet the assumptions of your statistic?
Data Screening
Once your data are in hand, whether derived from experimental, survey, or archival
methods, and have been entered into the data analysis software, you must resist the
temptation to skip the data screening step, which involves a critical examination of the
quality of the collected data (Creswell, 1994). Data screening is one of the most important
issues that researchers face before embarking on sophisticated multivariate statistical
analyses. It is the process of checking data for errors so that those errors can then be
removed or corrected. Data screening should always be carried out before the actual data
analysis to help ensure the integrity of the collected data (Flick, 2002).
Although data screening and data cleaning both serve to ensure the integrity of the
collected data, they are distinct processes (Neumann, 2000). The
main difference between these two processes is that data screening is concerned with
checking for and detecting errors in the collected data, whereas data cleaning is the actual
process of removing or correcting the errors that screening has detected (Flick, 2002). One
of the most significant parts of data screening is the check for missing data, which
establishes the actual number of responses you have for each variable. This is crucial
because it reveals, before data analysis begins, how much data is missing and for which
variables (Rea & Parker, 1992).
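As an illustration only, the check for the number of responses per variable can be sketched
in Python. The record layout, the variable names, and the use of None to mark a missing
response are assumptions made for this example and do not come from any particular
statistical package.

```python
def count_valid_responses(records, variables):
    """Return {variable: number of non-missing responses}.

    A response counts as missing when the key is absent or its value is None
    (an assumption of this sketch).
    """
    counts = {}
    for var in variables:
        counts[var] = sum(1 for row in records if row.get(var) is not None)
    return counts


# Hypothetical survey records; None marks a question that was not answered.
records = [
    {"age": 34, "income": 52000, "satisfaction": 4},
    {"age": None, "income": 48000, "satisfaction": None},
    {"age": 29, "income": None, "satisfaction": None},
    {"age": 41, "income": 61000, "satisfaction": 3},
]

variables = ["age", "income", "satisfaction"]
valid = count_valid_responses(records, variables)

# Number of missing responses per variable, derived from the valid counts.
missing = {var: len(records) - n for var, n in valid.items()}

for var in variables:
    print(f"{var}: {valid[var]} valid, {missing[var]} missing")
```

Comparing the valid count for each variable with the total number of cases shows at a glance
which variables have gaps that need attention before analysis.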
Data screening is also closely related to the assumptions of the statistic that the
researcher will choose to analyse the collected data (Creswell, 1994). The assumptions of
the statistic can only be met when the integrity of the collected data has not been
compromised. Data screening therefore helps to confirm that this integrity is maintained
and that the assumptions of the statistic are met, and it allows detected errors to be
removed or corrected (Rea & Parker, 1992). Hence, the collected data should be checked
against the assumptions of the statistic only after they have been screened for errors.
Descriptive statistics form an important part of data screening because they are used to
check the skewness and kurtosis of the collected data (Sauders, Philip & Thornhill, 1997).
Skewness measures the asymmetry of a distribution, whereas kurtosis describes how strongly
the individual scores cluster toward the centre of the distribution relative to its tails.
Unacceptable levels of skewness and kurtosis for a particular variable can be identified
from the descriptive statistics. Thus, descriptive statistics help to identify departures
from normality and extreme values, that is, outliers (Neumann, 2000). In addition, various
plots are useful in data screening, including histograms, bar charts, stem-and-leaf plots,
box plots, and scatter plots. All these kinds of plots help to check for errors in the
collected data (Sauders, Philip & Thornhill, 1997).
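To make the two descriptive statistics concrete, the following is a minimal sketch of how
skewness and kurtosis can be computed from a variable's scores. It assumes the simple
moment-based (population) formulas; statistical packages often report bias-corrected
versions, so their values may differ slightly. The function names and sample scores are
hypothetical.

```python
import statistics


def _central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return _central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    return _central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical scores: one symmetric variable and one with a long right tail.
symmetric_scores = [1, 2, 3, 4, 5]
skewed_scores = [1, 1, 1, 2, 10]

print(skewness(symmetric_scores), excess_kurtosis(symmetric_scores))
print(skewness(skewed_scores), excess_kurtosis(skewed_scores))
```

A skewness value far from zero, or an excess kurtosis value far from zero, signals that the
variable may depart from normality. Which cut-off counts as unacceptable depends on the
convention the researcher adopts.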
If the collected data do not meet the assumptions of the statistic, the necessary measures
are taken to correct or remove the errors in the collected data. The problems most commonly
revealed by data screening are missing data, outliers, and violations of normality,
linearity, and homoscedasticity, as well as multicollinearity (Flick, 2002). Each of these
problems needs to be addressed through the appropriate procedure so that the data meet the
assumptions of the statistic (Sapsford & Jupp, 2006).
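As one possible illustration of addressing a problem found during screening, the sketch
below flags outliers with the 1.5 x IQR rule, the same convention commonly used to draw the
whiskers of a box plot, and separates them from the remaining scores. The essay does not
prescribe this rule; it is an assumption of the example, as are the function names and the
sample data. Whether a flagged value is an error to be removed or a legitimate extreme score
remains a judgement for the researcher.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the IQR fences."""
    low, high = iqr_fences(values, k)
    kept = [x for x in values if low <= x <= high]
    outliers = [x for x in values if x < low or x > high]
    return kept, outliers


# Hypothetical scores with one implausibly large entry (perhaps a typing error).
scores = [10, 12, 11, 13, 12, 11, 95]
kept, outliers = split_outliers(scores)

print("flagged as outliers:", outliers)
print("retained for analysis:", kept)
```

After a step like this, the descriptive statistics and plots should be produced again for
the retained data to confirm that the assumptions of the statistic are now met.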
References
Creswell, J.W. (1994). Research design: Qualitative and quantitative approaches. Thousand
Oaks, California: Sage.
Flick, U. (2002). An Introduction to Qualitative Research. London: Sage.
Neumann, W.L. (2000). Social Research Methods: Qualitative and Quantitative Approach,
3rd ed. Boston, MA: McGraw Hill.
Rea, L.M. & Parker, R.A. (1992). Designing and conducting survey research: A
comprehensive guide. San Francisco: Jossey-Bass.
Sapsford, R. & Jupp, V. (2006). Data Collection and Analysis, 2nd ed. Thousand Oaks:
Sage Publications Ltd.
Sauders, M., Philip, L. & Thornhill, A. (1997). Research Methods for Business Students.
London: Pitman Publishing.