Epidemiology and biostatistics

Instructions: Please answer all questions thoroughly. Please give a detailed rationale for all
multiple choice answers selected. Please show all work on the calculations.

Assignment on Epidemiology and Biostatistics
5. What number and percentage of the 44 depressed subjects were treated with antidepressants? Do
you think an adequate number received medication?

Of the 44 depressed subjects, only 13% were reported to be using antidepressants, and 37%
were not being treated with any conventional intervention such as exercise or herbal remedies.
One would say this proportion was not large enough to regard antidepressants as an
intervention strategy; the chi-square test likewise did not identify antidepressants as a
significant intervention strategy.

  1. The researchers excluded from the study persons who had a history of psychiatric illness

In a clinical trial, the investigators must specify inclusion and exclusion criteria for
participation in the study. Inclusion criteria are characteristics that prospective subjects must
have if they are to be included in the study, while exclusion criteria are characteristics that
disqualify prospective subjects from inclusion in the study. Inclusion and exclusion criteria
may include factors such as age, sex, race, ethnicity, type and stage of disease, the subject’s
previous treatment history, and the presence or absence (as in the case of the “healthy” or
“control” subject) of other medical, psychosocial, or emotional conditions. One would say in this

Exercise 2

  1. What statistics were used to describe the demographic variable Estimated Year Family Income?
    Measures of central tendency, distribution and dispersion.
    7. Should the demographic variable education be analyzed with parametric or nonparametric
    statistical techniques?
    Education level here is an ordinal variable, so it calls for a nonparametric statistical technique.
    Nonparametric methods are useful for analysis of nominal or ordinal data. They are also useful
    whenever questions occur concerning the underlying assumptions of a counterpart parametric
    procedure for interval or ratio data. In general, parametric procedures have nonparametric
    counterparts, although the hypothesis tested will not always be exactly the same. For example, a
    parametric two-sample test for differences in means may have a counterpart nonparametric test
    which is a two-sample test for differences in medians.

Exercise 3

  1. Looking at Table 1, what descriptive analysis techniques were performed on interval and ratio

The interval/ratio data were analysed using descriptive statistics, that is, analysis that
helps to describe, show or summarize data in a meaningful way so that, for example, patterns
might emerge from the data. Descriptive statistics do not, however, allow us to draw conclusions
beyond the data we have analysed or to reach conclusions regarding any hypotheses we might have
made. They are simply a way to describe our data. The analysis consisted of two kinds of measure.

Measures of central tendency: these are ways of describing the central position of a frequency
distribution for a group of data. For example, for marks scored by 100 students, the frequency
distribution is simply the distribution and pattern of marks from the lowest to the highest. We
can describe this central position using a number of statistics, including the mode, median,
and mean.

Measures of spread: these are ways of summarizing a group of data by describing how spread out
the scores are. To describe this spread, a number of statistics are available to us,
including the range, quartiles, absolute deviation, variance and standard deviation.
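
The measures of central tendency and spread listed above can be illustrated with a minimal sketch that uses only Python's standard `statistics` module. The `scores` list is hypothetical and invented purely for illustration; it is not the data from Table 1.

```python
import statistics

# Hypothetical scores, invented for illustration only (not from the study).
scores = [2, 4, 4, 5, 7, 8, 12]

# Measures of central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of spread
score_range = max(scores) - min(scores)
quartiles = statistics.quantiles(scores, n=4)    # three cut points: Q1, Q2, Q3
sample_variance = statistics.variance(scores)    # sample variance (divides by n - 1)
sample_sd = statistics.stdev(scores)             # square root of the sample variance
mean_abs_dev = sum(abs(x - mean_score) for x in scores) / len(scores)

print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
print("range:", score_range, "quartiles:", quartiles)
print("variance:", sample_variance, "sd:", sample_sd, "mean abs. deviation:", mean_abs_dev)
```

None of these values supports conclusions beyond the sample itself; they only summarize it.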

9.0 Are there significant differences between the intervention and the control groups of the

The chi-square test did not show any significant difference between the intervention and control
groups: a difference is declared significant only when p < 0.05, and that threshold was not met.
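
As a rough sketch of how a chi-square comparison of two groups works, the snippet below computes the Pearson chi-square statistic for a 2x2 table and compares it with the tabled critical value of 3.841 (1 degree of freedom, alpha = 0.05). The counts and the function name `chi_square_statistic` are hypothetical, no continuity correction is applied, and nothing here reproduces the study's own table.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis of no association
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = intervention / control, columns = outcome yes / no.
observed_counts = [[20, 30], [30, 20]]

CRITICAL_VALUE = 3.841  # chi-square critical value for df = 1, alpha = 0.05

chi2 = chi_square_statistic(observed_counts)
is_significant = chi2 > CRITICAL_VALUE
print("chi-square:", chi2, "significant at 0.05:", is_significant)
```

A statistic above the critical value corresponds to p < 0.05 (a significant difference); a statistic below it corresponds to p > 0.05 (no significant difference).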

Exercise 4

  1. What number or percentage (%) of the total number of respondents used CRT?

Each entry in the table contains the frequency or count of the occurrences of values within a
particular group or interval, and in this way the table summarizes the distribution of values in
the sample. Generally the class interval or class width is the same for all classes. The classes,
taken together, must cover at least the distance from the lowest value (minimum) in the data set
up to the highest value (maximum). In our data set the frequency is 24, and both the percentage
and the cumulative percentage are 45%.
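
A frequency table with percentages and cumulative percentages can be built as in the sketch below. The `responses` list and its category labels are hypothetical placeholders, not the respondents from the study, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value,
                     counts[value],
                     100 * counts[value] / total,
                     100 * cumulative / total))
    return rows


# Hypothetical categorical responses, for illustration only.
responses = ["A", "A", "B", "C", "C", "C", "C", "B", "A", "C"]

table = frequency_table(responses)
for value, freq, pct, cum_pct in table:
    print(f"{value}: n={freq}, {pct:.1f}%, cumulative {cum_pct:.1f}%")
```

The last row's cumulative percentage always reaches 100%, which is a quick check that no category was dropped.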

6. Explain why the number of total subjects in Table 2 is 859 when the total number of subjects
in the sample is stated as 869.

Because of missing values and the exclusion criteria.

Frequency Distributions with Percentages
Answer questions: 4 and 10

  1. What level of educational achievement by the mother is the mode?

The modal group (the group with the highest frequency) is 11–15 years of schooling. The
following formula is used to estimate the mode within the modal group:
$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:
- $L$ is the lower class boundary of the modal group
- $f_{m-1}$ is the frequency of the group before the modal group
- $f_m$ is the frequency of the modal group
- $f_{m+1}$ is the frequency of the group after the modal group
- $w$ is the group width
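
The estimated-mode formula above can be sketched as a small function. The frequencies below are hypothetical, and the lower boundary of 10.5 and width of 5 assume the classes are 6–10, 11–15 and 16–20 years; none of these numbers come from the study's table.

```python
def estimated_mode(lower_boundary, f_before, f_modal, f_after, width):
    """Estimate the mode of grouped data from the modal group and its neighbours."""
    numerator = f_modal - f_before
    denominator = (f_modal - f_before) + (f_modal - f_after)
    return lower_boundary + (numerator / denominator) * width


# Hypothetical values: modal group 11-15 years with an assumed boundary of 10.5.
mode_estimate = estimated_mode(lower_boundary=10.5,
                               f_before=5,    # frequency of the group before
                               f_modal=12,    # frequency of the modal group
                               f_after=8,     # frequency of the group after
                               width=5)
print(round(mode_estimate, 2))
```

The estimate always falls inside the modal group, closer to whichever neighbouring group has the higher frequency.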
10. Do you think the sample can be generalized to the population of the whole of the USA?
No, because the sample is not inclusive of all demographics: it does not seem to
include other races such as African American and Hispanic, among others. The sample is also too small.
Exercise 6
Cumulative percentage and Percentile ranks
5.0 What number and percentage of nurses documented a different pain score from grimacing?
9. Is this study applicable only to the elderly population?

The study is all-inclusive and can validly be applied to elderly as well as younger patients.

Exercise 7
In Figure 2, which value is placed on the Y axis and which on the X axis?
The Y axis carries the dependent variable, while the X axis carries the independent variable.

  1. Examine Figures 1 and 2 and compare their distribution patterns. Are they similar in the

Figure 1 shows normally distributed data, while Figure 2 shows left-skewed data. When you
have a normally distributed sample you can legitimately use either the mean or the median as your
measure of central tendency. In fact, in any symmetrical, unimodal distribution the mean, median
and mode are equal. However, in this situation, the mean is widely preferred as the best measure of
central tendency, as it is the measure that includes all the values in the data set for its
calculation, and any change in any of the scores will affect the value of the mean. This is not
the case with the median or mode.
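
The point that every score feeds into the mean, while the median can stay put, is easy to see in a short sketch. The scores are hypothetical and chosen only to make the arithmetic obvious.

```python
import statistics

# Hypothetical symmetric scores: mean and median coincide.
symmetric_scores = [50, 60, 70, 80, 90]
mean_before = statistics.mean(symmetric_scores)
median_before = statistics.median(symmetric_scores)

# Change a single score (90 becomes 190) and recompute.
changed_scores = [50, 60, 70, 80, 190]
mean_after = statistics.mean(changed_scores)
median_after = statistics.median(changed_scores)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

Changing one value moves the mean from 70 to 90, while the median remains 70.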

Exercise 8

  1. The post-intervention breastfeeding rate scores were better than the pre-intervention scores, as
    shown by the mean as well as the standard deviation.
  2. The implication is that it is necessary to advocate or promote breastfeeding as done
    post-intervention, as opposed to pre-intervention.

Exercise 11
3. What other statistics could be used to determine length of labour?
Mean, mode, maximum, variance and standard deviation.
Can the findings from the study be generalized to include all black women?
Yes, because the sample was adequate and inclusive.

Exercise 11
Determine mode, median and mean of the following nursing students enrolled in year 2001 to
563, 593, 606, 520, 563, 610 and 577
To find the mean, add up all the numbers, then divide by how many numbers there are:
(563 + 593 + 606 + 520 + 563 + 610 + 577) / 7 = 4032 / 7 = 576.
To find the mode, or modal value, place the numbers in value order, then count how many of
each number there are. The mode is the number that appears most often (you can have more than one
mode); in this case 563 appears twice, so the mode is 563.

To find the median, place the numbers in value order and find the middle number (or the mean
of the middle two numbers). In this case there are seven values, so the median is the 4th value, 577:

520, 563, 563, 577, 593, 606, 610
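
The three measures for the seven enrolment figures above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative; the data are the seven values given in the question.

```python
import statistics

# The seven enrolment figures from the question.
enrolment = [563, 593, 606, 520, 563, 610, 577]

ordered = sorted(enrolment)                   # value order, needed for the median
enrolment_mean = statistics.mean(enrolment)   # sum of values / number of values
enrolment_median = statistics.median(enrolment)
enrolment_mode = statistics.mode(enrolment)

print("ordered:", ordered)
print("mean:", enrolment_mean, "median:", enrolment_median, "mode:", enrolment_mode)
```

With an odd number of values the median is the single middle value of the ordered list; the mean of two middle values is needed only when the count is even.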
9. Assuming an alpha = 0.01, which nursing speciality demonstrated a significant change in
popularity between questionnaire administrations 1 and 2?
Cronbach’s alpha determines the internal consistency or average correlation of items in a survey
instrument to gauge its reliability. Computation of alpha is based on the reliability of a test
relative to other tests with the same number of items and measuring the same construct of
interest. The alpha coefficient ranges in value from 0 to 1 and may be used to describe the reliability.
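
As a minimal sketch of how Cronbach's alpha is usually computed, the function below uses the common formula alpha = k / (k - 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The item scores and the name `cronbach_alpha` are hypothetical and are not taken from the study's questionnaire.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists (same respondents, same order)."""
    k = len(items)
    item_variances = [statistics.variance(scores) for scores in items]
    # Total score of each respondent across all items
    totals = [sum(respondent) for respondent in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: two items answered by four respondents.
item_scores = [
    [1, 2, 3, 4],   # item 1
    [2, 1, 4, 3],   # item 2
]
alpha = cronbach_alpha(item_scores)
print(round(alpha, 2))
```

Items that move together across respondents push alpha toward 1; unrelated items pull it toward 0.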

Exercise 16

1.0 The researchers analysed the data they collected as though it were at what level of
Ordinal scale
4. Comparing the mean baseline and post-test depression scores of the control group, it is very clear
that including the control group strengthens the experiment, because it gives an
opportunity to analyze the experiment holistically, reducing bias in interpretation.

Exercise 19 Skewness of a distribution


A histogram with two peaks is called “bimodal” since it has two values or data ranges that appear
most often in the data. In a process that is repeated over time, we typically expect the data to
appear in the familiar, bell-shaped curve of the normal distribution. Thus, the bimodal histogram
can signal something out of the ordinary. When viewing this histogram, the data look quite
different: this second histogram almost seems to have a roughly normal distribution (or
slightly skewed distribution) with a single peak.

  1. Negatively Skewed

In a negatively skewed distribution, the mode is higher than the median, which is higher than the
mean; therefore, in our case, the data set has most of its scores above the mean. The third moment
about the mean is called skewness. In a negatively skewed distribution the tail of
the distribution points toward the low scores.
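
The ordering mean < median < mode and the negative sign of the skewness can be illustrated with a sketch that computes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). The scores are hypothetical, and `skewness` is an illustrative helper, not a standard-library function.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


# Hypothetical scores with a tail toward the low values.
left_skewed = [1, 8, 9, 9, 10, 10, 10]

skew_value = skewness(left_skewed)
skew_mean = statistics.mean(left_skewed)
skew_median = statistics.median(left_skewed)
skew_mode = statistics.mode(left_skewed)

print("skewness:", skew_value)
print("mean:", skew_mean, "median:", skew_median, "mode:", skew_mode)
```

A negative result means the long tail points toward the low scores; a symmetric data set gives a value of zero.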

Exercise 22

The relationship is a significant positive relationship, in which the independent variable
influences the dependent variable positively: an increase in the independent variable leads to an
increase in the dependent variable.

  1. Figure 1 shows an extreme value, also called an outlier, which is reflected in a very large
    mean and therefore interferes with the normal distribution.

Exercise 23 Pearson product-moment

There is a significant association between strength index 120°/s and triple hop index, with a p value
less than 0.05. The Pearson product-moment correlation coefficient is a measure of the strength
and direction of association that exists between two variables measured on at least an interval scale.
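
A minimal sketch of the Pearson product-moment calculation follows. The paired values are hypothetical stand-ins for two interval-level measurements and are not the strength or hop indices from the study; `pearson_r` is an illustrative name.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired measurements, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 3))
```

The sign of r gives the direction of the association and its magnitude (between 0 and 1) gives the strength; whether it is significant still has to be judged against the p value.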

10. R is a measure of the correlation between the observed value and the predicted value of
the criterion variable. R Square (R²) is the square of this measure of correlation and indicates the
proportion of the variance in the criterion variable that is accounted for by our model. In
essence, this is a measure of how good a prediction of the criterion variable we can make by
knowing the predictor variables; in this case it indicates 66% of the association.

Exercise 29 t-test for independent groups

3.0 The t value of -3.15 is significant at p < 0.05, indicating a statistically significant difference
between women and men, because its absolute value exceeds the critical value for the study. For a
two-tailed test, if the calculated value of t exceeds the tabled value, then report the p value in the table.
For a one-tailed test, the p value is divided by two, so “p < 0.05” becomes “p < 0.025.”

The table should include values for p = 0.1 so that a one-tailed test can be conducted at the p = 0.05
level. Negative t-values: the sign of a t-value tells us the direction of the difference in sample means.


We report the t-test value as an absolute value, so whether it is negative or positive does not matter
here; therefore a t value with absolute value 2.50 is smaller than 2.74. Case I represents the null
hypothesis (H0: µ1 = µ2), indicating that the mean of group one equals the mean of group two;
both samples come from the same population. This would signify that the drug had no effect on
blood pressure. The difference in the means is small, suggesting that they come from the same
population. Case II represents the alternative hypothesis (HA: µ1 ≠ µ2), indicating that the mean of
group one does not equal the mean of group two; the two sample means are from different
populations. The difference in the means is too large to come from one population in most cases.
Hence the means are probably coming from two different populations. A t-test decides whether to
reject the null hypothesis in favour of the alternative.
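
The decision between the two hypotheses can be sketched with a pooled-variance t-test for two independent groups. The two groups are hypothetical, the function name `independent_t` is illustrative, and the critical value 2.306 is the tabled two-tailed value for 8 degrees of freedom at alpha = 0.05.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, df


# Hypothetical measurements for two independent groups.
group_one = [1, 2, 3, 4, 5]
group_two = [3, 4, 5, 6, 7]

t_value, degrees_of_freedom = independent_t(group_one, group_two)

CRITICAL_T = 2.306  # two-tailed critical value for df = 8, alpha = 0.05
reject_null = abs(t_value) > CRITICAL_T   # compare the absolute value of t
print("t:", t_value, "df:", degrees_of_freedom, "reject H0:", reject_null)
```

The sign of t only reflects which group was entered first; swapping the groups flips the sign but leaves the decision unchanged.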

Exercise 36 ANOVA


Participants in the intervention group reported a reduction in mobility difficulty at 12 weeks, and
this is significant, as shown by a p value of < 0.05.

6.0 The one-way analysis of variance (ANOVA) is used to determine whether there are any
significant differences between the means of three or more independent (unrelated) groups;
it is therefore not appropriate in the case of one group.
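
As a rough sketch of what a one-way ANOVA computes, the function below derives the F ratio from the between-group and within-group sums of squares. The three groups are hypothetical, `one_way_anova_f` is an illustrative name, and the function deliberately refuses a single group because there is then no between-group variance to test.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    k = len(groups)
    if k < 2:
        raise ValueError("ANOVA needs at least two groups to compare")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three independent groups.
f_ratio, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
print("F:", f_ratio, "df:", (df_b, df_w))
```

The resulting F is then compared with the tabled critical value for the two degrees of freedom to decide whether the group means differ.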