Methods Of Data Collection In Statistics

These are my notes on methods of data collection in statistics.

If you want to draw valid conclusions from a study, you must collect the data according to a well-developed plan. This plan must include the question or questions to be answered, as well as an appropriate method of data collection and analysis.

Methods Of Data Collection
First, we need to introduce some terms and concepts.
A population is the entire group of individuals or items that we are interested in.
A frame is a list of all the members from which the sample is to be taken. This is usually the same as the population.
A sample is the part of the population that is actually being examined.
A sample survey is the process of collecting information from a sample. Information obtained from the sample is usually used to make inferences about population parameters.
A census is the process of collecting information from all the units in a
population. It is feasible to do a census if the population is small and the
process of getting information does not destroy or modify units of the
population. A census is often too costly and sometimes too damaging to the
population being studied. We usually have to take samples instead. 

Experiments and Observational Studies
An experiment is a planned activity that results in measurements. In an
experiment, the experimenter creates differences in the variables involved in
the study and then observes the effects of such differences on the resulting
measurements. 

An observational study is an activity in which the experimenter observes the
relationships among variables rather than creating them. Experiments have some
advantages, but unfortunately it is sometimes impossible or unethical to conduct
one. In those cases we must use an observational study.

One of the problems with observational studies is that their results often
cannot be generalized to a population, because many observational studies use
samples that are not representative of the population of interest. These
samples might simply be the easiest to obtain. Another problem is that of
confounding factors. These occur when the two variables of interest are related
to a third variable instead of just to each other. 

Planning and Conducting Surveys
There are many methods of getting a sample from the population. Some sampling
methods are better than others. Biased sampling methods result in values that
are systematically different from the population values or systematically favor
certain outcomes. 

Judgmental sampling, samples of convenience, and volunteer samples are some of
the methods that generally result in biased outcomes. Sampling methods that are
based on a probabilistic selection of samples, such as simple random sampling,
generally result in unbiased outcomes.

Judgmental sampling makes use of a nonrandom approach to determine which items
of the population are to be selected in the sample. The approach is entirely
based on the judgment of the person selecting the sample.

Using a sample of convenience is another method that can result in biased
outcomes. Samples of convenience are chosen because they are easy to obtain,
not because they represent the population.

Volunteer samples, in which the subjects choose to be part of the sample, may
also result in biased outcomes.

Simple Random Sampling
Simple random sampling is a process of obtaining a sample from a population in
which each member has an equal chance of being selected. In this type of sample,
there is no bias or preference for one individual over another. Simple random
samples, also known as random samples, can be obtained with or without
replacement. Sampling with replacement from a finite population, like sampling
from an effectively infinite population, makes the selections independent of
one another. Sampling without replacement from a finite population does not,
although the dependence is negligible when the population is much larger than
the sample.

To select a simple random sample from a population, we need to use some kind of
chance mechanism. 
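
As a rough illustration of such a chance mechanism, the sketch below uses
Python's standard random module. The frame of 100 numbered members, the
function name simple_random_sample, and the seed are all assumptions made up
for this example.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n members from the frame, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    if replace:
        # With replacement: a member may be drawn more than once.
        return rng.choices(frame, k=n)
    # Without replacement: each member appears at most once.
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical frame: member IDs 1..100
srs = simple_random_sample(frame, 10, seed=1)
print(srs)
```

Drawing numbered slips from a hat or using a random number table would serve
the same purpose; the code is just one convenient chance mechanism.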

Besides simple random sampling, there are other sampling procedures that make
use of a random phenomenon to get a sample from a population. In a systematic
sampling procedure, the first item is selected at random from the first k items
in the frame, and then every kth item is included in the sample. This method is
popular among biologists, foresters, environmentalists, and marine scientists. 
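
The systematic procedure described above might be sketched as follows. The
frame, the value k = 10, and the function name systematic_sample are
hypothetical choices for illustration only.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position 0..k-1 within the first k items
    return frame[start::k]    # that item and every kth item after it

frame = list(range(1, 101))   # hypothetical frame of 100 ordered items
sys_sample = systematic_sample(frame, 10, seed=2)
print(sys_sample)
```

Only the starting point is random; once it is chosen, the rest of the sample
is fully determined by the order of the frame.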

In stratified random sampling, the population is divided into groups called
strata, and a simple random sample is selected from each stratum. Strata are
homogeneous groups of population units. The units in a given stratum are similar
in some characteristics, whereas those in different strata differ in those
characteristics. 

In proportional sampling, the population is divided into groups called strata,
and a simple random sample of size proportional to the stratum size is selected
from each stratum. Proportional sampling is the preferred method of stratified
sampling. 
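
A minimal sketch of proportional stratified sampling is shown below. The
strata (class years with made-up sizes), the total sample size of 50, and the
function name are assumptions for the example, not part of any real study.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Take a simple random sample from each stratum, sized by its share."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Stratum sample size is proportional to stratum size.
        # With awkward proportions, rounding can make the sizes miss n by a unit.
        size = round(n * len(units) / total)
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical population of 1000 students split into four strata.
strata = {
    "freshmen": list(range(0, 400)),
    "sophomores": list(range(400, 700)),
    "juniors": list(range(700, 900)),
    "seniors": list(range(900, 1000)),
}
strat_sample = proportional_stratified_sample(strata, 50, seed=3)
print({name: len(units) for name, units in strat_sample.items()})
```

With these made-up sizes, a stratum holding 40% of the population contributes
40% of the sample, and so on for the other strata.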

In cluster sampling, a population is divided into existing, non-homogeneous
groups called clusters. A simple random sample of clusters is obtained, and all
individuals within the selected clusters are included in the sample. In order to
safely use cluster sampling, each cluster should be representative of the
population as a whole. Cluster sampling is often used to reduce the cost of
obtaining a sample, especially when the population is large.
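
Cluster sampling could be sketched like this. The clusters (schools and their
students) and the function name cluster_sample are invented for illustration.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly choose m clusters and keep every individual in each of them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # simple random sample of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: students grouped by the school they attend.
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}
clus_sample = cluster_sample(clusters, 2, seed=4)
print(clus_sample)
```

Note the contrast with stratified sampling: here the randomness selects whole
groups, and nothing is sampled within a chosen group.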

Bias In Surveys
For a survey to produce reliable results, it must be properly designed and
conducted. The sample should be selected using a proper randomization technique. A
nonrandom selection will limit the generalizability of the results. Furthermore,
interviewers should be trained in proper interviewing techniques. The attitude
and behavior of the interviewer should not lead to any specific answers, because
this would result in a biased outcome. Questions should be carefully worded, as
the wording of a question can affect the response, and leading questions should
be avoided. 

A survey is biased if it systematically favors certain outcomes. Response bias
occurs when a respondent provides an answer that is either factually wrong or
does not accurately reflect his or her true opinion. 

Non-response bias may occur if the person selected for an interview cannot be
contacted or refuses to answer. If such individuals are different, as a group,
from those who are eventually interviewed, then the results may not accurately
reflect the whole population.

Under-coverage bias may occur if part of the population is left out of the
selection process. 

Wording effect bias may occur if confusing or leading questions are asked. 

Planning Experiments
A dependent or response variable is the variable to be measured in the
experiment. An independent or explanatory variable is a variable that may
explain the differences in responses. We are interested in studying the
effects of independent variables on dependent variables.

An experimental unit is the smallest unit of the population to which a
treatment is applied. A confounding variable is a variable whose effect on the
response cannot be separated from the effect of the explanatory variable. In
properly constructed experiments, an experimenter tries to control confounding
variables. Confounding can be an even more serious problem in observational
studies because the experimenter has no control over the confounding variables. 

A factor is a variable whose effect on the response is of interest in the
experiment. Factors are of two types: qualitative and quantitative. A
qualitative factor has levels that are non-numerical categories. A quantitative
factor has levels that are measured numerically.

Levels are the values of a factor used in the experiment. An experiment can have
one or more factors. The number of levels used in the experiment may differ from
factor to factor. Treatments are the factor-level combinations used in the
experiment. If the experiment has only one factor, then all the levels of that
factor are considered treatments of the experiment. 
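
To make the idea of factor-level combinations concrete, here is a small
sketch. The two factors and their levels are hypothetical; the point is only
that the treatments are every combination of one level from each factor.

```python
from itertools import product

# Hypothetical experiment with two factors and their levels.
factors = {
    "fertilizer": ["none", "low", "high"],  # 3 levels
    "watering": ["daily", "weekly"],        # 2 levels
}

# Each treatment is one level from every factor.
treatments = list(product(*factors.values()))
for treatment in treatments:
    print(dict(zip(factors, treatment)))
```

Three levels of one factor and two of the other give 3 x 2 = 6 treatments.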

A control group is a group of experimental units similar to all the other
experimental units except that it is not given any treatment. A control group is
used to establish the baseline response expected from experimental units if no
treatment is given. 

A placebo group is a control group that receives a treatment that looks and
feels similar to an experimental treatment but is expected to have no effect. A
placebo is a medicine that looks exactly like the real medicine but does not
contain any active ingredients. 

Single and Double Blind Experiments
Measurements or interactions with subjects may be biased if the person taking
the measurements knows whether a subject received a placebo or not. A blinding
technique is used in medical experiments to prevent such bias. Blinding can be
applied in two ways: single blinding and double blinding. In a single blind
experiment, either the patient does not know which treatment he or she is
receiving, or the person measuring the patient's reaction does not know which
treatment was given. In a double blind experiment, neither the patient nor the
person measuring the patient's reaction knows which treatment the patient was
given.

Double blind experiments are preferred, but in certain situations they simply
cannot be conducted. 

Randomization
The technique of randomization is used to average out the effects of extraneous
factors on responses. In other words, it balances the effects of factors you
cannot see.

If each experimental unit is supposed to receive only one treatment, then which
experimental unit receives which treatment should be determined randomly. If
each experimental unit is supposed to receive all treatments, then the order of
treatments should be determined randomly for each experimental unit. 

Blocking
The technique of blocking is used to control the effects of known factors. A
block is a group of homogeneous experimental units. Experimental units in a
block are similar in certain characteristics, whereas those in different blocks
differ in those characteristics. 

Replication
Replication refers to the process of giving a certain treatment numerous times
in an experiment, or even repeating an experiment multiple times. Replication
reduces chance variation among results. It also allows us to estimate chance
variation among results. 

Completely Randomized Design
In a completely randomized design, treatments are assigned randomly to all
experimental units, or experimental units are assigned randomly to all
treatments. This design can compare any number of treatments. There are
advantages in having an equal number of experimental units for each treatment.
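
One way to carry out a completely randomized design is sketched below. The
twelve unit labels, the three treatment names, and the function name are
assumptions for the example.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign treatments to units with group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order of all experimental units
    # Deal the shuffled units out to the treatments in turn,
    # so group sizes differ by at most one.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

units = [f"unit{i}" for i in range(1, 13)]  # 12 hypothetical experimental units
crd = completely_randomized_design(units, ["A", "B", "C"], seed=5)
print(crd)
```

Shuffling first and then dealing is a simple way to get both random assignment
and an equal number of units per treatment.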

Randomized Block Design
If the treatments are the only systematic differences present in the
experiment, then the completely randomized design is best for comparing the
responses. But often there are other factors affecting responses. Unless they
are controlled, the results will be biased. One way to control the effects of
known extraneous factors is to form groups of similar units called blocks. In a
randomized block design, all experimental units are first grouped into blocks,
and treatments are then assigned randomly to the units within each block. This
design allows the experimenter to account for systematic differences in
responses due to a known factor and leads to more precise conclusions from the
experiment.
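
A randomized block design can be sketched by repeating the random assignment
separately inside each block. The blocks (age groups), unit labels, and
treatment names below are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments to units separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize only among units of this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocking factor: age group.
blocks = {
    "young": ["y1", "y2", "y3", "y4"],
    "old": ["o1", "o2", "o3", "o4"],
}
rbd = randomized_block_design(blocks, ["drug", "placebo"], seed=6)
print(rbd)
```

Because every treatment appears equally often in every block, a difference
between blocks cannot masquerade as a difference between treatments.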

Matched-Pairs Design
If there are only two treatments to be compared in the presence of a blocking
factor, then you should use a randomized paired comparison design. This can be
designed in different ways. Form two or more blocks of two experimental units
each. Experimental units within each block should be matched by some relevant
characteristics. Within each block, toss a coin to assign two treatments to the
two experimental units randomly. Each block will have one experimental and one
control unit. Because both experimental units are similar to each other except
for the treatment received, the differences in responses can be attributed to
the differences in treatments. This type of experiment is called a matched-pairs
design. 
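
The coin toss within each matched pair might look like this in code. The
pairs of subject labels and the function name are made up for illustration,
and the random draw simply stands in for a fair coin.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """Within each matched pair, a coin toss decides who gets the treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:  # stands in for the coin landing heads
            assignment[first], assignment[second] = "treatment", "control"
        else:
            assignment[first], assignment[second] = "control", "treatment"
    return assignment

# Hypothetical pairs of subjects matched on some relevant characteristic.
pairs = [("p1a", "p1b"), ("p2a", "p2b"), ("p3a", "p3b")]
matched = assign_matched_pairs(pairs, seed=7)
print(matched)
```

Every pair ends up with exactly one treated unit and one control unit, so the
comparison is always made between similar units.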

Alternatively, each experimental unit can be used as its own block. Assign both
treatments to each experimental unit, and randomize the order to control for
any effect of the order of treatment. For each experimental unit, toss a coin to
decide whether the order should be treatment 1 followed by treatment 2, or
treatment 2 followed by treatment 1. Because both treatments are assigned to the
same experimental unit, the individual effects of experimental units are
nullified, and the differences in responses can be attributed to the differences
in treatments.

Control groups allow us to see what would happen to experimental units over
time. Placebo groups go above and beyond control groups by implementing an
intervention or treatment that is similar to the target intervention or
treatment. This helps us account for even more external factors.