Statistics Essentials

These are my notes on statistics essentials.

Table of Contents

Statistics and Problem Solving

A population is the total set of subjects or things we are interested in studying. Populations are defined by what a researcher is studying and can come in all shapes and sizes.


A frame is a list containing all members of the population.


Population parameters are facts about the population. Since parameters are descriptions of the population, a population can have many parameters. Parameters can be averages, percentages, minimums, or maximums. For a specific population at a specific point in time, population parameters do not change.


A sample is a subset of the population which is used to gain insight about the population. Samples are used to represent a larger group, the population. 


A statistic is a fact or characteristic about the sample. For any given sample a statistic is a fixed number. Statistics are used as estimates of population parameters. 


A process is a method for obtaining a desired result. The idea of a process is closely tied to quality control. In order to improve a process, there must be an understanding of how the process is currently performing. This required definition and measurement of the process. 


The science of statistics is divided into two categories, descriptive and inferential. Descriptive methods describe and summarize data. Descriptive statistics is the collection, organization, and presentation of data.


The objective of inferential statistics is to make reasonable guesses about the population characteristics using sample data. 

Collecting and Analyzing Data

Part of becoming a problem solver and user of statistics is developing an ability to appraise the quality of measurements. When you encounter data, consider whether the concept under study is adequately reflected by the proposed measurements, is the data measured accurately, and is there a sufficient quantity of the data to draw a reasonable conclusion.


Measurement and data are an integral part of science. Methods have been developed to solve research problems. Gather information about the phenomenon being studied. On the basis of the data, formulate a preliminary generalization or hypothesis. Collect further data to test the hypothesis. If the data and other subsequent experiments support the hypothesis, it becomes a law.


There are two ways to obtain data, observation and controlled experiments. In a statistical analysis, it is usually not possible to recover from poorly measured concepts or badly collected measurements. 


A response variable measures the outcome of interest in a study. An explanatory variable causes or explains changes in a response variable. Isolating the effects of one variable on another means anticipating potentially confounding variables and designing a controlled experiment to produce data in which the values of the confounding variable are regulated.


Observational data comes about from measuring things. They can be extremely valuable. 


Much of the statistical information presented to us is in the form of surveys. So, it is important to understand them and how they are done. In some cases, the purpose of a survey is purely descriptive. However, in many cases the researcher is interested in discovering a relationship.


Data in which the observations are restricted to a set of values that possess gaps is called discrete. Data that can take on any value within some interval is called continuous. The quality of data is referred to as its level of measurement. When analyzing data, you must be exceedingly conscious of the data’s level of measurement because many statistical analyses can only be applied to data that possess a certain level of measurement. 


Data that represents whether a variable possesses some characteristic is called nominal. Ordinal data represents categories that have some associated order. Note that ordinal data is also nominal, but it also possesses the additional property of ordinality. 


If the data can be ordered and the arithmetic difference is meaningful, the data is interval. An example of interval data is temperature. Interval data is numerical data that possesses both the property of ordinality and the interval property. Ratio data is similar to interval data, except that it has a meaningful zero point and the ratio of two data points is meaningful. 


Qualitative data is data measured on a nominal or ordinal scale. Quantitative data is measured on an interval or ratio scale. 


Time series data originates as measurements usually taken from some process over equally spaced intervals of time. Time series data originate from processes. Processes can be divided into two categories: stationary and nonstationary. All time series that are interesting vary, and the nature of the variability determines how the process is characterized. In a stationary process the time series varies around some central value and has approximately the same variation over the series. In a nonstationary process, the time series possess a trend, the tendency for the series to either increase or decrease over time. 


Cross-sectional data are measurements created at approximately the same period of time.