# What Is Statistics

These are my notes on what statistics is.

Data is any collection of numbers, characters, images, or any other items that provide information about something. What is Statistics? It is a way of reasoning, along with a collection of tools and methods, designed to help us understand the world. What are Statistics? Statistics are particular calculations made from data.

The characteristics recorded about each individual are called variables. They are usually found as the columns of a data table with a name in the header that identifies what has been recorded.
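As a rough illustration of that layout, a data table can be sketched in Python as a list of rows, one per individual, with the variable names acting as column headers. The variable names and values here (`customer_id`, `age_years`, `region`) are invented for the example and do not come from any real dataset.

```python
# Hypothetical data table: each dict is one individual (a row),
# and each key is a variable (a column header).
# All names and values are invented for illustration.
data_table = [
    {"customer_id": 101, "age_years": 34, "region": "West"},
    {"customer_id": 102, "age_years": 51, "region": "East"},
    {"customer_id": 103, "age_years": 27, "region": "West"},
]

# The variables are the column names shared by every row.
variables = list(data_table[0].keys())


def column(table, name):
    """Collect one variable's values across all rows."""
    return [row[name] for row in table]


print(variables)                        # ['customer_id', 'age_years', 'region']
print(column(data_table, "age_years"))  # [34, 51, 27]
```

Reading down a column gives every recorded value of one variable; reading across a row gives everything recorded about one individual.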

Some variables are called categorical (or nominal) because they name categories. Arithmetic on their values is either impossible or would make no sense. Descriptive responses to questions are often categories.

When a variable contains measured numerical values with measurement units, we call it a quantitative variable. Quantitative variables typically record an amount or degree of something. For quantitative variables, the measurement units give the numbers their meaning. Some quantitative variables do not have obvious units, such as a stock market index. Sometimes a variable with numerical values can be treated as either categorical or quantitative depending on what we want to know from it.

For a categorical variable, each individual is assigned one of a limited set of possible values (for a variable such as sex, one of two). However, some variables have a different value for every individual; a variable like this, with exactly as many values as cases, is an identifier variable. Identifier variables do not tell us anything useful about their categories because we know there is exactly one individual in each. Identifiers are part of what is called metadata, or the data about data.
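One way to see the difference is to count distinct values. The sketch below is only a heuristic, not a definition: the helper name `looks_like_identifier` and the sample values are made up for illustration, and a quantitative variable whose measurements all happen to differ would pass the same check, so context still decides.

```python
def looks_like_identifier(values):
    """Return True when every case has its own distinct value."""
    return len(set(values)) == len(values)


# Hypothetical columns, invented for illustration.
student_id = [9001, 9002, 9003, 9004]
sex = ["F", "M", "F", "F"]

print(looks_like_identifier(student_id))  # True
print(looks_like_identifier(sex))         # False
```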

Variables that report order without natural units are often called ordinal variables. You still have to consider what you want to learn from the variable before deciding whether to treat it as categorical or quantitative.
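A small sketch of an ordinal variable, assuming a hypothetical four-level rating scale (the levels and responses are invented). The order has to be stated explicitly, because alphabetical sorting would not respect it.

```python
# Hypothetical rating scale, listed from lowest to highest.
order = ["poor", "fair", "good", "excellent"]
rank = {level: position for position, level in enumerate(order)}

# Hypothetical survey responses.
responses = ["good", "poor", "excellent", "fair", "good"]

# Sorting by rank uses the order of the categories, not their spelling.
sorted_responses = sorted(responses, key=lambda level: rank[level])

# With an odd number of responses, the middle one is the median category.
median_response = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)  # ['poor', 'fair', 'good', 'good', 'excellent']
print(median_response)   # good
```

Counting how many responses fall in each level treats the variable as categorical; using the ranks, as the median does here, leans on its order. The ranks 0 to 3 are only positions, so the gaps between levels should not be read as equal amounts.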

Models are summaries and simplifications of data that help our understanding in many ways. A model is a simplification of reality that gives us information we can learn from and use. Without models for how data vary, we would be limited to reporting only what the data we have say.

Don’t label a variable as categorical or quantitative without thinking about the data and what they represent. The same variable can sometimes take on different roles. Do not assume that a variable is quantitative just because its values are numbers. Categories are often given numerical labels. Do not let that fool you into thinking they have quantitative meaning. Look at the context.
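The sketch below shows why numerical labels can mislead. It assumes a hypothetical survey question coded 1 = rent, 2 = own, 3 = other; the codes and responses are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme and responses, invented for illustration.
labels = {1: "rent", 2: "own", 3: "other"}
housing_code = [1, 2, 2, 3, 1, 2]

# Python will average the codes without complaint, but the result
# describes nothing: there is no such thing as 1.83 of a housing type.
misleading_average = mean(housing_code)

# Counting cases per category is a summary that fits a categorical variable.
counts = Counter(labels[code] for code in housing_code)

print(round(misleading_average, 2))  # 1.83
print(counts["own"])                 # 3
```

The software cannot tell that the numbers are labels; only the context of the data can.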

Always be skeptical. One reason to analyze data is to discover the truth. Even when you are told a context for the data, it may turn out that the truth is a bit different. The context colors our interpretation of the data, so those who want to influence what you think may slant the context.

Data are recorded values, whether numbers or labels, together with their context. A data table is an arrangement of data in which each row represents a case and each column represents a variable. The context ideally tells who was measured, what was measured, when and where the data were collected, how they were collected, and why the study was performed.

An individual about whom or which we have data is a case. A respondent is someone who answers or responds to a survey. A subject is a human experimental unit, also called a participant. A variable holds information about the same characteristic for many cases. A categorical variable names categories with words or numerals.

The term nominal can be applied to a variable whose values are used only to name categories. A quantitative variable is a variable in which the numbers are values of measured quantities. A unit is a quantity or amount adopted as a standard of measurement, such as dollars or hours.

Metadata is the data about data. It can provide information to uniquely identify cases, making it possible to combine data from different sources, protect privacy, or label cases uniquely. The term ordinal can be applied to a variable whose categorical values possess some kind of order. A model is a description or representation, in mathematical or statistical terms, of the behavior of a phenomenon based on data.

Example 1

Because of the difficulty of weighing a dolphin in the ocean, researchers caught and measured 12 dolphins, recording their weight, fin length, body length, and sex. They hoped to find a way to estimate weight from the other more easily determined quantities.

1. Who was measured?

12 dolphins

2. When were the measurements taken?

This information is not given

3. Where were the measurements taken?

This information is not given

4. Why were the measurements taken?

To find an easier way to estimate the weight of a dolphin

5. How did the researchers obtain the measurements?

Researchers collected data on the 12 dolphins they were able to catch

6. Specify whether the variables are categorical or quantitative.

The variable weight is quantitative and units were not provided

The variable fin length is quantitative and units were not provided

The variable body length is quantitative and units were not provided

The variable sex is categorical
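The answers to Example 1 can also be written down as plain Python data, which makes the gaps in the context easy to spot. This is only a sketch of one possible layout: the dictionary keys are my own naming, `None` marks anything the description does not state, and no measurement values are included because none are given.

```python
# Context for Example 1. None means the description does not say.
context = {
    "who": "12 dolphins",
    "what": ["weight", "fin length", "body length", "sex"],
    "when": None,
    "where": None,
    "why": "find an easier way to estimate the weight of a dolphin",
    "how": "researchers caught and measured the dolphins",
}

# Type of each variable, as classified in the answers above.
variable_types = {
    "weight": "quantitative",
    "fin length": "quantitative",
    "body length": "quantitative",
    "sex": "categorical",
}

# Which parts of the context are missing?
missing = [w for w, answer in context.items() if answer is None]

# Which variables are quantitative (and so should come with units)?
quantitative = [
    name for name, kind in variable_types.items() if kind == "quantitative"
]

print(missing)       # ['when', 'where']
print(quantitative)  # ['weight', 'fin length', 'body length']
```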

Example 2

Researchers investigating the impact of prenatal care on newborn health collected data from 708 births during 1991-1993. They kept track of the mother’s age, the number of weeks the pregnancy lasted, the type of birth, the level of prenatal care the mother had, the weight and sex of the babies, and whether the babies exhibited health problems.

1. Identify the who for the description of data.

The 708 births

2. Identify the what for the description of data.

Baby’s health problems, sex of the babies, level of prenatal care, type of birth, weight of the babies, duration of pregnancy, and mother’s age

3. Identify the when for the description of data.

Between 1991 and 1993

4. Identify the where for the description of data.

This information is not given

5. Identify the why for the description of data.

To determine the effect of prenatal care on the babies’ health

6. Identify the how for the description of data.

This information is not given