These are my notes and thoughts on what statistics is all about.Science and Math Books
One of the first considerations is designing appropriate studies. The purpose is to collect data. This process can be done with either surveys or experiments. One of the most popular ways to collect data is the observational study in a way that does not affect them. Surveys have to be worded carefully to get good information.
An experiment is another popular way to gather data. It involves treatments on participants so that clear comparisons can be made. After treatments are made, responses are recorded.
Collecting quality data is a major consideration. It really does no good to get bad data. So, studies and experiments must be planned well. Once you have good data, you can make a good report on what you found. To minimize bias in a survey, you have to be random when selecting participants.
These are numerical values that describe a data set. This is usually done through different types of categories. If the data are categorical they are usually summarized using the number of individuals in each group. This is called the frequency. If you use the percentage of individuals, it is called the relative frequency.
Numerical data represent measurements or counts. You can do more with numerical data. For example, you can get the measure of center and the measure of spread in the data.
Some descriptive statistics are more appropriate than others in certain situations. The average is not always the best measure of the center of a data set.
Charts and Graphs
Data is summarized in a visual way using charts and graphs. These are displays that are organized to give you a big picture of the data.
Some of the basic graphs used for categorical data include pie charts and bar graphs. These break down variables in the data.
For numerical data, a different type of graph is needed. Histograms and box plots are usually used to represent numerical data. These types of graphs make it easier to visualize the data.
A variable is a characteristic that is being counted or measured. A distribution is a listing of the possible values of a variable and how often they occur.
Different types of distributions exist for different types of variables.
If a variable is counting the number of successes in a certain number of trials, it has a binomial distribution.
If the variable takes on values that occur according to a bell-shaped curve, then that variable has a normal distribution.
If the variable is based on sample averages and you have limited data, the t-distribution may be in order.
When it comes to distributions, you need to know how to decide which distribution a particular variable has, how to find probabilities for it, and how to figure out what the long-term average and standard deviation of the outcomes would be.
After data has been collected and described, it is time to do the statistical analysis. There are many types of analyses. You have to choose the appropriate type for your data.
You often see statistics that try to estimate numbers pertaining to an entire population. However, it is just an estimate and most studies only ask a small number of people their questions. What happens is that data is collected on a small sample of people. Sometimes the results they get are very inaccurate.
Sample results vary from sample to sample, and this amount of variability needs to be reported but usually it is not. The statistic used to measure and report the level of precision in someone’s sample result is called the margin of error. The range of the margin of error is called the confidence interval.
One major staple of research studies is called hypothesis testing. A hypothesis test is a technique for using data to validate or invalidate a claim about a population.
The elements about a population that are most often tested are:
- The population mean
- The population proportion
- The difference in two population means or proportions
Hypothesis tests are used in a host of areas that affect your everyday life, such as medical studies, advertisements, and polling data. Often you only hear the conclusions of hypothesis tests but you don’t see the methods used to come to these conclusions.
To perform statistical analyses, researchers use software that depends on formulas. You have to use them correctly, though. Some of the most common mistakes made in conclusions are overstating the results. Until you do a controlled experiment, you can’t make a cause-and-effect conclusion based on relationships you find.
Statistics is about much more than numbers. You need to understand how to make appropriate conclusions from studying data and be smart enough to not believe everything you read.