statistical thinking

rational and critical processing of data to draw reliable conclusions

statistical inference

estimate or prediction about a population based on data collected from a sample

resource constraints

statistical data collection is costly, limiting available size, scope, and quality of data

bell curve

normal distribution

deviation

difference between the value of a data point and the mean value for the set

median

the middle number in a sorted list of numbers (for an even count, the mean of the two middle numbers)
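
A minimal sketch of the definition above, assuming the usual convention that an even-length list takes the mean of its two middle numbers. The function name `median` and the sample lists are illustrative only.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle number.
        return ordered[mid]
    # Even count: average the two middle numbers (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])      # sorted: 1, 5, 7
even_result = median([7, 1, 5, 3])  # sorted: 1, 3, 5, 7
```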

weighted mean

calculation of the mean where data points are not treated equally
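
As a rough illustration, each data point can be multiplied by a weight before averaging. The name `weighted_mean`, the grades, and the weights below are made up for the example.

```python
def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


grades = [80, 90, 70]   # hypothetical scores
weights = [5, 3, 2]     # hypothetical importance of each score
result = weighted_mean(grades, weights)
plain_mean = sum(grades) / len(grades)  # every point treated equally
```

With equal weights the function gives the same answer as the ordinary mean.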

range

simplest measure of spread (variation) in a data set: the difference between the largest and smallest values

mode

the most frequently occurring value; seen as a hump (local high point) in the shape of a variable's distribution

distribution

states the relative frequency of each possible value of a variable

standard deviation

a commonly used measure of variability in a data set: typical deviation from the mean
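
One way to compute this, sketched under the assumption that the data set is the whole population (divide by n) or a sample (divide by n - 1). The function names and the data are illustrative.

```python
import math


def population_std(values):
    """Standard deviation treating the data as the entire population."""
    n = len(values)
    mean = sum(values) / n
    # Deviation: each value minus the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


def sample_std(values):
    """Standard deviation treating the data as a sample (n - 1 divisor)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
pop_sd = population_std(data)
samp_sd = sample_std(data)
```

The sample version is always slightly larger because it divides by a smaller number.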

outlier

a number (data point) that strongly deviates from the rest (by a common rule of thumb, a z-score above 3 or below -3)

percentile

a ranking method based on breaking up a sorted data set into 100 equal parts; the nth percentile is the value below which n percent of the data falls

relative frequency

how often something occurs (frequency per number of observations)
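
A small sketch of the calculation: count each value, then divide by the number of observations. The helper name `relative_frequencies` and the die rolls are invented for illustration.

```python
from collections import Counter


def relative_frequencies(observations):
    """Map each observed value to its count divided by the total count."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}


rolls = [1, 2, 2, 3, 2, 1, 6, 2]  # hypothetical die rolls
freqs = relative_frequencies(rolls)
```

The resulting relative frequencies always add up to 1.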

correlation coefficient

a statistic (r) expressing the strength and direction of the relationship between two variables (values: -1 to 1)
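
The sketch below assumes r means the Pearson correlation coefficient, which is the usual reading but is not stated in the definition. The function name `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable never varies")
    return co_variation / math.sqrt(spread_x * spread_y)


rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfect positive relationship
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative relationship
```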

statistical bias

a systematic (non-random) error that distorts your results, often affecting some subset of the sample or data

non-response bias

where certain subgroups are under-represented because of low response and participation rates

scatter plot

a graphed cluster of dots each representing values of two variables

bar graph

graph with a bar (usually vertical) for each category, where bar height shows the value

pie chart

graph representing values as sections of a circle

discrete values

where only separate, countable values are possible (e.g. number of children)

continuous values

where any value within a range is possible (e.g. height or weight)

linear regression

finding the line of best fit to a set of points

least squares

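the most common method for linear regression: picks the line that minimizes the sum of squared vertical distances between the points and the line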
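
A minimal sketch of least squares for a single predictor, using the standard closed-form slope and intercept. The function name `fit_line` and the points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
```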

Type 1 error

false positive: rejecting a true null hypothesis, e.g. a correlation assumed where absent

Type 2 error

false negative: failing to reject a false null hypothesis, e.g. a correlation ignored though real

z-score

how many standard deviations a data value is from the mean
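
As a sketch, the z-score is the deviation from the mean divided by the standard deviation. This example assumes the population standard deviation (`statistics.pstdev`); the helper name `z_score` and the data are illustrative.

```python
import statistics


def z_score(value, values):
    """Number of standard deviations that value lies from the mean of values."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("z-score is undefined when all values are identical")
    return (value - mean) / sd


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, standard deviation 2
z_high = z_score(9, data)  # above the mean, so positive
z_low = z_score(2, data)   # below the mean, so negative
```

Because the result has no units, z-scores from different data sets can be compared directly.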

standardizing

converting values to remove specific units and make sets comparable

context

information about how, where, when, etc. the data was collected

law of large numbers

as sample size increases, relative frequencies approach the actual probability values
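
A simulated illustration, assuming a fair coin (probability of heads 0.5). The function name `heads_frequency`, the seed, and the sample sizes are arbitrary choices for the sketch.

```python
import random


def heads_frequency(num_flips, rng):
    """Relative frequency of heads in num_flips simulated fair coin flips."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


rng = random.Random(42)  # fixed seed so the run is repeatable
frequencies = {n: heads_frequency(n, rng) for n in (10, 1_000, 100_000)}

for n, freq in frequencies.items():
    print(f"{n:>7} flips: relative frequency of heads = {freq:.4f}")
```

Small samples can land far from 0.5 by chance; the largest sample should sit very close to it.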

blinding

withholding information about whether a participant has been allocated to the study group or the control group

p-value

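how likely it is to see data at least as extreme as yours if there were no real effect (the null hypothesis); a small p-value suggests the pattern is unlikely to be random noise alone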

significance

a result is statistically significant when the p-value is less than or equal to alpha

alpha level

significance level: the cutoff chosen in advance for the p-value (commonly 0.05); the probability of a Type 1 error you are willing to accept

confidence interval

a range of values, calculated from the sample, that is expected to contain the actual parameter value in the population at a stated confidence level (e.g. 95%)