Level 227

Graphing Data


statistics
The science of collecting, organizing, analyzing, and interpreting data in order to make decisions
Types of Observational Studies
Simulation, survey, census, experiment, sampling
Descriptive Statistics
The methods of organizing and summarizing data
Inferential Statistics
Involves making generalizations from a sample to a population
Population (N)
The entire collection of individuals or objects about which information is desired
Sample (n)
A subset of the population, selected for study in some prescribed manner
Variable
A characteristic that can take different values from one individual or object to another
data
A collection of information gathered for a purpose. Data may be in the form of either words or numbers.
Not counting/measuring
Categorical (qualitative) Data
Numerical (quantitative) Data
Counting (discrete) and measuring (continuous)
Discrete Data
A list-able set of values; usually counts of items
Continuous Data
Data can take on any values in the domain of the variable; usually measurements of something
Numerical and discrete
What type of data is the income of adults in your city?
Categorical
What type of data is the color of M&M's?
Numerical and continuous
What type of data is the birth weights of female babies born in a particular hospital?
Bar graphs
What graphs are appropriate for categorical data?
Bar graph
Bars do not touch; categorical data is typically on the horizontal axis; to describe: comment on which occurred the most often or least often
Measures of Center
Mean, median, mode
Mean
the sum of all the values divided by the number of values
Median
The middle value of the ordered data; with an even number of values, the mean of the two middle values
Mode
the most common value
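The three measures of center can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the list `values` is made up for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle of the sorted data (average of 3 and 5)
mode = statistics.mode(values)      # most common value

print(mean, median, mode)
```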
x̄
Sample mean (average)
μ
Population mean (average)
Resistant
Not significantly affected by extreme values
Is either the mean or the median resistant?
The median is resistant; the mean is not resistant
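Resistance can be seen by adding one extreme value to a data set and comparing how far each measure moves. The numbers below are hypothetical.

```python
import statistics

data = [10, 12, 13, 15, 20]   # hypothetical values
with_outlier = data + [200]   # same data plus one extreme value

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

# The mean jumps a long way; the median barely moves.
print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```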
Measures of Spread (Variation)
Range, interquartile range (IQR), variance, and standard deviation (σ)
Outliers, gaps, clusters
What do unusual features include?
Mean is bigger than the median
Is the mean or median bigger in right-skewed data?
Median is bigger than the mean
Is the mean or median bigger in left-skewed data?
parameter
Number that describes a population
p
Population proportion
σ^2
Population variance
Statistic
A number that describes a sample
When should you report the mean and standard deviation?
When the graph is symmetrical and there are no outliers in the data
When should you report the median and IQR?
When the graph is skewed right/left OR the data has outliers
Population Variance
σ^2: average of the squared deviations
Population Standard Deviation
The square root of variance
Standard Deviation
Measure of the average distance from the mean
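The population variance and standard deviation can be computed directly from the definitions above. This sketch uses a made-up population and compares the hand computation with `statistics.pvariance` and `statistics.pstdev` from the standard library.

```python
import math
import statistics

# Hypothetical population, chosen so the numbers come out evenly.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(population) / len(population)                  # population mean
squared_deviations = [(x - mu) ** 2 for x in population]
variance = sum(squared_deviations) / len(population)    # average squared deviation
sigma = math.sqrt(variance)                             # square root of the variance

print(variance, sigma)
print(statistics.pvariance(population), statistics.pstdev(population))
```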
Yes
Is the IQR resistant?
What happens to a data set if we add a constant, x, to every value?
The measures of center are increased by x while the measures of spread are not changed
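A quick check of this shift rule, assuming the constant is added to every value. The data and the constant `x` are hypothetical.

```python
import statistics

data = [3, 5, 8, 12]  # hypothetical values
x = 10                # constant added to every value
shifted = [v + x for v in data]

# Measures of center move up by x.
center_shift_mean = statistics.mean(shifted) - statistics.mean(data)
center_shift_median = statistics.median(shifted) - statistics.median(data)

# Measures of spread stay the same.
range_before = max(data) - min(data)
range_after = max(shifted) - min(shifted)
sd_before = statistics.pstdev(data)
sd_after = statistics.pstdev(shifted)

print(center_shift_mean, center_shift_median, range_before, range_after)
```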
Density
Can be created by smoothing histograms; ALWAYS on or above the horizontal axis; has an area of exactly ONE
Z score
Standardized score
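A z score is usually computed as (value - mean) / standard deviation. The card does not spell the formula out, so treat this as the standard definition rather than something taken from the list. The function name and the example numbers are made up.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(z)  # 1.5 standard deviations above the mean
```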
Normal Curve
Bell-shaped, symmetrical curve; as the standard deviation increases, the curve flattens and spreads
Empirical Rule
Approximately 68% of the data are within 1 σ of the mean; approximately 95% of the data are within 2 σ of the mean; approximately 99.7% of the data are within 3 σ of the mean
When can the empirical rule be used?
Only when the graph is a normal curve
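The empirical rule percentages can be checked against a normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # standard normal curve

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The three results round to roughly 68%, 95%, and 99.7%.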
Center, shape, spread, unusual features, context
What should you use to describe a box plot?
Continuous Random Variables
A variable that can take on any value in an interval, so it has infinitely many possible outcomes
Uniform Distribution
f(x) = 1/(b - a) for a ≤ x ≤ b, and 0 otherwise
Standard Normal Density Curves
Do not show actual values; written in terms of z scores; always has a mean of 0 and a SD of 1
The units found in the problem
What units are always used for SD?
Normal PDF
Graphing only
Normal CDF
Finds the probability (area under the curve) between the lower bound and the upper bound
InvNormal
Finds the z-score that has a given probability (area) to its left
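The calculator functions above have rough counterparts in Python's standard `statistics.NormalDist`. This is only a sketch of the same ideas, not the calculator's own syntax; the mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical normal distribution

# Like Normal CDF: area under the curve between a lower and an upper bound.
p_between = dist.cdf(115) - dist.cdf(85)

# Like InvNormal: the z-score with a given area to its left.
z_cutoff = NormalDist().inv_cdf(0.975)

# Like Normal PDF: the height of the curve at a point (useful for graphing only).
height_at_mean = dist.pdf(100)

print(round(p_between, 4), round(z_cutoff, 2), round(height_at_mean, 4))
```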
Normal PDF (X)
Standard normal curve
A scatterplot
What type of graph should be used for bivariate, numerical data?
Sample Correlation Coefficient (r)
A quantitative assessment (measurement) of the STRENGTH & DIRECTION of the LINEAR relationship between bivariate, quantitative data
Properties of r
Legitimate values lie in the interval [-1, 1]; 0 implies no linear correlation
What does r tell you?
It is a measure of the extent to which x & y are linearly related
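One common way to compute r is from the deviations of x and y about their means. The function name and the data below are illustrative only.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4))  # positive, so y tends to increase as x increases
```

Swapping the roles of `xs` and `ys` gives the same r, and perfectly linear data gives r = 1 or r = -1.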
Least Squares Regression Line (LSRL)
Line of best fit; minimizes the sum of the squares of the deviations from the line
X Variable
Independent or explanatory variable
Y Variable
Dependent or response variable
What is the interpretation for slope?
For each unit increase in x, we predict an approximate mean increase/decrease of b in y
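The slope b and intercept a of the LSRL, written ŷ = a + bx, can be found from the same deviation sums used for r. A minimal sketch with made-up data; the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept; the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))
```

Here the slope is about 0.6, so for each unit increase in x we predict an approximate mean increase of 0.6 in y.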
Extrapolation
Predicting y for x-values outside the range of the original data
The LSRL and r are both non-resistant
Are the correlation coefficient (r) and the LSRL resistant or non-resistant?
Residuals
The vertical deviation between the observations and the LSRL
The sum is always zero
What is the sum of the residuals for the LSRL?
Observed - predicted (y - ŷ)
How do you find the residual?
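The residuals and their zero sum can be checked numerically. This sketch fits a least squares line to hypothetical data and then subtracts each predicted value from the observed one.

```python
# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

print([round(e, 2) for e in residuals])
print(round(sum(residuals), 10))  # zero, up to floating-point rounding
```

Plotting each x against its residual gives the residual plot.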
Residual Plot
A scatterplot of the (x, residual) pairs
Yes
Can residuals be graphed against other statistics besides x?
What is the purpose of a residual plot?
To tell if a linear association exists between the x & y variables
What happens if no pattern exists in the residual plot?
It is called random scatter and the relationship is linear
Negative
Counting from the decimal point to the left makes the exponent _____
Positive
Counting from the decimal point to the right makes the exponent _____
Variation
Difference of values; spread
Variance
A measure of spread: the average of the squared deviations from the mean
Coefficient of Determination (r^2)
Remains the same no matter which variable is labeled x; just because we know r^2 doesn't mean we know the sign of r
Interpretation of r^2
Approximately r^2% of the variation in y can be explained by the linear relationship of x & y
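One way to see r^2 as "variation explained" is to compare the variation left over after fitting the LSRL with the total variation in y. The sketch below uses made-up data and a hypothetical helper name.

```python
def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the least squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total variation in y
    return 1 - sse / sst

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
r2_swapped = r_squared(ys, xs)  # swap which variable is labeled x
print(round(r2, 4), round(r2_swapped, 4))
```

Both calls give about 0.6, so roughly 60% of the variation in y is explained by the linear relationship, whichever variable is labeled x.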
outlier
a data value that is either much greater or much less than the median
Influential Point
A point that influences where the LSRL is located; if removed, it will significantly change the slope of the LSRL
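The effect of an influential point can be illustrated by fitting the LSRL with and without it. The data and the extra point (20, 0) are hypothetical.

```python
def slope(xs, ys):
    """Slope of the least squares regression line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data with a positive linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope_without = slope(xs, ys)
# Add one point far from the others in the x-direction.
slope_with = slope(xs + [20], ys + [0])

print(round(slope_without, 3), round(slope_with, 3))
```

With these numbers the single added point turns a positive slope into a negative one.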
No
Is the coefficient of determination resistant?