Introduction to Simplistics (Self-Guided)

Are you ready to be a stats ninja? This course is your beginning, your “white belt” training, if you will. By the end of this course, you will know what it takes to understand your data, to understand your model, to make inferences, and to estimate uncertainty.
We’ll begin with quick (but important) discussions of data ethics and measurement before tackling the foundation of all statistics: models. First, we’ll speak of univariate models, including how to visualize them and how to compute estimates, such as means, medians, and standard deviations. As we begin with univariate models, you’ll learn to speak (well, type, technically) the language of flexplot, which will enable you to visualize both the data and the model, and allow you to judge how well the data fit your model.
You’ll then learn about bivariate models, both the old names (t-tests, ANOVA, regression) and the new (linear model, linear model, and….um…linear model). We’ll begin by visualizing these models with scatterplots and beeswarm plots, give you insight into what problems to look out for, and teach you how to interpret estimates from these models (e.g., Cohen’s d, correlation coefficients, mean differences, slopes, intercepts).
Before moving on to multivariate models, we’ll take a quick statistical siesta and learn how to evaluate models. How do we know whether they are good models? You’ll learn about linear model assumptions (linearity, normality, etc.), how to evaluate them visually (e.g., with SL plots or residual dependence plots), and why they matter. Then we get to the fun of multivariate models, where you’ll learn both the old names (ANCOVA, factorial ANOVA, multiple regression) and the new names (linear models, linear models, and ….. do I really have to say it?).
Finally, we’ll save the most complicated for last: probability. We’ll learn the basics of significance testing, Bayesian inference, and (my personal favorite) model comparisons. We’ll use each of these to make decisions about data and to estimate uncertainty.
Course Structure
This class is broken down into twelve “units”:
• Ethics
• Measurement
• Univariate Models
• Bivariate Models
• Diagnostics
• Linear Models
• Multivariate Linear Models
• Conditioning
• Interaction Effects
• Probability 1
• Probability 2
• Model Comparisons
Within each unit, you’ll see a unit introduction that itemizes the learning objectives, followed by a series of videos (mostly from my YouTube channel), textbook readings, optional scholarly articles, and your assignments. For each unit you’ll have at least two assignments:
- A unit discussion board. Here you will find questions students have asked in the past, and have the opportunity to post questions yourself.
- A weekly quiz. These quizzes test your understanding of the learning objectives. You are welcome (and encouraged) to take these quizzes as many times as you like, both to perfect your score (which really doesn’t matter, except for your own gratification) and to receive more questions. (I aim to have a pool that is much larger than the actual quizzes you take, so you’ll have different quizzes every time.)
For some of the weeks, you’ll also have a “practice quiz,” which is much more R-focused, allowing you to analyze actual datasets and answer questions about those datasets.
There is also a midterm and a final for this course. Again, there’s no penalty for scoring poorly. It’s simply an opportunity to self-evaluate your understanding of course content.
Curriculum
- 13 Sections
- 37 Lessons
- Lifetime
- Canvas Course link
This course is best taken on Canvas, where you can post discussion questions, take quizzes, and practice in R. The first lesson contains instructions on accessing Canvas and the passkey for doing so.
Introduction to Unit 1 - Ethics
Welcome to Unit 1! Here are the learning objectives. If you know these, you're in great shape!
- Understand how my approach differs from the traditional approach
- Understand the advantages of the modeling approach
- Know the key players in the replication crisis and the role they played
- Be able to differentiate between CDA, rough CDA, p-hacking, HARKing, etc.
- Understand the three dimensions of data analyst intention and what each dimension means
- Understand how to conduct data analysis ethically
- Understand the five grassroots values
- Know the difference between the status quo values and the grassroots values
- Understand how to make changes to the scientific culture
Introduction to Unit 2 - Measurement
Learning Objectives
- Know what a construct is and be able to identify a construct
- Understand what operational definitions are
- How psychologists measure constructs
- Know what reliability means and how to measure it
- Know the following terms:
  - Internal consistency
  - Test-retest reliability
  - Interrater reliability
- Know what validity means
- What measurement has to do with statistics
- Scales of measurement and what really matters
- Why variable types actually matter (and why they don't)
Notes about Readings
- The textbook chapter is required reading.
- The McNeish article is fantastic, though it might be a bit technical. It makes a very specific point (that coefficient alpha is overused and abused), but it's an important specific point. It might be good for comps preparation.
- The Messick article is considered a seminal article. It's a goldmine of higher-level thoughts on validity. But, Messick is a terrible writer. This would also be good information to embed in your thinking during comps.
Introduction to Unit 3 - Univariate Models
- How to interpret a histogram and barplot
- Why we should avoid using line graphs for univariate data
- Why we should avoid pie charts
- Positive vs negative skew
- Know what an ideal histogram/barchart looks like
- Be able to recognize the following from a histogram:
  - Outliers
  - Skewness
  - Bimodality
  - Miscoded values
- Be able to recognize the following from a barchart:
  - Missing data
  - Too many labels
  - Unknown labels
  - Group imbalances
  - Mislabeled categories
- Limitations of graphics
- Advantages of Estimates
- Meaning of Central tendency
- Meaning of variability
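To make "central tendency" and "variability" concrete, here is a minimal sketch. The course itself works in R with flexplot; this uses Python's standard library purely for illustration, and the scores are made-up data with one deliberately extreme value.

```python
from statistics import mean, median, stdev

# Hypothetical scores; the last value is an outlier (perhaps a miscoded entry).
scores = [2, 3, 3, 4, 5, 5, 6, 40]

center_mean = mean(scores)      # pulled upward by the outlier
center_median = median(scores)  # barely affected by the outlier
spread = stdev(scores)          # sample standard deviation (n - 1 denominator)

print(f"mean = {center_mean}, median = {center_median}, sd = {spread:.2f}")
```

The gap between the mean and the median is the kind of thing a histogram would show you at a glance, which is one reason to look at the graphic and the estimates together.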
Introduction to Unit 4 - Bivariate Visuals
Learning Objectives
Scatterplots
- interpret scatterplots
- confirmation bias, graphics, and statistics
- lowess curves versus regression lines
- transparency and sampling
- identify:
  - nonlinearity
  - bivariate outliers
  - influential datapoints
  - high leverage datapoints
Plotting Categorical Data
- interpret bar plots, boxplots, violin plots, beeswarm plots
- know why barplots suck
- know what jittering means
- know the weaknesses of barplots, boxplots, violin plots, and beeswarm plots
- how to recognize skewness and group imbalances
Bivariate Estimates
- Know why we need visuals AND estimates
- Know what a conditional mean is
- Know what a conditional variance is
- Know how to interpret the following:
  - intercept
  - slope
  - correlation coefficient (r)
  - group mean difference
  - Cohen’s d
- relationship between mean difference and a slope
- “benchmarks” for small/medium/large correlation coefficients and Cohen’s d values
- How to use slope and intercept to predict someone’s score
R
- How to plot a scatterplot
- How to plot a beeswarm plot
- How to compute estimates in R
- How to fit a linear model in R
Introduction to Unit 5 - Computing Probabilities
Learning Objectives
- Understand how to compute probabilities from a finite set
- Finite versus infinite samples
- Basic idea of sampling
- Know what a population is
- Know what a probability density function (PDF) is
- Know how a PDF can be used to estimate probabilities
- Understand a prior
- Understand a posterior
- Understand the role of subjective beliefs
- Understand the Bayesian approach to estimating the population
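Two of the ideas above can be sketched in a few lines: computing a probability from a finite set, and turning a prior into a posterior. This is an illustrative Python sketch (not course material), and the deck, the three candidate values, the equal prior, and the observed flips are all assumptions chosen to keep the arithmetic exact.

```python
from fractions import Fraction

# Probability from a finite set: favorable outcomes / total outcomes.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
p_heart = Fraction(sum(1 for _, suit in deck if suit == "H"), len(deck))

# Bayesian update over a small grid of candidate values for a coin's P(heads).
candidates = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = [Fraction(1, 3)] * 3   # equal prior belief in each candidate
heads, tails = 3, 1            # hypothetical observed flips

# Likelihood of the data under each candidate. The binomial coefficient is
# omitted because it is the same for every candidate and cancels out.
likelihood = [p ** heads * (1 - p) ** tails for p in candidates]

# Posterior is proportional to prior * likelihood; normalize so it sums to 1.
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

print("P(heart) =", p_heart)
for p, post in zip(candidates, posterior):
    print(f"P(heads) = {p}: posterior = {post}")
```

Changing the `prior` list is a quick way to see the role of subjective beliefs: a prior that favors one candidate shifts the posterior toward it.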
Introduction to Unit 6 - NHST
Learning Objectives
- Know what a standard error is
- Know what a sampling distribution is
- Difference between distribution of raw scores versus distribution of means
- Difference between point estimate and confidence interval
- What a confidence interval tells you
- What a prediction interval tells you
- What a confidence interval can be used for
- The correct interpretation of a CI
- Understand the null and alternative distributions
- Understand how the central limit theorem is used for hypothesis testing
- Understand how to increase power
- Know the difference between Type I/II errors
- Know one-tailed versus two-tailed tests
- How ethics relates to sampling theory/significance testing
- How p-hacking inflates false-positive (Type I error) rates
- Required conditions to correctly interpret a p-value
- Relationship between strict CDA and p-values
- Alternatives to NHST
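For the standard error and confidence interval objectives, a small sketch may help. It is illustrative Python rather than course code, the sample is made up, and it uses a normal (z) approximation for simplicity; with a sample this small, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.95):
    """Return (mean, standard error, (lower, upper)) using a z approximation."""
    n = len(sample)
    se = stdev(sample) / n ** 0.5              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    m = mean(sample)
    return m, se, (m - z * se, m + z * se)

# Hypothetical sample of ten scores.
sample = [4, 8, 6, 5, 7, 6, 9, 3, 6, 6]
m, se, (lower, upper) = mean_ci(sample)

print(f"mean = {m}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note that the standard deviation describes the spread of the raw scores, while the standard error describes the spread of the sampling distribution of the mean, which is why it shrinks as the sample size grows.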
Unit 7 Introduction - Diagnostics
Learning Objectives
- components of a model (fit and error/residual)
- know what a residual tells you about your model
- the four critical assumptions of linear models
- normality of variables versus normality of residuals
- homoskedasticity (or heteroskedasticity): what does it mean?
- what does independence mean?
- why do we assess assumptions?
- how to interpret a residual dependence plot (and what it tells you)
- how to interpret an SL plot (and what it tells you)
- how to generate diagnostic plots in R
Introduction to Unit 8 - The Linear Model
Learning Objectives
- Understand what a model is
- Understand the structure of the LM
- Understand what “useful” means in statistical models
- How to assess fit in LMs
- Understand why the LM is important
- Old name for numeric LMs
- How to visualize numeric on numeric
- How to interpret diagnostic plots
- The mathematical equation for numeric LMs
- What estimates we’re interested in
- The old name for categorical LMs
- How to visualize categorical on numeric
- Diagnostics for categorical on numeric
- Know what slope/intercept represent with categorical predictors
- What it means to “zero-fy” your data
- The old name for 3+ categorical LMs
- How to visualize categorical on numeric
- How to visualize diagnostics for categorical LMs
- Mathematical equation for 3+ category LMs
- Know what each estimate represents
- How you would run an ANOVA/t-test/regression using linear models in R
Introduction to Unit 9 - Multivariate LMs
Learning Objectives
- Three reasons we use multivariate analyses (study interaction effects, control for things, and improve prediction)
- Ways to plot 3+ dimensional data
- Know what ghost lines are
- outcome variable versus control variable versus interest variable
- how to control graphics with flexplot
- added variable plots
- What three things are we looking for when looking at paneled plots? (Trends, nonlinearity, and nonparallel lines)
- What added variable plots tell us
- When are added variable plots justified?
Introduction to Unit 10 - Conditioning
Learning Objectives
- Understand the three reasons we’d want to use multivariate LMs
- Understand multicollinearity and why it’s a problem
- Understand what conditioning is conceptually
- Understand how residualizing relates to conditioning
- The danger in doing multivariate GLMs with p-values
- How to avoid the p-value danger
- Map “control” language into parameter estimates
- How to visualize conditioning relationships
- What R code to use to visualize conditional relationships
Unit 11 Introduction - Interaction Effects
Learning Objectives
- Understand what an interaction is (both conceptually and mathematically)
- Interpreting main effects when interactions are present
- What language maps into interaction
- how to visualize multivariate relationships with two variables
- visually identify interactions
- Why Flexplot is better than a simple slopes analysis
- What assumption does an ANCOVA make about interactions?
- Why it's tough to estimate effect sizes for interaction effects
- The "old name" for models with interaction effects
- How to fit interaction effects in R
Unit 12 Introduction - Model Comparisons
Learning Objectives
- The fit versus complexity tradeoff
- Understand what overfitting is
- Model comparisons as tools versus procedures
- Model comparison vs. NHST approach
- What we are looking for in a compare.fits plot
- Difference between a nested and non-nested model
- What does it mean when AIC/BIC/BF/etc disagree?
- Three reasons p-values are okay in model comparisons
- Metrics we use to compare models
- Know which metrics we can use for non-nested models
- How to perform model comparisons in R
- The foundation of all statistics
- Formulating research questions into simple model comparisons
- Four steps to converting a research question into a model comparison
- Model comparisons as a tool versus a procedure