Data Visualization and Random Forest (Live) – May 2025
Curriculum
- 5 Sections
- 7 Lessons
- Lifetime
- Canvas Access
This course is best taken on Canvas, where you can post discussion questions, take quizzes, and practice in R. The first lesson contains instructions on accessing Canvas and the passkey for doing so.
- Unit 1 Overview
Data Mining
Objectives
- Know what we buy estimates with
- Perfectly fitting models
- Spent vs. unspent df
- Complexity vs. fit tradeoff
- Four reasons to do a GLM
- Why GLMs suck at the last two purposes
- Understand what data mining is
- Understand what overfitting is
- How to prevent overfitting
- Three strategies for selecting variables
- Linking hypotheses to specific parameters
- Research questions versus hypotheses (and tools appropriate for each)
- Why having more than 3 predictors is a red flag
- Unit 2 Overview
Visualizing Data
Learning Objectives
- What we’re looking for in a bivariate plot
- Problems to look out for: curvilinear relationships
- What binning is
- Rules for identifying which variable to put on the x-axis
- Identifying main effects of paneled variables
- Purpose of added variable plots (AVPs)
- Weakness of AVPs
- Identifying interactions from visuals
- When you should NOT use AVPs
- Visualizing 3+ variables
- What you are looking for in multi-panel plots
- The purpose of marginal plots
- How to interpret marginal plots
- Conceptually, what a three-way interaction is
- Building models from visuals vs. building visuals from models
- How to build models from visuals
- Why we specifically look for interactions and nonlinear effects
- What added variable plots (AVPs) are doing
- AVPs as approximations
- Partial residual plots (PRPs) vs. AVPs
- PRPs and showing what we failed to fit
- Dustin’s extension to PRPs
- How to do PRPs in flexplot
- Adding the fit back to the residuals versus not
- What you can visualize with PRPs
- Residual dependence plots vs. partial residual plots
- Using PRPs to detect three-way interactions
- Reducing bin size for multi-panel plots
- Why reporting with tables is stupid
- Why we prefer visuals for reporting results
- What visual partitions are
- Three rules for visual partitions
- Biggest threat: missing something you could have modeled
- Five-step strategy for identifying visual partitions
- Unit 3 Overview
Random Forest and LMs
Objectives
- General strategy for RF
- How to model RF in R
- How to compute importance/OOB
- How to visualize RF in flexplot
- Different uses of RF
- Building LMs from RF visuals
- Two methods for variable selection using RF
- Basics of VSURF
- Three steps of VSURF
- Pros/Cons of VSURF
- Four steps of my approach
- Pros/Cons of my approach
- Unit 4 Overview
Objectives
- General strategy for RF
- How to model RF in R
- How to compute importance/OOB
- How to visualize RF in flexplot
- Different uses of RF
- What visual partitions are
- Rules for plotting visual partitions
- The steps for identifying visual partitions