
Linear Regression Surfaces with Marginal Effects in 3D
Source: vignettes/articles/linear_models_3d.Rmd
Introduction
Multiple regressions create regression surfaces, which sit in a space with as many dimensions as there are explanatory and outcome variables in the model. A marginal effect is a direction along this surface and can be visualized as a line on the surface.
This article demonstrates a 3-dimensional visualization of marginal effects with add_marginals() and of the regression surface with add_3d_surface() for a linear regression with two explanatory variables. This kind of model creates a flat regression surface.
Other types of regression surfaces are shown in Linear models with an interaction term and Generalized linear models.
Setup
The package regress3d is built on the syntax of plotly in R. Both libraries are called in the initial setup.
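The setup chunk itself is not reproduced here. A minimal sketch of what it likely contains, assuming both packages are already installed:

```r
# Load plotly, then regress3d, whose functions add layers to plotly objects.
library(plotly)
library(regress3d)
```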
Linear Regression Surfaces
This article demonstrates the basic functionality of add_3d_surface() and add_marginals(). Both functions:
- default to the plot data specified in the call to plot_ly(), and
- have an optional ci argument to visualize the confidence intervals, which defaults to TRUE.
Both the marginal effects and the regression surface are shown below, but either can be plotted independently of the other.
Data
The variables in this example are county level demographic and voting measures from 2016.
- r_shift: the shift towards Donald Trump in 2016, measured as the difference between the county’s vote for Trump in 2016 and the county’s vote for the Republican presidential nominee, Mitt Romney, in 2012,
- median_income16: median income, and
- any_college: the percent of the county that was enrolled in college at some point, regardless of whether they graduated.
The regression is weighted by pop_estimate16, the number of people in a county, to capture the influence of large counties.
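To see what weighting does, the sketch below fits the same model with and without weights. It uses simulated data, not the article's county data; the data frame toy and its columns are hypothetical stand-ins.

```r
# Simulated stand-in for county data (hypothetical values, not the article's data).
set.seed(1)
n <- 200
toy <- data.frame(
  income  = runif(n, 30, 90),            # median income in $1,000s
  college = runif(n, 20, 80),            # percent with any college
  pop     = round(rlnorm(n, 10, 1.5))    # county population, heavily skewed
)
toy$shift <- 20 - 0.01 * toy$income - 0.3 * toy$college + rnorm(n, sd = 3)

# Unweighted: every county counts equally.
fit_unweighted <- lm(shift ~ income + college, data = toy)

# Weighted: each county's squared residual is multiplied by its population,
# so large counties have more pull on the fitted surface.
fit_weighted <- lm(shift ~ income + college, data = toy, weights = pop)

# Compare the two sets of coefficients side by side.
round(cbind(unweighted = coef(fit_unweighted),
            weighted   = coef(fit_weighted)), 3)
```

The two columns generally differ, because the weighted fit minimizes a population-weighted sum of squared residuals rather than a plain sum.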
Code and graphic
The code starts by specifying a model. We then create a plotly::plot_ly() object using the same variables. Next we layer on the 3D surface and the marginals. Note that while regression notation often uses x1 and x2 to represent the explanatory variables and y for the outcome, the plotly command uses x and y for the explanatory variables and z for the outcome variable.
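# Fit a linear model with two explanatory variables, weighted by county population
mymodel <- lm(r_shift ~ median_income16 + any_college,
              data = county_data, weights = pop_estimate16)

# Map the explanatory variables to x and y, and the outcome to z
plot_ly(data = county_data,
        x = ~median_income16,
        y = ~any_college,
        z = ~r_shift) %>%
  add_3d_surface(model = mymodel) %>%  # regression surface (ci defaults to TRUE)
  add_marginals(model = mymodel)       # marginal effects (ci defaults to TRUE)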
- Click and drag on the graphic to rotate it.
- When needed, you can reset the image to its starting point. Hover over the image and click on the house symbol at the top right above the graphic title. Alternatively, you can reload the page.
- The regression surface is created from the predicted values of the model. It is shown as the flat dark blue surface.
- The gray curved surfaces above and below the regression surface represent the 95% confidence intervals.
- The marginal effects and their corresponding confidence intervals are shown as red and orange lines.
- Remove a marginal effect by clicking once on the marginal effect in the legend. Return it by clicking on the marginal effect again.
- Isolate a marginal effect and remove all other layers by double clicking on the marginal effect in the legend. Double click again to return all layers.
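The sketch below shows one way a regression surface and its confidence bounds can be computed from a model's predicted values. It uses base R and simulated data, and is not necessarily how add_3d_surface() works internally; the names toy, grid, surface, lower, and upper are hypothetical.

```r
# Simulated stand-in data (hypothetical; not the article's county_data).
set.seed(2)
toy <- data.frame(x1 = runif(150, 0, 10), x2 = runif(150, 0, 10))
toy$y <- 1 + 0.5 * toy$x1 - 0.8 * toy$x2 + rnorm(150)
fit <- lm(y ~ x1 + x2, data = toy)

# Build a regular grid spanning the observed range of both predictors.
grid <- expand.grid(
  x1 = seq(min(toy$x1), max(toy$x1), length.out = 25),
  x2 = seq(min(toy$x2), max(toy$x2), length.out = 25)
)

# Predicted values plus lower and upper 95% confidence bounds at each grid point.
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Reshape each column into a 25 x 25 matrix (rows index x1, columns index x2):
# one matrix for the surface, two for the bounds below and above it.
surface <- matrix(pred[, "fit"], nrow = 25)
lower   <- matrix(pred[, "lwr"], nrow = 25)
upper   <- matrix(pred[, "upr"], nrow = 25)

# The interval is narrowest near the center of the data and widens toward
# the corners, which is why the bounds look curved around a flat surface.
range(upper - lower)
```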
More on marginal effects
The orange marginal effect in the graphic represents the effect of a change in x1, median income, while x2 is held constant. The orange line traverses the range of median income while x2, education, is held constant at its mean, 51.2%. Note that although the regression is weighted by county population, this mean is not weighted. Therefore 51.2% is the average across counties and does not represent the mean value of college attendance for the country in 2016.
add_marginals() defaults to holding x2 constant at its mean. In a linear regression without an interaction term, the choice of value at which to hold x2 constant is arbitrary: the slope of the marginal effect is the same for any value of x2. add_marginals() allows the user to specify other values with the argument x2_constant_val. See the documentation for add_marginals().
The same logic holds for the red line representing the marginal effect of x2, education, while x1, income, is held constant.
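The claim that the held-constant value does not change the slope can be checked directly. This is a minimal base R sketch on simulated data; marginal_line() is a hypothetical helper and not part of regress3d.

```r
# Simulated stand-in data (hypothetical; not the article's county_data).
set.seed(3)
toy <- data.frame(x1 = runif(100, 0, 10), x2 = runif(100, 0, 10))
toy$y <- 2 + 0.7 * toy$x1 - 0.4 * toy$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = toy)

# Hypothetical helper: predict at both ends of the x1 range
# while holding x2 at a chosen constant.
marginal_line <- function(x2_value) {
  newdata <- data.frame(x1 = c(0, 10), x2 = x2_value)
  predict(fit, newdata = newdata)
}

at_mean <- marginal_line(mean(toy$x2))  # x2 held at its mean
at_low  <- marginal_line(1)             # x2 held at an arbitrary alternative

# Slope of each line: rise over run across x1 from 0 to 10.
slope_at_mean <- unname(diff(at_mean) / 10)
slope_at_low  <- unname(diff(at_low) / 10)

# Both slopes equal the x1 coefficient; only the height of the line differs.
c(slope_at_mean, slope_at_low, coef(fit)[["x1"]])
```

With an interaction term between x1 and x2 this would no longer hold, because the slope in x1 would then depend on the value of x2.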
Improvements with plotly, scales, and regress3d
We can improve this interactive image by adding scatterpoints with the plotly function add_markers(), a hovertemplate for each point, and better labels.
The library scales helps present the numeric values, percentages, and dollars more clearly.
regress3d lets us modify the legend text by changing x1_direction_name and x2_direction_name. The default is “Predicted marginal effect of x1”. It is possible, but not shown here, to adjust the value at which an explanatory variable is held constant. See the article on interaction terms for an example.
library(scales)

plot_ly(data = county_data,
        x = ~median_income16,
        y = ~any_college,
        z = ~r_shift) %>%
  # Scatterpoints sized by population, with a formatted hover label per county
  add_markers(size = ~pop_estimate16,
              hovertemplate = ~paste("<b>", county_state, "</b><br>",
                                     "County population: ", comma(pop_estimate16), "<br><br>",
                                     "County median income: ", dollar(median_income16), "<br>",
                                     "County college experience: ", percent(any_college,
                                                                            accuracy = 0.1,
                                                                            scale = 1), "<br>",
                                     "Shift to Trump: ", percent(r_shift,
                                                                 accuracy = 0.01, scale = 1), "<br>"),
              showlegend = FALSE, name = "county",
              color = I("black")) %>%
  add_3d_surface(model = mymodel) %>%
  # Custom legend text for each marginal effect
  add_marginals(model = mymodel,
                x1_direction_name = "Marginal effect of<br>county median income",
                x2_direction_name = "Marginal effect of<br>county college experience") %>%
  # Plot title, axis titles, and a smaller legend font
  layout(
    title = "\nCounty shift to Trump in 2016",
    scene = list(xaxis = list(title = 'County median income'),
                 yaxis = list(title = 'County college experience'),
                 zaxis = list(title = 'Shift to Trump')),
    legend = list(font = list(size = 8))
  )
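For reference, here is a short sketch of how the scales helpers used in the hovertemplate format values. The population and income figures are made-up illustrations; 51.2 is the mean college experience quoted earlier.

```r
library(scales)

comma(1234567)                             # "1,234,567"
dollar(52000)                              # "$52,000"
percent(51.2, accuracy = 0.1, scale = 1)   # "51.2%"
percent(0.512, accuracy = 0.1)             # "51.2%" (default scale = 100)
```

Setting scale = 1 tells percent() that the values are already stored as percentages rather than proportions.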
Numeric regression results
This graphic corresponds to the following numerical regression results.
| | % shift to Trump, 2012-2016 |
|---|---|
| County median income ($1,000s) | -0.013 |
| | (0.007) |
| County college experience | -0.344* |
| | (0.012) |
| Constant | 20.103* |
| | (0.512) |
| Observations | 3,111 |
| Adjusted R2 | 0.371 |
| Note: | * p<0.05 |
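Read as an equation, the table describes the flat surface in the graphic. The sketch below assumes, following the table's labels, that income is measured in $1,000s and that the shift and college experience are in percentage points. The example county is hypothetical.

```latex
\widehat{\text{shift}} = 20.103 - 0.013 \cdot \text{income} - 0.344 \cdot \text{college}

% Hypothetical county: median income of \$50,000 and 50\% college experience
\widehat{\text{shift}} = 20.103 - 0.013 \cdot 50 - 0.344 \cdot 50
                       = 20.103 - 0.65 - 17.2
                       = 2.253
```

Under these assumptions, each additional percentage point of college experience lowers the predicted shift by 0.344 points with income held constant. This is the slope of the marginal effect line for education.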