Introduction

A multiple regression defines a regression surface in a space with one dimension for each explanatory variable plus one for the outcome variable. A marginal effect is a direction along this surface and can be visualized as a line in the surface.

This article demonstrates a three-dimensional visualization of marginal effects with add_marginals() and of the regression surface with add_3d_surface() for a linear regression with two explanatory variables. This kind of model creates a flat regression surface.
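To see why a model of this form yields a flat surface, the following minimal sketch fits a two-variable linear model to simulated data and checks that the prediction changes by a constant amount per unit step in either explanatory variable. It uses base R only; the data frame `toy` and its columns are made up for illustration and are not part of regress3d.

```r
# Simulated data; `toy`, `x1`, `x2`, and `y` are hypothetical names.
set.seed(1)
toy <- data.frame(x1 = runif(200, 0, 10), x2 = runif(200, 0, 5))
toy$y <- 3 + 2 * toy$x1 - 1.5 * toy$x2 + rnorm(200)

fit <- lm(y ~ x1 + x2, data = toy)

# Predict over a regular grid of the two explanatory variables.
# expand.grid() varies x1 fastest, so the matrix below has
# rows indexed by x1 and columns indexed by x2.
grid <- expand.grid(x1 = 0:10, x2 = 0:5)
grid$y_hat <- predict(fit, newdata = grid)
surface <- matrix(grid$y_hat, nrow = 11, ncol = 6)

# On a plane, one unit step along x1 (or x2) always changes the
# prediction by the same amount: the corresponding coefficient.
step_x1 <- diff(surface)        # differences between rows (x1 direction)
step_x2 <- t(diff(t(surface)))  # differences between columns (x2 direction)

all.equal(range(step_x1), unname(rep(coef(fit)["x1"], 2)))
all.equal(range(step_x2), unname(rep(coef(fit)["x2"], 2)))
```

Both comparisons hold because the model has no interaction or nonlinear terms; adding either would bend the surface.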

Other types of regression surfaces are shown in Linear models with an interaction term and Generalized linear models.

Setup

The package regress3d is built on the syntax of plotly in R. Both libraries are loaded in the initial setup.
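The setup chunk itself is not reproduced in this text. A minimal version, assuming both packages are already installed, would likely look like this:

```r
# Assumes plotly and regress3d are already installed.
library(plotly)     # plot_ly(), add_markers(), layout(), and the %>% pipe
library(regress3d)  # add_3d_surface() and add_marginals()
```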

Linear Regression Surfaces

The basic functionality of the functions add_3d_surface() and add_marginals() is demonstrated in this article. Both functions:

  • default to the plot data specified in the call to plot_ly(), and
  • have an optional ci argument to visualize the confidence intervals, which defaults to TRUE.

Both the marginal effects and the regression surface are shown below, but either can be plotted independently of the other.

Data

The variables in this example are county level demographic and voting measures from 2016.

  • r_shift: the shift towards Donald Trump in 2016, as measured as the difference between the county’s vote for Trump in 2016 and the county’s vote for the Republican presidential nominee, Mitt Romney, in 2012,
  • median_income16: median income, and
  • any_college: the percent of the county that was enrolled in college at some point, regardless of whether they graduated.

The regression is weighted by pop_estimate16, the number of people in a county, to capture the influence of large counties.

Code and graphic

The code starts by specifying a model. We then create a plotly::plot_ly() object using the same variables. Next we layer on the scattercloud, 3D surface, marginals, and labels. Note that while regression notation often uses $x_1$ and $x_2$ to represent the explanatory variables, and $y$ for the outcome, the plotly command will use x and y for the explanatory variables and z for the outcome variable.

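# Fit a linear model weighted by county population
mymodel <- lm(r_shift ~ median_income16 + any_college,
              data = county_data, weights = pop_estimate16)

# x and y are the explanatory variables; z is the outcome
plot_ly(data = county_data,
        x = ~median_income16,
        y = ~any_college,
        z = ~r_shift) %>%
  add_3d_surface(model = mymodel) %>%  # regression surface with confidence intervals
  add_marginals(model = mymodel)       # marginal effects with confidence intervals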
  • Click and drag on the graphic to rotate it.
  • When needed, you can reset the image to its starting point. Hover over the image and click on the house symbol at the top right above the graphic title. Alternately, you can reload the page.
  • The regression surface is created from the predicted values of the model. It is shown as the flat dark blue surface.
  • The gray curved surfaces above and below the regression surface represent the 95% confidence intervals.
  • The marginal effects and their corresponding confidence intervals are shown as red and orange lines.
  • Remove a marginal effect by clicking once on the marginal effect in the legend. Return it by clicking on the marginal effect again.
  • Isolate a marginal effect and remove all other layers by double clicking on the marginal effect in the legend. Double click again to return all layers.
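
The predicted values and 95% confidence intervals described in the bullets above can be sketched in base R with predict(). This is an illustration on simulated data, not a description of how regress3d builds its layers internally; the data frame `toy` and its columns are made up.

```r
# Simulated data; all names here are hypothetical.
set.seed(2)
toy <- data.frame(x1 = runif(150, 0, 10), x2 = runif(150, 0, 5))
toy$y <- 1 + 0.5 * toy$x1 + 2 * toy$x2 + rnorm(150)
fit <- lm(y ~ x1 + x2, data = toy)

# A grid spanning the observed range of both explanatory variables.
grid <- expand.grid(
  x1 = seq(min(toy$x1), max(toy$x1), length.out = 20),
  x2 = seq(min(toy$x2), max(toy$x2), length.out = 20)
)

# predict() returns a matrix with columns fit, lwr, and upr.
ci <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
grid <- cbind(grid, ci)

# Width of the interval at each grid point.
grid$width <- grid$upr - grid$lwr
```

In a fit like this the interval is narrowest near the center of the data and widens toward the edges of the grid, which is why bounding surfaces of this kind curve while the fitted surface stays flat.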

More on marginal effects

The orange line in the graphic represents the marginal effect of $x_1$, median income: the change in the predicted outcome as median income varies while $x_2$, education, is held constant at its mean of 51.2%. The line traverses the full range of median income. Note that although the regression is weighted by county population, this mean is not weighted. Therefore 51.2% is the average across counties and does not represent the rate of college attendance for the country as a whole in 2016.
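
The gap between an unweighted and a population-weighted mean can be seen with a small made-up example. The three counties and their values below are hypothetical and are not taken from the data used in this article.

```r
# Hypothetical counties: college experience (%) and population.
toy <- data.frame(
  any_college = c(40, 50, 60),
  pop         = c(1000, 1000, 8000)
)

# Every county counts equally, regardless of size.
unweighted <- mean(toy$any_college)                    # 50

# Large counties count for more, approximating a population-wide rate.
weighted <- weighted.mean(toy$any_college, toy$pop)    # 57
```

Because the largest county also has the highest rate, the weighted mean is pulled above the unweighted one.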

add_marginals() defaults to holding $x_2$ constant at its mean. In a linear regression without an interaction term, the choice of value is arbitrary: the slope of the marginal effect is the same for any value of $x_2$, and only the height of the line changes. add_marginals() allows the user to specify other values with the argument x2_constant_val. See the documentation for add_marginals().
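
A short base R check illustrates why the value held constant does not matter for the slope. The simulated data and the helper slope_along_x1() are hypothetical and written only for this sketch; they are not part of regress3d.

```r
# Simulated data; all names here are hypothetical.
set.seed(3)
toy <- data.frame(x1 = runif(100, 0, 10), x2 = runif(100, 0, 5))
toy$y <- 4 - 1 * toy$x1 + 0.7 * toy$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = toy)

x1_range <- range(toy$x1)

# Slope of the predicted line across the range of x1 with x2 fixed.
slope_along_x1 <- function(x2_value) {
  ends <- data.frame(x1 = x1_range, x2 = x2_value)
  y_hat <- predict(fit, newdata = ends)
  unname(diff(y_hat) / diff(x1_range))
}

slope_at_mean <- slope_along_x1(mean(toy$x2))
slope_at_max  <- slope_along_x1(max(toy$x2))

# Both slopes equal the fitted coefficient on x1.
all.equal(slope_at_mean, slope_at_max)
all.equal(slope_at_mean, unname(coef(fit)["x1"]))
```

With an interaction term in the model the two slopes would generally differ, which is when the choice of constant value starts to matter.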

The same logic holds for the red line representing the marginal effect of $x_2$, education, while $x_1$, income, is held constant.

Improvements with plotly, scales, and regress3d

We can improve this interactive image by adding scatterpoints with the plotly function add_markers(), a hovertemplate for each point, and better labels.

The library scales helps present the numeric values, percentages, and dollars more clearly.
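
As a quick illustration of the scales helpers used in the code below, with made-up input values:

```r
library(scales)

comma(1234567)                            # "1,234,567"
dollar(52000)                             # "$52,000"

# scale = 1 tells percent() the input is already in percentage points.
percent(51.2, accuracy = 0.1, scale = 1)  # "51.2%"

# With the default scale = 100, the input is treated as a proportion.
percent(0.512, accuracy = 0.1)            # "51.2%"
```

The article's variables are stored in percentage points, which is why its calls to percent() pass scale = 1.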

regress3d lets us modify the legend text by changing x1_direction_name and x2_direction_name. The default is “Predicted marginal effect of x1”. It is possible, but not shown here, to change the value at which $x_2$ (or $x_1$) is held constant. See the article on interaction terms for an example.

library(scales)
plot_ly( data = county_data,
         x = ~median_income16,
         y = ~any_college,
         z = ~r_shift) %>%
  add_markers(size = ~pop_estimate16, 
              hovertemplate = ~paste("<b>", county_state,"</b><br>", 
                                     "County population: ",comma(pop_estimate16),"<br><br>",
                                     "County median income: ",dollar(median_income16),"<br>", 
                                     "County college experience: ",percent(any_college, 
                                                                           accuracy = 0.1, 
                                                                           scale = 1),"<br>", 
                                     "Shift to Trump: ",percent(r_shift, 
                                                                accuracy = 0.01, scale = 1),"<br>"),
              showlegend = FALSE, name = "county",
              color = I("black")) %>%
  add_3d_surface(model = mymodel) %>%
  add_marginals(model = mymodel,
                x1_direction_name = "Marginal effect of<br>county median income",
                x2_direction_name = "Marginal effect of<br>county college experience") %>%
  layout( 
    title = "\nCounty shift to Trump in 2016",
    scene = list(xaxis = list(title = 'County median income'),
                 yaxis = list(title = 'County college experience'),
                 zaxis = list(title = 'Shift to Trump')),
    legend = list(font = list(size = 8))
  )

Numeric regression results

This graphic corresponds to the following numerical regression results.

                                  % shift to Trump, 2012-2016
County median income ($1,000s)    -0.013
                                  (0.007)
County college experience         -0.344*
                                  (0.012)
Constant                          20.103*
                                  (0.512)
Observations                      3,111
Adjusted R2                       0.371
Note: * p<0.05