Now that we have defined the relationship between two variables using ordinary least squares regression, we want to extend that framework to include more variables. Multiple regression is the procedure used to predict an outcome (i.e., the dependent variable) from two or more independent variables. We are going to use the depression data set, which includes baseline measures from 200 individuals who participated in a psychological intervention (Boswell, n.d.). We can visually represent these data using a 3D scatter plot where the dependent variable is on the z-axis and the independent variables are on the x- and y-axes.
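To make this concrete, a minimal sketch using the scatterplot3d package follows; the column names age, weight, and depression are hypothetical stand-ins for the actual variables in the data set.

library(scatterplot3d)

# Hypothetical variable names; substitute the actual columns from the depression data set.
scatterplot3d(
    x = depression$age,          # first independent variable (x-axis)
    y = depression$weight,       # second independent variable (y-axis)
    z = depression$depression,   # dependent variable (z-axis)
    xlab = 'Age', ylab = 'Weight', zlab = 'Depression'
)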
We now have two slopes (coefficients) to estimate. In the OLS regression chapter we defined the slope of the line as the product of the correlation and the ratio of the standard deviations. We might be tempted to estimate the two slopes using

$$b_1 = r_{x_1 y} \frac{s_y}{s_{x_1}} \quad \text{and} \quad b_2 = r_{x_2 y} \frac{s_y}{s_{x_2}}$$

However, these formulas are only correct when the two independent variables are uncorrelated; in general, each slope must account for the other predictor.
Although it is possible to estimate the slopes using a closed-form solution, it requires knowledge of matrix algebra, which is beyond the scope of this book. Instead, to explore multiple regression visually, we will use maximum likelihood estimation. Our goal, as with OLS, is to minimize the sum of squared residuals. The following function generalizes the calculation of the sum of squared residuals introduced in the maximum likelihood estimation chapter to an arbitrary number of independent variables.
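In symbols, with the parameters ordered as in the function below (the $k$ slopes first, the intercept $a$ last), the quantity being minimized is

$$RSS = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \hat{y}_i = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} + a$$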
residual_sum_squares <- function(parameters, predictors, outcome) {
    if (length(parameters) - 1 != ncol(predictors)) {
        stop('Number of parameters does not match the number of predictors.')
    }
    # Multiply each predictor by its slope and sum the results
    predicted <- 0
    for (i in 1:ncol(predictors)) {
        predicted <- predicted + parameters[i] * predictors[, i]
    }
    # The last parameter is the intercept
    predicted <- predicted + parameters[length(parameters)]
    # Sum of squared differences between observed and predicted values
    residuals <- outcome - predicted
    ss <- sum(residuals^2)
    return(ss)
}
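As a quick sanity check, the function can be called directly with a guess for the parameters; the age and weight columns here are the same hypothetical names used above.

residual_sum_squares(
    parameters = c(0.5, 0.5, 10),   # two slopes followed by the intercept
    predictors = depression[, c('age', 'weight')],
    outcome = depression$depression
)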
We can then use the optim_save function to find the combination of slopes and intercept that minimizes the sum of squared residuals.
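The sketch below assumes optim_save accepts the same arguments as stats::optim (optim_save, from the VisualStats package, wraps optim so that the parameter values at each iteration are saved) and reuses the hypothetical age and weight predictors.

library(VisualStats)

optim.rss <- optim_save(
    par = runif(3),                  # random starting values: two slopes and an intercept
    fn = residual_sum_squares,       # the function to minimize
    method = 'L-BFGS-B',
    predictors = depression[, c('age', 'weight')],
    outcome = depression$depression
)
optim.rss$par                        # the slopes and intercept that minimize the RSS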