Week 7 Linear Regression Exercises

Simple Regression
Research Question: Does the number of hours worked per week (workweek) predict family income (income)?
Using the Polit2SetA data set, run a simple regression with Family Income (income) as the outcome variable (Y) and Number of Hours Worked per Week (workweek) as the independent variable (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variable is always X and is placed on the x-axis.
Follow these steps when using SPSS:
- Open Polit2SetA data set.
- Click on Analyze, then click on Regression, then Linear.
- Move the dependent variable (income) into the box labeled “Dependent” by clicking the arrow button. The dependent variable is a continuous variable.
- Move the independent variable (workweek) into the box labeled “Independent.”
- Click on the Statistics button (right side of box) and click on Descriptives, Estimates, Confidence Interval (should be 95%), and Model Fit, then click on Continue.
- Click on OK.
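For readers who want to see what the Linear Regression procedure computes, the following is a minimal Python sketch of a least-squares fit for one predictor. It is not part of the SPSS workflow. The function name `fit_simple_regression` and the four data points are hypothetical and are not taken from the Polit2SetA data set.

```python
from math import sqrt


def fit_simple_regression(x, y):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    return intercept, slope, r


# Hypothetical toy data (NOT from Polit2SetA)
hours = [10, 20, 30, 40]
income = [900, 1200, 1400, 1700]

intercept, slope, r = fit_simple_regression(hours, income)
print(f"Y' = {intercept:.3f} + {slope:.3f} * hours, r = {r:.3f}")
```

The slope and intercept correspond to the B column of the SPSS Coefficients panel, and r corresponds to R in the Model Summary panel when there is a single predictor.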
Assignment: Through analysis of the SPSS output, answer the following questions.
- What is the total sample size?
- What is the mean income and mean number of hours worked?
- What is the correlation coefficient between the outcome and predictor variables? Is it significant? How would you describe the strength and direction of the relationship?
- What is the value of R squared (coefficient of determination)? Interpret the value.
- Interpret the standard error of the estimate. What information does this value provide to the researcher?
- The model fit is determined by the ANOVA table results (F statistic = 37.226, with 1 and 376 degrees of freedom, and p < .001). Based on these results, does the model fit the data? Briefly explain. (Hint: A significant finding indicates good model fit.)
- Based on the coefficients, what is the value of the y-intercept (point at which the line of best fit crosses the y-axis)?
- Based on the output, write out the regression equation for predicting family income.
- Using the regression equation, what is the predicted monthly family income for women working 35 hours per week?
- Using the regression equation, what is the predicted monthly family income for women working 20 hours per week?
Multiple Regression
Assignment: In this assignment we are trying to predict CES-D score (depression) in women. The research question is: How well do age, educational attainment, employment, abuse, and poor health predict depression?
Using the Polit2SetC data set, run a multiple regression with CES-D Score (cesd) as the outcome variable (Y) and respondent’s age (age), educational attainment (educatn), currently employed (worknow), number of types of abuse (nabuse), and poor health (poorhlth) as the independent variables (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variables are always X and are placed on the x-axis.
Follow these steps when using SPSS:
1. Open Polit2SetC data set.
2. Click on Analyze, then click on Regression, then Linear.
3. Move the dependent variable, CES-D Score (cesd), into the box labeled “Dependent” by clicking on the arrow button. The dependent variable is a continuous variable.
4. Move the independent variables (age, educatn, worknow, and poorhlth) into the box labeled “Independent.” This is the first block of variables to be entered into the analysis (block 1 of 1). Click on the button marked “Next” (top right of the Independent box); this will give you another box in which to enter the next block of independent variables (block 2 of 2). Here you are to enter nabuse. Note: Be sure the Method box states “Enter.”
5. Click on the Statistics button (right side of box) and click on Descriptives, Estimates, Confidence Interval (should be 95%), R square change, and Model Fit, and then click on Continue.
6. Click on OK.
Assignment: (When answering all questions, use the data on the coefficients panel from Model 2).
- Analyze the data from the SPSS output and write a paragraph summarizing the findings. (Use the example in the SPSS output file as a guide for your write-up.)
- Which of the predictors were significant predictors in the model?
- Which of the predictors was the most relevant predictor in the model?
- Interpret the unstandardized coefficients for educational attainment and poor health.
- If you wanted to predict a woman’s current CES-D score based on the analysis, what would the unstandardized regression equation be? Include unstandardized coefficients in the equation.
Week 7 - Linear Regression Exercises SPSS Output
Simple Linear Regression SPSS Output
Descriptive Statistics
Variable | Mean | Std. Deviation | N
Family income prior month, all sources | $1,485.49 | $950.496 | 378
Hours worked per week in current job | 33.52 | 12.359 | 378
Correlations
Statistic | Variable | Family income prior month, all sources | Hours worked per week in current job
Pearson Correlation | Family income prior month, all sources | 1.000 | .300
Pearson Correlation | Hours worked per week in current job | .300 | 1.000
Sig. (1-tailed) | Family income prior month, all sources | . | .000
Sig. (1-tailed) | Hours worked per week in current job | .000 | .
N | Family income prior month, all sources | 378 | 378
N | Hours worked per week in current job | 378 | 378
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .300a | .090 | .088 | $907.877
a. Predictors: (Constant), Hours worked per week in current job
ANOVAb
Model | Source | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 3.068E7 | 1 | 3.068E7 | 37.226 | .000a
1 | Residual | 3.099E8 | 376 | 824241.002 | |
1 | Total | 3.406E8 | 377 | | |
a. Predictors: (Constant), Hours worked per week in current job
b. Dependent Variable: Family income prior month, all sources
Coefficientsa
Model | Predictor | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | Lower Bound (95.0% CI for B) | Upper Bound (95.0% CI for B)
1 | (Constant) | 711.651 | 135.155 | | 5.265 | .000 | 445.896 | 977.405
1 | Hours worked per week in current job | 23.083 | 3.783 | .300 | 6.101 | .000 | 15.644 | 30.523
a. Dependent Variable: Family income prior month, all sources
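Several values in the simple regression output are linked by standard identities, which can be used as a quick consistency check when reading the panels. The Python sketch below recomputes a few of them from the rounded figures printed above, so agreement is only up to rounding. The variable names are illustrative and do not come from SPSS.

```python
# Values copied from the simple regression output panels above
b0, b1 = 711.651, 23.083          # intercept and slope (B column)
se_b1 = 3.783                     # standard error of the slope
mean_x, sd_x = 33.52, 12.359      # hours worked per week
mean_y, sd_y = 1485.49, 950.496   # family income
r = 0.300                         # Pearson correlation

t_slope = b1 / se_b1              # t statistic is B divided by its standard error
beta = b1 * sd_x / sd_y           # standardized slope; equals r with one predictor
y_at_mean_x = b0 + b1 * mean_x    # the fitted line passes through (mean_x, mean_y)
r_squared = r ** 2                # R Square is r squared with one predictor

print(f"t = {t_slope:.3f}, Beta = {beta:.3f}")
print(f"Predicted income at mean hours = {y_at_mean_x:.2f}, R Square = {r_squared:.3f}")
```

Small differences from the printed t and mean income are expected, because SPSS computes them from unrounded values.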
Part II: Multiple Regression SPSS Output
This part begins with an example that has been interpreted for you. Review the output and read the interpretation so that you understand what is expected for the multiple regression assignment.
Descriptive Statistics
Variable | Mean | Std. Deviation | N
CES-D Score | 18.5231 | 11.90747 | 156
CESD Score, Wave 1 | 17.6987 | 11.40935 | 156
Number types of abuse | .83 | 1.203 | 156
Correlations
Statistic | Variable | CES-D Score | CESD Score, Wave 1 | Number types of abuse
Pearson Correlation | CES-D Score | 1.000 | .412 | .347
Pearson Correlation | CESD Score, Wave 1 | .412 | 1.000 | .187
Pearson Correlation | Number types of abuse | .347 | .187 | 1.000
Sig. (1-tailed) | CES-D Score | . | .000 | .000
Sig. (1-tailed) | CESD Score, Wave 1 | .000 | . | .010
Sig. (1-tailed) | Number types of abuse | .000 | .010 | .
N | CES-D Score | 156 | 156 | 156
N | CESD Score, Wave 1 | 156 | 156 | 156
N | Number types of abuse | 156 | 156 | 156
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change
1 | .412a | .170 | .164 | 10.88446 | .170 | 31.506 | 1 | 154 | .000
2 | .496b | .246 | .236 | 10.41016 | .076 | 15.352 | 1 | 153 | .000
a. Predictors: (Constant), CESD Score, Wave 1
b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse
ANOVAc
Model | Source | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 3732.507 | 1 | 3732.507 | 31.506 | .000a
1 | Residual | 18244.613 | 154 | 118.472 | |
1 | Total | 21977.120 | 155 | | |
2 | Regression | 5396.278 | 2 | 2698.139 | 24.897 | .000b
2 | Residual | 16580.842 | 153 | 108.372 | |
2 | Total | 21977.120 | 155 | | |
a. Predictors: (Constant), CESD Score, Wave 1
b. Predictors: (Constant), CESD Score, Wave 1, Number types of abuse
c. Dependent Variable: CES-D Score
Coefficientsa
Model | Predictor | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | Lower Bound (95.0% CI for B) | Upper Bound (95.0% CI for B)
1 | (Constant) | 10.911 | 1.612 | | 6.768 | .000 | 7.726 | 14.095
1 | CESD Score, Wave 1 | .430 | .077 | .412 | 5.613 | .000 | .279 | .581
2 | (Constant) | 9.584 | 1.579 | | 6.071 | .000 | 6.465 | 12.702
2 | CESD Score, Wave 1 | .376 | .075 | .360 | 5.035 | .000 | .228 | .523
2 | Number types of abuse | 2.772 | .707 | .280 | 3.918 | .000 | 1.374 | 4.170
a. Dependent Variable: CES-D Score
In the regression example, we statistically controlled for women’s level of depression 2 years earlier and tried to determine whether recent abuse experiences affected current levels of depression with earlier depression held constant.
The correlation between CES-D scores in the two waves of data collection was moderate and positive, r = .412. You can see this value in the Model Summary panel: the value of R in the first step is the bivariate correlation (i.e., r) between the two CES-D scores. R² was statistically significant at p < .001 in both steps of the regression analysis, as shown in the ANOVA panel. R² increased from .170 in the first model to .246 when the abuse variable was added. The R² change (increase) of .076 (7.6%) was significant at p < .001, as shown in the Model Summary panel under Change Statistics. This indicates that even when prior levels of depression were held constant, recent abuse accounted for a significant amount of variation in current depression scores. The availability of longitudinal data does not “prove” that abuse experiences affected the women’s level of depression, but it does offer greater supportive evidence than cross-sectional data. If we wanted to predict current CES-D scores, using prior CES-D scores and abuse experiences as predictors, the unstandardized regression equation would be as follows: Y’ = 9.584 + .376 (cesdwav1) + 2.772 (nabuse). This information comes from the panel labeled Coefficients.
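The R Square Change and F Change values in the example can be reproduced from the sums of squares in the ANOVA panel. The following Python sketch shows the arithmetic, assuming the standard F test for nested models; the function name `f_change` is illustrative and is not an SPSS term.

```python
def f_change(ss_reg_reduced, ss_reg_full, ss_res_full, df_added, df_res_full):
    """F statistic for the gain in explained variation between nested models."""
    ms_added = (ss_reg_full - ss_reg_reduced) / df_added
    ms_residual = ss_res_full / df_res_full
    return ms_added / ms_residual


# Sums of squares copied from the example ANOVA panel
ss_total = 21977.120
ss_reg_model1 = 3732.507
ss_reg_model2 = 5396.278
ss_res_model2 = 16580.842

r2_model1 = ss_reg_model1 / ss_total   # R Square, Model 1
r2_model2 = ss_reg_model2 / ss_total   # R Square, Model 2
r2_change = r2_model2 - r2_model1      # R Square Change for Model 2

f = f_change(ss_reg_model1, ss_reg_model2, ss_res_model2,
             df_added=1, df_res_full=153)

print(f"R2 model 1 = {r2_model1:.3f}, R2 model 2 = {r2_model2:.3f}")
print(f"R2 change = {r2_change:.3f}, F change = {f:.3f}")
```

The degrees of freedom (1 and 153) are the df1 and df2 values reported for Model 2 under Change Statistics.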
The panel labeled Coefficients reports two types of coefficients for the independent variables. The first is the unstandardized coefficient (b-value), which represents the individual contribution of each predictor to the model. The b-value for number of types of abuse (2.772) describes the relationship between CES-D score (the dependent variable) and number of types of abuse (an independent variable). These values are used when making predictions, and they tell us how much the predicted outcome changes for a one-unit change in the independent variable when all other variables in the equation are held constant. For example, the interpretation for number of types of abuse is as follows: for each one-unit increase in the number of types of abuse, the predicted CES-D score (depression) increases by 2.772 units, with the Wave 1 CES-D score held constant. The size of the increase depends on the units in which the variable is measured. So, for each additional type of abuse reported, the predicted CES-D depression score increases by 2.772 points. Always check the value in the significance column to determine whether each variable makes a significant contribution to the model.
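To make the role of the b-values concrete, the short Python sketch below applies the example's unstandardized equation. The function name `predict_cesd` and the respondent values (a Wave 1 score of 20 and one or two types of abuse) are hypothetical and chosen only for illustration.

```python
def predict_cesd(cesdwav1, nabuse):
    """Predicted current CES-D score from the example's Model 2 coefficients."""
    return 9.584 + 0.376 * cesdwav1 + 2.772 * nabuse


# Hypothetical respondent: Wave 1 CES-D score of 20, one type of abuse
one_type = predict_cesd(cesdwav1=20, nabuse=1)

# Same Wave 1 score, one additional type of abuse
two_types = predict_cesd(cesdwav1=20, nabuse=2)

# The difference equals the b-value for nabuse (2.772)
difference = two_types - one_type

print(f"Predicted CES-D with one type of abuse: {one_type:.3f}")
print(f"Predicted CES-D with two types of abuse: {two_types:.3f}")
print(f"Change for one additional type: {difference:.3f}")
```

Holding the Wave 1 score fixed while changing nabuse by one unit is what "held constant" means in the interpretation of the b-value.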
The second coefficient reported is the standardized Beta coefficient. The standardized coefficient tells us the number of standard deviations by which the dependent variable is predicted to change for a one standard deviation change in the independent variable. Because Beta coefficients are on a common scale, they are typically used to judge which of the independent variables is most important in explaining the dependent variable. In the above example, the CES-D score, Wave 1 has a Beta coefficient of .360 and number of types of abuse has a Beta coefficient of .280. This indicates that the CES-D score, Wave 1 is the strongest predictor in the model and makes the strongest unique contribution to explaining the dependent variable. Note: When you are determining the strongest predictor, ignore the negative sign if one exists. So, a predictor with a Beta of -.96 is stronger than one with a Beta of .55.
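A standardized Beta can be recovered from the unstandardized b-value by rescaling with the standard deviations from the Descriptive Statistics panel: Beta = b × (SD of predictor) / (SD of outcome). The Python sketch below checks this for the example output; the function name `standardized_beta` is illustrative, and results agree with the printed Betas only up to rounding.

```python
def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized coefficient to a standardized Beta."""
    return b * sd_predictor / sd_outcome


sd_cesd = 11.90747  # Std. Deviation of CES-D Score (outcome)

# b-values from Model 2 and standard deviations from Descriptive Statistics
betas = {
    "CESD Score, Wave 1": standardized_beta(0.376, 11.40935, sd_cesd),
    "Number types of abuse": standardized_beta(2.772, 1.203, sd_cesd),
}

# Rank predictors by absolute Beta, ignoring the sign
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for name in ranked:
    print(f"{name}: Beta = {betas[name]:.3f}")
```

Sorting on the absolute value mirrors the note above: the sign shows direction, while the magnitude shows relative strength.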
SPSS Output for Multiple Regression Assignment
Descriptive Statistics
Variable | Mean | Std. Deviation | N
CES-D Score | 18.5815 | 11.78965 | 939
Respondent's age at time of interview | 36.54749 | 6.234511 | 939
Educational attainment | 1.57 | .584 | 939
Currently employed? | .45 | .498 | 939
Poor health self rating | .06 | .247 | 939
Number types of abuse | .85 | 1.160 | 939
Correlations
Statistic | Variable | CES-D Score | Respondent's age at time of interview | Educational attainment | Currently employed? | Poor health self rating | Number types of abuse
Pearson Correlation | CES-D Score | 1.000 | .061 | -.155 | -.220 | .270 | .370
Pearson Correlation | Respondent's age at time of interview | .061 | 1.000 | .065 | -.077 | .140 | -.020
Pearson Correlation | Educational attainment | -.155 | .065 | 1.000 | .060 | -.074 | -.026
Pearson Correlation | Currently employed? | -.220 | -.077 | .060 | 1.000 | -.162 | -.073
Pearson Correlation | Poor health self rating | .270 | .140 | -.074 | -.162 | 1.000 | .095
Pearson Correlation | Number types of abuse | .370 | -.020 | -.026 | -.073 | .095 | 1.000
Sig. (1-tailed) | CES-D Score | . | .031 | .000 | .000 | .000 | .000
Sig. (1-tailed) | Respondent's age at time of interview | .031 | . | .023 | .009 | .000 | .272
Sig. (1-tailed) | Educational attainment | .000 | .023 | . | .032 | .012 | .215
Sig. (1-tailed) | Currently employed? | .000 | .009 | .032 | . | .000 | .012
Sig. (1-tailed) | Poor health self rating | .000 | .000 | .012 | .000 | . | .002
Sig. (1-tailed) | Number types of abuse | .000 | .272 | .215 | .012 | .002 | .
N | CES-D Score | 939 | 939 | 939 | 939 | 939 | 939
N | Respondent's age at time of interview | 939 | 939 | 939 | 939 | 939 | 939
N | Educational attainment | 939 | 939 | 939 | 939 | 939 | 939
N | Currently employed? | 939 | 939 | 939 | 939 | 939 | 939
N | Poor health self rating | 939 | 939 | 939 | 939 | 939 | 939
N | Number types of abuse | 939 | 939 | 939 | 939 | 939 | 939
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change
1 | .348a | .121 | .117 | 11.07693 | .121 | 32.148 | 4 | 934 | .000
2 | .483b | .233 | .229 | 10.34980 | .112 | 136.849 | 1 | 933 | .000
a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent's age at time of interview, Currently employed?
b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent's age at time of interview, Currently employed?, Number types of abuse
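The Adjusted R Square column corrects R Square for the number of predictors relative to the sample size. The Python sketch below applies the usual formula, 1 − (1 − R²)(n − 1)/(n − k − 1), to the rounded R Square values in the Model Summary. The function name `adjusted_r_squared` is illustrative, and the results match the printed values only up to rounding.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R Square for a model with k predictors fitted to n cases."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 939  # sample size from the Descriptive Statistics panel

# Model 1 has 4 predictors; Model 2 adds number of types of abuse (5 predictors)
adj_model1 = adjusted_r_squared(0.121, n, k=4)
adj_model2 = adjusted_r_squared(0.233, n, k=5)

print(f"Adjusted R Square, Model 1: {adj_model1:.3f}")
print(f"Adjusted R Square, Model 2: {adj_model2:.3f}")
```

With a sample this large and only a handful of predictors, the adjustment is small, which is why R Square and Adjusted R Square are nearly identical in both models.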
ANOVAc
Model | Source | Sum of Squares | df | Mean Square | F | Sig.
1 | Regression | 15777.841 | 4 | 3944.460 | 32.148 | .000a
1 | Residual | 114600.356 | 934 | 122.698 | |
1 | Total | 130378.197 | 938 | | |
2 | Regression | 30436.854 | 5 | 6087.371 | 56.829 | .000b
2 | Residual | 99941.343 | 933 | 107.118 | |
2 | Total | 130378.197 | 938 | | |
a. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent's age at time of interview, Currently employed?
b. Predictors: (Constant), Poor health self rating, Educational attainment, Respondent's age at time of interview, Currently employed?, Number types of abuse
c. Dependent Variable: CES-D Score
Coefficientsa
Model | Predictor | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | Lower Bound (95.0% CI for B) | Upper Bound (95.0% CI for B)
1 | (Constant) | 22.182 | 2.351 | | 9.434 | .000 | 17.567 | 26.796
1 | Respondent's age at time of interview | .045 | .059 | .024 | .767 | .443 | -.070 | .161
1 | Educational attainment | -2.608 | .624 | -.129 | -4.179 | .000 | -3.832 | -1.383
1 | Currently employed? | -4.092 | .738 | -.173 | -5.544 | .000 | -5.540 | -2.643
1 | Poor health self rating | 10.928 | 1.503 | .229 | 7.270 | .000 | 7.978 | 13.878
2 | (Constant) | 18.165 | 2.224 | | 8.169 | .000 | 13.801 | 22.528
2 | Respondent's age at time of interview | .068 | .055 | .036 | 1.240 | .215 | -.040 | .176
2 | Educational attainment | -2.518 | .583 | -.125 | -4.318 | .000 | -3.663 | -1.374
2 | Currently employed? | -3.605 | .691 | -.152 | -5.219 | .000 | -4.961 | -2.250
2 | Poor health self rating | 9.496 | 1.410 | .199 | 6.735 | .000 | 6.729 | 12.263
2 | Number types of abuse | 3.432 | .293 | .338 | 11.698 | .000 | 2.856 | 4.008
a. Dependent Variable: CES-D Score
