Module 6 - General Linear Models: ANOVA & ANCOVA (2015)

The two exercises below utilize the data sets career-a.sav and career-f.sav, which can be downloaded from this Web site:
www.Pyrczak.com/data
1. You are interested in evaluating the effect of job satisfaction (satjob2) and age category (agecat4) on the combined DV of hours worked per week (hrs1) and years of education (educ). Use career-a.sav for steps a and b.
a. Develop the appropriate research questions and/or hypotheses for main effects and interaction.
b. Screen data for missing data and outliers. What steps, if any, are necessary for reducing missing data and outliers?
For all subsequent analyses in Question 1, use career-f.sav and the transformed variables of hrs2 and educ2.
c. Test the assumptions of normality and linearity of DVs.
i. What steps, if any, are necessary for increasing normality?
ii. Are DVs linearly related?
d. Conduct MANOVA with post hoc (be sure to test for homogeneity of variance-covariance).
i. Can you conclude homogeneity of variance-covariance? Which test statistic is most appropriate for interpretation of multivariate results?
ii. Is factor interaction significant? Explain.
iii. Are main effects significant? Explain.
iv. What can you conclude from univariate ANOVA and post hoc results?
e. Write a results statement.
2. Building on the previous problem, in which you investigated the effects of job satisfaction (satjob2) and age category (agecat4) on the combined dependent variable of hours worked per week (hrs1) and years of education (educ), you are now interested in controlling for respondents' income, such that rincom91 will be used as a covariate. Complete the following using career-a.sav.
a. Develop the appropriate research questions and/or hypotheses for main effects and interaction.
b. Screen data for missing data and outliers. What steps, if any, are necessary for reducing missing data and outliers?
For all subsequent analyses in Question 2, use career-f.sav and the transformed variables of hrs2, educ2, and rincoml.
c. Test the assumptions of normality and linearity of DVs and covariate.
i. What steps, if any, are necessary for increasing normality?
ii. Are DVs and covariate linearly related?
d. Conduct a preliminary MANCOVA to test the assumptions of homogeneity of variance-covariance and homogeneity of regression slopes/planes.
i. Can you conclude homogeneity of variance-covariance? Which test statistic is most appropriate for interpretation of multivariate results?
ii. Do factors and covariate significantly interact? Explain.
e. Conduct MANCOVA.
i. Is factor interaction significant? Explain.
ii. Are main effects significant? Explain.
iii. What can you conclude from univariate ANOVA results?
f. Write a results statement.
3. Compare the results from Question 1 and Question 2. Explain the differences in main effects.
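Questions 1 and 2 both ask which multivariate test statistic is most appropriate. As a reminder, the four statistics usually reported for MANOVA and MANCOVA can be written as follows. The symbols H (hypothesis sums-of-squares-and-cross-products matrix) and E (error SSCP matrix) are notation assumed here for illustration; they do not appear in the exercises.

```latex
\begin{aligned}
\text{Wilks' lambda:} \quad & \Lambda = \frac{|\mathbf{E}|}{|\mathbf{H} + \mathbf{E}|} \\
\text{Pillai's trace:} \quad & V = \operatorname{tr}\!\left[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\right] \\
\text{Hotelling's trace:} \quad & T = \operatorname{tr}\!\left(\mathbf{H}\mathbf{E}^{-1}\right) \\
\text{Roy's largest root:} \quad & \theta = \lambda_{\max}\!\left(\mathbf{H}\mathbf{E}^{-1}\right)
\end{aligned}
```

Which statistic to interpret depends on the outcome of the homogeneity of variance-covariance test, so the choice should be justified from your own output.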
The following output was generated from conducting a forward multiple regression to identify which IVs (urban, birthrat, lnphone, and lnradio) predict lngdp. The data analyzed were from the SPSS country-a.sav data file.
Variables Entered/Removed(a)
Model  Variables Entered  Variables Removed  Method
1      LNPHONE            (none)             Forward (Criterion: Probability-of-F-to-enter <= .050)
2      BIRTHRAT           (none)             Forward (Criterion: Probability-of-F-to-enter <= .050)
a. Dependent Variable: LNGDP
Model Summary
(R Square Change through Sig. F Change are the change statistics.)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .941a  .886      .885               .5180                       .886             862.968   1    111  .000
2      .943b  .890      .888               .5109                       .004             4.095     1    110  .045
a. Predictors: (Constant), LNPHONE
b. Predictors: (Constant), LNPHONE, BIRTHRAT
Coefficients(a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial, and Part are correlations; Tolerance and VIF are the collinearity statistics.)
Model  Term        B          Std. Error  Beta   t        Sig.  Zero-order  Partial  Part   Tolerance  VIF
1      (Constant)  6.389      .058               110.662  .000
       LNPHONE     .736       .025        .941   29.376   .000  .941        .941     .941   1.000      1.000
2      (Constant)  6.878      .248               27.744   .000
       LNPHONE     .663       .044        .849   15.238   .000  .941        .824     .482   .322       3.104
       BIRTHRAT    -1.29E-02  .006        -.113  -2.024   .045  -.811       -.189    -.064  .322       3.104
a. Dependent Variable: LNGDP
Excluded Variables(c)
(Tolerance, VIF, and Minimum Tolerance are the collinearity statistics.)
Model  Variable  Beta In  t       Sig.  Partial Correlation  Tolerance  VIF    Minimum Tolerance
1      URBAN     .095a    1.901   .060  .178                 .404       2.475  .404
       BIRTHRAT  -.113a   -2.024  .045  -.189                .322       3.104  .322
       LNRADIO   .026a    .557    .579  .053                 .461       2.171  .461
2      URBAN     .091b    1.848   .067  .174                 .403       2.479  .225
       LNRADIO   .021b    .455    .650  .044                 .459       2.178  .243
a. Predictors in the Model: (Constant), LNPHONE
b. Predictors in the Model: (Constant), LNPHONE, BIRTHRAT
c. Dependent Variable: LNGDP
ANOVA(c)
Model              Sum of Squares  df   Mean Square  F        Sig.
1      Regression  231.539         1    231.539      862.968  .000a
       Residual    29.782          111  .268
       Total       261.321         112
2      Regression  232.608         2    116.304      445.561  .000b
       Residual    28.713          110  .261
       Total       261.321         112
a. Predictors: (Constant), lnphone
b. Predictors: (Constant), lnphone, birthrat
c. Dependent Variable: lngdp
i. Evaluate the tolerance statistics. Is multicollinearity a problem?
ii. What variables create the model to predict lngdp? What statistics support your response?
iii. Is the model significant in predicting lngdp? Explain.
iv. What percentage of variance in lngdp is explained by the model?
v. Write the regression equation for lngdp.
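The questions above can be cross-checked against the printed output with a few lines of arithmetic. The sketch below is one way to do that in Python, assuming the Model 2 values are read correctly from the tables; the function name predict_lngdp and the example inputs are hypothetical and are not part of the exercise.

```python
# Values copied from the Model 2 rows of the SPSS output above.
SS_REGRESSION = 232.608   # ANOVA table, Model 2, Regression
SS_TOTAL = 261.321        # ANOVA table, Model 2, Total
TOLERANCE = 0.322         # Coefficients table, both Model 2 predictors

# R squared is the share of total variation captured by the regression.
r_squared = SS_REGRESSION / SS_TOTAL

# VIF is the reciprocal of tolerance; small differences from the printed
# VIF come from rounding in the printed tolerance.
vif = 1.0 / TOLERANCE

# Unstandardized coefficients (B) for Model 2.
B_CONSTANT = 6.878
B_LNPHONE = 0.663
B_BIRTHRAT = -1.29e-02


def predict_lngdp(lnphone, birthrat):
    """Predicted lngdp from the Model 2 unstandardized coefficients."""
    return B_CONSTANT + B_LNPHONE * lnphone + B_BIRTHRAT * birthrat


# Hypothetical predictor values, chosen only to exercise the function.
example_prediction = predict_lngdp(lnphone=3.0, birthrat=20.0)

print(f"R squared: {r_squared:.3f}")
print(f"VIF from tolerance: {vif:.3f}")
print(f"Predicted lngdp for the example inputs: {example_prediction:.3f}")
```

The interpretation (whether the tolerance values signal multicollinearity, and how to word the results) is still left to you.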
This question utilizes the data sets profile-a.sav and profile-b.sav, which can be downloaded from this Web site:
www.Pyrczak.com/data
You are interested in examining whether the variables shown here in brackets [years of age (age), hours worked per week (hrs1), years of education (educ), years of education for mother (maeduc), and years of education for father (paeduc)] are predictors of individual income (rincmdol). Complete the following steps to conduct this analysis.
a. Using profile-a.sav, conduct a preliminary regression to calculate Mahalanobis distance. Identify the critical value for chi-square. Conduct Explore to identify outliers. Which cases should be removed from further analysis?
For all subsequent analyses, use profile-b.sav. Make sure that only cases where MAH_1 < 22.458 are selected.
b. Create a scatterplot matrix. Can you assume linearity and normality?
c. Conduct a preliminary regression to create a residual plot. Can you assume normality and homoscedasticity?
d. Conduct multiple regression using the Enter method. Evaluate the tolerance statistics. Is multicollinearity a problem?
e. Does the model significantly predict rincmdol? Explain.
f. Which variables significantly predict rincmdol? Which variable is the best predictor of the DV?
g. What percentage of variance in rincmdol is explained by the model?
h. Write the regression equation for the standardized variables.
i. Explain why the variables of mother's and father's education are not significant predictors of rincmdol.
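Step a asks for the chi-square critical value used to flag Mahalanobis distance outliers, and the selection rule quotes 22.458. The sketch below shows one way to reproduce such a cutoff using only the Python standard library. The significance level (.001) and the degrees of freedom (6) are assumptions chosen because together they reproduce 22.458; confirm both against your course text. The function names are hypothetical.

```python
import math


def chi2_cdf_even_df(x, df):
    """Chi-square CDF for an even df, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_critical_even_df(alpha, df, lo=0.0, hi=200.0):
    """Critical value with upper-tail area alpha, found by bisection."""
    target = 1.0 - alpha
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Assumed settings: alpha = .001 and df = 6 (see the note above).
critical_value = chi2_critical_even_df(alpha=0.001, df=6)
print(f"Chi-square critical value: {critical_value:.3f}")
```

Cases whose Mahalanobis distance exceeds the critical value are the ones to consider removing in step a.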
The following exercises seek to determine what underlying structure exists among the following variables in profile-a.sav: highest degree earned (degree), hours worked per week (hrs1), job satisfaction (satjob), years of education (educ), hours per day watching TV (tvhours), general happiness (happy), degree to which life is exciting (life), and degree to which the lot of the average person is getting worse (anomia5).
1. The following output was generated for the initial analysis. Varimax rotation was utilized.
Communalities
Variable  Initial  Extraction
degree    1.000    .933
hrs1      1.000    .602
satjob    1.000    .447
educ      1.000    .939
tvhours   1.000    .556
happy     1.000    .576
life      1.000    .500
anomia5   1.000    .317
Extraction Method: Principal Component Analysis.
Total Variance Explained
           Initial Eigenvalues                 Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
Component  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %
1          2.423  30.293         30.293        2.423  30.293         30.293        1.879  23.488         23.488
2          1.426  17.822         48.115        1.426  17.822         48.115        1.734  21.676         45.165
3          1.021  12.760         60.875        1.021  12.760         60.875        1.257  15.710         60.875
4          .886   11.077         71.952
5          .796   9.955          81.907
6          .728   9.094          91.001
7          .607   7.589          98.590
8          .113   1.410          100.000
Extraction Method: Principal Component Analysis.
[Scree plot: eigenvalue plotted against component number]
Reproduced Correlations
Reproduced Correlation
          degree   hrs1     satjob   educ     tvhours  happy    life     anomia5
degree    .933b    .176     -.039    .935     -.239    -.119    .230     .118
hrs1      .176     .602b    -.239    .194     -.576    -.077    .141     -.049
satjob    -.039    -.239    .447b    -.062    .214     .469     -.436    -.297
educ      .935     .194     -.062    .939b    -.255    -.142    .252     .131
tvhours   -.239    -.576    .214     -.255    .556b    .066     -.136    .047
happy     -.119    -.077    .469     -.142    .066     .576b    -.526    -.412
life      .230     .141     -.436    .252     -.136    -.526    .500b    .371
anomia5   .118     -.049    -.297    .131     .047     -.412    .371     .317b
Residual(a)
          degree   hrs1     satjob   educ     tvhours  happy    life     anomia5
degree             .004     -.068    -.050    .032     -.004    -.034    -.037
hrs1      .004              .104     .011     .361     -.031    -.046    .112
satjob    -.068    .104              -.037    -.105    -.197    .151     .158
educ      -.050    .011     -.037             .026     -.002    -.029    -.012
tvhours   .032     .361     -.105    .026              .014     -.012    -.099
happy     -.004    -.031    -.197    -.002    .014              .159     .177
life      -.034    -.046    .151     -.029    -.012    .159              -.217
anomia5   -.037    .112     .158     -.012    -.099    .177     -.217
Extraction Method: Principal Component Analysis.
a. Residuals are computed between observed and reproduced correlations. There are 12 (42.0%) nonredundant residuals with absolute values greater than 0.05.
b. Reproduced communalities
a. Assess the eigenvalue criterion. How many components were retained? Is the eigenvalue appropriate, considering the number of factors and the communalities?
b. Assess the variance explained by the retained components. What is the total variability explained by the model? Is this amount adequate?
c. Assess the scree plot. At which component does the plot begin to level off?
d. Assess the residuals. How many residuals exceed the .05 criterion?
e. Having applied the four criteria, do you believe the number of components retained in this analysis is appropriate? If not, what is your recommendation?
2. Assume that you believe four components should be retained from the analysis in the previous exercise. Conduct a factor analysis with varimax rotation (be sure to retain four components).
a. Evaluate each of the four criteria. Has the model fit improved? Explain.
b. Provide two alternatives for improving the model.
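The eigenvalue and variance-explained criteria in Exercise 1 can be checked directly from the printed eigenvalues and communalities. The sketch below is one way to do this, assuming the values are read correctly from the Communalities and Total Variance Explained tables; the variable names are chosen here for illustration.

```python
# Initial eigenvalues from the Total Variance Explained table.
eigenvalues = [2.423, 1.426, 1.021, 0.886, 0.796, 0.728, 0.607, 0.113]

# Extraction communalities from the Communalities table.
communalities = {
    "degree": 0.933, "hrs1": 0.602, "satjob": 0.447, "educ": 0.939,
    "tvhours": 0.556, "happy": 0.576, "life": 0.500, "anomia5": 0.317,
}

# Eigenvalue criterion: retain components with eigenvalues greater than 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# With standardized variables, total variance equals the number of variables.
total_variance = len(eigenvalues)
percent_of_variance = [100.0 * ev / total_variance for ev in eigenvalues]
cumulative_retained = sum(percent_of_variance[:len(retained)])

mean_communality = sum(communalities.values()) / len(communalities)

print(f"Components retained: {len(retained)}")
print(f"Variance explained by retained components: {cumulative_retained:.3f}%")
print(f"Mean communality: {mean_communality:.3f}")
```

Rerunning the same check with the four-component solution from Exercise 2 gives a quick before-and-after comparison; whether the fit is adequate remains a judgment for you to make.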
Prediction and Association Practice Exercise
Use Practice Data Set 2 in Appendix B. If we want to predict salary from years of education, what salary would you predict for someone with 12 years of education? What salary would you predict for someone with a college education (16 years)?
Use Practice Data Set 2 in Appendix B. Determine the prediction equation for predicting salary based on education, years of service, and sex. Which variables are significant predictors? If you believe that men were paid more than women were, what would you conclude after conducting this analysis?
Data Set 2 Appendix B
A survey of employees is conducted. Each employee provides the following information: Salary (SALARY), Years of Service (YOS), Sex (SEX), Job Classification (CLASSIFY), and Education Level (EDUC). Note that you will have to code SEX (Male = 1, Female = 2) and CLASSIFY (Clerical = 1, Technical = 2, Professional = 3), and indicate that they are measured on a nominal scale.
Name      Type     Width  Decimals  Label  Values            Missing  Columns  Align  Measure  Role
salary    Numeric  8      2                None              None     8        Right  Scale    Input
yos       Numeric  8      2                None              None     8        Right  Scale    Input
sex       Numeric  8      2                {1.00, Male}...   None     8        Right  Nominal  Input
classify  Numeric  8      2                {1.00, Cleric...  None     8        Right  Nominal  Input
educ      Numeric  8      2                None              None     8        Right  Scale    Input
SALARY YOS SEX CLASSIFY EDUC
35,000 8 Male Technical 14
18,000 4 Female Clerical 10
20,000 1 Male Professional 16
50,000 20 Female Professional 16
38,000 6 Male Professional 20
20,000 6 Female Clerical 12
75,000 17 Male Professional 20
40,000 4 Female Technical 12
30,000 8 Male Technical 14
22,000 15 Female Clerical 12
23,000 16 Male Clerical 12
45,000 2 Female Professional 16
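For the first practice exercise (predicting salary from years of education), a simple least-squares line can be fitted to the twelve cases above. The sketch below uses only the Python standard library and is meant as a cross-check on your SPSS output, not as a replacement for it; the function name fit_line is hypothetical.

```python
# SALARY and EDUC columns from Practice Data Set 2, in the order listed.
education = [14, 10, 16, 16, 20, 12, 20, 12, 14, 12, 12, 16]
salary = [35000, 18000, 20000, 50000, 38000, 20000,
          75000, 40000, 30000, 22000, 23000, 45000]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(education, salary)

# Predicted salaries for the two education levels named in the exercise.
for years in (12, 16):
    predicted = intercept + slope * years
    print(f"{years} years of education: predicted salary {predicted:,.2f}")
```

The second practice exercise adds years of service and sex as predictors, which calls for a multiple regression and is not covered by this one-predictor sketch.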