regression analysis, (week 2, 5, and 8)

Question # 00018576 Posted By: maqj Updated on: 06/27/2014 12:26 PM Due on: 06/29/2014
Subject Business Topic General Business Tutorials:
Question


The Major League Baseball data set on the next tab gives descriptive statistics for all MLB teams in 2005. It might be interesting to test whether wins (a potential dependent variable) or attendance (another potential dependent variable) can be explained linearly by some combination of the variables listed in the data set. Please develop a multiple regression scenario (one that makes sense) from the data set below and test it by following the project directions.

Team                  Seating Capacity   Salary ($)  Salary ($M)  Batting   ERA   HR  Errors  Stolen Bases   Wins  Attendance
Boston                          33,871  123,505,125        123.5    0.281  4.74  199     109            45   95.0   2,847,798
New York Yankees                57,746  208,306,817        208.3    0.276  4.52  229      95            84   95.0   4,090,440
Oakland                         43,662   55,425,762         55.4    0.262  3.69  155      88            31   88.0   2,108,818
Baltimore                       48,262   73,914,333         73.9    0.269  4.56  189     107            83   74.0   2,623,904
Los Angeles Angels              45,050   97,725,322         97.7    0.270  3.68  147      87           161   95.0   3,404,636
Cleveland                       43,368   41,502,500         41.5    0.271  3.61  207     106            62   93.0   2,014,220
Chicago White Sox               44,321   75,178,000         75.2    0.262  3.61  200      94           137   99.0   2,342,804
Toronto                         50,516   45,719,500         45.7    0.265  4.06  136      95            72   80.0   2,014,995
Minnesota                       48,678   56,186,000         56.2    0.259  3.71  134     102           102   83.0   2,034,243
Tampa Bay                       44,027   29,679,067         29.7    0.274  5.39  157     124           151   67.0   1,141,915
Texas                           52,000   55,849,000         55.8    0.267  4.96  260     108            67   79.0   2,525,259
Detroit                         40,000   69,092,000         69.1    0.272  4.51  168     110            66   71.0   2,024,505
Seattle                         45,611   87,754,334         87.8    0.256  4.49  130      86           102   69.0   2,724,859
Kansas City                     40,529   36,881,000         36.9    0.263  5.49  126     125            53   56.0   1,371,181
Atlanta                         50,062   86,457,302         86.5    0.265  3.98  184      86            92   90.0   2,520,904
Arizona                         49,075   62,329,166         62.3    0.256  4.84  191      94            67   77.0   2,059,327
Houston                         42,000   76,799,000         76.8    0.256  3.51  161      89           115   89.0   2,805,060
Cincinnati                      42,059   61,892,583         61.9    0.261  5.15  222     104            72   73.0   1,923,254
New York Mets                   55,775  101,305,821        101.3    0.258  3.76  175     106           153   83.0   2,827,549
Pittsburgh                      38,127   38,133,000         38.1    0.259  4.42  139     117            73   67.0   1,817,245
Los Angeles Dodgers             56,000   83,039,000         83.0    0.253  4.38  149     106            58   71.0   3,603,680
San Diego                       42,445   63,290,833         63.3    0.257  4.13  130     109            99   82.0   2,869,787
Washington                      56,000   48,581,500         48.6    0.252  3.87  117      92            45   81.0   2,730,352
San Francisco                   40,800   90,199,500         90.2    0.261  4.33  128      90            71   75.0   3,181,020
St Louis                        49,625   92,106,833         92.1    0.270  3.49  170     100            83  100.0   3,542,271
Florida                         42,531   60,408,834         60.4    0.272  4.16  128     103            96   83.0   1,852,608
Philadelphia                    43,500   95,522,000         95.5    0.270  4.21  167      90           116   88.0   2,665,304
Milwaukee                       42,400   39,934,833         39.9    0.259  3.97  175     119            79   81.0   2,211,323
Chicago Cubs                    38,957   87,032,933         87.0    0.270  4.19  194     101            65   79.0   3,100,092
Colorado                        50,381   48,155,000         48.2    0.267  5.13  150     118            65   67.0   1,914,385
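
One possible scenario, offered only as a sketch, is Wins explained by Salary ($M), Batting, and ERA. The lists below are copied from the data set in team order (Boston through Colorado); ERA is taken to be the column of values starting 4.74, 4.52 and Wins the column starting 95.0, 95.0. The choice of predictors, the helper name fit_ols, and the use of NumPy are assumptions for illustration and are not part of the assignment.

```python
import numpy as np

# Values copied from the 2005 data set, in team order (Boston ... Colorado).
salary_m = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2, 45.7, 56.2, 29.7,
            55.8, 69.1, 87.8, 36.9, 86.5, 62.3, 76.8, 61.9, 101.3, 38.1,
            83.0, 63.3, 48.6, 90.2, 92.1, 60.4, 95.5, 39.9, 87.0, 48.2]
batting = [0.281, 0.276, 0.262, 0.269, 0.270, 0.271, 0.262, 0.265, 0.259, 0.274,
           0.267, 0.272, 0.256, 0.263, 0.265, 0.256, 0.256, 0.261, 0.258, 0.259,
           0.253, 0.257, 0.252, 0.261, 0.270, 0.272, 0.270, 0.259, 0.270, 0.267]
era = [4.74, 4.52, 3.69, 4.56, 3.68, 3.61, 3.61, 4.06, 3.71, 5.39,
       4.96, 4.51, 4.49, 5.49, 3.98, 4.84, 3.51, 5.15, 3.76, 4.42,
       4.38, 4.13, 3.87, 4.33, 3.49, 4.16, 4.21, 3.97, 4.19, 5.13]
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67,
        79, 71, 69, 56, 90, 77, 89, 73, 83, 67,
        71, 82, 81, 75, 100, 83, 88, 81, 79, 67]


def fit_ols(y, *predictors):
    """Ordinary least squares with an intercept; returns (coefficients, R^2).

    fit_ols is a hypothetical helper name used only for this sketch.
    """
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then one column per predictor.
    X = np.column_stack([np.ones(len(y))] +
                        [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    centered = y - y.mean()
    r2 = 1.0 - (residuals @ residuals) / (centered @ centered)
    return coef, r2


coef, r2 = fit_ols(wins, salary_m, batting, era)
print("intercept, Salary_M, Batting, ERA:", np.round(coef, 3))
print("R^2:", round(float(r2), 4))
```

The same helper could be pointed at Attendance instead of Wins by swapping the first argument; a spreadsheet regression tool or a statistics package would additionally report the t and F tests that the project directions ask for.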



Week 2

As preparation for the final research paper, formulate a theory about the correlation between measurable independent variables (causes) and one measurable dependent variable (the effect). Be sure to have at least two independent variables for proposed research paper. The topic proposal should include the following four items which serve as the foundation for the final research paper after instructor feedback is given.


1) Purpose Statement

In one paragraph, state the correlation and identify the primary independent variables.

State the correlation as in the following:

“The dependent variable _______ is determined by independent variables ________, _________, ________, and ________.”

Identify and defend the “primary” independent variable, or the variable believed to have the strongest impact on the dependent variable:

“The most important independent variable in this relationship is ________ because _________.”


2) Definition of Variables

For each variable, write a single definition paragraph talking about the variable. Paragraphs should be in this order: dependent variable, primary independent variable, and three independent variables.

In addition to defining the independent variables, defend why each determines the dependent variable.

For the primary independent variable, at least two research sources that discuss the variable also must be cited. These sources need not be technical documents but should contain evidence to justify the relationship between the primary independent variable and the dependent variable. List these sources in the Works Cited (reference) page.

**Note: Citations from encyclopedias, Wikipedia, blogs, abstracts, or non-governmental websites are not acceptable research sources.


3) Data Description

For each of the variables, at least 30 observations of cross-sectional or time-series data must be obtained. Thus for the final research paper, a data matrix that is at least 30 rows by numbers of variables must be presented.

In one paragraph, identify the data sources and describe the data (i.e., which government agencies supply the data, which methods are used to compile them, when they were collected, etc.).


4) Works Cited Page

The final page of the proposal should be a Works Cited page listing the two research sources for the primary independent variable and the data sources, with a separate citation for each table of data, including specific table numbers for each of the sources.


Upload this Word File in dropbox labeled "Project Topic and Feasibility Paper".
Your instructor will provide you with feedback within seven days of the last day of week two.





Week 5

In week two you submitted the following --

As preparation for the final research paper, formulate a theory about the correlation between measurable independent variables (causes) and one measurable dependent variable (the effect). Be sure to have at least two independent variables for proposed research paper. The topic proposal should include the following four items which serve as the foundation for the final research paper after instructor feedback is given.

For Week five -- [50 points]

Create a draft of your report including the data. This copy of the data may be in an Excel spreadsheet.

________________________________________

**Below is a copy of the requirements for your proposal from week 2. Points are assigned for week 5 work as listed on each item.

1) Purpose Statement [10pts]

In one paragraph, state the correlation and identify the primary independent variables.

State the correlation as in the following:

"The dependent variable _______ is determined by independent variables ________, _________, ________, and ________."

Identify and defend the "primary" independent variable, or the variable believed to have the strongest impact on the dependent variable:

"The most important independent variable in this relationship is ________ because _________."

________________________________________

2) Definition of Variables [10pts]

For each variable, write a single definition paragraph talking about the variable. Paragraphs should be in this order: dependent variable, primary independent variable, and three independent variables.

In addition to defining the independent variables, defend why each determines the dependent variable.

For the primary independent variable, at least two research sources that discuss the variable also must be cited. These sources need not be technical documents but should contain evidence to justify the relationship between the primary independent variable and the dependent variable. List these sources in the Works Cited (reference) page.

**Note: Citations from encyclopedias, Wikipedia, blogs, abstracts, or non-governmental websites are not acceptable research sources.

________________________________________

3) Data Description [10pts]

For each of the variables, at least 30 observations of cross-sectional data must be obtained. Thus for the final research paper, a data matrix that is at least 30 rows by numbers of variables must be presented.

In one paragraph, identify the data sources and describe the data (i.e., which government agencies supply the data, which methods are used to compile them, when they were collected, etc.).

________________________________________

4) Works Cited Page [10pts]

The final page of the proposal should be a Works Cited page listing the two research sources for the primary independent variable and the data sources, with a separate citation for each table of data, including specific table numbers for each of the sources.

5) Data [10pts]

________________________________________

Upload this Word File in dropbox labeled "Term Project Proposal"




Final Research Paper:

Assessment Rubric for this paper is available under Course Home, Assessment Rubric link.

Week 8

________________________________________

Purpose Statement and Model

1) In the introductory paragraph, state why the dependent variable has been chosen for analysis. Then make a general statement about the model:

"The dependent variable _______ is determined by variables ________, ________, ________, and ________."

2) In the second paragraph, identify the primary independent variable and defend why it is important.

"The most important variable in this analysis is ________ because _________." In this paragraph, cite and discuss the two research sources that support the thesis, i.e., the model.

3) Write the general form of the regression model (less intercept and coefficients), with the variables named appropriately so reader can identify each variable at a glance:

Dep_Var = Ind_Var_1 + Ind_Var_2 + Ind_Var_3

For instance, a typical model would be written:

Price_of_Home = Square_Footage + Number_Bedrooms + Lot_Size

Where

Price_of_Home: brief definition of dependent variable

Square_Footage: brief definition of first independent variable

Number_Bedrooms: brief definition of second independent variable

Lot_Size: brief definition of third independent variable

[Note: student of course replaces these variable names with his/her own variable names.]



Definition of Variables

4) Define and defend all variables, including the dependent variable, in a single paragraph for each variable. Also, state the expectations for each independent variable. These paragraphs should be in numerical order, i.e., dependent variable, X1, then X2, etc.

In each paragraph, the following should be addressed:

  • How is the variable defined in the data source?

  • Which unit of measurement is used?

  • For the independent variables: why does the variable determine Y?

  • What sign is expected for the independent variable's coefficient, positive or negative? Why?



Data Description


5) In one paragraph, describe the data and identify the data sources.

  • From which general sources and from which specific tables are the data taken? (Citing a website is not acceptable.)

  • Which year or years were the data collected?

  • Are there any data limitations?



Presentation and Interpretation of Results


6) Write the regression (prediction) equation:

Dep_Var = Intercept + c1 * Ind_Var_1 + c2 * Ind_Var_2 + c3 * Ind_Var_3



7) Identify and interpret the adjusted R² (one paragraph):

  • Define "adjusted R²."

  • What does the value of the adjusted R² reveal about the model?

  • If the adjusted R² is low, how has the choice of independent variables created this result?
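
As a reference point for item 7, adjusted R-squared penalizes R-squared for the number of independent variables. A minimal sketch of the standard formula follows; the function name and the example numbers (R-squared of 0.60, 30 observations, 3 independent variables) are hypothetical and chosen only for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations.

    Formula: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R^2 = 0.60, n = 30 observations, k = 3 independent variables.
example = adjusted_r2(0.60, 30, 3)
print(round(example, 4))
```

Because the penalty grows with k, adding an independent variable that explains little can lower adjusted R-squared even though plain R-squared never decreases.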

8) Identify and interpret the F test (one paragraph):

  • Using the p-value approach, is the null hypothesis for the F test rejected or not rejected? Why or why not?

  • Interpret the implications of these findings for the model.
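
For item 8, the overall F statistic can be recovered from R-squared, the number of observations n, and the number of independent variables k. The sketch below uses hypothetical numbers and a hypothetical function name; regression software normally reports both F and its p-value ("Significance F" in a spreadsheet output) directly.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k independent variables.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with (k, n - k - 1) degrees of freedom.
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical example: R^2 = 0.60, n = 30, k = 3.
F = f_statistic(0.60, 30, 3)
print(round(F, 2))

# With SciPy installed, the p-value would be scipy.stats.f.sf(F, k, n - k - 1).
# Decision rule for the p-value approach: reject the null hypothesis that all
# slope coefficients are zero when the p-value is less than or equal to alpha.
```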


9) Identify and interpret the t tests for each of the coefficients (one separate paragraph for each variable, in numerical order):

  • Are the signs of the coefficients as expected? If not, why not?

  • For each of the coefficients, interpret the numerical value.

  • Using the p-value approach, is the null hypothesis for the t test rejected or not rejected for each coefficient? Why or why not?

  • Interpret the implications of these findings for the variable.

  • Identify the variable with the greatest significance.

10) Analyze multicollinearity of the independent variables (one paragraph):

  • Generate the correlation matrix.

  • Define multicollinearity.

  • Are any of the independent variables highly correlated with each other? If so, identify the variables and explain why they are correlated.

  • State the implications of multicollinearity (if found) for the model.
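
For item 10, a correlation matrix of the independent variables can be generated in a spreadsheet or, as sketched here, with NumPy. The three columns are copied from the 2005 baseball data set in team order (home runs, errors, stolen bases); choosing these three as predictors is an assumption made only for illustration.

```python
import numpy as np

# Columns copied from the 2005 data set, in team order (Boston ... Colorado).
hr = [199, 229, 155, 189, 147, 207, 200, 136, 134, 157,
      260, 168, 130, 126, 184, 191, 161, 222, 175, 139,
      149, 130, 117, 128, 170, 128, 167, 175, 194, 150]
errors = [109, 95, 88, 107, 87, 106, 94, 95, 102, 124,
          108, 110, 86, 125, 86, 94, 89, 104, 106, 117,
          106, 109, 92, 90, 100, 103, 90, 119, 101, 118]
stolen_bases = [45, 84, 31, 83, 161, 62, 137, 72, 102, 151,
                67, 66, 102, 53, 92, 67, 115, 72, 153, 73,
                58, 99, 45, 71, 83, 96, 116, 79, 65, 65]

names = ["HR", "Errors", "Stolen_Bases"]
# np.corrcoef treats each row as one variable and returns the pairwise
# Pearson correlations as a symmetric matrix with ones on the diagonal.
corr = np.corrcoef([hr, errors, stolen_bases])

for name, row in zip(names, corr):
    print(f"{name:>12}", np.round(row, 3))
```

Off-diagonal entries close to +1 or -1 point to pairs of independent variables that move together; what counts as "highly correlated" is a judgment call that the paper should state and defend.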

11) Other (not required):

  • If any additional techniques for improving results are employed, discuss these at the end of the paper.


Works Cited Page

12) Use the proper format to list the works cited under two headings:

Research: two sources

Data: a separate citation for each of the variables used in the paper.




________________________________________

Upload this Word File in dropbox labeled "Term Project Report"


*******************************************************




Example_TopicProposal_Week2


Women in the Workforce: A Wonderful Addition or a Woeful Mistake?

EC 315

Former Student Example

Fall I 2007

Women in the Workforce: A Wonderful Addition or a Woeful Mistake?

Background

Many human resources professionals, scholars, feminists, and economists tout the addition of women to the U.S. workforce. Wendell French (2005) speculates in Human Resources Management that the continuous stream of women entering the workforce will explain a 55% increase in total U.S. labor force expansion between the years of 2002 and 2012 (p. 57). In addition, the percentage of working women continues to increase (French, 2005, p. 57). As women comprise an increasingly larger share of the labor market, their contributions, education, and effect on the economy warrant discussion. The aim of this project is to determine the effects of the entrance of larger proportions of increasingly educated women over the age of 25 with 4 years of college on the productivity of non-farm business in the United States, while holding the rate of population growth for this specific class (females over age 25), average number of hours worked, and average salary constant. This study employs a time-series analysis with observations from 1960 to 2006 included. Demographic data on education was taken from the U.S. Census Bureau and productivity information from the Bureau of Labor Statistics. The model (less constants and coefficients) is:

OUTPUT = %COLLEGE_FEM + AVG_SAL + AVG_HOURS + POP_GROWTH

The result or dependent variable, OUTPUT, is non-farm, seasonally adjusted output per hour. This variable is calculated as the ratio of the output of goods and services to the labor hours required to produce them.

%COLLEGE_FEM, the first independent variable, is the percentage of the female population age 25 and over who have completed 4 years of college and is compiled by the U.S. Census Bureau. This measure is used because of the established relationship between productivity and higher learning. If other independent variables are held constant, increases in education should result in a positive change in productivity (Sweetman, 2002; Saxton, 2000).

AVG_SAL is the real hourly compensation received by employees in non-farm business sectors. It is seasonally adjusted and indexed to 1992. This figure is included in the model because wages have become a progressively significant incentive for workers to remain in, or become more productive in, the workforce. A positive relationship between average wages and productivity should therefore exist (Fazzari, 2007).

AVG_HOURS, the average weekly hours spent on the job, is another possible predictor of the dependent variable, OUTPUT. The Census Bureau collects this data from American workers for the Bureau of Labor Statistics. Many employers adjust the hours that employees work in order to effect changes in productivity (International Labour Office Geneva, 2007). Additional time spent on the job increases output, so a positive relationship should exist between average hours and productivity, all other independent variables held equal (Skoczylas & Tissot 2004).

Finally, POP_GROWTH captures population growth: as the population grows, so do potential, equilibrium, and per capita output, and this should also affect hourly productivity in a positive fashion (Fazzari, 2007).

References

International Labour Office, Geneva. (2007). Working time around the world: Main findings and policy implications. Retrieved August 29, 2007, from International Labour Office Web site: http://www.ilo.org/wcmsp5/groups/public/---dgreports/---dcomm/documents/publication/wcms_082838.pdf

Fazzari. (2007, April 17). Retrieved September 2, 2007, from Washington University in St. Louis Web site: artsci.wustl.edu/~ec104sf/Lec%20Notes%20104-8.doc

French, W. L. (2005). Human Resources Management. New York: Houghton Mifflin Company.

Saxton, J. (2000, January). Joint Economic Committee Study. Retrieved September 1, 2007, from The United States House of Representatives Web site: http://www.house.gov/jec/educ.htm

Skoczylas, L., & Tissot, B. (2005). Revisiting Recent Productivity Developments Across OECD Countries. Bank for International Settlements. Retrieved September 2, 2007, from http://www.ifcommittee.org/tissot.pdf

Sweetman, A. (2002, November 27). Working smarter: Education and productivity. The Review of Economic Performance and Social Progress, Retrieved September 1, 2007, from http://www.irpp.org/miscpubs/archive/repsp1202/sweetman.pdf





regression (example)


Baseball League - 1999

Purpose Statement:

The intent of this project is to measure how the winning percentage of baseball teams in 1999 relates to pitching saves, the variable expected to show the highest correlation, and to other variables: total payroll, runs batted in, batting average, home runs, runs, and earned run average. To be precise, the winning percentage of the professional teams will be the dependent variable, and the remaining variables will be the independent variables.

Abstract:

In this paper, I am hypothesizing that the winning percentage is directly related to the total runs batted in (RBI), total payroll, batting average, home runs (HR), runs (R), earned run average (ERA) and pitching saves. I will demonstrate the relationship of the aforementioned independent variables and the dependent variable.

The regression model is as follows:

winning =

Definition of Variables:

The dependent variable, winning percentage (Winning), is determined by the independent variables total runs batted in (RBI), total payroll, batting average, home runs (HR), runs (R), earned run average (ERA), and pitching saves.

The primary independent variable, pitching saves, is defined as the number of saves, or the percentage of save opportunities successfully converted. This variable is the most significant independent variable, followed by earned run average.

The independent variable runs batted in (RBI) is defined as the total runs batted in by each baseball team in 1999. This variable is not statistically significant.

The independent variable earned run average (ERA) is defined as the mean number of earned runs given up by a pitcher per nine innings pitched. It is determined by dividing the number of earned runs allowed by the number of innings pitched and multiplying by nine. This variable is highly significant.

The independent variable runs (R) is defined as the total runs scored by each team during the season. This variable is selected to see the impact of the overall runs scored by the different baseball teams.

The independent variable payroll is defined as the total salary paid to the players of each baseball team. This variable shows which teams carried the highest payrolls and, on the assumption that higher pay attracts higher performers, helps account for the performance of each team.

The independent variable home runs (HR) counts hits on which the batter is able to reach home safely in one play without any errors being committed by the defensive team in the process. Home runs are among the most popular aspects of baseball and, as a result, prolific home run hitters are usually the most popular among fans and consequently the highest paid by teams.

The independent variable batting average is a measure of a batter's performance obtained by dividing the total number of base hits by the number of times at bat, not including walks. This variable indicates how successfully a team's batters hit.
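
The two rate statistics defined above, earned run average and batting average, reduce to one-line formulas. The sketch below simply restates those definitions; the function names and the example figures are hypothetical.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings: 9 * ER / IP."""
    return 9.0 * earned_runs / innings_pitched


def batting_average(hits, at_bats):
    """Base hits divided by official at-bats (walks are not counted as at-bats)."""
    return hits / at_bats


# Hypothetical figures, for illustration only.
print(earned_run_average(70, 180))
print(batting_average(150, 500))
```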

Relationship of Variables:

The relationship between winning percentage and each of the independent variables is positive except for earned run average. Of all the independent variables, only payroll, pitching saves, and earned run average are significant, and payroll is only marginally so.

To be precise, we will show the summary statistics of all the variables in the table below:

Statistic             Winning %   Payroll  Batting Avg   HR     R  RBI  ERA  Saves
Mean                          0  48794290            0  184   823  782    5     41
Median                        0  46422154            0  188   838  794    5     40
Standard Deviation            0  22406268            0   32    79   76    1      7
Kurtosis                     -1        -1           -1    0     0    0    0     -1
Skewness                      0         0            0    0     0    0    0      0
Range                         0  77340955            0  139   323  317    2     26
Minimum                       0  14650000            0  105   686  643    4     29
Maximum                       1  91990955            0  244  1009  960    6     55
Count                        30        30           30   30    30   30   30     30

We will also examine the regression output to determine which variables are the most significant and how well the model fits.

About 95.2% of the variation in winning percentage is accounted for by the independent variables taken together. This figure is the R-square value, or coefficient of determination, and it indicates that the overall model has a good fit.

Works Cited

· File 1999 Baseball Data.xls


SampleFinalPaperTermProject


Women in the Workforce: A Wonderful Addition or a Woeful Mistake?

EC 315

Sample Report

Term Project Report

Fall I 2007

TABLE OF CONTENTS

BACKGROUND

DISCUSSION OF RESULTS

SUMMARY

REFERENCES

APPENDIX

Women in the Workforce: A Wonderful Addition or a Woeful Mistake?

Background

Many human resources professionals, scholars, feminists, and economists tout the addition of women to the U.S. workforce. Wendell French (2005) speculates in Human Resources Management that the continuous stream of women entering the workforce will explain a 55% increase in total U.S. labor force expansion between the years of 2002 and 2012 (p. 57). In addition, the percentage of working women continues to increase (French, 2005, p. 57). As women comprise an increasingly larger share of the labor market, their contributions, education, and effect on the economy warrant discussion. The aim of this project is to determine the effects of the entrance of larger proportions of increasingly educated women over the age of 25 with 4 years of college on the productivity of non-farm business in the United States, while holding the rate of population growth for this specific class (females over age 25), average number of hours worked, and average salary constant. This study employs a time-series analysis with observations from 1966 to 2006 included. Demographic data on education was taken from the U.S. Census Bureau and productivity information from the Bureau of Labor Statistics. The model (less constants and coefficients) is:

OUTPUT = %COLLEGE_FEM + AVG_SAL + AVG_HOURS + POP_GROWTH

The result or dependent variable, OUTPUT, is non-farm, seasonally adjusted output per hour. This variable is calculated as the ratio of the output of goods and services to the labor hours required to produce them.

%COLLEGE_FEM, the first independent variable, is the percentage of the female population age 25 and over who have completed 4 years of college and is compiled by the U.S. Census Bureau. This measure is used because of the established relationship between productivity and higher learning. If other independent variables are held constant, increases in education should result in a positive change in productivity (Sweetman, 2002; Saxton, 2000).

AVG_SAL is the real hourly compensation received by employees in non-farm business sectors. It is seasonally adjusted and indexed to 1992. This figure is included in the model because wages have become a progressively significant incentive for workers to remain in, or become more productive in, the workforce. A positive relationship between average wages and productivity should therefore exist (Fazzari, 2007).

AVG_HOURS, the average weekly hours spent on the job, is another possible predictor of the dependent variable, OUTPUT. The Census Bureau collects this data from American workers for the Bureau of Labor Statistics. Many employers adjust the hours that employees work in order to effect changes in productivity (International Labour Office Geneva, 2007). Additional time spent on the job increases output, so a positive relationship should exist between average hours and productivity, all other independent variables held equal (Skoczylas & Tissot 2004).

Finally, POP_GROWTH captures population growth: as the population grows, so do potential, equilibrium, and per capita output, and this should also affect hourly productivity in a positive fashion (Fazzari, 2007).

Discussion of Results

The model was regressed and yielded the following results:

Regression equation is: OUTPUT = -100 + 1.50 %COLLEGE_FEM + 1.27 AVG_SAL + 0.497 AVG_HOURS + 4.46 POP_GROWTH

DEPENDENT VARIABLE: OUTPUT     ADJUSTED R² = .9914     n = 41

Independent Variable   Coefficient    Student t      Significance of t
%COLLEGE_FEM           1.500940684    5.820633573    1.20689E-06
AVG_SAL                1.268280718    12.52026283    1.1126E-14
AVG_HOURS              0.49737168     3.052832861    0.004
POP_GROWTH             4.461596032    1.187963082    0.24262667

Durbin Watson = .802773

The focus of this analysis is on the impact of %COLLEGE_FEM on OUTPUT. As evident above, the Durbin-Watson statistic of .802773 fell into the rejection region, indicating positive autocorrelation. Autocorrelation occurs when the error terms follow a pattern from one observation to the next, often because a relevant variable is missing from the analysis. In this regression, the coefficients of the independent variables are therefore biased to an unknown extent and are not reliable or reportable in a scholarly paper, publication, or report.
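
For reference, the Durbin-Watson statistic is computed from the regression residuals as the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below uses made-up residuals purely to show the calculation; it does not reproduce the paper's residuals, which are not listed.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = [float(x) for x in residuals]
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(x * x for x in e)
    return numerator / denominator


# Made-up residuals for illustration only.
slowly_drifting = [1.0, 2.0, 3.0, 4.0]   # neighbors are similar: statistic near 0
alternating = [1.0, -1.0, 1.0, -1.0]     # neighbors flip sign: statistic well above 2

print(durbin_watson(slowly_drifting))
print(durbin_watson(alternating))
```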

If the Durbin-Watson test for this regression had passed, the independent variables %COLLEGE_FEM and AVG_SAL would have been significant at alpha = .05, .025, .01, .005, and .001. The variable AVG_HOURS would have shown significance at alpha = .05, .025, .01, and .005. The independent variable POP_GROWTH, however, is not significant: its p-value of .2426 exceeds alpha at the .05 level and even at the .10 level, and a variable is significant only when its significance value is less than or equal to alpha.
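
The p-value comparisons in this paragraph can be checked mechanically. The sketch below takes the "Significance of t" values reported in the results table and lists, for each variable, the alpha levels at which the null hypothesis would be rejected; the variable and dictionary names are chosen here for illustration.

```python
# "Significance of t" values as reported in the regression results.
p_values = {
    "%COLLEGE_FEM": 1.20689e-06,
    "AVG_SAL": 1.1126e-14,
    "AVG_HOURS": 0.004,
    "POP_GROWTH": 0.24262667,
}
alphas = [0.05, 0.025, 0.01, 0.005, 0.001]

# Decision rule: reject the null hypothesis (coefficient = 0) when p <= alpha.
significant_at = {
    name: [alpha for alpha in alphas if p <= alpha]
    for name, p in p_values.items()
}

for name, levels in significant_at.items():
    print(name, levels if levels else "not significant at any listed level")
```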

The R² value of 99.2% suggests that the independent variables account for 99.2% of the variation in the outcome; values this high are common in time-series regressions, in which one observation builds upon another. R² cannot be relied upon here, since the Durbin-Watson statistic indicates positive autocorrelation. It is desirable for R² to be 50% or greater. The adjusted R² value of 99.1% would also be considered "good" had the Durbin-Watson test passed; it indicates that the independent variables explain 99.1% of the variance of the dependent variable after adjusting for the number of predictors. R² always lies between 0 and 1; adjusted R² cannot exceed 1 but can fall slightly below 0 when a model fits very poorly. Often, people rely heavily on R² and adjusted R² and disregard the Durbin-Watson test. Since this analysis is a time-series regression, the Durbin-Watson test is more valuable to the analyst than R² and adjusted R². Without a passing Durbin-Watson test for a time-series analysis, both of the preceding are useless, as is the case with this regression.

Another important consideration when regressing an equation is the presence of multicollinearity. In this model, none was found. Each pair of independent variables was regressed, and the resulting R² from each bivariate regression was compared to the R² of the entire model. The results are detailed below:


Regression Statistics (bivariate regressions)

Pair                          Multiple R     R Square
%COLLEGE_FEM & AVG_SAL        0.960929531    0.923385564
%COLLEGE_FEM & AVG_HOURS      0.82176622     0.67529972
%COLLEGE_FEM & POP_GROWTH     0.245677796    0.060357579
AVG_SAL & AVG_HOURS           0.865413869    0.748941165
AVG_SAL & POP_GROWTH          0.21991564     0.048362889
AVG_HOURS & POP_GROWTH        0.348112319    0.121182187


As is evident above, the bivariate regressions yield R² values less than the R² of the entire regression. Multicollinearity occurs when two or more of the predictors have a linear relationship. When this occurs, the statistical software package cannot tell which variable to attribute the effect to. Often, one coefficient will be near zero while the coefficient of the other collinear variable carries all of the effect on the outcome, making the individual coefficients unreliable. The absence of multicollinearity indicates that the coefficients are not distorted by it.
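
The comparison described here, each bivariate R-squared against the R-squared of the full model, can be written out as a short check. The values are the ones reported in this paper (pairwise R Square figures and the overall R-squared of 99.2%); treating "pairwise R-squared at or above the model R-squared" as the flag for multicollinearity follows this paper's approach and is a rule of thumb rather than a formal test.

```python
model_r2 = 0.992  # R-squared of the full regression, as reported in the paper

# R Square values from the bivariate regressions reported above.
pair_r2 = {
    ("%COLLEGE_FEM", "AVG_SAL"): 0.923385564,
    ("%COLLEGE_FEM", "AVG_HOURS"): 0.67529972,
    ("%COLLEGE_FEM", "POP_GROWTH"): 0.060357579,
    ("AVG_SAL", "AVG_HOURS"): 0.748941165,
    ("AVG_SAL", "POP_GROWTH"): 0.048362889,
    ("AVG_HOURS", "POP_GROWTH"): 0.121182187,
}

# Flag any pair whose bivariate R-squared reaches the full model's R-squared.
flagged = [pair for pair, r2 in pair_r2.items() if r2 >= model_r2]

print("flagged pairs:", flagged)
print("largest pairwise R-squared:", max(pair_r2.values()))
```

Under this rule no pair is flagged, although the %COLLEGE_FEM and AVG_SAL pair comes closest, with a bivariate R-squared above 0.92.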

Another useful way to analyze the effect of independent variables on the outcomes is through coefficients. The coefficients for the model are detailed below:

Independent Variable   Coefficient
%COLLEGE_FEM           1.500940684
AVG_SAL                1.268280718
AVG_HOURS              0.49737168
POP_GROWTH             4.461596032

Durbin Watson = .592049

The 1.5009 coefficient for %COLLEGE_FEM indicates that, holding the other variables constant, for every one-percentage-point increase in the share of females who have completed four years of college, output increases by 1.5009. The coefficient for the predictor AVG_SAL is 1.2683, indicating that for every one-unit increase in salary, output increases by 1.2683. The .4974 coefficient for AVG_HOURS suggests that for every one-hour increase, output increases by .4974. POP_GROWTH has a coefficient of 4.4616, indicating that for every 1,000 people added to the population, output increases by 4.4616. While these coefficients indicate a fairly significant relationship, there are two things that must first be considered:

  • The Durbin-Watson test failed, which means that the coefficients of the predictors are biased to an unknown extent.
  • In the case of POP_GROWTH, the p value method of hypothesis test indicates that the variable is insignificant.

Had the Durbin-Watson test passed and the significance value of POP_GROWTH been less than or equal to alpha, the coefficients could be interpreted as detailed above. Judged by the size of the coefficients, population growth has the largest effect on the outcome, followed by the percentage of women with 4 years of college, average salary, and finally average hours.

Summary

Unfortunately, the failure of the Durbin-Watson test makes the model biased to an unknown extent. Unless the missing variable can be found, the results are essentially useless. If the Durbin-Watson had passed, all variables except POP_GROWTH would be significant predictors of the outcome. This would indicate that as the number of women who have 4 years of college increases, so does output per hour. In addition, the predictions made in the Background section would prove to be true.

References

International Labour Office; Geneva, (2007). Working time around the world: Main findings and policy implications. Retrieved August 29, 2007, from International Labour Office Web site: http://www.ilo.org/wcmsp5/groups/public/---dgreports/--- dcomm/documents/publication/wcms_082838.pdf

Fazzari, (2007, April 17). Retrieved September 2, 2007, from Washington State University, St. Louis Web site: artsci.wustl.edu/~ec104sf/Lec%20Notes%20104-8.doc

French, W.L. (2005). Human Resources Management. New York: Houghton Mifflin Company.

Saxton, J. (2000, January). Joint Economic Committee Study. Retrieved September 1, 2007, from The United States House of Representatives Web site: http://www.house.gov/jec/educ.htm

Skoczylas, L., & Tissot, B. (2005). Revisiting Recent Productivity Developments Across OECD Countries. Bank for International Settlements. Retrieved September 2, 2007, from http://www.ifcommittee.org/tissot.pdf

Sweetman, A. (2002, November 27). Working smarter: Education and productivity. The Review of Economic Performance and Social Progress, Retrieved September 1, 2007, from http://www.irpp.org/miscpubs/archive/repsp1202/sweetman.pdf

Appendix

Year  OUPUT    %COLLEGE_FEM  AVG_SAL  AVG_HOURS  POP_GROWTH
1966  63.585   5.3           73.197   113.286    1.16
1967  64.687   5.6           75.15    111.415    1.09
1968  66.893   6.6           77.778   110.788    1
1969  66.993   5.8           78.773   110.013    0.98
1970  67.988   5.8           79.842   108.212    1.17
1971  70.703   5.9           81.365   107.678    1.26
1972  73.061   6.3           83.953   107.885    1.07
1973  75.336   6.6           85.479   107.506    0.95
1974  74.208   6.8           84.493   105.93     0.91
1975  76.221   7.1           85.213   104.435    0.99
1976  78.728   7.4           87.37    104.562    0.95
1977  79.984   8             88.715   103.956    1.01
1978  81.022   7.9           90.25    103.618    1.06
1979  80.739   8.1           90.249   102.934    1.1
1980  80.579   8.9           89.999   101.779    0.96
1981  81.691   8.6           90.211   101.381    0.98
1982  80.831   8.62          91.107   100.764    0.95
1983  84.458   9.2665        91.109   101.696    0.91
1984  86.121   9.7429        91.121   102.403    0.87
1985  87.456   10.0016       92.113   102.189    0.89
1986  90.151   10.0983       95.175   101.198    0.92
1987  90.608   12.7948       95.523   101.429    0.89
1988  92.106   10.6          96.675   101.118    0.91
1989  92.783   11.1578       95.064   101.612    0.94
1990  94.515   11.48         96.047   100.465    1.07
1991  96.06    11.86         97.414   99.679     1.08
1992  100      13.1          100      100        1.14
1993  100.411  13.4596       99.478   100.556    1.08
1994  101.524  13.67         99.061   100.927    0.99
1995  102.009  14.0052       98.773   100.832    0.95
1996  104.715  15.1403       99.45    100.257    0.92
1997  106.415  15.3831       100.352  100.955    0.96
1998  109.354  16.95         104.89   100.867    0.92
1999  112.508  16.1          107.542  101.268    0.9
2000  115.689  16.3          111.56   100.514    1.01
2001  118.583  16.8          112.848  99.17      1.0102
2002  123.473  17.1915       115.098  98.947     1.0102
2003  128.034  17.458        117.076  98.505     1.00927
2004  131.542  17.606        118.166  98.51      1.00977
2005  134.097  17.8933       118.94   98.322     1.0098
2006  135.393  18.1343       119.664  98.555     1.00975

Minitab Regression Calculation

Regression Analysis: OUPUT versus %COLLEGE_FEM, AVG_SAL, ...

The regression equation is

OUPUT = - 100 + 1.50 %COLLEGE_FEM + 1.27 AVG_SAL + 0.497 AVG_HOURS
        + 4.46 POP_GROWTH

Predictor        Coef     SE Coef  T      P
Constant         -100.40  21.17    -4.74  0.000
%COLLEGE_FEM     1.5009   0.2579   5.82   0.000
AVG_SAL          1.2683   0.1013   12.52  0.000
AVG_HOURS        0.4974   0.1629   3.05   0.004
POP_GROWTH       4.462    3.756    1.19   0.243

S = 1.87253   R-Sq = 99.2%   R-Sq(adj) = 99.1%

Analysis of Variance

Source          DF  SS       MS      F        P
Regression      4   16156.6  4039.2  1151.94  0.000
Residual Error  36  126.2    3.5
Total           40  16282.8

Source          DF  Seq SS
%COLLEGE_FEM    1   15517.5
AVG_SAL         1   587.1
AVG_HOURS       1   47.1
POP_GROWTH      1   4.9
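
The summary statistics in the regression output follow from the sums of squares in the Analysis of Variance table. The sketch below recomputes them from the printed (rounded) values, so small differences from Minitab's figures are expected; the variable names are illustrative only.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression, df_regression = 16156.6, 4
ss_error, df_error = 126.2, 36
ss_total, df_total = 16282.8, 40

ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                  # mean square error

f_statistic = ms_regression / ms_error          # overall F test
s = math.sqrt(ms_error)                         # standard error of the regression
r_sq = ss_regression / ss_total                 # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total) # R-Sq(adj)

print(f"F = {f_statistic:.2f}, S = {s:.4f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```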

Unusual Observations

Obs  %COLLEGE_FEM  OUPUT    Fit      SE Fit  Residual  St Resid
22   12.8          90.608   94.368   0.617   -3.760    -2.13R
35   16.3          115.689  120.049  0.587   -4.360    -2.45R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.802773

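dL = 1.29, which is greater than 0.802773. Because the Durbin-Watson statistic falls below the lower bound dL, the test fails, signaling positive autocorrelation in the residuals.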
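
For reference, a minimal sketch of how the Durbin-Watson statistic is computed and compared with its critical bounds. The function names are hypothetical, the residual lists are toy values rather than the model's residuals, and the upper bound dU = 1.72 is an assumed, approximate table value that does not appear in the output above.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic against the bounds dL and dU."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw <= d_upper:
        return "inconclusive"
    if dw < 4 - d_upper:
        return "no evidence of autocorrelation"
    if dw <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Toy residuals: identical values give 0, alternating signs push DW toward 4.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))

# Statistic and dL from the output above; dU = 1.72 is an assumed table value.
decision = dw_decision(0.802773, d_lower=1.29, d_upper=1.72)
print(decision)
```

Because 0.802773 is below dL, the decision does not depend on the assumed dU.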

Test for Multicollinearity

Regression statistics for each pair of predictors

Predictor pair              Multiple R  R Square    Adjusted R Square  Standard Error  Observations
%COLLEGE_FEM & AVG_SAL      0.96092953  0.92338556  0.92142109         1.18075699      41
%COLLEGE_FEM & AVG_HOURS    0.82176622  0.67529972  0.66697407         2.43078502      41
%COLLEGE_FEM & POP_GROWTH   0.2456778   0.06035758  0.03626418         4.13510486      41
AVG_SAL & AVG_HOURS         0.86541387  0.74894117  0.74250376         6.26476959      41
AVG_SAL & POP_GROWTH        0.21991564  0.04836289  0.02396194         12.1970003      41
AVG_HOURS & POP_GROWTH      0.34811232  0.12118219  0.0986484          3.66430833      41
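
One common way to read pairwise statistics like these is to convert each R Square into an inflation factor, 1 / (1 - R²), and flag large values. The sketch below does that with the R Square values reported above. The threshold of 10 is a widely used rule of thumb assumed here, not a cutoff taken from the paper, and a pairwise factor is only a rough stand-in for a full variance inflation factor, which regresses each predictor on all of the others.

```python
# R Square for each pair of predictors, as reported in the Excel summary output.
PAIRWISE_R_SQUARE = {
    ("%COLLEGE_FEM", "AVG_SAL"): 0.92338556,
    ("%COLLEGE_FEM", "AVG_HOURS"): 0.67529972,
    ("%COLLEGE_FEM", "POP_GROWTH"): 0.06035758,
    ("AVG_SAL", "AVG_HOURS"): 0.74894117,
    ("AVG_SAL", "POP_GROWTH"): 0.04836289,
    ("AVG_HOURS", "POP_GROWTH"): 0.12118219,
}

THRESHOLD = 10.0  # assumed rule-of-thumb cutoff, not taken from the paper


def inflation_factor(r_square):
    """Return 1 / (1 - R^2) for a pairwise regression."""
    return 1.0 / (1.0 - r_square)


factors = {pair: inflation_factor(r2) for pair, r2 in PAIRWISE_R_SQUARE.items()}
flagged = [pair for pair, factor in factors.items() if factor > THRESHOLD]

for pair, factor in factors.items():
    print(f"{pair[0]} & {pair[1]}: {factor:.2f}")
print("Flagged pairs:", flagged)
```

Under that assumed cutoff, only the %COLLEGE_FEM and AVG_SAL pair is flagged.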
