Math 104 Final Project 2014

• Project – Advertising
• Project – Restaurant ratings
2. The choice of final project is yours.
3. Perform Exploratory Data Analysis (EDA) on your data.
4. Write a project report on your data set (no more than 10 pages, in hard copy). You may attach as many pages of printout as you wish to substantiate your report. Your report should contain an executive summary (abstract), objective, methods, procedure, analyses, and conclusions for the study you choose. You are encouraged to use PHStat and/or Excel for any computation.
5. You should email me the Excel files that you used in the regressions, showing how you transformed the data to obtain the corresponding regression. You may use Word, PowerPoint, or any presentation software.
6. Your report will be graded based on your presentation of work (25%), use of technology (25%), and understanding of statistical methods and procedures and data analysis (50%).
Project
Zagat’s Restaurant Ratings.
Zagat’s publishes restaurant ratings for various locations in the United States. The file RESTRATE.xls contains the Zagat rating for food, décor, service, and the price per person for a sample of 53 restaurants located in New York City and 53 restaurants located in Long Island. Suppose you wanted to develop a regression model to predict the price per person based on a variable that represents the sum of the ratings for food, décor, and service. Using Excel or PHStat2, answer the following questions, which are a minimum guideline for what you should analyze.
a) Construct a scatter diagram of price against summated rating. Describe any relationship you see. Does there appear to be some association (linear or non-linear)?
b) Perform exploratory data analysis, such as numerical measures and/or the box-and-whisker plot for this data set.
c) From (a) and (b), does any simple linear model appear to hold? You may want to run some testing to substantiate why or why not.
d) Does a multiple regression model appear to hold? You may want to run some testing to substantiate why or why not. If so, find the regression equation to predict price from location.
e) Suppose now that you want to develop a regression model to predict the price per person based on a variable that represents the sum of the ratings for food, décor, and service, and on location (New York City (Locate = 0) or Long Island (Locate = 1)).
f) Is the regression significant? Report the results of the appropriate test, and interpret its meaning.
g) Does summated rating have a significant impact on price, after adjusting for location? In particular, are New York City’s restaurants significantly more or less expensive on average than those in Long Island?
h) Include an interaction term in the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model.
i) Summarize and comment on your results.
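For readers who want to cross-check their Excel or PHStat2 output, the model in parts (e) through (h) can be sketched in Python. This is only an illustration under stated assumptions: the data below are synthetic stand-ins (not the values in RESTRATE.xls), and the names fit_ols, rating, locate, and price are hypothetical. The design matrix holds an intercept, the summated rating, the 0/1 location dummy, and their product as the interaction term.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; X must already contain an intercept column.

    Returns the coefficients, their standard errors, and the t statistics.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the coefficients
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic stand-in data (assumption: NOT the RESTRATE.xls values).
rng = np.random.default_rng(0)
n = 106
rating = rng.uniform(40, 75, n)             # summated food + decor + service
locate = np.repeat([0.0, 1.0], n // 2)      # 0 = New York City, 1 = Long Island
price = -10 + 0.9 * rating - 5 * locate + rng.normal(0, 4, n)

# Columns: intercept, rating, location dummy, rating x location interaction.
X = np.column_stack([np.ones(n), rating, locate, rating * locate])
beta, se, t = fit_ols(X, price)

for name, b, s, tv in zip(["intercept", "rating", "locate", "rating*locate"],
                          beta, se, t):
    print(f"{name:>14}: coef={b:8.3f}  se={s:6.3f}  t={tv:7.2f}")
```

The t statistic on the last coefficient is the one to compare against the critical value at the 0.05 level in part (h); each statistic has n - 4 degrees of freedom in this four-column model.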
Project
Use of the best-subsets approach to model building
Consider the file advertising.xls showing data for magazine titles, the cost of a full-color page advertisement (page), audience (subscribers in thousands), male percentage of subscribers, and household income. The objective of this project is to find out if there is any relationship among variables using regression analysis techniques. You are to write a report about your findings after analyzing the data set. The following is a minimum guideline about what you should analyze.
State your statistical objective for this data set.
a) Perform exploratory data analysis, such as numerical measures or the box-and-whisker plot for this data set.
b) Construct scatter diagrams for pairs of variables. Describe the relationship that you may see. Do these appear to have some association (linear or non-linear)?
c) From (a) and (b), does any simple linear model appear to hold? You may want to run some testing to substantiate why or why not.
d) Apply the best-subsets approach to model building to see if there is any variable that shouldn’t be used for this model.
e) Consider the male percentage of subscribers as categorical data: for example, if it is more than 66%, input it as “male magazine,” between 33% and 66% as “gender free,” and less than 33% as “female magazine.” Then introduce dummy variables for these data. Will this give you a meaningful (better) output for this model, given that some households use male names to subscribe to any magazine? Can you introduce any other dummy variables to improve your analysis?
f) Once you determine which variables are to be used, perform a multiple regression analysis, including a check for collinearity, on this subset of variables.
g) Summarize and comment on your results.
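As a cross-check on the Excel or PHStat work, the dummy coding in part (e) and the best-subsets search in part (d) can be sketched in Python. This is a sketch under stated assumptions: the data are synthetic (not the values in advertising.xls), all function and variable names are hypothetical, "gender free" is taken as the baseline category, values of exactly 33% or 66% are treated as "gender free", and subsets are ranked by adjusted R-squared (Mallows' Cp is another common criterion).

```python
from itertools import combinations
import numpy as np

def male_pct_dummies(male_pct):
    """Code male percentage as two 0/1 dummies; 33-66% is the baseline."""
    male_pct = np.asarray(male_pct, dtype=float)
    is_male = (male_pct > 66).astype(float)     # "male magazine"
    is_female = (male_pct < 33).astype(float)   # "female magazine"
    return is_male, is_female

def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def best_subsets(columns, y):
    """Fit every non-empty subset of predictors; best adjusted R2 first."""
    names = list(columns)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            X = np.column_stack([columns[name] for name in subset])
            results.append((subset, adjusted_r2(X, y)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Synthetic stand-in data (assumption: NOT the advertising.xls values).
rng = np.random.default_rng(1)
n = 40
audience = rng.uniform(500, 5000, n)    # subscribers in thousands
male_pct = rng.uniform(5, 95, n)
income = rng.uniform(30, 90, n)
is_male, is_female = male_pct_dummies(male_pct)
page_cost = 20 + 0.03 * audience + 0.5 * income + rng.normal(0, 5, n)

ranked = best_subsets(
    {"audience": audience, "income": income,
     "is_male": is_male, "is_female": is_female},
    page_cost,
)
for subset, score in ranked[:3]:
    print(subset, round(score, 3))
```

Three categories need only two dummy variables; including a third alongside the intercept would make the columns perfectly collinear, which ties in with the collinearity check in part (f).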
