Predicting Housing Median Prices

Question


Question # 00420823 Posted By: kimwood Updated on: 11/09/2016 05:44 AM Due on: 11/09/2016

Subject: Statistics; Topic: General Statistics


7.2 Predicting Housing Median Prices. The file BostonHousing.xls contains information on over 500 census tracts in Boston, where for each tract 14 variables are recorded. The last column (CAT.MEDV) was derived from MEDV such that it takes the value 1 if MEDV > 30 and 0 otherwise. Consider the goal of predicting the median value (MEDV) of a tract, given the information in the first 13 columns.

Partition the data into training (60%) and validation (40%) sets.
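A minimal sketch of a random 60%/40% partition in plain Python, assuming the tracts are held as a list of records. The function name `partition`, the seed, and the 500-row stand-in are hypothetical illustrations, not the procedure used by the software the exercise refers to.

```python
import random

def partition(rows, train_frac=0.6, seed=1):
    """Randomly split rows into a training list and a validation list.

    train_frac=0.6 gives the 60%/40% split the exercise asks for; the seed
    is an arbitrary, hypothetical choice that makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(rows)            # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-in for the tract records: 500 row indices.
records = list(range(500))
train, valid = partition(records)
print(len(train), len(valid))  # 300 200
```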

a. Perform a k-NN prediction with all 13 predictors (ignore the CAT.MEDV column), trying values of k from 1 to 5. Make sure to normalize the data (click "normalize input data"). Which value of k is chosen as best, and what does it mean?
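The steps in part (a) can be sketched in plain Python. Two assumptions are made about the software and are not confirmed by the exercise: "normalize input data" is taken to mean z-score standardization using the training partition's means and standard deviations, and the "best" k is taken to be the one with the lowest validation RMSE. All function names and the toy data are hypothetical.

```python
import math

def zscore_params(X):
    """Per-column mean and sample standard deviation, from training rows only."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    # "or 1.0" guards against division by zero for a constant column.
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1)) or 1.0
           for col, m in zip(columns, means)]
    return means, sds

def normalize(X, means, sds):
    """Apply the training means/SDs to any set of rows (training, validation or new)."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def knn_predict(train_Z, train_y, z, k):
    """Average MEDV of the k training rows nearest to z (Euclidean distance)."""
    nearest = sorted((math.dist(row, z), y) for row, y in zip(train_Z, train_y))[:k]
    return sum(y for _, y in nearest) / k

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def choose_k(train_X, train_y, valid_X, valid_y, ks=range(1, 6)):
    """Validation RMSE for each k; the best k is the one with the lowest RMSE."""
    means, sds = zscore_params(train_X)
    train_Z = normalize(train_X, means, sds)
    valid_Z = normalize(valid_X, means, sds)
    errors = {k: rmse(valid_y, [knn_predict(train_Z, train_y, z, k) for z in valid_Z])
              for k in ks}
    return min(errors, key=errors.get), errors

# Hypothetical toy data with two predictors (the real exercise uses 13).
train_X = [[1, 50], [2, 60], [3, 55], [4, 70], [5, 65], [6, 80]]
train_y = [10, 12, 15, 18, 20, 24]
valid_X = [[2.5, 58], [4.5, 68]]
valid_y = [13, 19]
best_k, errors = choose_k(train_X, train_y, valid_X, valid_y)
print(best_k, errors)
```

In this sketch k is simply the number of nearest training tracts whose MEDV values are averaged for each prediction. Computing the scaling parameters on the training partition only, and reusing them for validation and new records, keeps the validation data from influencing the normalization.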

b. Predict the MEDV for a tract with the following information, using the best k: (Copy this table with the column names to a new worksheet and then in "Score new data" choose "from worksheet.")

CRIM  ZN  INDUS  CHAS  NOX    RM  AGE  DIS  RAD  TAX  PTRATIO  B    LSTAT
0.2   0   7      0     0.538  6   62   4.7  4    307  21       360  10
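Scoring the new tract in part (b) amounts to normalizing its 13 values with the training partition's means and standard deviations and then averaging the MEDV of its k nearest training tracts. A minimal, self-contained sketch follows; z-score scaling and Euclidean distance are assumptions, and `score_new_record` is a hypothetical name.

```python
import math

COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# The new tract from part (b), keyed by column name.
new_tract = {"CRIM": 0.2, "ZN": 0, "INDUS": 7, "CHAS": 0, "NOX": 0.538,
             "RM": 6, "AGE": 62, "DIS": 4.7, "RAD": 4, "TAX": 307,
             "PTRATIO": 21, "B": 360, "LSTAT": 10}

def score_new_record(record, train_X, train_y, k):
    """k-NN prediction of MEDV for one new tract.

    train_X: raw (unnormalized) predictor rows in COLUMNS order; train_y: their MEDV.
    The new record is scaled with the training means and SDs, never its own.
    """
    n = len(train_X)
    columns = list(zip(*train_X))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1)) or 1.0
           for col, m in zip(columns, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    z = scale([record[c] for c in COLUMNS])
    nearest = sorted((math.dist(scale(row), z), y)
                     for row, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / k
```

With the training partition loaded as `train_X` and `train_y` and the best k from part (a), the call would be `score_new_record(new_tract, train_X, train_y, best_k)`.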

c. Why is the error on the training data zero?
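With k = 1, scoring a training tract finds that same tract at distance 0, so the prediction is its own MEDV and the residual is zero (provided no two training tracts share identical predictor values with different MEDV). A small sketch on hypothetical, already-normalized data illustrates this; `knn_predict` and `training_rmse` are illustrative names.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Average target of the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / k

def training_rmse(train_X, train_y, k):
    """RMSE when the training rows themselves are scored."""
    preds = [knn_predict(train_X, train_y, x, k) for x in train_X]
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(train_y, preds)) / len(train_y))

# Hypothetical, already-normalized training rows (all distinct).
train_X = [[0, 0], [1, 0], [0, 1], [3, 3]]
train_y = [5, 7, 9, 20]
print(training_rmse(train_X, train_y, 1))      # 0.0: each row's nearest neighbour is itself
print(training_rmse(train_X, train_y, 2) > 0)  # True
```

For k greater than 1 the tract's own MEDV is averaged with its neighbours' values, so the training error is generally no longer zero.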

d. Why is the error on the validation data overly optimistic compared with the error rate when this k-NN predictor is applied to new data?
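The validation set has already been used to pick k, so the reported validation error of the chosen k is the smallest of five noisy estimates. One common remedy, sketched here under the assumption of a third, untouched test partition (not part of this exercise's 60/40 split), is to choose k on the validation set and report the error on the test set. Function names and data are hypothetical, and the rows are assumed to be normalized already.

```python
import math

def knn_predict(train_Z, train_y, z, k):
    nearest = sorted((math.dist(row, z), y) for row, y in zip(train_Z, train_y))[:k]
    return sum(y for _, y in nearest) / k

def rmse_for_k(train_Z, train_y, Z, y, k):
    preds = [knn_predict(train_Z, train_y, z, k) for z in Z]
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, preds)) / len(y))

def select_then_assess(train, valid, test, ks=range(1, 6)):
    """Choose k on the validation set, then report error on an untouched test set.

    Each argument is a (rows, targets) pair of already-normalized data.
    The validation RMSE of the chosen k is the minimum over several candidates,
    so it tends to understate the error on fresh data; the test RMSE played
    no part in choosing k and is not biased in that way.
    """
    valid_errors = {k: rmse_for_k(*train, *valid, k) for k in ks}
    best_k = min(valid_errors, key=valid_errors.get)
    return best_k, valid_errors[best_k], rmse_for_k(*train, *test, best_k)

# Hypothetical one-predictor data sets.
train = ([[0], [1], [2], [3], [4], [5]], [0, 1, 2, 3, 4, 5])
valid = ([[1.5], [3.5]], [1.4, 3.6])
test = ([[2.5], [4.2]], [2.6, 4.1])
best_k, valid_err, test_err = select_then_assess(train, valid, test)
print(best_k, valid_err, test_err)
```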

e. If the purpose is to predict MEDV for several thousand new tracts, what would be the disadvantage of using k-NN prediction? List the operations that the algorithm goes through in order to produce each prediction.
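A minimal sketch of one k-NN prediction that makes the per-record work explicit, assuming a brute-force scan of the training set with Euclidean distance (the name `predict_one` and all data are hypothetical). The returned count shows that every new tract triggers a distance computation against every training tract.

```python
import heapq
import math

def predict_one(record, means, sds, train_Z, train_y, k):
    """Return (prediction, distances_computed) for one new tract.

    Operations performed for every single new record:
      1. normalize the record with the training means and SDs (p operations);
      2. compute its distance to every training record (n distances, O(n * p));
      3. find the k smallest distances;
      4. average the MEDV values of those k neighbours.
    """
    z = [(v - m) / s for v, m, s in zip(record, means, sds)]             # step 1
    dists = [(math.dist(row, z), y) for row, y in zip(train_Z, train_y)]  # step 2
    nearest = heapq.nsmallest(k, dists)                                   # step 3
    return sum(y for _, y in nearest) / k, len(dists)                     # step 4

# Hypothetical sizes: 4 normalized training rows, 3 new records, k = 2.
train_Z = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
train_y = [10.0, 12.0, 14.0, 30.0]
means, sds = [0.0, 0.0], [1.0, 1.0]
new_records = [[0.1, 0.1], [1.9, 2.1], [0.5, 0.4]]

total = 0
for rec in new_records:
    pred, n_dist = predict_one(rec, means, sds, train_Z, train_y, k=2)
    total += n_dist
print(total)  # 12 = 3 new records x 4 training records
```

In this sketch, scoring several thousand tracts repeats the full scan for each of them, and the entire training set must be kept available at prediction time.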


Solution: Predicting Housing Median Prices

Tutorial # 00416273 Posted By: kimwood Posted on: 11/09/2016 05:44 AM


Attachments

Predicting_Housing_Median_Prices.zip (8.13 KB)
