Statistics Project
This project involves gathering samples and using both descriptive and inferential statistics. It is due by December 17th.
All work must be shown, and the write-up must be typed (mathematical formulations may be handwritten). You may use any software with statistical applications (Excel, StatCrunch, Minitab, etc.), a calculator, or any other statistical tool for your analysis (there are some online tools that can calculate confidence intervals and hypothesis tests). You may also use the textbook and any other written resource (websites, etc.) to aid you in your work. The work on this project must be your own. Seeking assistance from anyone else (tutor, classmate, etc.) is strictly forbidden (you are on your honor). I will be available to answer any questions you may have about this project for clarity purposes.
Project Concept
This project will focus on analyzing weather data for two
cities in upstate New York (Binghamton and Syracuse). The data is located on
the NOAA website. The project will be in three parts:
Part 1: Perform a descriptive and numerical analysis of a sample of average daily temperatures for Syracuse, NY.
Part 2: Create confidence intervals to estimate the average monthly precipitation for both Syracuse and Binghamton, NY.
Part 3: Using a two-sample dependent (matched pairs) design, determine whether there is a difference in average winter monthly snowfall between Binghamton and Syracuse.
Data Needed for the Project
The NOAA website contains the data needed to perform the analysis. Instructions on how to reach the website and navigate to the data are given at the end of this document. For all three parts, please restrict yourself to the time period from January 1965 to December 2012.
Part 1: Descriptive and Numerical Analysis for Syracuse, NY (1965 to 2012)
For this part of the project you need to gather the daily average temperature for 50 different days from January 1965 to December 2012 for Syracuse, NY. At the end of this project document I provide some details on how to gather this sample properly.
Part 2: Confidence Interval Estimates for Monthly Precipitation for Binghamton and Syracuse, NY (1965 to 2012)
For this part of the project you need to gather the total monthly precipitation for 30 different months from January 1965 to December 2012 for Binghamton and Syracuse, NY. You must gather a different simple random sample (SRS) of 30 months for each city (one set of 30 months for Binghamton and a separate set of 30 months for Syracuse). At the end of this project document I provide some details on how to gather this sample properly. I have created a chart (called Part 2 Data Sheet) to help you organize this data.
Part 3: Two-Sample Dependent (Matched Pairs) Design for Winter Month Snowfall (1965 to 2012)
For the last part of the project you need to gather a matched pairs sample of total monthly snowfall for 30 different "winter" months from January 1965 to December 2012 for Binghamton and Syracuse. Winter months are defined as December, January, February, and March. This sample differs from the other parts of the project in that you only need to obtain one SRS of 30 different winter months. Once that sample is created, you need to get the total monthly snowfall for both Binghamton and Syracuse for each of those months (e.g., if one of your months is February 1992, you need to gather the snowfall totals for both Binghamton and Syracuse for February 1992). I have created a chart (called Part 3 Data Sheet) to help you organize this data.
Statistical Analysis
Part 1: Descriptive and Numerical Analysis
Using the sample of average daily temperatures gathered for Syracuse, NY, please conduct the following descriptive and numerical analysis.
- Stem-and-Leaf Plot
- Frequency Table using between 6 and 10 classes (your choice)
- Frequency Histogram based on the frequency table you created
- Five-Number Summary and Boxplot (including fences)
- Mean, Median, and Mode
- Standard Deviation
*Note: you do not need to calculate the mean and standard deviation by hand (you may use the statistical features of your calculator or computer instead).
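The numerical summaries listed above can be double-checked with a short script. The sketch below is one possible approach using Python's standard statistics module; the function names are hypothetical. Quartiles here follow the "median of each half" convention (the one TI-83/84 calculators use); Excel and Minitab use interpolation rules, so their Q1 and Q3 may differ slightly. The fences use the common 1.5 × IQR rule.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def describe(data):
    """Mean, median, mode(s), and sample standard deviation (n - 1 divisor)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # every value tied for most frequent
        "stdev": statistics.stdev(data),
    }
```

Call five_number_summary() on your list of 50 temperatures, then pass its Q1 and Q3 to fences(); any temperature outside the fences would be flagged as a potential outlier on the boxplot.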
Part 2: Confidence Interval Estimator
Using the Part 2 data gathered please create a 95%
confidence interval estimator for the population average monthly precipitation
for Binghamton and Syracuse, NY. In other words, you will create two different
95% confidence intervals, one for Binghamton and another for Syracuse.
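For reference, a t-based interval has the form x̄ ± t* · s/√n with n − 1 degrees of freedom. A minimal Python sketch, assuming each city's 30 monthly totals are in a list; the function name is hypothetical, and the default t* = 2.045 is the two-sided 95% critical value for df = 29 only, so look up a different value in your t table if your sample size differs. If your course uses the z-based interval for n ≥ 30, pass t_crit=1.96 instead.

```python
import math
import statistics


def t_confidence_interval(data, t_crit=2.045):
    """Return (lower, upper) for x-bar +/- t* * s / sqrt(n).

    t_crit = 2.045 is the two-sided 95% critical value for df = 29 (n = 30).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.mean(data)
    s = statistics.stdev(data)            # sample standard deviation
    margin = t_crit * s / math.sqrt(n)    # margin of error
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Synthetic stand-in values, NOT NOAA data; replace with your Part 2 sample.
    placeholder = [3.0 + 0.1 * i for i in range(30)]
    low, high = t_confidence_interval(placeholder)
    print(f"95% CI: ({low:.2f}, {high:.2f}) inches")
```

Run it once with the Binghamton list and once with the Syracuse list to get the two intervals.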
Part 3: Two-Sample Dependent (Matched Pairs) Design Hypothesis Test
Using the Part 3 data, please run a dependent (matched pairs) hypothesis test for the population mean difference in monthly snowfall at the 5% level of significance. Please be sure to show all four steps of the hypothesis test (you may use the p-value approach if you wish). Note: you do not need to show the work for the mean and standard deviation calculations, since you can use the statistical functions of your calculator for them.
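For a matched pairs design the test statistic is t = d̄ / (s_d/√n), computed from the 30 differences with n − 1 = 29 degrees of freedom. The Python sketch below uses the critical-value approach rather than a p-value, since a p-value needs the t distribution's CDF (available in StatCrunch, Minitab, or scipy.stats.ttest_rel). It assumes differences are taken as Syracuse minus Binghamton, matching the column order of the Part 3 Data Sheet, a two-sided alternative (H1: μd ≠ 0), and the hypothetical function name paired_t_test.

```python
import math
import statistics


def paired_t_test(syracuse, binghamton, t_crit=2.045):
    """Two-sided matched pairs t-test of H0: mu_d = 0 against H1: mu_d != 0.

    Differences are Syracuse minus Binghamton. t_crit = 2.045 is the
    two-sided critical value for alpha = 0.05 with df = 29 (n = 30 pairs).
    Returns (d_bar, s_d, t_stat, reject_h0).
    """
    if len(syracuse) != len(binghamton):
        raise ValueError("each month needs a snowfall total for both cities")
    diffs = [s - b for s, b in zip(syracuse, binghamton)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)          # mean of the differences
    s_d = statistics.stdev(diffs)           # standard deviation of the differences
    t_stat = d_bar / (s_d / math.sqrt(n))   # test statistic
    return d_bar, s_d, t_stat, abs(t_stat) > t_crit
```

Pass the two snowfall columns in the same month order; reject_h0 is True when |t| exceeds the critical value.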
Project Summary
For the final part of the project, please answer the following questions.
a) Using the descriptive analysis from Part 1, does it appear that the population of average daily temperatures for Syracuse, NY is normally distributed? Why or why not?
b) Using the confidence intervals from Part 2, is it possible that the population means of monthly precipitation for Binghamton and Syracuse, NY are the same? Why or why not?
c) Using the dependent hypothesis test from Part 3, what were you able to conclude about the difference in monthly snowfall between Binghamton and Syracuse (i.e., did one of the cities appear to have a higher average monthly snowfall total)?
d) The sample size for Parts 2 and 3 of the analysis was set at 30 (hint: it had to be at least 30). Why is this important?
e) In the past, students have tried to use the random date generator from random.org to gather their Part 2 data by ignoring the day and using just the month and year (e.g., if they got a date of 3/20/1977, they would throw out the 20 and use just 3/1977 as their random month). Does this process actually result in a true simple random sample? Why or why not?
Write Up Details
When putting the project together for submission, please make sure the materials are in the following order:
- Cover Page (with your name and date)
- The raw data you gathered (for Parts 1, 2, and 3 of the project)
- Part 1 Analysis
- Part 2 Analysis
- Part 3 Analysis
- Project Summary Questions
Please be sure to type all that you can for this project. If
hand calculations need to be shown please be sure they are clear and legible.
Random Sample Resources
- For Part 1 of the project, I highly recommend using the calendar date generator on the random.org website. The address is http://www.random.org/calendar-dates/
Be sure to include all days of the week when using this feature (it defaults to weekdays only).
- For Parts 2 and 3, you can use random.org to generate lists of integers. The address is http://www.random.org/integers/
For the Part 2 months, generate two separate lists of at least 30 values: one using integers from 1 to 12 (for the months) and the other using integers from 1965 to 2012 (for the years). You can then pair them up to form your month/year combinations.
For the Part 3 months, use the same process as in Part 2, except use integers from 0 to 3 for your winter months:
0 = December, 1 = January, 2 = February, 3 = March
I highly recommend gathering more than 30 random months for Parts 2 and 3. That way, if you get a repeated month/year, you can ignore it and move on to the next month/year for your sample.
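If you would rather script the sampling than use random.org, the Python sketch below follows the same steps described above: 50 distinct calendar dates (all days of the week) for Part 1, merged month and year lists for Part 2, and winter codes 0 to 3 for Part 3, skipping any repeated month/year. It assumes the January 1965 to December 2012 study period and maps code 0 to December of the drawn year; the function names and the extra parameter (how many draws to make beyond 30) are hypothetical.

```python
import random
from datetime import date, timedelta

START_YEAR, END_YEAR = 1965, 2012    # study period stated in the project
WINTER = {0: 12, 1: 1, 2: 2, 3: 3}   # 0 = December, 1 = January, 2 = February, 3 = March


def part1_days(k=50, rng=random):
    """k distinct calendar dates, weekends included, from 1965-01-01 to 2012-12-31."""
    start, end = date(START_YEAR, 1, 1), date(END_YEAR, 12, 31)
    span = (end - start).days + 1          # number of days in the period
    offsets = rng.sample(range(span), k)   # sampling without replacement: no repeated days
    return sorted(start + timedelta(days=d) for d in offsets)


def first_unique(pairs, k=30):
    """Keep the first k distinct (month, year) pairs, skipping repeats."""
    seen, kept = set(), []
    for p in pairs:
        if p not in seen:
            seen.add(p)
            kept.append(p)
        if len(kept) == k:
            return kept
    raise ValueError("not enough distinct months; increase extra")


def part2_months(k=30, extra=10, rng=random):
    """Pair a list of months (1 to 12) with a list of years, then drop repeats."""
    n = k + extra
    months = [rng.randint(1, 12) for _ in range(n)]
    years = [rng.randint(START_YEAR, END_YEAR) for _ in range(n)]
    return first_unique(zip(months, years), k)


def part3_months(k=30, extra=10, rng=random):
    """Same idea, with winter codes 0 to 3 mapped to Dec, Jan, Feb, Mar."""
    n = k + extra
    codes = [rng.randint(0, 3) for _ in range(n)]
    years = [rng.randint(START_YEAR, END_YEAR) for _ in range(n)]
    return first_unique(((WINTER[c], y) for c, y in zip(codes, years)), k)


if __name__ == "__main__":
    rng = random.Random()
    print(part1_days(rng=rng)[:3])
    # Call part2_months twice so each city gets its own separate sample.
    print("Binghamton:", part2_months(rng=rng)[:3])
    print("Syracuse:", part2_months(rng=rng)[:3])
    print("Part 3:", part3_months(rng=rng)[:3])
```

Because first_unique discards repeats and keeps drawing from the longer list, the extra draws play the same role as gathering more than 30 months by hand.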
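Part 1 Data Sheet
| Number | Date | Average Temp (°F) | Number | Date | Average Temp (°F) |
| 1 |  |  | 31 |  |  |
| 2 |  |  | 32 |  |  |
| 3 |  |  | 33 |  |  |
| 4 |  |  | 34 |  |  |
| 5 |  |  | 35 |  |  |
| 6 |  |  | 36 |  |  |
| 7 |  |  | 37 |  |  |
| 8 |  |  | 38 |  |  |
| 9 |  |  | 39 |  |  |
| 10 |  |  | 40 |  |  |
| 11 |  |  | 41 |  |  |
| 12 |  |  | 42 |  |  |
| 13 |  |  | 43 |  |  |
| 14 |  |  | 44 |  |  |
| 15 |  |  | 45 |  |  |
| 16 |  |  | 46 |  |  |
| 17 |  |  | 47 |  |  |
| 18 |  |  | 48 |  |  |
| 19 |  |  | 49 |  |  |
| 20 |  |  | 50 |  |  |
| 21 |  |  |  |  |  |
| 22 |  |  |  |  |  |
| 23 |  |  |  |  |  |
| 24 |  |  |  |  |  |
| 25 |  |  |  |  |  |
| 26 |  |  |  |  |  |
| 27 |  |  |  |  |  |
| 28 |  |  |  |  |  |
| 29 |  |  |  |  |  |
| 30 |  |  |  |  |  |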
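Part 2 Data Sheet
| Binghamton |  |  | Syracuse |  |  |
| Number | Month | Total Precipitation (in.) | Number | Month | Total Precipitation (in.) |
| 1 |  |  | 1 |  |  |
| 2 |  |  | 2 |  |  |
| 3 |  |  | 3 |  |  |
| 4 |  |  | 4 |  |  |
| 5 |  |  | 5 |  |  |
| 6 |  |  | 6 |  |  |
| 7 |  |  | 7 |  |  |
| 8 |  |  | 8 |  |  |
| 9 |  |  | 9 |  |  |
| 10 |  |  | 10 |  |  |
| 11 |  |  | 11 |  |  |
| 12 |  |  | 12 |  |  |
| 13 |  |  | 13 |  |  |
| 14 |  |  | 14 |  |  |
| 15 |  |  | 15 |  |  |
| 16 |  |  | 16 |  |  |
| 17 |  |  | 17 |  |  |
| 18 |  |  | 18 |  |  |
| 19 |  |  | 19 |  |  |
| 20 |  |  | 20 |  |  |
| 21 |  |  | 21 |  |  |
| 22 |  |  | 22 |  |  |
| 23 |  |  | 23 |  |  |
| 24 |  |  | 24 |  |  |
| 25 |  |  | 25 |  |  |
| 26 |  |  | 26 |  |  |
| 27 |  |  | 27 |  |  |
| 28 |  |  | 28 |  |  |
| 29 |  |  | 29 |  |  |
| 30 |  |  | 30 |  |  |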
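Part 3 Data Sheet
| Number | Month | Total Monthly Snowfall for Syracuse (in.) | Total Monthly Snowfall for Binghamton (in.) | Difference |
| 1 |  |  |  |  |
| 2 |  |  |  |  |
| 3 |  |  |  |  |
| 4 |  |  |  |  |
| 5 |  |  |  |  |
| 6 |  |  |  |  |
| 7 |  |  |  |  |
| 8 |  |  |  |  |
| 9 |  |  |  |  |
| 10 |  |  |  |  |
| 11 |  |  |  |  |
| 12 |  |  |  |  |
| 13 |  |  |  |  |
| 14 |  |  |  |  |
| 15 |  |  |  |  |
| 16 |  |  |  |  |
| 17 |  |  |  |  |
| 18 |  |  |  |  |
| 19 |  |  |  |  |
| 20 |  |  |  |  |
| 21 |  |  |  |  |
| 22 |  |  |  |  |
| 23 |  |  |  |  |
| 24 |  |  |  |  |
| 25 |  |  |  |  |
| 26 |  |  |  |  |
| 27 |  |  |  |  |
| 28 |  |  |  |  |
| 29 |  |  |  |  |
| 30 |  |  |  |  |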
Finding Data on the NOAA Website (Part 1)
To find the data required for the project:
- Go to the website: www.erh.noaa.gov/bgm/
- In the left panel, under Climate, choose Local.
- Above Observed Weather Reports, choose Local Data/Records.
- Under Climate Data, choose Local Records and Averages.
- Above Local Climate Information, choose your city (Binghamton, Scranton, or Syracuse).
- Use Past Preliminary Climatology Data (CF6). There will be two drop-down menus (one for month and one for year). Select your month and year, then choose Get Data.
This should take you to the web page with the weather information. The instructions below explain how to find the data you need on that page.
There appear to be two different formats in which the weather data are displayed when you look up a month:
Example of FORMAT 1
This is the standard format in which most months are presented. The example below is of March 1994 for Binghamton, NY (BGM).
DAY = Day of the month
For Part 1 of the project, you will use the AVG column (the daily average temperature).
For Part 2 of the project (total monthly precipitation), look at the bottom of the PCPN column to get the monthly precipitation (cross-reference with the Total row). In this example the total monthly precipitation is 5.06 inches.
For Part 3 of the project (total monthly snowfall), look at the bottom of the SNOW column to get the monthly snowfall (cross-reference with the Total row). In this example the total monthly snowfall is 27.8 inches.
[Image: Format 1 example, March 1994, Binghamton, NY (BGM)]
Example of FORMAT 2
Format 2 looks much like Format 1. The information needed for Part 1 is in the same column as in Format 1. Finding the total monthly precipitation and snowfall (for Parts 2 and 3) is a little more involved in Format 2. The best way to find them is to move down the page a little and look for the following (this example is March 1999 for Binghamton):
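[Image: Format 2 example, March 1999, Binghamton, with the monthly precipitation and snowfall totals circled]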
Note that I circled where you can find the Total Monthly
Precipitation and Snowfall (2.57 inches of precipitation and 33.8 inches of
snow for this month).
Looking over the website, it appears that most of the older data are in Format 1 and the newer data are in Format 2. If you have any difficulty finding information, please do not hesitate to ask.