Classify each of your variables. You can put them into some kind of table as shown in the example
Welcome to your first project discussion/assignment. In this assignment, we will be a) familiarizing ourselves with our project data set, b) classifying our variables (module 1) and c) performing some descriptive statistics (module 2). To get started, download your group's data set from the Project Data Sets page. This discussion board is the place for you to collaborate with your group. This is supposed to be a guided process, so the more you communicate here, the more feedback we can give you. You will need to have a minimum of three substantive posts. Only posts in the discussion board will count. You may also link files that I can access to see your contribution to the work (you are responsible for ensuring that I am able to access the file or link).
The final product that you will be turning in for Group Assignment 1 is as follows:
 Classify each of your variables. You can put them into some kind of table as shown in the example. If you have a data set with a large number of variables, you only need to classify 15 of them. Make sure that these are variables of interest to you. Do not classify identifier columns (e.g. patient number, case name, addresses, etc.). Make sure that you are looking at the actual information stored in your data and not just making an assumption based on the variable name. You MUST INCLUDE THE VARIABLES THAT YOU USE IN PARTS 2 & 3 of this assignment. For each variable, state:

 Numeric or Categorical
 If numeric, whether they are discrete or continuous (put N/A if categorical)
 By their scale: Nominal, Ordinal, Interval, Ratio
Example:
Variable Name              | Numeric or Categorical | Discrete or Continuous | Scale
State                      | Categorical            | N/A                    | Nominal
Maximum Annual Temperature | Numeric                | Continuous             | Interval
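To check what is actually stored in a column before classifying it, something like the following Python sketch may help. It is only a rough aid, not part of the assignment: the helper name `describe_column` and the sample rows are made up for illustration, and the output cannot tell you the measurement scale. A column that parses as a number (such as a 0/1 smoking flag) can still be categorical, so the final classification remains a judgment call.

```python
def describe_column(values):
    """Report what is stored in a column (hypothetical helper, heuristic only)."""
    distinct = sorted(set(values))

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    all_numeric = all(is_number(v) for v in values)
    # Whole numbers only suggests a discrete variable OR a coded category.
    all_whole = all_numeric and all(float(v).is_integer() for v in values)
    return {
        "distinct_count": len(distinct),
        "sample_values": distinct[:5],
        "parses_as_number": all_numeric,
        "whole_numbers_only": all_whole,
    }


# Made-up rows for illustration only; replace with your group's data.
rows = [
    {"state": "OH", "max_temp": "98.6", "smoke": "1"},
    {"state": "TX", "max_temp": "104.2", "smoke": "0"},
    {"state": "OH", "max_temp": "91.0", "smoke": "1"},
]

for name in rows[0]:
    print(name, describe_column([row[name] for row in rows]))
```

Here `smoke` parses as a whole number, yet with only the values 0 and 1 it is a coded category, which is exactly the kind of case where the variable name or storage type alone would mislead you.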
 Propose three questions that you could attempt to answer using this dataset. Note that each question needs to be answered using the variables that you have access to, but this doesn't preclude using additional information available on the internet. For example, if you have a dataset that includes city or state, you could look up the population density of each of those states and add that variable yourself. Then your question could be something like: Do densely populated states have a higher mean crime rate per capita than less densely populated states? To answer this question, you could then create an additional categorical variable based on the densities that you looked up earlier. ***NOTE*** We are NOT answering these questions now. These are the questions you will try to answer in the final paper that you submit at the end of the course. Additional hints and requirements for creating questions:
 The question MUST contain the name of at least TWO DIFFERENT VARIABLES. These names don't have to match exactly and should be converted to something self-explanatory. If the variable is "low_birth", I would convert it to "low birth weight", or "SMOKE" to "smoking mothers".
 The question MUST contain some term to indicate the statistic used/calculated such as "proportion", "mean", "median" or "correlate" (or correlation as appropriate).
 DO NOT make up information that you do not have explicitly. You do not need to be clever with your wording. This is a common mistake, so just stick with the basics and the information that is recorded. Using my example data set, a bad question might be something like "Do mothers who smoke during their first pregnancy have babies with a lower mean birthweight?" Notice that this is similar to a valid question, but I added something about it being a FIRST pregnancy. If you look at my data, I know NOTHING about which pregnancy this is, how many she has had, etc.
 For each question, clearly state the explanatory variable and the response variable (also known as the independent and dependent variables, respectively) necessary to answer that question. You must use a minimum of four different variables between all three questions. For example, you could use the same response or explanatory variable for all three questions, but it would need to be paired with a different variable in each question. Example questions:
 Is there a higher proportion of low birth weight babies (low - response variable) in mothers who smoke while pregnant (smoke - explanatory variable) than in those who do not?
 Is the mean birthweight (birthweight - response variable) of babies born to mothers over the age of 25 lower than that of babies born to mothers under the age of 25? (Under 25 is a new explanatory variable that I created using the age variable.)
 Does the number of physician visits during pregnancy (# visits - explanatory variable) correlate with the birth weight of the baby (birthweight - response variable)?
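Creating a new categorical explanatory variable from a numeric one, as in the "Under 25" example, can be sketched in Python as below. The function name `age_group` and the ages are hypothetical. The example questions do not say which group a mother aged exactly 25 belongs to, so the sketch assumes she goes in the "Over 25" group; adjust the comparison if your group decides otherwise.

```python
def age_group(age, cutoff=25):
    """Map a numeric age to a two-level category (hypothetical helper)."""
    # Assumption: an age equal to the cutoff is placed in the "Over 25" level.
    return "Under 25" if age < cutoff else "Over 25"


# Made-up ages for illustration only.
ages = [19, 33, 25, 21]
groups = [age_group(a) for a in ages]
print(groups)
```

In Excel, the same idea is an extra column with an IF formula on the age column.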
 For each question, create a summary table for each level of the explanatory variable. For this, Excel's Pivot Tables and filtering will be useful and allow you to quickly produce these results. If the response variable is numeric and the explanatory variable is categorical, determine the minimum, mean, standard deviation, Q1, median (Q2), Q3, maximum and the size of each level (each different category). If the response variable is categorical, provide the frequency and relative frequency of each level of the explanatory-response pair. For example, if there are two levels in each, then there will be four rows.
 Numeric Variable (this combines the levels of the explanatory variable while performing calculations on the response variable)
Example: Is the mean birthweight (response variable) of babies born to mothers over the age of 25 greater than that of babies born to mothers under the age of 25 (explanatory variable)?
Explanatory variable level | Mean   | St. Dev | Min  | Q1   | Median | Q3   | Max  | N
Over 25                    | 2995.8 | 820.0   | 709  | 2410 | 2984.5 | 3481 | 4990 | 54
Under 25                   | 2924.2 | 691.5   | 1330 | 2414 | 2977   | 3475 | 4593 | 135
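If you would rather check your Pivot Table results in code, a summary like the one above could be computed with a short Python sketch such as the following. The function name `summarize_by_level` and the birth weights are made up for illustration and are not the data behind the example table. The sketch uses the sample standard deviation and the "inclusive" quartile method, which I believe corresponds to Excel's STDEV.S and QUARTILE.INC; other software may use slightly different quartile definitions, so small differences in Q1 and Q3 are possible.

```python
import statistics
from collections import defaultdict


def summarize_by_level(pairs):
    """Summarize a numeric response for each level of a categorical variable.

    pairs: iterable of (level, numeric_value). Each level needs at least
    two values for the standard deviation and quartiles to be defined.
    """
    by_level = defaultdict(list)
    for level, value in pairs:
        by_level[level].append(value)

    summary = {}
    for level, values in by_level.items():
        # n=4 returns the three cut points Q1, median (Q2), and Q3.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[level] = {
            "mean": statistics.mean(values),
            "st_dev": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
            "n": len(values),
        }
    return summary


# Made-up (level, birthweight) pairs for illustration only.
data = [
    ("Over 25", 2500), ("Over 25", 3000), ("Over 25", 3500), ("Over 25", 4000),
    ("Under 25", 2000), ("Under 25", 3000), ("Under 25", 3200),
]

for level, stats in summarize_by_level(data).items():
    print(level, stats)
```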

 Categorical Variable (note that the relative frequencies should add up to 1)
Example: Is there a higher proportion of low birth weight babies (low - response variable) in mothers who smoke while pregnant (smoke - explanatory variable) than in those who do not?
Variable - level                 | Frequency | Relative Frequency
smoke - Low Birth Weight         | 30        | 0.159
smoke - Not Low Birth Weight     | 44        | 0.233
not smoke - Low Birth Weight     | 29        | 0.153
not smoke - Not Low Birth Weight | 86        | 0.455
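The frequencies and relative frequencies for a categorical response can also be tallied in a few lines of Python. The sketch below is one possible approach, with a hypothetical helper name `frequency_table`. It rebuilds the individual records from the four counts in the example table, since the raw rows are not shown here; with your own data you would pass in one (explanatory level, response level) pair per row.

```python
from collections import Counter


def frequency_table(pairs):
    """Return {pair: (frequency, relative_frequency)} for each distinct pair."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: (count, count / total) for pair, count in counts.items()}


# Records rebuilt from the counts in the example table (30, 44, 29, 86).
pairs = (
    [("smoke", "Low Birth Weight")] * 30
    + [("smoke", "Not Low Birth Weight")] * 44
    + [("not smoke", "Low Birth Weight")] * 29
    + [("not smoke", "Not Low Birth Weight")] * 86
)

table = frequency_table(pairs)
for (explanatory, response), (freq, rel_freq) in table.items():
    print(f"{explanatory} - {response}: {freq}, {rel_freq:.3f}")
```

Each relative frequency is the count divided by the total number of rows (189 in the example), which is why the four values add up to 1.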
 For each question, produce an appropriate graph. For numeric variables, histograms, box plots or bar charts are always a good place to start. For categorical variables, bar charts can be used. In a bar chart, plot ONLY one of frequency, relative frequency, or means. If you are doing a box plot for a question (another great option), then it will use the five-number summary (min, Q1, median, Q3, max). Create a box plot for each level of the explanatory variable.
