Classify each of your variables. You can put them into some kind of table as shown in the example

Question # 00792648 Posted By: shortone Updated on: 02/04/2021 07:55 AM Due on: 02/24/2021
Subject Mathematics Topic Algebra Tutorials:
Question

Welcome to your first project discussion/assignment. In this assignment, we will be a) familiarizing ourselves with our project data set, b) classifying our variables (module 1) and c) performing some descriptive statistics (module 2). To get started, download your group's data set from the Project Data Sets page. This discussion board is the place for you to collaborate with your group. This is supposed to be a guided process, so the more you communicate here, the more feedback we can give you. You will need to have a minimum of three substantive posts. Only posts in the discussion board will count. You may also link files that I can access to see your contribution to the work (you are responsible for ensuring that I am able to access the file or link).

The final product that you will be turning in for Group Assignment 1 is as follows:

  1. Classify each of your variables. You can put them into some kind of table as shown in the example. If you have a data set with a large number of variables, you only need to classify 15 of them. Make sure that these are variables of interest to you. Do not classify identifier columns (e.g. patient number, case name, addresses, etc.). Make sure that you are looking at the actual information stored in your data and not just making an assumption based on the variable name. You MUST INCLUDE THE VARIABLES THAT YOU USE IN PARTS 2 & 3 of this assignment.
    1. Numeric or Categorical
    2. If numeric, whether they are discrete or continuous (put N/A if categorical)
    3. By their scale: Nominal, Ordinal, Interval, Ratio

Example:

| Variable Name | Numeric or Categorical | Discrete or Continuous | Scale |
| --- | --- | --- | --- |
| State | Categorical | N/A | Nominal |
| Maximum Annual Temperature | Numeric | Continuous | Interval |
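As a rough self-check for the classification table, the sketch below tests each row for internal consistency. It assumes the common textbook convention that categorical variables are nominal or ordinal and numeric variables are interval or ratio; your course may treat some cases differently. The field names (`kind`, `discrete_or_continuous`, `scale`) and the helper `check_classification` are hypothetical and used only for illustration.

```python
# Assumed convention (not stated in the assignment): which scales
# are allowed for each kind of variable.
VALID_SCALES = {
    "Categorical": {"Nominal", "Ordinal"},
    "Numeric": {"Interval", "Ratio"},
}


def check_classification(entry):
    """Return a list of problems found in one classification row."""
    kind = entry["kind"]
    if kind not in VALID_SCALES:
        return [f"unknown kind: {kind!r}"]

    problems = []
    if entry["scale"] not in VALID_SCALES[kind]:
        problems.append(f"scale {entry['scale']!r} is unusual for a {kind} variable")

    detail = entry["discrete_or_continuous"]
    if kind == "Categorical" and detail != "N/A":
        problems.append("categorical variables should be marked N/A")
    if kind == "Numeric" and detail not in {"Discrete", "Continuous"}:
        problems.append("numeric variables must be Discrete or Continuous")
    return problems


# The two rows from the example table.
state = {"name": "State", "kind": "Categorical",
         "discrete_or_continuous": "N/A", "scale": "Nominal"}
max_temp = {"name": "Maximum Annual Temperature", "kind": "Numeric",
            "discrete_or_continuous": "Continuous", "scale": "Interval"}

for row in (state, max_temp):
    print(row["name"], check_classification(row))
```

An empty list means the row is internally consistent. It does not confirm that the classification matches what is actually stored in the data, so you still need to look at the values themselves.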

  1. Propose three questions that you could attempt to answer using this dataset. Note that each question needs to be answered using the variables that you have access to, but this doesn't preclude using additional information available on the internet. For example, if you have a dataset that includes city or state, you could look up the population density of each of those states and add that variable yourself. Then your question could be something like: Do densely populated states have a higher mean crime per capita than less densely populated states? To answer this question, you could then create an additional categorical variable based on the densities that you looked up earlier. ***NOTE*** We are NOT answering these questions now. These are the questions you will try to answer in the final paper that you submit at the end of the course. Additional hints and requirements for creating questions:
    1. The question MUST contain the name of at least TWO DIFFERENT VARIABLES. These names don't have to match exactly and should be converted to something self-explanatory. If the variable is "low_birth", I would convert it to "low birth weight", or "SMOKE" to "smoking mothers".
    2. The question MUST contain some term to indicate the statistic used/calculated such as "proportion", "mean", "median" or "correlate" (or correlation as appropriate).
    3. DO NOT make up information that you do not have explicitly. You do not need to be clever with your wording. This is a common mistake...just stick with the basics and the information that is recorded. Using my example data set, a bad question might be something like "do mothers who smoke during their first pregnancy have babies with a lower mean birthweight?" Notice that this is similar, but I added something about it being a FIRST pregnancy. If you look at my data, I know NOTHING about which pregnancy this is, how many she has had, etc.
  2. For each question, clearly state the explanatory variable and the response variable (also known as the independent and dependent variables, respectively) necessary to answer that question. You must use a minimum of four different variables across all three questions. For example, you could use the same response or explanatory variable for all three questions, but it would need to be paired with a different variable in each question. Example questions:
    1. Is there a higher proportion of low birth weight babies (low - response variable) in mothers who smoke while pregnant (smoke - explanatory variable) than in those who do not?
    2. Is the mean birthweight (birthweight - response variable) of babies born to mothers over the age of 25 lower than that of babies born to mothers under the age of 25? (Under 25 is a new explanatory variable that I created using the age variable.)
    3. Does the number of physician visits during pregnancy (# visits - explanatory variable) correlate with the birth weight of the baby (birthweight - response variable)?
  3. For each question, create a summary table for each level of the explanatory variable. For this, Excel's Pivot Tables and filtering will be useful and allow you to quickly produce these results. If the response variable is numeric and the explanatory variable is categorical, determine the minimum, mean, standard deviation, Q1, median (Q2), Q3, maximum and the size of each level (each different category). If the response variable is categorical, provide the frequency and relative frequency of each level of the explanatory - response pair. For example, if there are two levels in each, then there will be four rows.
    1. Numeric Variable (this combines the levels of the explanatory variable while performing calculations on 

Example: Is the mean birthweight (response variable) of babies born to mothers over the age of 25 greater than that of babies born to mothers under the age of 25 (explanatory)?

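| Explanatory Variable Level | Mean | St. Dev | Min | Q1 | Median | Q3 | Max | N |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Over 25 | 2995.8 | 820.0 | 709 | 2410 | 2984.5 | 3481 | 4990 | 54 |
| Under 25 | 2924.2 | 691.5 | 1330 | 2414 | 2977 | 3475 | 4593 | 135 |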
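If you would like to double-check a pivot table, the sketch below shows one way to compute the same per-level summary with Python's standard library. It is an illustration under stated assumptions, not a required method: the records are made up (they are NOT the example data set), the helper names `summarize_by_level` and `age_group` are hypothetical, the standard deviation is the sample standard deviation, and quartiles use the "inclusive" interpolation method. Different tools use different quartile conventions, so Q1 and Q3 may differ slightly from what Excel reports. Treating age 25 itself as part of the upper group is also an assumption.

```python
import statistics
from collections import defaultdict


def summarize_by_level(rows, level_of, response):
    """Group rows by explanatory level and summarize the numeric response."""
    groups = defaultdict(list)
    for row in rows:
        groups[level_of(row)].append(row[response])

    summary = {}
    for level, values in groups.items():
        # Each level needs at least two values for stdev and quantiles.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[level] = {
            "mean": statistics.mean(values),
            "st_dev": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
            "n": len(values),
        }
    return summary


# Made-up illustrative records, not the assignment's data.
births = [
    {"age": 19, "birthweight": 2500},
    {"age": 22, "birthweight": 3100},
    {"age": 24, "birthweight": 2900},
    {"age": 27, "birthweight": 3300},
    {"age": 31, "birthweight": 2700},
    {"age": 35, "birthweight": 3600},
]


def age_group(row):
    """Derive a new categorical explanatory variable from the numeric age."""
    return "Under 25" if row["age"] < 25 else "25 and over"


summary = summarize_by_level(births, age_group, "birthweight")
for level, stats in summary.items():
    print(level, stats)
```

The `age_group` function plays the same role as the new "Under 25" variable described in the example questions: it turns a numeric column into the levels of a categorical explanatory variable.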

    1. Categorical Variable - (note that the relative frequencies should add up to 1)

Example: Is there a higher proportion of low birth weight babies (low - response variable) in mothers who smoke while pregnant (smoke - explanatory variable) than in those who do not?

| Variable - Level | Frequency | Relative Frequency |
| --- | --- | --- |
| smoke - Low Birth Weight | 30 | 0.159 |
| smoke - Not Low Birth Weight | 44 | 0.233 |
| not smoke - Low Birth Weight | 29 | 0.153 |
| not smoke - Not Low Birth Weight | 86 | 0.455 |
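The relative frequencies in this example are each frequency divided by the total number of observations (30 + 44 + 29 + 86 = 189). A minimal sketch of that calculation is below. It rebuilds one (explanatory, response) pair per observation from the counts in the example table; the function name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(pairs):
    """Map each (explanatory, response) pair to (frequency, relative frequency)."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: (count, count / total) for pair, count in counts.items()}


# One pair per observation, rebuilt from the counts in the example table.
observations = (
    [("smoke", "Low Birth Weight")] * 30
    + [("smoke", "Not Low Birth Weight")] * 44
    + [("not smoke", "Low Birth Weight")] * 29
    + [("not smoke", "Not Low Birth Weight")] * 86
)

table = frequency_table(observations)
for (explanatory, response), (freq, rel) in table.items():
    print(f"{explanatory} - {response}: {freq} {rel:.3f}")
```

Summing the second value of every entry is a quick way to confirm that the relative frequencies add up to 1.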

  1. For each question, produce an appropriate graph. For numeric variables, histograms, box plots or bar charts are always a good place to start. For categorical variables, bar charts can be used. In this chart, plot ONLY one of frequency, relative frequency, or means. If you are doing a box plot for a question (another great option), then this will use the 5-number summary (min, Q1, median, Q3, max). Create a box plot for each level of the explanatory variable.
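To preview what a bar chart of a single statistic will look like before building it in Excel, a plain-text version can be sketched as below. This is only a stand-in for a real chart: the function `text_bar_chart` and its `width` parameter are hypothetical, the values are the relative frequencies from the categorical example, and the sketch assumes all values are positive.

```python
def text_bar_chart(values, width=40):
    """Render {label: value} as horizontal bars scaled to the largest value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {value:.3f}")
    return "\n".join(lines)


# Relative frequencies from the categorical example (one statistic only).
relative_frequency = {
    "smoke - Low Birth Weight": 0.159,
    "smoke - Not Low Birth Weight": 0.233,
    "not smoke - Low Birth Weight": 0.153,
    "not smoke - Not Low Birth Weight": 0.455,
}

chart = text_bar_chart(relative_frequency)
print(chart)
```

Each bar shows one level of the explanatory - response pair, and the chart carries only one statistic, which matches the instruction to plot only frequency, relative frequency, or means.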

 

