Step-By-Step Guide
Stats Task 2.1: Measures of Central Tendency
For this task, you need to respond to Roger Snow’s request to determine which statistic(s) he should report for the average BMI for his obesity study.
The steps below will walk you through the process.
- Before you begin your work, find a peer to work with for this task. Although each of you will be responsible for submitting your own work, you may work with one another to better analyze the question and come up with the important components of your response. You may also partner up with more than one person or as otherwise advised by your mentor.
- Review the email and make sure you understand what question you are being asked to answer in this task.
- Review the Resources (above) to do additional research on BMI, measures of central tendency, variation, and other statistical terms you will need to be familiar with to answer the questions. At this point, just skim the resources page so that you know what is available to you as you work through the rest of this task.
Determine what the mean, median and mode tell you about data.
Open the Household Income for Rose Park Example in the Resources section (above). This working example is a resource you will use with your peers to review the key statistical concepts so you can tell Roger how to determine which measure(s) to use in his study.
Note: There are two worksheets in this file: the first “Data and Statistics” sheet includes all of the data in the sample; the second “Chart” sheet is the graph showing you another view of the data. (To view each worksheet in the spreadsheet, click on the appropriate tab in the bottom left of the window)
The purpose of this example is to give you an opportunity to play around with real sample data, which should make it easier to understand the issues Roger must consider when deciding which statistic(s) to use and why. Your goal in the next few steps is to become more familiar with the differences and similarities between mean, median and mode, and to determine when to use each one separately or in combination to summarize a group of data.
Tip: It may be helpful to save two copies of this file (e.g., one called “Modified Rose Park”) so that, as you change the data to test the different concepts, you can still compare against the original version.
- Look at the first sheet (data and statistics) to view the data for this sample.
- Column A is the identification number assigned to each house in the sample. These were randomly assigned in the data collection process instead of using the real addresses. This is done to help ensure that each individual’s information is kept confidential (“#15 earns $77,000 per year” does not tell a reader which household is being discussed whereas “234 Fiction Drive earns $77,000 per year” does).
- Column B is the annual income for each household.
- In the upper right corner of the spreadsheet, you will see the “Statistical Calculations” for the 100 households in this study. These statistics are “live” and will change as you change the numbers within the “Household Income” column.
- Look at the statistics for mean, median and mode. Notice how they are different and similar. Refer back to the Resources section for more details about what these quantities are and what they tell you.
- The right column (below the pink heading) contains the number of households that have income in each income group (grouped in increments of $5,000) and is used to generate the graph in the “Chart” worksheet. The data is “live,” reflecting the data in the “Household Income” column and will update accordingly. Note: the data in this section contains an array formula and cannot be changed without using advanced Spreadsheet features.
- Look at the “Chart” worksheet (graph).
- This graph is a visual representation of the data, which in this case, is household income. It shows you how many households have incomes within each income group (grouped in increments of $5,000). This shows you where the data points fall (called the distribution) and how close or far apart the data points are to one another (called the variation). Note: the income groups are rounded up to the next 5k increment (e.g., 30,001 would be part of the 35,000 group).
- Compare and contrast the mean, median, and mode. Completing the next few steps will help highlight the similarities and differences between the mean, median, and mode so you can determine how they are helpful in summarizing data. You will use the Example as a case study, playing with the data to see how it changes, which will assist you in then answering Roger’s specific questions about the BMI study.
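The spreadsheet calculations described above can also be sketched in a few lines of Python, which may help when checking your understanding away from the file. This is a minimal sketch under stated assumptions: the `incomes` list is invented for illustration and is not the Rose Park data, and `income_group` is a hypothetical helper that mimics the chart's rule of rounding each income up to the next $5,000 group.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical household incomes in dollars (NOT the Rose Park data).
incomes = [22000, 31000, 31000, 38000, 45000,
           47000, 52000, 64000, 77000, 143000]


def income_group(income, width=5000):
    """Round an income up to the top of its group, e.g. 30,001 -> 35,000."""
    # Integer ceiling division avoids floating-point surprises.
    return -(-income // width) * width


# Count how many households fall into each $5,000 group (the chart's view).
group_counts = Counter(income_group(x) for x in incomes)

print("mean:  ", mean(incomes))
print("median:", median(incomes))
print("mode:  ", multimode(incomes))  # a list, since there can be ties
for group in sorted(group_counts):
    print(f"up to {group:>7,}: {group_counts[group]} household(s)")
```

With these made-up numbers the mean is 55,000, the median is 46,000, and the mode is 31,000, so the three measures already disagree noticeably on a sample of only ten households.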
To get a better idea of what the mean tells you about the sample, imagine you are looking at the data for the first time.
- Refer back to the Resources section to review the definition of mean.
- Now look at the graph. Can you guess what the mean is, based on the graph alone?
- Look at the lowest value (“minimum”) and the highest value (“maximum”).
- If you were to average the two numbers together, what would the number be?
- Would that number accurately reflect the average household income of Rose Park? Why or why not?
- Are there a lot of values clustered together on the right side of the chart, with a tail stretching to the left (left skewed), or clustered on the left side, with a tail stretching to the right (right skewed)?
- If the chart is left skewed (values clustered on the right), increase the value of the estimated mean you calculated in the previous step.
- If the chart is right skewed (values clustered on the left), decrease the value of your estimated mean.
- If the values are clustered in the middle and gradually get fewer as you go to the right or left (more of a “normal distribution”) then your original estimate will be more accurate.
- Are there any outliers in your study? The greater the number of outliers, the less likely the mean will reflect the average. A researcher may look at outliers more closely to determine whether or not the data point is incorrect (the wrong value was entered into the statistics program, the data comes from a questionable source, or the value is suspicious for other reasons). In some studies, if there is reason to suspect error, outliers are discarded and not included in statistical calculations. (A researcher should include in their report any criteria they used to exclude data points from calculations, whether the excluded point is an outlier or another data point.)
- Now look back at the data and statistics sheet. How close was your guess to the actual mean? If your guess was quite different from the actual mean, why do you think it was?
- Experiment with the data to see if you can make the mean closer to $50,000. Note: You may have to make numerous changes in order to affect the data significantly.
Tip: Spending some time manipulating the data to see how different changes affect the statistical calculations will help develop a better understanding of the statistical concepts.
Tip: By using a spreadsheet’s “Sort” feature to organize the data by the “Household Income” in column B, it is easier to make changes to specific incomes. Make sure to only sort columns A and B; doing so will keep the “Home ID” and the income matched properly.
- What do you have to do to make the mean decrease? Are the changes you have to make small or large?
- As you decrease the mean, what happens to the median and the mode?
- What happens to the graph?
- Is it more or less skewed than before?
- Think about whether or not the mean is an accurate estimate of “average” in the Household Income for Rose Park Example.
- What does the mean tell you about the data?
- What does the mean not tell you about the data?
- Are there any outliers? If so, how do they impact the mean? What happens to the mean if you add, remove or change the outliers?
- Why is the mean helpful or not helpful as a description of the “average” in the income study?
- Now think about whether or not the mean is an accurate estimate of “average” in the case of Roger’s study.
- How is Roger’s study similar to and different from the Example?
- What does the mean tell you in Roger’s case?
- What does the mean not tell you?
- What factors would make the mean helpful or not helpful as a description of the “average” in his study?
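The questions about the mean can be explored with a small sketch like the one below. It assumes an invented `incomes` list (not the Rose Park data) in which most values sit at the low end and one household earns far more than the rest. It compares the quick guess made by averaging the minimum and maximum with the actual mean, then shows what happens when the single high value is replaced with a more typical one.

```python
from statistics import mean, median

# Hypothetical household incomes in dollars (NOT the Rose Park data).
# Nine values sit close together; one is far higher than the rest.
incomes = [28000, 30000, 32000, 35000, 36000,
           40000, 41000, 45000, 48000, 165000]

# Quick guess: average the lowest and highest values.
midrange = (min(incomes) + max(incomes)) / 2

print("min/max guess:", midrange)
print("actual mean:  ", mean(incomes))
print("median:       ", median(incomes))

# Replace the single high value with a more typical income.
adjusted = incomes[:-1] + [50000]

print("mean after the change:  ", mean(adjusted))
print("median after the change:", median(adjusted))
```

In this made-up sample the min/max guess (96,500) is far above the mean (50,000). Changing one value moves the mean to 38,500 while the median stays at 38,000, which is the kind of sensitivity to outliers worth weighing when deciding whether the mean describes the "average" well.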
- Now, look at another measure of central tendency - the mode. To get a better idea of what the mode might tell you, imagine you are looking at the data for the first time.
- Refer back to the Resources section to review the definition of the mode.
- Now look at the graph. Can you guess what the mode is, based on the graph alone?
- Now look back at the data and statistics sheet. How close was your guess to the actual mode? If your guess was quite different from the actual mode, why do you think it was?
- Remember that in this case, the graph is based on household incomes grouped in increments of $5,000 while the mode is based on the raw data, so the graph may or may not be helpful in determining the mode. Your guess would have likely been more precise if you could see the exact income of each household instead of just how many households fell into each income group.
- Depending on how data in a study is collected and how much data points can vary, researchers may group the raw data and calculate the mode based on the groups rather than on the raw data. For example, if a researcher gathered data on income from tax documents, they might have information down to the dollar and input that as the raw data (the Example incomes raw data are in thousand dollar increments). That data could vary much more than in the Example (1,000 times more). Calculating the mode on a small group who had the following incomes {20001, 20002, 20003, 20004, 20000, 30000, 30000} would return $30,000 for the raw data, whereas the mode for groups to the nearest $1,000 would be $20,000.
- Think about whether or not the mode is an accurate estimate of “average” in the Household Income for Rose Park Example.
- What does the mode tell you in this case?
- What does the mode not tell you?
- Are there any outliers? If so, how do they impact the mode? How would adding, removing, or changing the outliers affect the mode?
- Why is the mode helpful or not helpful as a description of the “average” in the income study?
- Now think about whether or not the mode is an accurate estimate of “average” in the case of Roger’s study.
- How may Roger’s study be similar to and different from the Example?
- What does the mode tell you in Roger’s case?
- What does the mode not tell you?
- What factors would make the mode helpful or not helpful as a description of the “average” in his study?
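The raw versus grouped mode example given earlier in this step, with the incomes {20001, 20002, 20003, 20004, 20000, 30000, 30000}, can be checked with a short sketch. The variable names are illustrative only, and rounding with `round(x, -3)` is just one assumed way of grouping to the nearest $1,000.

```python
from statistics import mode

# The seven incomes from the example in this step.
incomes = [20001, 20002, 20003, 20004, 20000, 30000, 30000]

# Mode of the raw data: the single most frequent exact value.
raw_mode = mode(incomes)

# Group each income to the nearest $1,000, then take the mode of the groups.
grouped = [round(x, -3) for x in incomes]
grouped_mode = mode(grouped)

print("mode of raw data:    ", raw_mode)      # 30000
print("mode of grouped data:", grouped_mode)  # 20000
```

Only two households share an exact income, so the raw mode is 30,000, yet five of the seven households fall in the 20,000 group once the values are rounded.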
- Finally, look at the last measure of central tendency – the median. To get a better idea of what the median tells you, imagine you are looking at the data for the first time.
- Refer back to the Resources section to review the definition of the median.
- Now look at the graph. Can you guess what the median is, based on the graph alone?
- Look at the lowest value (“minimum”) and the highest value (“maximum”). Now determine approximately which value would have half the numbers above it and half the numbers below it. Would that number accurately reflect the average household income of Rose Park? Why or why not?
- Are there a lot of values clustered together on the right side of the chart, with a tail stretching to the left (left skewed), or clustered on the left side of the chart, with a tail stretching to the right (right skewed)?
- If the chart is left skewed (values clustered on the right), the median should fall more towards the right side of the chart.
- If the chart is right skewed (values clustered on the left), the median should fall more towards the left side of the chart.
- If the values are clustered in the middle and gradually decrease as you go to the right or left (more of a “normal distribution”) then the median should also fall in the middle of the chart.
- Now look back at the data and statistics sheet. How close was your guess to the actual median? If your guess was quite different from the actual median, why do you think it was?
- Experiment with the data and change the first 4 numbers to be $20,000 and the next 5 numbers to be $35,000. (Note: You may need to refer back to the original version in order for these changes to have a noticeable effect.)
- How does that change the median?
- How much did it change the median?
- Why does it change the median?
- What do you have to do to make the median increase? Decrease?
- When you changed the data, how did the mean and mode change? Why did each one change?
- What happens to the graph as you change the data and what does it tell you?
- Does the shape change? Is it more or less skewed than before?
- Think about whether or not the median is an accurate estimate of “average” in the Household Income for Rose Park Example.
- What does the median tell you in this case?
- What does the median not tell you?
- Are there any outliers? If so, how do they impact the median? How would adding, removing, or changing the outliers affect the median?
- Why is the median helpful or not helpful as a description of the “average” in the income study?
- Now that you have experimented with the data in the Example, think about whether or not the median is an accurate estimate of “average” in the case of Roger’s study.
- How may Roger’s study be similar to and different from the Example?
- What does the median tell you in Roger’s case?
- What does the median not tell you?
- What factors would make the median helpful or not helpful as a description of the “average” in his study?
- Think back on all the changes you made to the data and how different changes affected the mean, median, and mode.
- Which factors affected each calculation the most? The least?
- Were there factors that affected more than one calculation?
- How were the effects you noticed while playing with the data related to factors you may have read about in other sources that affect each of the statistics? If, in your research, you came across factors that affect the mean, median, or mode that you didn’t notice while working through the Example, go back and play around with the numbers to see the effect. Doing so will help make the concepts more concrete in your mind.
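One way to review how a single set of changes affects all three measures at once is a small before-and-after comparison. The sketch below is only an illustration: `original` is an invented, already sorted list of twelve incomes (not the Rose Park data), `summarize` is a hypothetical helper, and the change applied is the one from the median experiment (first 4 values set to $20,000, next 5 set to $35,000).

```python
from statistics import mean, median, multimode


def summarize(values):
    """Return the three measures of central tendency for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # a list, since there can be ties
    }


# Hypothetical incomes in dollars, sorted low to high (NOT the Rose Park data).
original = [24000, 27000, 30000, 33000, 40000, 45000,
            45000, 48000, 51000, 55000, 60000, 82000]

# Set the first 4 values to 20,000 and the next 5 to 35,000;
# keep the remaining values unchanged.
modified = [20000] * 4 + [35000] * 5 + original[9:]

before = summarize(original)
after = summarize(modified)

for name in ("mean", "median", "mode"):
    print(f"{name:>6}: {before[name]} -> {after[name]}")
```

With these made-up values the mean drops from 45,000 to roughly 37,667, and both the median and the mode move from 45,000 to 35,000. Trying other changes in the same way makes it easier to see which measure responds to which kind of change.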
Draft an Email Response
- On your own, draft an email response explaining to Roger how he would determine which statistic(s) to use to report out the “average” in his study.
- Make sure to identify which factors are important to consider when making the decision about which measure(s) of central tendency to report. Equally important, think about what important information is not included in the measure(s) you selected. If the reader only had the statistics you suggested, what information would they be missing?
- One important way to explain a statistical concept is to provide an example (such as the Household Income for Rose Park Example provided here). This helps people understand abstract concepts in the context of something they are more familiar with. Think about a similar example that you could use to explain the differences between mean, median and mode to Roger. You may use either the Household Income example or another example of your choosing, or a combination of the two to include in your response.
- Make sure your email response:
- Is clear and concise – it should be easy to understand and not include any information that would be distracting to the reader.
- Answers the question – “how should I determine which of these calculations to use and why?”
- Includes an example(s) that illustrates the differences between mean, median and mode, noting when and how you might use each to report the “average.” The example will help illustrate what each of the measures can and cannot tell you about the data.
- Conduct a Peer Review.
- Exchange the email response you wrote with your peer.
- Review one another’s email response to determine if the email meets the requirements listed above. If an email response does not meet all of the requirements, help one another identify how it can be modified to be more clear and/or complete.
- While it is appropriate to work together on this task, make each of your responses unique, reflecting your own work and thinking.
- Submit your individual response to your mentor. Review the checklist located in the Submit Your Work section of this task before submitting your response to your mentor.
Resources
Stats Task 2.1 Resources
The resources below will help you get started on this task. You may also use other resources that you've used in this or other rotations, or do additional research to help clarify concepts or to gain a deeper understanding of the subject matter. View the General Skills Resources link on the left for more information on research, including evaluating web resources.
Household Income for Rose Park Example
A spreadsheet with data and a graph that is used in the step-by-step guide of this task.
Mean, Median, Mode and Range
This overview provides in-depth descriptions of these statistical terms.
Measures of Central Tendency
More definitions of Central Tendency as well as a Glossary of other terms used in statistics.
Outlier Definition
A resource from Wikipedia on outliers, including a definition and some discussion of how outliers can affect the mean.
Examples of Skewed Graphs
Visual examples of skewed graphs with commentary including short descriptions on the relation of the mean and median in skewed graphs.
Population vs. Sample
This resource provides examples of populations vs. samples and explains the importance of random sampling.
Sources of Bias
Descriptions of different types of bias in sampling, including examples of each.