Graphical Representations of Data

Electric Vehicle Distribution

The United States Government General Services Administration maintains a huge database of publicly available information. One of the datasets they publish covers the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through the Washington State Department of Licensing.

A random sample of 18 electric vehicles was selected from this dataset and the make of each car was recorded. The results are given below:

BMW, FORD, CHEVROLET, TESLA, FORD, BMW, VOLVO, TESLA, TESLA, NISSAN, CHEVROLET, BMW, TESLA, NISSAN, CHEVROLET, TESLA, TESLA, CHEVROLET

  1. What is the population being sampled in this experiment?

  2. Construct a relative frequency distribution and histogram for this sample of data. Label the axes. What is the most common (frequent) electric vehicle make in this sample of data?

  3. Based on this sample, can we conclude your answer to #2 is the most common electric vehicle make in the United States? Why or why not?
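One way to check a hand-built table for question 2 is to tally the sample in Python. The following is a minimal sketch using only the standard library; the variable names (`makes`, `relative_frequency`, `most_common_make`) are illustrative and not part of the original dataset.

```python
from collections import Counter

# The 18 sampled makes, copied from the list above.
makes = [
    "BMW", "FORD", "CHEVROLET", "TESLA", "FORD", "BMW",
    "VOLVO", "TESLA", "TESLA", "NISSAN", "CHEVROLET", "BMW",
    "TESLA", "NISSAN", "CHEVROLET", "TESLA", "TESLA", "CHEVROLET",
]

n = len(makes)
counts = Counter(makes)

# Relative frequency = class frequency / sample size.
relative_frequency = {make: count / n for make, count in counts.most_common()}

for make, rel in relative_frequency.items():
    print(f"{make:<10} {counts[make]:>2}  {rel:.3f}")

# The make with the largest frequency in this sample.
most_common_make = counts.most_common(1)[0][0]
print("Most frequent make:", most_common_make)
```

The relative frequencies should sum to 1, which is a quick sanity check on any table built by hand.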

Note

We will come back and examine this dataset again at various points in the class.

Twitter by Tweets and Followers

Important

The data will be downloaded directly onto your calculator in class via a cable link.

If you want to try out the dataset in Python at any point, you can download the full dataset here.

The following table lists the top 20 Twitter users by follower count (as of 2022).

Source: Kaggle

| Twitter Username | Name | Followers Count | Tweet Count |
|---|---|---|---|
| BarackObama | Barack Obama | 13444655 | 16467 |
| justinbieber | Justin Bieber | 114357427 | 31399 |
| katyperry | KATY PERRY | 108900656 | 11625 |
| rihanna | Rihanna | 106201663 | 10630 |
| Cristiano | Cristiano Ronaldo | 99274403 | 3780 |
| taylorswift13 | Taylor Swift | 90373941 | 716 |
| ladygaga | The Countess | 84576292 | 9744 |
| elonmusk | Elon Musk | 82898543 | 17487 |
| TheEllenShow | Ellen DeGeneres | 77595645 | 23819 |
| KimKardashian | Kim Kardashian | 72170641 | 35405 |
| selenagomez | Selena Gomez | 65791273 | 5388 |
| jtimberlake | Justin Timberlake | 63231222 | 4348 |
| BillGates | Bill Gates | 58128522 | 3848 |
| neymarjr | Neymar Jr | 56508547 | 42181 |
| britneyspears | Britney Spears | 55863848 | 5632 |
| ddlovato | Demi Lovato | 54260759 | 17793 |
| shakira | Shakira | 52803100 | 7805 |
| jimmyfallon | jimmy fallon | 51467139 | 14098 |
| KingJames | LeBron James | 51182528 | 8347 |
| imVkohli | Virat Kohli | 47768699 | 2593 |

Use this information to answer the following questions.

  1. Construct a frequency distribution for the follower count of the top twenty Twitter users. Use 8 classes for this sample of data.

    1. Find the class width.

    2. Find the class limits.

    3. Find the class frequencies.

    4. Using the frequency distribution, create a histogram of the sample. Sketch the histogram in your classwork notebook. Make sure to label the axes and classes.

    5. What is the shape of this distribution? What does this tell you about this sample?

  2. Construct a cumulative frequency distribution for the 8 classes.

    1. Using the cumulative frequency distribution, create a cumulative histogram of the sample. Sketch the cumulative histogram in your classwork notebook. Make sure to label the axes and classes.

    2. Based on this histogram and distribution, what percent of the distribution is below the seventh class? Justify your answer.

    3. Based on this histogram and distribution, estimate the median of this distribution. Justify your answer.

    4. Based on this histogram and distribution, estimate the 25th percentile of this distribution, i.e., find the class where at least 25 percent of the distribution is below it.

  3. Construct a boxplot using the follower count for the top twenty Twitter users.

    1. Sketch the boxplot in your classwork notebook. Be sure to label the axes. In addition, label the points on the graph that correspond to the five-number summary: minimum, first quartile, median, third quartile, maximum.

    2. Based on this boxplot, what is the 75th percentile of this distribution?

    3. Based on this boxplot, what is the 25th percentile of this distribution?

    4. Compare your answers in parts b and c with your answers to #2b and #2d. Why are these answers different? Which answer is more accurate? Why?

    5. Is the median of this distribution closer to the minimum value of the sample or the maximum value of the sample?

  4. Using follower count as the response variable and tweet count as the explanatory variable, create a scatter plot of follower count vs. tweet count.

    1. Sketch the scatter plot in your classwork notebook. Label your axes.

    2. Describe the type of correlation seen in this graph. What does this tell you about the relationship between follower count and tweet count?
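If you want to check your calculator work for questions 1 through 3 in Python, the sketch below builds the 8-class frequency distribution, the cumulative frequencies, and a five-number summary. It rests on several assumptions: the follower counts are copied from the table exactly as printed; the class width is the range divided by the number of classes, rounded up to the next whole number (one common textbook convention); and the quartiles are the medians of the lower and upper halves of the sorted data. Other tools may use different quartile rules and give slightly different values. All variable names are illustrative.

```python
from statistics import median

# Follower counts copied from the table above, in table order.
followers = [
    13444655, 114357427, 108900656, 106201663, 99274403,
    90373941, 84576292, 82898543, 77595645, 72170641,
    65791273, 63231222, 58128522, 56508547, 55863848,
    54260759, 52803100, 51467139, 51182528, 47768699,
]

NUM_CLASSES = 8

# Class width: range / number of classes, rounded up to the next whole
# number (an assumed convention; your textbook may round differently).
low, high = min(followers), max(followers)
class_width = (high - low) // NUM_CLASSES + 1

# Lower and upper class limits for whole-number data.
lower_limits = [low + i * class_width for i in range(NUM_CLASSES)]
upper_limits = [lower + class_width - 1 for lower in lower_limits]

# Class frequencies: how many values fall inside each pair of limits.
frequencies = [
    sum(lower <= x <= upper for x in followers)
    for lower, upper in zip(lower_limits, upper_limits)
]

# Cumulative frequencies: running total of the class frequencies.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

# Five-number summary using the median-of-halves rule for quartiles.
data = sorted(followers)
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[-half:])
five_number = (data[0], q1, median(data), q3, data[-1])

for lower, upper, f, c in zip(lower_limits, upper_limits, frequencies, cumulative):
    print(f"{lower:>11,} - {upper:>11,}   freq = {f}   cum = {c}")
print("Five-number summary:", five_number)
```

Dividing a cumulative frequency by the sample size (20) gives the percent of the sample at or below that class, which is the quantity questions 2b through 2d ask you to read off the cumulative histogram.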

A.P. Exam Practice

  1. 2021, Free Response, #5a-b

A research center conducted a national survey about teenage behavior. Teens were asked whether they had consumed a soft drink in the past week. The following table shows the counts for three independent random samples from major cities.

| | Baltimore | Detroit | San Diego | Total |
|---|---|---|---|---|
| Yes | 727 | 1,232 | 1,482 | 3,441 |
| No | 177 | 431 | 798 | 1,406 |
| Total | 904 | 1,663 | 2,280 | 4,847 |

  1. Suppose one teen is randomly selected from each city’s sample. A researcher claims that the likelihood of selecting a teen from Baltimore who consumed a soft drink in the past week is less than the likelihood of selecting a teen from either one of the other cities who consumed a soft drink in the past week because Baltimore has the least number of teens who consumed a soft drink. Is the researcher’s claim correct? Explain your answer.

  2. Consider the values in the table. Construct a segmented bar chart of relative frequencies based on the information in the table.

  3. Which city had the smallest proportion of teens who consumed a soft drink in the previous week? Determine the value of the proportion.
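The distinction between counts and proportions in this problem can be checked with a few lines of Python. This is only a sketch for verifying arithmetic; the dictionary names are illustrative, and the counts are taken from the table above.

```python
# Counts from the table: teens who did / did not consume a soft drink.
yes_counts = {"Baltimore": 727, "Detroit": 1232, "San Diego": 1482}
no_counts = {"Baltimore": 177, "Detroit": 431, "San Diego": 798}

# Relative frequency of "Yes" within each city's own sample.
proportion_yes = {
    city: yes_counts[city] / (yes_counts[city] + no_counts[city])
    for city in yes_counts
}

for city, p in proportion_yes.items():
    print(f"{city:<10} yes = {p:.3f}   no = {1 - p:.3f}")

# City with the smallest proportion of "Yes" responses.
smallest_city = min(proportion_yes, key=proportion_yes.get)
print("Smallest proportion:", smallest_city)
```

Each city's "yes" and "no" proportions sum to 1, which is why every bar in a segmented bar chart of relative frequencies has the same total height.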

  2. 2021, Free Response, #6

Attendance at games for a certain baseball team is being investigated by the team owner. The following boxplots summarize the attendance, measured as average number of attendees per game, for 47 years of the team’s existence. The boxplots include the 30 years of games played in the old stadium and the 17 years played in the new stadium.

../../../_images/2021_apstats_frp_06a.png

Use this graph to answer the following question.

  1. Compare the distributions of average attendance between the old and new stadiums.

The following scatterplot shows average attendance versus year.

../../../_images/2021_apstats_frp_06b.png

Use this graph to answer the following question.

  2. Compare the trends in average attendance over time between the old and new stadiums.

Consider the following scatterplots.

../../../_images/2021_apstats_frp_06c.png

Use these graphs to answer the following questions.

  3. Graph I shows the average attendance versus number of games won for each year. Describe the relationship between the variables.

  4. Graph II shows the same information as Graph I, but also indicates the old and new stadiums. Does Graph II suggest that the rate at which attendance changes as number of games won increases is different in the new stadium compared to the old stadium? Explain your reasoning.

  5. Consider the three variables: number of games won, year, and stadium. Based on the graphs, explain how one of those variables could be a confounding variable in the relationship between average attendance and the other variables.

  3. 2018, Free Response, #5a-b

The following histograms summarize the teaching year for the teachers at two high schools, A and B.

../../../_images/2018_apstats_frp_05.png

Teaching year is recorded as an integer, with first-year teachers recorded as 1, second-year teachers recorded as 2, and so on. Both sets of data have a mean teaching year of 8.2, with data recorded from 200 teachers at High School A and 221 teachers at High School B. On the histograms, each interval represents possible integer values from the left endpoint up to but not including the right endpoint.

  1. The median teaching year for one high school is 6, and the median teaching year for the other high school is 7. Identify which high school has each median and justify your answer.

  2. An additional 18 teachers were not included with the data recorded from the 200 teachers at High School A. The mean teaching year of the 18 teachers is 2.5. What is the mean teaching year for all 218 teachers at High School A?
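The combined mean in the second part is a weighted average: each group's mean is multiplied by its group size to recover the total number of teaching years, and the grand total is divided by the combined count. A short Python sketch of that arithmetic, with illustrative variable names, is:

```python
# Group sizes and means given in the problem statement.
n_original, mean_original = 200, 8.2
n_additional, mean_additional = 18, 2.5

# Mean * count recovers each group's total teaching years.
total_years = n_original * mean_original + n_additional * mean_additional

# Combined mean = grand total / combined number of teachers.
combined_mean = total_years / (n_original + n_additional)
print(round(combined_mean, 2))
```

Note that the result is not the simple average of 8.2 and 2.5, because the two groups have very different sizes.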

  4. 2017, Free Response, #4

The chemicals in clay used to make pottery can differ depending on the geographical region where the clay originated. Sometimes, archaeologists use a chemical analysis of clay to help identify where a piece of pottery originated. Such an analysis measures the amount of a chemical in the clay as a percent of the total weight of the piece of pottery. The boxplots below summarize analyses done for three chemicals—X, Y, and Z—on pieces of pottery that originated at one of three sites: I, II, or III.

../../../_images/2017_apstats_frp_04.png

Use this information to answer the following questions.

  1. For chemical Z, describe how the percents found in the pieces of pottery are similar and how they differ among the three sites.

  2. Consider a piece of pottery known to have originated at one of the three sites, but the actual site is not known.

    1. Suppose an analysis of the clay reveals that the sum of the percents of the three chemicals X, Y, and Z is 20.5%. Based on the boxplots, which site—I, II, or III—is the most likely site where the piece of pottery originated? Justify your choice.

    2. Suppose only one chemical could be analyzed in the piece of pottery. Which chemical—X, Y, or Z— would be the most useful in identifying the site where the piece of pottery originated? Justify your choice.

  5. 2017, Free Response, #1

Researchers studying a pack of gray wolves in North America collected data on the length x, in meters, from nose to tip of tail, and the weight y, in kilograms, of the wolves. A scatterplot of weight versus length revealed a relationship between the two variables described as positive, linear, and strong.

  1. For the situation described above, explain what is meant by each of the following words.

    1. Positive:

    2. Linear:

    3. Strong:

The data collected from the wolves were used to create the least-squares equation y = -16.46 + 35.02 x.

  2. Interpret the meaning of the slope of the least-squares regression line in context.

  3. One wolf in the pack with a length of 1.4 meters had a residual of -9.67 kilograms. What was the weight of the wolf?
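The last part relies on the definition of a residual: residual = observed value - predicted value, so the observed weight is the predicted weight plus the residual. The sketch below works through that arithmetic in Python using the equation and values stated in the problem; the variable names are illustrative.

```python
# Least-squares line from the problem: y = -16.46 + 35.02 x
intercept, slope = -16.46, 35.02

length_m = 1.4       # length of the wolf, in meters
residual_kg = -9.67  # residual = observed - predicted

# Weight predicted by the regression line for this length.
predicted_kg = intercept + slope * length_m

# Rearranging the residual definition: observed = predicted + residual.
actual_kg = predicted_kg + residual_kg

print(f"predicted = {predicted_kg:.3f} kg, actual = {actual_kg:.3f} kg")
```

A negative residual means the wolf weighs less than the line predicts for its length.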

  6. 2014, Free Response, #1

An administrator at a large university is interested in determining whether the residential status of a student is associated with level of participation in extracurricular activities. Residential status is categorized as on campus for students living in university housing and off campus otherwise. A simple random sample of 100 students in the university was taken, and each student was asked the following two questions.

  • Are you an on campus student or an off campus student?

  • In how many extracurricular activities do you participate?

The responses of the 100 students are summarized in the frequency table shown.

../../../_images/2014_apstats_frp_01a.png
  1. Calculate the proportion of on campus students in the sample who participate in at least one extracurricular activity and the proportion of off campus students in the sample who participate in at least one extracurricular activity.

    On campus proportion:

    Off campus proportion:

  2. The responses of the 100 students are summarized in the segmented bar graph shown.

../../../_images/2014_apstats_frp_01b.png

Write a few sentences summarizing what the graph reveals about the association between residential status and level of participation in extracurricular activities among the 100 students in the sample.

  7. 2015, Free Response, #1

Two large corporations, A and B, hire many new college graduates as accountants at entry-level positions. In 2009 the starting salary for an entry-level accountant position was $36,000 a year at both corporations. At each corporation, data were collected from 30 employees who were hired in 2009 as entry-level accountants and were still employed at the corporation five years later. The yearly salaries of the 60 employees in 2014 are summarized in the boxplots below.

../../../_images/2015_apstats_frp_1.png
  1. Write a few sentences comparing the distributions of the yearly salaries at the two corporations.

  2. Suppose both corporations offered you a job for $36,000 a year as an entry-level accountant. Based on the boxplots, give one reason why you might choose to accept the job at corporation A.

  3. Suppose both corporations offered you a job for $36,000 a year as an entry-level accountant. Based on the boxplots, give one reason why you might choose to accept the job at corporation B.

  8. 2016, Free Response, #2

Product advertisers studied the effects of television ads on children’s choices for two new snacks. The advertisers used two 30-second television ads in an experiment. One ad was for a new sugary snack called Choco-Zuties, and the other ad was for a new healthy snack called Apple-Zuties.

For the experiment, 75 children were randomly assigned to one of three groups, A, B, or C. Each child individually watched a 30-minute television program that was interrupted for 5 minutes of advertising. The advertising was the same for each group with the following exceptions.

  • The advertising for group A included the Choco-Zuties ad but not the Apple-Zuties ad.

  • The advertising for group B included the Apple-Zuties ad but not the Choco-Zuties ad.

  • The advertising for group C included neither the Choco-Zuties ad nor the Apple-Zuties ad.

After the program, the children were offered a choice between the two snacks. The table below summarizes their choices.

../../../_images/2016_apstats_frp_02.png
  1. Do the data provide convincing statistical evidence that there is an association between type of ad and children’s choice of snack among all children similar to those who participated in the experiment?

  2. Write a few sentences describing the effect of each ad on children’s choice of snack.

  9. 2009, Free Response, Form B, #1

../../../_images/2009_apstats_frp_formb_01.png

As gasoline prices have increased in recent years, many drivers have expressed concern about the taxes they pay on gasoline for their cars. In the United States, gasoline taxes are imposed by both the federal government and by individual states. The boxplot above shows the distribution of the state gasoline taxes, in cents per gallon, for all 50 states on January 1, 2006.

  1. Based on the boxplot, what are the approximate values of the median and the interquartile range of the distribution of state gasoline taxes, in cents per gallon? Mark and label the boxplot to indicate how you found the approximated values.

  2. The federal tax imposed on gasoline was 18.4 cents per gallon at the time the state taxes were in effect. The federal gasoline tax was added to the state gasoline tax for each state to create a new distribution of combined gasoline taxes. What are approximate values, in cents per gallon, of the median and interquartile range of the new distribution of combined gasoline taxes? Justify your answer.
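The second part turns on how adding a constant to every observation affects measures of center and spread. The sketch below illustrates the idea in Python. The `state_taxes` values are hypothetical numbers invented for the demonstration (the actual 2006 state taxes are not listed in this problem), and the helper `median_and_iqr` assumes the median-of-halves rule for quartiles.

```python
from statistics import median


def median_and_iqr(values):
    """Return (median, IQR), taking quartiles as medians of each half."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[-half:])
    return median(data), q3 - q1


# Hypothetical state taxes in cents per gallon (NOT the real 2006 data).
state_taxes = [8.0, 14.0, 17.5, 18.0, 20.0, 21.0, 23.0, 25.0, 28.0, 32.0]
FEDERAL_TAX = 18.4  # cents per gallon, from the problem statement

# Add the same federal tax to every state's tax.
combined = [tax + FEDERAL_TAX for tax in state_taxes]

state_median, state_iqr = median_and_iqr(state_taxes)
combined_median, combined_iqr = median_and_iqr(combined)

print(f"state:    median = {state_median:.1f}, IQR = {state_iqr:.1f}")
print(f"combined: median = {combined_median:.1f}, IQR = {combined_iqr:.1f}")
```

In this made-up example the median moves up by the added constant while the interquartile range stays the same, since both quartiles shift by the same amount.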