Residual Analysis

TODO

Instructions

  1. Download the CSV datasets in the Data Set section and place them in the Linux Files folder on your file system where you save your .py scripts.

  2. Create a Python .py script named LASTNAME_FIRSTNAME_project_eight.py in your Linux Files folder on your file system. You can do this by opening an IDLE session, creating a new file and then saving it. Replace LASTNAME and FIRSTNAME with your last and first name, respectively.

  3. Create a docstring at the very top of the script file. Keep all written answers in this area of the script.

  4. Read the Background section.

  5. Read the Loading In Data section.

  6. Load in the data from the .csv files using the technique outlined in the Loading In Data section.

  7. Perform all exercises and answer all questions in the Project section. Label your script with comments as indicated in the instructions of each problem.

  8. When you are done, zip your script and the CSV files into a zip file named LASTNAME_FIRSTNAME_project_eight.zip.

  9. Upload the zip file to the Google Classroom Project Eight Assignment.
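Steps 8 and 9 can be done entirely with your file manager. If you would rather script step 8, Python's standard zipfile module can build the archive. This is an optional sketch: make_submission_zip is a hypothetical helper name, and the file names in the usage comment are placeholders for your own script and data files.

```python
import zipfile
from pathlib import Path


def make_submission_zip(zip_name, file_names):
    """Write each listed file into a new zip archive named zip_name."""
    with zipfile.ZipFile(zip_name, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        for name in file_names:
            # arcname stores only the file name, not the folder it came from
            archive.write(name, arcname=Path(name).name)


# Placeholder names: replace with your own script and CSV file names.
# make_submission_zip('LASTNAME_FIRSTNAME_project_eight.zip',
#                     ['LASTNAME_FIRSTNAME_project_eight.py', 'old_faithful.csv'])
```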

Loading In Data

The following code snippet loads a CSV file named example.csv (assumed to be saved in the same folder as your script), parses it into a list of rows, and prints the first column to the screen. Modify this code snippet to fit the datasets in this lab, then use it to load in the provided datasets from the Data Set section.

import csv

# read in data; newline='' lets the csv module handle line endings itself
with open('example.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    raw_data = [ row for row in csv_reader ]

# separate the header row from the data rows
headers = raw_data[0]
rows = raw_data[1:]

# grab first column from csv file and ensure it's a number (not a string)
column_1 = [ float(row[0]) for row in rows ]

print(column_1)
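The snippet above picks a column by its position. If you prefer to pick columns by their header name, the standard library's csv.DictReader can do that. This is a minimal sketch rather than a required approach; load_columns is a hypothetical helper name, and old_faithful.csv is a placeholder for whatever your downloaded file is actually called.

```python
import csv
import io


def load_columns(csv_file, names):
    """Return a dict mapping each requested header name to a list of floats.

    csv_file is any open text file (or file-like object) whose first row
    is a header row. load_columns is a hypothetical helper, not part of
    the csv module.
    """
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            # convert each cell from a string to a number
            columns[name].append(float(row[name]))
    return columns


# With a real file (old_faithful.csv is a placeholder name):
# with open('old_faithful.csv', newline='') as f:
#     data = load_columns(f, ['eruptions', 'waiting'])

# Self-contained demo using the first three Old Faithful preview rows
sample = io.StringIO("eruptions,waiting\n3.6,79\n1.8,54\n3.333,74\n")
data = load_columns(sample, ['eruptions', 'waiting'])
print(data['eruptions'])  # [3.6, 1.8, 3.333]
print(data['waiting'])    # [79.0, 54.0, 74.0]
```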

Background

TODO

Old Faithful

TODO

Kentucky Derby

TODO

Project

TODO

RESIDUAL ANALYSIS!
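A residual is the observed value of a data point minus the value a fitted model predicts for it (residual = observed y - predicted y). As a starting point, here is a minimal sketch assuming a straight-line model fitted by ordinary least squares; the project may call for a different model or tool. fit_line and residuals are hypothetical helper names, and the four (x, y) pairs are made-up numbers used only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus predicted y, one per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustration data, not from any of the project datasets
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(slope, intercept)
print(res)
# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point rounding), which is a handy sanity check.
print(sum(res))
```

Plotting the residuals against x is the usual next step: a patternless scatter around zero suggests the straight line is adequate, while a curve or a funnel shape suggests it is not.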

Data Set

Celebrity Twitter

You can download the full dataset here.

The following table is a preview of the data you will be using for this project.

Celebrity Twitter Followers and Tweet Count

| twitter_username | twitter_userid | domain | name | followers_count | tweet_count |
|---|---|---|---|---|---|
| BarackObama | 813286 | obamabook.com | BarackObama | 13444655 | 16467 |
| justinbieber | 27260086 | smarturl.it | Justin Bieber | 114357427 | 31399 |
| katyperry | 21447363 | katyperry.com | KATY PERRY | 108900656 | 11625 |
| rihanna | 79293791 | rihannanow.com | Rihanna | 106201663 | 10630 |
| Cristiano | 155659213 | | Cristiano Ronaldo | 99274403 | 3780 |
| taylorswift13 | 17919972 | grmypro.co | Taylor Swift | 90373941 | 716 |
| ladygaga | 14230524 | | The Countess | 84576292 | 9744 |
| elonmusk | 44196397 | | Elon Musk | 82898543 | 17487 |
| TheEllenShow | 15846407 | ellentube.com | Ellen DeGeneres | 77595645 | 23819 |

The fifth column (followers_count) represents the number of followers for a given Twitter user. The sixth column (tweet_count) represents the number of tweets that user has posted.
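Because Python indexes lists from zero, the fifth and sixth columns are row[4] and row[5]. Some users in the preview show no domain; assuming the CSV file stores those as empty fields, csv.reader returns them as empty strings, so the positions of the later columns do not shift. A minimal sketch using two preview rows as inline sample text:

```python
import csv
import io

# Two rows from the preview table; Cristiano's domain field is empty.
sample = io.StringIO(
    "twitter_username,twitter_userid,domain,name,followers_count,tweet_count\n"
    "katyperry,21447363,katyperry.com,KATY PERRY,108900656,11625\n"
    "Cristiano,155659213,,Cristiano Ronaldo,99274403,3780\n"
)

rows = list(csv.reader(sample))[1:]  # skip the header row

# fifth column -> index 4, sixth column -> index 5
followers = [int(row[4]) for row in rows]
tweets = [int(row[5]) for row in rows]

print(followers)  # [108900656, 99274403]
print(tweets)     # [11625, 3780]
```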

Old Faithful

You can download the full dataset here.

The following table is a preview of the data you will be using for this project.

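Old Faithful Eruption and Waiting Times

| eruptions | waiting |
|---|---|
| 3.6 | 79 |
| 1.8 | 54 |
| 3.333 | 74 |
| 2.283 | 62 |
| 4.533 | 85 |
| 2.883 | 55 |
| 4.7 | 88 |
| 3.6 | 85 |
| 1.95 | 51 |
| 4.35 | 85 |
| 1.833 | 54 |
| 3.917 | 84 |
| 4.2 | 78 |
| 1.75 | 47 |
| 4.7 | 83 |
| 2.167 | 52 |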

The first column represents the length of the eruption in minutes. The second column represents the waiting time in minutes until the next eruption.

Kentucky Derby Winning Times

You can download the full dataset here.

The following table is a preview of the data you will be using for this project.

Kentucky Derby Winning Times

| year | winner | jockey | trainer | owner | distance | track_condition | time_string | time_sec | triple_crown_winner |
|---|---|---|---|---|---|---|---|---|---|
| 2022 | Rich Strike | Sonny Leon | Eric Reed | RED TR-Racing | 1.25 | Fast | 2:02.61 | 122.61 | FALSE |
| 2021 | Mandaloun | Florent Geroux | Brad H. Cox | Juddmonte Farm | 1.25 | Fast | 2:01.02 | 121.02 | FALSE |
| 2020 | Authentic | John Velazquez | Bob Baffert | Spendthrift Farm LLC, MyRaceHorse Stable, Madaket Stables LLC, Starlight Racing | 1.25 | Fast | 2:00.61 | 120.61 | FALSE |

The first column represents the year of the race. The ninth column represents the winning time in seconds.
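The first column is row[0] and the ninth column is row[8]. Two details are worth handling explicitly. The owner field can contain commas; in a well-formed CSV file such a field is quoted, and csv.reader then returns it as a single field. The triple_crown_winner column holds text, and any non-empty string (including 'FALSE') is truthy in Python, so it should be compared as a string. This sketch assumes the true value is spelled 'TRUE' and uses two preview rows as inline sample text; time_string_to_seconds is a hypothetical helper showing how time_string relates to time_sec.

```python
import csv
import io


def time_string_to_seconds(time_string):
    """Convert a 'minutes:seconds' string such as '2:02.61' to seconds."""
    minutes, seconds = time_string.split(':')
    return int(minutes) * 60 + float(seconds)


# Two preview rows; the owner field contains commas, so it is quoted here
sample = io.StringIO(
    "year,winner,jockey,trainer,owner,distance,track_condition,"
    "time_string,time_sec,triple_crown_winner\n"
    "2022,Rich Strike,Sonny Leon,Eric Reed,RED TR-Racing,1.25,Fast,"
    "2:02.61,122.61,FALSE\n"
    '2020,Authentic,John Velazquez,Bob Baffert,"Spendthrift Farm LLC, '
    'MyRaceHorse Stable, Madaket Stables LLC, Starlight Racing",1.25,Fast,'
    "2:00.61,120.61,FALSE\n"
)

rows = list(csv.reader(sample))[1:]  # skip the header row

years = [int(row[0]) for row in rows]      # first column
times = [float(row[8]) for row in rows]    # ninth column -> index 8
# assumes the file spells true as 'TRUE'; compare the text explicitly
triple_crown = [row[9] == 'TRUE' for row in rows]

print(years)         # [2022, 2020]
print(times)         # [122.61, 120.61]
print(triple_crown)  # [False, False]

# time_string and time_sec should agree (up to floating-point rounding)
print(time_string_to_seconds(rows[0][7]), times[0])
```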