Residual Analysis#
TODO
Instructions#
Download the csv dataset in the Data Set section and place it in the Linux Files folder on your file system where you save your .py scripts.
Create a Python .py script named LASTNAME_FIRSTNAME_project_eight.py in your Linux Files folder on your file system. You can do this by opening an IDLE session, creating a new file and then saving it. Replace LASTNAME and FIRSTNAME with your last and first name, respectively.
Create a docstring at the very top of the script file. Keep all written answers in this area of the script.
Read the Background section.
Read the Loading In Data section.
Load in the data from the .csv file using the technique outlined in the Loading In Data section.
Perform all exercises and answer all questions in the Project section. Label your script with comments as indicated in the instructions of each problem.
When you are done, zip your script and the csv file in a zip file named LASTNAME_FIRSTNAME_project_eight.zip
Upload the zip file to the Google Classroom Project Four Assignment.
Loading In Data#
The following code snippet loads a CSV spreadsheet named example.csv, parses it into a list of rows and then prints the first column to the screen, assuming the CSV file is saved in the same folder as your script. Modify this code snippet to fit the datasets in this lab and then use it to load the provided datasets in the Data Set section.
import csv

# read in data
with open('example.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    raw_data = [ row for row in csv_reader ]

# separate the header row from the data rows
headers = raw_data[0]
rows = raw_data[1:]

# grab first column from csv file and ensure it's a number (not a string)
column_1 = [ float(row[0]) for row in rows ]
print(column_1)
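Most of the datasets in this lab need more than one numeric column, so the snippet above can be wrapped in a small helper. The following is only a sketch: the function name `load_columns` is hypothetical, and an in-memory string stands in for the real file so the example runs without any download. In your script you would pass the result of `open(...)` on your own csv file instead.

```python
import csv
import io

def load_columns(csv_file, indices):
    """Return (headers, columns); columns holds one list of floats per index.

    csv_file is any open file-like object. indices are zero-based,
    so the first column of the spreadsheet is index 0.
    """
    rows = list(csv.reader(csv_file))
    headers, data = rows[0], rows[1:]
    columns = [[float(row[i]) for row in data] for i in indices]
    return headers, columns

# Stand-in for an opened csv file (assumption: comma-separated, one header row).
sample = io.StringIO("eruptions,waiting\n3.6,79\n1.8,54\n3.333,74\n")

headers, (eruptions, waiting) = load_columns(sample, [0, 1])
print(headers)
print(eruptions)
print(waiting)
```

Note that `float()` raises a `ValueError` on non-numeric text, so only request columns that contain numbers.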
Background#
TODO
Old Faithful#
TODO
Kentucky Derby#
TODO
Project#
TODO
RESIDUAL ANALYSIS!
Data Set#
Celebrity Twitter#
You can download the full dataset here.
The following table is a preview of the data you will be using for this project.
| twitter_username | twitter_userid | domain | name | followers_count | tweet_count |
| --- | --- | --- | --- | --- | --- |
| BarackObama | 813286 | obamabook.com | BarackObama | 13444655 | 16467 |
| justinbieber | 27260086 | smarturl.it | Justin Bieber | 114357427 | 31399 |
| katyperry | 21447363 | katyperry.com | KATY PERRY | 108900656 | 11625 |
| rihanna | 79293791 | rihannanow.com | Rihanna | 106201663 | 10630 |
| Cristiano | 155659213 | | Cristiano Ronaldo | 99274403 | 3780 |
| taylorswift13 | 17919972 | grmypro.co | Taylor Swift | 90373941 | 716 |
| ladygaga | 14230524 | | The Countess | 84576292 | 9744 |
| elonmusk | 44196397 | | Elon Musk | 82898543 | 17487 |
| TheEllenShow | 15846407 | ellentube.com | Ellen DeGeneres | 77595645 | 23819 |
The fifth column represents the number of followers for a given Twitter user. The sixth column represents the number of tweets for a given Twitter user.
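Because Python lists are zero-indexed, the fifth and sixth columns are at indices 4 and 5 of each row. The sketch below illustrates this with two rows copied from the preview. It assumes the csv file is comma-separated, uses the header names shown in the preview, and stores a missing domain as an empty field; check your downloaded file before relying on any of these.

```python
import csv
import io

# Two preview rows, written the way they are assumed to appear in the csv file.
sample = io.StringIO(
    "twitter_username,twitter_userid,domain,name,followers_count,tweet_count\n"
    "justinbieber,27260086,smarturl.it,Justin Bieber,114357427,31399\n"
    "Cristiano,155659213,,Cristiano Ronaldo,99274403,3780\n"
)

rows = list(csv.reader(sample))
data = rows[1:]  # skip the header row

# Fifth column -> index 4, sixth column -> index 5
followers = [int(row[4]) for row in data]
tweets = [int(row[5]) for row in data]

print(followers)
print(tweets)
```

The empty domain field still occupies index 2, so the later columns keep their positions.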
Old Faithful#
You can download the full dataset here.
The following table is a preview of the data you will be using for this project.
| eruptions | waiting |
| --- | --- |
| 3.6 | 79 |
| 1.8 | 54 |
| 3.333 | 74 |
| 2.283 | 62 |
| 4.533 | 85 |
| 2.883 | 55 |
| 4.7 | 88 |
| 3.6 | 85 |
| 1.95 | 51 |
| 4.35 | 85 |
| 1.833 | 54 |
| 3.917 | 84 |
| 4.2 | 78 |
| 1.75 | 47 |
| 4.7 | 83 |
| 2.167 | 52 |
The first column represents the length of the eruption in minutes. The second column represents the waiting time in minutes until the next eruption.
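As a rough illustration of how these two columns could feed a residual analysis, the sketch below fits a least-squares line to the first five preview rows and computes each residual as the observed waiting time minus the predicted one. This is one common definition of a residual and is an assumption here; the Project section may ask for a different model or procedure.

```python
# First five preview rows from the table: eruption length and waiting time.
eruptions = [3.6, 1.8, 3.333, 2.283, 4.533]
waiting = [79, 54, 74, 62, 85]

n = len(eruptions)
mean_x = sum(eruptions) / n
mean_y = sum(waiting) / n

# Least-squares fit of: waiting = intercept + slope * eruptions
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(eruptions, waiting))
sxx = sum((x - mean_x) ** 2 for x in eruptions)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the line
predicted = [intercept + slope * x for x in eruptions]
residuals = [y - p for y, p in zip(waiting, predicted)]

print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding, which is a quick sanity check on the arithmetic.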
Kentucky Derby Winning Times#
You can download the full dataset here.
The following table is a preview of the data you will be using for this project.
| year | winner | jockey | trainer | owner | distance | track_condition | time_string | time_sec | triple_crown_winner |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022 | Rich Strike | Sonny Leon | Eric Reed | RED TR-Racing | 1.25 | Fast | 2:02.61 | 122.61 | FALSE |
| 2021 | Mandaloun | Florent Geroux | Brad H. Cox | Juddmonte Farm | 1.25 | Fast | 2:01.02 | 121.02 | FALSE |
| 2020 | Authentic | John Velazquez | Bob Baffert | Spendthrift Farm LLC, MyRaceHorse Stable, Madaket Stables LLC, Starlight Racing | 1.25 | Fast | 2:00.61 | 120.61 | FALSE |
The first column represents the year of the race. The ninth column represents the winning time in seconds.
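With ten columns, counting indices is error-prone, so one option is to read the two needed columns by header name using `csv.DictReader`. The sketch below uses two preview rows and assumes that the csv file uses the header names shown in the preview and that owner values containing commas are wrapped in double quotes (the usual CSV convention, which the `csv` module handles). Verify both against the downloaded file.

```python
import csv
import io

# Two preview rows, written the way they are assumed to appear in the csv file.
sample = io.StringIO(
    "year,winner,jockey,trainer,owner,distance,track_condition,"
    "time_string,time_sec,triple_crown_winner\n"
    "2022,Rich Strike,Sonny Leon,Eric Reed,RED TR-Racing,1.25,Fast,"
    "2:02.61,122.61,FALSE\n"
    '2020,Authentic,John Velazquez,Bob Baffert,'
    '"Spendthrift Farm LLC, MyRaceHorse Stable, Madaket Stables LLC, Starlight Racing",'
    "1.25,Fast,2:00.61,120.61,FALSE\n"
)

years = []
times = []
for row in csv.DictReader(sample):
    # Each row is a dict keyed by the header names
    years.append(int(row["year"]))
    times.append(float(row["time_sec"]))

print(years)
print(times)
```

Reading by name keeps the script working even if the column order in the file differs from the preview.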