Bias#

In this lab, you will perform some graphical analysis on a famously biased data set and use statistical reasoning to draw conclusions about the method of observation used to generate the data.

Instructions#

1. Create a folder named LASTNAME_FIRSTNAME_project_three, replacing LASTNAME and FIRSTNAME with your last name and first name, respectively.
2. Download the csv dataset below and place it in the folder you created in step 1.
3. In the same folder, create a Microsoft Word document named project_three.docx.
4. In the same folder, create a Python script named project_three.py.
5. Read the Project section.
6. Answer the indicated questions from the Project section in the .docx file.
7. When you are done, zip the folder and all its contents into a file named LASTNAME_FIRSTNAME_project_three.zip.
8. Upload the zip file here: TODO

Loading In Data#

TODO

Background#

In 1969, 1970, 1971 and 1972, the Selective Service System in the United States held draft lotteries. The first, held in 1969 by order of President Richard Nixon, applied to men born between January 1, 1944 and December 31, 1950 *.

*: Vietnam War Draft Lottery source

Individuals born between these dates were to be selected at random and drafted into military service to fight in the Vietnam War.

Method of Observation#

The method used to select individuals for service is highly controversial. Many argued it was not truly random and unfairly selected certain groups of individuals over others.

365 days of the year were printed on sheets of paper and placed in a shoebox.

{ January 1, January 2, … , February 1, February 2, … , December 30, December 31 }

Slips of paper were then selected at random, and anyone of eligible age whose birthday fell on the date indicated would be drafted. The important point is that individuals who shared the same birthday would be drafted at the same time. For example, two men born on April 5th, 1946 and April 5th, 1947 would both be drafted if the slip of paper marked “April 5” was selected.
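For comparison, the drawing described above can be imitated in Python. The following is only a sketch of what a well-mixed drawing would look like, not a reconstruction of the actual procedure. The function name `simulate_fair_lottery`, the seed, and the choice of 1969 as a 365-day calendar year are assumptions made for illustration.

```python
import calendar
import random
from collections import Counter


def simulate_fair_lottery(seed=None, year=1969):
    """Return every (month, day) of the given year in a uniformly random order."""
    dates = [
        (month, day)
        for month in range(1, 13)
        # monthrange returns (weekday of first day, number of days in month)
        for day in range(1, calendar.monthrange(year, month)[1] + 1)
    ]
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    rng.shuffle(dates)         # every ordering of the slips is equally likely
    return dates


order = simulate_fair_lottery(seed=0)

# Count how many of the first 183 dates drawn fall in each month.
first_half = order[:183]
per_month = Counter(month for month, _ in first_half)
for month in range(1, 13):
    print(month, per_month[month])
```

Running the sketch with several different seeds gives a feel for how much the monthly counts vary by chance alone when the slips are thoroughly mixed.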

Project#

Discuss the following questions
- Is the selection method used for the draft random? Why or why not?
- If the selection method used for the draft were truly random, what shape would you expect a frequency distribution of the sample to have?
- Given the information provided on the selection method, what shape do you expect a frequency distribution of the sample to have?
- What are some possible sources of bias in the draft lottery? List the cases and identify the type of bias in each case.
Using the birth month of the drafted individual as the bins, construct histograms for the years 1969, 1970, 1971 and 1972. Include both the frequency distributions and the histograms in your report.
Based on the histograms constructed, describe the shape of the distribution for each year’s draft lottery.
- Are the graphs skewed, uniform, normal or bimodal?
- What is the mode of the birth month for each year?
- What can we conclude about the relative likelihood of a male with a birthday in January being drafted versus a male with a birthday in December being drafted in 1969? Does the same result appear to hold for 1970, 1971 and 1972?
- Discuss the results. Was the draft lottery fair? If not, why not? If so, why? Justify your answer.
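Once you have twelve monthly totals for a year, the mode and a rough picture of the shape can be checked in plain Python before producing the final graphs. This is a minimal sketch: the helper names `modal_months` and `text_histogram` are hypothetical, and `example_totals` contains made-up numbers, not values from the dataset.

```python
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def modal_months(totals):
    """Return the name of every month that shares the highest total."""
    peak = max(totals)
    return [MONTH_NAMES[i] for i, total in enumerate(totals) if total == peak]


def text_histogram(totals, width=40):
    """Return one line per month, with a bar scaled so the largest total spans `width` characters."""
    peak = max(totals)
    lines = []
    for name, total in zip(MONTH_NAMES, totals):
        bar = "#" * round(width * total / peak) if peak else ""
        lines.append(f"{name} {bar} {total}")
    return lines


# Made-up totals for illustration only; replace with your own monthly totals.
example_totals = [12, 9, 15, 11, 10, 15, 8, 13, 9, 14, 10, 7]

print("Mode:", modal_months(example_totals))
for line in text_histogram(example_totals):
    print(line)
```

A distribution can have more than one mode, which is why `modal_months` returns a list rather than a single month.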

Data Set#

You can download the full dataset here.

The following table is a preview of the data you will be using for this project.

Vietnam Draft Lottery Data#
M	D	N69	N70	N71	N72
1	1	305	133	207	150
1	2	159	195	225	328
1	3	251	336	246	42
1	4	215	99	264	28
1	5	101	33	265	338
1	6	224	285	242	36
1	7	306	159	292	111
1	8	199	116	287	206
1	9	194	53	338	197
1	10	325	101	231	37
1	11	329	144	90	174
1	12	221	152	228	126
1	13	318	330	183	298
1	14	238	71	285	341
1	15	17	75	325	221
1	16	121	136	74	309
11	1	19	243	366	107
11	2	34	205	190	214
11	3	348	294	300	232
11	4	266	39	166	339
11	5	310	286	211	223
11	6	76	245	186	211
11	7	51	72	17	299
11	8	97	119	260	312
11	9	80	176	237	151
11	10	282	63	227	257
11	11	46	123	244	159
11	12	66	255	259	66
11	13	126	272	247	124
11	14	127	11	316	237
11	15	131	362	318	176
11	16	107	197	120	209
11	17	143	6	298	284
11	18	146	280	175	160
11	19	203	252	333	270
11	20	185	98	125	301
11	21	156	35	330	287
11	22	9	253	93	102
11	23	182	193	181	320
11	24	230	81	62	180
11	25	132	23	97	25
11	26	309	52	209	344

The meaning of the columns is as follows.

M represents the birth month of the draftee,

M = 1, 2, 3, … , 11, 12

D represents the birth day of the draftee,

D = 1, 2, 3, … , 30, 31

And N69, N70, N71 and N72 represent the number of individuals selected with a given birth date in the years 1969, 1970, 1971 and 1972, respectively.
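One possible way to get these columns into Python lists is sketched below using the standard `csv` module. The sample text is the first three rows of the preview table, and the tab delimiter matches how the preview is displayed here; the downloaded file may use commas instead, so treat the delimiter, the helper name `load_columns`, and the variable names as assumptions.

```python
import csv
import io

# First three rows of the preview table, written out tab-separated.
# In the lab you would pass an opened file instead of this sample text.
SAMPLE = "\n".join([
    "M\tD\tN69\tN70\tN71\tN72",
    "1\t1\t305\t133\t207\t150",
    "1\t2\t159\t195\t225\t328",
    "1\t3\t251\t336\t246\t42",
])


def load_columns(handle, delimiter="\t"):
    """Read a delimited table and return a dict mapping each header to a list of ints."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, value in zip(header, row):
            columns[name].append(int(value))
    return columns


columns = load_columns(io.StringIO(SAMPLE))
column_1 = columns["M"]    # birth month
column_3 = columns["N69"]  # 1969 values
print(column_1)
print(column_3)
```

If the file is comma-separated, calling `load_columns(handle, delimiter=",")` should be the only change needed.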

Cleaning the Data Set#

The dataset is broken down by day. Each entry corresponds to a particular birth date, given as a month and a day. The lab asks you to group the data into monthly classes so that the frequency distribution can be visualized with a histogram binned by month. Therefore, the data needs to be grouped and totaled by month before generating a histogram.

The following code snippet will:

1. create a list named data_1969 of twelve 0’s, [0, 0, 0, ... , 0, 0], one for each month,
2. step through column_1 along with the row_number,
3. grab the corresponding entry of the third column, column_3[row_number],
4. add the value of the third column to the corresponding entry in data_1969.

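# One running total per month: index 0 is January, index 11 is December.
data_1969 = [0] * 12

# column_1 (birth month M) and column_3 (N69) are assumed to be loaded already.
for row_number, entry in enumerate(column_1):
    # Months run from 1 to 12, so subtract 1 to get the list index.
    data_1969[int(entry) - 1] += column_3[row_number]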
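Since the same grouping is needed for 1970, 1971 and 1972, the loop can be wrapped in a function and reused. The sketch below is one way to do that; the name `monthly_totals` is hypothetical, and the three sample rows (January 1, January 2 and November 1) are copied from the preview table only to keep the example short.

```python
def monthly_totals(months, values):
    """Sum `values` into twelve bins, one per birth month (1 = January, 12 = December)."""
    totals = [0] * 12
    for month, value in zip(months, values):
        totals[int(month) - 1] += value
    return totals


# Three rows from the preview table: January 1, January 2 and November 1.
months = [1, 1, 11]
n69 = [305, 159, 19]
n70 = [133, 195, 243]

totals_1969 = monthly_totals(months, n69)
totals_1970 = monthly_totals(months, n70)
print(totals_1969)
print(totals_1970)
```

Using `zip` pairs each month with its value directly, which avoids indexing the second column by row number.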
