-
School size and score results can have a negative effect on student performance. The larger the number of students, the lower success in grades.
-
The budget per student and spending levels are negatively correlated with scores.
-
Students from Charter schools perform better than students from District schools. The data shows a significant difference with the overall passing rate among students between Charter and District schools. The largest Charter school has much better scores than the smallest District school. This data only takes into consideration large District and small Charter school size so there is probably a discrepancy with the two types of schools.
- My passing score and definition of a passing rate is an assumption. I decided to set it at a score of 65 based on Google's top search results for American High Schools. I did not look up Charter or District school categories.
# dependencies
import pandas as pd
import numpy as np
# csv file location
schools_rdata = "raw_data/schools_complete.csv"
students_rdata = "raw_data/students_complete.csv"
# format decimals
pd.options.display.float_format = '{:,.2f}'.format
# read schools data
schools_df = pd.read_csv(schools_rdata)
schools_df.head()
# rename column 'name' to 'school'
school = schools_df.rename(columns={"name": "school"})
school.head()
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School ID | school | type | size | budget | |
---|---|---|---|---|---|
0 | 0 | Huang High School | District | 2917 | 1910635 |
1 | 1 | Figueroa High School | District | 2949 | 1884411 |
2 | 2 | Shelton High School | Charter | 1761 | 1056600 |
3 | 3 | Hernandez High School | District | 4635 | 3022020 |
4 | 4 | Griffin High School | Charter | 1468 | 917500 |
# read students data
students_df = pd.read_csv(students_rdata)
students_df.head()
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Student ID | name | gender | grade | school | reading_score | math_score | |
---|---|---|---|---|---|---|---|
0 | 0 | Paul Bradley | M | 9th | Huang High School | 66 | 79 |
1 | 1 | Victor Smith | M | 12th | Huang High School | 94 | 61 |
2 | 2 | Kevin Rodriguez | M | 12th | Huang High School | 90 | 60 |
3 | 3 | Dr. Richard Scott | M | 12th | Huang High School | 67 | 58 |
4 | 4 | Bonnie Ray | F | 9th | Huang High School | 97 | 84 |
# find totals for district:
school_list = students_df['school'].unique()
total_schools = len(school_list)
total_students = students_df['name'].count()
total_budget = schools_df['budget'].sum()
# find averages for scores:
ave_math_score=students_df['math_score'].mean()
ave_reading_score=students_df['reading_score'].mean()
# locate and count passing (i.e. better than 65, based on US high school data) scores
math_pass_df = students_df.loc[students_df["math_score"] >= 65, :]
math_pass_count = math_pass_df['math_score'].count()
reading_pass_df = students_df.loc[students_df["reading_score"] >= 65, :]
reading_pass_count = reading_pass_df['reading_score'].count()
# calculate percentages:
percent_pass_math = math_pass_count/total_students*100
percent_pass_reading = reading_pass_count/total_students*100
percent_pass_overall = (math_pass_count + reading_pass_count)/total_students*50
# build district summary dataframe:
district_breakdown = pd.DataFrame({"Total Schools": [total_schools],
"Total Students": [total_students],
"Total Budget": [total_budget],
"Average Math Score": [ave_math_score],
"Average Reading Score": [ave_reading_score],
"% Passing Math":[percent_pass_math],
"% Passing Reading":[percent_pass_reading],
"% Overall Passing Rate": [percent_pass_overall]})
district_breakdown['Total Budget'] = district_breakdown['Total Budget'].map('${:,.2f}'.format)
district_breakdown=district_breakdown[['Total Schools','Total Students','Total Budget','Average Math Score', 'Average Reading Score','% Passing Math','% Passing Reading','% Overall Passing Rate']]
district_breakdown
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|
0 | 15 | 39170 | $24,649,428.00 | 78.99 | 81.88 | 84.73 | 96.20 | 90.46 |
# prepare schools df for merge with average students scores:
schools_df = schools_df.rename(index=str, columns={"name":"school"})
school_groups_df =students_df.groupby(["school"])
school_score_groups_df = school_groups_df[['school','math_score','reading_score']].mean().reset_index()
# merge schools data with average scores data:
aves_merge_df = pd.merge(school_score_groups_df, schools_df, on="school")
# find pass counts by school and merge with school summary df:
school_math_pass_group = math_pass_df.groupby(["school"])
school_math_pass_df = school_math_pass_group[['math_score']].count().reset_index()
school_reading_pass_group = reading_pass_df.groupby(["school"])
school_reading_pass_df = school_reading_pass_group[['reading_score']].count().reset_index()
pass_count_merge_df = pd.merge(aves_merge_df, school_math_pass_df, on="school")
pass_count_merge_df = pd.merge(pass_count_merge_df, school_reading_pass_df, on="school")
# complete per school summary with percentages columns and per student budgets:
pass_count_merge_df['% Passing Math'] = pass_count_merge_df['math_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Passing Reading'] = pass_count_merge_df['reading_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Overall Passing'] = (pass_count_merge_df['% Passing Math'] + pass_count_merge_df['% Passing Reading'])/2
pass_count_merge_df['Per Student Budget'] = pass_count_merge_df['budget']/pass_count_merge_df['size']
school_summary = pass_count_merge_df.rename(index=str, columns={"school":"School",'math_score_x':'Average Math Score',
'reading_score_x':'Average Reading Score','type':'School Type',
'size':'Total Students','budget':'Total Budget'})
school_summary = school_summary.drop(['math_score_y','reading_score_y','School ID'], axis = 1)
school_summary = school_summary[["School",'School Type','Total Students','Total Budget','Per Student Budget',
'Average Math Score', 'Average Reading Score','% Passing Math',
'% Passing Reading','% Overall Passing']]
school_summary = school_summary.set_index('School')
school_summary['Total Budget'] = school_summary['Total Budget'].map('${:,.2f}'.format)
school_summary['Per Student Budget'] = school_summary['Per Student Budget'].map('${:,.2f}'.format)
school_summary
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | |
---|---|---|---|---|---|---|---|---|---|
School | |||||||||
Bailey High School | District | 4976 | $3,124,928.00 | $628.00 | 77.05 | 81.03 | 77.91 | 94.55 | 86.23 |
Cabrera High School | Charter | 1858 | $1,081,356.00 | $582.00 | 83.06 | 83.98 | 100.00 | 100.00 | 100.00 |
Figueroa High School | District | 2949 | $1,884,411.00 | $639.00 | 76.71 | 81.16 | 77.18 | 94.54 | 85.86 |
Ford High School | District | 2739 | $1,763,916.00 | $644.00 | 77.10 | 80.75 | 78.20 | 93.87 | 86.04 |
Griffin High School | Charter | 1468 | $917,500.00 | $625.00 | 83.35 | 83.82 | 100.00 | 100.00 | 100.00 |
Hernandez High School | District | 4635 | $3,022,020.00 | $652.00 | 77.29 | 80.93 | 77.73 | 94.61 | 86.17 |
Holden High School | Charter | 427 | $248,087.00 | $581.00 | 83.80 | 83.81 | 100.00 | 100.00 | 100.00 |
Huang High School | District | 2917 | $1,910,635.00 | $655.00 | 76.63 | 81.18 | 77.72 | 94.48 | 86.10 |
Johnson High School | District | 4761 | $3,094,650.00 | $650.00 | 77.07 | 80.97 | 77.97 | 94.48 | 86.22 |
Pena High School | Charter | 962 | $585,858.00 | $609.00 | 83.84 | 84.04 | 100.00 | 100.00 | 100.00 |
Rodriguez High School | District | 3999 | $2,547,363.00 | $637.00 | 76.84 | 80.74 | 77.94 | 94.62 | 86.28 |
Shelton High School | Charter | 1761 | $1,056,600.00 | $600.00 | 83.36 | 83.73 | 100.00 | 100.00 | 100.00 |
Thomas High School | Charter | 1635 | $1,043,130.00 | $638.00 | 83.42 | 83.85 | 100.00 | 100.00 | 100.00 |
Wilson High School | Charter | 2283 | $1,319,574.00 | $578.00 | 83.27 | 83.99 | 100.00 | 100.00 | 100.00 |
Wright High School | Charter | 1800 | $1,049,400.00 | $583.00 | 83.68 | 83.95 | 100.00 | 100.00 | 100.00 |
# extract the top performing schools
top_five_schools = school_summary.sort_values(by=["% Overall Passing"], ascending=False)
top_five_schools.head(5)
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | |
---|---|---|---|---|---|---|---|---|---|
School | |||||||||
Cabrera High School | Charter | 1858 | $1,081,356.00 | $582.00 | 83.06 | 83.98 | 100.00 | 100.00 | 100.00 |
Griffin High School | Charter | 1468 | $917,500.00 | $625.00 | 83.35 | 83.82 | 100.00 | 100.00 | 100.00 |
Holden High School | Charter | 427 | $248,087.00 | $581.00 | 83.80 | 83.81 | 100.00 | 100.00 | 100.00 |
Pena High School | Charter | 962 | $585,858.00 | $609.00 | 83.84 | 84.04 | 100.00 | 100.00 | 100.00 |
Shelton High School | Charter | 1761 | $1,056,600.00 | $600.00 | 83.36 | 83.73 | 100.00 | 100.00 | 100.00 |
# lower five performing schools
lower_five_schools = top_five_schools.tail()
lower_five_schools = lower_five_schools.sort_values("% Overall Passing")
lower_five_schools
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
School Type | Total Students | Total Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing | |
---|---|---|---|---|---|---|---|---|---|
School | |||||||||
Figueroa High School | District | 2949 | $1,884,411.00 | $639.00 | 76.71 | 81.16 | 77.18 | 94.54 | 85.86 |
Ford High School | District | 2739 | $1,763,916.00 | $644.00 | 77.10 | 80.75 | 78.20 | 93.87 | 86.04 |
Huang High School | District | 2917 | $1,910,635.00 | $655.00 | 76.63 | 81.18 | 77.72 | 94.48 | 86.10 |
Hernandez High School | District | 4635 | $3,022,020.00 | $652.00 | 77.29 | 80.93 | 77.73 | 94.61 | 86.17 |
Johnson High School | District | 4761 | $3,094,650.00 | $650.00 | 77.07 | 80.97 | 77.97 | 94.48 | 86.22 |
# math scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["math_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["math_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["math_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["math_score"].mean()
math_scores = pd.DataFrame({
"9th": ninthgrade,
"10th": tenthgrade,
"11th": eleventhgrade,
"12th": twelfthgrade
})
math_scores = math_scores[["9th", "10th", "11th", "12th"]]
math_scores.index.name = " "
math_scores
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
9th | 10th | 11th | 12th | |
---|---|---|---|---|
Bailey High School | 77.08 | 77.00 | 77.52 | 76.49 |
Cabrera High School | 83.09 | 83.15 | 82.77 | 83.28 |
Figueroa High School | 76.40 | 76.54 | 76.88 | 77.15 |
Ford High School | 77.36 | 77.67 | 76.92 | 76.18 |
Griffin High School | 82.04 | 84.23 | 83.84 | 83.36 |
Hernandez High School | 77.44 | 77.34 | 77.14 | 77.19 |
Holden High School | 83.79 | 83.43 | 85.00 | 82.86 |
Huang High School | 77.03 | 75.91 | 76.45 | 77.23 |
Johnson High School | 77.19 | 76.69 | 77.49 | 76.86 |
Pena High School | 83.63 | 83.37 | 84.33 | 84.12 |
Rodriguez High School | 76.86 | 76.61 | 76.40 | 77.69 |
Shelton High School | 83.42 | 82.92 | 83.38 | 83.78 |
Thomas High School | 83.59 | 83.09 | 83.50 | 83.50 |
Wilson High School | 83.09 | 83.72 | 83.20 | 83.04 |
Wright High School | 83.26 | 84.01 | 83.84 | 83.64 |
# reading scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["reading_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["reading_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["reading_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["reading_score"].mean()
reading_scores = pd.DataFrame({
"9th": ninthgrade,
"10th": tenthgrade,
"11th": eleventhgrade,
"12th": twelfthgrade
})
reading_scores = reading_scores[["9th", "10th", "11th", "12th"]]
reading_scores.index.name = " "
reading_scores
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
9th | 10th | 11th | 12th | |
---|---|---|---|---|
Bailey High School | 81.30 | 80.91 | 80.95 | 80.91 |
Cabrera High School | 83.68 | 84.25 | 83.79 | 84.29 |
Figueroa High School | 81.20 | 81.41 | 80.64 | 81.38 |
Ford High School | 80.63 | 81.26 | 80.40 | 80.66 |
Griffin High School | 83.37 | 83.71 | 84.29 | 84.01 |
Hernandez High School | 80.87 | 80.66 | 81.40 | 80.86 |
Holden High School | 83.68 | 83.32 | 83.82 | 84.70 |
Huang High School | 81.29 | 81.51 | 81.42 | 80.31 |
Johnson High School | 81.26 | 80.77 | 80.62 | 81.23 |
Pena High School | 83.81 | 83.61 | 84.34 | 84.59 |
Rodriguez High School | 80.99 | 80.63 | 80.86 | 80.38 |
Shelton High School | 84.12 | 83.44 | 84.37 | 82.78 |
Thomas High School | 83.73 | 84.25 | 83.59 | 83.83 |
Wilson High School | 83.94 | 84.02 | 83.76 | 84.32 |
Wright High School | 83.83 | 83.81 | 84.16 | 84.07 |
# compare results based on per student spending:
spend_bins = [0, 580, 605, 630, 655]
spend_labels = ["Less than $580", "Medium $580-605", "Average $605-630", "Above Average $630-655"]
spend_categories = pd.cut(pass_count_merge_df["Per Student Budget"], spend_bins, labels=spend_labels)
breakdown_by_spending_df = pd.DataFrame({'Spend Level':spend_categories,
'Average Math Score':aves_merge_df['math_score'],
'Average Reading Score':aves_merge_df['reading_score'],
'% Passing Math':pass_count_merge_df['% Passing Math'],
'% Passing Reading':pass_count_merge_df['% Passing Reading'],
'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})
breakdown_by_spending_df.sort_values('Spend Level')
grouped_breakdown_by_spending_df = breakdown_by_spending_df.groupby(['Spend Level'])
spending_per_student = grouped_breakdown_by_spending_df.mean()
spending_per_student = spending_per_student[['Average Math Score', 'Average Reading Score', '% Passing Math',
'% Passing Reading', 'Overall Passing Rate']]
spending_per_student
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate | |
---|---|---|---|---|---|
Spend Level | |||||
Less than $580 | 83.27 | 83.99 | 100.00 | 100.00 | 100.00 |
Medium $580-605 | 83.48 | 83.87 | 100.00 | 100.00 | 100.00 |
Average $605-630 | 81.41 | 82.97 | 92.64 | 98.18 | 95.41 |
Above Average $630-655 | 77.87 | 81.37 | 80.96 | 95.23 | 88.10 |
# scores by school size
school_size_bins = [0, 500, 2750, 5000]
school_size_labels = ["Small (<500)", "Medium (500-2750)", "Large (2750-5000)"]
school_size_categories = pd.cut(pass_count_merge_df["size"], school_size_bins, labels=school_size_labels)
breakdown_by_size_df = pd.DataFrame({'School Size':school_size_categories,
'Average Math Score':aves_merge_df['math_score'],
'Average Reading Score':aves_merge_df['reading_score'],
'% Passing Math':pass_count_merge_df['% Passing Math'],
'% Passing Reading':pass_count_merge_df['% Passing Reading'],
'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})
grouped_breakdown_by_size_df = breakdown_by_size_df.groupby(['School Size'])
size_comparison = grouped_breakdown_by_size_df.mean()
size_comparison = size_comparison[['Average Math Score', 'Average Reading Score', '% Passing Math',
'% Passing Reading', 'Overall Passing Rate']]
size_comparison
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate | |
---|---|---|---|---|---|
School Size | |||||
Small (<500) | 83.80 | 83.81 | 100.00 | 100.00 | 100.00 |
Medium (500-2750) | 82.64 | 83.51 | 97.28 | 99.23 | 98.25 |
Large(2750-5000) | 76.93 | 81.00 | 77.74 | 94.55 | 86.14 |
# scores by school type (District/Charter)
school_types = schools_df[['type','school']]
calcs_by_type_df = pd.DataFrame({'school':aves_merge_df['school'],
'Average Math Score':aves_merge_df['math_score'],
'Average Reading Score':aves_merge_df['reading_score'],
'% Passing Math':pass_count_merge_df['% Passing Math'],
'% Passing Reading':pass_count_merge_df['% Passing Reading'],
'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})
breakdown_by_type_df = pd.merge(calcs_by_type_df, school_types,on='school')
breakdown_by_type_df = breakdown_by_type_df.rename(index=str, columns={"type":"Type"})
grouped_breakdown_by_type_df = breakdown_by_type_df.groupby(['Type'])
chart_vs_dist_df = grouped_breakdown_by_type_df.mean()
chart_vs_dist_df = chart_vs_dist_df[['Average Math Score', 'Average Reading Score','% Passing Math',
'% Passing Reading','Overall Passing Rate']]
chart_vs_dist_df
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate | |
---|---|---|---|---|---|
Type | |||||
Charter | 83.47 | 83.90 | 100.00 | 100.00 | 100.00 |
District | 76.96 | 80.97 | 77.81 | 94.45 | 86.13 |
-
This is an overview analysis of large high schools using a Python Data Analysis Library called Pandas.
-
The Data Source is based on two CSV files provided in the raw_data folder and outputs a series of reports to analyze key factors in test score results across a district including charter schools.
-
Dataframes are used to parse the records. (i.e.
pd.DataFrame
)