Skip to content

risatino/academy-of-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Academy of Py

Three trends based on this dataset:

  1. School size and score results can have a negative effect on student performance. The larger the number of students, the lower success in grades.

  2. The budget per student and spending levels are negatively correlated with scores.

  3. Students from Charter schools perform better than students from District schools. The data shows a significant difference with the overall passing rate among students between Charter and District schools. The largest Charter school has much better scores than the smallest District school. This data only takes into consideration large District and small Charter school size so there is probably a discrepancy with the two types of schools.

Taking into consideration the following:

  • My passing score and definition of a passing rate is an assumption. I decided to set it at a score of 65 based on Google's top search results for American High Schools. I did not look up Charter or District school categories.
# dependencies
import pandas as pd
import numpy as np
# csv file location
schools_rdata = "raw_data/schools_complete.csv"
students_rdata = "raw_data/students_complete.csv"
# format decimals
pd.options.display.float_format = '{:,.2f}'.format

# read schools data
schools_df = pd.read_csv(schools_rdata)
schools_df.head()

# rename column 'name' to 'school'
school = schools_df.rename(columns={"name": "school"})
school.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School ID school type size budget
0 0 Huang High School District 2917 1910635
1 1 Figueroa High School District 2949 1884411
2 2 Shelton High School Charter 1761 1056600
3 3 Hernandez High School District 4635 3022020
4 4 Griffin High School Charter 1468 917500
# read students data
students_df = pd.read_csv(students_rdata)
students_df.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Student ID name gender grade school reading_score math_score
0 0 Paul Bradley M 9th Huang High School 66 79
1 1 Victor Smith M 12th Huang High School 94 61
2 2 Kevin Rodriguez M 12th Huang High School 90 60
3 3 Dr. Richard Scott M 12th Huang High School 67 58
4 4 Bonnie Ray F 9th Huang High School 97 84
# find totals for district:
school_list = students_df['school'].unique()
total_schools = len(school_list)
total_students = students_df['name'].count()
total_budget = schools_df['budget'].sum()

# find averages for scores:
ave_math_score=students_df['math_score'].mean()
ave_reading_score=students_df['reading_score'].mean()

# locate and count passing (i.e. better than 65, based on US high school data) scores
math_pass_df = students_df.loc[students_df["math_score"] >= 65, :]
math_pass_count = math_pass_df['math_score'].count()
reading_pass_df = students_df.loc[students_df["reading_score"] >= 65, :]
reading_pass_count = reading_pass_df['reading_score'].count()

# calculate percentages:
percent_pass_math = math_pass_count/total_students*100
percent_pass_reading = reading_pass_count/total_students*100
percent_pass_overall = (math_pass_count + reading_pass_count)/total_students*50

# build district summary dataframe:
district_breakdown = pd.DataFrame({"Total Schools": [total_schools],
                                   "Total Students": [total_students],
                                   "Total Budget": [total_budget],
                                   "Average Math Score": [ave_math_score],
                                   "Average Reading Score": [ave_reading_score],
                                   "% Passing Math":[percent_pass_math],
                                   "% Passing Reading":[percent_pass_reading],
                                   "% Overall Passing Rate": [percent_pass_overall]})
district_breakdown['Total Budget'] = district_breakdown['Total Budget'].map('${:,.2f}'.format)
district_breakdown=district_breakdown[['Total Schools','Total Students','Total Budget','Average Math Score', 'Average Reading Score','% Passing Math','% Passing Reading','% Overall Passing Rate']]

district_breakdown
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Total Schools Total Students Total Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
0 15 39170 $24,649,428.00 78.99 81.88 84.73 96.20 90.46
# prepare schools df for merge with average students scores:
schools_df = schools_df.rename(index=str, columns={"name":"school"})
school_groups_df =students_df.groupby(["school"])
school_score_groups_df = school_groups_df[['school','math_score','reading_score']].mean().reset_index()

# merge schools data with average scores data:
aves_merge_df = pd.merge(school_score_groups_df, schools_df, on="school")

# find pass counts by school and merge with school summary df:
school_math_pass_group  = math_pass_df.groupby(["school"])
school_math_pass_df = school_math_pass_group[['math_score']].count().reset_index()
school_reading_pass_group  = reading_pass_df.groupby(["school"])
school_reading_pass_df = school_reading_pass_group[['reading_score']].count().reset_index()
pass_count_merge_df = pd.merge(aves_merge_df, school_math_pass_df, on="school")
pass_count_merge_df = pd.merge(pass_count_merge_df, school_reading_pass_df, on="school")

# complete per school summary with percentages columns and per student budgets:
pass_count_merge_df['% Passing Math'] = pass_count_merge_df['math_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Passing Reading'] = pass_count_merge_df['reading_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Overall Passing'] = (pass_count_merge_df['% Passing Math'] + pass_count_merge_df['% Passing Reading'])/2
pass_count_merge_df['Per Student Budget'] = pass_count_merge_df['budget']/pass_count_merge_df['size']

school_summary = pass_count_merge_df.rename(index=str, columns={"school":"School",'math_score_x':'Average Math Score',
                                                                'reading_score_x':'Average Reading Score','type':'School Type',
                                                                'size':'Total Students','budget':'Total Budget'})

school_summary = school_summary.drop(['math_score_y','reading_score_y','School ID'], axis = 1)
school_summary = school_summary[["School",'School Type','Total Students','Total Budget','Per Student Budget',
                                 'Average Math Score', 'Average Reading Score','% Passing Math',
                                 '% Passing Reading','% Overall Passing']]

school_summary = school_summary.set_index('School')
school_summary['Total Budget'] = school_summary['Total Budget'].map('${:,.2f}'.format)
school_summary['Per Student Budget'] = school_summary['Per Student Budget'].map('${:,.2f}'.format)
school_summary
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing
School
Bailey High School District 4976 $3,124,928.00 $628.00 77.05 81.03 77.91 94.55 86.23
Cabrera High School Charter 1858 $1,081,356.00 $582.00 83.06 83.98 100.00 100.00 100.00
Figueroa High School District 2949 $1,884,411.00 $639.00 76.71 81.16 77.18 94.54 85.86
Ford High School District 2739 $1,763,916.00 $644.00 77.10 80.75 78.20 93.87 86.04
Griffin High School Charter 1468 $917,500.00 $625.00 83.35 83.82 100.00 100.00 100.00
Hernandez High School District 4635 $3,022,020.00 $652.00 77.29 80.93 77.73 94.61 86.17
Holden High School Charter 427 $248,087.00 $581.00 83.80 83.81 100.00 100.00 100.00
Huang High School District 2917 $1,910,635.00 $655.00 76.63 81.18 77.72 94.48 86.10
Johnson High School District 4761 $3,094,650.00 $650.00 77.07 80.97 77.97 94.48 86.22
Pena High School Charter 962 $585,858.00 $609.00 83.84 84.04 100.00 100.00 100.00
Rodriguez High School District 3999 $2,547,363.00 $637.00 76.84 80.74 77.94 94.62 86.28
Shelton High School Charter 1761 $1,056,600.00 $600.00 83.36 83.73 100.00 100.00 100.00
Thomas High School Charter 1635 $1,043,130.00 $638.00 83.42 83.85 100.00 100.00 100.00
Wilson High School Charter 2283 $1,319,574.00 $578.00 83.27 83.99 100.00 100.00 100.00
Wright High School Charter 1800 $1,049,400.00 $583.00 83.68 83.95 100.00 100.00 100.00
# extract the top performing schools
top_five_schools = school_summary.sort_values(by=["% Overall Passing"], ascending=False)
top_five_schools.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing
School
Cabrera High School Charter 1858 $1,081,356.00 $582.00 83.06 83.98 100.00 100.00 100.00
Griffin High School Charter 1468 $917,500.00 $625.00 83.35 83.82 100.00 100.00 100.00
Holden High School Charter 427 $248,087.00 $581.00 83.80 83.81 100.00 100.00 100.00
Pena High School Charter 962 $585,858.00 $609.00 83.84 84.04 100.00 100.00 100.00
Shelton High School Charter 1761 $1,056,600.00 $600.00 83.36 83.73 100.00 100.00 100.00
# lower five performing schools
lower_five_schools = top_five_schools.tail()
lower_five_schools = lower_five_schools.sort_values("% Overall Passing")
lower_five_schools
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing
School
Figueroa High School District 2949 $1,884,411.00 $639.00 76.71 81.16 77.18 94.54 85.86
Ford High School District 2739 $1,763,916.00 $644.00 77.10 80.75 78.20 93.87 86.04
Huang High School District 2917 $1,910,635.00 $655.00 76.63 81.18 77.72 94.48 86.10
Hernandez High School District 4635 $3,022,020.00 $652.00 77.29 80.93 77.73 94.61 86.17
Johnson High School District 4761 $3,094,650.00 $650.00 77.07 80.97 77.97 94.48 86.22
# math scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["math_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["math_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["math_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["math_score"].mean()

math_scores = pd.DataFrame({
        "9th": ninthgrade,
        "10th": tenthgrade,
        "11th": eleventhgrade,
        "12th": twelfthgrade
})

math_scores = math_scores[["9th", "10th", "11th", "12th"]]
math_scores.index.name = " "
math_scores
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
Bailey High School 77.08 77.00 77.52 76.49
Cabrera High School 83.09 83.15 82.77 83.28
Figueroa High School 76.40 76.54 76.88 77.15
Ford High School 77.36 77.67 76.92 76.18
Griffin High School 82.04 84.23 83.84 83.36
Hernandez High School 77.44 77.34 77.14 77.19
Holden High School 83.79 83.43 85.00 82.86
Huang High School 77.03 75.91 76.45 77.23
Johnson High School 77.19 76.69 77.49 76.86
Pena High School 83.63 83.37 84.33 84.12
Rodriguez High School 76.86 76.61 76.40 77.69
Shelton High School 83.42 82.92 83.38 83.78
Thomas High School 83.59 83.09 83.50 83.50
Wilson High School 83.09 83.72 83.20 83.04
Wright High School 83.26 84.01 83.84 83.64
# reading scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["reading_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["reading_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["reading_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["reading_score"].mean()

reading_scores = pd.DataFrame({
        "9th": ninthgrade,
        "10th": tenthgrade,
        "11th": eleventhgrade,
        "12th": twelfthgrade
})

reading_scores = reading_scores[["9th", "10th", "11th", "12th"]]
reading_scores.index.name = " "
reading_scores
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
Bailey High School 81.30 80.91 80.95 80.91
Cabrera High School 83.68 84.25 83.79 84.29
Figueroa High School 81.20 81.41 80.64 81.38
Ford High School 80.63 81.26 80.40 80.66
Griffin High School 83.37 83.71 84.29 84.01
Hernandez High School 80.87 80.66 81.40 80.86
Holden High School 83.68 83.32 83.82 84.70
Huang High School 81.29 81.51 81.42 80.31
Johnson High School 81.26 80.77 80.62 81.23
Pena High School 83.81 83.61 84.34 84.59
Rodriguez High School 80.99 80.63 80.86 80.38
Shelton High School 84.12 83.44 84.37 82.78
Thomas High School 83.73 84.25 83.59 83.83
Wilson High School 83.94 84.02 83.76 84.32
Wright High School 83.83 83.81 84.16 84.07
# compare results based on per student spending:
spend_bins = [0, 580, 605, 630, 655]
spend_labels = ["Less than $580", "Medium $580-605", "Average $605-630", "Above Average $630-655"]
spend_categories = pd.cut(pass_count_merge_df["Per Student Budget"], spend_bins, labels=spend_labels)

breakdown_by_spending_df = pd.DataFrame({'Spend Level':spend_categories,
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})

breakdown_by_spending_df.sort_values('Spend Level')
grouped_breakdown_by_spending_df = breakdown_by_spending_df.groupby(['Spend Level'])

spending_per_student = grouped_breakdown_by_spending_df.mean()
spending_per_student = spending_per_student[['Average Math Score', 'Average Reading Score', '% Passing Math',
                                             '% Passing Reading', 'Overall Passing Rate']]
spending_per_student
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Spend Level
Less than $580 83.27 83.99 100.00 100.00 100.00
Medium $580-605 83.48 83.87 100.00 100.00 100.00
Average $605-630 81.41 82.97 92.64 98.18 95.41
Above Average $630-655 77.87 81.37 80.96 95.23 88.10
# scores by school size
school_size_bins = [0, 500, 2750, 5000]
school_size_labels = ["Small (<500)", "Medium (500-2750)", "Large (2750-5000)"]
school_size_categories = pd.cut(pass_count_merge_df["size"], school_size_bins, labels=school_size_labels)

breakdown_by_size_df = pd.DataFrame({'School Size':school_size_categories,
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})

grouped_breakdown_by_size_df = breakdown_by_size_df.groupby(['School Size'])

size_comparison = grouped_breakdown_by_size_df.mean()
size_comparison = size_comparison[['Average Math Score', 'Average Reading Score', '% Passing Math',
                                   '% Passing Reading', 'Overall Passing Rate']]
size_comparison
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
School Size
Small (<500) 83.80 83.81 100.00 100.00 100.00
Medium (500-2750) 82.64 83.51 97.28 99.23 98.25
Large(2750-5000) 76.93 81.00 77.74 94.55 86.14
# scores by school type (District/Charter)
school_types = schools_df[['type','school']]
calcs_by_type_df = pd.DataFrame({'school':aves_merge_df['school'],
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})
breakdown_by_type_df = pd.merge(calcs_by_type_df, school_types,on='school')
breakdown_by_type_df = breakdown_by_type_df.rename(index=str, columns={"type":"Type"})

grouped_breakdown_by_type_df = breakdown_by_type_df.groupby(['Type'])

chart_vs_dist_df = grouped_breakdown_by_type_df.mean()
chart_vs_dist_df = chart_vs_dist_df[['Average Math Score', 'Average Reading Score','% Passing Math',
                                     '% Passing Reading','Overall Passing Rate']]
chart_vs_dist_df
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Type
Charter 83.47 83.90 100.00 100.00 100.00
District 76.96 80.97 77.81 94.45 86.13

Specs:

  • This is an overview analysis of large high schools using a Python Data Analysis Library called Pandas.

  • The Data Source is based on two CSV files provided in the raw_data folder and outputs a series of reports to analyze key factors in test score results across a district including charter schools.

  • Dataframes are used to parse the records. (i.e. pd.DataFrame)

About

data analysis using pandas and numpy

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published