GitHub - risatino/academy-of-py: data analysis using pandas and numpy

Academy of Py

Three trends based on this dataset:

School size and score results can have a negative effect on student performance. The larger the number of students, the lower success in grades.
The budget per student and spending levels are negatively correlated with scores.
Students from Charter schools perform better than students from District schools. The data shows a significant difference with the overall passing rate among students between Charter and District schools. The largest Charter school has much better scores than the smallest District school. This data only takes into consideration large District and small Charter school size so there is probably a discrepancy with the two types of schools.

Taking into consideration the following:

My passing score and definition of a passing rate is an assumption. I decided to set it at a score of 65 based on Google's top search results for American High Schools. I did not look up Charter or District school categories.

# dependencies
import pandas as pd
import numpy as np

# csv file location
schools_rdata = "raw_data/schools_complete.csv"
students_rdata = "raw_data/students_complete.csv"

# format decimals
pd.options.display.float_format = '{:,.2f}'.format

# read schools data
schools_df = pd.read_csv(schools_rdata)
schools_df.head()

# rename column 'name' to 'school'
school = schools_df.rename(columns={"name": "school"})
school.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School ID	school	type	size	budget
0	0	Huang High School	District	2917	1910635
1	1	Figueroa High School	District	2949	1884411
2	2	Shelton High School	Charter	1761	1056600
3	3	Hernandez High School	District	4635	3022020
4	4	Griffin High School	Charter	1468	917500

# read students data
students_df = pd.read_csv(students_rdata)
students_df.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Student ID	name	gender	grade	school	reading_score	math_score
0	0	Paul Bradley	M	9th	Huang High School	66	79
1	1	Victor Smith	M	12th	Huang High School	94	61
2	2	Kevin Rodriguez	M	12th	Huang High School	90	60
3	3	Dr. Richard Scott	M	12th	Huang High School	67	58
4	4	Bonnie Ray	F	9th	Huang High School	97	84

# find totals for district:
school_list = students_df['school'].unique()
total_schools = len(school_list)
total_students = students_df['name'].count()
total_budget = schools_df['budget'].sum()

# find averages for scores:
ave_math_score=students_df['math_score'].mean()
ave_reading_score=students_df['reading_score'].mean()

# locate and count passing (i.e. better than 65, based on US high school data) scores
math_pass_df = students_df.loc[students_df["math_score"] >= 65, :]
math_pass_count = math_pass_df['math_score'].count()
reading_pass_df = students_df.loc[students_df["reading_score"] >= 65, :]
reading_pass_count = reading_pass_df['reading_score'].count()

# calculate percentages:
percent_pass_math = math_pass_count/total_students*100
percent_pass_reading = reading_pass_count/total_students*100
percent_pass_overall = (math_pass_count + reading_pass_count)/total_students*50

# build district summary dataframe:
district_breakdown = pd.DataFrame({"Total Schools": [total_schools],
                                   "Total Students": [total_students],
                                   "Total Budget": [total_budget],
                                   "Average Math Score": [ave_math_score],
                                   "Average Reading Score": [ave_reading_score],
                                   "% Passing Math":[percent_pass_math],
                                   "% Passing Reading":[percent_pass_reading],
                                   "% Overall Passing Rate": [percent_pass_overall]})
district_breakdown['Total Budget'] = district_breakdown['Total Budget'].map('${:,.2f}'.format)
district_breakdown=district_breakdown[['Total Schools','Total Students','Total Budget','Average Math Score', 'Average Reading Score','% Passing Math','% Passing Reading','% Overall Passing Rate']]

district_breakdown

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Total Schools	Total Students	Total Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
0	15	39170	$24,649,428.00	78.99	81.88	84.73	96.20	90.46

# prepare schools df for merge with average students scores:
schools_df = schools_df.rename(index=str, columns={"name":"school"})
school_groups_df =students_df.groupby(["school"])
school_score_groups_df = school_groups_df[['school','math_score','reading_score']].mean().reset_index()

# merge schools data with average scores data:
aves_merge_df = pd.merge(school_score_groups_df, schools_df, on="school")

# find pass counts by school and merge with school summary df:
school_math_pass_group  = math_pass_df.groupby(["school"])
school_math_pass_df = school_math_pass_group[['math_score']].count().reset_index()
school_reading_pass_group  = reading_pass_df.groupby(["school"])
school_reading_pass_df = school_reading_pass_group[['reading_score']].count().reset_index()
pass_count_merge_df = pd.merge(aves_merge_df, school_math_pass_df, on="school")
pass_count_merge_df = pd.merge(pass_count_merge_df, school_reading_pass_df, on="school")

# complete per school summary with percentages columns and per student budgets:
pass_count_merge_df['% Passing Math'] = pass_count_merge_df['math_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Passing Reading'] = pass_count_merge_df['reading_score_y']/pass_count_merge_df['size']*100
pass_count_merge_df['% Overall Passing'] = (pass_count_merge_df['% Passing Math'] + pass_count_merge_df['% Passing Reading'])/2
pass_count_merge_df['Per Student Budget'] = pass_count_merge_df['budget']/pass_count_merge_df['size']

school_summary = pass_count_merge_df.rename(index=str, columns={"school":"School",'math_score_x':'Average Math Score',
                                                                'reading_score_x':'Average Reading Score','type':'School Type',
                                                                'size':'Total Students','budget':'Total Budget'})

school_summary = school_summary.drop(['math_score_y','reading_score_y','School ID'], axis = 1)
school_summary = school_summary[["School",'School Type','Total Students','Total Budget','Per Student Budget',
                                 'Average Math Score', 'Average Reading Score','% Passing Math',
                                 '% Passing Reading','% Overall Passing']]

school_summary = school_summary.set_index('School')
school_summary['Total Budget'] = school_summary['Total Budget'].map('${:,.2f}'.format)
school_summary['Per Student Budget'] = school_summary['Per Student Budget'].map('${:,.2f}'.format)
school_summary

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing
School
Bailey High School	District	4976	$3,124,928.00	$628.00	77.05	81.03	77.91	94.55	86.23
Cabrera High School	Charter	1858	$1,081,356.00	$582.00	83.06	83.98	100.00	100.00	100.00
Figueroa High School	District	2949	$1,884,411.00	$639.00	76.71	81.16	77.18	94.54	85.86
Ford High School	District	2739	$1,763,916.00	$644.00	77.10	80.75	78.20	93.87	86.04
Griffin High School	Charter	1468	$917,500.00	$625.00	83.35	83.82	100.00	100.00	100.00
Hernandez High School	District	4635	$3,022,020.00	$652.00	77.29	80.93	77.73	94.61	86.17
Holden High School	Charter	427	$248,087.00	$581.00	83.80	83.81	100.00	100.00	100.00
Huang High School	District	2917	$1,910,635.00	$655.00	76.63	81.18	77.72	94.48	86.10
Johnson High School	District	4761	$3,094,650.00	$650.00	77.07	80.97	77.97	94.48	86.22
Pena High School	Charter	962	$585,858.00	$609.00	83.84	84.04	100.00	100.00	100.00
Rodriguez High School	District	3999	$2,547,363.00	$637.00	76.84	80.74	77.94	94.62	86.28
Shelton High School	Charter	1761	$1,056,600.00	$600.00	83.36	83.73	100.00	100.00	100.00
Thomas High School	Charter	1635	$1,043,130.00	$638.00	83.42	83.85	100.00	100.00	100.00
Wilson High School	Charter	2283	$1,319,574.00	$578.00	83.27	83.99	100.00	100.00	100.00
Wright High School	Charter	1800	$1,049,400.00	$583.00	83.68	83.95	100.00	100.00	100.00

# extract the top performing schools
top_five_schools = school_summary.sort_values(by=["% Overall Passing"], ascending=False)
top_five_schools.head(5)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing
School
Cabrera High School	Charter	1858	$1,081,356.00	$582.00	83.06	83.98	100.00	100.00	100.00
Griffin High School	Charter	1468	$917,500.00	$625.00	83.35	83.82	100.00	100.00	100.00
Holden High School	Charter	427	$248,087.00	$581.00	83.80	83.81	100.00	100.00	100.00
Pena High School	Charter	962	$585,858.00	$609.00	83.84	84.04	100.00	100.00	100.00
Shelton High School	Charter	1761	$1,056,600.00	$600.00	83.36	83.73	100.00	100.00	100.00

# lower five performing schools
lower_five_schools = top_five_schools.tail()
lower_five_schools = lower_five_schools.sort_values("% Overall Passing")
lower_five_schools

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing
School
Figueroa High School	District	2949	$1,884,411.00	$639.00	76.71	81.16	77.18	94.54	85.86
Ford High School	District	2739	$1,763,916.00	$644.00	77.10	80.75	78.20	93.87	86.04
Huang High School	District	2917	$1,910,635.00	$655.00	76.63	81.18	77.72	94.48	86.10
Hernandez High School	District	4635	$3,022,020.00	$652.00	77.29	80.93	77.73	94.61	86.17
Johnson High School	District	4761	$3,094,650.00	$650.00	77.07	80.97	77.97	94.48	86.22

# math scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["math_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["math_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["math_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["math_score"].mean()

math_scores = pd.DataFrame({
        "9th": ninthgrade,
        "10th": tenthgrade,
        "11th": eleventhgrade,
        "12th": twelfthgrade
})

math_scores = math_scores[["9th", "10th", "11th", "12th"]]
math_scores.index.name = " "
math_scores

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th

Bailey High School	77.08	77.00	77.52	76.49
Cabrera High School	83.09	83.15	82.77	83.28
Figueroa High School	76.40	76.54	76.88	77.15
Ford High School	77.36	77.67	76.92	76.18
Griffin High School	82.04	84.23	83.84	83.36
Hernandez High School	77.44	77.34	77.14	77.19
Holden High School	83.79	83.43	85.00	82.86
Huang High School	77.03	75.91	76.45	77.23
Johnson High School	77.19	76.69	77.49	76.86
Pena High School	83.63	83.37	84.33	84.12
Rodriguez High School	76.86	76.61	76.40	77.69
Shelton High School	83.42	82.92	83.38	83.78
Thomas High School	83.59	83.09	83.50	83.50
Wilson High School	83.09	83.72	83.20	83.04
Wright High School	83.26	84.01	83.84	83.64

# reading scores by grade
twelfthgrade = students_df.loc[students_df["grade"] == "12th"].groupby("school")["reading_score"].mean()
eleventhgrade = students_df.loc[students_df["grade"] == "11th"].groupby("school")["reading_score"].mean()
tenthgrade = students_df.loc[students_df["grade"] == "10th"].groupby("school")["reading_score"].mean()
ninthgrade = students_df.loc[students_df["grade"] == "9th"].groupby("school")["reading_score"].mean()

reading_scores = pd.DataFrame({
        "9th": ninthgrade,
        "10th": tenthgrade,
        "11th": eleventhgrade,
        "12th": twelfthgrade
})

reading_scores = reading_scores[["9th", "10th", "11th", "12th"]]
reading_scores.index.name = " "
reading_scores

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th

Bailey High School	81.30	80.91	80.95	80.91
Cabrera High School	83.68	84.25	83.79	84.29
Figueroa High School	81.20	81.41	80.64	81.38
Ford High School	80.63	81.26	80.40	80.66
Griffin High School	83.37	83.71	84.29	84.01
Hernandez High School	80.87	80.66	81.40	80.86
Holden High School	83.68	83.32	83.82	84.70
Huang High School	81.29	81.51	81.42	80.31
Johnson High School	81.26	80.77	80.62	81.23
Pena High School	83.81	83.61	84.34	84.59
Rodriguez High School	80.99	80.63	80.86	80.38
Shelton High School	84.12	83.44	84.37	82.78
Thomas High School	83.73	84.25	83.59	83.83
Wilson High School	83.94	84.02	83.76	84.32
Wright High School	83.83	83.81	84.16	84.07

# compare results based on per student spending:
spend_bins = [0, 580, 605, 630, 655]
spend_labels = ["Less than $580", "Medium $580-605", "Average $605-630", "Above Average $630-655"]
spend_categories = pd.cut(pass_count_merge_df["Per Student Budget"], spend_bins, labels=spend_labels)

breakdown_by_spending_df = pd.DataFrame({'Spend Level':spend_categories,
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})

breakdown_by_spending_df.sort_values('Spend Level')
grouped_breakdown_by_spending_df = breakdown_by_spending_df.groupby(['Spend Level'])

spending_per_student = grouped_breakdown_by_spending_df.mean()
spending_per_student = spending_per_student[['Average Math Score', 'Average Reading Score', '% Passing Math',
                                             '% Passing Reading', 'Overall Passing Rate']]
spending_per_student

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Spend Level
Less than $580	83.27	83.99	100.00	100.00	100.00
Medium $580-605	83.48	83.87	100.00	100.00	100.00
Average $605-630	81.41	82.97	92.64	98.18	95.41
Above Average $630-655	77.87	81.37	80.96	95.23	88.10

# scores by school size
school_size_bins = [0, 500, 2750, 5000]
school_size_labels = ["Small (<500)", "Medium (500-2750)", "Large (2750-5000)"]
school_size_categories = pd.cut(pass_count_merge_df["size"], school_size_bins, labels=school_size_labels)

breakdown_by_size_df = pd.DataFrame({'School Size':school_size_categories,
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})

grouped_breakdown_by_size_df = breakdown_by_size_df.groupby(['School Size'])

size_comparison = grouped_breakdown_by_size_df.mean()
size_comparison = size_comparison[['Average Math Score', 'Average Reading Score', '% Passing Math',
                                   '% Passing Reading', 'Overall Passing Rate']]
size_comparison

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
School Size
Small (<500)	83.80	83.81	100.00	100.00	100.00
Medium (500-2750)	82.64	83.51	97.28	99.23	98.25
Large(2750-5000)	76.93	81.00	77.74	94.55	86.14

# scores by school type (District/Charter)
school_types = schools_df[['type','school']]
calcs_by_type_df = pd.DataFrame({'school':aves_merge_df['school'],
                                        'Average Math Score':aves_merge_df['math_score'],                                       
                                        'Average Reading Score':aves_merge_df['reading_score'],
                                        '% Passing Math':pass_count_merge_df['% Passing Math'], 
                                        '% Passing Reading':pass_count_merge_df['% Passing Reading'],
                                        'Overall Passing Rate':pass_count_merge_df['% Overall Passing']})
breakdown_by_type_df = pd.merge(calcs_by_type_df, school_types,on='school')
breakdown_by_type_df = breakdown_by_type_df.rename(index=str, columns={"type":"Type"})

grouped_breakdown_by_type_df = breakdown_by_type_df.groupby(['Type'])

chart_vs_dist_df = grouped_breakdown_by_type_df.mean()
chart_vs_dist_df = chart_vs_dist_df[['Average Math Score', 'Average Reading Score','% Passing Math',
                                     '% Passing Reading','Overall Passing Rate']]
chart_vs_dist_df

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Type
Charter	83.47	83.90	100.00	100.00	100.00
District	76.96	80.97	77.81	94.45	86.13

Specs:

This is an overview analysis of large high schools using a Python Data Analysis Library called Pandas.
The Data Source is based on two CSV files provided in the raw_data folder and outputs a series of reports to analyze key factors in test score results across a district including charter schools.
Dataframes are used to parse the records. (i.e. pd.DataFrame)

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
Data Analysis		Data Analysis
Images		Images
LICENSE		LICENSE
README.md		README.md
READMORE.md		READMORE.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Academy of Py

Three trends based on this dataset:

Taking into consideration the following:

Specs:

About

Releases

Packages

Languages

License

risatino/academy-of-py

Folders and files

Latest commit

History

Repository files navigation

Academy of Py

Three trends based on this dataset:

Taking into consideration the following:

Specs:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages