You are the lead data analyst for a popular music store. Help them analyze their sales and service!
Link to full description of project: https://discuss.codecademy.com/t/data-science-independent-project-2-explore-a-sample-database/419945
Image Source: (https://stackoverflow.com/questions/60346886/sql-more-efficient-sql-query-against-chinook-database)
The database has 11 inter-connected tables that represents a digital media store. It has information about artists, their music tracks and sales preformance. It also contains employees and customers records.
The database for MySql can be acquired in this Github Repository (https://github.com/lerocha/chinook-database).
To load the database on MySql, make sure to put the Chinook_MySql.sql
file in the MySql database directory.
To run the script, type SOURCE C:/Chinook_MySql.sql.txt
in MySql command line client.
To find the most popular tracks among the playlists, we use COUNT
function together with GROUP BY
to find out the number of times each TrackId
appears in the PlaylistTrack
table. Then we join with the Track
tables to also return the names of the tracks. Ordering Count
by descending order makes it easy to find the top tracks.
41 tracks are equally popular among all playlists (10 shown here), all appearing 5 times in all available playlists.
To better understand the most lucrative tracks, we aggregate the sales of each tracks with SUM
functions. We also examine if tracks from a particular genre or album generated more sales.
It seems that popularity in playlist does not directly translate into sales. The most lucrative tracks mostly come from TV shows and dramas.
We want to look at the sales figure by country. Additionally, we also want to look at the percentage makeup of each country in terms of total sales.
The music store has the highest sales in USA (over one-fifth), followed by Canada and France.
Now we want to see look at some summary statistics, such as the number of customers served by each employee, and the amount they spend on average. This is a bit more complex. We first use WITH
to generate a temporary tables with the total amount each customers spend, then join that with the employee table to get the sales performance of each employee.
So there are three employees who directly interact with customers, each with around 20 customers. Their average sales per customer is similar.
We can use the WITH
function to create a temporary table to determine the length of each album, then aggregate the length in minutes of each album.
Album length in time seems to be related to sales performance, longer albums generated more revenue than shorter ones.
We again use WITH
function to create temporary table that reflects the total revenue per track, then aggregate tracks in a playlist.
As we can see, there does not seem to be a strong relationship between the two.
We first use WITH
function to create two temporary tables, one for current year and one for previous year. Then join the two and calculate the percentage change.