From 6abcf21b4eb69b43d542ce425430c31afbe37740 Mon Sep 17 00:00:00 2001
From: Matt Ahrens
Date: Thu, 9 Nov 2023 20:50:43 -0600
Subject: [PATCH] Moving lesson answers to answer key

---
 docs/05-Querying-data-in-python.md      |  26 ----
 docs/06-Writing-sql-query.md            |  27 ----
 docs/07-Advanced-sql-queries.md         |  42 ------
 docs/09-Advanced-join-queries.md        |  46 ------
 docs/10-Data-visualization-in-python.md |  48 ------
 docs/answer_key.md                      | 188 +++++++++++++++++++++++-
 6 files changed, 187 insertions(+), 190 deletions(-)

diff --git a/docs/05-Querying-data-in-python.md b/docs/05-Querying-data-in-python.md
index 9219cb7..85fa436 100644
--- a/docs/05-Querying-data-in-python.md
+++ b/docs/05-Querying-data-in-python.md
@@ -88,29 +88,3 @@ If you have successfully built all of those queries to answer the questions, the
 
 ## Summary
 In this lesson, we learned how to write queries in Python using functions. We explored our book ratings datasets to ask questions of the data. We used different functions to help us get the answers we wanteds. Some of the functions included: `count()`, `groupby()`, `sort_values()`, and `head()`.
-
-## Answer key
-1. What is the age of the users who did reviews grouped by each age? Hint: you will have to use the users dataset for this query.
-```
-users_df.groupby('Age').count().sort_values(by=['Age'])
-```
-
-2. What is the overall average age of users? Hint: you will have to use the `mean()` function.
-```
-users_df['Age'].mean()
-```
-
-3. What is the number of ratings at each ratings (0 - 10)? Hint: you will have to the use the ratings dataset.
-```
-ratings_df.groupby('Book-Rating').count().sort_values(by=['Book-Rating'])
-```
-
-4. What is the overall average book rating from all ratings? Hint: you will have to use the `mean()` function.
-```
-ratings_df['Book-Rating'].mean()
-```
-
-5. How many distinct authors are in the dataset? Hint: you will have to use the books dataset and the `nunique()` function.
-```
-books_df['Book-Author'].nunique()
-```
diff --git a/docs/06-Writing-sql-query.md b/docs/06-Writing-sql-query.md
index 80ec30e..7ca6447 100644
--- a/docs/06-Writing-sql-query.md
+++ b/docs/06-Writing-sql-query.md
@@ -99,30 +99,3 @@ Now you can try to build your own SQL queries. Here are a few to start with:
 
 ## Summary
 In this lesson, we learned about SQL and what the main keywords in SQL mean. We then were able to write our own SQL queries in Python to ask questions of our data. We saw how writing a SQL query is similar to using functions.
-
-## Answer key
-1. How many users are in the dataset?
-```
-query = """
-    SELECT count(*)
-    FROM users_df
-    """
-sqldf(query)
-```
-2. How many books are in the dataset?
-```
-query = """
-    SELECT count(*)
-    FROM books_df
-    """
-sqldf(query)
-```
-3. What are the minimum and maximum ratings that can be given for a book? (Hint: use `MIN()` and `MAX()` functions in the SELECT part of your query.)
-```
-query = """
-    SELECT MIN(`Book-Rating`), MAX(`Book-Rating`)
-    FROM ratings_df
-    """
-sqldf(query)
-```
-
diff --git a/docs/07-Advanced-sql-queries.md b/docs/07-Advanced-sql-queries.md
index efa35fd..eb0b790 100644
--- a/docs/07-Advanced-sql-queries.md
+++ b/docs/07-Advanced-sql-queries.md
@@ -84,45 +84,3 @@ Now try to build your own advacned SQL queries. Here are a few to start with:
 
 ## Summary
 In this lesson, we learned about how to do more advanced queries, specifically in how to filter records for number and string fields. We also learned about how to count unique values in a dataset with the **DISTINCT** keyword.
-
-## Answer key
-1. What book (ISBN) has the most ratings = 10 and which book (ISBN) has the most ratings = 0?
-```
-query = """
-    SELECT `ISBN`, count(*) as total
-    FROM ratings_df
-    WHERE `Book-Rating` = 10
-    GROUP BY `ISBN`
-    ORDER BY total desc
-"""
-sqldf(query)
-
-query = """
-    SELECT `ISBN`, count(*) as total
-    FROM ratings_df
-    WHERE `Book-Rating` = 0
-    GROUP BY `ISBN`
-    ORDER BY total desc
-"""
-sqldf(query)
-```
-
-2. What is the average age for the top cities in the United States for users in the dataset? (Hint: use the **AVG** keyword in your SQL query.)
-```
-query = """
-    SELECT AVG(`Age`)
-    FROM users_df
-    WHERE `Location` LIKE "%usa%"
-"""
-sqldf(query)
-```
-
-3. How many unique publishers did J.K. Rowling use for her Harry Potter books?
-```
-query = """
-    SELECT count(distinct `Publisher`)
-    FROM books_df
-    WHERE `Book-Title` LIKE "%Harry Potter%" and `Book-Author` LIKE "%Rowling%"
-"""
-sqldf(query)
-```
diff --git a/docs/09-Advanced-join-queries.md b/docs/09-Advanced-join-queries.md
index 86bd187..80c44e0 100644
--- a/docs/09-Advanced-join-queries.md
+++ b/docs/09-Advanced-join-queries.md
@@ -45,49 +45,3 @@ Now you're ready to write your own advanced join queries in SQL with our books d
 
 ## Summary
 In this lesson, we wrote more advanced queries including a multi-join query to join three datasets together and also a left join. We also saw how join queries can include other functions and filters to get the answer desired from the query.
-
-## Answer key
-1. What user location has the most number of book ratings?
-```
-query = """
-    SELECT `Location`, count(`Book-Rating`) as rating_cnt
-    FROM ratings_df
-    INNER JOIN users_df
-    ON ratings_df.`User-ID` = users_df.`User-ID`
-    GROUP BY users_df.`Location`
-    ORDER BY rating_cnt desc
-"""
-sqldf(query)
-```
-
-2. What publication year has the least popular books by average rating that has more than 10 ratings?
-```
-query = """
-    SELECT `Year-Of-Publication`, AVG(`Book-Rating`) as rating_avg
-    FROM books_df
-    INNER JOIN ratings_df
-    ON books_df.`ISBN` = ratings_df.`ISBN`
-    GROUP BY `Year-Of-Publication`
-    HAVING COUNT(`Book-Rating`) > 10
-    ORDER BY rating_avg
-"""
-sqldf(query)
-```
-
-3. What age of users has the highest average rating for books that were published between 2000 and 2003?
-```
-query = """
-    SELECT `Age`, AVG(`Book-Rating`) as rating_avg
-    FROM ratings_df
-    INNER JOIN users_df
-    ON ratings_df.`User-ID` = users_df.`User-ID`
-    INNER JOIN books_df
-    ON ratings_df.`ISBN` = books_df.`ISBN`
-    WHERE `Year-Of-Publication` >= 2000 and `Year-Of-Publication` <= 2003
-    GROUP BY users_df.`Age`
-    ORDER BY rating_avg desc
-"""
-sqldf(query)
-```
-
-
diff --git a/docs/10-Data-visualization-in-python.md b/docs/10-Data-visualization-in-python.md
index 0b5e971..b82acc9 100644
--- a/docs/10-Data-visualization-in-python.md
+++ b/docs/10-Data-visualization-in-python.md
@@ -85,51 +85,3 @@ Here are some visualization challenges for you to try out:
 
 ## Summary
 In this lesson, we explored 4 basic data visualizations and how they differ in displaying information about a dataset. We then used various plot functions in Python to display different types of data from the books datasets.
-
-## Answer key
-1. Create a line chart to show the number of unique users who gave ratings per year of publication from 1992 to 2002. Hint: you will have to use the `DISTINCT` keyword.
-```
-query = """
-    SELECT `Year-Of-Publication` as year, count(distinct(users_df.`User-ID`)) as users
-    FROM ratings_df
-    INNER JOIN users_df
-    ON ratings_df.`User-ID` = users_df.`User-ID`
-    INNER JOIN books_df
-    ON ratings_df.`ISBN` = books_df.`ISBN`
-    WHERE year >= 1992 and year <= 2002
-    GROUP BY year
-    ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.plot.line(x='year', y='users')
-```
-
-2. Create a pie chart for the number of books per year of publication from 1992 to 2002.
-```
-query = """
-    SELECT `Year-Of-Publication` as year, count(books_df.`ISBN`) as books
-    FROM ratings_df
-    INNER JOIN books_df
-    ON ratings_df.`ISBN` = books_df.`ISBN`
-    WHERE year >= 1992 and year <= 2002
-    GROUP BY year
-    ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.plot.pie(x='year', y='books')
-```
-
-3. Create a scatter plot to show the relationship between year of publication and average book rating (for 1992 - 2002). Each book should be a single point in the plot.
-```
-query = """
-    SELECT `Year-Of-Publication` as year, avg(`Book-Rating`) as rating_avg
-    FROM ratings_df
-    INNER JOIN books_df
-    ON ratings_df.`ISBN` = books_df.`ISBN`
-    WHERE year >= 1992 and year <= 2002
-    GROUP BY year
-    ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.plot.scatter(x='year', y='rating_avg')
-```
diff --git a/docs/answer_key.md b/docs/answer_key.md
index 5a55ca7..6959fca 100644
--- a/docs/answer_key.md
+++ b/docs/answer_key.md
@@ -7,4 +7,190 @@ description: Answer Key
 # Answer Key
 Below are answers to the practice problems from various lessons from the course.
 
-##
+## Lesson 5
+1. What is the age of the users who did reviews grouped by each age? Hint: you will have to use the users dataset for this query.
+```
+users_df.groupby('Age').count().sort_values(by=['Age'])
+```
+
+2. What is the overall average age of users? Hint: you will have to use the `mean()` function.
+```
+users_df['Age'].mean()
+```
+
+3. What is the number of ratings at each ratings (0 - 10)? Hint: you will have to the use the ratings dataset.
+```
+ratings_df.groupby('Book-Rating').count().sort_values(by=['Book-Rating'])
+```
+
+4. What is the overall average book rating from all ratings? Hint: you will have to use the `mean()` function.
+```
+ratings_df['Book-Rating'].mean()
+```
+
+5. How many distinct authors are in the dataset? Hint: you will have to use the books dataset and the `nunique()` function.
+```
+books_df['Book-Author'].nunique()
+```
+
+## Lesson 6
+1. How many users are in the dataset?
+```
+query = """
+    SELECT count(*)
+    FROM users_df
+    """
+sqldf(query)
+```
+2. How many books are in the dataset?
+```
+query = """
+    SELECT count(*)
+    FROM books_df
+    """
+sqldf(query)
+```
+3. What are the minimum and maximum ratings that can be given for a book? (Hint: use `MIN()` and `MAX()` functions in the SELECT part of your query.)
+```
+query = """
+    SELECT MIN(`Book-Rating`), MAX(`Book-Rating`)
+    FROM ratings_df
+    """
+sqldf(query)
+```
+
+## Lesson 7
+1. What book (ISBN) has the most ratings = 10 and which book (ISBN) has the most ratings = 0?
+```
+query = """
+    SELECT `ISBN`, count(*) as total
+    FROM ratings_df
+    WHERE `Book-Rating` = 10
+    GROUP BY `ISBN`
+    ORDER BY total desc
+"""
+sqldf(query)
+
+query = """
+    SELECT `ISBN`, count(*) as total
+    FROM ratings_df
+    WHERE `Book-Rating` = 0
+    GROUP BY `ISBN`
+    ORDER BY total desc
+"""
+sqldf(query)
+```
+
+2. What is the average age for the top cities in the United States for users in the dataset? (Hint: use the **AVG** keyword in your SQL query.)
+```
+query = """
+    SELECT AVG(`Age`)
+    FROM users_df
+    WHERE `Location` LIKE "%usa%"
+"""
+sqldf(query)
+```
+
+3. How many unique publishers did J.K. Rowling use for her Harry Potter books?
+```
+query = """
+    SELECT count(distinct `Publisher`)
+    FROM books_df
+    WHERE `Book-Title` LIKE "%Harry Potter%" and `Book-Author` LIKE "%Rowling%"
+"""
+sqldf(query)
+```
+
+## Lesson 9
+1. What user location has the most number of book ratings?
+```
+query = """
+    SELECT `Location`, count(`Book-Rating`) as rating_cnt
+    FROM ratings_df
+    INNER JOIN users_df
+    ON ratings_df.`User-ID` = users_df.`User-ID`
+    GROUP BY users_df.`Location`
+    ORDER BY rating_cnt desc
+"""
+sqldf(query)
+```
+
+2. What publication year has the least popular books by average rating that has more than 10 ratings?
+```
+query = """
+    SELECT `Year-Of-Publication`, AVG(`Book-Rating`) as rating_avg
+    FROM books_df
+    INNER JOIN ratings_df
+    ON books_df.`ISBN` = ratings_df.`ISBN`
+    GROUP BY `Year-Of-Publication`
+    HAVING COUNT(`Book-Rating`) > 10
+    ORDER BY rating_avg
+"""
+sqldf(query)
+```
+
+3. What age of users has the highest average rating for books that were published between 2000 and 2003?
+```
+query = """
+    SELECT `Age`, AVG(`Book-Rating`) as rating_avg
+    FROM ratings_df
+    INNER JOIN users_df
+    ON ratings_df.`User-ID` = users_df.`User-ID`
+    INNER JOIN books_df
+    ON ratings_df.`ISBN` = books_df.`ISBN`
+    WHERE `Year-Of-Publication` >= 2000 and `Year-Of-Publication` <= 2003
+    GROUP BY users_df.`Age`
+    ORDER BY rating_avg desc
+"""
+sqldf(query)
+```
+
+## Lesson 10
+1. Create a line chart to show the number of unique users who gave ratings per year of publication from 1992 to 2002. Hint: you will have to use the `DISTINCT` keyword.
+```
+query = """
+    SELECT `Year-Of-Publication` as year, count(distinct(users_df.`User-ID`)) as users
+    FROM ratings_df
+    INNER JOIN users_df
+    ON ratings_df.`User-ID` = users_df.`User-ID`
+    INNER JOIN books_df
+    ON ratings_df.`ISBN` = books_df.`ISBN`
+    WHERE year >= 1992 and year <= 2002
+    GROUP BY year
+    ORDER BY year
+"""
+year_counts = sqldf(query)
+year_counts.plot.line(x='year', y='users')
+```
+
+2. Create a pie chart for the number of books per year of publication from 1992 to 2002.
+```
+query = """
+    SELECT `Year-Of-Publication` as year, count(books_df.`ISBN`) as books
+    FROM ratings_df
+    INNER JOIN books_df
+    ON ratings_df.`ISBN` = books_df.`ISBN`
+    WHERE year >= 1992 and year <= 2002
+    GROUP BY year
+    ORDER BY year
+"""
+year_counts = sqldf(query)
+year_counts.plot.pie(x='year', y='books')
+```
+
+3. Create a scatter plot to show the relationship between year of publication and average book rating (for 1992 - 2002). Each book should be a single point in the plot.
+```
+query = """
+    SELECT `Year-Of-Publication` as year, avg(`Book-Rating`) as rating_avg
+    FROM ratings_df
+    INNER JOIN books_df
+    ON ratings_df.`ISBN` = books_df.`ISBN`
+    WHERE year >= 1992 and year <= 2002
+    GROUP BY year
+    ORDER BY year
+"""
+year_counts = sqldf(query)
+year_counts.plot.scatter(x='year', y='rating_avg')
+```
+
+