diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index df3899f1..41eb13e9 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -12,7 +12,7 @@ https://doi.org/10.6084/m9.figshare.1314459
 Which was adapted from the paper: S. K. Morgan Ernest, Thomas J. Valone, and James H. Brown. 2009. Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA. Ecology 90:1708.
-http://esapubs.org/archive/ecol/E090/118/
+https://esapubs.org/archive/ecol/E090/118/
 ## Contributors
diff --git a/episodes/01-sql-basic-queries.md b/episodes/01-sql-basic-queries.md
index 4cf6d57e..fa3f9c49 100644
--- a/episodes/01-sql-basic-queries.md
+++ b/episodes/01-sql-basic-queries.md
@@ -26,7 +26,7 @@ what plot they were captured on, their species ID, sex and weight in grams.
 Let's write an SQL query that selects all of the columns in the surveys table. SQL queries can be written in the box located under the "Execute SQL" tab. Click on the right arrow above the query box to execute the query. (You can also use the keyboard shortcut "Cmd-Enter" on a Mac or "Ctrl-Enter" on a Windows machine to execute a query.) The results are displayed in the box below your query. If you want to display all of the columns in a table, use the wildcard \*.
-```
+```sql
 SELECT *
 FROM surveys;
 ```
@@ -36,7 +36,7 @@ SQL is case insensitive, but it helps for readability, and is good style.
 If we want to select a single column, we can type the column name instead of the wildcard \*.
-```
+```sql
 SELECT year
 FROM surveys;
 ```
@@ -44,7 +44,7 @@ FROM surveys;
 If we want more information, we can add more columns to the list of fields, right after SELECT:
-```
+```sql
 SELECT year, month, day
 FROM surveys;
 ```
@@ -53,7 +53,7 @@ FROM surveys;
 Sometimes you don't want to see all the results, you just want to get a sense of what's being returned. In that case, you can use a `LIMIT` clause. In particular, you would want to do this if you were working with large databases.
-```
+```sql
 SELECT *
 FROM surveys
 LIMIT 10;
@@ -64,7 +64,7 @@ LIMIT 10;
 If we want only the unique values so that we can quickly see what species have been sampled we use `DISTINCT`
-```
+```sql
 SELECT DISTINCT species_id
 FROM surveys;
 ```
@@ -72,7 +72,7 @@ FROM surveys;
 If we select more than one column, then the distinct pairs of values are returned
-```
+```sql
 SELECT DISTINCT year, species_id
 FROM surveys;
 ```
@@ -83,7 +83,7 @@ We can also do calculations with the values in a query.
 For example, if we wanted to look at the mass of each individual on different dates, but we needed it in kg instead of g we would use
-```
+```sql
 SELECT year, month, day, weight/1000
 FROM surveys;
 ```
@@ -95,7 +95,7 @@ correct results in that case divide by `1000.0`.
 Expressions can use any fields, any arithmetic operators (`+`, `-`, `*`, and `/`) and a variety of built-in functions. For example, we could round the values to make them easier to read.
-```
+```sql
 SELECT plot_id, species_id, sex, weight, ROUND(weight / 1000, 2)
 FROM surveys;
 ```
@@ -126,7 +126,7 @@ criteria. For example, let's say we only want data for the species
 *Dipodomys merriami*, which has a species code of DM. We need to add a `WHERE` clause to our query:
-```
+```sql
 SELECT *
 FROM surveys
 WHERE species_id='DM';
@@ -135,7 +135,7 @@ WHERE species_id='DM';
 We can do the same thing with numbers. Here, we only want the data since 2000:
-```
+```sql
 SELECT * FROM surveys
 WHERE year >= 2000;
 ```
@@ -147,7 +147,7 @@ We can use more sophisticated conditions by combining tests
 with `AND` and `OR`. For example, suppose we want the data on *Dipodomys merriami* starting in the year 2000:
-```
+```sql
 SELECT *
 FROM surveys
 WHERE (year >= 2000) AND (species_id = 'DM');
@@ -160,7 +160,7 @@ in the way that we intend.
 If we wanted to get data for any of the *Dipodomys* species, which have species codes `DM`, `DO`, and `DS`, we could combine the tests using OR:
-```
+```sql
 SELECT *
 FROM surveys
 WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS');
@@ -194,7 +194,7 @@ Now, let's combine the above queries to get data for the 3 *Dipodomys* species f
 the year 2000 on. This time, let's use IN as one way to make the query easier to understand. It is equivalent to saying `WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS')`, but reads more neatly:
-```
+```sql
 SELECT *
 FROM surveys
 WHERE (year >= 2000) AND (species_id IN ('DM', 'DO', 'DS'));
@@ -210,7 +210,7 @@ When the queries become more complex, it can be useful to add comments. In SQL,
 comments are started by `--`, and end at the end of the line. For example, a commented version of the above query can be written as:
-```
+```sql
 -- Get post 2000 data on Dipodomys' species
 -- These are in the surveys table, and we are interested in all columns
 SELECT * FROM surveys
@@ -231,14 +231,14 @@ For simplicity, let's go back to the **species** table and alphabetize it by tax
 First, let's look at what's in the **species** table. It's a table of the species\_id and the full genus, species and taxa information for each species\_id. Having this in a separate table is nice, because we didn't need to include all this information in our main **surveys** table.
-```
+```sql
 SELECT *
 FROM species;
 ```
 Now let's order it by taxa.
-```
+```sql
 SELECT *
 FROM species
 ORDER BY taxa ASC;
@@ -247,7 +247,7 @@ ORDER BY taxa ASC;
 The keyword `ASC` tells us to order it in ascending order. We could alternately use `DESC` to get descending order.
-```
+```sql
 SELECT *
 FROM species
 ORDER BY taxa DESC;
@@ -258,7 +258,7 @@ ORDER BY taxa DESC;
 We can also sort on several fields at once. To truly be alphabetical, we might want to order by genus then species.
-```
+```sql
 SELECT *
 FROM species
 ORDER BY genus ASC, species ASC;
@@ -291,7 +291,7 @@ Another note for ordering. We don't actually have to display a column to sort by
 it. For example, let's say we want to order the birds by their species ID, but we only want to see genus and species.
-```
+```sql
 SELECT genus, species
 FROM species
 WHERE taxa = 'Bird'
diff --git a/episodes/02-sql-aggregation.md b/episodes/02-sql-aggregation.md
index c65df9e9..a8044d69 100644
--- a/episodes/02-sql-aggregation.md
+++ b/episodes/02-sql-aggregation.md
@@ -29,14 +29,14 @@ calculating combined values in groups.
 Let's go to the surveys table and find out how many individuals there are. Using the wildcard \* counts the number of records (rows):
-```
+```sql
 SELECT COUNT(*)
 FROM surveys;
 ```
 We can also find out how much all of those individuals weigh:
-```
+```sql
 SELECT COUNT(*), SUM(weight)
 FROM surveys;
 ```
@@ -44,7 +44,7 @@ FROM surveys;
 We can output this value in kilograms (dividing the value by 1000.00), then rounding to 3 decimal places: (Notice the divisor has numbers after the decimal point, which forces the answer to have a decimal fraction)
-```
+```sql
 SELECT ROUND(SUM(weight)/1000.00, 3)
 FROM surveys;
 ```
@@ -82,7 +82,7 @@ WHERE (weight > 5) AND (weight < 10);
 Now, let's see how many individuals were counted in each species. We do this using a `GROUP BY` clause
-```
+```sql
 SELECT species_id, COUNT(*)
 FROM surveys
 GROUP BY species_id;
@@ -135,7 +135,7 @@ We can order the results of our aggregation by a specific column, including
 the aggregated column.
 Let's count the number of individuals of each species captured, ordered by the count:
-```
+```sql
 SELECT species_id, COUNT(*)
 FROM surveys
 GROUP BY species_id
@@ -149,14 +149,14 @@ clearer in the query and in its output, we can use aliases to assign new names t
 We can use aliases in column names using `AS`:
-```
+```sql
 SELECT MAX(year) AS last_surveyed_year
 FROM surveys;
 ```
 The `AS` isn't technically required, so you could do
-```
+```sql
 SELECT MAX(year) last_surveyed_year
 FROM surveys;
 ```
@@ -165,14 +165,14 @@ but using `AS` is much clearer so it is good style to include it.
 We can not only alias column names, but also table names in the same way:
-```
+```sql
 SELECT *
 FROM surveys AS surv;
 ```
 And again, the `AS` keyword is not required, so this works, too:
-```
+```sql
 SELECT *
 FROM surveys surv;
 ```
@@ -188,7 +188,7 @@ filter the results based on **aggregate functions**, through the `HAVING` keywor
 For example, we can request to only return information about species with a count higher than 10:
-```
+```sql
 SELECT species_id, COUNT(species_id)
 FROM surveys
 GROUP BY species_id
@@ -203,7 +203,7 @@ to that alias in the `HAVING` clause.
 For example, in the above query, we can call the `COUNT(species_id)` by another name, like `occurrences`. This can be written this way:
-```
+```sql
 SELECT species_id, COUNT(species_id) AS occurrences
 FROM surveys
 GROUP BY species_id
@@ -251,7 +251,7 @@ before the query itself.
 For example, imagine that our project only covers the data gathered during the summer (May - September) of 2000. That query would look like:
-```
+```sql
 SELECT *
 FROM surveys
 WHERE year = 2000 AND (month > 4 AND month < 10);
@@ -260,7 +260,7 @@ WHERE year = 2000 AND (month > 4 AND month < 10);
 But we don't want to have to type that every time we want to ask a question about that particular subset of data.
 Hence, we can benefit from a view:
-```
+```sql
 CREATE VIEW summer_2000 AS
 SELECT *
 FROM surveys
@@ -269,7 +269,7 @@ WHERE year = 2000 AND (month > 4 AND month < 10);
 Using a view we will be able to access these results with a much shorter notation:
-```
+```sql
 SELECT *
 FROM summer_2000
 WHERE species_id = 'PE';
@@ -281,7 +281,7 @@ From the last example, there should only be five records. If you look at the `w
 easy to see what the average weight would be. If we use SQL to find the average weight, SQL behaves like we would hope, ignoring the NULL values:
-```
+```sql
 SELECT AVG(weight)
 FROM summer_2000
 WHERE species_id = 'PE';
@@ -290,7 +290,7 @@ WHERE species_id = 'PE';
 But if we try to be extra clever, and find the average ourselves, we might get tripped up:
-```
+```sql
 SELECT SUM(weight), COUNT(*), SUM(weight)/COUNT(*)
 FROM summer_2000
 WHERE species_id = 'PE';
@@ -301,7 +301,7 @@ values), but the `SUM` only includes the three records with data in the
 `weight` field, giving us an incorrect average. However, our strategy *will* work if we modify the `COUNT` function slightly:
-```
+```sql
 SELECT SUM(weight), COUNT(weight), SUM(weight)/COUNT(weight)
 FROM summer_2000
 WHERE species_id = 'PE';
@@ -314,7 +314,7 @@ missing in that field. So here is one example where NULLs can be tricky:
 Another case is when we use a "negative" query. Let's count all the non-female animals:
-```
+```sql
 SELECT COUNT(*)
 FROM summer_2000
 WHERE sex != 'F';
@@ -322,7 +322,7 @@ WHERE sex != 'F';
 Now let's count all the non-male animals:
-```
+```sql
 SELECT COUNT(*)
 FROM summer_2000
 WHERE sex != 'M';
@@ -330,7 +330,7 @@ WHERE sex != 'M';
 But if we compare those two numbers with the total:
-```
+```sql
 SELECT COUNT(*)
 FROM summer_2000;
 ```
@@ -343,7 +343,7 @@ returns the 'not NULL, not x' group. Sometimes this may be what we want -
 but sometimes we may want the missing values included as well!
 In that case, we'd need to change our query to:
-```
+```sql
 SELECT COUNT(*)
 FROM summer_2000
 WHERE sex != 'M' OR sex IS NULL;
diff --git a/episodes/03-sql-joins.md b/episodes/03-sql-joins.md
index 4f4a7abe..3a4f5c3d 100644
--- a/episodes/03-sql-joins.md
+++ b/episodes/03-sql-joins.md
@@ -37,7 +37,7 @@ For that, we need to tell the computer which columns provide the link between th
 tables using the word `ON`. What we want is to join the data with the same species id.
-```
+```sql
 SELECT *
 FROM surveys
 JOIN species
@@ -63,7 +63,7 @@ works on columns which share the same name. In this case we are
 telling the manager that we want to combine `surveys` with `species` and that the common column is `species_id`.
-```
+```sql
 SELECT *
 FROM surveys
 JOIN species
@@ -85,7 +85,7 @@ For example, what if we wanted information on when individuals of each
 species were captured, but instead of their species ID we wanted their actual species names.
-```
+```sql
 SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
 FROM surveys
 JOIN species
@@ -102,7 +102,7 @@ ON surveys.species_id = species.species_id;
 Many databases, including SQLite, also support a join through the `WHERE` clause of a query. For example, you may see the query above written without an explicit JOIN.
-```
+```sql
 SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
 FROM surveys, species
 WHERE surveys.species_id = species.species_id;
@@ -137,7 +137,7 @@ ON surveys.species_id = species.species_id;
 We can count the number of records returned by our original join query.
-```
+```sql
 SELECT COUNT(*)
 FROM surveys
 JOIN species
@@ -147,7 +147,7 @@ USING (species_id);
 Notice that this number is smaller than the number of records present in the survey data.
-```
+```sql
 SELECT COUNT(*)
 FROM surveys;
 ```
@@ -211,7 +211,7 @@ Joins can be combined with sorting, filtering, and aggregation.
 So, if we wanted average mass of the individuals on each different type of treatment, we could do something like
-```
+```sql
 SELECT plots.plot_type, AVG(surveys.weight)
 FROM surveys
 JOIN plots
@@ -276,7 +276,7 @@ place of `NULL`.
 We can represent unknown sexes with `'U'` instead of `NULL`:
-```
+```sql
 SELECT species_id, sex, COALESCE(sex, 'U')
 FROM surveys;
 ```
@@ -330,7 +330,7 @@ GROUP BY species_id;
 was `NULL` in the surveys table. We can use `COALESCE` to include them again, re-writing the `NULL` to a valid joining value:
-```
+```sql
 SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
 FROM surveys
 JOIN species
@@ -366,7 +366,7 @@ is returned. This is useful for "nulling out" specific values.
 We can "null out" plot 7:
-```
+```sql
 SELECT species_id, plot_id, NULLIF(plot_id, 7)
 FROM surveys;
 ```
@@ -417,7 +417,7 @@ ORDER BY LENGTH(genus) DESC;
 As we saw before, aliases make things clearer, and are especially useful when joining tables.
-```
+```sql
 SELECT surv.year AS yr, surv.month AS mo, surv.day AS day, sp.genus AS gen, sp.species AS sp
 FROM surveys AS surv
 JOIN species AS sp
diff --git a/learners/setup.md b/learners/setup.md
index 65f1f0d5..9b344eb2 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -34,7 +34,6 @@ for SQLite**, so it does not have to be installed separately.
 Launch **DB Browser for SQLite** to confirm that the installation was successful.
-
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 :::::::::::::::::::::::::::::::::::::::::: prereq