[custom] fix lesson contents
zkamvar authored and Carpentries Apprentice committed Apr 21, 2023
1 parent 5867e8b commit 5ffc221
Showing 5 changed files with 52 additions and 53 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTORS.md
Expand Up @@ -12,7 +12,7 @@ https://doi.org/10.6084/m9.figshare.1314459

Which was adapted from the paper: S. K. Morgan Ernest, Thomas J. Valone, and James H. Brown. 2009. Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA. Ecology 90:1708.

http://esapubs.org/archive/ecol/E090/118/
https://esapubs.org/archive/ecol/E090/118/

## Contributors

38 changes: 19 additions & 19 deletions episodes/01-sql-basic-queries.md
Expand Up @@ -26,7 +26,7 @@ what plot they were captured on, their species ID, sex and weight in grams.

Let's write an SQL query that selects all of the columns in the surveys table. SQL queries can be written in the box located under the "Execute SQL" tab. Click on the right arrow above the query box to execute the query. (You can also use the keyboard shortcut "Cmd-Enter" on a Mac or "Ctrl-Enter" on a Windows machine to execute a query.) The results are displayed in the box below your query. If you want to display all of the columns in a table, use the wildcard \*.

```
```sql
SELECT *
FROM surveys;
```
Expand All @@ -36,15 +36,15 @@ SQL is case insensitive, but it helps for readability, and is good style.

If we want to select a single column, we can type the column name instead of the wildcard \*.

```
```sql
SELECT year
FROM surveys;
```

If we want more information, we can add more columns to the list of fields,
right after SELECT:

```
```sql
SELECT year, month, day
FROM surveys;
```
Expand All @@ -53,7 +53,7 @@ FROM surveys;

Sometimes you don't want to see all the results; you just want to get a sense of what's being returned. In that case, you can use a `LIMIT` clause. This is particularly useful when working with large databases.

```
```sql
SELECT *
FROM surveys
LIMIT 10;
Expand All @@ -64,15 +64,15 @@ LIMIT 10;
If we want only the unique values, so that we can quickly see what species have
been sampled, we use `DISTINCT`:

```
```sql
SELECT DISTINCT species_id
FROM surveys;
```

If we select more than one column, then the distinct pairs of values are
returned:

```
```sql
SELECT DISTINCT year, species_id
FROM surveys;
```
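As a minimal sketch of how `DISTINCT` treats one column versus a pair of columns, the following Python snippet runs both queries against a tiny in-memory SQLite table. The four rows are invented for illustration and only borrow the `year` and `species_id` column names from the surveys table.

```python
import sqlite3

# Hypothetical stand-in for the surveys table (invented rows, two columns only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (year INTEGER, species_id TEXT)")
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?)",
    [(1977, "DM"), (1977, "DM"), (1977, "DO"), (1978, "DM")],
)

# One column: each species code appears once.
species = conn.execute(
    "SELECT DISTINCT species_id FROM surveys ORDER BY species_id"
).fetchall()

# Two columns: each (year, species_id) combination appears once.
pairs = conn.execute(
    "SELECT DISTINCT year, species_id FROM surveys ORDER BY year, species_id"
).fetchall()

print(species)
print(pairs)
```

The duplicate `(1977, 'DM')` row collapses in both results, while `DM` still shows up under two different years in the pair query.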
Expand All @@ -83,7 +83,7 @@ We can also do calculations with the values in a query.
For example, if we wanted to look at the mass of each individual
on different dates, but we needed it in kg instead of g, we would use:

```
```sql
SELECT year, month, day, weight/1000
FROM surveys;
```
Expand All @@ -95,7 +95,7 @@ correct results in that case divide by `1000.0`. Expressions can use any fields,
any arithmetic operators (`+`, `-`, `*`, and `/`) and a variety of built-in
functions. For example, we could round the values to make them easier to read.

```
```sql
SELECT plot_id, species_id, sex, weight, ROUND(weight / 1000, 2)
FROM surveys;
```
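The difference between dividing by `1000` and by `1000.0` can be checked with a small sketch. It assumes SQLite and a `weight` column declared as `INTEGER`; the three rows are invented, and the real surveys table may store weights differently.

```python
import sqlite3

# Hypothetical three-row stand-in for surveys; weight is an INTEGER in grams.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (record_id INTEGER, weight INTEGER)")
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?)",
    [(1, 40), (2, 117), (3, 1500)],
)

# INTEGER / INTEGER: SQLite performs integer division and drops the fraction.
int_div = conn.execute(
    "SELECT weight / 1000 FROM surveys ORDER BY record_id"
).fetchall()

# INTEGER / REAL: the 1000.0 divisor forces floating-point division.
real_div = conn.execute(
    "SELECT ROUND(weight / 1000.0, 2) FROM surveys ORDER BY record_id"
).fetchall()

print(int_div)   # [(0,), (0,), (1,)]
print(real_div)  # [(0.04,), (0.12,), (1.5,)]
```

With integer weights, every animal lighter than 1 kg is reported as 0, which is why the `1000.0` divisor matters.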
Expand Down Expand Up @@ -126,7 +126,7 @@ criteria. For example, let's say we only want data for the species
*Dipodomys merriami*, which has a species code of DM. We need to add a
`WHERE` clause to our query:

```
```sql
SELECT *
FROM surveys
WHERE species_id='DM';
Expand All @@ -135,7 +135,7 @@ WHERE species_id='DM';
We can do the same thing with numbers.
Here, we only want the data since 2000:

```
```sql
SELECT * FROM surveys
WHERE year >= 2000;
```
Expand All @@ -147,7 +147,7 @@ We can use more sophisticated conditions by combining tests
with `AND` and `OR`. For example, suppose we want the data on *Dipodomys merriami*
starting in the year 2000:

```
```sql
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id = 'DM');
Expand All @@ -160,7 +160,7 @@ in the way that we intend.
If we wanted to get data for any of the *Dipodomys* species, which have
species codes `DM`, `DO`, and `DS`, we could combine the tests using OR:

```
```sql
SELECT *
FROM surveys
WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS');
Expand Down Expand Up @@ -194,7 +194,7 @@ Now, let's combine the above queries to get data for the 3 *Dipodomys* species f
the year 2000 on. This time, let's use IN as one way to make the query easier
to understand. It is equivalent to saying `WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS')`, but reads more neatly:

```
```sql
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id IN ('DM', 'DO', 'DS'));
Expand All @@ -210,7 +210,7 @@ When the queries become more complex, it can be useful to add comments. In SQL,
comments are started by `--`, and end at the end of the line. For example, a
commented version of the above query can be written as:

```
```sql
-- Get post 2000 data on Dipodomys' species
-- These are in the surveys table, and we are interested in all columns
SELECT * FROM surveys
Expand All @@ -231,14 +231,14 @@ For simplicity, let's go back to the **species** table and alphabetize it by tax
First, let's look at what's in the **species** table. It's a table of the species\_id and the full genus, species and taxa information for each species\_id. Having this in a separate table is nice, because we didn't need to include all
this information in our main **surveys** table.

```
```sql
SELECT *
FROM species;
```

Now let's order it by taxa.

```
```sql
SELECT *
FROM species
ORDER BY taxa ASC;
Expand All @@ -247,7 +247,7 @@ ORDER BY taxa ASC;
The keyword `ASC` tells the database to sort the results in ascending order.
We could alternatively use `DESC` to get descending order.

```
```sql
SELECT *
FROM species
ORDER BY taxa DESC;
Expand All @@ -258,7 +258,7 @@ ORDER BY taxa DESC;
We can also sort on several fields at once.
To truly be alphabetical, we might want to order by genus then species.

```
```sql
SELECT *
FROM species
ORDER BY genus ASC, species ASC;
Expand Down Expand Up @@ -291,7 +291,7 @@ Another note for ordering. We don't actually have to display a column to sort by
it. For example, let's say we want to order the birds by their species ID, but
we only want to see genus and species.

```
```sql
SELECT genus, species
FROM species
WHERE taxa = 'Bird'
42 changes: 21 additions & 21 deletions episodes/02-sql-aggregation.md
Expand Up @@ -29,22 +29,22 @@ calculating combined values in groups.
Let's go to the surveys table and find out how many individuals there are.
Using the wildcard \* counts the number of records (rows):

```
```sql
SELECT COUNT(*)
FROM surveys;
```

We can also find out how much all of those individuals weigh:

```
```sql
SELECT COUNT(*), SUM(weight)
FROM surveys;
```

We can output this value in kilograms by dividing it by 1000.00 and then rounding to 3 decimal places
(notice that the divisor has digits after the decimal point, which forces the answer to have a decimal fraction):

```
```sql
SELECT ROUND(SUM(weight)/1000.00, 3)
FROM surveys;
```
Expand Down Expand Up @@ -82,7 +82,7 @@ WHERE (weight > 5) AND (weight < 10);
Now, let's see how many individuals were counted in each species. We do this
using a `GROUP BY` clause:

```
```sql
SELECT species_id, COUNT(*)
FROM surveys
GROUP BY species_id;
Expand Down Expand Up @@ -135,7 +135,7 @@ We can order the results of our aggregation by a specific column, including
the aggregated column. Let's count the number of individuals of each
species captured, ordered by the count:

```
```sql
SELECT species_id, COUNT(*)
FROM surveys
GROUP BY species_id
Expand All @@ -149,14 +149,14 @@ clearer in the query and in its output, we can use aliases to assign new names t

We can use aliases in column names using `AS`:

```
```sql
SELECT MAX(year) AS last_surveyed_year
FROM surveys;
```

The `AS` isn't technically required, so you could do

```
```sql
SELECT MAX(year) last_surveyed_year
FROM surveys;
```
Expand All @@ -165,14 +165,14 @@ but using `AS` is much clearer so it is good style to include it.

We can not only alias column names, but also table names in the same way:

```
```sql
SELECT *
FROM surveys AS surv;
```

And again, the `AS` keyword is not required, so this works, too:

```
```sql
SELECT *
FROM surveys surv;
```
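A short sketch can confirm that the alias forms with and without `AS` behave the same way. It uses Python's `sqlite3` module with an invented three-row table; only the table and column names are taken from the queries above.

```python
import sqlite3

# Hypothetical stand-in for the surveys table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (record_id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?)",
    [(1, 1977), (2, 1990), (3, 2002)],
)

# Column alias, with and without the AS keyword.
with_as = conn.execute("SELECT MAX(year) AS last_surveyed_year FROM surveys")
without_as = conn.execute("SELECT MAX(year) last_surveyed_year FROM surveys")

# cursor.description holds the output column names of each query.
column_names = (with_as.description[0][0], without_as.description[0][0])
values = (with_as.fetchone()[0], without_as.fetchone()[0])

# Table alias: columns can be qualified with the short name.
table_alias_rows = conn.execute(
    "SELECT surv.year FROM surveys AS surv ORDER BY surv.record_id"
).fetchall()

print(column_names)
print(values)
print(table_alias_rows)
```

Both queries report the same column name and value, so the choice between them is purely about readability.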
Expand All @@ -188,7 +188,7 @@ filter the results based on **aggregate functions**, through the `HAVING` keywor
For example, we can request to only return information
about species with a count higher than 10:

```
```sql
SELECT species_id, COUNT(species_id)
FROM surveys
GROUP BY species_id
Expand All @@ -203,7 +203,7 @@ to that alias in the `HAVING` clause.
For example, in the above query, we can call the `COUNT(species_id)` by
another name, like `occurrences`. This can be written this way:

```
```sql
SELECT species_id, COUNT(species_id) AS occurrences
FROM surveys
GROUP BY species_id
Expand Down Expand Up @@ -251,7 +251,7 @@ before the query itself. For example, imagine that our project only covers
the data gathered during the summer (May - September) of 2000. That
query would look like:

```
```sql
SELECT *
FROM surveys
WHERE year = 2000 AND (month > 4 AND month < 10);
Expand All @@ -260,7 +260,7 @@ WHERE year = 2000 AND (month > 4 AND month < 10);
But we don't want to have to type that every time we want to ask a
question about that particular subset of data. Hence, we can benefit from a view:

```
```sql
CREATE VIEW summer_2000 AS
SELECT *
FROM surveys
Expand All @@ -269,7 +269,7 @@ WHERE year = 2000 AND (month > 4 AND month < 10);

Using a view we will be able to access these results with a much shorter notation:

```
```sql
SELECT *
FROM summer_2000
WHERE species_id = 'PE';
Expand All @@ -281,7 +281,7 @@ From the last example, there should only be five records. If you look at the `w
easy to see what the average weight would be. If we use SQL to find the
average weight, SQL behaves like we would hope, ignoring the NULL values:

```
```sql
SELECT AVG(weight)
FROM summer_2000
WHERE species_id = 'PE';
Expand All @@ -290,7 +290,7 @@ WHERE species_id = 'PE';
But if we try to be extra clever, and find the average ourselves,
we might get tripped up:

```
```sql
SELECT SUM(weight), COUNT(*), SUM(weight)/COUNT(*)
FROM summer_2000
WHERE species_id = 'PE';
Expand All @@ -301,7 +301,7 @@ values), but the `SUM` only includes the three records with data in the
`weight` field, giving us an incorrect average. However,
our strategy *will* work if we modify the `COUNT` function slightly:

```
```sql
SELECT SUM(weight), COUNT(weight), SUM(weight)/COUNT(weight)
FROM summer_2000
WHERE species_id = 'PE';
Expand All @@ -314,23 +314,23 @@ missing in that field. So here is one example where NULLs can be tricky:
Another case is when we use a "negative" query. Let's count all the
non-female animals:

```
```sql
SELECT COUNT(*)
FROM summer_2000
WHERE sex != 'F';
```

Now let's count all the non-male animals:

```
```sql
SELECT COUNT(*)
FROM summer_2000
WHERE sex != 'M';
```

But if we compare those two numbers with the total:

```
```sql
SELECT COUNT(*)
FROM summer_2000;
```
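The mismatch between these counts can be reproduced with a small sketch. It assumes SQLite and uses an invented five-row table named `summer_2000` in place of the view, with two rows whose `sex` is NULL.

```python
import sqlite3

# Hypothetical stand-in for the summer_2000 view: 2 females, 1 male, 2 NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summer_2000 (record_id INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO summer_2000 VALUES (?, ?)",
    [(1, "F"), (2, "F"), (3, "M"), (4, None), (5, None)],
)


def count(where_clause=""):
    """Return COUNT(*) for summer_2000, optionally filtered by a WHERE clause."""
    sql = "SELECT COUNT(*) FROM summer_2000 " + where_clause
    return conn.execute(sql).fetchone()[0]


non_female = count("WHERE sex != 'F'")  # NULL != 'F' is NULL, so NULL rows are dropped
non_male = count("WHERE sex != 'M'")    # same for NULL != 'M'
total = count()
missing = count("WHERE sex IS NULL")    # NULLs must be tested with IS NULL

print(non_female, non_male, total, missing)
```

Here the two negative counts add up to 3 rather than 5; the remaining 2 rows are exactly the ones where `sex` is NULL.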
Expand All @@ -343,7 +343,7 @@ returns the 'not NULL, not x' group. Sometimes this may be what we want -
but sometimes we may want the missing values included as well! In that
case, we'd need to change our query to:

```
```sql
SELECT COUNT(*)
FROM summer_2000
WHERE sex != 'M' OR sex IS NULL;