Updates to workflows and dependency versions #85

Merged
merged 9 commits into from
Oct 31, 2023
10 changes: 5 additions & 5 deletions .github/workflows/CI.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
version:
- '1.5'
- '1'
os:
- ubuntu-latest
- macOS-latest
@@ -25,15 +25,15 @@ jobs:
- x64
experimental: [false]
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v1
with:
version: ${{ matrix.version }}
arch: ${{ matrix.arch }}
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-runtest@latest
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
- uses: codecov/codecov-action@v3
with:
file: ./lcov.info
flags: unittests
37 changes: 29 additions & 8 deletions .github/workflows/CompatHelper.yml
@@ -1,19 +1,40 @@
name: CompatHelper

on:
push:
branches:
- master
schedule:
- cron: '00 00 * * *'

jobs:
CompatHelper:
runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
strategy:
matrix:
julia-version:
- '1'
arch:
- x86
os:
- ubuntu-latest
steps:
- uses: julia-actions/setup-julia@latest
- uses: julia-actions/setup-julia@v1
with:
version: 1.3
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()
version: ${{ matrix.julia-version }}
arch: ${{ matrix.arch }}
- name: "Install CompatHelper"
run: |
import Pkg
name = "CompatHelper"
uuid = "aa819f21-2bde-4658-8897-bab36330d9b7"
version = "3"
Pkg.add(; name, uuid, version)
shell: julia --color=yes {0}
- name: "Run CompatHelper"
run: |
import CompatHelper
CompatHelper.main()
shell: julia --color=yes {0}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: julia -e 'using CompatHelper; CompatHelper.main()'
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
8 changes: 4 additions & 4 deletions .github/workflows/Documentation.yml
@@ -12,11 +12,11 @@ jobs:
name: "Build the documentation"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@latest
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v1
with:
version: '1.5'
- uses: julia-actions/julia-buildpkg@latest
version: '1'
- uses: julia-actions/julia-buildpkg@v1
- name: Install dependencies
run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
- name: Build and deploy
6 changes: 3 additions & 3 deletions Project.toml
@@ -4,7 +4,7 @@ keywords = ["macroecology", "ecology", "biology", "geography"]
license = "MIT"
desc = "Julia framework for spatial ecology"
authors = ["mkborregaard <[email protected]"]
version = "0.9.15"
version = "0.9.16"

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
@@ -21,14 +21,14 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
DataFrames = "1.0, 1.1, 1.2"
DataFramesMeta = "0.5, 0.6, 0.7, 0.8, 0.9"
DataFramesMeta = "0.5, 0.6, 0.7, 0.8, 0.9, 0.10, 0.11, 0.12, 0.13, 0.14"
Distances = "0.8, 0.9, 0.10"
EcoBase = "0.1"
RandomBooleanMatrices = "0.1"
RandomNumbers = "1.4"
RecipesBase = "0.7, 0.8, 1"
StableRNGs = "1"
StatsBase = "0.32, 0.33"
StatsBase = "0.32, 0.33, 0.34"
julia = "1.2"

[extras]
16 changes: 7 additions & 9 deletions README.md
@@ -3,14 +3,12 @@
[![d_stable](https://img.shields.io/badge/Doc-stable-green?style=flat-square)](https://ecojulia.github.io/SpatialEcology.jl/stable/)
[![d_latest](https://img.shields.io/badge/Doc-latest-blue?style=flat-square)](https://ecojulia.github.io/SpatialEcology.jl/dev/)


![version](https://img.shields.io/github/v/tag/EcoJulia/SpatialEcology.jl?sort=semver&style=flat-square)
![CI](https://img.shields.io/github/workflow/status/EcoJulia/SpatialEcology.jl/CI?label=CI&style=flat-square)
[![CI](https://github.com/EcoJulia/SpatialEcology.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/EcoJulia/SpatialEcology.jl/actions/workflows/CI.yml)
![Doc](https://img.shields.io/github/workflow/status/EcoJulia/SpatialEcology.jl/Documentation?label=Doc&style=flat-square)
[![Coverage](https://img.shields.io/codecov/c/github/EcoJulia/SpatialEcology.jl?style=flat-square)](https://codecov.io/gh/EcoJulia/SpatialEcology.jl)

### Primary author: Michael Krabbe Borregaard (@mkborregaard)
[![codecov](https://codecov.io/gh/EcoJulia/SpatialEcology.jl/graph/badge.svg?token=DeSFZuHa99)](https://codecov.io/gh/EcoJulia/SpatialEcology.jl)

## Primary author: Michael Krabbe Borregaard (@mkborregaard)

A package for community- and macro-ecological analysis in julia.
This package offers a set of base types for interoperability in spatial ecology. The idea is to provide a powerful framework for expressing a great variety of analyses in a flexible manner. It presently holds types for presence-absence matrices, site data and species traits, and will be extended with phylogenies and ecological interaction networks. SpatialEcology takes care of aligning all data for analysis.
@@ -20,10 +18,10 @@ The emphasis is on fast, flexible code operating mainly with views into the larg
The package originated as a port of the R package `nodiv`, available from CRAN.

- Types:
- Assemblage (holds presence-absence information along with information on traits and sites)
- ComMatrix (presence-absence matrix)
- SpatialData (Grid or Point data with site information)
- Assemblage (holds presence-absence information along with information on traits and sites)
- ComMatrix (presence-absence matrix)
- SpatialData (Grid or Point data with site information)

## Relevant other packages
This package is part of the [EcoJulia](https://ecojulia.org) organisation, which aims to bring together a coherent set of packages for ecological data analysis. For other relevant packages check the [BioJulia](https://biojulia.net) organisation focusing on molecular biology, and the [JuliaGeo](https://juliageo.org/) organisation focusing on geographical data analysis. A long-term goal of the EcoJulia organisation is to interface as seamlessly as possible with these projects to create an integrated data analysis framework for julia.

This package is part of the [EcoJulia](https://ecojulia.org) organisation, which aims to bring together a coherent set of packages for ecological data analysis. For other relevant packages check the [BioJulia](https://biojulia.net) organisation focusing on molecular biology, and the [JuliaGeo](https://juliageo.org/) organisation focusing on geographical data analysis. A long-term goal of the EcoJulia organisation is to interface as seamlessly as possible with these projects to create an integrated data analysis framework for julia.
78 changes: 48 additions & 30 deletions docs/src/examples/nodebased.md
@@ -1,7 +1,7 @@
## Node-based analysis of species distributions

This example demonstrates how to do a node-based comparison of species
distributions, as described in [Borregaard, M.K., Rahbek, C., Fjeldså, J., Parra, J.L., Whittaker, R.J. and Graham, C.H. (2014). Node-based analysis of species distributions. _Methods in Ecology and Evolution_ **5**: 1225-1235](http://macroecointern.dk/pdf-reprints/Borregaard_MEE_2014.pdf).
distributions, as described in [Borregaard, M.K., Rahbek, C., Fjeldså, J., Parra, J.L., Whittaker, R.J. and Graham, C.H. (2014). Node-based analysis of species distributions. _Methods in Ecology and Evolution_ **5**: 1225-1235](http://macroecointern.dk/pdf-reprints/Borregaard_MEE_2014.pdf).

We will reimplement the method from the paper from first principles, using
SpatialEcology functionality and the ecojulia phylogenetics package Phylo.
@@ -14,35 +14,38 @@ are defined on a regular grid with a cellsize of 1 lat/long degree. This is
one of the datasets used in the Borregaard _et al._ (2014) paper.

### Load data and create objects
First, let's load the data.

First, let's load the data.

Species occurrence data for spatial ecological analysis exists in a variety of
different formats. A common format is to have the data in one or several CSV files.

In this case, we have the data in two CSV tables, one of species
occurrences in each grid cell, and one with the lat-long coordinates of each
grid cell.

The CSV table of occurrences is in the widely used phylocom format,
which is a long-form format for associating the occurrence of species in sites.
It consists of three columns, a column of species names, one of abundances
(here all have the value 1, as it's a presence-absence data set) and a column
of sites.
occurrences in each grid cell, and one with the lat-long coordinates of each
grid cell.

The CSV table of occurrences is in the widely used phylocom format,
which is a long-form format for associating the occurrence of species in sites.
It consists of three columns, a column of species names, one of abundances
(here all have the value 1, as it's a presence-absence data set) and a column
of sites:

```@example nodebased
using CSV, DataFrames, SpatialEcology
phylocom = CSV.read("../../data/tyrann_phylocom.tsv", DataFrame)
first(phylocom, 4) # hide
```
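
As a rough illustration of what the long-form layout encodes, the sketch below turns a few made-up records into a species-by-site presence-absence matrix using only Base Julia. The record values, the `(species, abundance, site)` column order and the names `records` and `occ` are assumptions for illustration; this is not how `Assemblage` is built internally.

```julia
# Toy long-form records; the (species, abundance, site) column order is an assumption
records = [("sp_a", 1, "s1"), ("sp_a", 1, "s2"), ("sp_b", 1, "s2")]

species = unique(first.(records))   # row labels
sites = unique(last.(records))      # column labels

# Species-by-site presence-absence matrix, initially all absent
occ = falses(length(species), length(sites))
for (sp, abundance, site) in records
    row = findfirst(==(sp), species)
    col = findfirst(==(site), sites)
    occ[row, col] = abundance > 0   # any positive abundance counts as a presence
end
```

Each record contributes one cell of the matrix, which is why the long form stays compact when most species are absent from most sites.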

The coordinates are stored in a simple DataFrame with a column of sites, one of latitude
and one of longitude
The coordinates are stored in a simple DataFrame with a column of sites, one of latitude
and one of longitude:

```@example nodebased
coord = CSV.read("../../data/tyrann_coords.tsv", DataFrame)
first(coord, 4) # hide
```

We ensure that the column of sites is represented as `string`s in both data
sets. We then construct the Assemblage object. The site columns are used to
We ensure that the column of sites is represented as `string`s in both data
sets. We then construct the Assemblage object. The site columns are used to
match the two DataFrames together.

```@example nodebased
@@ -51,15 +54,17 @@ coord.cell = string.(coord.cell)
tyrants = Assemblage(phylocom, coord)
```

Let's have a look at the data
Let's have a look at the data:

```@example nodebased
using Plots
ENV["GKSwstype"]="nul" # hide
default(color = cgrad(:Spectral, rev = true))
plot(tyrants)
```

Next, we'll read in the phylogenetic tree
Next, we'll read in the phylogenetic tree:

```@example nodebased
using Phylo
tree = open(parsenewick, "../../data/tyrannid_tree.tre")
@@ -68,7 +73,8 @@ plot(tree, treetype = :fan, tipfont = (5,))
```

### Extract information from a single clade
The [Phylo](http://docs.ecojulia.org/Phylo/stable) package uses iterators over

The [Phylo](http://docs.ecojulia.org/Phylo/stable) package uses iterators over
vertices in the phylogeny for almost everything. For example, to get a vector
of all internal (non-tip) nodes in the phylogeny, we would create an iterator
over the names of all nodes in the tree, filtered by the function `isleaf`, which
@@ -89,7 +95,7 @@ randnode = nodevec[131]
```

We can get a list of the names of all tips/species descending from the node by
getting all descendant nodes with `getdescendants` and filtering with `isleaf`.
getting all descendant nodes with `getdescendants` and filtering with `isleaf`.
We need to pass an anonymous function to `filter` here, because the `isleaf`
function takes two arguments.

@@ -101,7 +107,7 @@ first(nodespecies(tree, randnode), 4) # hide

We can use that species list to subset an `Assemblage` object. For instance, we
can make a function to create a smaller `Assemblage` of all species descending
from our selected node.
from our selected node.

```@example nodebased
get_clade(assemblage, tree, node) = view(assemblage, species = nodespecies(tree, node))
@@ -111,11 +117,12 @@ plot(rand_clade, title = randnode)
```

### Comparing the richness of sister clades

The question we are interested in addressing here is: At a given node where the
lineage splits into two sister clades - are the two descendant clades distributed
differently? This could be an indication that an evolutionary or biogeographic
event happened at that time, of consequence for the current distribution of
the species.
event happened at that time, of consequence for the current distribution of
the species.

So let's get the two descendant clades, and plot their distribution
in comparison to the parent clade
@@ -135,16 +142,18 @@ end

plot_node(tyrants, tree, randnode)
```

It is clear that the two clades have distinct distributions, with the first
descendant appearing to be overrepresented in the tropical rainforest biome,
mainly in the Amazon.
mainly in the Amazon.

But is the difference great enough that we can say that
But is the difference great enough that we can say that
this is not just a random pattern? We can use randomization to find out.

### Using randomization to assess significance of distribution differences

SpatialEcology `Assemblage`s can be randomized using the `curveball` matrix
randomization algorithm defined in [RandomBooleanMatrices.jl](http://docs.ecojulia.org/RandomBooleanMatrices/stable).
randomization algorithm defined in [RandomBooleanMatrices.jl](http://docs.ecojulia.org/RandomBooleanMatrices/stable).

This algorithm randomizes a species-by-site
matrix while keeping row and column sums constant, and is very fast. We can
@@ -156,6 +165,7 @@ rmg = matrixrandomizer(rand_clade)
newcomm = rand(rmg)
plot(newcomm, title = "randomized version of $randnode")
```
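
To make the "row and column sums stay constant" property concrete, here is a minimal sketch of a curveball-style trade on a plain Boolean matrix, using only the `Random` standard library. The function names `curveball_trade!` and `curveball_sketch` are hypothetical, and this is an illustration of the idea rather than the implementation in RandomBooleanMatrices.jl.

```julia
using Random

# One "trade" between rows i and j: columns where exactly one of the two rows
# has a presence are pooled, shuffled and dealt back out.
function curveball_trade!(m::AbstractMatrix{Bool}, i::Int, j::Int, rng::AbstractRNG)
    only_i = findall(m[i, :] .& .!m[j, :])   # present in row i only
    only_j = findall(m[j, :] .& .!m[i, :])   # present in row j only
    pool = shuffle(rng, vcat(only_i, only_j))
    for (k, c) in enumerate(pool)
        to_i = k <= length(only_i)   # row i gets back as many presences as it gave up
        m[i, c] = to_i
        m[j, c] = !to_i              # each pooled column keeps exactly one presence
    end
    return m
end

# Repeated trades between randomly chosen pairs of rows (needs at least two rows)
function curveball_sketch(m::AbstractMatrix{Bool}; ntrades = 100, rng = Random.default_rng())
    out = copy(m)
    for _ in 1:ntrades
        i, j = randperm(rng, size(out, 1))[1:2]
        curveball_trade!(out, i, j, rng)
    end
    return out
end

m = Bool[1 1 0 0; 0 1 1 0; 1 0 0 1]
r = curveball_sketch(m; rng = MersenneTwister(1))
```

Because a trade only moves presences between two rows within columns that held exactly one of them, `sum(r, dims = 1)` and `sum(r, dims = 2)` equal the corresponding sums of `m` after any number of trades.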

Because row and column sums are kept constant, the richness of the randomized
community is the same. But the richness of the two descendant clades will be
different - let's look at our focal node:
@@ -169,8 +179,9 @@ plot(
layout = (1,3), size = (1000, 350), title = ["parent" "child 1" "child 2"]
)
```

This represents a random expectation of what the species richness of the two
descendant clades should be.
descendant clades should be.

We can repeat this process 100 times and store
the species richness of one of the clades in order to get a sampling
@@ -200,9 +211,10 @@ sims = simulate_descendants(rand_clade, tree, ch2)
Then we calculate the mean and standard deviation across simulations and use this
to express the empirical richness values as standardized effect sizes.
The resulting standardized effect size for each grid cell constitutes the `SOS`
metric of Borregaard et al. (2014).
metric of Borregaard et al. (2014).

To calculate this for our focal cell and plot it we can do:

```@example nodebased
using Statistics
function calculate_SOS(sims)
@@ -213,14 +225,16 @@ end

sims = simulate_descendants(rand_clade, tree, ch1)
SOS = calculate_SOS(sims)
plot(SOS, rand_clade, clim = (-8,8), fillcolor = :RdYlBu, title = "SOS for clade $randnode)
plot(SOS, rand_clade, clim = (-8,8), fillcolor = :RdYlBu, title = "SOS for clade $randnode")
```
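
The standardized effect size itself is simple arithmetic: the observed richness minus the mean of the simulated values, divided by their standard deviation. The toy sketch below shows that calculation on made-up numbers. The layout of `sims` (rows as grid cells, columns as simulation runs) and the names `observed` and `ses` are assumptions for illustration, and the body of `calculate_SOS` may be organised differently.

```julia
using Statistics

# Made-up data: rows are grid cells, columns are simulation runs (assumed layout)
sims = [3.0 5.0 4.0; 10.0 12.0 14.0]
observed = [6.0, 12.0]   # hypothetical empirical richness per grid cell

# (observed - mean of simulations) / standard deviation of simulations, per cell
ses = (observed .- vec(mean(sims, dims = 2))) ./ vec(std(sims, dims = 2))
```

A positive value means the cell holds more species from the clade than the randomizations produce on average, and a value near zero means the observed richness matches the random expectation.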

We see a clear distinction between the two clades descending from our focal node,
where one descendant is overrepresented in tropical moist forest and the other in
colder regions.

The strength of divergence among the two clades is summarized by the GND value
(Borregaard et al. 2014)

```@example nodebased
using StatsBase: tiedrank

@@ -241,8 +255,9 @@ first(GND, 4) # hide
```

### Putting it all together

We can use all of the above to go through the entire phylogeny and generate SOS
and GND values.
and GND values.

First, let us create a function that calculates both metrics

@@ -264,10 +279,12 @@ SOS, GND = process_node(tyrants, tree, randnode)
```

### Final step: Applying the method to the entire tree

Finally, we can now go through every node on the tree and calculate the metrics.

This function recreates the functionality of the main `Node_analysis` function of
the [nodiv](https://github.com/mkborregaard/nodiv) R package
the [nodiv](https://github.com/mkborregaard/nodiv) R package:

```@example nodebased
using ProgressLogging
function node_based_analysis(assemblage::Assemblage, tree::AbstractTree)
@@ -283,7 +300,8 @@ end
SOSs, GNDs = node_based_analysis(tyrants, tree);
```

Let's visualize the GND values on the tree
Let's visualize the GND values on the tree:

```@example nodebased
plot(tree,
showtips = false, marker_z = GNDs,
5 changes: 1 addition & 4 deletions docs/src/man/getters.md
@@ -25,9 +25,6 @@ sitenames
speciesnames
coordinates
occurring
noccurring
occupied
noccupied
occurrences
cooccurring
```
```