Code for the automated detection of artisanal gold mines in Sentinel-2 satellite imagery, a web map of gold mines in the Amazon rainforest, and links to related journalism. The repo underpins Amazon Mining Watch.
- LAUNCH WEB MAP
- INTERPRETING THE MAP
- JOURNALISM
- METHODOLOGY
- MINING AND AIRSTRIPS DATASETS
The mining of concern here touches every country in the Amazon basin. In the typical process, miners slash the rainforest to bare earth and then pump water through underlying sediments to liberate the minerals. They introduce mercury to form an amalgam with the gold, to separate it from other particles, and later they burn off the mercury to arrive at a fairly pure gold metal. This type of mining is called artisanal because it is practiced by small groups of individuals with some machinery, such as pumps, dredges, and excavators. The mining proceeds along streams and rivers, which provide water and access into the rainforest.
The environmental and human costs are high. Mining transforms healthy rainforest into a wasteland of bare earth and toxic sediment pools. Mercury enters adjacent streams and rivers. In the Amazon basin, miners frequently operate within indigenous lands, bringing with them unfamiliar diseases and the potential for violent conflict.
Scars from the mining can be seen from satellite. On the banks of a river, you will observe jumbled, multi-colored wastewater pools. They can be brown, tan, yellow, different shades of green, even turquoise. For the most part they are irregular in size, shape, and orientation. Nearby you can often see miners' encampments, typically a cluster of blue-tarped tents, and, in well-developed mines, a dirt airstrip cut to fly miners in and gold out.
In the Amazon mine map, detected mines are delineated by the yellow stroke. Here are some characteristic examples of mines from the map:
The automated detector is a work in progress. With limited bootstrap sampling, we extrapolated significantly to run over the whole of the Amazon basin. There are some false detections on the map, and we encourage users to apply discretion in interpreting the findings. Terrain features that can masquerade as mines include sandbars in rivers, braided rivers, farm ponds, and aquaculture ponds (two examples below):
You can recognize aquaculture ponds by their geometric shape, efficient use of space, and presence in obvious agricultural zones.
A more common model error is the false negative, where the model fails to detect a mine or the full extent of a mine. Older mine sites that have fallen into disuse and the edges of active mining regions often fall into this category.
On the whole, false detections are relatively few given how widespread the mining is, and we hope this will be a useful resource to those interested in tracking mining activity in the region.
Mining in the Amazon is expanding rapidly, and frequent cloud cover makes it challenging to stitch together comprehensive satellite basemaps. In the Amazon mine map, you will sometimes see healthy rainforest in areas where mining activity is indicated. In that case, the displayed imagery is out of date. (To make for a better user experience, the imagery displayed is different from the imagery used for detection.)
We provide two display options for the web map. The Mapbox satellite basemap is the default. It provides detailed, sub-meter resolution views of many of the mines. The second option is the newly published Sentinel-2 basemap from MapTiler, which uses imagery from 2020 and 2021 exclusively, but at 10-meter resolution. In the example below, mine detections are displayed over the Mapbox basemap at left and over the MapTiler Sentinel-2 basemap at right.
For up-to-date views, we recommend searching the full Sentinel-2 catalog on SentinelHub EO Browser or the Planetscope data made available through the Planet Labs NICFI program.
Creating quantitative accuracy metrics for a system like this is not always easy or constructive. For example, if the system asserted that there are no mines at all in the Amazon basin, it would be better than 99% accurate, because such a large proportion of the landscape remains unmined.
To provide a more constructive measure, we validated a random subsample of the system's detections. This allows us to estimate what is known as the precision or positive predictive value for the classifier. In essence, it tells you the likelihood that a box marked as a mine is actually a mine. On our latest run, we see a precision of 98.2%. For a sample of 500 mining detections, you can expect to see about 9 misclassifications. In our sample, a third of the false detections still identified mining activity, but mining for materials such as bauxite rather than gold.
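The difference between accuracy and precision can be made concrete with a little arithmetic. The sketch below uses the figures quoted above for the validated sample (500 detections, 9 misclassifications). The landscape counts in the second half are hypothetical and chosen only to illustrate why accuracy is uninformative when mines are rare.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that are real mines (positive predictive value)."""
    return true_positives / (true_positives + false_positives)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all patches that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Validated sample: 500 detections, 9 of them misclassified.
sample_precision = precision(true_positives=491, false_positives=9)
print(f"precision: {sample_precision:.1%}")  # precision: 98.2%

# Hypothetical landscape of 1,000,000 patches in which 0.5% contain mines.
# A "detector" that never flags anything still scores very high accuracy.
trivial_accuracy = accuracy(tp=0, tn=995_000, fp=0, fn=5_000)
print(f"trivial accuracy: {trivial_accuracy:.1%}")  # trivial accuracy: 99.5%
```

Precision says nothing about missed mines (false negatives), which is why it is reported here as one constructive measure rather than a complete evaluation.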
The goal of this work is mine detection rather than area estimation, and our classification operates on 440 m x 440 m patches. If the network determines that mining exists within the patch, then the full patch is declared a mine. This leads to a systematic overestimation of mined area if it is naively computed from the polygon boundaries. Building a segmentation model to delineate mine boundaries would be a viable extension of this work.
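To see how patch-level classification inflates area, consider this minimal sketch. It assumes a synthetic scene, a made-up mine footprint, and non-overlapping patches for simplicity (the deployed detector shifts its window in smaller steps). A patch is flagged whenever any pixel in it is mined.

```python
import numpy as np

PIXEL_M = 10   # pixel edge length in meters (44 px = 440 m)
PATCH_PX = 44  # patch edge length in pixels

# Hypothetical ground truth: a 440 x 440 pixel scene with one
# 30 x 60 pixel mine that straddles four patches.
truth = np.zeros((440, 440), dtype=bool)
truth[30:60, 20:80] = True

# Split the scene into a 10 x 10 grid of patches and flag a patch
# if any pixel inside it is mined.
n = truth.shape[0] // PATCH_PX
patches = truth.reshape(n, PATCH_PX, n, PATCH_PX)
flagged = patches.any(axis=(1, 3))

true_area_km2 = truth.sum() * PIXEL_M**2 / 1e6
patch_area_km2 = flagged.sum() * (PATCH_PX * PIXEL_M) ** 2 / 1e6

print(f"true mined area:    {true_area_km2:.2f} km^2")   # 0.18 km^2
print(f"flagged patch area: {patch_area_km2:.2f} km^2")  # 0.77 km^2
```

In this toy case the flagged area is more than four times the true footprint. The size of the overestimate in practice depends on mine shape and layout, so the ratio here should not be read as a correction factor.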
This work grew out of a series of collaborations with journalists and with advocates at Survival International seeking to expose illegal gold mining activity and document its impacts on the environment and on local indigenous communities. We began identifying mines by sight in satellite imagery. Later, some high school classes helped sift through images. Finally it made sense to try to automate the identification of mine sites. The training datasets for the machine-learned models followed from those initial human surveys.
- Las pistas ilegales que bullen en la selva venezolana, from El País and ArmandoInfo, 2022. First in the series Corredor Furtivo. Produced in conjunction with the Pulitzer Center's Rainforest Investigations Network (in English, translated).
- The pollution of illegal gold mining in the Tapajós River, InfoAmazonia, 2021. The story is part of a series, Murky Waters, on pollution in the Amazon River system.
Rough dirt airstrips, often cut illegally from the forest and unregistered with authorities, allow miners to access the mines and to fly out the gold. The Intercept Brasil and The New York Times surveyed over a thousand clandestine airstrips in Brazil's Legal Amazon, identifying 362 landing strips within 20 kilometers of mining activity. The inquiry into the airstrips' role in the expansion of mining led to a pair of stories and a short documentary film:
- The illegal airstrips bringing toxic mining to Brazil’s indigenous land, The New York Times, 2022.
- As pistas da destruição, The Intercept, 2022.
- Os pilotos da Amazônia, The Intercept, short film, 2022.
The airstrip location data are available for download. The clandestine airstrips dataset is the result of a collaborative reporting effort by The Intercept Brasil, The New York Times, and the Rainforest Investigations Network, an initiative of The Pulitzer Center. The Intercept Brasil created the project within the network, which was later joined by The New York Times. The data were gathered by Earthrise Media from OpenStreetMap and from satellite images of Amazônia Legal in 2021, augmented with input from the Socio-Environmental Institute of Brazil, the Yanomami Hutukara Association, and government reports, and verified by the newsrooms.
- Garimpo destruidor, The Intercept, 2021. Video of a helicopter flyover of mine devastation.
- Gana por ouro, The Intercept, 2021. Report on an industrial gold mine operating without proper environmental permits. Two weeks after the story appeared the mine was shut down and fined.
- Serious risk of attack by miners on uncontacted Yanomami in Brazil, Survival International, 2021.
- Illegal mining sparks malaria outbreak in indigenous territories in Brazil, InfoAmazonia and Mongabay, 2020.
- Amazon gold rush: The threatened tribe, Reuters, 2019, on illegal mining in protected Yanomami Indigenous Territory.
Many thanks to the journalists whose skill and resourceful reporting brought these important stories to light.
The mine detector is a lightweight convolutional neural network, which we train to discriminate mines from other terrain by feeding it hand-labeled examples of mines and other key features as they appear in Sentinel-2 satellite imagery. The network operates on 44 x 44 pixel (440 m x 440 m) patches of data extracted from the Sentinel-2 L1C data product. Each pixel in the patch captures the light reflected from Earth's surface in twelve bands of visible and infrared light. We composite the Sentinel-2 data by taking the median across a four-month period, which reduces the presence of clouds, cloud shadow, and other transitory effects.
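The effect of median compositing can be illustrated with synthetic data. This is a sketch, not the repository's compositing code: the number of acquisitions, the reflectance values, and the cloud positions are all made up, and the median is assumed to be taken independently per pixel and per band along the time axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack: 6 acquisitions of one 44 x 44 patch with 12 bands.
# Shape is (time, height, width, band); values stand in for reflectance.
surface = rng.uniform(0.05, 0.3, size=(44, 44, 12))
stack = np.repeat(surface[np.newaxis], 6, axis=0)

# Simulate a bright cloud over part of the patch in two acquisitions.
stack[1, :20, :20, :] = 0.9
stack[4, 10:30, 10:30, :] = 0.9

# The median along the time axis ignores the minority of cloudy values.
composite = np.median(stack, axis=0)

# A plain mean, by contrast, is pulled toward the cloud brightness.
mean_image = stack.mean(axis=0)

print(np.allclose(composite, surface))   # True
print(np.allclose(mean_image, surface))  # False
```

The median recovers the surface here because every pixel is clear in most acquisitions. Pixels that are cloudy in the majority of scenes during the compositing window would still be contaminated.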
During run time, the network assesses each patch for signs of recent mining activity, and then the region of interest is shifted by 140 m for the network to make a subsequent assessment. This process proceeds across the entire region of interest. The network makes 326 million individual assessments in covering the 6.7 million square kilometers of the Amazon basin.
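The sliding-window scan described above can be sketched as follows. The function name `window_origins` and the rectangular tile are illustrative assumptions. Only the 440 m patch size, the 140 m shift, and the 6.7 million square kilometer area come from the text.

```python
import math

PATCH_M = 440   # patch edge length in meters
STRIDE_M = 140  # shift between successive assessments in meters


def window_origins(width_m: float, height_m: float):
    """Yield (x, y) offsets in meters of each patch's corner within a tile."""
    nx = max(0, math.floor((width_m - PATCH_M) / STRIDE_M) + 1)
    ny = max(0, math.floor((height_m - PATCH_M) / STRIDE_M) + 1)
    for j in range(ny):
        for i in range(nx):
            yield (i * STRIDE_M, j * STRIDE_M)


# A hypothetical 1 km x 1 km tile: (1000 - 440) / 140 = 4, so 5 x 5 windows.
origins = list(window_origins(1000, 1000))
print(len(origins))  # 25

# Back-of-envelope check: one assessment per 140 m x 140 m step
# over 6.7 million km^2.
approx_assessments = 6.7e6 / (STRIDE_M / 1000) ** 2
print(f"{approx_assessments / 1e6:.0f} million")  # 342 million
```

The rough estimate lands in the same range as the 326 million assessments reported. The exact count depends on how the basin boundary is tiled, which this sketch does not model.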
The system was developed for use in the Amazon, but it has also been seen to work in other tropical biomes.
Amazon mine map and the output dataset. This data was largely generated with the 44px v2.6 model. A small portion in the Brazilian state of Pará was analyzed using the 44px v2.9 model to improve accuracy.
Tapajós mine map and output dataset. In this case, we analyzed the region yearly from 2016-2020 to monitor the growth of mining in the area, using the earlier 28px v9 model.
Venezuela mine map, Bolívar dataset and Amazonas dataset. Analysis via the 28px v9 model.
Ghana mine map and output dataset. This was a test of the model's ability to generalize to tropical geographies outside of the Amazon basin, using the 44px v2.8 model.
This repo contains all code needed to generate data, train models, and deploy a model to predict presence of mining in a region of interest. While we welcome external development and use of the code, subject to terms of an open MIT license, creating datasets and deploying the model currently requires access to the Descartes Labs platform.
Download and install Miniforge if it is not already installed:
OS | Architecture | Download |
---|---|---|
Linux | x86_64 (amd64) | Miniforge3-Linux-x86_64 |
Linux | aarch64 (arm64) | Miniforge3-Linux-aarch64 |
Linux | ppc64le (POWER8/9) | Miniforge3-Linux-ppc64le |
OS X | x86_64 | Miniforge3-MacOSX-x86_64 |
OS X | arm64 (Apple Silicon) | Miniforge3-MacOSX-arm64 |
Then run
chmod +x ~/Downloads/Miniforge3-{platform}-{architecture}.sh
sh ~/Downloads/Miniforge3-{platform}-{architecture}.sh
source ~/miniforge3/bin/activate
Next, create a conda environment named `mining-detector` by running `conda env create -f environment.yml` from the repo root directory. Activate the environment by running `conda activate mining-detector`. Code has been developed and tested on a Mac with Python version 3.9.7. Other platforms and Python releases may work, but have not yet been tested.
The data used for model training may be accessed and downloaded from `s3://mining-data.earthrise.media`.
The system runs from three core notebooks.
Given a GeoJSON file of sampling locations, generate a dataset of Sentinel-2 images. The dataset is stored as a pickled list of numpy arrays.
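For readers unfamiliar with the storage format, here is a minimal sketch of writing and reading a pickled list of numpy arrays. The file name and the array contents are placeholders, and the 44 x 44 x 12 shape is an assumption based on the patch size and band count described in the methodology. The actual files may differ.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Stand-in dataset: three patches of zeros (assumed shape 44 x 44 x 12).
patches = [np.zeros((44, 44, 12), dtype=np.float32) for _ in range(3)]

# Hypothetical file name, written to a temporary directory.
path = Path(tempfile.mkdtemp()) / "example_patches.pkl"
with open(path, "wb") as f:
    pickle.dump(patches, f)

# Reading the list back and stacking it into a single training array.
with open(path, "rb") as f:
    loaded = pickle.load(f)

stacked = np.stack(loaded)
print(stacked.shape)  # (3, 44, 44, 12)
```

Unpickling executes arbitrary code embedded in the file, so only load pickles from sources you trust.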
Train a neural network based on the images stored in the `data/training_data/` directory. Data used to train this model is stored at `s3://mining-data.earthrise.media`.
Given a model file and a GeoJSON describing a region of interest, run the model and download the results. Options exist to deploy the model on a directory of ROI files.
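As an illustration of what a region-of-interest file contains, the sketch below parses a small GeoJSON FeatureCollection with the standard library and computes its bounding box. The coordinates, the `name` property, and the helper `polygon_bounds` are invented for the example and are not taken from the repository.

```python
import json

# Hypothetical region of interest as a GeoJSON FeatureCollection.
roi_text = """
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"name": "example_roi"},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[-56.5, -5.0], [-56.0, -5.0], [-56.0, -4.5],
                       [-56.5, -4.5], [-56.5, -5.0]]]
    }
  }]
}
"""


def polygon_bounds(feature_collection: dict):
    """Return (min_lon, min_lat, max_lon, max_lat) over Polygon exterior rings."""
    lons, lats = [], []
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch handles plain Polygons only
        for lon, lat in geometry["coordinates"][0]:
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


roi = json.loads(roi_text)
bounds = polygon_bounds(roi)
print(bounds)  # (-56.5, -5.0, -56.0, -4.5)
```

GeoJSON stores positions as longitude, latitude, in that order.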
- `data/boundaries` contains GeoJSON polygon boundaries for regions of interest where the model has been deployed.
- `data/sampling_locations` contains GeoJSON datasets that are used as sampling locations to generate training datasets. Datasets in this directory should be considered "confirmed," and positive/negative class should be indicated in the file's title.
The models directory contains keras neural network models saved as `.h5` files. The model names indicate the patch size evaluated by the model, followed by the model's version number and date of creation. Each model file is paired with a corresponding config `.txt` file that logs the datasets used to train the model, some hyperparameters, and the model's performance on the test dataset.
The model `44px_v2.8_2021-11-11.h5` is currently the top performer overall, though some specificity has been sacrificed for generalization. Different models have different strengths/weaknesses. There are also versions of model v2.6 that operate on RGB and RGB+IR data. These may be of interest when evaluating whether multispectral data from Sentinel is required.
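The naming convention (patch size, version, creation date) lends itself to simple parsing when selecting among models. The pattern below is inferred from the single file name quoted above and is an assumption. Other files, such as the RGB variants, may follow a different scheme and would be rejected by this sketch.

```python
import re
from datetime import date

# Pattern inferred from the one example name; treat it as an assumption.
MODEL_NAME = re.compile(
    r"^(?P<patch_px>\d+)px_v(?P<version>[\d.]+)_(?P<created>\d{4}-\d{2}-\d{2})\.h5$"
)


def parse_model_name(filename: str) -> dict:
    """Split a model file name into patch size, version, and creation date."""
    match = MODEL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized model file name: {filename}")
    return {
        "patch_px": int(match["patch_px"]),
        "version": match["version"],
        "created": date.fromisoformat(match["created"]),
    }


info = parse_model_name("44px_v2.8_2021-11-11.h5")
print(info["patch_px"], info["version"], info["created"])  # 44 2.8 2021-11-11
```

The version is kept as a string so that a value like "2.10" is not collapsed to 2.1 by float conversion.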
The code in this repository is available for reuse under an open MIT License. The data is available under CC BY 4.0. In publication, please cite Earth Genome, with reference to this repository.