Gold Mine Detector and Map

Code for the automated detection of artisanal gold mines in Sentinel-2 satellite imagery, a web map of gold mines in the Amazon rainforest, and links to related journalism. The repo underpins Amazon Mining Watch.

Interpreting the map

The mining of concern here touches every country in the Amazon basin. In the typical process, miners slash the rainforest to bare earth and then pump water through underlying sediments to liberate the minerals. They introduce mercury to form an amalgam with the gold, to separate it from other particles, and later they burn off the mercury to arrive at a fairly pure gold metal. This type of mining is called artisanal because it is practiced by small groups of individuals with some machinery, such as pumps, dredges, and excavators. The mining proceeds along streams and rivers, which provide water and access into the rainforest.

The environmental and human costs are high. Mining transforms healthy rainforest into a wasteland of bare earth and toxic sediment pools. Mercury enters adjacent streams and rivers. In the Amazon basin, miners frequently operate within indigenous lands, bringing with them unfamiliar diseases and the potential for violent conflict.

Scars from the mining can be seen in satellite imagery. On the banks of a river, you will observe jumbled, multi-colored wastewater pools. They can be brown, tan, yellow, different shades of green, even turquoise. For the most part they are irregular in size, shape, and orientation. Nearby you can often see miners' encampments, typically a few blue-tarped tents, and in well-developed mines, a dirt airstrip cut to fly in miners and to fly out the gold.

In the Amazon mine map, detected mines are delineated by the yellow stroke. Here are some characteristic examples of mines from the map:

MinesEx (These are mines.)

The automated detector is a work in progress. With limited bootstrap sampling, we extrapolated significantly to run over the whole of the Amazon basin. There are some false detections on the map, and we encourage users to apply discretion in interpreting the findings. Terrain features that can masquerade as mines include sandbars in rivers, braided rivers, farm ponds, and aquaculture ponds (two examples below):

NotMinesEx (These are not mines.)

You can recognize aquaculture ponds by their geometric shape, efficient use of space, and presence in obvious agricultural zones.

A more common model error is the false negative, where the model fails to detect a mine or the full extent of a mine. Older mine sites that have fallen into disuse and the edges of active mining regions often fall into this category.

On the whole, false detections are relatively few given how widespread the mining is, and we hope this will be a useful resource to those interested in tracking mining activity in the region.

Basemap Imagery

Mining in the Amazon is expanding rapidly, and frequent cloud cover makes it challenging to stitch together comprehensive satellite basemaps. In the Amazon mine map, you will sometimes see healthy rainforest in areas where mining activity is indicated. In that case, the displayed imagery is out of date. (To make for a better user experience, the imagery displayed is different from the imagery used for detection.)

We provide two display options for the web map. The Mapbox satellite basemap is the default. It provides detailed, sub-meter resolution views of many of the mines. The second option is the newly published Sentinel-2 basemap from MapTiler, which uses imagery from 2020 and 2021 exclusively, but at 10-meter resolution. In the example below, mine detections are displayed over the Mapbox basemap at left and over the MapTiler Sentinel-2 basemap at right.

MapboxvsSentinel2basemaps

For up-to-date views, we recommend searching the full Sentinel-2 catalog on SentinelHub EO Browser or the Planetscope data made available through the Planet Labs NICFI program.

Detection Accuracy

Creating quantitative accuracy metrics for a system like this is not always easy or constructive. For example, if the system asserted that there are no mines at all in the Amazon basin, it would be better than 99% accurate, because such a large proportion of the landscape remains unmined.
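To make the point concrete, here is a minimal sketch with made-up numbers. The patch counts are hypothetical, not figures from this project. It shows how a detector that never reports a mine can still score very high on accuracy:

```python
def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Fraction of all patches that were classified correctly."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Hypothetical landscape: 1,000,000 patches, of which 2,000 contain mines.
total_patches = 1_000_000
mined_patches = 2_000

# A "detector" that always answers "no mine" gets every unmined patch right
# and every mined patch wrong.
always_negative = accuracy(
    true_pos=0,
    true_neg=total_patches - mined_patches,
    false_pos=0,
    false_neg=mined_patches,
)
print(f"accuracy of the always-negative detector: {always_negative:.3%}")
```

With these assumed counts the accuracy comes out above 99% even though not a single mine is found, which is why accuracy alone says little about a detector of rare features.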

To provide a more constructive measure, we validated a random subsample of the system's detections. This allows us to estimate what is known as the precision or positive predictive value for the classifier. In essence, it tells you the likelihood that a box marked as a mine is actually a mine. On our latest run, we see a precision of 98.2%. For a sample of 500 mining detections, you can expect to see about 9 misclassifications. In our sample, a third of the false detections still identified mining activity, but mining for materials such as bauxite rather than gold.
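The relationship between the precision figure and the expected number of misclassifications can be sketched as follows. The counts use the illustrative sample of 500 detections mentioned above; the size of the actual validation sample is not stated here.

```python
def precision(true_positives, false_positives):
    """Share of detections that are real mines (positive predictive value)."""
    return true_positives / (true_positives + false_positives)


# Illustrative counts taken from the text: 500 detections, about 9 of them wrong.
sampled_detections = 500
false_detections = 9
confirmed_mines = sampled_detections - false_detections

p = precision(confirmed_mines, false_detections)
print(f"precision: {p:.1%}")
```

Note that precision says nothing about missed mines (false negatives); measuring those would require an independent survey of mines the detector did not flag.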

Area estimation

The goal of this work is mine detection rather than area estimation, and our classification operates on 440 m x 440 m patches. If the network determines that mining exists within the patch, then the full patch is declared a mine. This leads to a systematic overestimation of mined area if it is naively computed from the polygon boundaries. Building a segmentation model to delineate mine boundaries would be a viable extension of this work.
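The size of this overestimate can be illustrated with a synthetic example. This is a sketch under simplifying assumptions: the mask is invented, patches are tiled without overlap (the deployed detector uses overlapping windows), and the function names are hypothetical rather than taken from this repo.

```python
import numpy as np

PIXEL_SIZE_M = 10  # pixel size implied by 44 px = 440 m
PATCH_PX = 44      # patch edge length in pixels


def patch_level_area_km2(mine_mask):
    """Area reported when any mined pixel marks its whole patch as a mine."""
    rows, cols = mine_mask.shape
    flagged = 0
    for r in range(0, rows, PATCH_PX):
        for c in range(0, cols, PATCH_PX):
            if mine_mask[r:r + PATCH_PX, c:c + PATCH_PX].any():
                flagged += 1
    return flagged * (PATCH_PX * PIXEL_SIZE_M) ** 2 / 1e6


def pixel_level_area_km2(mine_mask):
    """Area covered by the mined pixels themselves."""
    return int(mine_mask.sum()) * PIXEL_SIZE_M ** 2 / 1e6


# Synthetic scene of 2 x 2 patches with a 10 x 10 pixel mine inside one patch.
mask = np.zeros((88, 88), dtype=bool)
mask[5:15, 5:15] = True

print(pixel_level_area_km2(mask))  # true mined area in km^2
print(patch_level_area_km2(mask))  # area implied by the flagged patch
```

In this toy case a 0.01 km² mine is reported as one full 0.1936 km² patch, roughly a twentyfold overestimate. The ratio shrinks for mines that fill most of their patches.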

Journalism

This work grew out of a series of collaborations with journalists and with advocates at Survival International seeking to expose illegal gold mining activity and document its impacts on the environment and on local indigenous communities. We began identifying mines by sight in satellite imagery. Later, some high school classes helped sift through images. Finally it made sense to try to automate the identification of mine sites. The training datasets for the machine-learned models followed from those initial human surveys.

Reports using the automated detections

Clandestine airstrips and airstrips dataset

Rough dirt airstrips, often cut illegally from the forest and unregistered with authorities, allow miners to access the mines and to fly out the gold. The Intercept Brasil and The New York Times surveyed over a thousand clandestine airstrips in Brazil's Legal Amazon, identifying 362 landing strips within 20 kilometers of mining activity. The inquiry into the airstrips' role in the expansion of mining led to a pair of stories and a short documentary film:

The airstrip location data are available for download. The clandestine airstrips dataset is the result of a collaborative reporting effort by The Intercept Brasil, The New York Times, and the Rainforest Investigations Network, an initiative of The Pulitzer Center. The Intercept Brasil created the project within the network, which was later joined by The New York Times. The data were gathered by Earthrise Media from OpenStreetMap and from satellite images of Amazônia Legal in 2021, augmented with input from the Socio-Environmental Institute of Brazil, the Yanomami Hutukara Association, and government reports, and verified by the newsrooms.

Related reporting on open-pit mining

Many thanks to the journalists whose skill and resourceful reporting brought these important stories to light.

Methodology

Overview

The mine detector is a lightweight convolutional neural network, which we train to discriminate mines from other terrain by feeding it hand-labeled examples of mines and other key features as they appear in Sentinel-2 satellite imagery. The network operates on 44 x 44 pixel (440 m x 440 m) patches of data extracted from the Sentinel-2 L1C data product. Each pixel in the patch captures the light reflected from Earth's surface in twelve bands of visible and infrared light. We composite the Sentinel-2 data by taking the per-pixel median across a four-month period, which reduces the presence of clouds, cloud shadow, and other transitory effects.
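The idea behind median compositing can be sketched with NumPy. This is not the repo's compositing code, which runs on the Descartes Labs platform; the array layout, the synthetic values, and the use of NaN for masked pixels are assumptions made for illustration.

```python
import numpy as np


def median_composite(scenes):
    """Per-pixel, per-band median over time, ignoring NaN (masked) values.

    scenes: array of shape (time, height, width, bands).
    """
    return np.nanmedian(scenes, axis=0)


# Synthetic stack: 5 acquisitions of a 44 x 44 patch with 12 bands.
rng = np.random.default_rng(0)
scenes = rng.uniform(0.05, 0.15, size=(5, 44, 44, 12))

# Pretend one acquisition is fully cloudy: very bright in every band.
scenes[2] = 0.9

# Pretend one pixel is missing (masked) in the first acquisition.
scenes[0, 0, 0, :] = np.nan

composite = median_composite(scenes)
print(composite.shape)  # (44, 44, 12)
print(float(composite.max()))
```

Because the median ignores a minority of outliers, the single bright "cloudy" acquisition does not leak into the composite: every composite value stays within the range of the clear acquisitions.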

During run time, the network assesses each patch for signs of recent mining activity, and then the region of interest is shifted by 140 m for the network to make a subsequent assessment. This process proceeds across the entire region of interest. The network makes 326 million individual assessments in covering the 6.7 million square kilometers of the Amazon basin.
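The scanning pattern described above can be sketched as a sliding window. The function name and the tile size are hypothetical; the patch size and the 140 m shift come from the text, and the 10 m pixel size follows from 44 px = 440 m.

```python
PIXEL_SIZE_M = 10  # metres per pixel, implied by 44 px = 440 m
PATCH_PX = 44      # patch edge in pixels
STRIDE_PX = 14     # 140 m shift between assessments


def window_origins(height_px, width_px, patch=PATCH_PX, stride=STRIDE_PX):
    """Yield (row, col) of the top-left corner of every full patch."""
    for row in range(0, height_px - patch + 1, stride):
        for col in range(0, width_px - patch + 1, stride):
            yield row, col


# A hypothetical 1 km x 1 km tile (100 x 100 pixels).
origins = list(window_origins(100, 100))
print(len(origins))
print(origins[:3])
```

Because the 140 m shift is smaller than the 440 m patch, neighboring windows overlap and most ground locations are assessed several times. As a back-of-envelope check, 6.7 million square kilometers divided by (0.14 km)² per step gives roughly 340 million window positions, the same order as the 326 million assessments reported.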

The system was developed for use in the Amazon, but it has also been seen to work in other tropical biomes.

Results

Assessment of mining in the Amazon basin in 2020

Amazon mine map and the output dataset. This data was largely generated with the 44px v2.6 model. A small portion in the Brazilian state of Pará was analyzed using the 44px v2.9 model to improve accuracy.

Tapajós basin mining progression, 2016-2020

Tapajós mine map and output dataset. In this case, we analyzed the region yearly from 2016-2020 to monitor the growth of mining in the area, using the earlier 28px v9 model.

Hand-validated detections of mines in Venezuela's Bolívar and Amazonas states in 2020

Venezuela mine map, Bolívar dataset and Amazonas dataset. Analysis via the 28px v9 model.

Generalization Test in Ghana's Ashanti Region, 2018 and 2020

Ghana mine map and output dataset. This was a test of the model's ability to generalize to tropical geographies outside of the Amazon basin, using the 44px v2.8 model.

Running the Code

This repo contains all code needed to generate data, train models, and deploy a model to predict the presence of mining in a region of interest. While we welcome external development and use of the code, subject to the terms of the MIT License, creating datasets and deploying the model currently requires access to the Descartes Labs platform.

Setup

Download and install Miniforge (a minimal conda distribution) if it is not already installed:

OS      Architecture            Download
Linux   x86_64 (amd64)          Miniforge3-Linux-x86_64
Linux   aarch64 (arm64)         Miniforge3-Linux-aarch64
Linux   ppc64le (POWER8/9)      Miniforge3-Linux-ppc64le
OS X    x86_64                  Miniforge3-MacOSX-x86_64
OS X    arm64 (Apple Silicon)   Miniforge3-MacOSX-arm64

Then run

chmod +x ~/Downloads/Miniforge3-{platform}-{architecture}.sh
sh ~/Downloads/Miniforge3-{platform}-{architecture}.sh
source ~/miniforge3/bin/activate

Next, create a conda environment named mining-detector by running conda env create -f environment.yml from the repo root directory. Activate the environment by running conda activate mining-detector. The code has been developed and tested on a Mac with Python 3.9.7. Other platforms and Python releases may work, but have not yet been tested.

The data used for model training may be accessed and downloaded from s3://mining-data.earthrise.media.

Notebooks

The system runs from three core notebooks.

create_dataset.ipynb (requires Descartes Labs access)

Given a GeoJSON file of sampling locations, generate a dataset of Sentinel-2 images. The dataset is stored as a pickled list of numpy arrays.
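For readers unfamiliar with the storage format, the following sketch writes and reads back a pickled list of numpy arrays. The file name, the zero-filled stand-in data, and the (44, 44, 12) patch shape are assumptions for illustration; the notebook's actual file names and array shapes may differ.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Stand-in dataset: three patches of 44 x 44 pixels with 12 bands each.
patches = [np.zeros((44, 44, 12), dtype=np.float32) for _ in range(3)]

# Hypothetical file name; a temporary directory keeps the example self-contained.
path = Path(tempfile.mkdtemp()) / "example_patches.pkl"

with open(path, "wb") as f:
    pickle.dump(patches, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# Stack the list into one array of shape (n_patches, height, width, bands).
batch = np.stack(loaded)
print(batch.shape)  # (3, 44, 44, 12)
```

Stacking only works when every patch has the same shape, so a dataset with patches of mixed sizes would need to be filtered or cropped first.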

train_model.ipynb

Train a neural network based on the images stored in the data/training_data/ directory. Data used to train this model is stored at s3://mining-data.earthrise.media.

deploy_model.ipynb (requires Descartes Labs access)

Given a model file and a GeoJSON describing a region of interest, run the model and download the results. Options exist to deploy the model on a directory of ROI files.
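As an illustration of the region-of-interest input, the sketch below parses a GeoJSON FeatureCollection and computes its bounding box using only the standard library. The polygon is a made-up one-degree box, not a boundary from this repo, and polygon_bounds is a hypothetical helper rather than a function the notebook provides.

```python
import json


def polygon_bounds(geojson_text):
    """Return (min_lon, min_lat, max_lon, max_lat) over all Polygon features."""
    collection = json.loads(geojson_text)
    lons, lats = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # other geometry types are ignored in this sketch
        for ring in geometry["coordinates"]:
            for lon, lat, *_ in ring:
                lons.append(lon)
                lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


# Hypothetical region of interest: a 1-degree box.
roi = """
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"name": "example_roi"},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[-57.0, -7.0], [-56.0, -7.0], [-56.0, -6.0],
                       [-57.0, -6.0], [-57.0, -7.0]]]
    }
  }]
}
"""

print(polygon_bounds(roi))  # (-57.0, -7.0, -56.0, -6.0)
```

GeoJSON stores coordinates in longitude, latitude order, which is easy to get backwards when passing bounds to other tools.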

Data

  • data/boundaries contains GeoJSON polygon boundaries for regions of interest where the model has been deployed.
  • data/sampling_locations contains GeoJSON datasets that are used as sampling locations to generate training datasets. Datasets in this directory should be considered "confirmed," and positive/negative class should be indicated in the file's title.

Models

The models directory contains Keras neural network models saved as .h5 files. The model names indicate the patch size evaluated by the model, followed by the model's version number and date of creation. Each model file is paired with a corresponding config .txt file that logs the datasets used to train the model, some hyperparameters, and the model's performance on the test dataset.
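The naming convention can be unpacked programmatically. The pattern below is inferred from a single file name cited in this README, 44px_v2.8_2021-11-11.h5, so it is an assumption: other files in the directory (for example the RGB variants) may carry extra tokens and not match. parse_model_name is a hypothetical helper, not part of the repo.

```python
import re
from datetime import date

# Pattern assumed from one example name: <patch>px_v<version>_<YYYY-MM-DD>.h5
MODEL_NAME = re.compile(
    r"^(?P<patch>\d+)px_v(?P<version>\d+(?:\.\d+)*)_(?P<created>\d{4}-\d{2}-\d{2})\.h5$"
)


def parse_model_name(filename):
    """Split a model file name into patch size, version, and creation date."""
    match = MODEL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized model file name: {filename}")
    return {
        "patch_px": int(match["patch"]),
        "version": match["version"],
        "created": date.fromisoformat(match["created"]),
    }


info = parse_model_name("44px_v2.8_2021-11-11.h5")
print(info)
```

Keeping the version as a string avoids treating "2.10" as smaller than "2.9", which would happen if versions were parsed as floats.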

The model 44px_v2.8_2021-11-11.h5 is currently the top performer overall, though some specificity has been sacrificed for generalization. Different models have different strengths/weaknesses. There are also versions of model v2.6 that operate on RGB and RGB+IR data. These may be of interest when evaluating whether multispectral data from Sentinel is required.

License

The code in this repository is available for reuse under an open MIT License. The data is available under CC BY 4.0. In publication, please cite Earth Genome, with reference to this repository.