Area discrepancies between our model and Campagna data #102

Open
apericak opened this issue Sep 27, 2016 · 1 comment

@apericak
Collaborator

After running a number of model exports over the last few days, we have realized that our model currently identifies about 2000 km^2 less mining area than Campagna over 1976-2005. Of note, this figure includes about 1000 km^2 of mining that Campagna found between 1976 and 1985 and that our model did not find. Campagna found roughly 4600 km^2 by 2005, whereas we find only about 2700 km^2 by that date.

We are very confident about the mining we have currently identified, and our accuracy assessment supports that confidence, but we still have a big issue if we are missing this much area compared to a cited dataset. Additionally, an EPA study from a few years ago (which Matt Ross is aware of) found a number similar to the Campagna result, and Matt thinks the annual rate of new mining should be close to 100 km^2 per year (we are currently getting about 60 km^2 per year). We will still likely want to report our current data (it would be bad science to ignore it), but we will want to explain why we are running the analysis to better fit the older data. We can make a good argument for this: since the Campagna methodology differed from ours, we would be using prior research to inform our current results and to make sure our model finds mines accurately and comprehensively.
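A quick back-of-envelope check on these figures, using only the numbers quoted above. It assumes (our assumption, not something either dataset confirms) that the 100 and 60 km^2/yr rates apply uniformly over 1986-2005:

```python
# Figures quoted in this issue, in km^2.
campagna_by_2005 = 4600
model_by_2005 = 2700
missed_1976_1985 = 1000  # part of the gap that Campagna found in 1976-1985

gap_total = campagna_by_2005 - model_by_2005     # 1900, i.e. "about 2000"
gap_after_1985 = gap_total - missed_1976_1985    # remaining gap for 1986-2005

# Annual rates of new mining, in km^2/yr.
expected_rate = 100  # Matt Ross's estimate
model_rate = 60      # what the model currently produces

# Assumption: both rates hold uniformly for the 20 years 1986-2005.
years = 2005 - 1985
rate_shortfall = (expected_rate - model_rate) * years

print(gap_total, gap_after_1985, rate_shortfall)  # 1900 900 800
```

Under that assumption, the ~40 km^2/yr rate shortfall adds up to roughly 800 km^2, in the same range as the ~900 km^2 post-1985 gap, so the two lines of evidence at least point in the same direction.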

Just by visually comparing the Campagna results to basemap and other imagery, it does appear that in many cases those data correctly identify mines. There are errors (e.g., urban areas near Wise, VA, identified as mines), but our dataset will have errors too. The Campagna data are also, in most cases, limited to mine permit boundaries, which means any excess area in that dataset is not coming from land outside of permits. My current best guess is that Campagna probably overestimated (but not by much), and we assuredly underestimated. Ideally, we would find the sweet spot between those two extremes.

So, we have a few courses of action:

  1. Relax our thresholds to increase the amount of area identified as mine. This will likely also pick up non-mine areas, however.
  2. Explore using other spectral indices (SAVI, EVI, etc.) either in combination with or instead of NDVI. Using indices will keep the model automated, but we will need to create some algorithm for combining the results of multiple indices. E.g., a mine is an area with an NDVI < [threshold A] AND SAVI < [threshold B].
  3. See if we can use the Campagna results to guide our own model. For instance, we might use areas identified as mines in that study to arrive at better thresholds. This might be challenging, however, since we don't know exactly when mining occurred in that dataset.
  4. As suggested by @cjthomas730, create a final product that gives levels of confidence about mining. Since we have high confidence about our current data, call those areas "high confidence" or something like that; and then use some of the methods above to identify additional mine land, calling that "medium confidence".
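Options 1, 2 and 4 can be combined into one per-pixel rule. The sketch below is illustrative only: the threshold values, the SAVI soil factor L = 0.5 (a commonly used default), and the function names are all hypothetical, not the model's actual parameters. The "high" tier is exactly the existing conservative NDVI rule; the "medium" tier only adds pixels that pass a relaxed NDVI cutoff AND a SAVI cutoff.

```python
from typing import Optional

# Hypothetical thresholds for illustration only; real values would need
# calibrating against the accuracy assessment.
STRICT_NDVI = 0.35   # current, conservative rule -> "high confidence"
RELAXED_NDVI = 0.45  # relaxed NDVI cutoff (option 1)
RELAXED_SAVI = 0.30  # second index that must also agree (option 2)
SAVI_L = 0.5         # common default soil-brightness correction factor


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from surface reflectances."""
    if nir + red <= 0:
        raise ValueError("reflectances must sum to a positive value")
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, l: float = SAVI_L) -> float:
    """Soil-Adjusted Vegetation Index: (1 + L) * (NIR - Red) / (NIR + Red + L)."""
    return (1 + l) * (nir - red) / (nir + red + l)


def classify_pixel(nir: float, red: float) -> Optional[str]:
    """Return 'high', 'medium', or None (not mine) for one pixel."""
    n = ndvi(nir, red)
    if n < STRICT_NDVI:
        return "high"    # passes the existing conservative rule
    if n < RELAXED_NDVI and savi(nir, red) < RELAXED_SAVI:
        return "medium"  # flagged only once NDVI is relaxed AND SAVI agrees
    return None


pixels = [(0.30, 0.20), (0.15, 0.06), (0.50, 0.20), (0.50, 0.05)]
print([classify_pixel(nir, red) for nir, red in pixels])
# ['high', 'medium', None, None]
```

Because the high-confidence tier is unchanged, the conservative dataset stays available on its own, and the medium tier can only add area on top of it.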
@WassonMF

I'm guessing at this point that it's not wise to go down the route of #2, given how exploratory it would be at this late stage. And I definitely don't like #3, as my experience with the Campagna dataset was that it had a lot of false positives. I just realized you all may never have seen Ross Geredien's analysis of that and other datasets. Not sure Ross's techniques were the best out there, but this definitely seems worth sharing:

http://ilovemountains.org/reclamation-fail/mining-extent-2009/Assessing_the_Extent_of_Mountaintop_Removal_in_Appalachia.pdf

But I really like Christian's suggestion (#4) because it covers both bases: providing a fairly conservative dataset for use by researchers while also addressing and acknowledging the underestimation of mined areas. Happy to talk more about this this afternoon.

Matthew F. Wasson, Ph.D., Director of Programs
Appalachian Voices

589 West King St.
Boone, NC 28607
Phone: 828-262-1500
Website: www.appalachianvoices.org

"Nonviolent action, born of the awareness of suffering and nurtured by
love, is the most effective way to confront adversity."

  • Thich Nhat Hanh, Love In Action
