appendix01_catchment_delineation.Rmd

---
title: "Little Shasta Watershed Delineation"
author:
  - name: Ryan Peek 
    affiliation: Center for Watershed Sciences, UCD
    affiliation_url: https://watershed.ucdavis.edu
date: "`r Sys.Date()`"
output: 
  distill::distill_article:
    toc: true
    code_folding: false
editor_options: 
  chunk_output_type: console
---

```{r setup1, include=FALSE}

knitr::opts_chunk$set(echo = FALSE, tidy = FALSE,
                      message = FALSE, warning = FALSE)

library(knitr)
library(here)
library(glue)
library(sf)
suppressPackageStartupMessages(library(tidyverse))
library(fs)
library(units)
library(sf)
library(nhdplusTools)
# Locations of interest
loi_comids <- c(3917946, 3917950, 3917198)

library(mapview)
mapviewOptions(fgb = FALSE)

# load("data_output/nhdPlot_comid_outlet_raw.rda")

# get starting comid
comidstart <- list(featureSource = "comid",
                   featureID = "3917946") # -122.4768 41.71785

```

# Identifying Watershed Boundaries

Before functional flow metrics can be generated, a clean delineation of the watershed boundaries is required. In some cases this is very straightforward, as the underlying catchments (sub units of a watershed) cleanly nest within the watershed boundary and the streamlines are well described and correct.

However, in some cases there are discrepancies in the underlying watershed boundaries and streamlines. For the Little Shasta watershed, there are problems with flowline components that are incorrectly delineated or routed (i.e., canals are connected to the river that do not actually connect), as well as the catchments associated with the watershed boundary. These issues can cause problems with the functional flow modeling and subsequent metric predictions. 

This document reviews the process to delineate streamlines from the original raw available data and the options used to revise and re-generate watershed scale data used in functional flow models.


## Watershed Units

The Hydrologic Unit Code or HUC^[see [here for data](https://databasin.org/maps/8df84d8ed5f34cac8d1f54d5c97fdd77/) ] are hierarchically nested watershed units that can be used for a variety of analyses. The lower or smaller the HUC number (i.e., HUC4), the larger the watershed unit. The finest scale HUC (smallest watershed unit) is HUC12, and the largest is HUC2.

```{r, echo=FALSE, fig.cap='HUC Hydrologic Boundary Dataset, figure from USGS'}

include_graphics("https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/thumbnails/image/WBD_Base_HUStructure_small.png")

# from https://www.usgs.gov/media/images/watershed-boundary-dataset-structure-visualization

```

Watersheds can be delineated using the HUC codes. These HUCs are nested, and should ostensibly match with the flowline and basin data that is pulled via the NHD dataset. 

The current HUC8 and HUC12s associated with the Little Shasta River, (`1801020703`), show these data don't overlap very cleanly, and some revision may be required to delineate a more discrete watershed.

```{r huc12, layout="l-page", fig.cap="HUC8 in black, HUC12 in blue, and NHD catchment in orange"}

# get starting comid
comidstart <- list(featureSource = "comid",
                   featureID = "3917946")

nhdPlot <- plot_nhdplus(comidstart,
                        actually_plot = FALSE,
                        streamorder = 1)

# get huc8 based on outlet
h8 <- get_huc8(nhdPlot$outlets)
h12_all <- get_huc12(h8)
h12_lsh <- h12_all %>% 
  filter(huc12 %in% c("180102070301", "180102070302", "180102070303", "180102070304"))

# plot these
plot(h8$geometry, lwd=4)
plot(h12_all$geometry, border="gray50", lwd=0.5, add=TRUE)
plot(h12_lsh$geometry, border="blue", col=alpha("steelblue", 0.7), lwd=2, add=TRUE)
plot(nhdPlot$basin$geometry, border="orange", col=alpha("orange", 0.3), lwd=0.8, add=TRUE)

```

Notably, the northern side of the Little Shasta is missing from the NHD basin (in orange), but is part of the HUC12 in blue. The Little Shasta HUC10 is comprised of all the blue HUC12's in the figure above.

## NHD Catchments

NHD catchments are smaller subwatershed units associated with the topographic drainage area associated with each distinct `NHD flowline COMID`. These are smaller units than the HUC12's described previously. When assessing the NHD flowlines and associated catchments (one for each NHD flowline `COMID`), there are a number of discrepancies in both the catchments (and thus the underlying catchment data the models require) as well as the flow network which the original flow predictions used. We used a set of flowlines and catchments that were cropped to the HUC12 watersheds that exist within the Little Shasta. This still includes many canals and associated drainages from the southern side of the Little Shasta, but which do not have direct surface water connections with the Little Shasta River.

```{r nhdFlowlines, layout="l-page", fig.height=6, fig.cap="NHD Flowlines and Canals for Little Shasta"}

# pull flowlines
flowlines <- navigate_nldi(comidstart,
                           mode = "UT",
                           distance = 100)

# crop to just the h12s (st_intersection will split across across H12)
flowlines_trim <- flowlines$UT_flowlines[h12_lsh,] %>% 
  # drop comid that extends outside
  filter(!nhdplus_comid %in% c(3917382))

# pull for HUC above just to pull catchments...
huc_feat <- list(featureSource = "huc12pp", 
                         featureID = "180102070501")
flowline_huc <- navigate_nldi(huc_feat,
                               mode = "UT",
                               distance = 50)

# pull extra comids
flowlines_huc <- flowline_huc$UT_flowlines[h12_lsh[h12_lsh$huc12 %in% c(180102070303, 180102070304),],] %>% 
  # filter canal comids that are completely outside of HUC12s
  filter(!nhdplus_comid %in% c(3917928, 3917930, 3917242, 3917382, 3917954))
# st_intersection splits comid across h12 boundaries (gives duplicates)

#mapview(flowlines_huc) + mapview(h12_lsh) + mapview(flowlines_trim, color="blue")

# note: COMIDs 3917382 and 3917928 extend inside/outside of h12
# dropping both here

# Make a distinct final comid list
final_comids <- c(flowlines_huc$nhdplus_comid, flowlines_trim$nhdplus_comid) %>% unique(.)

# view
flowlines_final <- bind_rows(flowlines_huc, flowlines_trim) %>%
  distinct(nhdplus_comid, .keep_all = TRUE)

# mapview(flowlines_final, color="steelblue", layer.name="Flowlines for L. Shasta", show.legend=FALSE) +
#   mapview(h12_lsh, col.regions="gray", alpha.regions=0.2, lwd=2, layer.name="H12") +
#   mapview(flowlines_huc, color="green2", layer.name="H12 180102070501 Canals")


```

```{r downloadNHDplus, eval=FALSE, echo=FALSE}

# download full dataset based on these comids
nhdplus <- subset_nhdplus(comids = as.integer(final_comids),
                          output_file = glue(here("data_output/nhdplus_full_vaa.gpkg")),
                          nhdplus_data = "download",
                          flowline_only = FALSE,
                          overwrite = TRUE, return_data = FALSE)


```

In fact, the catchments highlighted in **orange** below are not part of the original Little Shasta NHD watershed delineation. Connections and routing between streamlines and canals further complicates this delineation, as many canals may connect streamlines in the opposite direction to the drainage or topographic orientation.

```{r getCatchments, eval=TRUE, echo=FALSE}
# load data
db <- here("data_output/nhdplus_full_vaa.gpkg")
nhd_flowlines <- st_read(db, "NHDFlowline_Network", quiet = TRUE)
nhd_catchments <- st_read(db, "CatchmentSP", quiet = TRUE)

# missing comids/catchments: 
#catch_comids[!catch_comids$FEATUREID %in% nhd_catchments$featureid,]$FEATUREID
coms_missing <- c(3917928, 948010095, 3917216, 3917940, 948010047,
                  948010096, 3917924, 3917926, 948010092, 948010091,
                  948010048, 3917100, 948010046)
miss_catch <- nhd_catchments %>% filter(featureid %in% coms_missing)
```


```{r plotCatchmentsMapview, eval=FALSE, layout="l-page", fig.height=6, fig.cap="Catchments for Little Shasta within HUC12"}
# plot catchments
mapview(nhd_catchments, alpha.regions=0.3, col.regions="gray", 
        layer.name="NHD Catchments") + 
  mapview(h12_lsh, color="gray20", alpha.color=0.5, col.regions="gray20",
          alpha.regions=0, lwd=4, layer.name="HUC12") +
  mapview(miss_catch, col.regions="orange", alpha.regions=0.4,
          layer.name="Missing/MisID Catchments")+
  mapview(nhd_flowlines %>% filter(ftype!="CanalDitch"), 
          layer.name="Streams only") + 
  mapview(nhd_flowlines %>% filter(ftype=="CanalDitch"), color="forestgreen", lwd=2, layer.name="Canals")
  
```

```{r plotCatchmentstmap, layout="l-page", fig.height=6, fig.cap="Catchments for Little Shasta within HUC12"}

# plot catchments
library(tmap)
library(tmaptools)

# set modes
tmap::tmap_mode("plot")
m1 <- tm_shape(nhd_catchments) + tm_polygons(alpha = 0.3,col ="gray", 
        title="NHD Catchments") + 
  tm_shape(h12_lsh) + tm_polygons(border.col = "gray20", alpha = 0,  col = "gray20", border.alpha = 0.5,  lwd=4, title="HUC12") +
  tm_shape(miss_catch) + tm_polygons(col="orange", alpha=0.4,
                                     title="Missing/MisID Catchments") +
  tm_shape(nhd_flowlines %>% filter(ftype!="CanalDitch")) + tm_lines(col = "blue2", title.col ="Streams only") +
  tm_shape(nhd_flowlines %>% filter(ftype=="CanalDitch")) +
  tm_lines(col="forestgreen", lwd=2, title.col = "Canals") + 
  tm_compass() + tm_scale_bar() + 
  tm_layout(frame = FALSE)

m1  
```


## NHD Flow Network

The stream network, or flow network is a crucial component that needs to be correctly delineated. This provides a detailed account of where surface water flows and connects across the landscape. These "*flowline networks*" are generated across the US using the [National Hydrography Dataset (NHD)](https://www.usgs.gov/national-hydrography/national-hydrography-dataset). 

The current NHD **Little Shasta** streamline network includes several canals that should not be included in a surfacewater account of natural flow accumulation across the landscape. A first step is to identify these segments and revise the watershed stream network.

The downstream most stream segment for the Little Shasta River is `COMID: 3917946`. Here we can use the `{nhdplusTools}` package to pull and plot the upstream basin associated with the starting `COMID`. This delineation clearly shows a non-topographically derived basin, which is in large part to the inclusion of many different canals and artificial ditches as part of the streamline network.

```{r nhdflowlines, eval=TRUE, echo=FALSE, layout="l-page", layout="l-page", fig.cap="The raw NHD data for upstream catchment and flowlines associated with the Little Shasta River outlet (maroon dot, COMID=3917946)"}

# get starting comid
comidstart <- list(featureSource = "comid",
                   featureID = "3917946") # -122.4768 41.71785

# get map from starting comid
#png("figures/raw_nhd_watershed_map.png", width = 8, height = 10, units="in", res = 300)
nhdPlot <- plot_nhdplus(comidstart,
                    streamorder = 1,
                    #basemap = "cartolight",
                    plot_config = list(basin = list(lwd = 2),
                                       outlets = list(
                                         comid = list(col = "maroon"))))
#dev.off()
#save(nhdPlot, file = "data_output/nhdPlot_comid_outlet_raw.rda")
```

```{r print-nhdPlot, echo=F,eval=F, layout="l-page", fig.cap="The raw NHD data for upstream catchment and flowlines associated with the Little Shasta River outlet (maroon dot, COMID=3917946)"}

knitr::include_graphics(here("figures/raw_nhd_watershed_map.png"))

```


# Dealing with Discrepancies

In particular, the canal which delivers water from Lake Shastina appears to be routed as if it drains into the Little Shasta River, rather than bypassing the mainstem river completely. To address inaccuracies in the NHD flowlines and associated catchments, the following steps were taken:

 1. We identified natural stream channel from artificial (canal) channels, and filtered all artificial segments out, and fixed mis-classified segments.
 2. We identified catchments based on the new clean streamline dataset, and create a revised catchment layer, which included all catchments that occurred within (or nearly completely within) the NHD HUC10/HUC12 boundary.^[Both `COMID 3917928` and `COMID 3917382` overlapped outside of the HUC10 boundary, but each was included in the analysis]. 
 3. We attributed each catchment within the HUC10 to a `COMID`, and included created a clean stream network which routed each `COMID` and catchment to a downstream `COMID`, culminating at the Little Shasta confluence with the Shasta River (`COMID=3917946`), according to topography and ground-truthed knowledge of the watershed. Any `COMID` that did not have a clear surface water or topographic drainage to a mainstem or downstream catchment within the basin was dropped.
 4. We then re-calculated the catchment area, accumulated drainage area, and total drainage area for each catchment using the `{nhdtoolsPlus}` package.

```{r cleanEnviro, echo=FALSE, eval=TRUE}

rm(flowlines, flowline_huc, flowlines_huc, flowlines_trim,
   flowlines_final,
   h12_all, h8, nhdPlot, huc_feat)

```


## Selecting Flowlines & Catchments

The first step is to identify the streamlines that flow to the mainstem Little Shasta, remove any all canals, and select and re-route the network to match this cleaned network. While not correct from a streamflow standpoint, this allows us to use the catchments we need to create a stream network. We also have removed the springs and flowlines from the southern portion of the Little Shasta watershed that don't have surface flow connections to the mainstem Little Shasta River.

```{r trimcanals}

# get catch/flow VAA
# db <- here(glue("{outdir}/nhdplus_full_vaa.gpkg"))
# nhd_flowlines <- st_read(db, "NHDFlowline_Network", quiet = TRUE)
# nhd_catchments <- st_read(db, "CatchmentSP", quiet = TRUE)

# Filter CANALS and Segments that we need
comid_trim <- nhd_flowlines %>%
  filter(fcode %in% c(46006, 46003)) %>%
  # add these connectors back in
  bind_rows(., nhd_flowlines %>%
              filter(comid %in%
                       # canals
                       c(948010091, 948010089, 948010092,3917222,
                         3917226, 3917270, 3917952, 3917266,
                         948010095, 948010096)))

# get Catchments that match
catch_trim <- nhd_catchments %>%
  filter(featureid %in% comid_trim$comid)

# TRIM SOUTHERN LOBE OUT -------------

h12_lsh <- st_transform(h12_lsh, 4269)
# drop H12: 180102070302
comid_to_trim <- comid_trim[h12_lsh[h12_lsh$huc12=="180102070302",],]$comid

# drop these because we want to keep
comid_to_trim <- comid_to_trim[!grepl("3917266|3917222", x = comid_to_trim)]
# add these: 3917218, 3917212, 3917214, 3917228 b/c we want to drop
comid_to_trim <- c(comid_to_trim, 3917218, 3917212, 3917214, 3917228)

# check
# mapview(comid_trim %>% filter(!comid %in% comid_to_trim), color="red") +
#   mapview(catch_trim %>% filter(!featureid %in% comid_to_trim), alpha.regions=0, color="gray", lwd=3)

# trim (n=45 after)
catch_trim <- catch_trim %>% filter(!featureid %in% comid_to_trim)
comid_trim <- comid_trim %>% filter(!comid %in% comid_to_trim)


# get nodes (or use nhdtoolsPlus::get_nodes)
comid_nodes <- 
  comid_trim %>% 
  select(comid, tonode, fromnode, geom) %>%
  mutate(node_start = lwgeom::st_startpoint(geom),
         node_end = lwgeom::st_endpoint(geom))

# assign obj for comid start and end
comid_nodes_start <- comid_nodes
st_geometry(comid_nodes_start) <- "node_start"
#head(comid_nodes_start)
comid_nodes_end <- comid_nodes
st_geometry(comid_nodes_end) <- "node_end"

# map
# mapview(comid_trim, color="steelblue", lwd=2, layer.name="Trimmed NHD flowlines") + 
#   mapview(catch_trim, layer.name="Catchments", alpha.regions=0.3) + 
#   mapview(h12_lsh, lwd=4, alpha.regions=0, color="gray30", col.regions="gray30", layer.name="HUC12") +
#   mapview(comid_trim %>% filter(!fcode %in% c(46006, 46003)), color="red", 
#           layer.name="Canals") +
#     mapview(comid_nodes_start, col.regions="orange", layer.name="Node Start", cex=3) +
#   mapview(comid_nodes_end, col.regions="green", layer.name="Node End", cex=3)
# 
```

We can now use this set of `COMIDs` and flowlines to create a cleaned flow network, which is used for the accumulation of catchment variables. Here we add nodes which represent the start and end and are used to calculate the flow network. Notably there are a few where canals have a flow direction that is opposite of the natural drainage (see red segments), and the end nodes should be switched (orange node should be in the mainstem Little Shasta instead of in a headwater or upper catchment).

```{r makeFlownet, layout="l-page", fig.height=6, fig.cap="Visualize start (green) and end (orange) nodes of NHD flowlines. Red segments need revision."}

# set modes
tmap::tmap_mode("plot")

m2 <- tm_shape(catch_trim) +
  tm_polygons(alpha = 0,border.alpha = 0.5,border.col = "gray", col="gray", title="Catchments", legend.show=TRUE)+
  tm_shape(comid_trim %>% 
             select(comid, fromnode, tonode, hydroseq, dnhydroseq)) +
  tm_lines(col="fromnode", legend.col.show = FALSE, palette = "viridis",
           title.col = "Streamlines")+
    tm_add_legend('line', lwd=2, col=c("#440154FF", "#FDE725FF", "red"),
                  labels=c('Mainstem', 'Headwaters', 'Revised Canal'), title = "Legend") +
  tm_shape(comid_nodes_start) + 
  tm_dots(size = 0.4, border.col="forestgreen", alpha=0.7, border.alpha=1, shape = 21, col = "forestgreen", legend.show = TRUE, title="Start") + 
  tm_add_legend("symbol", shape=21, col="forestgreen", size=0.5, border.col="forestgreen", labels=c('Start')) +
tm_shape(comid_nodes_end) + 
  tm_dots(size = 0.9, alpha=0.6, border.alpha=1, border.col="orange", shape = 21, col = "orange", title="End") + 
  tm_add_legend("symbol", shape=21, col="orange", size=1, border.col="orange", labels=c('End')) +
tm_shape(comid_trim %>% 
           select(comid, fromnode, tonode, hydroseq, dnhydroseq) %>% 
           filter(comid %in% c("948010095", "948010096", "948010092", "3917222"))) +
  tm_lines(col="red", legend.col.show = TRUE,
           title.col ="Revised Directionality")+
tm_compass() + tm_scale_bar() + 
  tm_layout(frame = FALSE,
            legend.show = TRUE,
            legend.position = c("left", "top"))

m2
  
# # visualize nodes
# mapview(comid_nodes_start, 
#         col.regions="forestgreen", layer.name="Start", cex=3) +
#   mapview(comid_nodes_end, col.regions="orange", layer.name="End", cex=5) +
#   mapview(comid_trim %>% 
#             select(comid, fromnode, tonode, hydroseq, dnhydroseq), 
#           zcol="fromnode", legend=FALSE) +
#   mapview(comid_trim %>% 
#             select(comid, fromnode, tonode, hydroseq, dnhydroseq) %>% 
#             filter(comid %in% c("948010095", "948010096", "948010092", "3917222")), 
#           color="red", layer.name="Need to Fix") +
#   mapview(catch_trim, alpha.regions=0, color="gray")

```

## Revising Nodes and `COMIDs`

We next need to clean and revise the nodes identified above and comids associated with stream segments (and connecting canals) to create a complete flow network that routes flow to the outlet (Little Shasta confluence with main Shasta River). 

```{r edit-comids}

# edit comids and nodes:
comid_trim_rev <- comid_trim %>%
  select(-c(id, fdate:gnis_name, reachcode:wbareacomi, shape_length, frommeas:areasqkm, divdasqkm:enabled)) %>%
  mutate(
    # the start node
    fromnode = case_when(
      comid == 948010095 ~ 10013981,
      comid == 948010096 ~ 10086523,
      comid == 948010092 ~ 10086520,
      TRUE ~ fromnode),
    # end node
    tonode = case_when(
      comid == 948010095 ~ 10086523,
      comid == 948010096 ~ 10014067,
      comid == 948010092 ~ 10014066,
      comid == 3917222 ~ 10014066,
      TRUE ~ tonode),
    # dnhydroseq
    dnhydroseq = case_when(
      #is.na(dnhydroseq) ~ NA_real_,
      comid == 948010095 ~ 10020959,
      comid == 948010096 ~ 10021351,
      comid == 948010046 ~ 10020959,
      comid == 948010092 ~ 10032346,
      comid == 3917222 ~ 10032346,
      TRUE ~ dnhydroseq),
    # uphydroseq
    uphydroseq = case_when(
      comid == 948010095 ~ 0,
      comid == 948010096 ~ 10026762,
      comid == 948010046 ~ 10020959,
      comid == 948010091 ~ 0,
      comid == 948010092 ~ 0,
      TRUE ~ uphydroseq))

# now make cleaned flownet
flownet <- get_tocomid(comid_trim_rev, return_dendritic = TRUE, missing = 0, add = TRUE) %>%
  select(comid, tocomid, fromnode, tonode, hydroseq, dnhydroseq, ftype, fcode, lengthkm, levelpathi, totdasqkm, terminalfl, startflag, terminalpa, divergence) %>%
  mutate(
    tocomid = case_when(
      comid == 948010095L ~ 948010096L,
      comid == 948010092L ~ 3917948L,
      comid == 948010046L ~ 948010096L,
      TRUE ~ tocomid),
    fromnode = case_when(
      comid == 948010092L ~ 10086520,
      comid == 948010095L ~ 10013981, # or NA_real_
      comid == 948010096 ~ 10086523,
      TRUE ~ fromnode),
    tonode = case_when(
      comid == 948010092L ~ 10086523,
      TRUE ~ tonode))


```

If we generate an interactive flow network, we can double check the connections. **Note, the downstream outlet is "`0`".**

```{r simplenetwork, layout="l-page", fig.cap="Little Shasta flow network representation, by COMID."}

library(igraph)
library(networkD3)

(p <- simpleNetwork(flownet, Source = "comid", Target = "tocomid",
                   height = "400px", 
                   width = "400px", 
                   fontSize = 16, 
                   fontFamily = "serif",
                   nodeColour = "darkblue", 
                   linkColour = "steelblue",
                   opacity = 0.9, zoom = TRUE, charge = -40))
```

The network diagram looks correct. However, what about the flow lines? Here we plot by the upstream to downstream sort order. While this flow network is certainly artificial, it will permit more effective surface water routing through the catchments, and allows generation of the catchment accumulation data.

```{r plotFlowNetwork, layout="l-page", fig.height=6, fig.cap="Flownet visual inspection for missing nodes."}

# sort and split
flownet_sort <- get_sorted(flownet, split = TRUE)
# if dendritic and different Terminal groups exist, use split = TRUE:
# flownet_sort_main <- filter(flownet_sort, terminalID == 3917946)

# add sort_order
flownet_sort['sort_order'] <- 1:nrow(flownet_sort)


# preview
mapview(flownet_sort, zcol="sort_order", layer.name="Revised Sorted", legend=F) + 
  mapview(flownet_sort, zcol="dnhydroseq", layer.name="D/S Hydroseq", legend=F) + 
  mapview(catch_trim, alpha.regions=0, color="gray", legend =F)

```

Finally, as a final check, we can calculate the `arbolate sum` which is the cumulative total length of all upstream flowlines, for each segment or `COMID`. The `nhdplusTools::calculate_arbolate_sum()` function will calculate this information for us based on the corrected flowlines. We would expect to see all the stream segments "flowing" (or getting wider in the plot below) downstream towards the outlet.

```{r arbolatesum, layout="l-page", fig.height=6, fig.cap="Total Length of all upstream segments (cumulative flowlength)."}

# get path arbolate 
flownet_sort[["arbolatesum"]] <- calculate_arbolate_sum(
  dplyr::select(flownet_sort,
                ID = comid, toID = tocomid, length = lengthkm))

# plot based on upstream flowpath
plot(sf::st_geometry(flownet_sort), lwd = flownet_sort$arbolatesum / 10)
```

Each of the steps and pieces above can be regenerated with any given watershed, assuming there is a flowline network with associated catchments that can be routed or revised as necessary. This framework and code should provide a foundation which makes these cleaning and revision steps more efficient.


## Generate Accumulation Network

Using the cleaned surface water flow network, we calculated the length and drainage areas for each subcatchment and `COMID` segment. These data are required in the functional flow models. The funtions used included several functions to regenerate the `COMID` flow lengths and accumulate the total drainage area per each `catchment`.

We uses a modified `{nhdplusTools}` function to get generate the clean network and calculate distance/drainage area. This required adding and checking the lengths with the `nhdplusTools::get_pathlength` function, then identify all upstream `COMIDs` for any given comid. These `COMID` lists can be used to calculate each variable of interest.

```{r getLengths, eval=T, echo=FALSE}

# get pathlengths: need ID, toID, length
flowlines_net <- flownet_sort %>%
  rename(ID=comid, toID=tocomid) %>%
  # this updates the pathlength total (main path to basin terminal)
  left_join(., nhdplusTools::get_pathlength(.)) %>%
  relocate(pathlength, .after=lengthkm)

```

```{r getPaths, eval=T, echo=FALSE}
get_us_comids <- function (network, ID, distance = NULL)
{
    # drop geometry:
    if ("sf" %in% class(network))
      network <- sf::st_set_geometry(network, NULL)
    # get vector of cols to use
    netcols_needed <- c("ID", "toID", "lengthkm", "levelpathi","pathlength", "hydroseq","dnhydroseq")
    # filter to cols
    if (!sum(names(network) %in% netcols_needed)==length(netcols_needed)){
      print("Columns missing from network...double check inputs!")
    }
    network <- dplyr::select(network, all_of(netcols_needed))

    # fix names
    names(network) <- tolower(names(network))

    # set the starting ID to search from and search distance in network
    start_comid <- dplyr::filter(network, id == ID)
    if (!is.null(distance)) {
      if (distance < start_comid$lengthkm)
        return(ID)
    }

    # function to get the comids
    private_get_UT <- function(network, ID){
      # first get the mainstem comids and the trib comids
      main <- filter(network, id %in% ID)
      if (length(main$hydroseq) == 1) {
        full_main <- filter(network, levelpathi %in% main$levelpathi &
                              hydroseq >= main$hydroseq)
        trib_lpid <- filter(network, dnhydroseq %in% full_main$hydroseq &
                              !levelpathi %in% main$levelpathi &
                              hydroseq >= main$hydroseq)$levelpathi
      }
      else {
        full_main <- filter(network, levelpathi %in% main$levelpathi)
        trib_lpid <- filter(network, dnhydroseq %in% full_main$hydroseq &
                              !levelpathi %in% main$levelpathi)$levelpathi
      }
      # if trib comid to start, re-run and grab comids associated u/s
      trib_comid <- filter(network, levelpathi %in% trib_lpid)$id
      if (length(trib_comid) > 0) {
        return(c(full_main$id, private_get_UT(network, trib_comid)))
      }
      # or just give comids for mainstem
      else {
        return(full_main$id)
      }
    }
    # return dat
    all <- private_get_UT(network, ID)

    # check and sort if distance is set
    if (!is.null(distance)) {
      stop_pathlength <- start_comid$pathlength - start_comid$lengthkm +
        distance
      network <- filter(network, ID %in% all)
      return(filter(network, pathlength <= stop_pathlength)$id)
    }
    else {
      return(all)
    }
  }

```

```{r applyPaths, eval=F, echo=F}

# make list of us comids for each comid:
final_comlist <- map(flowlines_net$ID, ~get_us_comids(flowlines_net, .x)) %>% 
  set_names(flowlines_net$ID)

```

```{r cleanup2, echo=FALSE}
# cleanup files and save out
rm(coms_missing, db, final_comids, loi_comids, comid_trim, flownet_sort,
   comid_nodes, comid_nodes_end, comid_nodes_start, comid_trim_rev, 
   comidstart, flownet, h12_lsh, miss_catch, nhd_catchments, nhd_flowlines, p)

```


## Calculate Weighted Drainage Areas

Finally, we need to calculate the weighted area in relation to the total drainage area, for use in the final accumulation and flow predictions. The table below shows the **`COMID`** and **`toCOMID`** attributes, as well as the length and total length (**`pathlength`**) of a given stream segment. The **`area_weight`** is the scaled area to drainage area weight assigned to each catchment **`COMID`**. The **`totdasqkm`** is the revised cumulative drainage area for each catchment, and the **`nhdptotda`** is the previous total drainage area before the watershed was revised and cleaned.

```{r addArea, eval=TRUE, echo=FALSE}
# calc total area:
lsh_area <- sum(st_area(catch_trim) %>% set_units("km^2"))

# calculate total drainage area and weight
catchment_area <- flowlines_net %>%
  # add back in the correct areas to use in the calculation
  left_join(st_drop_geometry(catch_trim) %>%
              select(comid=featureid, areasqkm), by=c("ID"="comid")) %>%
  dplyr::select(ID, toID, area=areasqkm) %>%
  mutate(totda = calculate_total_drainage_area(.),
         # add previous calc for total drainage area
         nhdptotda = flowlines_net$totdasqkm,
         # add an area weight for accumulation
         area_weight = area / totda) 
# to calc area from catchments can use this if needed
#mutate(area = units::set_units(st_area(geom),"m^2") %>%
# make km2
#    set_units("km^2") %>% drop_units(),)


# join back to catch_main
final_catch <- left_join(catch_trim %>% select(-c(shape_length, shape_area)),
                         st_drop_geometry(catchment_area) %>%
                           select(ID, area, area_weight, totda, nhdptotda),
                         by=c("featureid"="ID")) %>%
  rename(comid=featureid)

# now add back to lsh_flownet
final_flownet <- flowlines_net %>%
  select(-c( totdasqkm)) %>%
  left_join(., st_drop_geometry(final_catch),
            by=c("ID"="comid")) %>%
  rename(totdasqkm = totda, comid=ID, tocomid=toID) %>%
  select(-c(id, sourcefc))

final_flownet %>% st_drop_geometry() %>% 
  select(comid, tocomid, ftype, lengthkm, pathlength, 
         sort_order, area_weight:nhdptotda ) %>% 
  DT::datatable()
```

The total watershed area for the Little Shasta using this iteration is `r lsh_area`. The total drainage area and cumulative drainage area is calculated as well. We can validate with a final map to triple check things. Area weights for headwaters should be equal to `1`, since cumulative drainage area is equivalent to the catchment area.

```{r finalmap, layout="l-page", fig.height=6, fig.cap="Final cleaned watershed."}


fm1 <- tm_shape(final_catch) +
  tm_polygons(col="totda", title="Total Drainage\nArea (sqkm)") +
  tm_shape(final_flownet) +
  tm_lines(lwd=1, col="cyan4")+
  #tm_compass() + tm_scale_bar() + 
  tm_layout(frame = FALSE,
            legend.show = TRUE,
            legend.position = c("left", "top"))
fm2 <- tm_shape(final_catch) +
  tm_polygons(col="area_weight", title="Area Weight", palette="viridis") +
  tm_shape(final_flownet) +
  tm_lines(lwd=2,alpha = 0.8, col="cyan4")+
  tm_compass() + tm_scale_bar() + 
  tm_layout(frame = FALSE,
            legend.show = TRUE,
            legend.position = c("left", "top"))


tmap::tmap_arrange(fm1, fm2, ncol=1)
  
# mapview(final_flownet, lwd=4, zcol="pathlength", 
#         layer.name="Flowline Path Length") +
#   mapview(final_catch, zcol="totda", layer.name="Tot. Drainage Area") +
#   mapview(final_catch, zcol="area_weight", layer.name="Area Weight")


```

```{r saveit, eval=F, echo=F}

write_rds(final_comlist, file="data_output/cleaned_full_lshasta_comid_accum_list.rds")
write_rds(final_flownet, file = "data_output/cleaned_full_lshasta_nhd_flownet.rds")
write_rds(final_catch, file="data_output/cleaned_full_lshasta_nhd_catchments.rds")

# list(comlist_accum = final_comlist , catchments = final_catch, flowlines = final_flownet))
```