Fix NA cluster ID values #22

Open
russHyde opened this issue Jan 18, 2023 · 6 comments

Comments

@russHyde
Collaborator

Using "scanner-env-2022-08-01.rds" from https://github.com/oliviabboyd/tfpscan_sample .

tree <- treeview(
  "tfpscan-2022-08-01/scanner-env-2022-08-01.rds",
  output_dir = "tfpscan-2022-08-01/treeview"
)

When running the above on the trim-tooltip branch, most tooltips read "Cluster ID: 12345", but some read "Cluster ID: NA". The tooltip text comes from tree$data$mouseover, and in the affected rows tree$data$cluster_id is NA_character_.
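As a minimal illustration of how an NA_character_ cluster ID can turn into the literal text "Cluster ID: NA": base R's paste0() coerces NA to the string "NA". How treeview() actually builds the mouseover column is not shown in this thread, so the paste0() call and the guarded variant below are assumptions, not the package's code.

```r
# Assumption: the tooltip is built by pasting a prefix onto cluster_id.
cluster_id <- c("12345", NA_character_)

# paste0() converts NA to the string "NA", so the missing value leaks into the text.
mouseover <- paste0("Cluster ID: ", cluster_id)
print(mouseover)

# One possible guard (hypothetical): keep the tooltip missing when the ID is missing.
mouseover_guarded <- ifelse(
  is.na(cluster_id),
  NA_character_,
  paste0("Cluster ID: ", cluster_id)
)
print(mouseover_guarded)
```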

@russHyde
Collaborator Author

In the following code from treeview()

  sc2 <- sc0[!is.na(sc0$tr2mrca), ]
  sc2$date_range <- sapply(
    seq_len(nrow(sc2)),
    function(i) glue::glue("{sc2$least_recent_tip[i]} -> {sc2$most_recent_tip[i]}")
  )

  ## tips
  td0 <- sc2[sc2$tr2mrca <= ape::Ntip(tr2), tdvars]
  td0$lineages <- td0$lineage
  td0$cocirc_summary <- td0$cocirc_lineage_summary
  td0$node <- td0$tr2mrca
  td0$internal <- "N"

  ## internal
  td1 <- sc2[sc2$tr2mrca > ape::Ntip(tr2), tdvars]
  if (nrow(td1) > 0) {
    td1$lineages <- td1$lineage
    td1$cocirc_summary <- td1$cocirc_lineage_summary
    td1$node <- td1$tr2mrca
    td1$internal <- "Y"
    td1$cluster_size <- 0
    x <- setdiff(
      (ape::Ntip(tr2) + 1):(ape::Ntip(tr2) + ape::Nnode(tr2)),
      td1$node
    ) # make sure every node represented
    td1 <- merge(td1,
      data.frame(node = x),
      all = TRUE
    )
    td <- rbind(td0, td1)
  } else {
    td <- td0
  }
  td <- td[order(td$node), ] # important

The sc0 data.frame, from which tree$data is ultimately formed, has no NA cluster_id entries.
Nor are there NA cluster_id values in sc2, td0 or td1 (prior to the definition of x).

@russHyde
Collaborator Author

But, we have

x == c(175, 181, 188, 189, 190, 199)

And after the merge(td1, data.frame(node = x), all = TRUE), we do have NA values in cluster_id:

td1$cluster_id
 [1] NA       "94121"  "94137"  "94217"  "96096"  "96097"  NA       "97961"  "97988"  "98462"  "98463"  "98464" 
[13] "99556"  NA       NA       NA       "100556" "100557" "100558" "100564" "101069" "102175" "102216" "102424"
[25] NA       "102637" "103258" "103913" "104820" "105030" "106973" "106999" "107001" "108938" "108939" "108940"
[37] "108947" "109088" "109306" "109307" "109308" "110040" "110519" "111323" "111355" "113060" "113508" "114073"
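The effect can be reproduced on a toy data frame. This is only a sketch: the node numbers and the two-column layout are made up for illustration, and only the merge() call mirrors the one in treeview(). With all = TRUE, rows that exist only in data.frame(node = x) are kept, and every column they lack is filled with NA.

```r
# Toy stand-in for td1 (hypothetical values; the real td1 has more columns).
td1 <- data.frame(
  node = c(176L, 177L),
  cluster_id = c("94121", "94137"),
  stringsAsFactors = FALSE
)

# Nodes that have no row in td1, analogous to x in treeview().
x <- c(175L, 178L)

# Outer join on the shared "node" column: unmatched nodes get NA in cluster_id.
merged <- merge(td1, data.frame(node = x), all = TRUE)
print(merged)

# The padded rows are exactly the ones with a missing cluster_id.
print(merged$node[is.na(merged$cluster_id)])
```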

@russHyde
Collaborator Author

In the above,

  • td0$node contains the sequence 1 .. 174.
  • prior to merging with x, td1$node contains
sort(td1$node)
 [1] 176 177 178 179 180 182 183 184 185 186 187 191 192 193 194 195 196 197 198 200 201 202 203 204 205 206 207 208
[29] 209 210 211 212 213 214 215 216 217 218 219 220 221 222

@russHyde
Collaborator Author

The tips/leaves of the tree are the nodes with ID 1 .. ape::Ntip(tr2), where here ape::Ntip(tr2) == 174.

The internal nodes appear to be the nodes with ID (ntip + 1) : (ntip + nnode), where ntip = ape::Ntip(tr2) and nnode = ape::Nnode(tr2).
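The numbers reported above are enough to re-derive x without the tree object. In this sketch, ntip is taken from the thread, nnode = 48 is an assumption inferred from the 48 entries of td1$cluster_id after the merge, and assigned is the sort(td1$node) listing written out as ranges.

```r
ntip <- 174L
nnode <- 48L  # assumption: inferred from the 48 rows of td1 after the merge

# Internal nodes that already have a row in td1 (from sort(td1$node) above).
assigned <- c(176:180, 182:187, 191:198, 200:222)

# Same setdiff() as in treeview(): internal node IDs with no row in td1.
x <- setdiff((ntip + 1L):(ntip + nnode), assigned)
print(x)
```

Under these assumptions the result agrees with the x reported earlier in the thread (175, 181, 188, 189, 190, 199).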

@russHyde
Collaborator Author

node values derive from sc0$tr2mrca.
The code to add tr2mrca to sc0 for internal nodes looks like this:

  for (i in seq_along(stres2)) {
    inode <- i + ape::Ntip(tr2)
    uv <- stats::na.omit(sc0$node_number[match(
      stres2[[i]]$tip.label,
      sc0$representative
    )])
    shared_anc <- Reduce(
      intersect,
      e0$ancestors[uv]
    )
    shared_anc2 <- setdiff(
      intersect(
        shared_anc,
        sc0$node_number
      ),
      uv
    )
    if (length(shared_anc2) > 0) {
      a <- shared_anc2[which.min(e0$ndesc[shared_anc2])]
      sc0$tr2mrca[sc0$node_number == a] <- inode
    }
  }

@russHyde
Collaborator Author

For inode = 175, the shared_anc2 vector is empty, so no row of sc0 gets tr2mrca set to 175. OMG there must be a simple way to set up a mapping between tree node and cluster ID.
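One possible shape for such a mapping, offered as a rough sketch rather than a fix: look each node up in tr2mrca with match() and record explicitly which nodes have no cluster. The toy sc data frame and the node_map name are hypothetical; only the column names cluster_id and tr2mrca come from the code above.

```r
# Toy stand-in for sc0/sc2 (hypothetical values).
sc <- data.frame(
  cluster_id = c("94121", "94137"),
  tr2mrca = c(176L, 177L),
  stringsAsFactors = FALSE
)

# Every node we want represented, whether or not a cluster maps to it.
all_nodes <- 175:178

# match() returns NA for nodes absent from tr2mrca, so their cluster_id is NA.
node_map <- data.frame(
  node = all_nodes,
  cluster_id = sc$cluster_id[match(all_nodes, sc$tr2mrca)],
  stringsAsFactors = FALSE
)

# An explicit flag lets downstream code (e.g. tooltip text) skip unmapped nodes.
node_map$has_cluster <- !is.na(node_map$cluster_id)
print(node_map)
```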
