question: What makes a ribbon cross over other ribbons in a sankey plot? #37

Open
Valentin-Bio opened this issue Mar 15, 2023 · 4 comments


@Valentin-Bio

Hello developer!

I'm using geom_sankey() to plot microbial taxonomies by given taxonomic ranks. This is what I did:

colnames(taxonomy_table)

"Domain" "Phylum" "Class" "Order" "Family" "Genus"

library(dplyr)     # provides %>%
library(ggsankey)  # provides make_long() and geom_sankey()
tableforsankey <- taxonomy_table %>%
  make_long(Domain, Phylum, Class, Order, Family, Genus)


phylum_colors <- c(
  "Bacteria" = "cadetblue3",
  "Proteobacteria" = "antiquewhite2",
  "Cyanobacteria" = "chocolate1",
  "Bacteroidota" = "aquamarine3",
  "Actinobacteriota" = "bisque4",
  "Gammaproteobacteria" = "antiquewhite2",
  "Burkholderiales" = "antiquewhite2",
  "BACL14" = "antiquewhite2",
  "Amylibacter" = "antiquewhite2",
  "Alphaproteobacteria" = "antiquewhite2",
  "Thioglobus" = "antiquewhite2",
  "Rhodobacterales" = "antiquewhite2",
  "Rhizobiales_B" = "antiquewhite2",
  "Pseudomonadales" = "antiquewhite2", 
  "PS1" = "antiquewhite2",
  "Thioglobaceae" = "antiquewhite2",
  "TMED25" = "antiquewhite2",
  "Rhodobacteraceae" = "antiquewhite2",
  "Pseudohongiellaceae" = "antiquewhite2",
  "Methylophilaceae" = "antiquewhite2",
  "Bacteroidia" = "aquamarine3",
  "Flavobacteriales" = "aquamarine3",
  "Flavobacteriaceae" = "aquamarine3",
  "MED-G11" = "aquamarine3",
  "Algibacter_B" = "aquamarine3",
  "Cyanobacteriia" = "chocolate1",
  "PCC-6307" = "chocolate1",
  "Cyanobiaceae" = "chocolate1",
  "Synechococcus_E" = "chocolate1",
  "Synechococcus_C" = "chocolate1", 
  "Acidimicrobiia" = "bisque4",
  "Actinomarinales" = "bisque4",
  "Actinomarinaceae" = "bisque4",
  "Actinomarina" = "bisque4"
)



ggplot(tableforsankey, 
       aes(x = x,
           next_x = next_x,
           node = node,
           next_node = next_node,
           fill = factor(node),
           label = node)) + 
  geom_sankey(flow.alpha = 0.75,node.color = 1, type = "sankey") +
  geom_sankey_label(size = 2.5, color = 1, fill = "aliceblue") + 
  scale_fill_manual(values = phylum_colors) + 
  theme_sankey(base_size = 16) +
  theme(legend.position = "none", axis.text = element_text(size = 9)) +
  xlab("") + ggtitle("Bacteria")

and this is the sankey that I get:

image

The ribbons start to cross each other from the Phylum level onward. Is there a way to draw the sankey plot so that the ribbons do not cross over other ribbons?

best regards,

Valentín.

@giacomomutti

Hey Valentin, I am in a very similar situation, did you find a solution for this?

@Valentin-Bio

Hello @giacomomutti, I could not figure out how to do it.

bests.

@keithnewman

keithnewman commented Nov 7, 2023

Your nodes have character names, so the standard ggplot behaviour is to display them as categories in alphabetical order. At each x coordinate (or column, if you prefer to think of it that way), the nodes are sorted alphabetically from A at the bottom to Z at the top, with capital letters sorting before their lower-case equivalents (note that TMED25 comes before Thioglobaceae). This ordering determines the node positions, which is what causes the ribbons to cross.
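The ordering described above can be reproduced by sorting a few of the node names from the plot. This is only a sketch: it assumes the node order follows a byte-wise (C-locale) sort, which is what `method = "radix"` gives in base R. How ggsankey orders nodes internally is not confirmed here.

```r
# Node names that appear in the plot above.
nodes <- c("Thioglobaceae", "TMED25", "Rhodobacteraceae", "Methylophilaceae")

# method = "radix" sorts character vectors in C-locale (byte) order,
# so every upper-case letter sorts before every lower-case letter.
c_order <- sort(nodes, method = "radix")
print(c_order)
# "Methylophilaceae" "Rhodobacteraceae" "TMED25" "Thioglobaceae"
```

"TMED25" lands before "Thioglobaceae" because the comparison at the second character is "M" against "h", and upper-case letters have lower byte values.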

To control the order of character labels, convert your node and next_node columns to factors and specify the order you want as the factor levels. The nodes will then follow the level order rather than alphabetical order. The forcats package may help with handling factors.
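A minimal sketch of that conversion, using base R only. The data frame is a hypothetical stand-in that mimics the columns of a make_long() result, and the chosen level order is an assumption for illustration; pick whatever order avoids crossings in your own data.

```r
# Hypothetical long-format data with the columns used in the aes() call.
long_df <- data.frame(
  x         = c("Domain", "Domain", "Phylum", "Phylum"),
  node      = c("Bacteria", "Bacteria", "Proteobacteria", "Cyanobacteria"),
  next_x    = c("Phylum", "Phylum", NA, NA),
  next_node = c("Proteobacteria", "Cyanobacteria", NA, NA),
  stringsAsFactors = FALSE
)

# Desired node order (an assumption; not alphabetical on purpose).
node_levels <- c("Bacteria", "Proteobacteria", "Cyanobacteria")

# Use the same levels for both columns so nodes and flows agree.
long_df$node      <- factor(long_df$node, levels = node_levels)
long_df$next_node <- factor(long_df$next_node, levels = node_levels)

print(levels(long_df$node))
```

Rows in the last column keep NA in next_node, since they have no outgoing flow.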

However, I'm finding factors can mess up the sankey label positioning, which is why I'm browsing the issue board in the first place.

@giacomomutti

I solved this by converting the node and next_node columns to factors whose levels are all the names in the dataset.

First, arrange the dataset by all the columns you are interested in. Then collect the unique values of those columns into a single vector of levels, and apply that same ordering to every column as well as to the node and next_node variables. With that, both the labels and the sankey should be positioned correctly.

This may not work if the same label appears at different taxonomic levels. In that case you can add a prefix to each clade, such as "c__Haptophyta" and "f__Haptophyta", so that the names are unique, and then remove the prefix for display; in the code below that is what the label column is for.
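The prefixing step could look roughly like this. It is a sketch on a hypothetical two-column table, and the prefix scheme (first letter of the column name plus two underscores) is an assumption made for illustration.

```r
# Hypothetical table where "Haptophyta" appears at two ranks.
df <- data.frame(
  class  = c("Haptophyta", "Haptophyta"),
  family = c("Haptophyta", "Isochrysidaceae"),
  stringsAsFactors = FALSE
)

# Prefix each column with the first letter of its rank, e.g. "c__Haptophyta".
for (rank in names(df)) {
  df[[rank]] <- paste0(substr(rank, 1, 1), "__", df[[rank]])
}

# Strip only the leading rank prefix to recover display labels.
labels <- sub("^[a-z]__", "", df$family)
print(labels)
# "Haptophyta" "Isochrysidaceae"
```

Anchoring the pattern at the start of the string leaves names that contain their own underscore, such as Synechococcus_E, intact.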

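# Sort the rows so that related taxa sit next to each other at every rank.
df <- df %>% 
  arrange(phylum, class, order, family, genus, species, count)

# One shared vector of levels: the root node first, then every name in sorted-row order.
# Note: species names are not included, so factor(species, ...) below gives NA for any
# species name that is not already in lvls_tax.
lvls_tax <- c("Eukaryota", unique(c(unique(df$phylum), unique(df$class),
                                    unique(df$order), unique(df$family), unique(df$genus))))

# Apply the same levels to every rank column.
df <- df %>% 
  mutate(phylum = factor(phylum, levels = lvls_tax, ordered = TRUE),
         class = factor(class, levels = lvls_tax, ordered = TRUE),
         order = factor(order, levels = lvls_tax, ordered = TRUE),
         family = factor(family, levels = lvls_tax, ordered = TRUE),
         genus = factor(genus, levels = lvls_tax, ordered = TRUE),
         species = factor(species, levels = lvls_tax, ordered = TRUE))

# Reshape to long format and apply the same levels to node and next_node.
df_long <- df %>% 
  make_long(colnames(df)[1:6], value = count) %>% 
  mutate(node = factor(node, levels = lvls_tax),
         next_node = factor(next_node, levels = lvls_tax),
         # drops everything up to the last underscore, e.g. the "c__" prefix
         label = gsub(".*_", "", node)) %>% 
  filter(!is.na(node))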

ggplot(df_long, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = node, label=label)) +
  geom_alluvial(space = 2, width = .3, flow.alpha = .6) +
  geom_alluvial_label(size = 2.5, space = 2, color = 1, fill = "aliceblue") +
  theme(legend.position = "none", axis.text.y = element_blank(),
        axis.ticks.y = element_blank(), axis.title.x = element_blank(),
        axis.text.x = element_text(angle=0, family = "Helvetica", colour = "black"))

This is the resulting plot:

image

Hope it helps!
