scaling of factors in plot_type = "scatter" #131

Open
stephens999 opened this issue Jul 24, 2024 · 9 comments

Comments

@stephens999

It seems that in the vignette (flashier_single_cell.html) the scatter plots show the factors normalized
to have a maximum value of 1, but I don't think we necessarily want or need that scaling?

(The heatmap is also scaled that way, but arguably this could be helpful in the heatmap because we are showing all the factors at once.)

@pcarbo
Collaborator

pcarbo commented Jul 25, 2024

@stephens999 Are you suggesting to plot L_pm or F_pm? Currently I think @willwerscheid uses ldf() to normalize the factors.

@stephens999
Author

I am suggesting to plot DF' instead of F' (with L normalized with infinity norm).
The vignette does print out the values of DF' for the top genes, but it doesn't include D in the plot.
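
To make the difference between plotting F' and DF' concrete, here is a minimal sketch in base R. It does not call flashier: `ldf_inf` is a hypothetical helper that mimics what ldf(fl, type = "i") is assumed to return (columns of L and F rescaled to have maximum absolute value 1, with the removed scales collected in D), and the matrices are made up for illustration.

```r
# Hypothetical stand-in for ldf(fl, type = "i"); NOT part of flashier.
# Rescales each column of the loadings and factors to have maximum
# absolute value 1 and collects the removed scales in D, so that
# L %*% diag(D) %*% t(F) reproduces the original product.
ldf_inf <- function(loadings, factors) {
  l_scale <- apply(abs(loadings), 2, max)
  f_scale <- apply(abs(factors), 2, max)
  list(L = sweep(loadings, 2, l_scale, "/"),
       D = l_scale * f_scale,
       F = sweep(factors, 2, f_scale, "/"))
}

# Made-up loadings (4 cells x 2 factors) and factors (3 genes x 2 factors).
loadings <- matrix(c(0.5, 1.0, 0.0, 2.0,
                     3.0, 0.0, 1.5, 0.0), nrow = 4)
factors <- matrix(c(4.0, 0.0, 2.0,
                    0.0, 0.5, 0.25), nrow = 3)

out <- ldf_inf(loadings, factors)

# What the scatter plot is described as showing: each column peaks at 1.
F_normalized <- out$F

# The suggested alternative: multiply D back into F, column by column.
F_scaled <- sweep(out$F, 2, out$D, "*")
```

With this scaling, an entry of `F_scaled` can be read as the fitted contribution of that factor for a cell whose normalized loading equals 1, which is what the values printed in the vignette for the top genes appear to be.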

@pcarbo
Collaborator

pcarbo commented Jul 26, 2024

Yes, that makes sense.

This reminds me that I wanted to make a few related improvements to the plotting functions in flashier, mainly to clarify what is being plotted.

@pcarbo
Collaborator

pcarbo commented Jul 30, 2024

@stephens999 @willwerscheid I've been thinking about how to make the plots more consistent and uniform in interpretation:

plot_type = c("scree", "bar", "heatmap", "histogram", "scatter", "structure")

(Aside from the scree plot, which is different.)

What do you think about the following conventions for the plots:

  • If a single k is being plotted, show with(ldf(fl,type = "i"),F %*% diag(D)) or with(ldf(fl,type = "i"),L %*% diag(D)).

  • If more than one k is being plotted, show ldf(fl,type = "i")$F or ldf(fl, type = "i")$L so that the values are more comparable across k.

  • Potentially we could have a TRUE/FALSE option (e.g., "normalize") to control this behaviour.
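
As a rough sketch of how such an option might behave (not the flashier implementation): `scaled_factors` and its `normalize` argument are hypothetical names, and `fit` is a plain list with elements L, D and F standing in for the assumed output of ldf(fl, type = "i"). The sketch uses sweep() rather than F %*% diag(D) because, in R, diag() applied to a single number builds an identity matrix of that size rather than a 1 x 1 matrix, which matters once a single k has been selected.

```r
# Hypothetical helper, NOT part of flashier: choose what to plot for the
# selected factors k. By default the values are left normalized when
# several factors are shown and rescaled by D when only one is shown.
scaled_factors <- function(fit, k, normalize = length(k) > 1) {
  F_k <- fit$F[, k, drop = FALSE]   # keep a matrix even for a single k
  if (normalize) {
    return(F_k)
  }
  sweep(F_k, 2, fit$D[k], "*")      # multiply each column by its D value
}

# Made-up decomposition with max-normalized columns in L and F.
fit <- list(
  L = matrix(c(1, 0.5, 0.2, 1), nrow = 2),
  D = c(8, 1.5),
  F = matrix(c(1, 0, 0.5, 0, 1, 0.5), nrow = 3)
)

single  <- scaled_factors(fit, k = 1)     # one column, rescaled by D[1]
several <- scaled_factors(fit, k = 1:2)   # two columns, left normalized
```

Passing normalize = TRUE or normalize = FALSE explicitly overrides the default in either case.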

@stephens999
Author

stephens999 commented Jul 30, 2024 via email

@pcarbo
Collaborator

pcarbo commented Jul 31, 2024

@stephens999 Okay, but it sounds like this would only be the default when plotting a single k? As you said previously,

The heatmap is also scaled that way [that is, normalized so that the maximum value is 1], but arguably this could be helpful in the heatmap because we are showing all the factors at once.

In other words, the default will be with(ldf(fl,type = "i"),F %*% diag(D)) only when plotting a single column of F.

@stephens999
Author

stephens999 commented Aug 1, 2024 via email

@pcarbo
Collaborator

pcarbo commented Aug 1, 2024

@willwerscheid Matthew is suggesting the following default for all plots:

  • When plotting L, ldf(fl, type = "i")$L is shown.

  • When plotting F, with(ldf(fl,type = "i"),F %*% diag(D)) is shown.

Are you okay with this as the default setting for all plots? Are there particular plots where you think this default might be problematic?

If you are okay with this, I will update the plotting interface and propose the changes in a pull request.

@pcarbo
Collaborator

pcarbo commented Aug 12, 2024

I created a branch "improvements_to_plots" to address this issue and tackle other related improvements.
