
I have some questions. Can GFlowNet be used for topic modeling? What are the challenges involved? Discussion is welcome. #1

Open
sample-guo opened this issue Jul 7, 2023 · 5 comments


@sample-guo

No description provided.

@malkin1729
Collaborator

Good question.

Yes, a GFlowNet -- and GFlowNet-EM in particular -- could be used for topic modeling, since a topic model is a particular kind of latent variable model, albeit one with a continuous latent variable. Consider the example of LDA, using the notation $\theta$ for a topic vector and $x$ for a document (with subscript $i$ indexing documents).

  • The generative model ($p$) is specified by $p(\theta)$, which is fixed to $\text{Dirichlet}(\alpha)$, and by $p(x\mid\theta)$.
    • In LDA, the latter is fully specified by the topic-word matrix $A$, and $\log p(x_i\mid\theta_i)$ can be computed using matrix products and logsoftmax operations (if $x_i$ is represented as a vector of word counts).
    • This could take some other parametric form in topic models that do not assume exchangeability (i.e., words in $x_i$ not conditionally independent given $\theta_i$).
  • The posterior model $q(\theta_i\mid x_i)$ is typically estimated with a Dirichlet in LDA algorithms. However, we can instead train a continuous GFlowNet of some form to sample $\theta_i$, a point in the probability simplex, conditioned on $x_i$.
    • This GFlowNet would be trained in the E-step, and the reward for sampling $\theta_i$ given document $x_i$ is $p(\theta_i)p(x_i\mid\theta_i)$ (see the sketch after this list).
    • How should a GFlowNet generate a vector in the probability simplex over topics? One way is using a stick-breaking process with a mixture of Betas policy (like in the code below), but there are probably other ways.
    • In the M-step, one trains the model $p(x_i\mid\theta_i)$ by sampling from the posterior model, $\theta_i\sim q(\theta_i\mid x_i)$, and taking gradient steps on $\log p(x_i\mid\theta_i) + \log p(A)$, where $p(A)$ is the $\text{Dirichlet}(\beta)$ prior on the topic-word matrix.
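
For concreteness, here is a minimal sketch in PyTorch of the reward this E-step GFlowNet would be trained on (all names and shapes are hypothetical, and this is not the code from the gist linked below), assuming $x_i$ is represented as a vector of word counts and the topic-word matrix $A$ is stored as unnormalized logits:

```python
import torch
import torch.nn.functional as F
from torch.distributions import Dirichlet

K, V = 10, 5000                              # hypothetical: K topics, V vocabulary words
A = torch.randn(K, V, requires_grad=True)    # unnormalized topic-word logits
alpha = torch.full((K,), 0.5)                # Dirichlet(alpha) prior on theta

def log_reward(theta, x_counts):
    """log R(theta) = log p(theta) + log p(x | theta) for one document.

    theta:    (K,) point strictly inside the probability simplex
    x_counts: (V,) bag-of-words counts of the document
    """
    log_prior = Dirichlet(alpha).log_prob(theta)
    # p(word = w | theta) = sum_k theta_k * softmax(A)[k, w];
    # take the log with a logsumexp over topics for numerical stability.
    log_word_probs = torch.logsumexp(
        theta.clamp_min(1e-30).log().unsqueeze(1) + F.log_softmax(A, dim=1),
        dim=0,
    )                                        # shape (V,)
    log_lik = (x_counts * log_word_probs).sum()
    return log_prior + log_lik
```

The same quantity drives the M-step: with $\theta_i\sim q(\theta_i\mid x_i)$ drawn from the trained GFlowNet, one takes gradient steps on this log-reward plus the $\text{Dirichlet}(\beta)$ log-prior on the rows of $\text{softmax}(A)$, with respect to $A$.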

I have tried this for very small topic models on synthetic data -- you can find the code for a proof of concept at https://gist.github.com/malkin1729/88227a1e451596e1ea1fc7d4e0a7ae09 -- but never pursued it further. I am curious what you can do with it, and particularly whether topic models with more interesting structure in the latent space can benefit from the GFlowNet approach.

@sample-guo
Author

As far as I know, some tree-structured neural topic models and nonparametric forest-structured topic models do use a stick-breaking process with a mixture of Betas. I am considering whether a GFlowNet could serve as an alternative for modeling in these cases.

Additionally, many popular neural topic models based on VAEs assume that the latent variables follow a Gaussian or logistic-normal distribution. I am wondering whether continuous GFlowNet theory could be employed as a replacement in these cases.

@malkin1729
Collaborator

malkin1729 commented Jul 13, 2023

A GFlowNet could indeed be used to sample the posterior over latent topic vectors in nonparametric topic models. However, I have not seen stick-breaking with a mixture of Betas in that literature. Do you have a reference?

In my code, I simply used a mixture of Betas to parametrize the sampling of a point in the probability simplex, sequentially “breaking off” probability mass to assign to each topic in turn, as in the sketch below.
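
To make that concrete, here is a rough sketch of the construction in PyTorch (this is not the code from the gist; the `BetaMixturePolicy` network, its interface, and the fixed topic order are hypothetical simplifications):

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta, Categorical

class BetaMixturePolicy(torch.nn.Module):
    """Hypothetical policy network: maps (document embedding, state) to an
    M-component mixture of Betas over the next stick-breaking fraction."""
    def __init__(self, d_x, K, M=4):
        super().__init__()
        self.M = M
        self.net = torch.nn.Sequential(
            torch.nn.Linear(d_x + K + 1, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 3 * M),
        )

    def forward(self, x_emb, state):
        out = self.net(torch.cat([x_emb, state]))
        mix_logits = out[: self.M]                             # mixture weights (logits)
        ab = F.softplus(out[self.M :]).view(self.M, 2) + 1e-3  # positive Beta parameters
        return mix_logits, ab

def sample_simplex_point(policy, x_emb, K):
    """Break off a fraction of the remaining mass for topics 1..K-1 in a
    fixed order; the leftover mass goes to the last topic. Returns the
    sampled theta and the log-probability log P_F of the trajectory
    (as needed, e.g., by a trajectory balance loss)."""
    theta = torch.zeros(K)
    remaining = torch.tensor(1.0)
    log_pf = torch.tensor(0.0)
    for k in range(K - 1):
        state = torch.cat([theta, remaining.view(1)])
        mix_logits, ab = policy(x_emb, state)
        comp = Categorical(logits=mix_logits).sample()    # pick a mixture component
        frac = Beta(ab[comp, 0], ab[comp, 1]).sample()    # fraction of mass to break off
        # log-density of the full mixture at the sampled fraction
        log_pf = log_pf + torch.logsumexp(
            F.log_softmax(mix_logits, dim=0)
            + Beta(ab[:, 0], ab[:, 1]).log_prob(frac),
            dim=0,
        )
        theta[k] = remaining * frac
        remaining = remaining * (1 - frac)
    theta[K - 1] = remaining                              # leftover mass to the last topic
    return theta, log_pf

# Example usage with hypothetical sizes:
policy = BetaMixturePolicy(d_x=32, K=10)
theta, log_pf = sample_simplex_point(policy, torch.randn(32), K=10)
```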

@sample-guo
Author

  • Masaru Isonuma, Junichiro Mori, Danushka Bollegala, and Ichiro Sakata. 2020. Tree-structured neural topic model. In ACL, pages 800–806.

  • Ziye Chen, Cheng Ding, Zusheng Zhang, Yanghui Rao, and Haoran Xie. 2021. Tree-structured topic modeling with nonparametric neural variational inference. In ACL/IJCNLP, pages 2343–2353.

  • Z. Zhang, X. Zhang, and Y. Rao. 2022. Nonparametric forest-structured neural topic modeling. In COLING, pages 2585–2597.

I apologize if these are not exactly what you meant, but I believe these papers are related. Do you have any thoughts or suggestions on combining them with GFlowNets?

@malkin1729
Collaborator

There is nothing to apologize for. Thank you for the references. I have worked a little on graph-structured topic models and had seen the first paper before, and I quickly looked at the other two now.

They are relevant to structured topic models, of course, but I do not see in them the use of learned Beta mixtures as posterior estimators, which is what I asked about in my comment above.

A GFlowNet could be used as an amortized variational posterior in any of these models.
