can't reproduce the preprocessed data #10

Open
quynhneo opened this issue Nov 19, 2020 · 10 comments

Comments

@quynhneo

quynhneo commented Nov 19, 2020

Hi there,
I ran https://github.com/adjidieng/DETM/blob/master/scripts/data_undebates.py on the Kaggle data for the UN General Debates (as linked in your paper: https://www.kaggle.com/unitednations/un-general-debates), but I am unable to reproduce the preprocessed data you linked here: https://bitbucket.org/franrruiz/data_undebates_largev/src/master/ (the variables in my .mat files are different from yours).
Any idea why? There are not many settings besides min_df and max_df. I used the defaults; perhaps you used something else?
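To illustrate why min_df and max_df matter here, below is a minimal sketch of document-frequency filtering. It is not the code in data_undebates.py: the function name `build_vocab`, the toy documents, and the threshold values are assumptions for illustration only, with min_df treated as an absolute document count and max_df as a fraction of documents.

```python
from collections import Counter

def build_vocab(docs, min_df=1, max_df=1.0):
    """Keep words found in at least min_df documents (a count)
    and in at most max_df of all documents (a fraction)."""
    doc_freq = Counter()
    for doc in docs:
        # Count each word once per document, not once per occurrence.
        doc_freq.update(set(doc.split()))
    max_count = max_df * len(docs)
    return sorted(w for w, c in doc_freq.items() if min_df <= c <= max_count)

# Hypothetical toy corpus, not taken from the UN debates data.
docs = [
    "peace and security",
    "peace and development",
    "security council and peace",
    "climate and development",
]
vocab = build_vocab(docs, min_df=2, max_df=0.75)
print(vocab)
```

Even a small change in either threshold changes the vocabulary, and therefore the shape and contents of every matrix saved afterwards, so it is worth confirming the exact values before comparing .mat files.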

@mona-timmermann

Might be too obvious, but could it just be because of the random permutation with no seed? Apart from that, I've observed a lot of things I had to change in the code to get it to run and to implement the model as described in the paper. I was never able to reproduce the results using the original code.
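A minimal sketch of the seeding point, assuming the preprocessing shuffles document indices before splitting them into train and test sets. The function name `split_indices`, the test fraction, and the seed value are hypothetical, and the standard-library `random` module is used here only to keep the example self-contained.

```python
import random

def split_indices(num_docs, test_fraction=0.1, seed=None):
    """Shuffle document indices and split them into train and test sets."""
    # With seed=None the order differs on every run; a fixed seed repeats it.
    rng = random.Random(seed)
    indices = list(range(num_docs))
    rng.shuffle(indices)
    num_test = int(num_docs * test_fraction)
    return indices[num_test:], indices[:num_test]

train_a, test_a = split_indices(100, seed=2020)
train_b, test_b = split_indices(100, seed=2020)
print(train_a == train_b and test_a == test_b)  # same seed, same split
```

Without a fixed seed, two runs on identical input can assign documents to different splits, so the saved arrays differ even when every other setting matches.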

@quynhneo
Author

Hm... possibly. Same here on having to change a lot. Perhaps we should submit some PRs.

@Emekaborisama

Let's work on converting it to a Python library, @quynhneo @mona-timmermann.

What do you think?

I have also noticed a new error that occurs on large datasets.

@quynhneo
Author

quynhneo commented Jan 5, 2021

Not a bad idea... Ideally, @adjidieng would support the idea.

@Emekaborisama

I can talk to @adjidieng tomorrow, and I will keep you posted on her response.

What do you think, @mona-timmermann?

@Emekaborisama

Adji said we can proceed, but we will upload the package as a branch on this repo.
@quynhneo @mona-timmermann let's get this done.

@yangyijane

@Emekaborisama Hi, are there any updates on the Python script to reproduce this study? Thank you very much.

@yangyijane

yangyijane commented Feb 4, 2021 via email


@quynhneo
Author

According to the paper, they calculate perplexity using the test documents.
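For reference, perplexity on held-out documents is commonly defined as the exponential of the negative average log-likelihood per token. The sketch below shows that general definition only; the function name `perplexity` and the toy numbers are assumptions, and this thread does not confirm how the repository normalizes the quantity.

```python
import math

def perplexity(log_likelihoods, token_counts):
    """exp of the negative average per-token log-likelihood over test documents."""
    total_ll = sum(log_likelihoods)
    total_tokens = sum(token_counts)
    return math.exp(-total_ll / total_tokens)

# Two hypothetical test documents in which every token has probability 1/4.
token_counts = [3, 5]
log_likelihoods = [n * math.log(0.25) for n in token_counts]
ppl = perplexity(log_likelihoods, token_counts)
print(round(ppl, 6))
```

A model that assigns each token probability 1/4 gets a perplexity of about 4, which is a handy sanity check when comparing numbers computed on different document sets.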
