This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

How do you perform figures 2a) and 2d) from the paper ? #4

Open
eddgeag opened this issue Oct 10, 2018 · 5 comments

Comments

@eddgeag

eddgeag commented Oct 10, 2018

For Fig. 2a, my understanding is that you run the clustering once on the complete dataset (otherwise, if I choose a subsample of 10,000 samples, I get a BIC minimum at 2 clusters). Are the error bars then obtained from subsamples, kept in the same order, with 100 fits over 100 different iterations? Or what do you do?

For Fig. 2d, this is what I tried:
I look for the indices of the 4 highest enrichments, which correspond to the lowest p-values, hence the 4 personality types, right? With those indices, I select the cluster means. For the standard deviations, I take the diagonal of the covariance of each selected cluster. But I don't get the same results.

**Because there is no code for these figures**

@eddgeag
Author

eddgeag commented Oct 10, 2018

Basically, I don't get the same results as in the table based on the 300-item test.

Sorry for the inconvenience.
Best, Edmond

@martingerlach
Collaborator

Regarding Fig. 2a:
You have to run the clustering on the full dataset. In the notebook I believe I selected a smaller subset so that the calculation finishes within a few minutes instead of taking hours. It is correct that with less data you will also detect fewer clusters according to the BIC. However, if you analyze the full dataset, you should see a minimum at around 13 clusters.
I also included the explicit code used to calculate the error bars. This is done by generating several bootstrapped datasets (100 in the paper; here I used only 10, again for the sake of time), that is, we draw datasets of the same size with replacement. We then perform the same procedure on each of them. This is a common way to estimate error bars, given that we only have a finite sample.
I updated the corresponding notebook (analysis_clustering-01...)
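
The bootstrap procedure described above can be sketched roughly as follows. This is not the notebook code: `bootstrap_errorbar` and its parameters are hypothetical names, and a plain mean stands in for the statistic. For Fig. 2a, `statistic` would instead be a function that fits the mixture model with a given number of clusters to the resampled data and returns its BIC.

```python
import random
import statistics


def bootstrap_errorbar(data, statistic, n_boot=100, seed=0):
    """Estimate a statistic and its error bar by bootstrapping.

    Hypothetical helper: each bootstrapped dataset has the same size as
    `data` and is drawn with replacement; `statistic` is applied to each.
    Returns (mean of the bootstrap values, their standard deviation).
    """
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        # Draw n items with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Toy data only; the real input would be the trait scores.
    toy_rng = random.Random(1)
    toy = [toy_rng.gauss(0.0, 1.0) for _ in range(500)]
    center, err = bootstrap_errorbar(toy, statistics.fmean, n_boot=100)
    print(f"estimate = {center:.3f} +/- {err:.3f}")
```

Repeating this for each candidate number of clusters would give one BIC value with an error bar per point on the curve.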

Regarding Fig. 2d:
You have to also fit the whole dataset with 13 or so clusters (that is what the BIC says is the optimum). You will then see that only 4 clusters have a small p-value and a large enrichment factor.
You can then select those 4 clusters and look at their positions in the 5D trait space.
I updated the corresponding notebook (analysis_clustering-02...) and get the same results.
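
As a rough illustration of the selection step, assuming the fit already yields one p-value, one mean vector and one covariance matrix per cluster, the clusters could be picked like this. `select_enriched_clusters` and its arguments are hypothetical names, not functions from the notebooks, and ranking by p-value alone is a simplifying assumption. Note that the standard deviation along each trait is the square root of the corresponding diagonal entry of the covariance.

```python
import math


def select_enriched_clusters(p_values, means, covariances, n_select=4):
    """Pick the clusters with the smallest p-values (hypothetical helper).

    Returns a list of (cluster index, mean vector, per-trait standard
    deviations), ordered from smallest to largest p-value.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    selected = []
    for i in order[:n_select]:
        cov = covariances[i]
        # Standard deviation = square root of the variance on the diagonal.
        stds = [math.sqrt(cov[d][d]) for d in range(len(cov))]
        selected.append((i, list(means[i]), stds))
    return selected


if __name__ == "__main__":
    # Toy 2D example with 3 clusters; the real trait space is 5D.
    p_values = [0.5, 0.001, 0.02]
    means = [[0.0, 0.0], [1.0, -1.0], [2.0, 2.0]]
    covariances = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[4.0, 0.0], [0.0, 9.0]],
        [[0.25, 0.0], [0.0, 1.0]],
    ]
    for idx, mean, std in select_enriched_clusters(
        p_values, means, covariances, n_select=2
    ):
        print(idx, mean, std)
```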

Best.
Martin

@eddgeag
Author

eddgeag commented Oct 10, 2018

Thank you so much
Edmond,
Best

@eddgeag
Author

eddgeag commented Oct 15, 2018

Just one more question:
Let's say you don't know how the clusters are ordered with respect to the personality types and you only know their indices; in other words, you don't have the literature data. How do you know which cluster belongs to which personality type?

@martingerlach
Collaborator

The label merely represents an interpretation of the inferred personality types based on the location in trait space.
