Affiliation info: cites, pubs and fields #27
Comments
Add author count: simply count distinct authors with that affiliation-field.
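A minimal sketch of such a distinct-author count, assuming a long table with one row per paper-author-affiliation-field. The table name `paper_author_affiliation_field` and its exact columns are hypothetical, not the project's actual schema. The same author can show up in several cells if they publish in several level-0 fields or at several affiliations.

```python
import sqlite3

# In-memory toy database; table name and columns are assumptions for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE paper_author_affiliation_field (
    PaperId INTEGER,
    AuthorId INTEGER,
    AffiliationId INTEGER,
    Field0 INTEGER,
    Year INTEGER
);
INSERT INTO paper_author_affiliation_field VALUES
    (1, 10, 100, 1, 2000),
    (2, 10, 100, 1, 2000),  -- same author, same cell: counted once
    (3, 11, 100, 1, 2000),
    (4, 10, 100, 2, 2000),  -- author 10 also publishes in field 2
    (5, 12, 200, 1, 2000);
""")

# Count distinct authors per affiliation-field-year cell.
author_counts = con.execute("""
    SELECT AffiliationId, Field0, Year, COUNT(DISTINCT AuthorId) AS AuthorCount
    FROM paper_author_affiliation_field
    GROUP BY AffiliationId, Field0, Year
    ORDER BY AffiliationId, Field0, Year
""").fetchall()

for row in author_counts:
    print(row)
```

In this toy data, author 10 contributes to two cells of affiliation 100 (fields 1 and 2), so summing the counts across fields would overstate the number of people at the affiliation.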
I suppose you want the size of "departments"? Since we have discarded the author info in
Or we summarise from AuthorAffiliation, where we have already assigned the main institution for each author-year. If we do this, then the field information will also come from the author level, which differs from the table on publication outcomes by university. Another issue is that authors may not publish every year, so a good proxy of the "department size" in a given year may be an average over a rolling window?
I would say we start with what you call "number of potential collaborators". For this it is not necessarily an issue if they do not publish every year; we can take the average over the last 5 years ex-post, like we do for citations in the analysis. This can be done directly here without any big issue, right?
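One way the 5-year average could look, as a rough sketch. The function name `trailing_mean` is hypothetical, and two assumptions are baked in that the thread does not settle: the window includes the current year, and years without any publishing authors count as zero.

```python
def trailing_mean(counts_by_year, year, window=5):
    """Average annual author count over the `window` years ending in `year`.

    counts_by_year: dict mapping year -> distinct-author count for one
    affiliation-field cell. Missing years are treated as zero (assumption).
    """
    years = range(year - window + 1, year + 1)
    return sum(counts_by_year.get(y, 0) for y in years) / window


# Toy counts for a single affiliation-field cell; 2002 has no observations.
counts = {2000: 4, 2001: 6, 2003: 5, 2004: 5}
avg_2004 = trailing_mean(counts, 2004)
avg_2001 = trailing_mean(counts, 2001)
print(avg_2004, avg_2001)
```

Dividing by the full window length pulls the average down for cells with gaps or for early years of the sample; dividing by the number of observed years instead would be the obvious alternative.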
The other measure is good too, and we will almost surely want that, in particular in combination with citation measures. So if you can do it "quickly", please add both.
Should be ready tomorrow. I want to add that an author can be counted in multiple cells because they publish in multiple fields at level 0 (more likely) and/or because they publish at multiple affiliations. The latter is less likely, but
Since we already use
I think we can make the tables for paper/citation counts at affiliation-year-field0 as well as the keyword list at the same level in one go:
PaperId, Year, AffiliationId
AffiliationId-Field0-Year level
AffiliationId-Field0-Year level
To consider/note
affiliation_outcomes, which is only at the AffiliationId level, with the output from step 2 above. I don't know right now where we use this old table as an input, and we should check
FieldOfStudyId and a Score. We can use the next lower integer of the score as a frequency weight to calculate tf-idf (but not sure how to exactly implement frequency weights)
Here are the queries, which we can use to replace the query in affiliation_outcomes.py.
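Separately from the queries, the frequency-weight idea for tf-idf could be sketched as follows. This is only one possible reading: each (cell, FieldOfStudyId, Score) record is treated as if the keyword occurred floor(Score) times in the cell's "document", and a standard tf-idf with idf = log(N / df) is computed on those counts. The function name `weighted_tfidf` and the record layout are assumptions for illustration.

```python
import math
from collections import defaultdict


def weighted_tfidf(records):
    """Compute tf-idf where floor(score) acts as a frequency weight.

    records: iterable of (cell, term, score); `cell` is any hashable key,
    e.g. (AffiliationId, Field0, Year), and `term` a FieldOfStudyId.
    Returns a dict mapping (cell, term) -> tf-idf value.
    """
    # Weighted term counts per cell: each record counts floor(score) times.
    counts = defaultdict(lambda: defaultdict(int))
    for cell, term, score in records:
        counts[cell][term] += math.floor(score)

    n_cells = len(counts)

    # Document frequency: number of cells where the term has positive weight.
    df = defaultdict(int)
    for terms in counts.values():
        for term, weight in terms.items():
            if weight > 0:
                df[term] += 1

    tfidf = {}
    for cell, terms in counts.items():
        total = sum(terms.values())
        for term, weight in terms.items():
            if weight > 0:
                tf = weight / total
                tfidf[(cell, term)] = tf * math.log(n_cells / df[term])
    return tfidf


# Toy records: two cells, keyword 501 appears in both, 502 only in the first.
records = [
    ((100, 1, 2000), 501, 2.7),
    ((100, 1, 2000), 502, 1.2),
    ((200, 1, 2000), 501, 3.0),
]
scores = weighted_tfidf(records)
print(scores)
```

Note that records with a score below 1 get weight zero under this reading and drop out entirely, so whether the floor is appropriate depends on the range of Score.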