compute topic similarity more efficiently? #38
Comments
What about iteratively querying the data instead: when done with the first window, drop the first year and only load the one additional year needed, and so on?
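That incremental approach could look roughly like this. A minimal sketch only: `load_year` is a hypothetical stand-in for the project's actual database query, and `WINDOW = 5` mirrors the ±5-year window mentioned below.

```python
from collections import deque

WINDOW = 5  # years on each side of the focal graduation year

def load_year(year):
    # Hypothetical loader; in the real project this would query
    # one year of topic data from the database.
    return {"year": year}

def iterate_windows(first_year, last_year):
    """Slide a +/-WINDOW-year window over graduation years.

    After the first window is loaded, each step drops the oldest
    year and loads exactly one new year, instead of reloading the
    full 11-year window from scratch.
    """
    window = deque(
        load_year(y) for y in range(first_year - WINDOW, first_year + WINDOW + 1)
    )
    for grad_year in range(first_year, last_year + 1):
        yield grad_year, list(window)
        # advance the window: drop the first year, load one new year
        window.popleft()
        window.append(load_year(grad_year + 1 + WINDOW))
```

For each graduation year after the first, this loads one year of data rather than eleven, at the cost of keeping the window in memory across iterations.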
That is an option, although it would require some rewriting, and the parallelization would "only" be over fields of study. An alternative is DuckDB (#39).
Ah yes, I didn't have the parallelization in mind. But since the results are stored, is there a big benefit to doing it more efficiently?

On 2023-03-09 at 20:07, f-hafner wrote:
> That is an option, although it would require some rewriting, and the parallelization would "only" be over fields of study. An alternative is DuckDB (#39).
Because we calculate the similarities for all graduate–potential-employer pairs, this step only comes after linking. So when we update the linking, we need to rerun the similarity calculations as well. It's not urgent and not very important, but it crossed my mind and I wanted to keep it as an issue for the moment.
Currently, we iterate over each graduation year, and for each iteration we load a window of data ±5 years into memory. If we compute the similarity for two or more neighboring graduation years at once, we only have to load one additional year of data per extra graduation year. This could speed up the calculations. The trade-off is that it needs more memory.
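To make the trade-off concrete, here is a small sketch of how many years of data a batch of consecutive graduation years requires under a ±5-year window (`WINDOW = 5` is taken from the description above; the counts are illustrative, not the project's code):

```python
WINDOW = 5  # years on each side of a graduation year

def years_needed(batch):
    """Years of data required to cover a batch of consecutive
    graduation years, each with a +/-WINDOW window."""
    return list(range(min(batch) - WINDOW, max(batch) + WINDOW + 1))

# One graduation year at a time: 11 years of data per iteration.
single = years_needed([1990])

# Two neighboring years in one batch: 12 years, not 2 * 11 = 22,
# since the windows overlap almost entirely.
pair = years_needed([1990, 1991])
```

So batching grows the in-memory window by only one year per extra graduation year, while roughly halving (or better) the total data loaded across iterations.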