User-level Embeddings vs Status-level Embeddings Averages (#30)
* Calculate embeddings averages per user

* Save deduped and averaged files to drive

* Actually save as parquet file to reduce file size in half

* Upload results obtained from running this notebook
s2t2 authored Feb 19, 2024
1 parent fe9e414 commit 6b63665
Showing 25 changed files with 9,092 additions and 2 deletions.


10 changes: 8 additions & 2 deletions notebooks/openai_embeddings_v2/README.md
@@ -8,6 +8,12 @@

This supersedes the earlier approach to fetching embeddings. In this second attempt we grab user-level as well as tweet-level embeddings, to compare the two approaches.

The "Exporting Embeddings" notebook takes embeddings stored in BigQuery (see app/openai_embeddings_v2/README.md), and exports them to CSV / parquet files on Google Drive for easier and cheaper access
1. The "Exporting Embeddings" notebook takes embeddings stored in BigQuery (see app/openai_embeddings_v2/README.md), and exports them to CSV / parquet files on Google Drive for easier and cheaper access

The "Analysis Template" notebook provides an example of how to load the files from drive for further analysis.

2. The "De duping and Averaging" notebook de-duplicates status embeddings, and also calculates average tweet-level embeddings per user, and saves these CSV files to drive.


3. The "Analysis Template" notebook provides an example of how to load the files from drive for further analysis.

4. The "User vs Tweet Level Embeddings" notebook performs dimensionality reduction on user embeddings vs tweet embeddings averaged for each user. The results are saved to drive, and then copied to the "results/openai_embeddings_v2" folder in this repo.
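
As a rough illustration of what notebooks 2 and 4 describe, the sketch below de-duplicates status rows, averages the status embeddings per user, and projects the per-user averages down to two dimensions. Everything here is an assumption made for illustration: the column names (`user_id`, `status_text`, `embedding`), the toy data, the choice of de-duplication key, and the use of plain PCA (the README does not say which dimensionality reduction method the notebook uses).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real files and column names may differ.
statuses = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "status_text": ["hello", "hello", "world", "foo", "bar", "baz"],
    "embedding": [
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],  # duplicate of the row above
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 1.0],
        [0.0, 0.0, 1.0],
    ],
})


def dedupe_statuses(df):
    """Drop repeated statuses, keyed on (user_id, status_text) by assumption."""
    return df.drop_duplicates(subset=["user_id", "status_text"]).reset_index(drop=True)


def average_per_user(df):
    """Return one row per user holding the mean of that user's status embeddings."""
    rows = {}
    for user_id, group in df.groupby("user_id"):
        vectors = np.stack(group["embedding"].to_list())
        rows[user_id] = vectors.mean(axis=0)
    averages = pd.DataFrame.from_dict(rows, orient="index")
    averages.index.name = "user_id"
    return averages


def pca_2d(matrix):
    """Project rows onto their first two principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


deduped = dedupe_statuses(statuses)
user_means = average_per_user(deduped)
projected = pca_2d(user_means.to_numpy())
```

The same `pca_2d` call could be applied to a matrix of user-level embeddings, so the two projections can be compared side by side.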