MovieLens 25M contains 25M ratings and 1M tag applications applied to 62k+ movies by 162k users.
PosterLens 25M collects 62061 posters for movies from MovieLens 25M (~330 movies from the dataset are missing a cover), together with their ResNet-34 embeddings.
MovieLens 20M contains 20M ratings and 0.5M tag applications applied to 27k+ movies by 138k users.
PosterLens 20M collects 27163 posters (115 movies from the dataset are missing a cover) for movies from MovieLens 20M together with their ResNet-34 embeddings.
This repo contains the reproducible pipeline that generates the dataset.
Download a ready-made copy from Kaggle with the Kaggle CLI:
kaggle datasets download -d aptlin/posterlens-25m
kaggle datasets download -d aptlin/posterlens-20m
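The two commands above differ only in the size suffix. As a minimal sketch, the download can be wrapped in a shell helper that rejects unsupported sizes before calling Kaggle. The function names `posterlens_slug` and `download_posterlens` and the default target directory `data/posterlens-<size>` are hypothetical and not part of this repo; the `-p` and `--unzip` options are standard Kaggle CLI flags.

```bash
# Hypothetical helper: map a size ("25m" or "20m") to the PosterLens
# Kaggle dataset slug used in the commands above.
posterlens_slug() {
  case "$1" in
    25m|20m) printf 'aptlin/posterlens-%s\n' "$1" ;;
    *) echo "unsupported size: '$1' (expected 25m or 20m)" >&2; return 1 ;;
  esac
}

# Hypothetical helper: download and unzip one PosterLens copy.
# Usage: download_posterlens <size> [target_dir]
# Assumes the kaggle CLI is installed and authenticated
# (e.g. via ~/.kaggle/kaggle.json).
download_posterlens() {
  local size="$1"
  local dest="${2:-data/posterlens-$size}"   # assumed default location
  local slug
  slug="$(posterlens_slug "$size")" || return 1
  mkdir -p "$dest"
  kaggle datasets download -d "$slug" -p "$dest" --unzip
}
```

For example, `download_posterlens 20m` would fetch and unpack PosterLens 20M into `data/posterlens-20m`, while `download_posterlens 1m` fails immediately without contacting Kaggle.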
Alternatively, generate the dataset yourself with the pipeline in this repo:
- Pick the dataset size as named on the official MovieLens datasets page (currently only 25m and 20m are supported):
export MOVIELENS_SIZE=25m  # or 20m
- Clone the repo:
git clone git@github.com:aptlin/posterlens.git
- Install the dependencies with Poetry:
cd posterlens
poetry install
- Run the pipeline:
./run.sh "$MOVIELENS_SIZE"
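Because the pipeline supports only two sizes, a pre-flight check can catch a typo in `MOVIELENS_SIZE` before any work starts. The sketch below is a hypothetical wrapper: `check_movielens_size`, `movielens_zip_url` and `run_posterlens` are illustrative names, not part of the repo. It assumes the GroupLens archives follow the `ml-<size>.zip` naming, and it does not describe what `run.sh` does internally.

```bash
# Hypothetical pre-flight check: accept only the supported sizes.
check_movielens_size() {
  case "$1" in
    25m|20m) return 0 ;;
    *) echo "MOVIELENS_SIZE must be 25m or 20m, got '$1'" >&2; return 1 ;;
  esac
}

# For reference: the GroupLens archive for a given size
# (assumed naming scheme: ml-<size>.zip).
movielens_zip_url() {
  printf 'https://files.grouplens.org/datasets/movielens/ml-%s.zip\n' "$1"
}

# Validate first, then hand off to the repo's run.sh (run from the repo root).
run_posterlens() {
  check_movielens_size "$1" || return 1
  echo "MovieLens source archive: $(movielens_zip_url "$1")"
  ./run.sh "$1"
}
```

From the repo root this would be invoked as `run_posterlens "$MOVIELENS_SIZE"`; an unsupported value stops before `run.sh` is called.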
If you find the dataset helpful in your research, please cite it:
Sasha Aptlin, “PosterLens 25M.” Kaggle, 2021, doi: 10.34740/KAGGLE/DS/1321802.