I have been working extensively with Bayesian Personalized Ranking (BPR) for quite some time and have found something very strange. Suppose I have a dataset containing userId, itemId and rating, train a BPR model on it, and save the factors. Next, I train another BPR model on the same data, but initialized with the previously trained factors. When I compare the user factors or item factors of the two models, the new factors differ in only a few dimensions and everything else remains the same. Also, when I log the two runs of BPR, the logs of the first run look as expected, but the logs of the second run are unexplainable.
In my use case, I have users, items and a timeframe (say, one month). As the timeframe slides, the interactions between users and items change: for example, a user may buy several items or none within a timeframe, and an item may or may not be available in a timeframe. New users/items may also be added in subsequent timeframes.
So my plan was to train BPR on the initial timeframe, call it T1, and get the factors (embeddings) for users and items. Then, on the second timeframe T2, train a second model, bpr2, with the user and item factors initialized as follows:
For those users/items in T2 for which we have factors available from T1, initialize with them.
For those users/items in T2 for which no factors are available from T1, initialize them with random vectors.
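The initialization rule above can be sketched with NumPy alone. This is a minimal illustration, not code from the attached notebook: the helper name `warm_start_factors`, the `scale` of the random vectors, and the toy ids are all assumptions. It matches rows by raw id rather than by row position, because the internal index assigned to an id may differ between the T1 and T2 datasets.

```python
import numpy as np

def warm_start_factors(prev_factors, prev_ids, new_ids, seed=None, scale=0.01):
    """Build an initial factor matrix for the T2 ids (hypothetical helper).

    Rows for ids already seen in T1 are copied from prev_factors;
    rows for unseen ids are drawn from a small zero-mean normal distribution.
    """
    rng = np.random.default_rng(seed)
    k = prev_factors.shape[1]
    # Map each T1 raw id to its row in prev_factors.
    prev_row = {raw_id: row for row, raw_id in enumerate(prev_ids)}
    # Start with random vectors everywhere, then overwrite the known ids.
    init = rng.normal(0.0, scale, size=(len(new_ids), k)).astype(prev_factors.dtype)
    for row, raw_id in enumerate(new_ids):
        if raw_id in prev_row:
            init[row] = prev_factors[prev_row[raw_id]]
    return init

# Toy example: "c" and "a" carry over from T1, "d" is new in T2.
t1_ids = ["a", "b", "c"]
t1_factors = np.arange(6, dtype=np.float32).reshape(3, 2)
t2_ids = ["c", "d", "a"]
t2_init = warm_start_factors(t1_factors, t1_ids, t2_ids, seed=0)
```

The same helper would be applied once to the user factors and once to the item factors before handing the resulting matrices to the second model as its initial parameters.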
Later I found that the inconsistent behaviour shows up even when I run BPR twice on the same dataset.
I modified /cornac/cornac/models/bpr/recom_bpr.pyx to add logging and have attached the modified file.
I have also attached a sample dataset and a notebook to run it on.
Variables such as log_initial, log_final, etc. are defined in the notebook.
I compared the learned factors from these two runs and made the following observations.
When we read log_initial for any item, say at index 0:
The last logged entry, i.e. the item factor (whether logged as item i or item j), appears as the embedding in df_item exactly, as expected.
Throughout the log, each "after" item_{i/j} factor is exactly the same as the "previous" item_{i/j} factor of the following entry, as expected.
When we read log_final for the same index:
The last logged entry, i.e. the item_{i/j} factor, is not the same as what appears in df_item_final.
Throughout the log, an "after" item_{i/j} factor and the "previous" item_{i/j} factor of the following entry are not the same. I don't know why this happens.
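The two checks described above can be written down as a small consistency test. This is only a sketch under an assumed log layout: it supposes that the log entries for one item have been collected into a chronological list of `(previous_factor, after_factor)` pairs, which is a hypothetical format and not necessarily what the modified recom_bpr.pyx writes. The function name and tolerance are also assumptions.

```python
import numpy as np

def check_update_chain(entries, final_row, atol=1e-7):
    """Check a per-item update log (hypothetical layout).

    entries:   chronological list of (previous_factor, after_factor) pairs
    final_row: the saved embedding of the same item after training

    Returns the positions t where entry t's "after" factor differs from
    entry t+1's "previous" factor, and whether the last "after" factor
    matches the saved embedding.
    """
    breaks = [
        t for t in range(len(entries) - 1)
        if not np.allclose(entries[t][1], entries[t + 1][0], rtol=0.0, atol=atol)
    ]
    final_matches = bool(entries) and bool(
        np.allclose(entries[-1][1], final_row, rtol=0.0, atol=atol)
    )
    return breaks, final_matches

# Toy data: a consistent chain, then the same chain with one broken link.
v0, v1, v2 = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.1, 0.2])
good_log = [(v0, v1), (v1, v2)]
bad_log = [(v0, v1), (v0, v2)]  # second entry starts from a stale vector

good_result = check_update_chain(good_log, final_row=v2)
bad_result = check_update_chain(bad_log, final_row=v1)
```

Running such a check on both log_initial and log_final would give the exact update steps at which the second run diverges, instead of relying on reading the logs by eye.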
So I have the following questions:
Why are the logs consistent in log_initial but inconsistent in log_final?
Why does the final entry in log_final not show up as the item factor in df_item_final?
When I compare the two sets of embeddings, df_item and df_item_final, only item_bias and the values in the first few dimensions change; everything else remains the same. Why?
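To make the last observation measurable, the two factor matrices can be compared dimension by dimension. The sketch below assumes the embeddings have been extracted from df_item and df_item_final into two NumPy arrays of the same shape, with rows in the same item order; the function name and the toy arrays are illustrative only.

```python
import numpy as np

def summarize_changes(before, after, atol=0.0):
    """Report where two factor matrices of equal shape differ."""
    # With rtol=0 and atol=0 this is an exact element-wise comparison.
    changed = ~np.isclose(before, after, rtol=0.0, atol=atol)
    return {
        # Number of items whose value changed, per latent dimension.
        "changed_per_dim": changed.sum(axis=0),
        # Row indices of items with at least one changed value.
        "changed_rows": np.flatnonzero(changed.any(axis=1)),
        "max_abs_diff": float(np.abs(before - after).max()),
    }

# Toy example: only dimensions 0 and 1 change between the two runs.
before = np.zeros((3, 4), dtype=np.float32)
after = before.copy()
after[0, 0] = 0.5
after[2, 1] = -0.25
report = summarize_changes(before, after)
```

A `changed_per_dim` vector that is non-zero only in the first few positions would confirm that the retrained model updates only those dimensions.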
In which platform does it happen?
Cornac 1.14.2
Python 3.8.16
Debian 11 Bullseye
How do we replicate the issue?
First, download the cornac repo, extract it, and replace the file cornac/cornac/models/bpr/recom_bpr.pyx with the one available here.
Next, install it by running the command python3 setup.py install
Expected behavior (i.e. solution)