Add deep recommender example. #23

Merged
merged 1 commit into from
Feb 27, 2025

Conversation

hertschuh
Collaborator

In this example, we add features for both movies and users, and compare networks of different depths for both towers.

hertschuh requested a review from abheesht17 on February 12, 2025 01:06
Collaborator

abheesht17 left a comment

Some NITs. Will have another pass soon!

usually not be immediately usable in a model.

For example:
- User and item ids may be strings (titles, usernames) or large, noncontiguous
Collaborator

NITs:
ids --> IDs (in other places as well)
noncontiguous --> non-contiguous

Collaborator Author

Also did ids -> IDs in other examples.


Of course, complex models also have their disadvantages. The first is
computational cost, as larger models require both more memory and more
computation to fit and serve. The second is the requirement for more data. In
Collaborator

Does "fit" here mean `.fit()`? Maybe we should use "train" instead?


Nevertheless, effort put into building and fine-tuning larger models often pays
off. In this tutorial, we will illustrate how to build a deep retrieval model
using Keras Recommenders. We'll do this by building progressively more complex
Collaborator

At some point, we need to decide the name of the library and uniformly change it everywhere :P

Collaborator Author

Yes. Well I removed it here for now, it doesn't really add anything.


### Normalizing continuous features

Continuous features may need normlization so that they fall within an acceptable
Collaborator

normlization -> normalization

### Normalizing continuous features

Continuous features may need normlization so that they fall within an acceptable
range for the model. We will give two examples of such normalization
Collaborator

Missing full stop at the end of the sentence.


"""
This looks correct, the layer is tokenizing titles into individual words.

Collaborator

Don't need newline here and below?

"""

AGE_BINS_COUNT = 10
use_age_feature = keras.utils.FeatureSpace.float_discretized(
Collaborator

use_age_feature --> user_age_feature?
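
For readers following the thread, here is a minimal sketch of what discretizing a continuous age into `AGE_BINS_COUNT` buckets means. It is plain Python, not the `FeatureSpace` code from the PR; the boundary values and the `age_to_bin` helper are hypothetical and chosen only for illustration (the PR would derive its bins from the data).

```python
import bisect

AGE_BINS_COUNT = 10

# Hypothetical cut points: 9 boundaries split the number line into 10 bins.
AGE_BOUNDARIES = [10, 20, 30, 40, 50, 60, 70, 80, 90]


def age_to_bin(age):
    """Return the index of the bin that `age` falls into (0 to 9)."""
    # bisect_right counts how many boundaries are <= age, so a value equal
    # to a boundary lands in the bin that starts at that boundary.
    return bisect.bisect_right(AGE_BOUNDARIES, age)


user_age_bins = [age_to_bin(age) for age in (5, 20, 25, 95)]
```

Each bin index can then be treated as a categorical ID, for instance as input to an embedding lookup.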

### Extract raw candidate features

First, we gather all the raw features from the dataset in lists. That is the
titles of the movies and the genres. Note that one ore more genres are
Collaborator

ore --> or


"""
Now we need to pad genres with an Out Of Vocabulary value to be able to
represent genres with as a fixed size vector. We'll pad with zeros for
Collaborator

Remove "with" before "as"?
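
As an illustration of the padding step the quoted text describes, here is a minimal sketch in plain Python. It assumes genre IDs are positive integers and that 0 is reserved as the padding value; the `pad_genres` helper and the sample data are hypothetical and not taken from the PR.

```python
PAD_VALUE = 0  # Assumption: 0 is reserved and never used as a real genre ID.


def pad_genres(genre_ids, max_len, pad_value=PAD_VALUE):
    """Right-pad (or truncate) a list of genre IDs to exactly `max_len` entries."""
    padded = list(genre_ids[:max_len])
    padded.extend([pad_value] * (max_len - len(padded)))
    return padded


# Hypothetical data: each movie has one or more genre IDs.
movies_genres = [[1, 5], [3], [2, 4, 7]]
max_len = max(len(genres) for genres in movies_genres)
padded_genres = [pad_genres(genres, max_len) for genres in movies_genres]
# padded_genres == [[1, 5, 0], [3, 0, 0], [2, 4, 7]]
```

After padding, every movie is represented by a list of the same length, which is what lets the genres be stacked into a single fixed-size tensor.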

keras.layers.GlobalAveragePooling1D(),
]
)
self.movie_genres_embedding = keras.Sequential(
Collaborator

Should we add a comment here, saying that for every movie, we embed every token, and then take the mean of all token embeddings?

Collaborator Author

Added a comment for the title token embeddings and the genres embeddings.
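
The "embed every token, then take the mean" idea discussed above can be sketched without Keras as follows. The embedding table, its values, and the `mean_pool` helper are hypothetical; the sketch also averages over every position, which is an assumption about how padding is handled rather than a statement about the PR's model.

```python
# Hypothetical 2-dimensional embedding table keyed by token ID.
embedding_table = {
    0: [0.0, 0.0],  # padding token
    1: [1.0, 3.0],
    2: [3.0, 5.0],
}


def mean_pool(token_ids):
    """Look up each token's embedding and average them element-wise."""
    vectors = [embedding_table[token_id] for token_id in token_ids]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


title_embedding = mean_pool([1, 2])  # [2.0, 4.0]
```

The result is one vector per movie regardless of how many tokens the title has, which is why a pooling layer follows the embedding layer in the quoted `keras.Sequential` block.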

Collaborator

abheesht17 left a comment

LGTM, thanks!

hertschuh merged commit 0210406 into keras-team:main on Feb 27, 2025
5 checks passed
hertschuh deleted the deep_rec_example branch on February 27, 2025 03:28

2 participants