From 30f0c20ccdbda929ff084ab2a6e614848c176a22 Mon Sep 17 00:00:00 2001
From: darrylong <darrylong@users.noreply.github.com>
Date: Wed, 18 Dec 2024 22:19:18 +0800
Subject: [PATCH] Add Model Ensembling Tutorial (#640)

* add model ensembling tutorial

* update ensembling notebook

* refactor code

* Revamped tutorial to include simple borda count

* Restructured model ensembling tutorial

* Update tutorial result representation

* Update tutorial

* Update tutorial

* Update model ensembling tutorial based on feedback

* Update linear regression/random forest inference data set

* WIP Initial experimental calculation

* preliminary calculation of experimental comparison

* Update recall@K and precision@K evaluation

* Add borda count, enhanced wmf to comparison

* Revised model ensembling tutorial

* Fix bug

* Revised markdown and description of tutorial

* Enhance tutorial content

* Simplify introduction

* Enhance tutorial

* Update tutorial

* Updated model ensembling tutorial

* Optimize inference code

* Optimize inference code

---------

Co-authored-by: tqtg <tuantq.vnu@gmail.com>
---
 tutorials/model_ensembling.ipynb | 1125 ++++++++++++++++++++++++++++++
 1 file changed, 1125 insertions(+)
 create mode 100644 tutorials/model_ensembling.ipynb
diff --git a/tutorials/model_ensembling.ipynb b/tutorials/model_ensembling.ipynb
new file mode 100644
index 00000000..0e93babc
--- /dev/null
+++ b/tutorials/model_ensembling.ipynb
@@ -0,0 +1,1125 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c9004b98-e34b-4ce3-bfb3-bb2ddba0f125",
+   "metadata": {},
+   "source": [
+    "*Copyright (c) Cornac Authors. All rights reserved.*\n",
+    "\n",
+    "*Licensed under the Apache 2.0 License.*\n",
+    "\n",
+    "# Model Ensembling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "faed055c",
+   "metadata": {},
+   "source": [
+    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
+    "  <td>\n",
+    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+    "  </td>\n",
+    "  <td>\n",
+    "    <a target=\"_blank\" href=\"https://github.com/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+    "  </td>\n",
+    "</table>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c98761e-37fc-4407-b451-d59a68aecd83",
+   "metadata": {},
+   "source": [
+    "This Jupyter Notebook shows how to combine multiple recommendation models using the Cornac library. Ensembling is a technique where we combine predictions from different models to get more accurate results. By using this method, we can improve the performance of a recommendation system.\n",
+    "\n",
+    "### What You'll Learn\n",
+    "This tutorial is divided into five parts:\n",
+    "\n",
+    "1. **Introduction**.\n",
+    "We’ll start with a simple experiment using the **BPR** and **WMF** models and explore the dataset.\n",
+    "2. **Simple Model Ensembling**.\n",
+    "Learn how to combine BPR and WMF predictions using a method called **Borda Count**.\n",
+    "3. **Further Ensembling**.\n",
+    "Create variations of the WMF model and ensemble their predictions.\n",
+    "4. **Ensembling with Regression Models**.\n",
+    "Use **linear regression** and **random forest regression** from `scikit-learn` to combine WMF models.\n",
+    "5. **Further Evaluation**.\n",
+    "Evaluate the ensemble models to see how they perform compared to individual models."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8bd6262f",
+   "metadata": {},
+   "source": [
+    "**Note:** Part of this notebook (in Section 4) uses the `scikit-learn` package. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "82295d13-46da-4e8e-a052-b420beb969e8",
+   "metadata": {},
+   "source": [
+    "## 1. Introduction\n",
+    "<a id='introduction'></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50e0c21e",
+   "metadata": {},
+   "source": [
+    "In this section, we’ll run a basic experiment with the **BPR** (Bayesian Personalized Ranking) and **WMF** (Weighted Matrix Factorization) models to see how they work. We’ll also look at the dataset to understand its structure and distribution."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db09b809",
+   "metadata": {},
+   "source": [
+    "### 1.1 Install required dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "89ef2804",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install cornac, tensorflow (required for WMF) and scikit-learn\n",
+    "! pip install cornac==2.2.2 tensorflow==2.18 scikit-learn"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bd266ee7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import logging\n",
+    "\n",
+    "# Suppress TensorFlow's C++ (including CUDA) and Python log messages\n",
+    "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n",
+    "logging.getLogger('tensorflow').setLevel(logging.ERROR)\n",
+    "\n",
+    "# Import necessary libraries and functions\n",
+    "from IPython.display import display\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "from cornac.datasets import movielens\n",
+    "from cornac.models import BPR, WMF\n",
+    "from cornac.eval_methods import RatioSplit\n",
+    "from cornac.metrics import Precision, Recall\n",
+    "from cornac.utils import cache\n",
+    "from cornac import Experiment\n",
+    "\n",
+    "from sklearn import linear_model\n",
+    "from sklearn.ensemble import RandomForestRegressor"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5636df0-91c3-4b73-8c60-26d75b9bd6f6",
+   "metadata": {},
+   "source": [
+    "### 1.2 Loading Dataset\n",
+    "\n",
+    "First, we load the **MovieLens 100K** dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "92a57076",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = movielens.load_feedback(variant=\"100K\") # Load MovieLens Dataset\n",
+    "\n",
+    "rs = RatioSplit(data, test_size=0.2, rating_threshold=4.0, seed=42, verbose=True) # Split into train and test sets (80/20)\n",
+    "train_set, test_set = rs.train_set, rs.test_set"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d37011e-1384-42cb-8ea0-4667784f952a",
+   "metadata": {},
+   "source": [
+    "### 1.3 Training BPR and WMF models\n",
+    "\n",
+    "We will train two models: \n",
+    "\n",
+    "1. **BPR (Bayesian Personalized Ranking)**\n",
+    "2. **WMF (Weighted Matrix Factorization)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ea466b90",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bpr_model = BPR(k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=123) # Initialize BPR model\n",
+    "wmf_model = WMF(k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123) # Initialize WMF model\n",
+    "\n",
+    "models = [bpr_model, wmf_model]\n",
+    "metrics = [Precision(k=100), Recall(k=100)] # Set metrics for experiment\n",
+    "\n",
+    "experiment = Experiment(rs, models, metrics, user_based=True).run() # Run Experiment to compare BPR model to WMF model individually"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a6d915d7",
+   "metadata": {},
+   "source": [
+    "Comparing **Precision** and **Recall**, **BPR** and **WMF** produce comparable results.\n",
+    "\n",
+    "Let's move on to try to interpret these results by using the genres of movies that were recommended to us."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0fcc4831",
+   "metadata": {},
+   "source": [
+    "### 1.4 Interpreting Results"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0be074f7",
+   "metadata": {},
+   "source": [
+    "##### 1.4.1 Creating a Movie Genre Dataframe"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6687eea0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a dataframe of movies with their corresponding genres\n",
+    "\n",
+    "# Download item metadata for the MovieLens 100K dataset\n",
+    "item_df = pd.read_csv(\n",
+    "  cache(\"http://files.grouplens.org/datasets/movielens/ml-100k/u.item\"), \n",
+    "  sep=\"|\", encoding=\"ISO-8859-1\",\n",
+    "  names=[\"ItemID\", \"Title\", \"Release Date\", \"Video Release Date\", \"IMDb URL\", \n",
+    "         \"unknown\", \"Action\", \"Adventure\", \"Animation\", \"Children's\", \"Comedy\", \n",
+    "         \"Crime\", \"Documentary\", \"Drama\", \"Fantasy\", \"Film-Noir\", \"Horror\", \n",
+    "         \"Musical\", \"Mystery\", \"Romance\", \"Sci-Fi\", \"Thriller\", \"War\", \"Western\"]\n",
+    ").set_index(\"ItemID\").drop(columns=[\"Video Release Date\", \"IMDb URL\", \"unknown\"])\n",
+    "\n",
+    "item_idx2id = train_set.item_ids # mapping between item index and original film ID\n",
+    "user_idx2id = train_set.user_ids # mapping between user index and original user ID\n",
+    "\n",
+    "# Let's take a look at an example of this dataframe\n",
+    "display(item_df.head(3))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a26d684",
+   "metadata": {},
+   "source": [
+    "The `item_df` dataframe contains all movie items with their corresponding genre attributes.\n",
+    "\n",
+    "Further down, we filter this table by the recommendations from our trained models to get a better sense of which genres each model favors."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b476a6f8",
+   "metadata": {},
+   "source": [
+    "##### 1.4.2 Creating Training Data Dataframe\n",
+    "\n",
+    "To get a sense of what data has been inserted into our model for training, let's count the genres of the training data used to train the model.\n",
+    "\n",
+    "But first, let's create a `training_data_df` dataframe with all training data.\n",
+    "\n",
+    "The training data consists of 80,000 (**User Index**, **Item Index**, **Rating**) triplets, as shown in the dataset summary in Section 1.2."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "139fc938",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's view a sample of the training data dataframe\n",
+    "print(\"Sample row of record:\")\n",
+    "print(\"(user_index, item_index, rating):\", list(zip(*train_set.uir_tuple))[0])\n",
+    "\n",
+    "# Create a training data dataframe\n",
+    "training_data_df = pd.DataFrame(zip(*train_set.uir_tuple)) # adding all training data into dataframe\n",
+    "training_data_df.columns = ['user_idx', 'item_idx', 'rating'] # adding column names to the data\n",
+    "\n",
+    "# Add new column, 'item_id', for further filtering in later sections\n",
+    "training_data_df['item_id'] = training_data_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # converted from the item index field"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2a2f5733",
+   "metadata": {},
+   "source": [
+    "##### 1.4.3 Filtering Training Data\n",
+    "\n",
+    "Let's filter based on a particular user to learn more about the user.\n",
+    "\n",
+    "We set ``UIDX`` to user index **3**, and ``TOPK`` to **100**, to get the top 100 recommendations in each model for comparison."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3c91c6ae",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's define the user index and top-k movies to be recommended\n",
+    "UIDX = 3\n",
+    "TOPK = 100\n",
+    "\n",
+    "# Positively rated items by a user (rating >= 4.0 as rating_threshold used earlier, and user index = UIDX)\n",
+    "positively_rated_items = training_data_df[\n",
+    "    (training_data_df['rating'] >= 4.0) & (training_data_df['user_idx'] == UIDX)\n",
+    "]['item_id'].unique()\n",
+    "filter_df = item_df.loc[[int(item_id) for item_id in positively_rated_items]] # get genres of movie items\n",
+    "\n",
+    "print(\"Number of movies:\", len(filter_df)) # Number of movies positively rated by user index 3 in training data\n",
+    "\n",
+    "# Sum each genre column over the selected movies\n",
+    "filter_df = filter_df.select_dtypes(np.number).sum() \n",
+    "filter_df = filter_df.to_frame(\"Sum\") # Let's call that column 'Sum'\n",
+    "\n",
+    "# Add a new column '%' for the percentage of individual genre sum compared to total sum\n",
+    "filter_df[\"%\"] = filter_df[\"Sum\"] / filter_df[\"Sum\"].sum() * 100\n",
+    "filter_df[\"%\"] = filter_df[\"%\"].round(1)\n",
+    "\n",
+    "# Let's see the training data genres, sums and percentages\n",
+    "print(\"Positively rated movies by user index 3 in training data\")\n",
+    "display(filter_df.sort_values(\"Sum\", ascending=False)[:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d700c1c1",
+   "metadata": {},
+   "source": [
+    "As shown above in the training data, the top genres for user index 3 with positively rated movies include 'Drama', 'Comedy', 'Romance', 'Action' and 'Thriller'.\n",
+    "\n",
+    "Let's now compare them to the recommendations of the BPR and WMF models respectively."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d242944c",
+   "metadata": {},
+   "source": [
+    "##### 1.4.4 Interpreting Recommendations of BPR, WMF Models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72759171",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the Top 5 Genres in filtered training data for user index 3\n",
+    "top_genres = filter_df.sort_values(\"Sum\", ascending=False).head(5).index.tolist()\n",
+    "print(\"\\nTop 5 Genres in training data:\", top_genres)\n",
+    "\n",
+    "# Get top K recommendations for BPR and put them into the genre dataframe\n",
+    "bpr_recommendations, bpr_scores = bpr_model.rank(UIDX) # rank all items by score\n",
+    "bpr_recommendations = bpr_recommendations[:TOPK] # limit to top K\n",
+    "bpr_topk = [item_idx2id[iidx] for iidx in bpr_recommendations] # convert item indexes into item ids\n",
+    "bpr_df = item_df.loc[[int(iid) for iid in bpr_topk]] # filter the movie genre dataframe by item ids\n",
+    "\n",
+    "# Let's view the top recommendations for BPR by top genres\n",
+    "display(\"BPR: Top recommendations\", bpr_df[[\"Title\"] + top_genres].head(10))\n",
+    "\n",
+    "# Now, let's do likewise for WMF - get top K recommendations and put them into the genre dataframe\n",
+    "wmf_recommendations, wmf_scores = wmf_model.rank(UIDX) # rank recommendations by score\n",
+    "wmf_recommendations = wmf_recommendations[:TOPK] # limit to top K\n",
+    "wmf_topk = [item_idx2id[iidx] for iidx in wmf_recommendations] # convert item indexes into item ids\n",
+    "wmf_df = item_df.loc[[int(iid) for iid in wmf_topk]] # filter the movie genre dataframe by item ids\n",
+    "\n",
+    "# View the top recommendations for WMF\n",
+    "display(\"WMF: Top recommendations\", wmf_df[[\"Title\"] + top_genres].head(10))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed2f0abc",
+   "metadata": {},
+   "source": [
+    "Now that we have seen the top recommendations of the BPR and WMF models, let's do a comparison by taking a look at the genre distribution."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9f02bf37",
+   "metadata": {},
+   "source": [
+    "##### 1.4.5 Comparing Models by Genre Distribution"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "283ca840",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's introduce `combined_df` for comparison.\n",
+    "# This dataframe will be used to compare models by summing up genres from recommendations of different models\n",
+    "combined_df = pd.DataFrame({\n",
+    "    \"Train Data %\": filter_df[\"%\"],\n",
+    "    \"BPR Sum\": bpr_df.select_dtypes(np.number).sum(), # group by genres, then get sum of each genre\n",
+    "    \"WMF Sum\": wmf_df.select_dtypes(np.number).sum() # likewise for WMF\n",
+    "})\n",
+    "\n",
+    "# Get percentages of movie genre sums\n",
+    "combined_df['BPR %'] = combined_df['BPR Sum'] / TOPK * 100 \n",
+    "combined_df[\"WMF %\"] = combined_df[\"WMF Sum\"] / TOPK * 100\n",
+    "\n",
+    "combined_df = combined_df.round(1) # round all \n",
+    "combined_df = combined_df.sort_values(\"Train Data %\", ascending=False)\n",
+    "\n",
+    "# Let's take a look at the genre distribution by percentages\n",
+    "display(\"Train Data to Recommended % Distribution\", combined_df[['BPR %', 'WMF %']][:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c30fe92b",
+   "metadata": {},
+   "source": [
+    "Now that we have seen the distribution of individual models, we are curious about what kind of distribution we will get from ensembling these models.\n",
+    "\n",
+    "Let's see what happens when we ensemble these two models. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cce45f7",
+   "metadata": {},
+   "source": [
+    "## 2. Simple Model Ensembling\n",
+    "\n",
+    "In this section, we’ll combine the predictions from the **BPR** and **WMF** models using a method called **Borda Count**.\n",
+    "\n",
+    "### What is Borda Count?\n",
+    "\n",
+    "Borda Count is a simple ranking method that assigns points based on an item’s rank in each model. Higher-ranked items get more points. We then add up the points from all models to get a combined ranking.\n",
+    "\n",
+    "**Example**:\n",
+    "1. Each model ranks items from 1 to 5.\n",
+    "2. Items earn points based on their rank (e.g., 1st place gets 4 points, 2nd gets 3 points, etc.).\n",
+    "3. We sum the points for each item across all models.\n",
+    "4. The item with the highest total points becomes the top recommendation.\n",
+    "\n",
+    "Here’s a sample ranking for a user:\n",
+    "\n",
+    "| Rank | Model 1 | Model 2 | Model 3 | Points (5 - rank) |\n",
+    "|------|---------|---------|---------|-------------------|\n",
+    "| 1    | A       | D       | E       | 4                 |\n",
+    "| 2    | B       | C       | A       | 3                 |\n",
+    "| 3    | C       | A       | B       | 2                 |\n",
+    "| 4    | D       | B       | D       | 1                 |\n",
+    "| 5    | E       | E       | C       | 0                 |\n",
+    "\n",
+    "**Borda Count Result**:\n",
+    "\n",
+    "| Item | Total Points |\n",
+    "|------|--------------|\n",
+    "| A    | 9            |\n",
+    "| B    | 6            |\n",
+    "| C    | 5            |\n",
+    "| D    | 6            |\n",
+    "| E    | 4            |\n",
+    "\n",
+    "**Final Ranking: A > B = D > C > E** (B and D tie with 6 points each)\n",
+    "\n",
+    "Now, let’s implement this method!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b349407b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's create a new dataframe to calculate ranking and borda count\n",
+    "rank_df = pd.DataFrame({\n",
+    "    \"ItemID\": item_idx2id,\n",
+    "})\n",
+    "\n",
+    "total_items = len(rank_df) # 1651 items\n",
+    "\n",
+    "# Obtain points (inverse of rank) of the items based on the BPR score\n",
+    "rank_df[\"BPR Score\"] = bpr_scores\n",
+    "rank_df[\"BPR Rank\"] = rank_df[\"BPR Score\"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation\n",
+    "rank_df[\"BPR Points\"] = total_items - rank_df[\"BPR Rank\"] # Get points by calculating ('Total Item count' - 'Rank')\n",
+    "\n",
+    "# Do likewise for WMF\n",
+    "rank_df[\"WMF Score\"] = wmf_scores\n",
+    "rank_df[\"WMF Rank\"] = rank_df[\"WMF Score\"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation\n",
+    "rank_df[\"WMF Points\"] = total_items - rank_df[\"WMF Rank\"] # Get points by calculating ('Total Item count' - 'Rank')\n",
+    "\n",
+    "# Get Borda Count by summing up points of BPR and WMF\n",
+    "rank_df[\"Borda Count\"] = rank_df[\"BPR Points\"] + rank_df[\"WMF Points\"]\n",
+    "rank_df[\"Borda Rank\"] = rank_df[\"Borda Count\"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation\n",
+    "\n",
+    "# Round decimal places for readability purposes\n",
+    "rank_df = rank_df.round(3)\n",
+    "rank_df.sort_values(\"Borda Rank\", inplace=True)\n",
+    "\n",
+    "# Now let's take a look at the table with Borda Count \n",
+    "display(rank_df[[\"ItemID\", \"BPR Rank\", \"WMF Rank\", \"Borda Rank\"]].head(5))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "57994d68",
+   "metadata": {},
+   "source": [
+    "The top recommendation, **ItemID 313**, was ranked **8th** by BPR and **1st** by WMF. Similarly, the second recommendation, **ItemID 739**, was ranked **7th** by BPR and **11th** by WMF.\n",
+    "\n",
+    "This demonstrates how ensembling allows us to leverage the strengths of multiple models to produce a more balanced recommendation.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "Next, we’ll incorporate the recommendations into the genre distribution dataframe to compare their performance against the individual base models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac86f568",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UIDX = 3\n",
+    "TOPK = 100\n",
+    "\n",
+    "borda_count_topk = rank_df[\"ItemID\"].values[:TOPK] # Get top K (100) Item IDs\n",
+    "\n",
+    "borda_df = item_df.loc[[int(i) for i in borda_count_topk]] # Filter genre data frame by the top item IDs\n",
+    "\n",
+    "# Add Borda Count results into 'combined_df' dataframe for comparison\n",
+    "combined_df[\"Borda Count Sum\"] = borda_df.select_dtypes(np.number).sum() # group by genre, and calculate sum of each genre\n",
+    "combined_df[\"BPR + WMF Borda Count %\"] = combined_df[\"Borda Count Sum\"] / TOPK * 100 # Calculate percentage of sum to total\n",
+    "combined_df[\"BPR + WMF Borda Count %\"] = combined_df[\"BPR + WMF Borda Count %\"].round(1) # rounding for readability purposes\n",
+    "\n",
+    "# Let's take a look at the genre distribution of BPR, WMF and the newly added Borda Count\n",
+    "display(\"BPR + WMF Borda Count Recommendations Distribution\", combined_df[[\"BPR %\", \"WMF %\", \"BPR + WMF Borda Count %\"]][:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a55bb6e",
+   "metadata": {},
+   "source": [
+    "As Borda Count is a combination of both BPR and WMF models, the distributions are expected to be influenced by both models.\n",
+    "\n",
+    "In the next section, we will further add more models to the ensemble."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "142db187",
+   "metadata": {},
+   "source": [
+    "## 3. Further Ensembling\n",
+    "\n",
+    "In this step, we enhance our ensemble by creating variations of the **WMF** model and combining their predictions using Borda Count. Each variation introduces slight adjustments, such as changes in parameters, to capture different perspectives of the dataset.\n",
+    "\n",
+    "Think of choosing a movie with friends, where each person has slightly different tastes. By considering everyone’s preferences, you make a decision that satisfies the group. As with any statistical learning model, results can vary when the model is trained with different seeds or hyperparameters. Ensembling several such variations of WMF can reduce this variance and yield more balanced, robust recommendations.\n",
+    "\n",
+    "### Approach:\n",
+    "1. **Different Random Seeds**:  \n",
+    "   Train multiple models with different random seeds (e.g., `seed=123`). This variation captures different nuances, as some models may perform better for certain users than others.\n",
+    "   \n",
+    "2. **Varying Number of Latent Factors**:  \n",
+    "   Adjust the number of latent factors (`k`). By changing `k`, the models can capture diverse aspects of the data, providing a broader view of the underlying patterns.\n",
+    "\n",
+    "Let’s implement this by training several WMF models with different seeds and latent factor values, then ensemble them using Borda Count to improve the overall recommendation performance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5ce879a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# WMF models with different seeds\n",
+    "wmf_model_123 = WMF(name=\"WMF_123\", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)\n",
+    "wmf_model_456 = WMF(name=\"WMF_456\", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=456)\n",
+    "wmf_model_789 = WMF(name=\"WMF_789\", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=789)\n",
+    "wmf_model_888 = WMF(name=\"WMF_888\", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=888)\n",
+    "wmf_model_999 = WMF(name=\"WMF_999\", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=999)\n",
+    "# WMF models with different number of latent factors\n",
+    "wmf_model_k20 = WMF(name=\"WMF_k20\", k=20, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)\n",
+    "wmf_model_k30 = WMF(name=\"WMF_k30\", k=30, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)\n",
+    "wmf_model_k40 = WMF(name=\"WMF_k40\", k=40, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)\n",
+    "wmf_model_k50 = WMF(name=\"WMF_k50\", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)\n",
+    "\n",
+    "models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]\n",
+    "\n",
+    "metrics = [Precision(k=100), Recall(k=100)] # The same metrics as before\n",
+    "\n",
+    "# Let's run an experiment to see how much these models differ when only the random seed or the number of latent factors changes\n",
+    "experiment = Experiment(rs, models, metrics, user_based=True).run()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce487e7d",
+   "metadata": {},
+   "source": [
+    "Based on the results, we can see that even within the same model, the results can vary. \n",
+    "\n",
+    "Let's try ensembling all these models into a single model using Borda Count, and look at its recommendations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e745118",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's create a different dataframe to calculate ranking and borda count\n",
+    "rank_2_df = pd.DataFrame({\n",
+    "    \"ItemID\": item_idx2id,\n",
+    "})\n",
+    "\n",
+    "# Add a column named 'WMF Family Borda Count' to accumulate points\n",
+    "rank_2_df[\"WMF Family Borda Count\"] = 0\n",
+    "\n",
+    "# Calculate the points (inverse of rank) for each of the models and accumulate them into the 'WMF Family Borda Count' column\n",
+    "# We use the same formula as the 'Borda Count' calculation\n",
+    "for model in models:\n",
+    "    name = model.name\n",
+    "    recommendations, scores = model.rank(UIDX)\n",
+    "    rank_2_df[name + \"_score\"] = scores\n",
+    "    rank_2_df[name + \"_rank\"] = rank_2_df[name + \"_score\"].rank(ascending=False).astype(int)\n",
+    "    rank_2_df[name + \"_points\"] = total_items - rank_2_df[name + \"_rank\"]\n",
+    "    rank_2_df[\"WMF Family Borda Count\"] = rank_2_df[\"WMF Family Borda Count\"] + rank_2_df[name + \"_points\"]\n",
+    "\n",
+    "# Let's sort and view the top recommendations!\n",
+    "display(\"Top 10 Recommendations for WMF Borda Count\", rank_2_df[[\"ItemID\", \"WMF Family Borda Count\"]].sort_values(\"WMF Family Borda Count\", ascending=False).head(10))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8224e10e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now, let's add them to the combined dataframe for comparison with earlier models\n",
+    "wmf_borda_count_topk = rank_2_df.sort_values(\"WMF Family Borda Count\", ascending=False)[\"ItemID\"].values[:TOPK]\n",
+    "wmf_borda_df = item_df.loc[[int(i) for i in wmf_borda_count_topk]]\n",
+    "\n",
+    "combined_df[\"WMF Family Borda Count Sum\"] = wmf_borda_df.select_dtypes(np.number).sum()\n",
+    "combined_df[\"WMF Family Borda Count %\"] = combined_df[\"WMF Family Borda Count Sum\"] / TOPK * 100\n",
+    "combined_df[\"WMF Family Borda Count %\"] = combined_df[\"WMF Family Borda Count %\"].round(1)\n",
+    "\n",
+    "# Let's compare the recommendation distribution\n",
+    "display(\"Combined Recommendations Distribution\", combined_df[[\"WMF %\", \"BPR + WMF Borda Count %\", \"WMF Family Borda Count %\"]][:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e13bdce6",
+   "metadata": {},
+   "source": [
+    "Looking at the WMF Family Borda Count results, we can see that the different random seed initializations, along with the different numbers of latent factors, have influenced the recommendations.\n",
+    "\n",
+    "-------\n",
+    "\n",
+    "Now that we have covered Borda Count, let's see how other methods and popular packages such as **scikit-learn** can be used for more advanced model ensembling."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2c286769",
+   "metadata": {},
+   "source": [
+    "## 4. Ensembling with Regression Models\n",
+    "\n",
+    "In this step, we’ll explore ensembling using **linear regression** and **random forest regression**. These methods allow us to model the relationship between the predictions of multiple models and the actual outcomes, resulting in a more adaptive and accurate ensemble.\n",
+    "\n",
+    "### Why Use Regression Models?\n",
+    "\n",
+    "- **Linear Regression**:  \n",
+    "  A simple and interpretable approach, best suited when the relationship between model predictions and true values is linear.\n",
+    "  \n",
+    "- **Random Forest Regression**:  \n",
+    "  A more flexible method that captures non-linear relationships and complex interactions, making it well-suited for diverse and intricate datasets.\n",
+    "\n",
+    "These regression-based methods go beyond basic averaging by adapting to patterns in the data, potentially improving prediction accuracy.\n",
+    "\n",
+    "### Approach\n",
+    "\n",
+    "This process can be seen as a meta-learning problem. Here’s how it works:\n",
+    "\n",
+    "We use the predictions of the base models (WMF Variations) as features, and a meta-learner (such as Linear Regression or Random Forest) is trained to make the final prediction. This framework allows flexibility to experiment with various machine learning models, including Linear Regression, Random Forest, Gradient Boosting, or even Neural Networks.\n",
+    "\n",
+    "Let’s begin by training a **Linear Regression** model to combine the predictions from the WMF variations."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f96a39db",
+   "metadata": {},
+   "source": [
+    "##### 4.1 Prepare Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "352624c0-059d-4909-8d54-3d718bfc375c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# First, let's create the training and inference dataframes\n",
+    "training_df = pd.DataFrame(zip(*train_set.uir_tuple)) # Add 'User Index', 'Item Index', 'Rating' triples as records in dataframe\n",
+    "training_df.columns = ['user_idx', 'item_idx', 'rating'] # Set column names\n",
+    "\n",
+    "# Get all possible user_index, item_index combinations, add them into dataframe for inference\n",
+    "all_df = pd.DataFrame({\n",
+    "    \"user_idx\": [user_idx for user_idx in range(train_set.num_users) for _ in range(train_set.num_items)],\n",
+    "    \"item_idx\": [item_idx for _ in range(train_set.num_users) for item_idx in range(train_set.num_items)],\n",
+    "})\n",
+    "all_df['item_id'] = all_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # Add 'Item ID' column into dataframe by converting 'Item Index' to 'Item ID'\n",
+    "\n",
+    "# Let's collect the scores from the models trained in Part 3.\n",
+    "models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]\n",
+    "\n",
+    "# For each model, add its predicted ratings as a new column in the training and inference dataframes\n",
+    "for model in tqdm(models):\n",
+    "    name = model.name\n",
+    "\n",
+    "    # Group by user_idx and apply score function to each group\n",
+    "    def score_items(group):\n",
+    "        return pd.Series(model.score(int(group.name))[group['item_idx'].values], index=group.index)\n",
+    "    \n",
+    "    training_df[name + \"_score\"] = training_df.groupby(\"user_idx\").apply(score_items, include_groups=False).reset_index(level=0, drop=True) # for training\n",
+    "    all_df[name + \"_score\"] = all_df.groupby(\"user_idx\").apply(score_items, include_groups=False).reset_index(level=0, drop=True) # for inference\n",
+    "\n",
+    "# Let's pick out the 9 features: the predicted ratings from the 9 WMF models trained\n",
+    "X_train = training_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score', 'WMF_k20_score', 'WMF_k30_score', 'WMF_k40_score', 'WMF_k50_score']] # use these predicted ratings as features\n",
+    "y_train = training_df['rating'] # ground truth ratings, used as the regression target\n",
+    "X_inference = all_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score', 'WMF_k20_score', 'WMF_k30_score', 'WMF_k40_score', 'WMF_k50_score']] # all data, used to predict values for ranking\n",
+    "\n",
+    "display(\"Training features\", X_train.head(3)) # predicted ratings as features\n",
+    "display(\"Target values\", y_train.head(3)) # ground truth ratings\n",
+    "display(\"Inference Data\", X_inference.head(3)) # all inference data "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3fc8a785",
+   "metadata": {},
+   "source": [
+    "Now that the data is in a form that **scikit-learn** models can consume, let's start by training a Linear Regression model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a75e842",
+   "metadata": {},
+   "source": [
+    "##### 4.2 Fitting Linear Regression Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "16a564bd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UIDX = 3\n",
+    "TOPK = 100\n",
+    "\n",
+    "# Fit a Linear Regression model on the base model scores\n",
+    "regr = linear_model.LinearRegression(fit_intercept=False) # no intercept term, so the prediction is a pure weighted sum of the WMF scores\n",
+    "regr.fit(X_train, y_train) # train the model\n",
+    "\n",
+    "# Input: 9 base model predicted ratings. Output: final predicted rating based on linear regression\n",
+    "y_pred = regr.predict(X_inference) # Get predictions based on trained model\n",
+    "\n",
+    "all_df[\"WMF Linear Regression\"] = y_pred # create a column in `all_df` for the predictions\n",
+    "\n",
+    "# Get Top K ratings from predictions\n",
+    "sorted_df = all_df.sort_values(\"WMF Linear Regression\", ascending=False) # sort by predicted ratings\n",
+    "top_item_ids = sorted_df[sorted_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # keep the top K items for this user (TOPK = 100, set above)\n",
+    "\n",
+    "# Place them into the comparison distribution dataframe\n",
+    "linear_regression_df = item_df.loc[[int(i) for i in top_item_ids]] # look up the genres of the recommended items\n",
+    "combined_df[\"WMF Linear Regression Sum\"] = linear_regression_df.select_dtypes(np.number).sum() # sum each genre column over the recommended items\n",
+    "combined_df[\"WMF Linear Regression %\"] = combined_df[\"WMF Linear Regression Sum\"] / TOPK * 100 # express each genre sum as a percentage of TOPK\n",
+    "\n",
+    "combined_df[\"WMF Linear Regression %\"] = combined_df[\"WMF Linear Regression %\"].round(1) # round values for readability\n",
+    "\n",
+    "print(\"Coefficients of the linear regression model\")\n",
+    "print(regr.coef_) # coefficients of the linear regression model\n",
+    "print(regr.intercept_) # intercept is 0.0 because fit_intercept=False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "048c684f",
+   "metadata": {},
+   "source": [
+    "The coefficients of the Linear Regression model indicate how much each base model contributes to the ensemble.\n",
+    "\n",
+    "We have successfully trained a **Linear Regression** model on the predictions from the 9 WMF base models, which vary in random seed and in number of latent factors.\n",
+    "\n",
+    "Next, let's proceed to train a **Random Forest Regressor** model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6bed0bc-f6c1-43fc-9c4b-acb56bcae554",
+   "metadata": {},
+   "source": [
+    "##### 4.3 Fitting the Random Forest Model\n",
+    "\n",
+    "We will use the same training data to fit a **Random Forest Regressor** model.\n",
+    "\n",
+    "Although we use a Random Forest in this example, other regressors such as Gradient Boosting could be swapped in to see how they perform."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0fe095ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UIDX = 3\n",
+    "TOPK = 100\n",
+    "\n",
+    "# Let's now train a Random Forest model\n",
+    "randomforest_model = RandomForestRegressor(n_estimators=50, max_depth=2, random_state=42) \n",
+    "randomforest_model.fit(X_train, y_train) # Train the model\n",
+    "\n",
+    "# Input: 9 base model predicted ratings. Output: final predicted rating based on random forest\n",
+    "y_pred = randomforest_model.predict(X_inference)\n",
+    "\n",
+    "all_df[\"WMF Random Forest\"] = y_pred # create a column in `all_df` for the predictions\n",
+    "\n",
+    "# Get Top K ratings from predictions\n",
+    "sorted_df = all_df.sort_values(\"WMF Random Forest\", ascending=False) # sort by predicted ratings\n",
+    "top_item_ids = sorted_df[sorted_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # keep the top K items for this user (TOPK = 100, set above)\n",
+    "\n",
+    "# Place them into the comparison distribution dataframe\n",
+    "random_forest_df = item_df.loc[[int(i) for i in top_item_ids]] # look up the genres of the recommended items\n",
+    "combined_df[\"WMF Random Forest Sum\"] = random_forest_df.select_dtypes(np.number).sum() # sum each genre column over the recommended items\n",
+    "combined_df[\"WMF Random Forest %\"] = combined_df[\"WMF Random Forest Sum\"] / TOPK * 100 # express each genre sum as a percentage of TOPK\n",
+    "\n",
+    "combined_df[\"WMF Random Forest %\"] = combined_df[\"WMF Random Forest %\"].round(1) # round values for readability\n",
+    "\n",
+    "# Now let's take a look at the genre distributions\n",
+    "display(\"Combined Recommendations Distribution\", combined_df[[\"WMF %\", \"WMF Family Borda Count %\", \"WMF Linear Regression %\", \"WMF Random Forest %\"]][:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7fa25cc1",
+   "metadata": {},
+   "source": [
+    "\n",
+    "\n",
+    "We have also successfully trained a **Random Forest Regressor** model on the predictions from the same 9 WMF base models.\n",
+    "\n",
+    "The genre distributions differ between the ensembles, which indicates that each ensemble model leveraged the base model predictions in a different way to generate its final predictions.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "In the next section, we will compare the results of the various models to evaluate their performance."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6ac3d4e",
+   "metadata": {},
+   "source": [
+    "## 5. Further Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d02a252",
+   "metadata": {},
+   "source": [
+    "At the beginning of this tutorial, we split the dataset into training and test sets. Now, we will evaluate the performance of the ensemble models using the **Precision@100** and **Recall@100** metrics.\n",
+    "\n",
+    "We will use the test set to evaluate the models. \n",
+    "\n",
+    "### 5.1 Preparing the Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "65b85d45-55a5-4805-9471-4e5ba97264ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rank_df = pd.DataFrame({\n",
+    "    \"user_idx\": all_df[\"user_idx\"],\n",
+    "    \"item_idx\": all_df[\"item_idx\"],\n",
+    "})\n",
+    "\n",
+    "total_items = train_set.num_items # 1651 items\n",
+    "\n",
+    "models_to_calculate = [bpr_model, wmf_model, wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]\n",
+    "\n",
+    "# Calculate points for each model using the Borda count process.\n",
+    "# Take note that points should be calculated on a per user basis.\n",
+    "for model in tqdm(models_to_calculate):\n",
+    "    name = model.name\n",
+    "    \n",
+    "    # Group by user_idx and apply score function to each group\n",
+    "    def score_items(group):\n",
+    "        return pd.Series(model.score(int(group.name))[group['item_idx'].values], index=group.index)\n",
+    "    \n",
+    "    rank_df[name + \"_score\"] = rank_df.groupby(\"user_idx\").apply(score_items, include_groups=False).reset_index(level=0, drop=True)\n",
+    "\n",
+    "    # Calculate ranks and points for all users at once\n",
+    "    rank_df[name + \"_rank\"] = rank_df.groupby(\"user_idx\")[name + \"_score\"].rank(ascending=False, method='min').astype(int)\n",
+    "    rank_df[name + \"_points\"] = total_items - rank_df[name + \"_rank\"] + 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "04f63b15",
+   "metadata": {},
+   "source": [
+    "The code above calculates the Borda Count points that each model awards to every user-item pair.\n",
+    "\n",
+    "With the points calculated, we sum them up according to the Borda Count formula outlined in Sections 2 and 3.\n",
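+    "\n",
+    "Written out for the code above (the notation is ours): with $N$ items, a model $m$ that places item $i$ at rank $\\text{rank}_m(u, i)$ for user $u$ awards\n",
+    "\n",
+    "$$\\text{points}_m(u, i) = N - \\text{rank}_m(u, i) + 1$$\n",
+    "\n",
+    "points, and the ensemble score of an item is the sum of its points over the models in the ensemble.\n",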
+    "\n",
+    "**BPR + WMF Borda Count**:  \n",
+    "To clarify, our basic Borda Count model includes the **BPR Model** and the **WMF Model**.\n",
+    "\n",
+    "**WMF Family Borda Count**:  \n",
+    "The `WMF Family Borda Count` model, on the other hand, consists of multiple variations:\n",
+    "- Models initialized with different random seeds: **wmf_model_123**, **wmf_model_456**, **wmf_model_789**, **wmf_model_888**, and **wmf_model_999**.\n",
+    "- Models with different latent factors: **wmf_model_k20**, **wmf_model_k30**, **wmf_model_k40**, and **wmf_model_k50**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "240fec5f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "borda_count_models = [bpr_model, wmf_model]\n",
+    "rank_df[\"BPR + WMF Borda Count\"] = rank_df[[model.name + \"_points\" for model in borda_count_models]].sum(axis=1) # Sum up points of BPR and WMF\n",
+    "\n",
+    "wmf_borda_count_models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]\n",
+    "rank_df[\"WMF Family Borda Count\"] = rank_df[[model.name + \"_points\" for model in wmf_borda_count_models]].sum(axis=1) # Sum up points of all WMF models\n",
+    "\n",
+    "# Now, let's add them to the `all_df` dataframe for comparison\n",
+    "all_df.sort_values(by=[\"user_idx\", \"item_idx\"], inplace=True) # ensure that the dataframe is sorted by user index and item index\n",
+    "\n",
+    "all_df[\"BPR_score\"] = rank_df[\"BPR_score\"].values\n",
+    "all_df[\"WMF_score\"] = rank_df[\"WMF_score\"].values\n",
+    "\n",
+    "all_df[\"BPR + WMF Borda Count\"] = rank_df[\"BPR + WMF Borda Count\"].values\n",
+    "all_df[\"WMF Family Borda Count\"] = rank_df[\"WMF Family Borda Count\"].values"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b728ef78",
+   "metadata": {},
+   "source": [
+    "Now that we have all model scores in the same table, let's calculate the same **Precision@K** and **Recall@K** values as in the experiments.\n",
+    "\n",
+    "We do this by manually calculating precision and recall with their respective formulas.\n",
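+    "\n",
+    "For reference, these are the standard definitions we assume here (the notation is ours): for a user $u$ with test items $T_u$ and top-$K$ recommended items $R_u^K$,\n",
+    "\n",
+    "$$\\text{Precision@}K(u) = \\frac{|T_u \\cap R_u^K|}{|R_u^K|}, \\qquad \\text{Recall@}K(u) = \\frac{|T_u \\cap R_u^K|}{|T_u|}$$\n",
+    "\n",
+    "Both values are then averaged over all users in the test set.\n",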
+    "\n",
+    "### 5.2 Results for Borda Count of BPR and WMF\n",
+    "\n",
+    "We calculate the **Precision@100** and **Recall@100** values for the BPR + WMF Borda Count model, which combines the BPR and WMF models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "916b390b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "models = [\"BPR_score\", \"WMF_score\", \"BPR + WMF Borda Count\"]\n",
+    "\n",
+    "result_data = {\n",
+    "    \"Metrics\": [\"Precision@100\", \"Recall@100\"],\n",
+    "}\n",
+    "\n",
+    "test_users = set(test_set.uir_tuple[0])\n",
+    "for model in tqdm(models):\n",
+    "    sorted_df = all_df.sort_values(model, ascending=False) # sort by predicted ratings\n",
+    "    precisions, recalls = [], []\n",
+    "    \n",
+    "    for uidx in test_users:\n",
+    "        true_top_k = test_set.user_data[uidx][0] # ground truth: this user's items in the test set\n",
+    "        predicted_top_k = sorted_df[sorted_df['user_idx'] == uidx]['item_idx'].values[:TOPK].astype(int)\n",
+    "        # Precision@K\n",
+    "        precision = len(set(true_top_k) & set(predicted_top_k)) / len(predicted_top_k)\n",
+    "        precisions.append(precision)\n",
+    "        # Recall@K\n",
+    "        recall = len(set(true_top_k) & set(predicted_top_k)) / len(true_top_k)\n",
+    "        recalls.append(recall)\n",
+    "        \n",
+    "    result_data[model] = [np.mean(precisions), np.mean(recalls)] # average over all test users\n",
+    "\n",
+    "# Now let's take a look at the results\n",
+    "result_df = pd.DataFrame(result_data)\n",
+    "\n",
+    "display(\"Base BPR and Base WMF in comparison with BPR + WMF Borda Count\", result_df[[\"Metrics\", \"BPR_score\", \"WMF_score\", \"BPR + WMF Borda Count\"]])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "581d7584",
+   "metadata": {},
+   "source": [
+    "We observe that the BPR + WMF Borda Count ensemble achieves better recall than either individual model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0fb0e717",
+   "metadata": {},
+   "source": [
+    "### 5.3 Results for WMF Related Models\n",
+    "\n",
+    "We calculate the **Precision@100** and **Recall@100** values for the WMF related models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7188ca7d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "models = [\"WMF Family Borda Count\", \"WMF Linear Regression\", \"WMF Random Forest\"]\n",
+    "\n",
+    "result_data = {\n",
+    "    \"Metrics\": [\"Precision@100\", \"Recall@100\"],\n",
+    "}\n",
+    "\n",
+    "test_users = set(test_set.uir_tuple[0])\n",
+    "for model in tqdm(models):\n",
+    "    sorted_df = all_df.sort_values(model, ascending=False) # sort by predicted ratings\n",
+    "    precisions, recalls = [], []\n",
+    "    \n",
+    "    for uidx in test_users:\n",
+    "        true_top_k = test_set.user_data[uidx][0] # ground truth: this user's items in the test set\n",
+    "        predicted_top_k = sorted_df[sorted_df['user_idx'] == uidx]['item_idx'].values[:TOPK].astype(int)\n",
+    "        # Precision@K\n",
+    "        precision = len(set(true_top_k) & set(predicted_top_k)) / len(predicted_top_k)\n",
+    "        precisions.append(precision)\n",
+    "        # Recall@K\n",
+    "        recall = len(set(true_top_k) & set(predicted_top_k)) / len(true_top_k)\n",
+    "        recalls.append(recall)\n",
+    "        \n",
+    "    result_data[model] = [np.mean(precisions), np.mean(recalls)] # average over all test users\n",
+    "\n",
+    "# Now let's take a look at the results\n",
+    "result_df = pd.DataFrame(result_data)\n",
+    "\n",
+    "display(\"WMF Models Comparison\", result_df[[\"Metrics\", \"WMF Family Borda Count\", \"WMF Linear Regression\", \"WMF Random Forest\"]])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "42885a8b",
+   "metadata": {},
+   "source": [
+    "However, we also observe that performance varies between ensembles, and an ensemble does not always improve on the individual models.\n",
+    "\n",
+    "Another direction worth exploring is to create a new ensemble from the many different base models that Cornac supports.\n",
+    "\n",
+    "While developing these models, we found many ways to experiment with and improve them. However, there is also a risk of overfitting the models to the training data.\n",
+    "\n",
+    "It is important to evaluate the models on the test set to ensure that they generalize well to unseen data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "da02e512-2e13-43f5-8bfa-c0f14bb44fbe",
+   "metadata": {},
+   "source": [
+    "## 6. Conclusion\n",
+    "\n",
+    "Our results show that there’s no one-size-fits-all solution.\n",
+    "\n",
+    "### Which Models and Configurations Perform Best?\n",
+    "\n",
+    "Testing multiple models and ensemble techniques helps find the best approach for each dataset. While ensembling can improve accuracy, results will depend on how well models complement each other.\n",
+    "\n",
+    "- **Try Different Base Models**: Cornac offers a variety of models; experimenting with each helps reveal what works best.\n",
+    "- **Adjust Model Parameters**: Tuning settings can optimize individual models and enhance ensemble performance.\n",
+    "\n",
+    "### Is Ensembling Always Better?\n",
+    "\n",
+    "- **Performance vs. Resources**: Ensembles often require more computation, so it’s important to balance resource use with performance gains.\n",
+    "- **Know When Not to Ensemble**: In some cases, a single well-tuned model may work as well as, or even better than, an ensemble.\n",
+    "\n",
+    "These questions can guide future experiments as we continue working towards better recommender systems."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "base"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}