
ML Models

Robert McMahan edited this page Dec 5, 2023 · 4 revisions

Introduction

A new feature has been added to the UI under ML Models. It provides a comprehensive, user-friendly way to build and deploy custom machine learning (ML) models for lead generation, with straightforward integration with GA4, Google Ads, and first party data (1PD). By predicting lead scores at the moment a customer signs up or fills out a form, it addresses the prolonged conversion lags common in lead generation. The latest version goes beyond lead score predictions and also offers profit and LTV prediction, broadening the potential applications of this solution.

Prerequisites / Required Setup

Google Cloud Project

  • See the Cost Calculator (pre-filled with CRMint cloud infrastructure) for estimates on how much this could cost.
  • Must create or have access to an existing Google Cloud project.
  • Must create or have an existing organization setup within Google Cloud.
    • To check whether an organization is set up and associated with the current project:

      • Open the resource manager: https://console.cloud.google.com/cloud-resource-manager
      • Your project should be listed under an organization on this page.
    • If you have an existing project that needs to be added to an existing or newly created organization, click here for more information on how to do this.

First Party Data (If You Plan to Use a First Party Data Source)

  • A first party table must exist. It can live in any dataset and be named anything, but it must be in the same project where this solution is deployed.
  • This first party table must have either a user_id or user_pseudo_id column (the column can be named anything, but one of the two must exist).
    • This is how the data will be joined to GA4 event data.
  • This first party table must have a trigger event date column (the date of the action, e.g. the form submission date). This field can be named anything, but it must exist and be a datetime, date, or timestamp.
  • If you plan to build a pLTV model, the first party table must also have a first value column (e.g. the customer’s first purchase). This field can be named anything, but it must exist and be a float, decimal, or integer.
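
Put together, the requirements above amount to rows shaped roughly like the following. This is a minimal sketch in Python; the column names (`user_id`, `inquiry_date`, `first_purchase`) are hypothetical examples, since the solution lets you map any column names in the UI.

```python
from datetime import datetime

# A hypothetical first party row; column names are examples only.
sample_row = {
    "user_id": "abc-123",          # or user_pseudo_id: one of the two must exist
    "inquiry_date": "2023-10-01",  # trigger event date: date/datetime/timestamp
    "first_purchase": 49.99,       # first value: only needed for a pLTV model
}

def validate_first_party_row(row):
    """Check one row against the prerequisites listed above."""
    if "user_id" not in row and "user_pseudo_id" not in row:
        return False  # a user identifier column is required
    try:
        datetime.strptime(row["inquiry_date"], "%Y-%m-%d")  # must parse as a date
    except ValueError:
        return False
    # the first value (pLTV only) must be numeric
    return isinstance(row.get("first_purchase", 0.0), (int, float))
```

In practice these checks happen implicitly in BigQuery, but a row failing any of them will break the join or the model build.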

Google Analytics (Web Data Streams Only)

  • BigQuery export must already be set up within Google Analytics UI and must have at least 2 months of data (ideally 3+ months). The BigQuery dataset where this is exported must be in the same Google Cloud Project where CRMint is installed.
    • Admin (Bottom Left Gear Icon) ▶️ Product Links ▶️ BigQuery Links
  • Must have or be able to create a Measurement Protocol API secret for the associated data stream.
    • Admin (Bottom Left Gear Icon) ▶️ Data Streams ▶️ Select your Stream ▶️ Measurement Protocol API Secrets ▶️ Create
  • Must have or be able to acquire the measurement ID for the associated data stream.
    • Admin (Bottom Left Gear Icon) ▶️ Data Streams ▶️ Select your Stream ▶️ Measurement ID
  • If planning to join first party data on user_id (selecting user_id as the unique identifier in the UI) then this same user_id needs to be present and associated with each event in your GA4 data.
  • If planning to join first party data on user_pseudo_id (selecting client_id as the unique identifier in the UI) then this same user_pseudo_id must be present in the first party table.
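
To illustrate the join requirement, here is a minimal Python sketch of lining the two sources up on `user_id` (the first party column name `customer_id` is hypothetical; in practice the join runs in BigQuery):

```python
# GA4-export-style events; user_id is only present when your site sets it.
ga4_events = [
    {"user_pseudo_id": "p-1", "user_id": "u-1", "event_name": "page_view"},
    {"user_pseudo_id": "p-2", "user_id": None,  "event_name": "page_view"},
]
first_party = [{"customer_id": "u-1", "inquiry_date": "2023-10-01"}]

def join_on_user_id(events, fp_rows, fp_key="customer_id"):
    """Keep only GA4 events whose user_id matches a first party row."""
    fp_by_id = {r[fp_key]: r for r in fp_rows}
    return [
        {**e, **fp_by_id[e["user_id"]]}
        for e in events
        if e.get("user_id") in fp_by_id
    ]

joined = join_on_user_id(ga4_events, first_party)
# Only events carrying a matching user_id survive the join.
```

The same logic applies when joining on `user_pseudo_id` (Client ID in the UI); events without the chosen identifier simply drop out of the training data.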

Google Ads

Install CRMint

Follow the installation guide for CRMint. It will step you through the entire setup process and provide a link at the end to access the application UI.

Things to note:

  • You need to be a Google Cloud Project Owner to install it successfully.
  • If you stumble upon any issues with this process, please check the issues page first.

Post-Installation (Configuration & Cleanup)

  1. If you plan to output the results of the Model to Google Analytics 4 Measurement Protocol as an Event:
    • Access the Settings tab within the UI and add the necessary Google Analytics 4 API secret and Google Analytics Measurement ID (see above for how to acquire these) then save.
  2. If you plan to use Google Analytics 4 event data as a data source for the model (as model variables - features/label):
    • Access the Settings tab within the UI and add the necessary Google Analytics 4 BigQuery Dataset then save. This BigQuery dataset must be in the same Google Cloud Project where this solution is installed (see above for how to set up the export).
  3. If you plan to output the results of the Model to Google Ads as Offline Click Conversions:
    • Access the Settings tab within the UI and add the necessary Client ID, Client Secret, Google Ads Developer Token, and Google Ads Refresh Token (see above for how to acquire these) then save.
  4. Create a dataset (in the same project where CRMint was installed) to hold the model, training dataset, output, and, if you are using one, the first party table.
  5. Make sure the CRMint Controller service account has view access (BigQuery Data Viewer) to both the GA4 events dataset and the dataset created in step 4. Also make sure the CRMint Jobs service account has edit access (BigQuery Data Editor) to the dataset created in step 4.
  6. Run `crmint cloud url` from the Cloud Shell to get the URL for accessing the UI (should you forget it at any point).
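
For step 1, the Measurement Protocol event that ultimately carries a prediction back into GA4 looks roughly like this. This is a sketch only: the endpoint and payload shape follow the public GA4 Measurement Protocol, while the event name `predicted_lead_score` and all values are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode

# Hypothetical values -- use the Measurement ID and API secret you saved
# under Settings.
MEASUREMENT_ID = "G-XXXXXXX"
API_SECRET = "your-api-secret"

def build_mp_request(client_id, score):
    """Build the URL and JSON body for one GA4 Measurement Protocol event."""
    url = "https://www.google-analytics.com/mp/collect?" + urlencode(
        {"measurement_id": MEASUREMENT_ID, "api_secret": API_SECRET}
    )
    body = {
        "client_id": client_id,
        "events": [{"name": "predicted_lead_score", "params": {"score": score}}],
    }
    return url, json.dumps(body)

url, body = build_mp_request("p-1.12345", 0.82)
```

The pipelines generated by this solution send these requests for you; the sketch is just to show what the API secret and Measurement ID from the Settings tab are used for.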

UI Walkthrough

Create a New ML Model

  1. Select ML Models tab.

  2. Select New Model.

  3. Enter the model name (used only to give meaning to this configuration).

  4. Enter the dataset name (the dataset where the first_party table mentioned in the prerequisites is created, and where the BQML model and its associated predictions and output will live).

  5. Specify your data source (GA4 data only, First Party Data only, or a combination of the two).

  6. Specify your training and predictive timespans (how much data measured in days to use for each step).

    • For example, the training timespan could be 30 days and the predictive timespan 1 day. In this case the generated model will use a window that starts 33 days ago and ends 3 days ago for training, and when predicting it will use data from 2 days ago through yesterday. This ensures that the correct amount of data is used (per your configuration) and, crucially, that the two datasets don’t overlap.
  7. Select the User Identifier (what the model should use to join the two data sources and/or what it should use to group the data).

    • If using first_party as a data source:
      • If selecting User ID both the first_party table and the GA4 events need to have user_id populated.
      • If selecting Client ID the first_party table needs to contain a user_pseudo_id.
  8. Select a Type (this is the type of model and at first only a couple of options are available).

  9. A list of available hyper-parameters will be presented. These can be removed (by unchecking them) or altered by selecting a new option in the dropdowns or adjusting the sliders.

  10. Select Fetch within the green box next to Variables.

    • This will use the BigQuery dataset and table provided under Input Configuration to locate the first party table, and the GA4 configuration under Settings to locate the GA4 event data in BigQuery. It will then show a list of variables for selection (feature/label/etc.): the GA4 events existing in BigQuery within the timespan you selected for training, plus the columns within the first party table.
  11. Select the role for each variable that matters for the model.

    • Variable Descriptions:
      • Client ID / User ID - The user identifier to use to join the GA4 and 1P data together.
      • Feature - An input the model will use that might be a determining factor for the outcome you’re trying to predict.
      • Label - The outcome you’re interested in predicting.
      • First Value (Regression Models Only) - The first purchase/subscription/etc that has a monetary value you want to exclude from the label.
      • Trigger Date (First Party Variables Only) - Only consider GA4 events that took place up to or prior to this date. (Example: inquiry date)
      • Trigger Event (Classification Models Only) - Only consider GA4 events that took place up to or prior to the date associated with the first of this event for a given user identifier. (Example: website form submission date)
  12. Specify the number of segments to use when building out predicted conversion rate ranges. Assigning a conversion rate based on where the prediction falls within a range is best practice.

  13. Adjust the class imbalance ratio (how much the data is expected to lean towards no purchase, no enroll, no subscribe, etc).

  14. Enter the fields necessary for the output configuration (where the data should be sent and how to send it to that destination).

    • In the case of a classification type model: specify the average conversion value. The model uses it to derive a conversion value from a predicted conversion rate.
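
The window arithmetic from step 6 can be sketched as follows, assuming (as in the example above) that the windows are anchored at yesterday and separated by one day so they never overlap; the function name is illustrative only.

```python
from datetime import date, timedelta

def build_timespans(training_days, predictive_days, today=None):
    """Derive non-overlapping training and predictive date windows."""
    today = today or date.today()
    predict_end = today - timedelta(days=1)                        # yesterday
    predict_start = predict_end - timedelta(days=predictive_days)  # window back
    train_end = predict_start - timedelta(days=1)                  # 1-day gap
    train_start = train_end - timedelta(days=training_days)
    return (train_start, train_end), (predict_start, predict_end)

# With training=30 and predictive=1, training runs from 33 days ago
# to 3 days ago, and prediction from 2 days ago through yesterday.
(train_start, train_end), (predict_start, predict_end) = build_timespans(30, 1)
```

This reproduces the worked example in step 6: the two windows share no days, so rows used to train the model are never the rows it is asked to score.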

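Steps 12 and 14 can be illustrated together. This sketch assumes equal-width probability segments (the real pipeline's bucketing may differ), and both function names are hypothetical.

```python
def assign_segment(probability, segments=10):
    """Map a predicted conversion probability (0-1) into one of N segments."""
    return min(int(probability * segments), segments - 1)

def conversion_value(probability, avg_conversion_value):
    """Derive a conversion value from a predicted rate (classification models)."""
    return probability * avg_conversion_value

segment = assign_segment(0.82)         # lands in the ninth of ten segments
value = conversion_value(0.82, 150.0)  # roughly 123.0
```

The segment count you choose in step 12 is the `segments` value here, and the average conversion value from step 14 plays the role of `avg_conversion_value`.
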
View/Edit an Existing Model

  1. Select the model you wish to view from the list by clicking the name.
  2. Let’s explore this page a little.
    • The name of the model is in the top left corner.
    • Notice the Edit option in the top right corner.
    • This can be used to modify the configuration of this model. Modifying any model will delete the existing pipelines and re-create them with the updated settings (including any altered SQL).
    • The tabs under this (Training and Predictive) denote the different pipelines that were generated.
    • Under the Training tab there’s a training pipeline listed which can be selected to view, edit, run, or schedule. More on that later.
    • Under the Predictive tab there’s a predictive pipeline listed, but this tab has more components since the predictive pipeline has more steps. At the bottom are additional SQL scripts which represent different steps in the pipeline.
  3. Once a pipeline is selected by clicking its name, its detail page is presented. Let’s explore this.
    • In the top left is the generated name of the pipeline.
    • In the top right there are options to Run the pipeline, Edit it, or Add a Job to it.
      • Run manually triggers the pipeline to run once.
      • Edit allows for updating the pipeline, e.g. setting it to run on a schedule.
      • Add Job allows for adding additional steps to the pipeline if necessary.
    • The tabs allow for viewing different aspects of the pipeline.
      • Pipeline tab gives a view of the jobs/steps and the order they execute.
      • Jobs tab gives a more detailed view of the jobs, including things like status, dependencies, and last activity.
      • Logs tab gives a view of any log messages generated when running the jobs in the pipeline. This is where errors and more detailed status messages will be displayed.
    • Selecting a step under the Pipeline tab will bring you to an edit page allowing you to modify the step/job and associated SQL script.