RFMizer

Russian version: README_RU.md (might be stale).

About RFMizer

RFMizer is a Python script that takes a complete log of users' orders exported from a CRM system and produces two mappings: user IDs to RFMxyz segments, and RFMxyz segments to bid multipliers.

Orders log

The users' orders log is a plain text CSV file exported from the advertiser's CRM system on a daily basis. Each line of this file consists of three mandatory fields and an arbitrary number of optional fields. The mandatory fields are:

  • order_date — date of an order made by the user
  • user_id — an internal ID of a user in advertiser's CRM system
  • order_value — monetary value of the order

Optional fields can be used to expose the user's segments in additional dimensions. That is where the "xyz" part of "RFMxyz" comes from.

Example:

order_date,user_id,order_value,geo
2016-01-27,274223,389.34,1
2016-01-27,826746,1743.00,5
2016-01-29,734242,87.05,7
2016-02-05,274223,52.83,1
...
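To make the three RFM dimensions concrete, here is a minimal Python 3 sketch that derives raw per-user metrics from the example rows above. It is not RFMizer's actual code. The function name `raw_rfm`, the `today` reference date, and the choice to express recency as a negative number of days since the last order are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Rows from the example above: (order_date, user_id, order_value).
orders = [
    (date(2016, 1, 27), "274223", 389.34),
    (date(2016, 1, 27), "826746", 1743.00),
    (date(2016, 1, 29), "734242", 87.05),
    (date(2016, 2, 5), "274223", 52.83),
]


def raw_rfm(orders, today):
    """Return {user_id: (recency_days, frequency, monetary)}.

    Hypothetical helper: recency is the (negative) number of days between
    the reference date and the user's last order, frequency is the number
    of orders, monetary is the sum of order values.
    """
    last_order = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for order_date, user_id, value in orders:
        last_order[user_id] = max(last_order.get(user_id, order_date), order_date)
        count[user_id] += 1
        total[user_id] += value
    return {
        user: ((last_order[user] - today).days, count[user], round(total[user], 2))
        for user in last_order
    }


metrics = raw_rfm(orders, today=date(2016, 2, 10))
```

With these inputs, user 274223 has two orders, the later of which is five days before the assumed reference date.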

Orders log file format

  • The file should be encoded in UTF-8.
  • There should be NO byte order mark (BOM) at the beginning of the file.
  • Field values should be separated with commas ,.
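  • Either a dot . or a comma , may be used as the decimal point in numbers. A value written with a decimal comma contains a comma and must therefore be enclosed in double quotes.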
  • Double quotes " in field values should be doubled, i.e. " replaced with "".
  • Field values with commas ,, double quotes " or newlines should be enclosed in double quotes ".
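The rules above match what Python's standard `csv` module expects by default, so a log in this format can be read as in the sketch below. It assumes Python 3, a header row, and `YYYY-MM-DD` dates as in the earlier example. `parse_orders` is a hypothetical helper, not part of RFMizer.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def parse_orders(stream):
    """Yield one dict per order from a CSV stream that starts with a header row."""
    # csv.DictReader handles quoted fields and doubled double quotes.
    for row in csv.DictReader(stream):
        # Accept either "." or "," as the decimal point.
        row["order_value"] = Decimal(row["order_value"].replace(",", "."))
        # Assumed date format, taken from the example log.
        row["order_date"] = datetime.strptime(row["order_date"], "%Y-%m-%d").date()
        yield row


sample = io.StringIO(
    'order_date,user_id,order_value,geo\n'
    '2016-01-27,274223,389.34,1\n'
    '2016-01-29,734242,"87,05",7\n'
)
orders = list(parse_orders(sample))
```

For a real file, the stream could be opened with `open(path, encoding="utf-8", newline="")`.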

Orders log data fields

Required data fields:

  1. order_date — order date;
  2. user_id — impersonal unique customer ID;
  3. order_value — order price.

Optional data fields can be used to specify segments of additional dimensions, e.g. the customer's geo zone, discount program, and so on. To be treated as additional dimensions, these optional data fields must be listed in the configuration file under the input_columns section.

RFMizer output files

User ID to RFMxyz mapping

The user ID to RFMxyz segments mapping generated by the RFMizer script is a plain text CSV file. Each line of this file consists of four mandatory fields and an arbitrary number of optional fields.

Mandatory fields are:

  • user_id — an internal ID of a user in advertiser's CRM system
  • recency — segment of the last purchase recency dimension corresponding to the user
  • frequency — segment of the orders frequency dimension corresponding to the user
  • monetary — segment of the monetary lifetime value dimension corresponding to the user

Optional fields are used for additional dimensions' segments corresponding to the user.

Example:

user_id,recency,frequency,monetary,geo
274223,1,3,4,1
826746,2,2,1,5
734242,4,1,2,7
...

RFMxyz to bid multipliers mapping

The RFMxyz segments to bid multipliers mapping generated by the RFMizer script is a plain text CSV file. Each line of this file consists of four mandatory fields and an arbitrary number of optional fields.

Mandatory fields are:

  • recency — segment of the last purchase recency dimension
  • frequency — segment of the orders frequency dimension
  • monetary — segment of the monetary lifetime value dimension
  • bid_ratio — bid multiplier corresponding to the users with recency, frequency and monetary segments specified

Optional fields are used for additional dimensions' segments.

Example:

recency,frequency,monetary,geo,bid_ratio
1,1,1,1,0.89
2,3,4,5,1.15
5,2,5,3,5.76
...
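The two output files can be joined on their shared segment columns to obtain a bid multiplier per user. The sketch below is one way to do that in Python 3. It assumes both files use identical column names for the segments. The sample rows are made up so that one user matches and one does not, and the neutral fallback of `1.0` for unmatched users is an assumption, not documented RFMizer behavior.

```python
import csv
import io

# Illustrative data only (not real RFMizer output).
users_csv = io.StringIO(
    "user_id,recency,frequency,monetary,geo\n"
    "274223,1,3,4,1\n"
    "826746,2,3,4,5\n"
)
bids_csv = io.StringIO(
    "recency,frequency,monetary,geo,bid_ratio\n"
    "1,1,1,1,0.89\n"
    "2,3,4,5,1.15\n"
)


def load_bid_ratios(stream):
    """Map each combination of segment values to its bid multiplier."""
    ratios = {}
    for row in csv.DictReader(stream):
        ratio = float(row.pop("bid_ratio"))
        # The remaining columns (RFM plus any extra dimensions) form the key.
        ratios[frozenset(row.items())] = ratio
    return ratios


def bid_ratio_for_users(users_stream, ratios, default=1.0):
    """Return {user_id: bid_ratio}; unmatched users get the assumed default."""
    result = {}
    for row in csv.DictReader(users_stream):
        user_id = row.pop("user_id")
        result[user_id] = ratios.get(frozenset(row.items()), default)
    return result


user_bids = bid_ratio_for_users(users_csv, load_bid_ratios(bids_csv))
```

Keying on the full set of column/value pairs keeps the lookup independent of how many additional dimensions the files carry.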

RFM segments' borders

This is a plain text CSV file that lists the metric values at the borders between segments of the frequency, monetary, and recency dimensions.

Example:

dimension,segment,border
frequency,1,2
frequency,2,3
frequency,3,4
frequency,4,6
monetary,1,23.7
monetary,2,35.95
monetary,3,51.0
monetary,4,82.0
recency,1,-252
recency,2,-192
recency,3,-137
recency,4,-80
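A borders file like this can be used to place a raw metric value into a segment with a binary search. The README does not say which side of a border a value equal to it belongs to, so the sketch below makes two assumptions: the border listed for segment N separates segment N from segment N+1, and a value equal to a border falls into the higher segment. `load_borders` and `segment_of` are hypothetical helpers.

```python
import bisect
import csv
import io
from collections import defaultdict

borders_csv = io.StringIO(
    "dimension,segment,border\n"
    "frequency,1,2\n"
    "frequency,2,3\n"
    "frequency,3,4\n"
    "frequency,4,6\n"
    "recency,1,-252\n"
    "recency,2,-192\n"
    "recency,3,-137\n"
    "recency,4,-80\n"
)


def load_borders(stream):
    """Return {dimension: sorted list of border values}."""
    borders = defaultdict(list)
    for row in csv.DictReader(stream):
        borders[row["dimension"]].append(float(row["border"]))
    return {dim: sorted(values) for dim, values in borders.items()}


def segment_of(value, dimension_borders):
    """Return the 1-based segment number for a raw metric value.

    Assumption: a value equal to a border belongs to the higher segment.
    """
    return bisect.bisect_right(dimension_borders, value) + 1


borders = load_borders(borders_csv)
frequency_segment = segment_of(5, borders["frequency"])
recency_segment = segment_of(-100, borders["recency"])
```

Four borders yield five segments, which matches a segments_count of 5 per dimension.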

System requirements

  • Python 2.7 or Python 3.5
  • pyyaml installed
  • future installed

You can use pip to install pyyaml and future:

sudo pip install pyyaml future

or

pip install --user pyyaml future

Using RFMizer

python rfmizer.py [--log-level LOG_LEVEL] config-file input-file

Required arguments:

  • config-file — configuration file in Yaml format
  • input-file — orders log file in CSV format

Optional arguments:

  • --log-level LOG_LEVEL — RFMizer log verbosity level; possible values:
    • CRITICAL;
    • ERROR;
    • WARNING (default verbosity level);
    • INFO;
    • DEBUG.

python rfmizer.py -h shows the help message:

usage: rfmizer.py [-h] [--log-level LOG_LEVEL] config-file input-file

positional arguments:
  config-file           configuration file
  input-file            input data file

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        logging level, defaults to WARNING

Examples:

python rfmizer.py config.yaml orders.csv
python rfmizer.py --log=INFO config.yaml orders.csv
python rfmizer.py -h
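The help output above has the layout produced by Python's `argparse` module. Assuming the script uses it, the sketch below shows a parser with the same interface. It is an illustration, not RFMizer's source. It also shows why a shortened option such as `--log=INFO` is accepted: `argparse` resolves unambiguous prefixes of long options by default.

```python
import argparse
import logging


def build_parser():
    """Build a parser that mirrors the documented command line (a sketch)."""
    parser = argparse.ArgumentParser(prog="rfmizer.py")
    parser.add_argument("--log-level", default="WARNING",
                        help="logging level, defaults to WARNING")
    parser.add_argument("config_file", metavar="config-file",
                        help="configuration file")
    parser.add_argument("input_file", metavar="input-file",
                        help="input data file")
    return parser


# "--log" is an unambiguous prefix of "--log-level", so argparse accepts it.
args = build_parser().parse_args(["--log=INFO", "config.yaml", "orders.csv"])

# Translate the level name into the numeric constant used by the logging module.
level = getattr(logging, args.log_level)
```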

Configuration file

RFMizer configuration file example:

input_columns:
  - order_date
  - user_id
  - order_value
  - geo_code
segments_count:
  recency: 5
  frequency: 5
  monetary: 5
rfmizer:
  look_back_period: 365
  output_columns:
    user_id: user_id
    recency: recency
    frequency: frequency
    monetary: monetary
    geo_code: geo_code
predictor:
  prediction_period: 182
output_path: .
output_file_prefix: some_unique_name

RFMizer configuration options description:

| Section | Option | Value |
|---|---|---|
| input_columns | | List of order log columns that have to be taken into account. Columns order_date, user_id and order_value are mandatory and must be listed here. All other columns specified are treated as additional dimensions. |
| segments_count | recency | Number of segments to use for the recency dimension (recency of the last purchase). |
| segments_count | frequency | Number of segments to use for the frequency dimension (frequency of purchases). |
| segments_count | monetary | Number of segments to use for the monetary dimension (monetary lifetime value). |
| rfmizer | look_back_period | Duration of the time span used to segment the users, in days. |
| rfmizer | output_columns | Dictionary that maps dimension names to column names in the output file with the user ID to segments mapping. A mapping for user_id, recency, frequency, monetary, and all additional dimensions must be specified. |
| predictor | prediction_period | Duration of the time span used to retrospectively predict the expected value of each segment of users, in days. |
| output_path | | Directory path to save output files to. |
| output_file_prefix | | Unique string that output filenames start with. Can be used to distinguish output files generated with different configuration files. |
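The constraints described above can be checked before a run. The following Python 3 sketch validates a configuration that has already been parsed into a dictionary, for example with `yaml.safe_load` from pyyaml. `validate_config` and its messages are hypothetical and only cover the rules stated in this README. RFMizer itself may check more, or check differently.

```python
MANDATORY_COLUMNS = ("order_date", "user_id", "order_value")
RFM_DIMENSIONS = ("recency", "frequency", "monetary")


def validate_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    columns = config.get("input_columns", [])
    for name in MANDATORY_COLUMNS:
        if name not in columns:
            problems.append("input_columns is missing %s" % name)

    # Every non-mandatory input column counts as an additional dimension.
    extra_dimensions = [c for c in columns if c not in MANDATORY_COLUMNS]
    output_columns = config.get("rfmizer", {}).get("output_columns", {})
    for name in ("user_id",) + RFM_DIMENSIONS + tuple(extra_dimensions):
        if name not in output_columns:
            problems.append("rfmizer.output_columns is missing %s" % name)

    for name in RFM_DIMENSIONS:
        count = config.get("segments_count", {}).get(name)
        if not isinstance(count, int) or count < 1:
            problems.append("segments_count.%s must be a positive integer" % name)
    return problems


# Same content as the configuration file example, written as a dict.
config = {
    "input_columns": ["order_date", "user_id", "order_value", "geo_code"],
    "segments_count": {"recency": 5, "frequency": 5, "monetary": 5},
    "rfmizer": {
        "look_back_period": 365,
        "output_columns": {
            "user_id": "user_id", "recency": "recency",
            "frequency": "frequency", "monetary": "monetary",
            "geo_code": "geo_code",
        },
    },
    "predictor": {"prediction_period": 182},
    "output_path": ".",
    "output_file_prefix": "some_unique_name",
}
problems = validate_config(config)
```

An empty list means the configuration satisfies the rules checked here.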
