Russian version: README_RU.md (might be stale).
RFMizer is a Python script that takes a complete log of users' orders exported from a CRM system and outputs two mappings: user ID to RFMxyz segments, and RFMxyz segments to bid multipliers.
The users' orders log is a plain text CSV file exported from the advertiser's CRM system on a daily basis. Each line of this file consists of three mandatory fields and an arbitrary number of optional fields. Mandatory fields are:
- order_date — date of an order made by the user
- user_id — an internal ID of a user in advertiser's CRM system
- order_value — monetary value of the order
Optional fields can be used to expose the user's segments in additional dimensions. That is where the "xyz" part of "RFMxyz" comes from.
Example:
order_date,user_id,order_value,geo
2016-01-27,274223,389.34,1
2016-01-27,826746,1743.00,5
2016-01-29,734242,87.05,7
2016-02-05,274223,52.83,1
...
- The file should be encoded in UTF-8.
- There should be NO byte order mark (BOM) at the beginning of the file.
- Field values should be separated with commas `,`.
- Dot `.` or comma `,` should be used as a decimal point in numbers.
- Double quotes `"` in field values should be doubled, i.e. `"` replaced with `""`.
- Field values with commas `,`, double quotes `"` or newlines should be enclosed in double quotes `"`.
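The format rules above can be checked with Python's standard `csv` module. The following is only a minimal sketch for Python 3, not RFMizer's own parser: the helper name `read_orders`, the sample rows and the `%Y-%m-%d` date format (inferred from the example) are assumptions.

```python
import csv
import io
from datetime import datetime


def read_orders(text):
    """Parse order-log CSV text into a list of dicts (hypothetical helper)."""
    if text.startswith("\ufeff"):
        raise ValueError("input must not start with a byte order mark (BOM)")
    orders = []
    # csv.DictReader handles quoted fields, doubled quotes and embedded newlines.
    for row in csv.DictReader(io.StringIO(text, newline="")):
        # Accept either a dot or a comma as the decimal point.
        row["order_value"] = float(row["order_value"].replace(",", "."))
        # Assumed date format, taken from the example rows.
        row["order_date"] = datetime.strptime(row["order_date"], "%Y-%m-%d").date()
        orders.append(row)
    return orders


sample = (
    "order_date,user_id,order_value,geo\n"
    "2016-01-27,274223,389.34,1\n"
    '2016-01-29,734242,"87,05",7\n'
)
orders = read_orders(sample)
```

Note that a value written with a decimal comma has to be quoted, because the comma is also the field separator.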
Required data fields:
- order_date: order date;
- user_id: impersonal unique customer ID;
- order_value: order price.
Optional data fields can be used to specify segments of additional dimensions, e.g. customer's geo zone, discount program, and so on. To be considered as additional dimensions, these optional data fields should be listed in the configuration file under the input_columns section.
The user ID to RFMxyz segments mapping generated by the RFMizer script is a plain text CSV file. Each line of this file consists of four mandatory fields and an arbitrary number of optional fields.
Mandatory fields are:
- user_id: an internal ID of a user in advertiser's CRM system
- recency: segment of the last purchase recency dimension corresponding to the user
- frequency: segment of the orders frequency dimension corresponding to the user
- monetary: segment of the monetary lifetime value dimension corresponding to the user
Optional fields are used for additional dimensions' segments corresponding to the user.
Example:
user_id,recency,frequency,monetary,geo
274223,1,3,4,1
826746,2,2,1,5
734242,4,1,2,7
...
The RFMxyz segments to bid multipliers mapping generated by the RFMizer script is a plain text CSV file. Each line of this file consists of four mandatory fields and an arbitrary number of optional fields.
Mandatory fields are:
- recency: segment of the last purchase recency dimension
- frequency: segment of the orders frequency dimension
- monetary: segment of the monetary lifetime value dimension
- bid_ratio: bid multiplier corresponding to the users with recency, frequency and monetary segments specified
Optional fields are used for additional dimensions' segments.
Example:
recency,frequency,monetary,geo,bid_ratio
1,1,1,1,0.89
2,3,4,5,1.15
5,2,5,3,5.76
...
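The two output files can be combined to find the bid multiplier for each user. The sketch below is one possible way to do it in Python 3 and is not part of RFMizer. The helper names, the sample rows and the neutral default multiplier of 1.0 for segment combinations missing from the bid file are all assumptions.

```python
import csv
import io

# Assumed dimension columns: the three RFM dimensions plus one extra ("geo").
SEGMENT_COLUMNS = ["recency", "frequency", "monetary", "geo"]


def load_bid_ratios(text):
    """Map a tuple of segment values to its bid multiplier."""
    return {
        tuple(row[c] for c in SEGMENT_COLUMNS): float(row["bid_ratio"])
        for row in csv.DictReader(io.StringIO(text))
    }


def bid_ratio_for_users(users_text, ratios, default=1.0):
    """Map each user_id to the bid multiplier of its segment combination."""
    return {
        row["user_id"]: ratios.get(tuple(row[c] for c in SEGMENT_COLUMNS), default)
        for row in csv.DictReader(io.StringIO(users_text))
    }


# Hypothetical sample data in the two output formats.
users_csv = "user_id,recency,frequency,monetary,geo\n274223,1,3,4,1\n826746,2,3,4,5\n"
ratios_csv = "recency,frequency,monetary,geo,bid_ratio\n1,1,1,1,0.89\n2,3,4,5,1.15\n"
result = bid_ratio_for_users(users_csv, load_bid_ratios(ratios_csv))
```

Here user 826746 matches the segment combination 2,3,4,5 and gets 1.15, while user 274223 has no matching row and falls back to the assumed default.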
The segment borders file is a plain text CSV file containing the metric values at the borders between segments of the frequency, monetary, and recency dimensions.
Example:
dimension,segment,border
frequency,1,2
frequency,2,3
frequency,3,4
frequency,4,6
monetary,1,23.7
monetary,2,35.95
monetary,3,51.0
monetary,4,82.0
recency,1,-252
recency,2,-192
recency,3,-137
recency,4,-80
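A borders file like this can be used to place a new metric value into a segment with a binary search. The sketch below (Python 3) relies on an assumption the source does not state: that the n-th border is the lower bound of segment n+1, so a value equal to a border falls into the higher segment. The helper names are hypothetical, and RFMizer itself may treat the borders differently.

```python
import bisect
import csv
import io
from collections import defaultdict


def load_borders(text):
    """Return {dimension: sorted list of border values}."""
    borders = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        borders[row["dimension"]].append(float(row["border"]))
    return {dim: sorted(values) for dim, values in borders.items()}


def segment_of(value, dimension_borders):
    """Assumed rule: count the borders that are <= value, then add 1."""
    return bisect.bisect_right(dimension_borders, value) + 1


borders_csv = (
    "dimension,segment,border\n"
    "frequency,1,2\nfrequency,2,3\nfrequency,3,4\nfrequency,4,6\n"
    "monetary,1,23.7\nmonetary,2,35.95\nmonetary,3,51.0\nmonetary,4,82.0\n"
)
borders = load_borders(borders_csv)
frequency_segment = segment_of(5, borders["frequency"])
monetary_segment = segment_of(40.0, borders["monetary"])
```

With four borders per dimension this yields five segments, which matches a segments_count of 5.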
- Python 2.7 or Python 3.5
- pyyaml installed
- future installed
You can use pip to install pyyaml and future:
sudo pip install pyyaml future
or
pip install --user pyyaml future
python rfmizer.py [--log-level LOG_LEVEL] config-file input-file
Required arguments:
- config-file: configuration file;
- input-file: input data file.
Optional arguments:
- --log-level LOG_LEVEL: RFMizer log verbosity level; possible values: CRITICAL; ERROR; WARNING (default verbosity level); INFO; DEBUG.
Running python rfmizer.py -h shows the help message:
usage: rfmizer.py [-h] [--log-level LOG_LEVEL] config-file input-file
positional arguments:
config-file configuration file
input-file input data file
optional arguments:
-h, --help show this help message and exit
--log-level LOG_LEVEL
logging level, defaults to WARNING
Examples:
python rfmizer.py config.yaml orders.csv
python rfmizer.py --log-level INFO config.yaml orders.csv
python rfmizer.py -h
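For readers who want to see how such a command line might be declared, here is a minimal `argparse` sketch that reproduces the usage line and help texts shown above. It is an illustration, not RFMizer's actual source. The function name `build_parser` and the destination names `config_file` and `input_file` are assumptions.

```python
import argparse
import logging


def build_parser():
    """Build a parser matching the documented usage (illustrative only)."""
    parser = argparse.ArgumentParser(prog="rfmizer.py")
    parser.add_argument("--log-level", default="WARNING",
                        help="logging level, defaults to WARNING")
    # metavar keeps the hyphenated names in the help output, while the
    # parsed values are available as args.config_file and args.input_file.
    parser.add_argument("config_file", metavar="config-file",
                        help="configuration file")
    parser.add_argument("input_file", metavar="input-file",
                        help="input data file")
    return parser


args = build_parser().parse_args(["--log-level", "INFO", "config.yaml", "orders.csv"])
# Translate the level name into the numeric constant from the logging module.
level = getattr(logging, args.log_level.upper())
```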
RFMizer configuration file example:
input_columns:
  - order_date
  - user_id
  - order_value
  - geo_code
segments_count:
  recency: 5
  frequency: 5
  monetary: 5
rfmizer:
  look_back_period: 365
  output_columns:
    user_id: user_id
    recency: recency
    frequency: frequency
    monetary: monetary
    geo_code: geo_code
predictor:
  prediction_period: 182
output_path: .
output_file_prefix: some_unique_name
RFMizer configuration options description:
Section | Option | Value |
---|---|---|
input_columns | | List of order log columns that have to be taken into account. Columns order_date, user_id and order_value are mandatory and must be listed here. All other columns specified are considered additional dimensions. |
segments_count | recency | Number of segments to use for the recency dimension (recency of last purchase). |
segments_count | frequency | Number of segments to use for the frequency dimension (frequency of purchases). |
segments_count | monetary | Number of segments to use for the monetary dimension (monetary lifetime value). |
rfmizer | look_back_period | Duration of the time span used to segment the users. Specified as a number of days. |
rfmizer | output_columns | Dictionary that maps dimension names to column names in the output file with the user ID to segments mapping. Mappings for user_id, recency, frequency, monetary, and all additional dimensions must be specified. |
predictor | prediction_period | Duration of the time span used to retrospectively predict the expected value of each segment of users. Specified as a number of days. |
output_path | | Directory path to save output files to. |
output_file_prefix | | Unique string for output filenames to start with. Can be used to distinguish output files generated with different configuration files. |
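The constraints described in the table can be checked before running the script. The following is a rough sketch of such a check, not something RFMizer is known to do itself. It works on a plain dictionary of the shape that `yaml.safe_load` would return for the example configuration. The function name `validate_config` and the rule that segment counts must be positive integers are assumptions.

```python
MANDATORY_COLUMNS = ("order_date", "user_id", "order_value")
RFM_DIMENSIONS = ("recency", "frequency", "monetary")


def validate_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    input_columns = config.get("input_columns", [])
    for column in MANDATORY_COLUMNS:
        if column not in input_columns:
            problems.append("input_columns is missing mandatory column %r" % column)
    # Every non-mandatory input column is treated as an additional dimension.
    extra_dimensions = [c for c in input_columns if c not in MANDATORY_COLUMNS]
    for dimension in RFM_DIMENSIONS:
        count = config.get("segments_count", {}).get(dimension)
        # Assumed rule: a segment count must be a positive integer.
        if not isinstance(count, int) or count < 1:
            problems.append("segments_count.%s must be a positive integer" % dimension)
    output_columns = config.get("rfmizer", {}).get("output_columns", {})
    for name in ("user_id",) + RFM_DIMENSIONS + tuple(extra_dimensions):
        if name not in output_columns:
            problems.append("rfmizer.output_columns has no mapping for %r" % name)
    return problems


example_config = {
    "input_columns": ["order_date", "user_id", "order_value", "geo_code"],
    "segments_count": {"recency": 5, "frequency": 5, "monetary": 5},
    "rfmizer": {
        "look_back_period": 365,
        "output_columns": {
            "user_id": "user_id", "recency": "recency",
            "frequency": "frequency", "monetary": "monetary",
        },
    },
}
problems = validate_config(example_config)
```

In this sample the geo_code mapping is deliberately left out of output_columns, so the check reports exactly one problem.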