
Segment Expansion for New Feature Generation

Zhang Pengshan (David) edited this page Aug 9, 2019 · 4 revisions

What is Segment Expansion?

This feature was requested by our internal users. A segment is a subset of the total population of training data. For example, with a city column, each city can be defined as a separate segment.

Segment expansion addresses the fact that some features are only effective within certain segments. For example, a numerical column may not be predictive over the total population, but within one segment it can make a good new feature.

How to Use It?

Define the following configuration in ModelConfig.json:

  "dataSet" : {
    "dataPath" : "...",
    ...
    "segExpressionFile" : "columns/segments.file",
    ...
  }

'segExpressionFile' specifies a file under the local 'columns' folder. In this file, each segment is defined on its own line:

population=='1' or population=='2'
num_txns>5
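As a rough illustration of how the configuration and the segment file fit together, the sketch below reads ModelConfig.json, looks up dataSet.segExpressionFile, and returns one segment expression per non-blank line. The function name load_segment_expressions is hypothetical, and resolving the relative path against the model directory is an assumption; this is not Shifu's own loader.

```python
import json
from pathlib import Path
from typing import List


def load_segment_expressions(model_dir: str) -> List[str]:
    """Hypothetical helper: return one segment expression per non-blank line."""
    root = Path(model_dir)
    config = json.loads((root / "ModelConfig.json").read_text(encoding="utf-8"))
    seg_file = config.get("dataSet", {}).get("segExpressionFile")
    if not seg_file:
        return []  # segment expansion is not configured
    # Assumption: the relative path is resolved against the model directory.
    lines = (root / seg_file).read_text(encoding="utf-8").splitlines()
    # Each non-blank line is treated as one segment expression.
    return [line.strip() for line in lines if line.strip()]
```

With the example file above, this would return the two expressions in file order, so the first line is segment 1 and the second is segment 2.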

Each line is treated as one segment; the detailed filter syntax is described in Filter Logic. Every raw variable is kept for the total population and is also expanded once per segment, so after the stats step you will find three times as many features in ColumnConfig.json with the two segments above. For example, with 1000 raw features you end up with 1000 + 2000 = 3000 features.
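To make the feature arithmetic concrete, here is a minimal sketch of segment expansion under stated assumptions: each segment line is written by hand as a Python predicate (Shifu evaluates its own filter expressions, which this does not reproduce), an expanded feature keeps the raw value when a row falls in the segment and is treated as missing (None) otherwise, and the names f"{feature}_{i}" are a hypothetical naming scheme rather than the one Shifu writes to ColumnConfig.json.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Segment = Tuple[str, Callable[[Row], bool]]


def expand_features(
    rows: List[Row], raw_features: List[str], segments: List[Segment]
) -> Tuple[List[str], List[Row]]:
    """Return (feature names, expanded rows) for the raw features plus one copy per segment."""
    # Total-population features first, then one block of copies per segment.
    names = list(raw_features)
    for i, _ in enumerate(segments, start=1):
        names.extend(f"{f}_{i}" for f in raw_features)  # hypothetical naming

    expanded: List[Row] = []
    for row in rows:
        out: Row = {f: row.get(f) for f in raw_features}
        for i, (_, matches) in enumerate(segments, start=1):
            hit = matches(row)
            for f in raw_features:
                # Assumption: outside the segment the expanded feature is missing.
                out[f"{f}_{i}"] = row.get(f) if hit else None
        expanded.append(out)
    return names, expanded


# The two lines of the example segment file, hand-translated into predicates.
segments: List[Segment] = [
    ("population=='1' or population=='2'", lambda r: r.get("population") in ("1", "2")),
    ("num_txns>5", lambda r: (r.get("num_txns") or 0) > 5),
]

if __name__ == "__main__":
    names, _ = expand_features([], [f"f{k}" for k in range(1000)], segments)
    print(len(names))  # 3000
```

With N segment lines, the same logic yields raw_count * (N + 1) features, which matches the "three times" figure for the two-line example.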

These new features in ColumnConfig.json are then used in the norm, train, varselect, and eval steps just like any other feature.

Doing Stats on a Different Tag Column

column_a=='1'|||new_tag_column||1||0

This configuration supports segment expansion on a different tag column: 'new_tag_column' is used as the tag instead of the default tag column in ModelConfig.json, and '1' and '0' are the positive and negative tags respectively. To specify multiple positive tag values, separate them with '|'. In the example below, 1|2 means both 1 and 2 are positive tags.

column_a=='1'|||new_tag_column||1|2||0
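The line format above can be read as expression|||tag_column||positive_tags||negative_tags. Below is a hypothetical parser (SegmentSpec and parse_segment_line are illustrative names, not Shifu classes) showing how such a line might be split. The source only documents '|' for multiple positive tags; applying the same split to negative tags is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SegmentSpec:
    expression: str
    tag_column: Optional[str] = None  # None means: use the default tag column
    pos_tags: FrozenSet[str] = frozenset()
    neg_tags: FrozenSet[str] = frozenset()


def parse_segment_line(line: str) -> SegmentSpec:
    """Split 'expr|||tag_column||pos||neg' into its parts (hypothetical parser)."""
    line = line.strip()
    if "|||" not in line:
        # Plain segment line: filter expression only, default tag column.
        return SegmentSpec(expression=line)
    expression, tag_part = line.split("|||", 1)
    parts = tag_part.split("||")  # '1|2' contains a single '|', so it stays intact
    if len(parts) != 3:
        raise ValueError(f"expected tag_column||pos||neg, got {tag_part!r}")
    tag_column, pos, neg = parts
    return SegmentSpec(
        expression=expression.strip(),
        tag_column=tag_column.strip(),
        pos_tags=frozenset(t.strip() for t in pos.split("|")),
        # Assumption: negative tags accept the same '|' separator.
        neg_tags=frozenset(t.strip() for t in neg.split("|")),
    )
```

Splitting on '|||' first, then '||', and finally '|' keeps the three separator levels from colliding, which is why '1|2' survives as a single positive-tag field.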

PMML Support

Shifu neural network models trained with segment expansion can also be exported in PMML format. In the PMML, the transform logic references the original feature as its input, even for segment expansion features. Simply run 'shifu export' without changing the segment.expansion.file definition.
