insuranceSalesPredictor

The project predicts whether or not a customer buys insurance, based on socio-demographic factors and ownership of other insurance policies, and builds a profile of a typical customer.

The scenario: as a strategic consultant to a well-known company with a diversified business and a presence in different industries and geographical markets, you are asked to analyse the customer dataset of an insurance company. The dataset was supplied by a European data mining company and describes a real-world business problem. The feature of interest is whether or not a customer buys a caravan insurance policy. Information about customers consists of 86 variables:

• 43 socio-demographic variables, derived from the customer's ZIP code area;
• 43 variables about ownership of other insurance policies.

Size:
• 9822 records: 5822 training records and 4000 test records
• 86 attributes
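A minimal Python sketch of the record layout described above. It assumes (hypothetically) that each record is stored as one whitespace-separated line of 86 integer codes, with the 43 socio-demographic attributes first and the 43 policy-ownership attributes after them. The names `parse_record` and `split_groups` are illustrative only; the project itself worked in SPSS Modeler and R, and the actual file layout may differ.

```python
from typing import List, Tuple

# Grouping as described in this README; the on-disk layout
# (one whitespace-separated line per record, socio-demographic first)
# is an assumption made for illustration.
N_SOCIO = 43
N_OWNERSHIP = 43
N_ATTRIBUTES = N_SOCIO + N_OWNERSHIP  # 86

# Record counts stated above.
N_TRAIN, N_TEST = 5822, 4000


def parse_record(line: str) -> List[int]:
    """Convert one whitespace-separated line into a list of integer attribute codes."""
    values = [int(token) for token in line.split()]
    if len(values) != N_ATTRIBUTES:
        raise ValueError(f"expected {N_ATTRIBUTES} attributes, got {len(values)}")
    return values


def split_groups(record: List[int]) -> Tuple[List[int], List[int]]:
    """Return (socio_demographic, ownership) attribute groups for one record."""
    return record[:N_SOCIO], record[N_SOCIO:]


if __name__ == "__main__":
    demo_line = " ".join(str(i) for i in range(N_ATTRIBUTES))  # dummy record, not real data
    socio, owned = split_groups(parse_record(demo_line))
    print(len(socio), len(owned), N_TRAIN + N_TEST)  # 43 43 9822
```

Validating the attribute count on every line catches truncated or mis-delimited records early, before they silently shift every later column.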

Task: The analytical task was to predict, from the data, whether a customer is interested in the insurance policy. First, exploratory data analysis was carried out; visualising the data revealed several particularities of this dataset. The data was then prepared for data mining, where selecting the right features and constructing new features from the existing ones was essential. Different data mining algorithms were applied in IBM SPSS Modeler and R. The second task was to derive a profile of a typical insurance buyer, giving clear insight into why customers hold the company's policy and how these customers differ from other customers.
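Feature selection in the project was done in SPSS Modeler and R. As a tool-independent illustration, here is a minimal pure-Python sketch of one common filter method: ranking discrete attributes by their mutual information with the purchase target. This is a swapped-in technique, not necessarily the one the project used, and `policy_count` is a hypothetical example of a feature constructed from existing ones.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(feature: Sequence[int], target: Sequence[int]) -> float:
    """Mutual information, in bits, between a discrete feature and a discrete target."""
    n = len(feature)
    if n == 0 or n != len(target):
        raise ValueError("feature and target must be non-empty and of equal length")
    joint = Counter(zip(feature, target))
    x_counts = Counter(feature)
    y_counts = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (x_counts[x] * y_counts[y]))
    return mi


def rank_features(
    columns: Dict[str, Sequence[int]], target: Sequence[int]
) -> List[Tuple[str, float]]:
    """Return (attribute name, MI score) pairs, most informative first."""
    scores = [(name, mutual_information(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def policy_count(ownership: Sequence[int]) -> int:
    """Hypothetical constructed feature: number of non-zero ownership attributes."""
    return sum(1 for value in ownership if value != 0)


if __name__ == "__main__":
    bought = [0, 0, 1, 1]  # toy target, not project data
    toy_columns = {"noise": [0, 1, 0, 1], "signal": [0, 0, 1, 1]}
    print(rank_features(toy_columns, bought))  # [('signal', 1.0), ('noise', 0.0)]
    print(policy_count([0, 2, 0, 1]))  # 2
```

On real data the scores would be computed on the training records only. Because mutual information tends to favour attributes with many distinct values, the ranking is best treated as a shortlist to review rather than a final selection.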

Challenges: Feature selection and extraction were critical. The data was noisy and highly imbalanced, with buyers forming only a small minority of customers.
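With so few buyers, a classifier can reach high accuracy simply by predicting "no purchase" for everyone, so the imbalance usually has to be handled explicitly. One simple, commonly used option is random oversampling of the minority class in the training data. The sketch below is a generic pure-Python illustration with hypothetical names; it is not a record of what the project did in SPSS Modeler or R.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def random_oversample(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest one.

    Apply to training data only; the test set must keep its natural class balance.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(labels)
    target_size = max(counts.values(), default=0)
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target_size - count):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


if __name__ == "__main__":
    toy_rows = [f"customer_{i}" for i in range(10)]  # toy data, not project records
    toy_labels = [0] * 9 + [1]
    balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
    print(Counter(balanced_labels))  # Counter({0: 9, 1: 9})
```

Oversampling only duplicates existing buyers, so it can encourage overfitting on noisy data; evaluating on the untouched test set remains the real check.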

The analysis, critical arguments and conclusions in the deliverables are supported by evidence, and the descriptions and interpretations are written to be comprehensible, useful and actionable for a marketing professional unfamiliar with technical jargon.
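Lift is one widely used way to present a buyer profile in terms a marketing audience can act on: it compares the purchase rate inside a customer segment with the overall purchase rate, so a lift of 2.0 reads as "customers in this segment are twice as likely to buy as the average customer". The sketch below is a minimal pure-Python illustration; the function name and the toy segment are hypothetical, and the README does not state that lift was the measure used in the deliverables.

```python
from typing import Sequence


def segment_lift(in_segment: Sequence[bool], bought: Sequence[bool]) -> float:
    """Purchase rate inside a segment divided by the overall purchase rate."""
    if not bought or len(in_segment) != len(bought):
        raise ValueError("inputs must be non-empty and of equal length")
    overall_rate = sum(bought) / len(bought)
    segment_outcomes = [b for s, b in zip(in_segment, bought) if s]
    if overall_rate == 0 or not segment_outcomes:
        raise ValueError("need at least one buyer overall and one customer in the segment")
    segment_rate = sum(segment_outcomes) / len(segment_outcomes)
    return segment_rate / overall_rate


if __name__ == "__main__":
    # Toy example (not project data): 2 buyers among 8 customers, both in the segment.
    bought = [True, True, False, False, False, False, False, False]
    in_segment = [True, True, True, True, False, False, False, False]
    print(segment_lift(in_segment, bought))  # 2.0
```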

++ (to do: add a discussion and calculation of customer lifetime value (CLV))