A 10-hour live hackathon held at the ZS office among the top 40 emerging data scientists of the country.
A missing-value imputation problem in which every row has at least one missing value and every column has many missing values.
Problem Description : (Customer targeting problem) A pharma company XYZ, based in the United States, has several products in various therapy areas. It markets these products to physicians all over the country through the following channels:
- Sales representative (Medical rep – Rep_Live)
- Remote representative (Medical rep via video conference – Rep_Remote)
- Peer-to-peer marketing programs (such as conferences – P2P)
- Digital message (such as mobile messages – Digital_Push)
- Online video (Digital_Pull)
- Direct mail (snail mail – Direct_Mail)
- Online advertisement (such as website banners – Digital_Pull)
- Digital email (Digital_Push)
However, its current marketing efforts are not proving very successful: physicians are not engaging with the content delivered to them through these channels. XYZ now wants to improve its targeting strategy by analyzing the “affinity” of each physician to the above channels. Affinity data has been collected through historical channel interactions with doctors, and every doctor has been given an affinity rating on a scale of 0 to 1 for each of the above 8 channels. However, many HCPs (healthcare professionals) do not have ratings for all channels.
Develop an approach to assign an affinity for every tactic of each HCP in the “SUBMISSION” file.
Data : all data (id, affinity for the different channels, region (urban or rural)), demographic data (physician information such as age and gender)
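As a point of reference for the task above, here is a minimal sketch of a column-mean imputation baseline. It is not the approach used in the hackathon. It only illustrates the shape of the problem. The function name `impute_column_means` and the toy matrix (three physicians, three unnamed channels) are assumptions made for illustration, and missing ratings are assumed to be encoded as NaN.

```python
import numpy as np

def impute_column_means(affinity):
    """Fill each NaN with the mean of the observed values in its column."""
    affinity = np.asarray(affinity, dtype=float)
    # Per-channel mean computed over observed (non-NaN) entries only.
    col_means = np.nanmean(affinity, axis=0)
    filled = np.where(np.isnan(affinity), col_means, affinity)
    # Affinity ratings are defined on a 0 to 1 scale, so keep values in range.
    return np.clip(filled, 0.0, 1.0)

# Toy data (hypothetical): rows are physicians, columns are channels.
toy = np.array([
    [0.2, np.nan, 0.9],
    [np.nan, 0.4, 0.7],
    [0.6, 0.8, np.nan],
])
filled = impute_column_means(toy)
```

A baseline like this ignores the region and demographic columns entirely, so a stronger approach would likely use them, for example by imputing within groups of similar physicians. A column with no observed values at all would stay NaN here and needs separate handling.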
Evaluation Metric : RMSE (Root Mean Squared Error)
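For reference, RMSE is the square root of the mean of the squared differences between the true and predicted values. The sketch below uses only the standard library. The function name `rmse` and the sample values are illustrative assumptions, not part of the competition's scoring code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true and predicted affinity ratings for three entries.
score = rmse([0.2, 0.5, 0.9], [0.3, 0.5, 0.7])
```

Because the errors are squared before averaging, a few badly wrong imputed affinities raise the score more than many slightly wrong ones.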