The project has a lot of potential, as it uses one of the richer datasets and covers a very interesting topic. The preliminary analysis also shows that the dataset is relatively easy to work with: sample points are treated as all-or-nothing, with either complete or incomplete features and no corruption of the data. It also looks like decent progress has been made on the project, with the forest classifier achieving a 12% error rate.
Where I think the report is lacking is in the explanation of the decisions made for the project. For example, it's unclear why the two models, One-vs-Rest and the forest classifier, were chosen. I don't believe they were covered in the course, and even so, there should be some explanation of why each model is suited to this problem. Another example is the following: "we used a total of 158 features after preprocessing, which required dropping feature columns which represented one level of a particular categorical variable". Why did you decide to drop one column per categorical variable? And where do the 158 features come from? Each survey has 35 questions (with multiple parts), so do those amount to 158? If not, mention which features are dropped and why.
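For what it's worth, the column-dropping the quoted sentence describes is presumably standard dummy encoding with one reference level dropped per categorical variable. A minimal sketch of that idea in pandas (the `education` column and its levels are hypothetical, not from the report):

```python
import pandas as pd

# Hypothetical categorical survey answer with three levels.
df = pd.DataFrame({"education": ["hs", "college", "grad", "college"]})

# Full one-hot encoding: one indicator column per level (3 columns).
all_levels = pd.get_dummies(df["education"])

# drop_first=True drops one level per categorical variable (2 columns).
# This avoids the "dummy variable trap": with all k levels present, the
# indicator columns always sum to 1 and are perfectly collinear with an
# intercept term, which is a problem for linear models.
dropped = pd.get_dummies(df["education"], drop_first=True)

print(list(all_levels.columns))  # ['college', 'grad', 'hs']
print(list(dropped.columns))     # ['grad', 'hs']
```

If this is indeed what was done, a one-sentence note in the report to that effect, plus a count of how many columns each multi-part question expands into, would make the 158 figure easy to verify.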