Ziyi Chen's review on final report. #79

Open
changy12 opened this issue Dec 7, 2016 · 0 comments

About introduction:
This is an interesting and meaningful research question. I appreciate that you identified relevant areas such as voice recognition, which go beyond the scope of this course, and I admire your courage in taking on the challenge.

About Dataset Description:
I appreciate your effort in merging the different datasets, as well as your insight into, and explanation of, the many 0’s.

About Problem Description:
The problem is clear and to the point.
Note that the confidence of an algorithm measures the variation of its predictions rather than their accuracy. Accuracy is more important; once accuracy is high, the algorithms can then be compared on confidence.

About Exploratory Data Analysis:
The procedure is very clear, especially when you gave an example.
The classifiers with only one feature are interesting and useful for preliminary exploration.
You said “This average is considered as the segregation point between each of the genders in the test set.” The segregation point should be computed on the training dataset, and its performance should then be tested on the test dataset.
You may also consider other segregation methods like the mean of the means.
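
A minimal sketch of the two suggestions above, assuming a single numeric feature and two string labels. The function names and the toy values are hypothetical and are not taken from the report. The threshold is the mean of the two class means, fitted on the training data only and then scored on the held-out test data.

```python
from statistics import mean


def fit_threshold(train_values, train_labels):
    """Return the midpoint of the two class means, using training data only."""
    male = [v for v, y in zip(train_values, train_labels) if y == "male"]
    female = [v for v, y in zip(train_values, train_labels) if y == "female"]
    mean_male, mean_female = mean(male), mean(female)
    threshold = (mean_male + mean_female) / 2  # "mean of the means"
    # Remember which class lies above the threshold.
    high_label = "male" if mean_male > mean_female else "female"
    low_label = "female" if high_label == "male" else "male"
    return threshold, low_label, high_label


def predict(values, threshold, low_label, high_label):
    """Label each value by the side of the threshold it falls on."""
    return [high_label if v > threshold else low_label for v in values]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical toy data for one feature (illustrative values only).
train_x = [0.10, 0.12, 0.11, 0.20, 0.18]
train_y = ["male", "male", "male", "female", "female"]
test_x = [0.09, 0.13, 0.19, 0.21]
test_y = ["male", "male", "female", "female"]

threshold, low, high = fit_threshold(train_x, train_y)
test_acc = accuracy(predict(test_x, threshold, low, high), test_y)
```

Because the class means are averaged with equal weight, the threshold does not drift toward the larger class the way a plain average over all samples would.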
Please give a reference for pocket learning.
Logistic regression is in fact a classification method, i.e., the outcome variable is categorical. For binary classification, encode the outcome variable as 1 or -1 rather than log-transforming it.
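
To illustrate the 1 / -1 encoding, here is a small sketch of logistic regression on one feature, trained by plain gradient descent on the loss log(1 + exp(-y(wx + b))). The data, learning rate, and function names are assumptions made for illustration; this is not the report's implementation.

```python
import math


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit w, b by gradient descent; labels ys must be +1 or -1."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (w * x + b)
            # d/d(score) of log(1 + exp(-y * score)) is -y / (1 + exp(margin))
            coeff = -y / (1.0 + math.exp(margin))
            grad_w += coeff * x
            grad_b += coeff
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if w * x + b >= 0 else -1


def prob_positive(x, w, b):
    """Estimated probability that the label is +1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# Hypothetical toy data: the categorical outcome is encoded directly as -1 / +1.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [-1, -1, -1, 1, 1, 1]

w, b = train_logistic(xs, ys)
preds = [predict(x, w, b) for x in xs]
```

The outcome itself is never transformed numerically; only the linear score passes through the logistic function to give a class probability.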

About Model Analysis:
The success rates are really high. However, if there is a significant imbalance between the two classes, the success rate can be misleading. For example, if 98% of the voices in the test dataset are male (female), then the success rate can be 98% even if the classifier predicts every voice to be male (female). In this case, you could try other measures such as the F1 score.
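
The 98% example can be checked with a short sketch. The label lists below are constructed to match that hypothetical split and are not data from the report; the helper name is also an assumption.

```python
def precision_recall_f1(truth, pred, positive):
    """Precision, recall, and F1 for the class named by `positive`."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical test set: 98 male voices, 2 female voices.
truth = ["male"] * 98 + ["female"] * 2
# A degenerate classifier that always answers "male".
pred = ["male"] * 100

success_rate = sum(t == p for t, p in zip(truth, pred)) / len(truth)
_, _, f1_female = precision_recall_f1(truth, pred, positive="female")
_, _, f1_male = precision_recall_f1(truth, pred, positive="male")
```

The success rate is 0.98, yet the F1 score with the minority class as the positive class is 0. The F1 score for the majority class stays high, so the score is informative here only when the minority class is treated as positive or when both per-class scores are reported.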
You said PCA improved the success rate of SVM. Did the improved success rate exceed 0.981? You could list the success rates with PCA as well. From the information you provided, I cannot conclude that PCA is not helpful.

In general:
This paper poses an interesting and meaningful research question, conducts extensive preliminary analysis, and tries a large number of classifiers. The writing is clear and straightforward.
You could deepen your understanding of some topics in classification, such as logistic regression and unbalanced classes.
