Methodology: Naive Bayes Classifier
AutoGrade aims to assign a grade to each essay by evaluating certain attributes of that essay. If we treat each grade as a class, the problem of assigning a grade to an essay reduces to classifying the essay into the appropriate class. To handle this classification problem, we use the Naive Bayes Classifier.
The Naive Bayes Classifier (NBC) classifies each new record (in our case, an essay) into a particular class, with a certain confidence level, based on the data it has been trained on. Mathematically, it works as follows:
Let {A1, ..., An} be the attribute set, and let C be a class from the set of all classes.
Now, by Bayes' theorem, the probability that an essay belongs to class C given the values of its attributes is:
P(Class = C | A1=a1, A2=a2, ..., An=an) = [P(A1=a1, ..., An=an | Class = C) x P(Class = C)] / P(A1=a1, ..., An=an)
We need to find the class C that maximises the right-hand side above, so we must evaluate it for every possible class. Since the denominator P(A1=a1, ..., An=an) is the same for every class, we can ignore it. Thus, we need to find the C that maximises:
P(A1=a1,...,An=an|C) x P(C)
Naively assuming that the attributes are conditionally independent of each other given the class, we can rewrite this expression as
P(A1=a1|C) x P(A2=a2|C) x ... x P(An=an|C) x P(C)
Here, P(C) = (number of training records classified as C) / (total number of training records). Similarly, P(Ai=ai|C) = (number of class-C training records with Ai=ai) / (number of class-C training records), so both factors can be estimated directly from the training data.
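
To make the computation concrete, here is a minimal Python sketch of the classifier described above, assuming categorical attributes. The attribute bands, the toy training records, and the train/classify function names are hypothetical illustrations only, not AutoGrade's actual feature set or API.

```python
# Minimal Naive Bayes sketch for categorical attributes.
# All attribute names and records below are hypothetical.
from collections import Counter, defaultdict

def train(records, labels):
    """Estimate P(C) and P(Ai=ai | C) counts from the training data."""
    n = len(records)
    class_counts = Counter(labels)
    # priors[c] = (# records labelled c) / (total # records)
    priors = {c: count / n for c, count in class_counts.items()}
    # cond_counts[c][i][v] = # class-c records whose attribute i equals v
    cond_counts = defaultdict(lambda: defaultdict(Counter))
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            cond_counts[label][i][value] += 1
    return priors, cond_counts, class_counts

def classify(record, priors, cond_counts, class_counts):
    """Return the class C maximising P(A1=a1|C) x ... x P(An=an|C) x P(C)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(record):
            # P(Ai = ai | C) = (# class-C records with Ai = ai) / (# class-C records)
            score *= cond_counts[c][i][value] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score

# Hypothetical training data: each record is (word_count_band, spelling_errors_band)
records = [("high", "low"), ("high", "low"), ("low", "high"), ("low", "low")]
labels  = ["A", "A", "C", "B"]

priors, cond_counts, class_counts = train(records, labels)
grade, confidence = classify(("high", "low"), priors, cond_counts, class_counts)
print(grade, confidence)  # -> A 0.5
```

Note that a single attribute value never seen with a class drives the whole product to zero; in practice, implementations often add Laplace smoothing to the P(Ai=ai|C) estimates (and sum log-probabilities rather than multiplying, to avoid underflow), but those refinements are omitted here for clarity.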