NOTE: See Project Report for theory and implementation details.
This repository contains scripts and data for a machine learning project by Sindhuula Selvaraju and Zachary Silver.
This repository contains 2 directories:
1. Data : Raw and parsed data files
2. Code : Scripts to parse and classify data and compute prediction accuracy
Usage:
parse_data.py : python parse_data.py <input_file> <output_file> <stopword_file>
assign_feature_weights.py : python assign_feature_weights.py <input_file> <output_file> <words_file>
Points to note:
1. The preprocessing/parsing of our raw data involves removing frequently occurring words and phrases and making sure keywords necessary for the classification get a higher weight.
2. The feature vector is formed similarly to the way it was formed in our homeworks, but our features are strings instead of integers.
3. We're still using the model and predict files to keep track of possible points of failure.
4. Our neural network uses some number of stacked "perceptrons." Each
perceptron takes a vector as input and produces a single number as output.
The number of perceptrons we create depends on the number of nodes in our
hidden layers. For example, if we have 10 nodes in a hidden layer, those
nodes are the outputs of 10 perceptrons, each of which takes the previous
layer as its input. These outputs are then the inputs to a final
perceptron that gives the final output value.
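
The stacked-perceptron structure described in the last point can be sketched roughly as follows. This is only an illustration, not the code in the Code directory: the function names (perceptron, layer, forward), the sigmoid activation, the numeric inputs, and the example weights are all assumptions made for the sketch.

```python
import math


def perceptron(inputs, weights, bias):
    """One perceptron: a vector goes in, a single number comes out.

    The sigmoid activation is an assumption made for this sketch.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a list of perceptrons that all read the same input vector.

    A hidden layer with 10 nodes therefore means 10 perceptrons.
    """
    return [perceptron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, hidden_layers, output_weights, output_bias):
    """Feed the input through each hidden layer, then one final perceptron."""
    activations = inputs
    for weight_rows, biases in hidden_layers:
        activations = layer(activations, weight_rows, biases)
    return perceptron(activations, output_weights, output_bias)


# Hypothetical example: 3 numeric inputs, one hidden layer of 10 nodes,
# and a final perceptron. The weights are made-up placeholder values.
hidden = [([[0.1, -0.2, 0.3]] * 10, [0.0] * 10)]
result = forward([1.0, 0.5, -1.0], hidden, [0.05] * 10, 0.0)
print(result)
```

Note that the sketch assumes the string features have already been mapped to numbers (for example, by the assigned feature weights) before they reach the network.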