6. Test Data
There are a couple of simple networks with which you can experiment. Try setting up your own networks using different numbers of hidden nodes, different training sets, different starting points, different learning rate parameters, and so on. I have provided two MS Excel (gasp!) spreadsheets with which you can generate as much test data as you want. Run the macro “Export Training Set” to generate 1000 training pairs in your current directory. To make a bigger set, append several generated files together, removing all but the first “networkTopology” clause.
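The appending step can be sketched in Python. This is only an illustration under a loud assumption: it treats the “networkTopology” clause as a single line that starts with that word, which may not match the real file layout. The function name `merge_training_sets` is hypothetical and not part of the project.

```python
def merge_training_sets(texts, marker="networkTopology"):
    """Concatenate exported training sets, keeping only the first topology clause.

    `texts` is a list of strings, each holding the full text of one file.
    ASSUMPTION: the clause is one line beginning with `marker`; adjust this
    test if the real clause spans several lines.
    """
    merged = []
    seen_marker = False
    for text in texts:
        for line in text.splitlines():
            if line.strip().startswith(marker):
                if seen_marker:
                    continue  # drop every repeated topology clause
                seen_marker = True
            merged.append(line)
    return "\n".join(merged) + "\n"
```

To use it, read each exported file into a string, pass the list to `merge_training_sets`, and write the result to a new file.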
To get a better idea of the training times involved, save the training results and load them into a spreadsheet. From there you can plot the network's performance over the training set.
=== The Binary Relationship Exclusive OR (XOR) ===
The binary relationship Exclusive OR (XOR) is a classic neural network problem because it is a very simple demonstration of the power of three-layer networks over the original two-layer networks first proposed in the 1960s. See any of the referenced books for exhaustive explanations of every aspect of this problem.
XOR is the following relationship:
X | Y | XOR(X,Y) |
---|---|---|
0 | 0 | 0 |
1 | 0 | 1 |
0 | 1 | 1 |
1 | 1 | 0 |
Generate a number of training sets and one test set, and see how more training produces a better (or worse) result. If you are using the eNNpi application, use the “-test” option to save the test set error, then use a spreadsheet to plot how well the network behaves.
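If you would rather not use a spreadsheet, random XOR training pairs can also be produced with a few lines of Python. This is a minimal sketch, not the project's own generator: the function name `xor_pairs` is hypothetical, and the pairs are returned in memory because the on-disk file format is not described here.

```python
import random


def xor_pairs(count, seed=None):
    """Return `count` random ((x, y), target) pairs for the XOR relationship."""
    rng = random.Random(seed)  # a seed makes the set reproducible
    pairs = []
    for _ in range(count):
        x = rng.randint(0, 1)  # randint includes both endpoints
        y = rng.randint(0, 1)
        pairs.append(((x, y), x ^ y))  # ^ is bitwise XOR on integers
    return pairs


training_set = xor_pairs(1000, seed=1)
test_set = xor_pairs(100, seed=2)
```

Using different seeds for the training and test sets keeps the two draws independent, although with only four possible inputs both sets will contain the same four patterns.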
The spreadsheet xor.xls is a Microsoft Excel spreadsheet that uses the RANDBETWEEN function to generate test data. Run the macro “exportTrainingSet” to generate a new set of training data called “xor.txt” in the current directory.
=== Binary converter ===
The second example converts a “binary” input to a “decimal” output. The relationship is as follows:
Input 1 | Input 2 | Output 1 | Output 2 | Output 3 |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 1 |
Training this network should be significantly more work than XOR. Try it and see.
As with the XOR example, the spreadsheet binary.xls generates a file called “binary.txt” in the current directory when the “exportTrainingSet” macro is run.
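The mapping in the table above can be written as a small Python function. This is a sketch for checking generated data, assuming Input 1 is the high bit, as the table rows suggest. The name `binary_to_outputs` is hypothetical.

```python
def binary_to_outputs(in1, in2):
    """Map two binary inputs to the three outputs shown in the table."""
    value = 2 * in1 + in2  # treat Input 1 as the high bit
    outputs = [0, 0, 0]
    if value > 0:
        outputs[value - 1] = 1  # value 1 sets Output 1, value 2 sets Output 2, ...
    return outputs


for in1 in (0, 1):
    for in2 in (0, 1):
        print(in1, in2, binary_to_outputs(in1, in2))
```

Note that the input 0, 0 maps to all zeros, so the network must learn four distinct output patterns from three output nodes.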
Shuttle is a public data set, originally provided by NASA, that is available from the UCI Machine Learning Repository (full citation details here). You can see the commentary and download the data set here.
I've included this to test out a large(-ish) data set. The raw content is not well suited to a neural network, which works best when most inputs are between -3 and 3. To improve on this slightly, I've loaded the data into an OpenDocument spreadsheet (shuttle7_5_Sets1-4.ods) and used the STANDARDIZE function to shift the mean of each input close to zero and its standard deviation to 1. I've also translated the output from a single digit (1 – 5) to five outputs, with (1, -1, -1, -1, -1) representing 1, (-1, 1, -1, -1, -1) representing 2, etc. Finally, I've broken it up into four sets to make file handling a little easier.
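The same preprocessing can be sketched in Python for readers who want to reproduce it outside a spreadsheet. This is an illustration under stated assumptions, not the exact spreadsheet formulas: it uses the population standard deviation (the spreadsheet may have used the sample version), and the names `standardize` and `encode_class` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one input column to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)  # ASSUMPTION: population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(x - mu) / sigma for x in column]


def encode_class(label, classes=5):
    """Turn a class label 1..classes into a vector of +1 and -1 outputs."""
    return [1 if i == label else -1 for i in range(1, classes + 1)]
```

For example, `encode_class(2)` gives `[-1, 1, -1, -1, -1]`, matching the encoding described above. Each input column is standardized separately, so every input ends up on a comparable scale.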
I'll keep on working with this data set and see if I can get the 99% accuracy rate that Dr. Catlett mentions in the commentary.