6. Test Data

Nick Oppen edited this page Dec 1, 2013 · 1 revision

Example Networks with Training Data

There are a couple of simple networks with which you can experiment. Try setting up your own networks using different numbers of hidden nodes, different training sets, different starting points, learning rate parameters etc. I have provided two MS Excel (gasp!) spreadsheets with which you can generate as much test data as you want. Run the Macro “Export Training Set” to generate 1000 training pairs in your current directory. Append them together (removing all but the first “networkTopology” clause) to make a bigger set.

To get a better idea of the training times involved, save the training results and load them into a spreadsheet. From there you can plot the network's performance over the training set.

The Binary Relationship Exclusive OR (XOR)

The binary relationship Exclusive OR (XOR) is a classic neural network problem, since it is a very simple demonstration of the power of three-layer networks over the original two-layer networks first proposed in the 1960s. See any of the referenced books for exhaustive explanations of every aspect of this problem.

XOR is the following relationship:

X    Y    XOR(X,Y)
0    0    0
1    0    1
0    1    1
1    1    0

Any input value in the training set between -0.095 and 0.05 is assumed to be 0, and any value between 0.995 and 1.05 is assumed to be 1. This trains the network to tolerate imprecise input values while still producing the correct result. The desired values are exactly 0 and 1, and you, as the network trainer, must decide how close to the desired value you want your output to be.
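
As a rough illustration of the input tolerance described above, here is a minimal Python sketch that builds noisy XOR training pairs in memory. It is not the spreadsheet macro and it does not write the eNNpi file format. The function names, the uniform sampling within each range and the `seed` parameter are all assumptions made for the example.

```python
import random

# Input ranges quoted in the text; uniform sampling within them is an assumption.
ZERO_RANGE = (-0.095, 0.05)
ONE_RANGE = (0.995, 1.05)


def noisy_bit(bit, rng):
    """Return a value drawn from the range that stands for the given bit."""
    low, high = ONE_RANGE if bit else ZERO_RANGE
    return rng.uniform(low, high)


def make_xor_pairs(count, seed=None):
    """Build `count` (inputs, target) pairs for XOR with noisy inputs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        x, y = rng.randint(0, 1), rng.randint(0, 1)
        inputs = (noisy_bit(x, rng), noisy_bit(y, rng))
        target = float(x ^ y)  # desired outputs stay exactly 0 or 1
        pairs.append((inputs, target))
    return pairs


if __name__ == "__main__":
    for inputs, target in make_xor_pairs(5, seed=1):
        print(inputs, target)
```

Only the inputs are perturbed; the targets remain exact, which matches the description of the desired values above.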

Generate a number of training sets and one test set and see how more training produces a better (or worse) result. If you are using the eNNpi application, use the “-test” option to save the test set error and use a spreadsheet to plot how well the network behaves.

The spreadsheet xor.xls is a Microsoft Excel spreadsheet that uses the RANDBETWEEN function to generate test data. Run the Macro “exportTrainingSet” to generate a new set of training data called “xor.txt” in the current directory.

Binary converter

The second example converts a “binary” input to a “decimal” output. The relationship is as follows:

Input 1    Input 2    Output 1    Output 2    Output 3
0          0          0           0           0
0          1          1           0           0
1          0          0           1           0
1          1          0           0           1

The inputs represent the numbers zero to three in binary, and the outputs are a “decimalised” form of the input. Again, any input value in the training set between -0.095 and 0.05 is assumed to be 0, and any value between 0.995 and 1.05 is assumed to be 1.
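
The mapping in the table can be written as a small function. This is only a sketch of the relationship, not code from the spreadsheet or from eNNpi. The name `decimalise` is hypothetical, and the reading of Input 1 as the high bit is inferred from the table rows.

```python
def decimalise(bit1, bit2):
    """Map two binary inputs to the three-output pattern from the table.

    Input 1 is treated as the high bit and Input 2 as the low bit.
    """
    value = 2 * bit1 + bit2      # the number 0..3 encoded by the inputs
    outputs = [0, 0, 0]
    if value > 0:
        outputs[value - 1] = 1   # value k switches on Output k
    return outputs


if __name__ == "__main__":
    for b1 in (0, 1):
        for b2 in (0, 1):
            print(b1, b2, decimalise(b1, b2))
```

Zero is represented by all three outputs being off, so the network needs three outputs rather than four.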

Training this network should be significantly more work than XOR. Try it and see.

As in the XOR example, the spreadsheet binary.xls generates a file called “binary.txt” in the current directory when the “exportTrainingSet” macro is run.

Shuttle (Statlog)

Shuttle is a public data set, originally provided by NASA, that is available from the UCI Machine Learning Repository (full citation details here). You can see the commentary and download the data set here.

I've included this to test out a large(-ish) data set. The raw data is not ideal for a neural network, which works best when most inputs lie between -3 and 3. To improve on this slightly, I've loaded the data into an OpenDocument spreadsheet (shuttle7_5_Sets1-4.ods) and used the STANDARDIZE function to shift the mean of each input close to zero and scale its standard deviation to 1. I've also translated the output from a single digit (1 to 5) into five outputs, with (1, -1, -1, -1, -1) representing 1, (-1, 1, -1, -1, -1) representing 2, and so on. Finally, I've broken the data up into four sets to make file handling a little easier.
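
For readers who would rather do this preprocessing outside a spreadsheet, the two steps might look roughly like this in Python. This is a sketch under assumptions: the function names are hypothetical, and the sample standard deviation is used, which may not be the statistic that was fed to STANDARDIZE in the spreadsheet.

```python
from statistics import mean, stdev


def standardize_column(values):
    """Shift a column to mean 0 and scale it to standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; an assumption
    if sigma == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]


def encode_class(label, class_count=5):
    """Encode a 1-based class label as 1 in its own slot and -1 elsewhere."""
    if not 1 <= label <= class_count:
        raise ValueError("label out of range")
    return [1 if i == label else -1 for i in range(1, class_count + 1)]


if __name__ == "__main__":
    print(standardize_column([10.0, 20.0, 30.0, 40.0]))
    print(encode_class(2))
```

Each input column is standardized independently, while the class label is expanded into one output per class.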

I'll keep on working with this data set and see if I can get the 99% accuracy rate that Dr. Catlett mentions in the commentary.