This is sample code for training the siamese neural network and comparing it to edit distance based techniques.
This was tested with 1.2.2 version of keras and python 2.7. We recommend having a GPU to speed up processing.
You will need a copy of Arial.ttf to put in the code directory before running. This file should be in the same directory of the python code. If not, you will see an error similar to this:
Traceback (most recent call last):
File "run_siamese.py", line 144, in <module>
X1_train = generate_imgs([x[0] for x in data['train']], font_location, font_size, image_size, text_location)
File "run_siamese.py", line 47, in generate_imgs
font = ImageFont.truetype(font_location, font_size)
File "/home/<user>/anaconda2/lib/python2.7/site-packages/PIL/ImageFont.py", line 239, in truetype
return FreeTypeFont(font, size, index, encoding)
File "/home/<user>/anaconda2/lib/python2.7/site-packages/PIL/ImageFont.py", line 128, in __init__
self.font = core.getfont(font, size, index, encoding)
IOError: cannot open resource
You can view this for getting Arial.ttf: https://support.apple.com/guide/font-book/install-and-validate-fonts-fntbk1000/mac
You can install ttf-mscorefonts-installer
Look in c:\Windows\Fonts
The code is very simple and has a couple parameters that can be changed inline. The first is data type. For running the code on process data, set the dataset_type
to "process"
:
dataset_type = 'process'
For domains:
dataset_type = 'domain'
The second parameter allows the code to run on a subset of the data to get results faster. Note that a smaller dataset will get worse results. To set this parameter to a subset set isFast
to True
:
isFast = True
Too set to the full dataset do:
isFast = False
Simply run the python code like:
python run_siamese.py