ASR Main Scripts #7

Open · 12 tasks

raymondxyy opened this issue Jan 9, 2019 · 1 comment
Comments

raymondxyy (Owner) commented Jan 9, 2019

This issue focuses on debugging airbus_attention_vtlp_CTC.py, which first appears in commit 1857440 @ShangwuYao. List the major changes you made in the next section, and anything left to be done in the to-do section.

Major Changes in Commit 05eb1ec

By @raymondxyy.

  • Moved airbus_attention_vtlp_CTC.py from audlib.nn to egs/asr.

  • Moved common DNN functions to audlib.nn.nn. Check the "Refactoring nn Module" issue for more info.

  • Refactored data-related functions to egs/asr/dataset.py. The script only needs a successful installation of audlib and a downloaded WSJ dataset (you can download it from LDC's catalog using @raymondxyy's account) to work properly. Make sure this runs fine before you proceed.

  • Refactored transforms and collate functions to egs/asr/transforms.py.

  • Removed WSJ class that loads all training/validation/test data in one shot. Replaced it with WSJ0 and ASRWSJ0 (WSJ1 is also available) in audlib.data.wsj. See egs/asr/dataset.py.

  • Removed this code block for vocabulary handling in main:

    STRINGS = [
        x for x in "&*ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz #'-/@_"]
    INPUT_DIM = 40
    vocab_size = len(STRINGS)
    STR_DICT = {}
    for i, x in enumerate(STRINGS):
        STR_DICT[x] = i

    The string-to-label-to-string round trip can be done with CharacterMap or PhonemeMap in audlib.asr.util. See dataset.py for an example of how this is done, and the sketch below.
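    For reference, here is a minimal illustration of that round trip. CharMap is a hypothetical stand-in written for this issue, not the actual CharacterMap API:

    class CharMap:
        """Toy character-to-label map (illustration only)."""

        def __init__(self, chars):
            self.char2idx = {c: i for i, c in enumerate(chars)}
            self.idx2char = {i: c for c, i in self.char2idx.items()}

        def encode(self, text):
            """String -> list of integer labels."""
            return [self.char2idx[c] for c in text]

        def decode(self, labels):
            """List of integer labels -> string."""
            return "".join(self.idx2char[i] for i in labels)

    cmap = CharMap("&*ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz #'-/@_")
    assert cmap.decode(cmap.encode("HELLO WORLD")) == "HELLO WORLD"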

  • Removed myDataset because it only prepends or appends special characters to training transcripts. This step is now done as part of the transform in transforms.FinalTransform.

  • CUDA availability is checked at the beginning of the main functions, and tensors are transferred to the device as part of the new collate function:

    collate_fn = Compose([my_collate_fn, ToDevice(device)])

    All code that transfers tensors to CUDA outside the main functions is therefore commented out. It doesn't make much sense for data to keep switching devices after entering the network, unless some package requires processing on a particular device. This will be part of the TODOs. A sketch of such a transform follows.
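    For context, here is a sketch of what such a device-transfer transform might look like; the real ToDevice lives in this repo and may differ:

    import torch

    class ToDevice:
        """Recursively move every tensor in a batch to one device (sketch)."""

        def __init__(self, device):
            self.device = device

        def __call__(self, batch):
            if torch.is_tensor(batch):
                return batch.to(self.device)
            if isinstance(batch, (list, tuple)):
                return type(batch)(self(x) for x in batch)
            return batch  # non-tensor entries pass through unchanged

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # then, as above: collate_fn = Compose([my_collate_fn, ToDevice(device)])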

  • A vocabulary histogram is now available as part of the dataset, so there is no need to build one manually in main. See VOCAB_HIST in dataset.py.

  • Disabled all to_variable code, since Variable is deprecated in PyTorch; tensors support autograd directly as of 0.4. See the one-line migration below.
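    The migration is mechanical, since tensors carry autograd state themselves as of PyTorch 0.4 (a one-line before/after):

    import torch
    # old, deprecated: x = torch.autograd.Variable(torch.randn(3), requires_grad=True)
    x = torch.randn(3, requires_grad=True)  # plain tensors track gradients now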

  • Removed the dependency on python-levenshtein. The Levenshtein (edit) distance function is implemented in audlib.asr.utils.levenshtein; the recurrence is sketched below.
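    For reference, the standard dynamic-programming recurrence with a single rolling row (a sketch, not necessarily the exact code in audlib):

    def levenshtein(ref, hyp):
        """Edit distance between two sequences."""
        dist = list(range(len(hyp) + 1))  # dist[j] = distance(ref[:i], hyp[:j])
        for i in range(1, len(ref) + 1):
            prev, dist[0] = dist[0], i
            for j in range(1, len(hyp) + 1):
                cur = dist[j]
                dist[j] = min(dist[j] + 1,      # deletion
                              dist[j - 1] + 1,  # insertion
                              prev + (ref[i - 1] != hyp[j - 1]))  # substitution
                prev = cur
        return dist[-1]

    assert levenshtein("kitten", "sitting") == 3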

TODOs


BUGs

  • Update grid_search to work with the new interface.
  • Update beamsearch to work with the new interface.
  • Debug the network using PyTorch 1.0.
    • On CPU, an error is thrown in one of the built-in RNN functions:

      File "/home/xyy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 182, in forward
        self.num_layers, self.dropout, self.training, self.bidirectional)
      RuntimeError: got an incorrect number of RNN parameters

      Should be an easy fix if it's just a different interface.
    • On GPU, an error is thrown for incompatible types (one torch.cuda.FloatTensor, the other torch.FloatTensor). Should also be an easy fix with nn.Module.to(device); see the sketch below.
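      For the GPU case, the usual pattern is to put the model and each batch on the same device once (a minimal sketch with a placeholder model, not the actual network):

      import torch
      import torch.nn as nn

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
      model = nn.GRU(input_size=40, hidden_size=8).to(device)  # placeholder model
      x = torch.randn(5, 3, 40, device=device)  # (seq_len, batch, feat)
      out, h = model(x)  # both operands live on `device`; no type mixing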

REVIEWs

  • Look for REVIEW tags in transforms.my_collate_fn. Make sure the new interface does the same thing as the old one.
  • In getInputChar in airbus_attention_vtlp_CTC.py, make sure the new call to torch.bernoulli is okay.
  • Document everything @ShangwuYao. Add docstrings (e.g. NumPy style); give functions more descriptive names (e.g. not my_collate_fn).
  • Move generic functions to audlib.nn. ASR-specific NNs can be put into audlib.nn.asr.
  • Resolve the dependency on sklearn.ParameterSampler. We can implement this ourselves if it's not complicated; a sketch follows.
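    A sampler over discrete grids is only a few lines; the sketch below covers the behavior we need (sklearn's ParameterSampler additionally supports scipy distributions):

    import random

    def sample_params(param_grid, n_iter, seed=0):
        """Yield n_iter random settings from a dict of candidate lists."""
        rng = random.Random(seed)
        for _ in range(n_iter):
            yield {name: rng.choice(values) for name, values in param_grid.items()}

    # e.g.: for params in sample_params({"lr": [1e-3, 1e-4], "hidden": [256, 512]}, 5): ...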
  • Consider refactoring model_data_optim.
  • my_collate_fn probably needs a rewrite for clarity. The pattern

    batch[0][0].new(batch_size, max_len, 40).zero_().float()

    looks like it does the same thing as

    torch.zeros((batch_size, max_len, 40), dtype=batch[0][0].dtype)

    This pattern also appears often in the main script. What does it do?
@ShangwuYao (Collaborator) commented:
batch[0][0].new(batch_size, max_len, 40).zero_().float()

This line of code initializes the new tensor on the same device as batch[0][0], to avoid the overhead of moving data between devices.
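For what it's worth, recent PyTorch spells the same intent more directly with new_zeros, which inherits both dtype and device from the source tensor (a sketch; batch_size, max_len, and the feature dimension 40 are the names from the collate function):

    out = batch[0][0].new_zeros((batch_size, max_len, 40)).float()
    # or fully explicit:
    out = torch.zeros(batch_size, max_len, 40,
                      dtype=torch.float, device=batch[0][0].device)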
