ASR Main Scripts #7

Open · 12 tasks

raymondxyy opened this issue Jan 9, 2019 · 1 comment
Comments

raymondxyy (Owner) commented Jan 9, 2019

This issue focuses on debugging airbus_attention_vtlp_CTC.py, which first appears in commit 1857440 @ShangwuYao. List the major changes you made in the next section, and anything left to be done in the to-do section.

Major Changes in Commit 05eb1ec

By @raymondxyy.

  • Moved airbus_attention_vtlp_CTC.py from audlib.nn to egs/asr.

  • Moved common DNN functions to audlib.nn.nn. Check the "Refactoring nn Module" issue for more info.

  • Refactored data-related functions to egs/asr/dataset.py. The script only needs a successful installation of audlib and a downloaded WSJ dataset (you can download it from LDC's catalog using @raymondxyy's account) to work properly. Make sure this runs fine before you proceed.

  • Refactored transforms and collate functions to egs/asr/transforms.py.

  • Removed WSJ class that loads all training/validation/test data in one shot. Replaced it with WSJ0 and ASRWSJ0 (WSJ1 is also available) in audlib.data.wsj. See egs/asr/dataset.py.

  • Removed this code block for vocabulary handling in main:

    STRINGS = [
        x for x in "&*ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz #'-/@_"]
    INPUT_DIM = 40
    vocab_size = len(STRINGS)
    STR_DICT = {}
    for i, x in enumerate(STRINGS):
        STR_DICT[x] = i

    The string-to-label-to-string round trip can be done with CharacterMap or PhonemeMap in audlib.asr.util. See dataset.py for an example of how this is done, and the sketch below.
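    For reference, here is a minimal illustration of that round trip. CharMap is a hypothetical stand-in written for this issue, not the actual CharacterMap API:

    class CharMap:
        """Toy character-to-label map (illustration only)."""

        def __init__(self, chars):
            self.char2idx = {c: i for i, c in enumerate(chars)}
            self.idx2char = {i: c for c, i in self.char2idx.items()}

        def encode(self, text):
            """String -> list of integer labels."""
            return [self.char2idx[c] for c in text]

        def decode(self, labels):
            """List of integer labels -> string."""
            return "".join(self.idx2char[i] for i in labels)

    cmap = CharMap("&*ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz #'-/@_")
    assert cmap.decode(cmap.encode("HELLO WORLD")) == "HELLO WORLD"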

  • Removed myDataset because it only prepends or appends special characters to training transcripts. This step is now done as part of the transform in transforms.FinalTransform.

  • CUDA availability is checked at the beginning of the main functions, and tensors are transferred to the device as part of the new collate function:

    collate_fn = Compose([my_collate_fn, ToDevice(device)])

    All code that transfers tensors to CUDA outside the main functions is therefore commented out. It doesn't make much sense for data to keep switching devices after entering the network, unless some package requires processing on a particular device. This will be part of the TODOs. A sketch of such a transform follows.
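    For context, here is a sketch of what such a device-transfer transform might look like; the real ToDevice lives in this repo and may differ:

    import torch

    class ToDevice:
        """Recursively move every tensor in a batch to one device (sketch)."""

        def __init__(self, device):
            self.device = device

        def __call__(self, batch):
            if torch.is_tensor(batch):
                return batch.to(self.device)
            if isinstance(batch, (list, tuple)):
                return type(batch)(self(x) for x in batch)
            return batch  # non-tensor entries pass through unchanged

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # then, as above: collate_fn = Compose([my_collate_fn, ToDevice(device)])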

  • A vocabulary histogram is now available as part of the dataset, so there is no need to build one manually in main. See VOCAB_HIST in dataset.py.

  • Disabled all to_variable code, since Variable is deprecated in PyTorch; tensors support autograd directly as of 0.4. See the one-line migration below.
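    The migration is mechanical, since tensors carry autograd state themselves as of PyTorch 0.4 (a one-line before/after):

    import torch
    # old, deprecated: x = torch.autograd.Variable(torch.randn(3), requires_grad=True)
    x = torch.randn(3, requires_grad=True)  # plain tensors track gradients now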

  • Removed the dependency on python-levenshtein. The Levenshtein (edit) distance function is implemented in audlib.asr.utils.levenshtein; the recurrence is sketched below.
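    For reference, the standard dynamic-programming recurrence with a single rolling row (a sketch, not necessarily the exact code in audlib):

    def levenshtein(ref, hyp):
        """Edit distance between two sequences."""
        dist = list(range(len(hyp) + 1))  # dist[j] = distance(ref[:i], hyp[:j])
        for i in range(1, len(ref) + 1):
            prev, dist[0] = dist[0], i
            for j in range(1, len(hyp) + 1):
                cur = dist[j]
                dist[j] = min(dist[j] + 1,      # deletion
                              dist[j - 1] + 1,  # insertion
                              prev + (ref[i - 1] != hyp[j - 1]))  # substitution
                prev = cur
        return dist[-1]

    assert levenshtein("kitten", "sitting") == 3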

TODOs


BUGs

  • Update grid_search to work with the new interface.
  • Update beamsearch to work with the new interface.
  • Debug the network using PyTorch 1.0.
    • On CPU, an error is thrown in one of the built-in RNN functions:

      File "/home/xyy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 182, in forward
        self.num_layers, self.dropout, self.training, self.bidirectional)
      RuntimeError: got an incorrect number of RNN parameters

      Should be an easy fix if it's just a different interface.
    • On GPU, an error is thrown for incompatible types (one torch.cuda.FloatTensor, the other torch.FloatTensor). Should also be an easy fix with nn.Module.to(device); see the sketch below.
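      For the GPU case, the usual pattern is to put the model and each batch on the same device once (a minimal sketch with a placeholder model, not the actual network):

      import torch
      import torch.nn as nn

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
      model = nn.GRU(input_size=40, hidden_size=8).to(device)  # placeholder model
      x = torch.randn(5, 3, 40, device=device)  # (seq_len, batch, feat)
      out, h = model(x)  # both operands live on `device`; no type mixing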

REVIEWs

  • Look for REVIEW tags in transforms.my_collate_fn. Make sure the new interface does the same thing as the old one.
  • In getInputChar in airbus_attention_vtlp_CTC.py, make sure the new call to torch.bernoulli is okay.
  • Document everything @ShangwuYao. Add docstrings (e.g. NumPy style); give functions more descriptive names (e.g. not my_collate_fn).
  • Move generic functions to audlib.nn. ASR-specific NNs can be put into audlib.nn.asr.
  • Resolve the dependency on sklearn.ParameterSampler. We can implement this ourselves if it's not complicated; a sketch follows.
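    A sampler over discrete grids is only a few lines; the sketch below covers the behavior we need (sklearn's ParameterSampler additionally supports scipy distributions):

    import random

    def sample_params(param_grid, n_iter, seed=0):
        """Yield n_iter random settings from a dict of candidate lists."""
        rng = random.Random(seed)
        for _ in range(n_iter):
            yield {name: rng.choice(values) for name, values in param_grid.items()}

    # e.g.: for params in sample_params({"lr": [1e-3, 1e-4], "hidden": [256, 512]}, 5): ...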
  • Consider refactoring model_data_optim.
  • my_collate_fn probably needs a rewrite for clarity. The pattern

    batch[0][0].new(batch_size, max_len, 40).zero_().float()

    looks like it does the same thing as

    torch.zeros((batch_size, max_len, 40), dtype=batch[0][0].dtype)

    This pattern also appears often in the main script. What does it do?
@ShangwuYao (Collaborator) commented:
batch[0][0].new(batch_size, max_len, 40).zero_().float()

This line of code initializes the new tensor on the same device as batch[0][0], to avoid the overhead of moving data between devices.
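For what it's worth, recent PyTorch spells the same intent more directly with new_zeros, which inherits both dtype and device from the source tensor (a sketch; batch_size, max_len, and the feature dimension 40 are the names from the collate function):

    out = batch[0][0].new_zeros((batch_size, max_len, 40)).float()
    # or fully explicit:
    out = torch.zeros(batch_size, max_len, 40,
                      dtype=torch.float, device=batch[0][0].device)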
