Question about the training process #17

Open
ironsuperdev opened this issue Jun 10, 2021 · 0 comments
def train(
        train_loader: Any,
        epoch: int,
        criterion: Any,
        logger: Logger,
        encoder: Any,
        decoder: Any,
        encoder_optimizer: Any,
        decoder_optimizer: Any,
        model_utils: ModelUtils,
        rollout_len: int = 30,
) -> None:
    for i, (_input, target, helpers) in enumerate(train_loader):
        _input = _input.to(device)   # !!! here one whole batch of data is loaded !!!
       target = target.to(device)

       # Set to train mode
       encoder.train()
       decoder.train()

       # Zero the gradients
       encoder_optimizer.zero_grad()
       decoder_optimizer.zero_grad()

       # Encoder
       batch_size = _input.shape[0]
       input_length = _input.shape[1]
       # output_length = target.shape[1]
       # input_shape = _input.shape[2]

       # Initialize encoder hidden state
       encoder_hidden = model_utils.init_hidden(
           batch_size,
           encoder.module.hidden_size if use_cuda else encoder.hidden_size)

       # Initialize losses
       loss = 0

       # Encode observed trajectory
        for ei in range(input_length):       # !!! in this loop, the complete batch of 2 s of observed data is fed through the encoder !!!
            encoder_input = _input[:, ei, :]    # !!! each iteration the data of a certain time stamp (ei * 0.1 s) is chosen !!!
           encoder_hidden = encoder(encoder_input, encoder_hidden)   

       # Initialize decoder input with last coordinate in encoder
        decoder_input = encoder_input[:, :2]    # !!! which data from the batch is used here? I don't fully understand this !!!

       # Initialize decoder hidden state as encoder hidden state
       decoder_hidden = encoder_hidden

       decoder_outputs = torch.zeros(target.shape).to(device)

       # Decode hidden state in future trajectory
       for di in range(rollout_len):
           decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
           decoder_outputs[:, di, :] = decoder_output

           # Update loss
           loss += criterion(decoder_output[:, :2], target[:, di, :2])

           # Use own predictions as inputs at next step
           decoder_input = decoder_output

I don't fully understand the code above, especially at the places where I added comments.

  1. During training with a batch of data, the encoder is fed all trajectories of the batch at the same recorded time step. Is this the correct procedure, and does it affect the LSTM's internal states?

  2. After that, the decoder takes encoder_input[:, :2] as its initial input. What exactly is this data? Is it the last recorded trajectory in the batch, or the data at the same (last) time step from all trajectories in the whole batch? (See the slicing sketch below.)
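
To make my confusion concrete, here is a minimal slicing sketch of what I think is happening. The shapes are made up for illustration (4 trajectories, 20 observed steps, 2 features); the real feature dimension comes from the data loader and may differ.

import torch

# Toy batch: 4 trajectories, 20 observed time steps (2 s at 10 Hz), 2 features (x, y).
_input = torch.randn(4, 20, 2)

# Inside the encoder loop: one time step ei, but for ALL trajectories in the batch.
ei = 19
encoder_input = _input[:, ei, :]      # shape (4, 2): step ei of every trajectory

# After the loop, encoder_input still holds the LAST observed time step, so this
# slice is the last observed (x, y) of every trajectory, not a single trajectory.
decoder_input = encoder_input[:, :2]  # shape (4, 2)
print(encoder_input.shape, decoder_input.shape)

If that reading is right, the decoder is seeded with the last observed coordinates of all trajectories at once, but I would like to confirm.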

Thanks in advance for any explanation of this.

BR, Song
