Python 3.5 Support, Sampler Pipelining, Finer Control of Random State, New Corporate Sponsor

@PetrochukM released this on 04 Nov 05:16 · 59 commits to master since this release · 41fe6cc

Major Updates

  • Updated my README emoji game to be more ambiguous while maintaining a fun and heartwarming vibe. 🐕
  • Support for Python 3.5
  • Extensive rewrite of README to focus on new users and building an NLP pipeline.
  • Support for PyTorch 1.2
  • Added torchnlp.random for finer-grained control of random state, building on PyTorch's fork_rng. This module controls the random state of torch, numpy, and random. For example:
import random
import numpy
import torch

from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
  • Refactored torchnlp.samplers to enable pipelining. For example:
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
  • Added torchnlp.samplers.balanced_sampler for balanced sampling, extending PyTorch's WeightedRandomSampler.
  • Added torchnlp.samplers.deterministic_sampler for deterministic sampling based on torchnlp.random.
  • Added torchnlp.samplers.distributed_batch_sampler for distributed batch sampling.
  • Added torchnlp.samplers.oom_batch_sampler to sample the largest batches first so that any out-of-memory error surfaces early.
  • Added torchnlp.utils.lengths_to_mask to help create masks from a batch of sequences (see the sketch after this list).
  • Added torchnlp.utils.get_total_parameters to measure the number of parameters in a model (see the sketch after this list).
  • Added torchnlp.utils.get_tensors to measure the size of an object in the number of tensor elements. This is useful for dynamic batch sizing and for torchnlp.samplers.oom_batch_sampler. For example:
import torch

from torchnlp.utils import get_tensors

random_object_ = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object_)
assert len(tensors) == 2
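
A minimal sketch of torchnlp.utils.lengths_to_mask; the example lengths and the boolean output shown in the comments are illustrative assumptions, not verbatim library output:
from torchnlp.utils import lengths_to_mask

mask = lengths_to_mask([1, 2, 3])  # Hypothetical batch of three sequences with lengths 1, 2, and 3.
print(mask)
# Expected (under the assumptions above) to resemble:
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True,  True,  True]])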
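
Similarly, a small sketch of torchnlp.utils.get_total_parameters, assuming it accepts any torch.nn.Module and returns an integer parameter count; the linear layer is only an example:
import torch

from torchnlp.utils import get_total_parameters

model = torch.nn.Linear(10, 5)  # Example model: 10 * 5 weights + 5 biases = 55 parameters.
print(get_total_parameters(model))  # Assumed to print 55.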

Minor Updates

  • Fixed snli example (#84)
  • Updated .gitignore to support Python's virtual environments (#84)
  • Removed the requests and pandas dependencies, leaving only two remaining dependencies. This is useful for production environments. (#84)
  • Added LazyLoader to reduce dependency requirements. (4e84780)
  • Removed the unused torchnlp.datasets.Dataset class in favor of plain Python lists of dictionaries and pandas. (#84)
  • Added support for downloading tar.gz files and unpacking them faster. (eb61fee)
  • Renamed itos and stoi to index_to_token and token_to_index, respectively. (#84)
  • Fixed batch_encode, batch_decode, and enforce_reversible for torchnlp.encoders.text (#69)
  • Fixed FastText vector downloads (#72)
  • Fixed documentation for LockedDropout (#73)
  • Fixed a bug in weight_drop (#76)
  • stack_and_pad_tensors now returns a named tuple for readability (#84); see the sketch after this list.
  • Added torchnlp.utils.split_list in favor of torchnlp.utils.resplit_datasets, enabled by the modularity of torchnlp.random (#84); see the sketch after this list.
  • Deprecated torchnlp.utils.datasets_iterator in favor of Python's itertools.chain. (#84)
  • Deprecated torchnlp.utils.shuffle in favor of torchnlp.random. (#84)
  • Added support for encoding larger datasets following the fix for issue #85.
  • Added torchnlp.samplers.repeat_sampler following up on this issue: pytorch/pytorch#15849
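
A hedged sketch of the stack_and_pad_tensors change; the import path and the tensor/lengths fields of the returned named tuple are assumptions based on the description above:
import torch

from torchnlp.encoders.text import stack_and_pad_tensors

batch = stack_and_pad_tensors([torch.tensor([1, 2, 3]), torch.tensor([4, 5])])
print(batch.tensor)   # Assumed field: the stacked, zero-padded tensor of shape (2, 3).
print(batch.lengths)  # Assumed field: the original sequence lengths, e.g. tensor([3, 2]).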
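
And a sketch of torchnlp.utils.split_list; the splits keyword and the ratio-based splitting shown here are assumptions:
from torchnlp.utils import split_list

data = list(range(10))
train, dev, test = split_list(data, splits=(0.6, 0.2, 0.2))  # Assumed signature: a list plus a tuple of ratios.
print(len(train), len(dev), len(test))  # Expected to print roughly 6 2 2 under the assumed split.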