This document describes how to set up all the dependencies to run the notebooks in this repository.
The recommended environment for running these notebooks is the Azure Data Science Virtual Machine (DSVM). Since a considerable number of the algorithms rely on deep learning, a GPU DSVM is recommended.
For training at scale, operationalization, or hyperparameter tuning, it is recommended to use Azure ML.
Depending on the type of NLP system and the notebook that needs to be run, there are different computational requirements. Currently, this repository supports Python CPU and Python GPU. A conda environment YAML file can be generated for either CPU or GPU environments as shown below in the Dependencies Setup section.
- A machine running Linux, macOS, or Windows.
- On Windows, Microsoft Visual C++ 14.0 is required for building certain packages. Download the Microsoft Visual C++ Build Tools here.
- Miniconda or Anaconda with Python version >= 3.6.
  - This is pre-installed on the Azure DSVM, so you can run the following steps directly. To set up your local machine, Miniconda is a quick way to get started.
  - It is recommended to update conda to the latest version:

    ```
    conda update -n base -c defaults conda
    ```
NOTE: Windows machines are not FULLY SUPPORTED. Please use at your own risk.
We provide a script, generate_conda_file.py, to generate a conda environment YAML file, which you can use to create the target environment with Python 3.6 and all the correct dependencies.
Assuming the repo is cloned as `nlp` in the system, to install a default (Python CPU) environment:

```
cd nlp
python tools/generate_conda_file.py
conda env create -f nlp_cpu.yaml
```

You can also specify the environment name with the flag `-n`.
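Once the environment is created and activated (e.g. with `conda activate nlp_cpu`, using the default name from above), a minimal sanity check is to confirm the active interpreter meets the Python 3.6 requirement:

```python
import sys

# Fail fast if the active interpreter is older than the supported version.
assert sys.version_info >= (3, 6), f"Python >= 3.6 required, found {sys.version}"
print("Python", ".".join(map(str, sys.version_info[:3])))
```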
The following sections show how to install the Python GPU environment:
Python GPU environment on Linux, macOS

Assuming that you have a GPU machine, to install the Python GPU environment (which by default also includes the CPU dependencies):

```
cd nlp
python tools/generate_conda_file.py --gpu
conda env create -n nlp_gpu -f nlp_gpu.yaml
```
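After activating the GPU environment, you can probe which deep learning frameworks are importable. This is a hedged sketch: the package names `torch` and `tensorflow` are assumptions about what the generated environment contains, and the check degrades gracefully on machines where they are absent:

```python
import importlib.util

# find_spec returns None when a package is not installed, so this probes
# availability without importing the frameworks outright.
for pkg in ("torch", "tensorflow"):
    status = "installed" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")
```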
Python GPU environment on Windows
Assuming that you have an Azure GPU DSVM machine, here are the steps to set up the Python GPU environment:

- Make sure you have CUDA Toolkit version 9.0 or above installed on your Windows machine. You can run the command below in your terminal to check:

  ```
  nvcc --version
  ```

  If you don't have the CUDA Toolkit or don't have the right version, please download it from here: CUDA Toolkit.

- Install the GPU environment:

  ```
  cd nlp
  python tools/generate_conda_file.py --gpu
  conda env create -n nlp_gpu -f nlp_gpu.yaml
  ```
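As a hedged sketch of the CUDA check in the first step, the snippet below looks for `nvcc` on the PATH and prints its reported version, falling back gracefully when the CUDA Toolkit is not installed:

```python
import shutil
import subprocess

# Look for the CUDA compiler driver; `nvcc --version` prints the toolkit release.
nvcc = shutil.which("nvcc")
if nvcc:
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    print(result.stdout.strip())
else:
    print("nvcc not found on PATH; install the CUDA Toolkit first")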
We can register the created conda environment to appear as a kernel in the Jupyter notebooks:

```
conda activate my_env_name
python -m ipykernel install --user --name my_env_name --display-name "Python (my_env_name)"
```
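To confirm the kernel was registered, you can inspect the per-user kernelspec directory that `ipykernel install --user` writes to. The path below is the Linux default; Jupyter uses different data directories on macOS and Windows, so treat this as a sketch:

```python
from pathlib import Path

# On Linux, per-user kernelspecs live under ~/.local/share/jupyter/kernels.
kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
if kernels.is_dir():
    for spec in sorted(kernels.iterdir()):
        print(spec.name)
else:
    print("no user kernelspecs found at", kernels)
```

Equivalently, `jupyter kernelspec list` prints all registered kernels regardless of platform.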
If you are using the DSVM, you can connect to JupyterHub by browsing to `https://your-vm-ip:8000`. If you are prompted for a user name and password, enter the credentials that you use to log in to your virtual machine.
The utils_nlp module of this repository needs to be installed as a Python package in order to be used by the examples.
A setup.py file is provided to simplify the installation of these utilities from the repo's main directory.

To install, please run the command below:

```
python setup.py install
```
It is also possible to install directly from GitHub, which is the best way to use the `utils_nlp` package in external projects:

```
pip install -e git+git@github.com:microsoft/nlp.git@master#egg=utils_nlp
```
Either command above makes `utils_nlp` available in your conda virtual environment. You can verify it was properly installed by running:

```
pip list
```
NOTE: the pip installation does not install any of the necessary package dependencies; it is expected that conda will be used as shown above to set up the environment for the utilities being used.
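A quick programmatic check that the package is visible to the active interpreter (the module name `utils_nlp` comes from this repo):

```python
import importlib.util

# find_spec returns None when utils_nlp is not importable from this environment.
spec = importlib.util.find_spec("utils_nlp")
if spec:
    print("utils_nlp installed at", spec.origin)
else:
    print("utils_nlp not installed")
```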
The details of the versioning info can be found at VERSIONING.md.