Demo setup: Realize Integrated Analytical Solutions with Azure Synapse Analytics

Requirements

  1. An Azure Account with the ability to create an Azure Synapse Workspace

  2. Make sure the following resource providers are registered for your Azure Subscription.

    • Microsoft.Sql
    • Microsoft.Synapse
    • Microsoft.StreamAnalytics
    • Microsoft.EventHub

    See the Azure documentation for more information on registering resource providers in the Azure portal.

  3. A Power BI Pro or Premium account to host Power BI reports used for the lab in Module 16.
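If you prefer to check and register the resource providers from PowerShell instead of the Azure portal, the following is a minimal sketch. It assumes the Az PowerShell module (installed later in this guide) and a session signed in with Connect-AzAccount. Get-MissingProvider and Register-LabProvider are hypothetical helper names, not part of any module.

```powershell
# Hypothetical helper: returns the namespaces in $Required that are not in $Registered.
# It makes no Azure calls, so it can be tried offline.
function Get-MissingProvider {
    param(
        [string[]] $Required,
        [string[]] $Registered
    )
    # -notcontains compares strings case-insensitively.
    $Required | Where-Object { $Registered -notcontains $_ }
}

# Hypothetical helper: registers any of the lab's required providers that are
# missing from the current subscription. Requires the Az module and Connect-AzAccount.
function Register-LabProvider {
    param(
        [string[]] $Required = @(
            'Microsoft.Sql',
            'Microsoft.Synapse',
            'Microsoft.StreamAnalytics',
            'Microsoft.EventHub'
        )
    )
    $registered = Get-AzResourceProvider |
        Where-Object { $_.RegistrationState -eq 'Registered' } |
        Select-Object -ExpandProperty ProviderNamespace

    foreach ($ns in (Get-MissingProvider -Required $Required -Registered $registered)) {
        Write-Host "Registering $ns ..."
        Register-AzResourceProvider -ProviderNamespace $ns | Out-Null
    }
}
```

After signing in, run Register-LabProvider. Registration is asynchronous, so a provider may show a Registering state for a few minutes before it becomes Registered.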

Lab VM

Install Power BI Desktop on your lab computer or lab VM; it is required for Module 16.

Note that this lab VM is not the same VM as the one used to run the environment setup scripts below.

Environment setup instructions

Note:

The entire setup process will take from 1.5 to 2 hours to complete.

Azure Setup

Task 1: Create a resource group in Azure

  1. Log into the Azure Portal using your Azure credentials.

  2. On the Azure portal home screen, select the menu button in the top-left corner. Hover over Resource groups, then select + Create.

  3. On the Create a resource group screen, select your desired Subscription and Region. For Resource group, enter data-engineering-synapse (or another name that is unique within your subscription), then select Review + create. Copy the resource group name and save it in Notepad or a similar editor for later reference.

  4. Select the Create button once validation has passed.

Important: Take note of the exact resource group name you provided for the steps that follow.
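This task can also be done from PowerShell. The following is a minimal sketch, assuming the Az module and a session signed in with Connect-AzAccount; Test-ResourceGroupName and New-LabResourceGroup are hypothetical helper names. The name check follows Azure's resource group naming rules: 1 to 90 characters made of letters, digits, underscores, hyphens, periods, and parentheses, not ending in a period.

```powershell
# Hypothetical helper: checks a name against Azure's resource group naming rules.
function Test-ResourceGroupName {
    param([Parameter(Mandatory)] [string] $Name)
    # \w covers letters, digits, and underscore; \z anchors at the true end of the string.
    ($Name -match '^[-\w.()]{1,90}\z') -and -not $Name.EndsWith('.')
}

# Hypothetical helper: validates the name, then creates the resource group.
# Supports -WhatIf so you can preview the action without calling Azure.
function New-LabResourceGroup {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string] $Name = 'data-engineering-synapse',
        [Parameter(Mandatory)] [string] $Location
    )
    if (-not (Test-ResourceGroupName -Name $Name)) {
        throw "'$Name' is not a valid resource group name."
    }
    if ($PSCmdlet.ShouldProcess($Name, "Create resource group in $Location")) {
        # Resource group names only need to be unique within the subscription.
        if (Get-AzResourceGroup -Name $Name -ErrorAction SilentlyContinue) {
            throw "Resource group '$Name' already exists; choose another name."
        }
        New-AzResourceGroup -Name $Name -Location $Location
    }
}
```

For example, New-LabResourceGroup -Location eastus creates data-engineering-synapse in East US; eastus is only an illustrative region, so pass the one you intend to use.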

Task 2: Create an Azure VM for the deployment scripts

We highly recommend running the PowerShell scripts on an Azure virtual machine instead of your local machine. Doing so eliminates issues caused by pre-existing dependencies and, more importantly, network and bandwidth issues while the scripts run.

  1. In the Azure portal, type "virtual machines" in the top search bar, then select Virtual machines from the results.

  2. Select + Add on the Virtual machines page and then select the Virtual machine option.

  3. In the Basics tab, complete the following:

    • Subscription: select the appropriate subscription.
    • Resource group: select data-engineering-synapse (the resource group you created in the previous task).
    • Virtual machine name: synapse-lab-setup-vm (or a unique name if this one is not available).
    • Region: select the resource group's location.
    • Availability options: select No infrastructure redundancy required.
    • Image: select Windows 10 Pro, Version 1809 - Gen1.
    • Azure Spot instance: leave unchecked.
    • Size: select Standard_D8s_v3.
    • Username: enter labuser.
    • Password: enter a password you will remember.
    • Public inbound ports: select Allow selected ports.
    • Select inbound ports: select RDP (3389).
    • Licensing: select the option to confirm that you have an eligible Windows 10 license with multi-tenant hosting rights.

  4. Select Review + create. On the review screen, select Create. After the deployment completes, select Go to resource to go to the virtual machine.

  5. Select Connect from the actions menu and choose RDP.

  6. On the Connect tab, select Download RDP File.

  7. Open the RDP file and select Connect to access the virtual machine. When prompted for credentials, enter labuser for the username and the password you chose.

    Select Yes to connect despite security certificate warnings when prompted.

Task 3: Create Azure Synapse Analytics workspace

  1. Deploy the workspace through the following Azure ARM template (press the button below):

  2. On the Custom deployment form fill in the fields described below.

    • Subscription: Select your desired subscription for the deployment.

    • Resource group: Select the resource group you previously created.

    • Region: The datacenter where your Azure Synapse environment will be created.

      Important: The Region field under 'Parameters' will list the Azure regions where Azure Synapse Analytics is available as of November 2020. This will help you find a region where the service is available without being limited to where the resource group is defined.

    • Unique Suffix: This suffix will be used to name the resources created as part of your deployment. Make sure it follows the Azure resource naming conventions.

    • SQL Administrator Login Password: Provide a strong password for the dedicated SQL pool that will be created as part of your deployment. The password must meet the Azure SQL password rules. You will need this password in later steps, so make sure it is noted and stored securely.

  3. Select the Review + create button, then Create. The provisioning of your deployment resources will take approximately 13 minutes. Wait until provisioning successfully completes before continuing. You will need the resources in place before running the scripts below.

    Note: A deployment step related to a role assignment may fail. This error can safely be ignored.
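Before deploying, you may want to check the SQL Administrator Login Password against the Azure SQL complexity rules: 8 to 128 characters, with characters from at least three of four categories (uppercase letters, lowercase letters, digits, and non-alphanumeric characters). Test-PasswordComplexity is a hypothetical helper and a minimal sketch: it covers only the length and category rules, while Azure applies further checks, such as restrictions involving the login name.

```powershell
# Hypothetical helper: checks length and character-category rules for a password.
# Defaults match the Azure SQL limits (8 to 128 characters).
function Test-PasswordComplexity {
    param(
        [Parameter(Mandatory)] [string] $Password,
        [int] $MinLength = 8,
        [int] $MaxLength = 128
    )
    if ($Password.Length -lt $MinLength -or $Password.Length -gt $MaxLength) {
        return $false
    }
    # Count how many of the four character categories are present.
    $categories = 0
    if ($Password -cmatch '[A-Z]')       { $categories++ }  # uppercase
    if ($Password -cmatch '[a-z]')       { $categories++ }  # lowercase
    if ($Password -match '[0-9]')        { $categories++ }  # digits
    if ($Password -match '[^A-Za-z0-9]') { $categories++ }  # non-alphanumeric
    return ($categories -ge 3)
}
```

For example, Test-PasswordComplexity -Password 'Example.Pass1' returns True. The same helper can check the password chosen for the lab setup VM by passing -MinLength 12 -MaxLength 123, the length limits Azure applies to Windows VM administrator passwords.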

Before starting

Steps & Timing

The entire script will take between 1.5 and 2 hours to complete. Major steps include:

  • Configure Synapse resources
  • Download all data sets and files into the data lake (~15 mins)
  • Run the setup and the SQL pipeline (~30 mins)
  • Execute the Cosmos DB pipeline (~25 mins)

Task 1: Pre-requisites

Install these pre-requisites on your deployment VM before continuing.

Task 2: Download artifacts and install PowerShell modules

Perform all of the steps below from your deployment VM:

  1. Open a PowerShell window as an administrator and run the following commands to download the artifacts:

    mkdir c:\labfiles
    
    cd c:\labfiles
    
    git clone https://github.com/solliancenet/microsoft-data-engineering-ilt-deploy.git data-engineering-ilt-deployment
  • Install Azure PowerShell module

    Open Windows PowerShell as an administrator on your deployment VM and execute the following:

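    # Az and AzureRM must not be installed side by side.
    if (Get-Module -Name AzureRM -ListAvailable) {
        Write-Warning -Message 'AzureRM detected. Having both the AzureRM and Az modules installed at the same time is not supported, so AzureRM will be removed.'
        # Install Az first: Uninstall-AzureRm is provided by the Az.Accounts module.
        Install-Module -Name Az -AllowClobber -Scope CurrentUser
        Import-Module -Name Az.Accounts
        Uninstall-AzureRm -ErrorAction SilentlyContinue
    } else {
        Install-Module -Name Az -AllowClobber -Scope CurrentUser
    }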

    Note: You may be prompted to install the NuGet provider and warned that you are installing the module from an untrusted repository. Select Yes in both cases to proceed with the setup.

  • Install Az.CosmosDB module

    Install-Module -Name Az.CosmosDB -AllowClobber

    Note: If you receive a prompt that you are installing the module from an untrusted repository, select Yes to All to proceed with the setup.

  • Install SqlServer module

    Install-Module -Name SqlServer -AllowClobber
  • Install Azure CLI

    Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; rm .\AzureCLI.msi

IMPORTANT

  • Once the last command has completed, close the Windows PowerShell window so that a new session can import the newly installed Az.CosmosDB module.
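After reopening PowerShell, you can confirm that the modules and the Azure CLI from this task are available. This is a minimal sketch; Get-LabToolingStatus is a hypothetical helper name, and it only checks that each module is installed and that an az command can be found, not which versions are present.

```powershell
# Hypothetical helper: reports whether each module installed above, and the
# Azure CLI, can be found from the current session.
function Get-LabToolingStatus {
    param(
        [string[]] $ModuleName = @('Az', 'Az.CosmosDB', 'SqlServer')
    )
    foreach ($name in $ModuleName) {
        [pscustomobject]@{
            Name      = $name
            # Get-Module -ListAvailable returns nothing when the module is absent.
            Installed = [bool](Get-Module -ListAvailable -Name $name)
        }
    }
    [pscustomobject]@{
        Name      = 'az (Azure CLI)'
        Installed = [bool](Get-Command -Name az -ErrorAction SilentlyContinue)
    }
}
```

Run Get-LabToolingStatus | Format-Table to see the results. If the Azure CLI shows False, see the note about the "'az' is not recognized" error under Execute setup scripts.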

Task 3: Execute setup scripts

Perform all of the steps below from your deployment VM:

  • Open Windows PowerShell as an Administrator and execute the following:

    Set-ExecutionPolicy Unrestricted

    Note: If you are prompted to confirm the execution policy change, select Yes to All to proceed with the setup.

  • Execute the following to import the Az.CosmosDB module:

    Import-Module Az.CosmosDB
  • Change directories to the automation folder of the cloned repository:

    cd c:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\automation\
  • Execute Connect-AzAccount and sign in to your Microsoft user account when prompted.

    Warning: You may receive the message "TenantId 'xxxxxx-xxxx-xxxx-xxxx' contains more than one active subscription. The first one will be selected for further use." You can ignore this message at this point. When you run the environment setup, you will choose the subscription in which you deployed the environment resources.

  • Execute az login and sign in to your Microsoft user account when prompted.

    If you receive the error "The term 'az' is not recognized as the name of a cmdlet, function, script file, or operable program." even after closing and reopening the PowerShell window, restart your computer and repeat the steps in this task.

  • Execute .\01-environment-setup.ps1

  1. You will be prompted to set up your Azure PowerShell and Azure CLI context.

  2. If you have more than one Azure subscription, you will be prompted to enter the name of your desired subscription. You can copy and paste the value from the list to select one.

  3. Enter the name of the resource group you created at the beginning of the environment setup (such as data-engineering-synapse). This will make sure automation runs against the correct environment you provisioned in Azure.

    During the execution of the automation script, you may be prompted to approve installations from the PowerShell Gallery. Approve them to proceed with the automation.

    Note: This script will take between 90 and 150 minutes to complete.

Potential errors that you can ignore

You may encounter a few errors and warnings during the script execution. The errors below can safely be ignored:

  1. The following error may occur when creating SQL users and adding role assignments in the dedicated SQL pool, and can safely be ignored: Principal '[email protected]' could not be created. Only connections established with Active Directory accounts can create other Active Directory users.

  2. Toward the end of the script, you may see the following error. If you do, it can be safely ignored:

    Starting PowerBI Artifact Provisioning
    Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
    At C:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\solliance-synapse-automation\solliance-synapse-automation. char:15
    + ...   $result = Invoke-WebRequest -Uri $url -Method GET -ContentType "app ...
    +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
        + FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
    
    Cannot index into a null array.
    At C:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\solliance-synapse-automation\solliance-synapse-automation. char:5
    +     $homeCluster = $result.Headers["home-cluster-uri"]
    +     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
        + FullyQualifiedErrorId : NullArray

Task 4: Pause SQL pool

Note:

If you are not planning to use the Synapse workspace environment right away, follow the steps in this task to pause the SQL pool. Otherwise, you may incur significant costs.

  1. Navigate to the resource group into which you deployed this environment.

  2. Select the Dedicated SQL pool (SQLPool01).

  3. Select || Pause to pause the pool.

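The pool can also be paused from PowerShell with the Az.Synapse cmdlets that ship with the Az module. The following is a minimal sketch: Suspend-LabSqlPool is a hypothetical wrapper, you must pass your own workspace name (this guide does not list it, so check your resource group), and it assumes the pool object reports its state in a Status property.

```powershell
# Hypothetical wrapper: pauses a dedicated SQL pool unless it is already paused.
# Supports -WhatIf, which previews the action without calling Azure.
function Suspend-LabSqlPool {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $WorkspaceName,
        [string] $PoolName = 'SQLPool01'
    )
    if (-not $PSCmdlet.ShouldProcess("$WorkspaceName/$PoolName", 'Pause dedicated SQL pool')) {
        return
    }
    $pool = Get-AzSynapseSqlPool -WorkspaceName $WorkspaceName -Name $PoolName
    if ($pool.Status -eq 'Paused') {
        Write-Host "$PoolName is already paused."
        return
    }
    Suspend-AzSynapseSqlPool -WorkspaceName $WorkspaceName -Name $PoolName
}
```

Run it as Suspend-LabSqlPool -WorkspaceName '<your workspace name>'. When you are ready to use the pool again, Resume-AzSynapseSqlPool takes the same -WorkspaceName and -Name parameters.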

Task 5: Delete lab setup VM

If you created a virtual machine for this lab setup, you no longer need it.

  1. Open the VM in your Azure resource group, select Delete, then select Yes when prompted.

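Alternatively, the VM can be removed with Remove-AzVM from the Az module. This is a minimal sketch using the names suggested when you created the VM; adjust them if you chose different ones. Remove-LabSetupVm is a hypothetical wrapper. Depending on how the VM was created, related resources such as its disk, network interface, or public IP address may remain in the resource group afterward, so review the resource group when you are done.

```powershell
# Hypothetical wrapper around Remove-AzVM. ConfirmImpact = 'High' makes PowerShell
# ask for confirmation before deleting, similar to the portal's Yes prompt.
function Remove-LabSetupVm {
    [CmdletBinding(SupportsShouldProcess, ConfirmImpact = 'High')]
    param(
        [string] $ResourceGroupName = 'data-engineering-synapse',
        [string] $Name = 'synapse-lab-setup-vm'
    )
    if ($PSCmdlet.ShouldProcess("$ResourceGroupName/$Name", 'Delete virtual machine')) {
        # -Force skips Remove-AzVM's own prompt, since ShouldProcess already confirmed.
        Remove-AzVM -ResourceGroupName $ResourceGroupName -Name $Name -Force
    }
}
```

Run Remove-LabSetupVm -WhatIf first to preview the action, then run it without -WhatIf and confirm when prompted.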