Contents
- Demo setup: Realize Integrated Analytical Solutions with Azure Synapse Analytics
-
An Azure Account with the ability to create an Azure Synapse Workspace
-
Make sure the following resource providers are registered for your Azure Subscription.
- Microsoft.Sql
- Microsoft.Synapse
- Microsoft.StreamAnalytics
- Microsoft.EventHub
See further documentation for more information on registering resource providers on the Azure Portal.
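As a rough sketch, the providers listed above could also be registered from PowerShell instead of the portal. This assumes the Az PowerShell module is available (for example in Azure Cloud Shell) and that you have already signed in with `Connect-AzAccount`. The function name `Register-LabResourceProvider` is illustrative and is not part of the lab scripts.

```powershell
# Resource providers required by this lab (taken from the list above).
$requiredProviders = @(
    'Microsoft.Sql',
    'Microsoft.Synapse',
    'Microsoft.StreamAnalytics',
    'Microsoft.EventHub'
)

# Hypothetical helper: registers each namespace in the current subscription
# and reports the registration state that Azure returns.
function Register-LabResourceProvider {
    param(
        [string[]]$Namespace = $requiredProviders
    )

    foreach ($ns in $Namespace) {
        # Calling Register-AzResourceProvider for a provider that is already
        # registered is harmless.
        Register-AzResourceProvider -ProviderNamespace $ns |
            Select-Object -Property ProviderNamespace, RegistrationState
    }
}

# Usage (requires a signed-in Az session):
# Register-LabResourceProvider
```

Registration can take a few minutes, so a provider may report `Registering` for a while before it shows `Registered`.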
-
A Power BI Pro or Premium account to host Power BI reports used for the lab in Module 16.
Install Power BI Desktop on your lab computer or lab VM for Module 16.
Note that this is not the same VM as the one used to execute the environment setup scripts below.
Note:
The entire setup process will take from 1.5 to 2 hours to complete.
-
Log into the Azure Portal using your Azure credentials.
-
On the Azure Portal home screen, select the Menu button on the top-left corner (1). Hover over Resource groups (2), then select + Create (3).
-
On the Create a resource group screen, select your desired Subscription and Region. For Resource group, enter data-engineering-synapse (make sure the name is unique), then select the Review + Create button. Copy the resource group name and save it in Notepad or similar for later reference.
-
Select the Create button once validation has passed.
Important: Take note of the exact resource group name you provided for the steps that follow.
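For readers who prefer scripting, the portal steps above could be approximated with the Az module as follows. This is a minimal sketch that assumes you are already signed in with `Connect-AzAccount`. The helper name `New-LabResourceGroup` and the `eastus` region in the usage line are assumptions made for illustration only.

```powershell
# Hypothetical helper: returns the resource group if it already exists,
# otherwise creates it in the given region.
function New-LabResourceGroup {
    param(
        [Parameter(Mandatory)] [string]$Name,
        [Parameter(Mandatory)] [string]$Location
    )

    $existing = Get-AzResourceGroup -Name $Name -ErrorAction SilentlyContinue
    if ($existing) {
        return $existing
    }

    New-AzResourceGroup -Name $Name -Location $Location
}

# Usage; 'eastus' is only an example region:
# New-LabResourceGroup -Name 'data-engineering-synapse' -Location 'eastus'
```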
We highly recommend executing the PowerShell scripts on an Azure Virtual Machine instead of from your local machine. Doing so avoids issues caused by pre-existing dependencies and, more importantly, network and bandwidth problems while the scripts run.
-
In the Azure portal, type in "virtual machines" in the top search menu and then select Virtual machines from the results.
-
Select + Add on the Virtual machines page and then select the Virtual machine option.
-
In the Basics tab, complete the following:
| Field | Value |
| --- | --- |
| Subscription | Select the appropriate subscription |
| Resource group | Select data-engineering-synapse (the name of the resource group you created in the previous task) |
| Virtual machine name | synapse-lab-setup-vm (or a unique name if not available) |
| Region | Select the resource group's location |
| Availability options | Select No infrastructure redundancy required |
| Image | Select Windows 10 Pro, Version 1809 - Gen1 |
| Azure Spot instance | Set to Unchecked |
| Size | Select Standard_D8s_v3 |
| Username | Enter labuser |
| Password | Enter a password you will remember |
| Public inbound ports | Select Allow selected ports |
| Select inbound ports | Select RDP (3389) |
| Licensing | Select the option to confirm that you have an eligible Windows 10 license with multi-tenant hosting rights |
-
Select Review + create. On the review screen, select Create. After the deployment completes, select Go to resource to go to the virtual machine.
-
Select Connect from the actions menu and choose RDP.
-
On the Connect tab, select Download RDP File.
-
Open the RDP file and select Connect to access the virtual machine. When prompted for credentials, enter labuser for the username and the password you chose.
Click Yes to connect despite security certificate errors when prompted.
-
Deploy the workspace through the following Azure ARM template (press the button below):
-
On the Custom deployment form fill in the fields described below.
-
Subscription: Select your desired subscription for the deployment.
-
Resource group: Select the resource group you previously created.
-
Region: The datacenter where your Azure Synapse environment will be created.
Important: The Region field under 'Parameters' will list the Azure regions where Azure Synapse Analytics is available as of November 2020. This will help you find a region where the service is available without being limited to where the resource group is defined.
-
Unique Suffix: This unique suffix will be used to name the resources that will be created as part of your deployment. Make sure you follow the Azure resource naming conventions.
-
SQL Administrator Login Password: Provide a strong password for the SQLPool that will be created as part of your deployment. Visit here to read about password rules in place. Your password will be needed during the next steps. Make sure you have your password noted and secured.
-
Select the Review + create button, then Create. The provisioning of your deployment resources will take approximately 13 minutes. Wait until provisioning successfully completes before continuing. You will need the resources in place before running the scripts below.
Note: You may see a deployment step fail with a Role Assignment error. This error can safely be ignored.
The entire script will take between 1.5 and 2 hours to complete. Major steps include:
- Configure Synapse resources
- Download all data sets and files into the data lake (~15 mins)
- Run the setup and execute the SQL pipeline (~30 mins)
- Execute the Cosmos DB pipeline (~25 mins)
Install these pre-requisites on your deployment VM before continuing.
- Install VC Redist: https://aka.ms/vs/15/release/vc_redist.x64.exe
- Install MS ODBC Driver 17 for SQL Server: https://www.microsoft.com/download/confirmation.aspx?id=56567
- Install SQL CMD x64: https://go.microsoft.com/fwlink/?linkid=2082790
- Install Microsoft Online Services Sign-In Assistant for IT Professionals RTW: https://www.microsoft.com/download/details.aspx?id=28177
- Install Git client accepting all the default options in the setup.
- Windows PowerShell
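Before moving on, it can help to confirm that the command-line prerequisites are reachable from a new PowerShell window. The sketch below is one way to do that; the function name `Test-LabPrerequisite` is hypothetical, and it assumes the Git and SQL CMD installers added `git` and `sqlcmd` to the PATH.

```powershell
# Hypothetical helper: reports whether each command can be resolved on the PATH.
function Test-LabPrerequisite {
    param(
        [string[]]$Command = @('git', 'sqlcmd')
    )

    foreach ($name in $Command) {
        $found = Get-Command -Name $name -ErrorAction SilentlyContinue |
            Select-Object -First 1

        [pscustomobject]@{
            Command   = $name
            Installed = [bool]$found
            Path      = if ($found) { $found.Source } else { $null }
        }
    }
}

Test-LabPrerequisite | Format-Table -AutoSize
```

If a tool shows `Installed` as `False` right after installation, opening a new PowerShell window usually picks up the updated PATH.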
Perform all of the steps below from your deployment VM:
-
Open a PowerShell window as an administrator and run the following commands to download the artifacts:
mkdir c:\labfiles
cd c:\labfiles
git clone https://github.com/solliancenet/microsoft-data-engineering-ilt-deploy.git data-engineering-ilt-deployment
-
Install Azure PowerShell module
Open Windows PowerShell as an Administrator on your desktop and execute the following:
if (Get-Module -Name AzureRM -ListAvailable) {
    Write-Warning -Message 'Az module not installed. Having both the AzureRM and Az modules installed at the same time is not supported.'
    Uninstall-AzureRm -ea SilentlyContinue
    Install-Module -Name Az -AllowClobber -Scope CurrentUser
} else {
    Install-Module -Name Az -AllowClobber -Scope CurrentUser
}
[!Note]: You may be prompted to install NuGet providers, and receive a prompt that you are installing the module from an untrusted repository. Select Yes in both instances to proceed with the setup
-
Install the Az.CosmosDB module:
Install-Module -Name Az.CosmosDB -AllowClobber
[!Note]: If you receive a prompt that you are installing the module from an untrusted repository, select Yes to All to proceed with the setup.
-
Install the SqlServer module:
Install-Module -Name SqlServer -AllowClobber
-
Install Azure CLI
Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; rm .\AzureCLI.msi
IMPORTANT
- Once the last command has completed, close the Windows PowerShell window so you can import the newly installed Az.CosmosDB cmdlet.
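After reopening PowerShell, a quick check like the following can confirm that the modules from the previous steps are visible to the new session. This is a sketch only: `Get-MissingLabModule` is a hypothetical helper, and `Az.Accounts` is used here as a stand-in for "the Az module is installed".

```powershell
# Hypothetical helper: returns the names of modules that are NOT installed.
function Get-MissingLabModule {
    param(
        [string[]]$Name = @('Az.Accounts', 'Az.CosmosDB', 'SqlServer')
    )

    $Name | Where-Object { -not (Get-Module -ListAvailable -Name $_) }
}

$missing = @(Get-MissingLabModule)
if ($missing.Count -eq 0) {
    Write-Host 'All lab modules are installed.'
} else {
    Write-Warning "Missing modules: $($missing -join ', ')"
}
```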
Perform all of the steps below from your deployment VM:
-
Open Windows PowerShell as an Administrator and execute the following:
Set-ExecutionPolicy Unrestricted
[!Note]: If you receive a prompt that you are installing the module from an untrusted repository, select Yes to All to proceed with the setup.
-
Execute the following to import the Az.CosmosDB module:
Import-Module Az.CosmosDB
-
Change directories to the root of this repo within your local file system.
cd c:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\automation\
-
Execute Connect-AzAccount and sign in to your Microsoft user account when prompted.
[!WARNING]: You may receive the message "TenantId 'xxxxxx-xxxx-xxxx-xxxx' contains more than one active subscription. The first one will be selected for further use." You can ignore this at this point. When you execute the environment setup, you will choose the subscription in which you deployed the environment resources.
-
Execute az login and sign in to your Microsoft user account when prompted.
If you receive the following error, and have already closed and re-opened the PowerShell window, you need to restart your computer and restart the steps in this task:
The term 'az' is not recognized as the name of a cmdlet, function, script file, or operable program.
-
Execute .\01-environment-setup.ps1
-
You will be prompted to set up your Azure PowerShell and Azure CLI context.
-
If you have more than one Azure Subscription, you will be prompted to enter the name of your desired Azure Subscription. You can copy and paste the value from the list to select one.
-
Enter the name of the resource group you created at the beginning of the environment setup (such as data-engineering-synapse). This will make sure automation runs against the correct environment you provisioned in Azure.
During the execution of the automation script you may be prompted to approve installations from PS-Gallery. Please approve to proceed with the automation.
NOTE This script will take between 90 and 150 minutes to complete.
You may encounter a few errors and warnings during the script execution. The errors below can safely be ignored:
-
The following error may occur when creating SQL users and adding role assignments in the dedicated SQL pool, and can safely be ignored:
Principal '[email protected]' could not be created. Only connections established with Active Directory accounts can create other Active Directory users.
-
Toward the end of the script, you may see the following error. If you do, it can be safely ignored:
Starting PowerBI Artifact Provisioning
Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again.
At C:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\solliance-synapse-automation\solliance-synapse-automation. char:15
+ ... $result = Invoke-WebRequest -Uri $url -Method GET -ContentType "app ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
+ FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
Cannot index into a null array.
At C:\labfiles\data-engineering-ilt-deployment\setup\04\artifacts\environment-setup\solliance-synapse-automation\solliance-synapse-automation. char:5
+ $homeCluster = $result.Headers["home-cluster-uri"]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : NullArray
Note:
If you are not planning on using the Synapse workspace environment right away, follow the steps in this task to pause the SQL pool. Otherwise, you will incur potentially significant cost.
-
Navigate to the resource group into which you deployed this environment.
-
Select the Dedicated SQL pool (SQLPool01).
-
Select || Pause to pause the pool.
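The portal steps above could also be scripted. The sketch below assumes the Az.Synapse module is installed, that you are signed in with `Connect-AzAccount`, and that the resource group contains a single Synapse workspace. `Suspend-LabSqlPool` is a hypothetical helper name, not something provided by the lab scripts.

```powershell
# Hypothetical helper: pauses the dedicated SQL pool in the lab workspace.
function Suspend-LabSqlPool {
    param(
        [Parameter(Mandatory)] [string]$ResourceGroupName,
        [string]$SqlPoolName = 'SQLPool01'
    )

    # Assumption: the lab resource group holds exactly one Synapse workspace.
    $workspace = Get-AzSynapseWorkspace -ResourceGroupName $ResourceGroupName |
        Select-Object -First 1
    if (-not $workspace) {
        throw "No Synapse workspace found in resource group '$ResourceGroupName'."
    }

    Suspend-AzSynapseSqlPool -WorkspaceName $workspace.Name -Name $SqlPoolName
}

# Usage:
# Suspend-LabSqlPool -ResourceGroupName 'data-engineering-synapse'
```

`Resume-AzSynapseSqlPool` takes the same `-WorkspaceName` and `-Name` parameters when you are ready to use the pool again.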
You no longer need the virtual machine if you created one for this lab setup.
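If you would rather stop paying for the VM's compute without deleting it straight away, one option is to deallocate it. This is a minimal sketch that assumes the VM name from the earlier table (synapse-lab-setup-vm) and a signed-in Az session running somewhere other than the VM itself, such as Azure Cloud Shell. `Stop-LabSetupVm` is a hypothetical helper name.

```powershell
# Hypothetical helper: stops and deallocates the lab setup VM.
function Stop-LabSetupVm {
    param(
        [Parameter(Mandatory)] [string]$ResourceGroupName,
        [string]$VmName = 'synapse-lab-setup-vm'
    )

    # Stop-AzVM deallocates the VM by default; -Force skips the confirmation prompt.
    Stop-AzVM -ResourceGroupName $ResourceGroupName -Name $VmName -Force
}

# Usage:
# Stop-LabSetupVm -ResourceGroupName 'data-engineering-synapse'
```

A deallocated VM still incurs charges for its disks. `Remove-AzVM` deletes the VM itself, and its disks and network resources may need to be removed separately.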