Merge pull request #305 from Azure-Player/ver.1.4
Ver.1.4
NowinskiK authored Apr 26, 2023
2 parents ae8d1b7 + 988cdd7 commit 467cd6e
Showing 39 changed files with 800 additions and 61 deletions.
105 changes: 100 additions & 5 deletions README.md
@@ -18,6 +18,7 @@ The main advantage of the module is the ability to publish all the Azure Data Fa
- Integration Runtimes
- Managed Virtual Network
- Managed Private Endpoint
- Credential
* Finding the **right order** for deploying objects (no more worrying about object names)
* Built-in mechanism to replace, remove or add the properties with the indicated values (CSV and JSON file formats supported)
* Stopping/starting triggers
@@ -33,13 +34,14 @@ The main advantage of the module is the ability to publish all the Azure Data Fa
* Allows defining multiple files (objects) by wildcarding
* Global Parameters
* Support for Managed VNET and Managed Private Endpoint
* ⭐️ Incremental deployment (**NEW!**)
* Build function to support validation of files, dependencies and config
* Test connections (Linked Services)
* Generates a mermaid dependency diagram to be used in Markdown documents

# Known issues

The module accepts the **Credentials** type of object (when loading from files), but its deployment is skipped and not supported yet. [Read more here](https://github.com/SQLPlayer/azure.datafactory.tools/issues/156).
- **[Native CDC](https://learn.microsoft.com/en-us/azure/data-factory/concepts-change-data-capture)** objects are not yet supported.

# Overview

@@ -127,8 +129,6 @@ Publish-AdfV2FromJson -RootFolder "$RootFolder" -ResourceGroupName "$ResourceGro
Use the optional ```[-Stage]``` parameter to prepare the ADF JSON files with the appropriate property values for a given environment and deploy them correctly to that environment. See section: **How it works / Step: Replacing all properties environment-related** for more details.
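
For example, a minimal sketch of a staged deployment (all variable values are placeholders):

```powershell
# Deploy to UAT; property values are replaced using .\deployment\config-UAT.csv
Publish-AdfV2FromJson -RootFolder "$RootFolder" `
    -ResourceGroupName "$ResourceGroupName" `
    -DataFactoryName "$DataFactoryName" `
    -Location "$Location" `
    -Stage "UAT"
```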


> Detailed *Wiki* documentation - coming soon.

## Publish Options

The options allow you to control which objects should be deployed by including them in, or excluding them from, the list. First of all, you need to create the options object:
@@ -148,6 +148,8 @@ $opt = New-AdfPublishOption
* [Boolean] **FailsWhenPathNotFound** - indicates whether missing paths fail the script. (default: *true*)
* [Boolean] **DoNotStopStartExcludedTriggers** - when *true*, excluded triggers will not be stopped or restarted during deployment (default: *false*)
* [Boolean] **DoNotDeleteExcludedObjects** - when *true*, excluded objects will not be removed. Applies only when `DeleteNotInSource` is set to *true*. (default: *true*)
* [Boolean] **IncrementalDeployment** - specifies whether Incremental Deployment mode is enabled (default: *false*)
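
The sketch below shows how these flags might be combined (the values chosen are purely illustrative):

```powershell
# Create the options object and set selected flags
$opt = New-AdfPublishOption
$opt.DeleteNotInSource = $true              # remove objects absent from the source
$opt.DoNotDeleteExcludedObjects = $true     # ...but never delete excluded ones
$opt.StopStartTriggers = $true              # stop/start triggers around deployment
$opt.IncrementalDeployment = $false         # full (non-incremental) deployment
```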



Subsequently, you can define the needed options:
@@ -249,6 +251,7 @@ pipeline.ScdType[123]
trigger.*@testFolder
managedVirtualNetwork*.*
*managedPrivateEndpoint.*
factory.*
```
The full name of an object supported by the module is built as: `{Type}.{Name}@{Folder}`
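
A short sketch of how such patterns can be passed to the publish options (object names are illustrative):

```powershell
$opt = New-AdfPublishOption
$opt.Includes.Add('pipeline.ScdType[123]', '')       # pipelines ScdType1, ScdType2, ScdType3
$opt.Includes.Add('trigger.*@testFolder', '')        # all triggers located in folder 'testFolder'
$opt.Excludes.Add('*managedPrivateEndpoint.*', '')   # skip managed private endpoints
```
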
All potential combinations can be found in the code repository of ADF:
@@ -305,13 +308,47 @@ Currently ```Publish-AdfV2FromJson``` cmdlet contains two methods of publishing:

This section describes what the function ```Publish-AdfV2FromJson``` does step by step.

``` mermaid
graph LR;
S10[Create ADF] --> S15[Load files];
S15 --> S20[Update properties];
S20 --> S25[Deployment Plan]
S25 --> S30[Stop triggers];
S30 --> S40[Deployment];
S40 --> S45[Save Deployment State]
S45 --> S50[Delete objects];
S50 --> S60[Restart triggers];
```

## Step: Create ADF (if it does not exist)

💬 In the log you'll see the line: `STEP: Verifying whether ADF exists...`

You must have appropriate permissions to create a new instance.
The *Location* parameter is required for this action.

If the ADF instance already exists and `IncrementalDeployment` is ON, the process reads the Global Parameters in order to load the latest **Deployment State** from ADF.
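
Conceptually, this step behaves like the sketch below (this is not the module's actual code; variable names are placeholders):

```powershell
# Check whether the target ADF exists; create it when it does not
$adf = Get-AzDataFactoryV2 -ResourceGroupName $ResourceGroupName `
                           -Name $DataFactoryName -ErrorAction SilentlyContinue
if ($null -eq $adf) {
    # Location is required only for this scenario
    Set-AzDataFactoryV2 -ResourceGroupName $ResourceGroupName `
                        -Name $DataFactoryName -Location $Location | Out-Null
}
```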

## Step: Load files

💬 In the log you'll see the line: `STEP: Reading Azure Data Factory from JSON files...`

This step reads all local JSON files from the given directory (`RootFolder`).
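
An illustrative layout of such a folder (the set of subfolders depends on which object types your factory uses):

```
<RootFolder>
├── credential
├── dataflow
├── dataset
├── factory
├── integrationRuntime
├── linkedService
├── managedVirtualNetwork
├── pipeline
└── trigger
```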


## Step: Pre-deployment

💬 In the log you'll see the line: `STEP: Pre-deployment`

It prepares a new (empty) file in the `factory` folder if such a file doesn't exist.
The file is needed by the subsequent steps to keep the Deployment State in a Global Parameter.
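
Such a factory file might look roughly like the snippet below (the factory name and exact property set are only an assumption for illustration):

```json
{
    "name": "MyFactoryName",
    "properties": {
        "globalParameters": {}
    }
}
```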

> This step is enabled only when `IncrementalDeployment` is ON and `DeployGlobalParams` is ON.

## Step: Replacing all properties environment-related

💬 In the log you'll see the line: `STEP: Replacing all properties environment-related...`

This step will be executed only when the `[Stage]` parameter has been provided.
The whole idea of a CI/CD (Continuous Integration and Continuous Delivery) process is to deploy automatically, and without risk, onto the target infrastructure, supporting multiple environments. Each environment (or stage) has to run exactly the same code except for selected properties. Very often these properties are:
- Data Factory name
@@ -353,6 +390,7 @@ Column `type` accepts one of the following values only:
- managedVirtualNetwork
- managedPrivateEndpoint
- factory *(for Global Parameters)*
- credential

### Column NAME

@@ -421,7 +459,7 @@ Having that in mind, you can leverage variables defined in Azure DevOps pipeline

This parameter is optional. When defined, the process will replace all properties defined in the (CSV) configuration file.
The parameter can be either a full path to a CSV file (it must end with .csv) or just a stage name.
When you provide the parameter value 'UAT', the process will try to open the config file located at `.\deployment\config-UAT.csv`.
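
A hypothetical `config-UAT.csv` might look like the sketch below (the column set and JSON paths shown here are illustrative assumptions, not a definitive reference):

```
type,name,path,value
linkedService,LS_SQL_Stackoverflow,typeProperties.connectionString,"Server=uat-sql01;Database=Stackoverflow;"
trigger,TR_OnFileCreation,properties.runtimeState,"Stopped"
```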

> Use the optional [-Stage] parameter when executing the ```Publish-AdfV2FromJson``` cmdlet to replace property values with those specified in the config file(s).
@@ -489,28 +527,85 @@ If you prefer using JSON rather than CSV for setting up configuration - JSON fil
```


## Step: Deployment Plan

💬 In the log you'll see the line: `STEP: Determining the objects to be deployed...`

This step identifies the objects to be deployed using the `Includes` and `Excludes` lists provided in *Publish Options*.
Afterwards, if `IncrementalDeployment = true`, it excludes unchanged objects by comparing the hashes from the **Deployment State** with the hashes of the objects awaiting deployment.
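
The comparison works conceptually like the sketch below (this is not the module's code; `$plan` and `$previousState` are hypothetical variables, and the real hashes are computed on the object JSON *after* property replacement):

```powershell
$md5 = [System.Security.Cryptography.MD5]::Create()
function Get-ObjectHash([string] $json) {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($json)
    return ([System.BitConverter]::ToString($md5.ComputeHash($bytes))) -replace '-', ''
}

# Keep only objects whose hash differs from the one stored in the previous state
$toDeploy = $plan | Where-Object {
    $key = "$($_.Type).$($_.Name)"
    $previousState.Deployed.$key -ne (Get-ObjectHash $_.Json)
}
```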


## Step: Stopping triggers

💬 In the log you'll see the line: `STEP: Stopping triggers...`

This block stops all triggers that must be stopped for the deployment.
Since version 0.30 you have better control over which triggers should be omitted from stopping: simply add such triggers to the `Excludes` list and set the flag `DoNotStopStartExcludedTriggers` to *true*.
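
For instance (the trigger name below is hypothetical):

```powershell
$opt = New-AdfPublishOption
$opt.Excludes.Add('trigger.TR_AlwaysOn', '')    # do not touch this trigger at all
$opt.DoNotStopStartExcludedTriggers = $true     # excluded triggers stay untouched
```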

> The step might be skipped when `StopStartTriggers = false` in *Publish Options*

## Step: Deployment of ADF objects

💬 In the log you'll see the line: `STEP: Deployment of all ADF objects...`

This step is responsible for the actual work.
The mechanism is smart enough to publish all objects in the right order, so a developer no longer needs to worry about deployment failures caused by object (dependency) ordering.

> Check the *Publish Options* capabilities for filtering which objects are intended to be deployed.

## Step: Save deployment state (new in ver.1.4)

💬 In the log you'll see the line: `STEP: Updating (incremental) deployment state...`

After the deployment, this step prepares the list of deployed objects along with their hashes (MD5 algorithm). The array is wrapped up in JSON format and stored as a new global parameter, `adftools_deployment_state`, in the factory file.
The **Deployment State** speeds up future deployments by identifying the objects that have changed since the last deployment.
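
A trimmed example of the stored state (see the full sample in `adhoc/issue-195/deployment-state.json`):

```json
{
    "Algorithm": "MD5",
    "LastUpdate": "2023-04-07T20:44:55.2412164Z",
    "Deployed": {
        "pipeline.PL_users": "F6F77B172A9DEC5B5994EDA4513FB8D5",
        "linkedService.LS_AzureKeyVault": "4092683FDAE8590574A91B818D93AA94"
    }
}
```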

> The step might be skipped when `IncrementalDeployment = false` OR `DeployGlobalParams = false` in *Publish Options*.
> You'll see a warning in the console (log) when only `IncrementalDeployment = true` but `DeployGlobalParams = false`.

## Step: Deleting objects not in source

💬 In the log you'll see the line: `STEP: Deleting objects not in source ...`

This process removes from the ADF service all objects which couldn't be found in the source (ADF code).
The mechanism is smart enough to drop the objects in the right order.
Since version 0.30 you have better control over which objects should be omitted from removal: simply add such objects to the `Excludes` list and set the flag `DoNotDeleteExcludedObjects` to *true*.
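
For instance (the dataset name below is hypothetical):

```powershell
$opt = New-AdfPublishOption
$opt.DeleteNotInSource = $true                  # clean up orphaned objects in ADF
$opt.Excludes.Add('dataset.DS_Legacy', '')      # ...except this one
$opt.DoNotDeleteExcludedObjects = $true         # excluded objects are never deleted
```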

> The step might be skipped when `DeleteNotInSource = false` in *Publish Options*

## Step: Restarting all triggers

💬 In the log you'll see the line: `STEP: Starting all triggers...`

Restarting all triggers that should be enabled.

> The step might be skipped when `StopStartTriggers = false` in *Publish Options*

## Incremental Deployment

> This is a new feature (ver.1.4) in public preview.

Usually the deployment process takes some time, as it must go through all objects (files) and send them via the REST API to be deployed. The more objects in ADF, the longer the process takes.
In order to speed up the deployment process, you may want to use the new switch `IncrementalDeployment` (new in *Publish Options*) to enable a smart process that identifies and deploys only the objects that have changed since the last deployment.

### How it works
It uses the **Deployment State** kept in one of the Global Parameters, which is saved to and read from the ADF service.
When the mode is ON, the process performs a few additional steps across the entire deployment (see the sketch after the list):
1. Reads Global Parameters from ADF (unless the factory has just been created) to get the previous **Deployment State**
2. Identifies which objects are unchanged and excludes them from deployment
3. Calculates MD5 hashes of the deployed objects and merges them into the previous **Deployment State**
4. Saves the **Deployment State** as the `adftools_deployment_state` global parameter
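
A minimal sketch of enabling the mode (based on the adhoc test script; names and paths are illustrative):

```powershell
$opt = New-AdfPublishOption
$opt.IncrementalDeployment = $true

Publish-AdfV2FromJson -RootFolder "$RootFolder" -ResourceGroupName "$ResourceGroupName" `
    -DataFactoryName "$DataFactoryName" -Location "$Location" -Option $opt
```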

### Remember
* Incremental Deployment assumes that no one changes ADF objects manually in the cloud
* You must deploy Global Parameters in order to save Deployment State
* Objects' hashes are calculated after the properties have been updated. If you change the config for an object, it will be deployed
* If you want to redeploy all objects again, you've got two options:
  * Set `IncrementalDeployment = false`, OR
  * Manually delete the `adftools_deployment_state` global parameter in the target ADF service


# Selective deployment, triggers and logic

Publishing only selected ADF objects is not an easy thing. If you add dependencies between objects, and on top of that the need to stop triggers before deploying, the situation becomes even more difficult. Therefore, it might not always be obvious what will happen during the deployment given the flags you have set, whether an object exists (or not) in the source, and/or whether a trigger is Enabled (or Disabled) on the target ADF service you deploy to.
@@ -597,7 +692,7 @@ $params = @{
SubscriptionID = "{Your-subscriptionId-here}"
TenantID = "{Your-tenantId-here}"
ClientID = "SPN-ApplicationId"
ClientSecret = "SPN-Password"
}
# Example 1
40 changes: 40 additions & 0 deletions adhoc/issue-156/Get-CredentialsViaRestAPI.ps1
@@ -0,0 +1,40 @@
Select-AzSubscription -SubscriptionName 'Microsoft Azure Sponsorship'

$testAdf = 'BigFactorySample2'
$DataFactoryName = "$testAdf-17274af2"
$ResourceGroupName = 'rg-devops-factory'
$adf = Get-AzDataFactoryV2 -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
$adf

# Retrieve all credentials via API without parsing
$token = Get-AzAccessToken -ResourceUrl 'https://management.azure.com'
$authHeader = @{
    'Content-Type'  = 'application/json'
    'Authorization' = 'Bearer ' + $token.Token
}
$url = "https://management.azure.com$($adf.DataFactoryId)/credentials?api-version=2018-06-01"
$url

# Retrieve credentials via REST API and list them one by one
$ErrorActionPreference = 'Stop'
$r = Invoke-RestMethod -Method Get -Uri $url -Headers $authHeader -ContentType "application/json"
$items = $r.Value
foreach ($i in $items) {
    Write-Host "--- Credential: $($i.name) ..."
    ConvertTo-Json $i -Depth 50
}

# ------------------
. .\adhoc\~~Load-all-cmdlets-locally.ps1 # Load to this session

$adfi = Get-AzDataFactoryV2 -ResourceGroupName "$ResourceGroupName" -Name "$DataFactoryName"
Write-Host "Azure Data Factory (instance) loaded."
$adfi.DataFactoryId
$adfi.Location

$cr = Get-AzDFV2Credential -adfi $adfi | ToArray
Write-Host ("Credentials: {0} object(s) loaded." -f $cr.Count)
$cr.GetType()
$cr[0].GetType()


43 changes: 43 additions & 0 deletions adhoc/issue-156/publish-credentials.ps1
@@ -0,0 +1,43 @@
Select-AzSubscription -SubscriptionName 'Microsoft Azure Sponsorship'
Get-AzContext

. .\adhoc\~~Load-all-cmdlets-locally.ps1 # Load to this session

$currentPath = (.\adhoc\Get-RootPath.ps1)
$testAdf = 'BigFactorySample2'
$testPath = Split-Path $currentPath -Parent | Split-Path -Parent | Join-Path -ChildPath 'test' | Join-Path -ChildPath $testAdf
$testPath

$FileName = "$testPath\credential\credential1.json"
$body = (Get-Content -Path $FileName -Encoding "UTF8" | Out-String)
$json = $body | ConvertFrom-Json


#$resType = Get-AzureResourceType $obj.Type
$DataFactoryName = "$testAdf-17274af2"
$ResourceGroupName = 'rg-devops-factory'
$resType = 'Microsoft.DataFactory/factories/credentials'
$resName = "$DataFactoryName/credential1"

New-AzResource `
    -ResourceType $resType `
    -ResourceGroupName $ResourceGroupName `
    -Name "$resName" `
    -ApiVersion "2018-06-01" `
    -Properties $json `
    -IsFullObject -Force

# ------------------------------------------------------------
Select-AzSubscription -SubscriptionName 'MVP'

# Delete credential
$adfi = Get-AzDataFactoryV2 -ResourceGroupName "$ResourceGroupName" -Name "$DataFactoryName"
Remove-AdfObjectRestAPI -type_plural 'credentials' -name 'credential1' -adfInstance $adfi


# Test: Remove-AdfObjectIfNotInSource
$adfIns = Get-AdfFromService -FactoryName "$DataFactoryName" -ResourceGroupName "$ResourceGroupName"
$adf = Import-AdfFromFolder -FactoryName "$DataFactoryName" -RootFolder "$testPath"
Remove-AdfObjectIfNotInSource -adfSource $adf -adfTargetObj $adfIns.Credentials[0] -adfInstance $adfIns

$adfIns.Credentials[0].Name
69 changes: 69 additions & 0 deletions adhoc/issue-195/deployment-state.json
@@ -0,0 +1,69 @@
{
"Algorithm": "MD5",
"LastUpdate": "2023-04-07T20:44:55.2412164Z",
"Deployed": {
"dataset.src_sql_users_df": "004F6DB9C91349820C9AAABA18D6DD03",
"trigger.TR_TumblingWindowType_Demo": "5642E22C5075BDF738D86784B33ABBEF",
"dataset.src_badges": "D2C9A716B28310EB785B8031056AA766",
"linkedService.LS_DataLakeGen1": "825E070ED690D2EA767339EBC6D425A0",
"dataset.src_Users_BlobCsv": "48A22C2BFDD0C03FD189E1D1DBCBE40B",
"linkedService.LS_SQL_Stackoverflow": "B2104892A0CB2473E7A2D4095B43D09F",
"linkedService.LS_AzureKeyVault": "4092683FDAE8590574A91B818D93AA94",
"linkedService.LS_BlobSqlPlayer": "7EBCE98689E65EB927518A18F48CE2CF",
"dataset.DS_Dst_MovieCsvZip": "3EC168293499D28BA0D7303B934A942F",
"dataset.SQLProdsWithHash": "2F078EB5EE690ECA1A957F37C44DA8BC",
"linkedService.AzureTableStorage": "2CB4D93B3D1FADD989DBBA3D390F90D0",
"pipeline.PL_users": "F6F77B172A9DEC5B5994EDA4513FB8D5",
"pipeline.PL_badges": "A62B981973DD4237F8CDE09EF3DB6EAE",
"dataset.output_movies": "1F7FF8AE552D176BD78349F46E32E6BD",
"dataset.blobDimCustomerAccount": "71626CB17AD1005082E7A1CEBA11F4EE",
"dataset.SQLProduct": "A2BB90FBA84BDCA20DE90D5C6FD62D74",
"dataflow.UsersAndBadges": "C1285BC4C046B6D515A4274EE0144F53",
"linkedService.LS_AzureDatabricks": "3128B8955372B818E74783946DD2F385",
"pipeline.SCD-Type1": "20A1D75AA80E72E7C37B524E57099ED6",
"dataset.BadgesStatsByNameBlob_output": "51FB7258CD9F3DF149695039DAFDAA1D",
"dataset.DS_MovieCsvDup": "86E017B69C6AE80B0F6329D77D09D135",
"dataflow.DF_DistinctRows": "59D2C4EB560A9484E80B8F915E9A592D",
"dataset.DS_Src_MovieCsv": "2136BE33588E062807FCF5F3875E8217",
"dataset.blobDimCustomerAttributes": "0FDCB4314FCC7BA42C949F6E5E43F41D",
"IntegrationRuntime.IR-DEV2019-Link": "50A3ECC23F8B5F5357A84C293240BC79",
"pipeline.PL_UsersAndBadges": "55DA9EABFFFF87144D72BA1F4F4958E0",
"pipeline.CopyTableStorage": "2B99CB3E40A67BD89DEF9A6B14709043",
"pipeline.PL_Wait": "1AEACDAE7B24FEED870F8A0696911AA2",
"dataset.AW2016_Product_blob": "E8044670E0DAD41F8D6157D7C7AA792C",
"dataset.DS_Badges_100_mb": "EF02900046DA979A9F2EDD76AAF90086",
"trigger.TR_OnFileCreation": "1619D0D06B88307F8F88607CC2C0DCCA",
"dataflow.ScdType1-hash": "5E192EBB240A09E14A61E4387073C289",
"trigger.TR_for_WaitPipeline": "B621B526050ACB6B2A713844584108D1",
"linkedService.LS_SQLDev19_WWI": "4C7DC85CA4565BB9F35576D92682B3A7",
"linkedService.AzureSqlDatabaseAW2014": "93C3C201A2EEDA800B8FBD4BB93DFF7C",
"dataset.UsersBlob_output2": "3BC69DC2FDB933B9049A5F77F78EC1BF",
"IntegrationRuntime.MyAzureIntegrationRuntime": "E0610FAC63358EB36F4FED278A2B54C1",
"dataset.AW2016_Product_csv": "3F4F2F668D0F2C93779D77ACAF18F3B1",
"pipeline.PL_CopyMovies": "FE944B1E2CB6771AC85D8AC1BD3DFBBF",
"dataset.Sales_Orders": "8155CBD620E996F0FCF3FFBA54B05196",
"factory.SQLPlayerDemo": "EF65E01450D2E2583E1FB485634858BC",
"dataset.AzureTable1": "78BD3B0F8E5AFA5F01A402A78B82D276",
"dataset.blob_TripData_csv_gz": "DB02050E730BCE1F5F64FAF181781E75",
"dataset.DS_YellowV4byDate": "23A5CCDCCC4E27CD7C1F215F802984ED",
"dataflow.users": "6B7118DBDF8B90D3FC463063ED08F33A",
"dataset.AzureTable2": "5ADD72E928CA105FE3CA551B86EB11EB",
"dataset.csv_output3": "402F7D65DFB08C845824546444ABC282",
"pipeline.Copy-json-dynamic-path": "0967E822DFD462098C4885D290ADB458",
"pipeline.PL_PowerBITurkey": "BAFC468EAED13D0089C99243D4334F1B",
"dataset.pqt_output3": "A6B76C98DA8CA6E5948A859B47C15F2B",
"dataflow.DF_UGTurkey": "3A030D67347E840E8C1CC756F14C54FC",
"pipeline.PL_CopyMovies_with_param": "C2AD739CDF97DD1850F945A36476F436",
"dataset.DS_Blob_json": "97944AE0D4811CC7209A5A4FDEC9A742",
"pipeline.PL_DynamicFileNameWithDate": "C1C638EF02A048C64CA7F566ECF0C1C0",
"dataset.dsblobtwitters": "F2491A87DC31C1CDC3D911874944DD8E",
"pipeline.PL_SimpleCopy": "217E9D7E37A5BD9D351F7013107FDF5B",
"linkedService.AzureSqlDatabase1": "E5F86FC51B029B639C42297AEF98ED17",
"dataflow.badgesGroupByName2": "534B837954E1F5325064E2DBF4926BE2",
"IntegrationRuntime.AzureIR": "37C066F01768D4B0779AB1BDFA902A15",
"dataset.src_BadgesBlobWithHeader": "3B191824DD75E3A68C71FB7B7364BB92",
"dataset.src_sql_Users": "44561BC28DAB848468F5477B6A0FC9F8",
"dataflow.CopyFlow": "7415C94D19790FBAD544135FA931EDB2",
"linkedService.LS_SQLPlayer_ADLS2": "7CA282D8C6E1B41D5E1C17807283620B"
}
}
26 changes: 26 additions & 0 deletions adhoc/issue-195/test-195.ps1
@@ -0,0 +1,26 @@
Import-Module ".\azure.datafactory.tools.psd1" -Force
#Get-Module

$ErrorActionPreference = 'Stop'
$VerbosePreference = 'Continue'
#$DebugPreference = 'Continue'

#. .\adhoc\~~Load-all-cmdlets-locally.ps1 # Load to this session

$opt = New-AdfPublishOption
#$opt.Excludes.Add('*.*', '')
$opt.Includes.Add('link*.*', '')
$opt.Includes.Add('fac*.*', '')
$opt.IncrementalDeployment = $true
$ResourceGroupName = 'rg-devops-factory'
$DataFactoryName = "adf2-99443"
$RootFolder = "D:\GitHub\SQLPlayer\azure.datafactory.tools\test\adf2"
$Location = "UK South"

Import-Module ".\azure.datafactory.tools.psd1" -Force
$adf = Publish-AdfV2FromJson -RootFolder "$RootFolder" -ResourceGroupName "$ResourceGroupName" -DataFactoryName "$DataFactoryName" -Location "$Location" -Option $opt
$adf


$res = Get-GlobalParam -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
