
[Question]: Inconsistent slot swap duration using AzureAppServiceManage@0 for Web Apps on a dedicated Linux App Service plan #19273

Open
Ruud2000 opened this issue Nov 14, 2023 · 35 comments

Comments

@Ruud2000

Ruud2000 commented Nov 14, 2023

Task name

AzureAppServiceManage@0

Task version

0.228.1

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Windows 2019 datacenter-core-g2

Question

Recently we introduced a staging deployment slot for our Web Apps. So each Web App now has a staging and production slot. All Web Apps run on a dedicated Linux App Service plan (P1v3). Average CPU percentage is between 10 and 20, and average memory percentage around 80.

We now deploy our software to the staging slot and use the Azure DevOps task AzureAppServiceManage@0 to swap the staging and production slots. The duration of a swap is not consistent between deployments. Most of the time the duration is between 1m 30s and 2m 30s, but we also have occurrences where a swap takes more than 12 minutes. Especially when multiple Web Apps have a slow swap, the pipeline takes a very long time, risking hitting the 60-minute timeout.
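
For context, a minimal sketch of the swap step as we use it; the service connection, app and resource group names are placeholders and the inputs may need adjusting for your setup:

- task: AzureAppServiceManage@0
  displayName: Swap staging into production
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection
    Action: 'Swap Slots'
    WebAppName: 'my-web-app'                     # placeholder Web App name
    ResourceGroupName: 'my-resource-group'       # placeholder resource group
    SourceSlot: 'staging'                        # swapped with the production slot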

The diagnostics show the swap starts by invoking:

[POST]https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/slotsswap?api-version=2016-08-01

Then we see the following call being invoked every 15 seconds, returning an HTTP 202 response:

[GET]https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/operationresults/08021b2d-33b1-4e10-bddb-8ac2b7ebd2cd?api-version=2016-08-01

Eventually, after slightly more than 12 minutes, this same call returns an HTTP 200 response and the swap is complete.

When we execute a swap in the Azure Portal we never seem to hit a slow swap. Looking at the developer tools in the browser while executing a swap in the portal shows that a more recent version of the swap API is used: the portal uses slotsswap?api-version=2018-11-01, while AzureAppServiceManage@0 uses api-version=2016-08-01. Could this perhaps explain the inconsistent durations?
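
For comparison, the swap can also be invoked directly against the management API with the newer api-version the portal uses, for example via az rest (just a sketch; the subscription, resource group and site names are placeholders):

az rest --method post \
  --uri "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Web/sites/<site-name>/slots/staging/slotsswap?api-version=2018-11-01" \
  --body '{"targetSlot": "production"}'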

I found this question from November 2021 which is almost identical to our situation: https://learn.microsoft.com/en-us/answers/questions/612601/optimizing-cd-pipeline-when-swapping-multiple-web
Unfortunately I have not yet been able to verify whether swap times are more consistent when using the AzureCLI@2 task, as suggested in the answer to that question, because our self-hosted build agent currently has no Azure CLI installed.

@ivanBereznev

Experiencing the same issue. It takes 4-5 minutes to swap slots for a single web app. I tried both AzureAppServiceManage@0 and AzureCLI@2, and although the latter seems to be slightly faster, all the results are still in the same ballpark.

@211211

211211 commented Mar 27, 2024

Still facing the same issue in March 2024 with AzureAppServiceManage@0.
Switched to AzureCLI@2 and it works fine.

My command:
az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}
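
Wrapped in pipeline YAML that looks roughly like this (a sketch; the service connection name and variables are placeholders):

- task: AzureCLI@2
  displayName: Swap staging into production
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az webapp deployment slot swap \
        --resource-group "$(resourceGroup)" \
        --name "$(appName)" \
        --slot staging \
        --target-slot production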

@devdeer-alex

devdeer-alex commented Apr 19, 2024

I think this is related to Azure slots itself. The task just waits until the slot is swapped, and this sometimes takes a ridiculously long time. It's all over the usual discussions, like on Stack Overflow.

I currently randomly get the following output after 20+ minutes:

Starting: Swap Slot api-dd-alerting
==============================================================================
Task         : Azure App Service manage
Description  : Start, stop, restart, slot swap, slot delete, install site extensions or enable continuous monitoring for an Azure App Service
Version      : 0.238.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/deploy/azure-app-service-manage
==============================================================================
Warming-up slots
Swapping App Service '***' slots - 'deploy' and 'production'
Successfully updated deployment History at https://***-deploy.scm.azurewebsites.net/api/deployments/35831713538731739
Successfully updated deployment History at https://***.scm.azurewebsites.net/api/deployments/35831713538731739
##[error]Error: Failed to swap App Service '***' slots - 'deploy' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site '***' because the 'deploy' slot did not respond to http ping. (CODE: 417)
Finishing: Swap Slot api-dd-alerting

@P-DHrestak

Still facing the same issue in March 2024 with AzureAppServiceManage@0. Switched to AzureCLI@2 and it works fine.

My command: az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}

Tried this solution, but using the AzureCLI@2 task takes just as long (20+ minutes) as the AzureAppServiceManage one. The activity log has no useful data in it.

@tobias-johansson-nltg

Still facing the same issue in March 2024 with AzureAppServiceManage@0. Switched to AzureCLI@2 and it works fine.
My command: az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}

Tried this solution, but using the AzureCLI@2 task takes just as long (20+ minutes) as the AzureAppServiceManage one. The activity log has no useful data in it.

We also tried this, but with the same result as using AzureAppServiceManage@0. Is there no way of getting more information about what it is actually doing? Our deploy pipeline contains quite a few steps, including creating and deleting a database copy, and the two swaps are the steps that take by far the most time :)

@omer-glazer

Same here.
Our deployment swap takes up to 11 minutes, with no visible reason in the activity or output logs.

@chrisflem

Same here.
I have noticed that if I access the slot in a browser, the swap completes shortly after. Is there a bug in the code calling the slot, since it works when I do it manually?

@goodmanmd

We ran into this during a deploy last night. Both slots were accessible via browser and yet the task timed out after 23 minutes (!) with this error:

Error: Failed to swap App Service 'xxx' slots - 'staging' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site 'xxx' because the 'staging' slot did not respond to http ping. (CODE: 417)

Is the task actually looking at HTTP rather than HTTPS? If so, that could explain what's going on. Our site redirects HTTP => HTTPS and therefore would not return a 2xx response code for any HTTP request, if that's what the script is looking for to determine success. Even if it's not using HTTP, in our app all requests to / redirect to an auth screen, so the same issue could still apply.

FWIW, we fell back to manually swapping slots for our apps via the portal, and those completed successfully within 30-60 seconds.
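
As a quick sanity check, you can see what status code the slot's root path returns over plain HTTP versus HTTPS, for example (hypothetical host name):

curl -s -o /dev/null -w "%{http_code}\n" http://my-web-app-staging.azurewebsites.net/
curl -s -o /dev/null -w "%{http_code}\n" https://my-web-app-staging.azurewebsites.net/

A 301/302 on the plain-HTTP call would be consistent with the redirect theory above.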

@DennisJensen95

We ran into this during a deploy last night. Both slots were accessible via browser and yet the task timed out after 23 minutes (!) with this error:

Error: Failed to swap App Service 'xxx' slots - 'staging' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site 'xxx' because the 'staging' slot did not respond to http ping. (CODE: 417)

Is the task actually looking at HTTP rather than HTTPS? If so, that could explain what's going on. Our site redirects HTTP => HTTPS and therefore would not return a 2xx response code for any HTTP request, if that's what the script is looking for to determine success. Even if it's not using HTTP, in our app all requests to / redirect to an auth screen, so the same issue could still apply.

FWIW, we fell back to manually swapping slots for our apps via the portal, and those completed successfully within 30-60 seconds.

Besides the varying deployment times, which we also experience, we are also seeing the same stochastic timeout as you, @goodmanmd; if you rerun it, it succeeds. There are no indications of why this happens. We are using AzureCLI@2 for the swap operation. How are you doing the swap, @goodmanmd?

@goodmanmd

goodmanmd commented May 16, 2024

@DennisJensen95 for this particular app our deploys are infrequent - perhaps once or twice a year. In this case we fell back to swapping the slots via the Azure Portal, as it was only 3 applications with 2 swaps each (staging, production, last-known-good).

Edit: Re-reading the question, I think you may be asking what method we're using for the automated swap in our pipeline -- we are currently using AzureAppServiceManage@0.

@ash-skelton

It would be great if Microsoft acknowledged this. We are seeing the exact same thing (using AzureAppServiceManage@0). It's happening sporadically across a few of our apps, but it has definitely been getting worse.

@pumacln

pumacln commented Jun 3, 2024

@DennisJensen95
@goodmanmd

I am having the same issue.
We use Azure PowerShell via Octopus Deploy to Start / Stop / Swap slots.

# Start the Staging Slot
Start-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -Slot "Staging"
# Swap the staging slot into production
Switch-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -SourceSlotName "Staging" -DestinationSlotName "Production"
# Stop the Staging Slot
Stop-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -Slot "Staging"

The behavior is the same: sometimes the swap operation will just time out. Re-running works 99% of the time.

Where is @microsoft or @Azure support?

@Saturate

Saturate commented Jun 4, 2024

We also see random swaps taking 20+ minutes for a Node.js application. Sometimes they time out; rerunning often works.

@rvvincelli

We're having this too, but the bad thing is that sometimes, when the swap fails, the staging slot is left corrupted: the env vars from the prod slot get poured into the staging slot. Such a swap should be a transaction. We contacted the Azure team on this one, but they were unable to fix or even acknowledge the issue.

@StephenWBertrand

We are seeing the swaps take a long time, but they also basically lock up the production slot: requests start taking forever or just get dropped. Normally CPU is under 10% all the time; during a deployment we jump to 50%, which is still plenty of headroom, but then something just sort of hangs for a bit. Sometimes the swap is successful but the site seems down for a few minutes, and sometimes the swap doesn't work and the old version pops back up after a few minutes of the site appearing down.

Kind of goes against the whole no-downtime idea of using slots :)

@goleafs

goleafs commented Aug 22, 2024

Very similar issues have started for us. Deployments/swaps used to succeed fine with no interruption, although sometimes slower than others.

We host 2 web apps on one App Service plan. Now when one tries to swap staging to production, not only does it fail, it ends up taking down the other site due to the shared App Service plan and throttled resources. At that point everything is dead until the instances can be restarted.

As stated, everything had been working flawlessly; this seems due to some internal MS change.

Would the Azure CLI help here? I don't see why. We could remove the swap from the pipeline and try it manually, but what kind of automation is that?

@michalkrzych

We had a similar issue with slot swapping and spoke to Microsoft about it. We were told to try this:

Please add the settings below to the app settings on the staging slot:

  1. WEBSITE_SWAP_WARMUP_PING_PATH=/ : This setting warms up the staging slot in order to complete the swap operation. I kindly suggest adding this to the staging slot first and checking whether it resolves the issue. If not, then along with this setting add the one below as well:
  2. WEBSITE_OVERRIDE_STICKY_DIAGNOSTICS_SETTINGS=0

Please refer to the articles below for more information about slot swapping.
A Subtle Gotcha with Azure Deployment Slots and ASP.NET Core | You’ve Been Haacked
Set up staging environments - Azure App Service | Microsoft Learn

In our case, the first option has magically fixed the issue with slot swapping - well, we've only been able to observe this for the last 2 or 3 days, but we haven't seen any errors yet.
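
For anyone who wants to apply the suggested settings from the command line rather than the portal, something like this should work (resource group and app name are placeholders; --slot-settings marks them as sticky to the staging slot):

az webapp config appsettings set \
  --resource-group <resource-group> \
  --name <app-name> \
  --slot staging \
  --slot-settings WEBSITE_SWAP_WARMUP_PING_PATH=/ WEBSITE_OVERRIDE_STICKY_DIAGNOSTICS_SETTINGS=0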

@FrancescoBonizzi

Same problem here.
@michalkrzych, just a question: you said that you edited this value, WEBSITE_SWAP_WARMUP_PING_PATH, but its default is already /. I don't understand how you changed it!

Thanks

@michalkrzych

Same problem here. @michalkrzych, just a question. You said that you edited this value: WEBSITE_SWAP_WARMUP_PING_PATH, but the default is /. I didn't understand how you changed it!

Thanks

Apologies for the confusion. I have only added this setting.

BTW, these settings haven't made an impact on our slot-swapping issue. It's still happening and we are still troubleshooting with Microsoft. The next thing to check is the app startup in the slot; apparently when using VNets, Key Vaults and managed identities, some of the settings aren't copied from the parent slot, so they have to be added manually to ensure the app can start up without errors in order for the slot swap to succeed.

@rvvincelli

rvvincelli commented Sep 19, 2024

Same problem here. @michalkrzych, just a question. You said that you edited this value: WEBSITE_SWAP_WARMUP_PING_PATH, but the default is /. I didn't understand how you changed it!
Thanks

Apologies for the confusion. I have only added this setting.

BTW. these settings haven't made an impact on our slot swapping issue. It's still happening and we are still troubleshooting with Microsoft. Next thing to check is the app start up in the slot, apparently when using vnets, kvs, managed identities, some of the settings aren't being copied from the parent slot so have to be added manually to ensure the app can start up without errors in order for the slot swapping to succeed.

Hi @michalkrzych! Honestly, we gave up on this after a lot of inconclusive debugging with the Azure/Mindtree teams and resorted to preemptively patching the staging slot. So basically, every time the slot swap fails, we run:

FIX-STAGING-SLOT:
    if: ${{ failure() && ((github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.ref == 'refs/heads/master-php8') }}
    needs: [SWAP-STAGING-TO-PRODUCTION]
    runs-on: ubuntu-latest
[...]

Notice the failure() condition together with needs, so that the GitHub Actions job only runs if the swap fails. And the command (repeated for each env var that is slot-specific):

az webapp config appsettings set --resource-group ${{ vars.RESOURCE_GROUP }} --name ${{ vars.WEBAPP_NAME }} --slot ${{ vars.SLOT_NAME }} --slot-settings APP_DEBUG="${{ vars.APP_DEBUG }}"
echo "Setting environment variable: APP_ENV"

The issue is: a slot swap is not a transaction. In our case it sometimes just hangs and fails because of internal issues we still have to address (even with those ping/health env vars etc.), but no matter what, a swap should be a transaction and it is not. The team kind of avoided acknowledging this, but it is evident: the swap fails and the staging slot is left with the prod env vars.

In particular, what happens is that the staging slot gets corrupted because its env vars get overwritten with the env var values from the prod slot, rendering the staging slot unusable (especially if you have Key Vault-backed env vars on segregated key vaults) and effectively breaking the whole blue-green swap. Depending on your scheme (e.g. canary with % traffic), it can break prod too.

@devdeer-alex

@Ruud2000 Just wondering which SKU your App Service plan is running on? We've just moved from the deprecated S1 to P0v3 (50 bucks more per month 😒). This has solved the timing issues so far.

@jcrichlake

Any update on this? Our team is having this issue as well. At the very least could the CLI be updated to not hang for 10+ minutes?

@jcrichlake

@Ruud2000 Just wondering which SKU your App Service Plan is running on? We've just moved from deprecated S1 to P0v3 (50 bucks more per month 😒). This solved timing issues so far.

We've been on a premium SKU but are still having this issue 😞

@Ruud2000
Author

Ruud2000 commented Oct 3, 2024

@Ruud2000 Just wondering which SKU your App Service Plan is running on? We've just moved from deprecated S1 to P0v3 (50 bucks more per month 😒). This solved timing issues so far.

We run on P1v3

@KryptoBeard

We are also running into this issue. 10+ minutes for a simple app service swap...

@ampandres

We are experiencing the same inconsistently slow slot swaps during the operation, plus high CPU and memory (80-90 percent). We are using P1v3.

@rvvincelli

rvvincelli commented Nov 11, 2024

Another thing that helped us here: make sure you do not perform any az config operations right before launching the swap. Some of them (e.g. updating the SCM whitelist) result in a soft restart of the instances; we noticed great improvements after we added some sleeps in between.
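
As a rough sketch of the ordering we ended up with (placeholder names; the two-minute sleep is what we settled on):

# apply configuration changes first
az webapp config appsettings set --resource-group <resource-group> --name <app-name> --settings SOME_SETTING=value
# give the instances time to finish the soft restart triggered by the config change
sleep 120
# only then launch the swap
az webapp deployment slot swap --resource-group <resource-group> --name <app-name> --slot staging --target-slot production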

@zdenek-jelinek

zdenek-jelinek commented Nov 12, 2024

I'm observing the exact same behavior as in the post above - my pipeline applies infrastructure-as-code changes that always contain differences (due to App Service changing the App Insights connection string, another story...) and this leads to the swap getting stuck approx. 50% of the time.

Restarting the App Service swap source slot manually in the Azure Portal helps. So I added a stop + start of the source slot to the pipeline and it gets stuck much less frequently.
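
For reference, that stop + start step looks roughly like this with the Azure CLI (a sketch; resource names are placeholders):

az webapp stop --resource-group <resource-group> --name <app-name> --slot staging
az webapp start --resource-group <resource-group> --name <app-name> --slot staging
az webapp deployment slot swap --resource-group <resource-group> --name <app-name> --slot staging --target-slot production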

I'm writing this because some people in this thread mentioned that changing configuration properties helped, while others state it did not. Changing the configuration causes a restart, which may result in the swap working again, so it may appear that a configuration change helped when it probably did not. I already have all of the properties mentioned in this thread set up.

Also, I think this is not an issue with the pipeline task but with Linux App Service itself. The same behavior happens if I use the Azure CLI directly.

I want to try some more things in the deployment pipeline, like waiting for a bit or polling the URL, and see what happens. I will report back if I find anything that helps.

I'm getting this on both P0v3 and P1v3.

@rvvincelli Could you share your sleep durations, please? Have you tried different values? Did you observe issues?

@rvvincelli

Hi @zdenek-jelinek !

We sleep for two minutes after each az config change. After interleaving all these sleeps the rollout duration got about 10% slower, but we almost never run into issues anymore.

I think two issues got intertwined in this thread:

  • one is the slot swap failing altogether because of unreadiness issues (possibly addressed by avoiding az config changes right before the swap, etc.)
  • the other is that if there is no warm-up path (or it doesn't return a 2xx), the incoming slot doesn't get swapped in for good

Finally, no matter the scenario, it shouldn't be the case that the slot is left inconsistent (i.e. env vars get mixed), but sometimes that happens too.

@v-gayatrij
Contributor

@Ruud2000, thanks for reporting this. For further investigation, could you please share complete debug logs by setting the variable system.debug = true?
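
For anyone providing logs, the debug flag can be set at the pipeline level, for example:

variables:
  system.debug: 'true'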

@Ruud2000
Author

Ruud2000 commented Dec 8, 2024

@Ruud2000 , Thanks for reporting this. For further investigation, could you please share complete debug logs by setting variable system.debug = true

Unfortunately I cannot, since I'm no longer working at the client where we faced this issue. But since more people are experiencing the same issue, hopefully someone will be able to provide debug logs.

@zdenek-jelinek

zdenek-jelinek commented Dec 30, 2024

I stopped being able to reproduce this issue around Nov 19th, coinciding with App Service maintenance in West Europe, where my instance is located.

I have removed the manual restart steps that helped mitigate the issue and still have not managed to reproduce it for several days.

@FrancescoBonizzi

FrancescoBonizzi commented Jan 22, 2025

@zdenek-jelinek Are you saying that it now swaps fast?

@zdenek-jelinek

zdenek-jelinek commented Jan 22, 2025

@FrancescoBonizzi I am consistently seeing swaps take 1:30 - 2 min for P0v3 and P1v3 Linux App Services right after deploying a new artifact into the source slot, whereas previously (i.e. before Nov 2024) they got stuck until the pipeline timed out more often than not.

@FrancescoBonizzi

Thanks @zdenek-jelinek. I was trying this before 2024 and had to roll back everything; now it seems time to try again.
