[Question]: Inconsistent slot swap duration using AzureAppServiceManage@0 for Web Apps on dedicated Linux App Service plan #19273
Comments
Experiencing the same issue. It takes 4-5 minutes to swap slots for a single web app. Tried both
Still facing the same issue as of March 2024 with my command:
I think this is related to Azure slots itself. The task just waits until the slot is swapped, and this sometimes takes a ridiculously long time. It's all over the usual discussions, like on SO. I currently randomly get the following output after 20+ minutes:
Same here.
Same here.
We ran into this during a deploy last night. Both slots were accessible via browser and yet the task timed out after 23 minutes (!) with this error:
Is the task actually looking at HTTP rather than HTTPS? If so, that could explain what's going on. Our site redirects HTTP => HTTPS and therefore would not return a 2xx response code for any HTTP request, if that's what the script looks for to determine success. Even if it's not using HTTP, in our app all requests to / redirect to an auth screen, so the same issue could still apply. FWIW, we fell back to manually swapping slots for our apps via the portal, and those completed successfully within 30-60 seconds.
Besides the varying deployment times, which we also experience, we are also seeing the same stochastic timeouts as you, @goodmanmd; if you then rerun it, it succeeds. There are no indications of why this happens. We are using AzureCLI@2 for the swap operation. How are you performing the swap, @goodmanmd?
@DennisJensen95 for this particular app our deploys are infrequent - perhaps once or twice a year. In this case we fell back to swapping the slots via the Azure Portal, as it was only 3 applications with 2 swaps each (staging, production, last-known-good). Edit: re-reading the question, I think you may be asking what method we're using for the automated swap in our pipeline -- we are currently using AzureAppServiceManage@0.
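For context, the CLI equivalent of the swap that AzureCLI@2 users in this thread run is roughly the following; this is a minimal sketch, and the resource group and app names are placeholders rather than values from this issue:

```bash
# Placeholder names; substitute your own resource group and app.
az webapp deployment slot swap \
  --resource-group my-resource-group \
  --name my-web-app \
  --slot staging \
  --target-slot production
```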
Would be great if Microsoft acknowledged this. We are seeing the exact same thing (using AzureAppServiceManage@0). It's happening sporadically across a few of our apps, but it has definitely been getting worse.
I am having the same issue.
The behavior is the same: sometimes the swap operation will just time out. Re-running works 99% of the time. Where is @microsoft or @Azure support?
We also see random swaps taking 20+ minutes for a Node.js application. Sometimes they time out; rerunning often works.
We're having this too, but the bad thing is that sometimes, when the swap fails, the staging slot is left corrupted: the envvars from the prod slot get poured into the staging slot. Such a swap should be a transaction. We contacted the Azure team on this one, but they were unable to resolve it or even acknowledge the issue.
We are seeing swaps take a long time, but they also basically lock up the production slot: requests start taking forever or get dropped. Normally CPU is under 10% all the time, then just for a deployment we jump to 50%, which is still plenty of headroom, but then something just sort of hangs for a bit. Sometimes the swap is successful but the site seems down for a few minutes, and sometimes the swap doesn't work and the old version pops back up after a few minutes of the site appearing down. Kind of goes against the whole no-downtime idea of using slots :)
Very similar issues have started for us. Deployments/swaps used to succeed fine with no interruption, although sometimes slower than others. We host 2 web apps on one App Service plan. Now when one is trying to swap staging to production, not only will it fail, it ends up taking down the other site due to the shared App Service plan and throttled resources. At this point everything is dead until the instances can be restarted. As stated, everything had been working flawlessly, so this seems due to some internal MS change. Would the Azure CLI help here? I don't see why. We could remove the swap from the pipeline and try it manually, but what kind of automation is that?
We had a similar issue with slot swapping and spoke to Microsoft about it. We were told to try this: please add the below setting to the app settings on the staging slot:
Please refer to the below articles for more information about swapping slots. In our case, the first option has magically fixed the issue with slot swapping - well, we've only been able to observe this for the last 2 or 3 days, but we haven't seen any errors yet.
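The specific setting was not preserved above. For context only, a minimal sketch of the kind of warm-up configuration commonly suggested for slow swaps is shown below; the setting names, path, and status codes here are illustrative assumptions, not a quote of Microsoft's advice in this case:

```bash
# Illustrative warm-up settings on the staging slot (path and status codes are assumptions).
az webapp config appsettings set \
  --resource-group my-resource-group \
  --name my-web-app \
  --slot staging \
  --slot-settings WEBSITE_SWAP_WARMUP_PING_PATH="/health" \
                  WEBSITE_SWAP_WARMUP_PING_STATUSES="200,202"
```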
Same problem here. Thanks
Apologies for the confusion. I have only added this setting. BTW, these settings haven't made an impact on our slot swapping issue. It's still happening and we are still troubleshooting with Microsoft. The next thing to check is the app start-up in the slot: apparently when using vnets, key vaults, and managed identities, some of the settings aren't copied from the parent slot and have to be added manually, to ensure the app can start up without errors so that the slot swap can succeed.
hi @michalkrzych! Honestly, we gave up on this after a lot of inconclusive debugging with the Azure/Mindtree teams and resorted to preemptively patching the staging slot. So basically, every time the slot swap fails, we run:

```yaml
FIX-STAGING-SLOT:
  if: ${{ failure() && ((github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.ref == 'refs/heads/master-php8') }}
  needs: [SWAP-STAGING-TO-PRODUCTION]
  runs-on: ubuntu-latest
  [...]
```

Notice the `az webapp config appsettings set --resource-group ${{ vars.RESOURCE_GROUP }} --name ${{ vars.WEBAPP_NAME }} --slot ${{ vars.SLOT_NAME }} --slot-settings APP_DEBUG="${{ vars.APP_DEBUG }}"` and `echo "Setting environment variable: APP_ENV"` lines.

The issue is: a slot swap is not a transaction. In our case it sometimes just hangs and fails because of internal issues we still have to address (even with those health-ping envvars etc.), but no matter what, a swap should be a transaction and it is not. The team kind of avoided acknowledging this, but it is evident: the swap fails and the staging slot is left with the prod envvars. In particular, the staging slot gets corrupted because its envvars get overwritten with the envvar values from the prod slot, rendering the staging slot unusable (especially if you have Key Vault-backed envvars in segregated key vaults) and effectively breaking the whole blue-green swap approach. Depending on your scheme (e.g. canary with % traffic), it breaks prod too.
@Ruud2000 Just wondering which SKU your App Service Plan is running on? We've just moved from the deprecated
Any update on this? Our team is having this issue as well. At the very least, could the CLI be updated to not hang for 10+ minutes?
We've been on a premium SKU but are still having this issue 😞
We run on
We are also running into this issue. 10+ minutes for a simple app service swap...
During the operation, we are experiencing the same inconsistent slow slot swaps plus high CPU and memory (80-90 percent). We are using P1v3.
Another thing helped us here... make sure you do not perform any
I'm observing the exact same behavior as per the above post - my pipeline applies infrastructure-as-code changes that always contain differences (due to App Service changing the App Insights connection string, another story...) and this leads to the swap being stuck approx. 50% of the time. Restarting the App Service swap source slot manually in the Azure Portal helps. So I added a stop + start of the source slot in the pipeline and it gets stuck much less frequently. I'm writing this because some people in this thread mentioned that changing configuration properties helped while others state it did not. Changing the configuration causes a restart, which may result in the swap working again, so it may appear that a configuration change helped when it probably did not. I have all of the properties mentioned in this thread set up already. Also, I do think this is not an issue with the pipelines task but with Linux App Service itself. The same behavior happens if I use the Azure CLI directly. I want to try some more things in the deployment pipeline, like waiting for a bit or polling the URL, and see what happens. Will report back if I find anything that helps. I'm getting this on both P0v3 and P1v3. @rvvincelli Could you share your sleep durations, please? Have you tried different values? Did you observe issues?
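For illustration, the restart mitigation described above could look roughly like this as an AzureCLI@2 step; this is a sketch, and the service connection and pipeline variable names are assumptions, not taken from this thread:

```yaml
# Hypothetical step: restart the swap source slot before swapping.
- task: AzureCLI@2
  displayName: Restart staging slot before swap
  inputs:
    azureSubscription: $(serviceConnection)    # assumed variable holding the service connection name
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az webapp stop  --resource-group "$(resourceGroup)" --name "$(webAppName)" --slot staging
      az webapp start --resource-group "$(resourceGroup)" --name "$(webAppName)" --slot staging
```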
Hi @zdenek-jelinek! We are sleeping for two minutes after each
I think in this thread two issues got intertwined:
Finally, no matter the scenario, it shouldn't be the case that the slot is left inconsistent (i.e. envvars get mixed), but sometimes that happens too.
@Ruud2000, thanks for reporting this. For further investigation, could you please share complete debug logs by setting the variable system.debug = true?
Unfortunately I cannot, since I'm no longer working at the client where we faced this issue. But since more people are experiencing the same issue, hopefully someone will be able to provide debug logs.
I have stopped being able to reproduce this issue since around Nov 19th, coinciding with App Service maintenance in West Europe where my instance is located. I have removed the manual restart steps that helped mitigate the issue and still have not managed to reproduce it for several days.
@zdenek-jelinek You are saying that it now swaps fast?
@FrancescoBonizzi I am consistently seeing swaps take 1:30 - 2 min for P0v3 and P1v3 Linux App Services right after deploying a new artifact into the source slot, whereas previously (i.e. before Nov 2024) they got stuck until the pipeline timed out more often than not.
Thanks @zdenek-jelinek. I was trying before 2024 and had to roll back everything; now it seems time to try again.
Task name
AzureAppServiceManage@0
Task version
0.228.1
Environment type (Please select at least one environment where you face this issue)
Azure DevOps Server type
dev.azure.com (formerly visualstudio.com)
Azure DevOps Server Version (if applicable)
No response
Operating system
Windows 2019 datacenter-core-g2
Question
Recently we introduced a staging deployment slot for our Web Apps. So each Web App now has a staging and production slot. All Web Apps run on a dedicated Linux App Service plan (P1v3). Average CPU percentage is between 10 and 20, and average memory percentage around 80.
We now deploy our software to the `staging` slot and use the Azure DevOps task `AzureAppServiceManage@0` to swap the `staging` and `production` slots. The duration of a swap is not consistent between deployments. Most of the time it is between 1m 30s and 2m 30s, but we also have occurrences where a swap takes more than 12 minutes. Especially when multiple Web Apps have a slow swap, the pipeline takes a very long time, risking hitting the 60-minute timeout.

The diagnostics show the swap starts by invoking:
[POST] https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/slotsswap?api-version=2016-08-01
Then we see the following call being invoked every 15 seconds, returning an HTTP 202 response:
[GET] https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/operationresults/08021b2d-33b1-4e10-bddb-8ac2b7ebd2cd?api-version=2016-08-01
And eventually, after slightly more than 12 minutes, this same call returns an HTTP 200 response and the swap is complete.
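For illustration only, the pair of calls above can be reproduced by hand roughly as follows; the subscription and site names are placeholders for the redacted values, and the request body is an assumption based on the public slotsswap API shape:

```bash
# Placeholder values for the redacted subscription and site names.
TOKEN=$(az account get-access-token --query accessToken -o tsv)
BASE="https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/<site-name>/slots/staging"

# Start the swap; the 202 response's Location header points at the operationresults URL.
curl -s -D - -o /dev/null -X POST "$BASE/slotsswap?api-version=2016-08-01" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"targetSlot": "production"}'

# The task then GETs that operationresults URL every 15 seconds: 202 means still in progress, 200 means done.
```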
When we execute a swap in the Azure Portal we never seem to hit a slow swap. Looking at the browser developer tools while executing a swap in the portal shows that a more recent version of the swap API is used: the portal uses `slotsswap?api-version=2018-11-01`, while `AzureAppServiceManage@0` uses `api-version=2016-08-01`. Could this perhaps explain the inconsistent durations?

I found this question from November 2021 which is almost identical to our situation: https://learn.microsoft.com/en-us/answers/questions/612601/optimizing-cd-pipeline-when-swapping-multiple-web
Unfortunately I have not yet been able to verify whether swap times are more consistent when using the `AzureCLI@2` task, as suggested in the answer to that question, because our self-hosted build agent currently has no Azure CLI installed.
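For reference, a minimal sketch of what the suggested `AzureCLI@2` approach could look like once the agent has the Azure CLI installed; the service connection name and pipeline variables are placeholders, not values from this issue:

```yaml
# Hypothetical pipeline step performing the swap via the Azure CLI.
- task: AzureCLI@2
  displayName: Swap staging into production
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az webapp deployment slot swap \
        --resource-group Workload.WestEurope \
        --name "$(webAppName)" \
        --slot staging \
        --target-slot production
```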