DurableTaskClient.PurgeAllInstancesAsync: NotSupportedException and RpcException for .net 8 isolated #282

Open
th3ragex opened this issue Apr 4, 2024 · 20 comments · Fixed by Azure/azure-functions-durable-extension#2802 or Azure/azure-functions-durable-extension#2829
@th3ragex

th3ragex commented Apr 4, 2024

I am periodically purging instances and old history of my durable function runs. We noticed that this logic stopped working some time ago.

Runtime:
.net 8 isolated azure function

Dependencies:

<PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.21.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.DurableTask" Version="1.1.2" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.2" />

My TimerTrigger Azure Function, which is implemented next to my durable function, looks like this:

    [Function(nameof(OrchestrationCleanupFunction))]
    public async Task Run(
        [TimerTrigger("0 */1 * * * *")] TimerInfo myTimer,
        [DurableClient] DurableTaskClient orchestrationClient)
    {
        var createdTimeFrom = DateTime.UtcNow.Subtract(TimeSpan.FromDays(365 * 5));
        var createdTimeTo = DateTime.UtcNow.Subtract(TimeSpan.FromDays(5));

        var runtimeStatus = new List<OrchestrationRuntimeStatus>
        {
            OrchestrationRuntimeStatus.Completed
        };

        await orchestrationClient.PurgeAllInstancesAsync(new PurgeInstancesFilter(createdTimeFrom, createdTimeTo, runtimeStatus));
    }

This code worked at some point and was initially developed as a .NET 7 isolated function.

Now I am getting a System.NotSupportedException: Microsoft.Azure.Functions.Worker.FunctionsDurableTaskClient does not support purging of orchestration instances.

The implementation of DurableTaskClient.PurgeAllInstancesAsync is very confusing.

  • DurableTaskClient is abstract and has multiple virtual overloads of PurgeAllInstancesAsync with a default implementation that throws a System.NotSupportedException.
  • Each derived type does not override all "virtual" variations of PurgeAllInstancesAsync. (GrpcDurableTaskClient & FunctionsDurableTaskClient)

Example:
FunctionsDurableTaskClient implements: Task<PurgeResult> PurgeAllInstancesAsync(PurgeInstancesFilter filter, CancellationToken cancellation)
but not public virtual Task<PurgeResult> PurgeAllInstancesAsync( PurgeInstancesFilter filter, PurgeInstanceOptions? options = null, CancellationToken cancellation = default)

So calling:
client.PurgeAllInstancesAsync(new PurgeInstancesFilter(...), CancellationToken.None); and
client.PurgeAllInstancesAsync(new PurgeInstancesFilter(...));
makes an unexpected difference.
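
The overload behaviour described above can be reproduced without the Durable Task packages. The following is a minimal sketch using hypothetical stand-in types (`ClientBase`, `PartialClient`, `Filter`, `Options` and `PurgeAllAsync` are illustrative names, not the library's classes). It only assumes the shape described in this comment: a base class whose virtual overloads throw by default, and a derived class that overrides just one of them.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for PurgeInstancesFilter / PurgeInstanceOptions.
public sealed record Filter(string Name);
public sealed record Options(bool Recursive);

// Mirrors the shape described above: every virtual overload throws by default.
public abstract class ClientBase
{
    public virtual Task<int> PurgeAllAsync(Filter filter, CancellationToken cancellation)
        => throw new NotSupportedException($"{GetType().Name} does not support purging.");

    public virtual Task<int> PurgeAllAsync(
        Filter filter, Options? options = null, CancellationToken cancellation = default)
        => throw new NotSupportedException($"{GetType().Name} does not support purging.");
}

// Overrides only the (filter, cancellation) overload.
public sealed class PartialClient : ClientBase
{
    public override Task<int> PurgeAllAsync(Filter filter, CancellationToken cancellation)
        => Task.FromResult(1);
}

public static class OverloadDemo
{
    // Runs the call and reports whether it succeeded or hit the base-class default.
    public static string Describe(Func<Task<int>> call)
    {
        try { return $"ok: {call().GetAwaiter().GetResult()}"; }
        catch (NotSupportedException) { return "NotSupportedException"; }
    }

    public static void Main()
    {
        ClientBase client = new PartialClient();
        var filter = new Filter("completed");

        // Two explicit arguments bind to the overridden overload.
        Console.WriteLine(Describe(() => client.PurgeAllAsync(filter, CancellationToken.None)));

        // A single argument can only bind to the overload with optional parameters,
        // which is not overridden here and therefore throws.
        Console.WriteLine(Describe(() => client.PurgeAllAsync(filter)));
    }
}
```

In this sketch the choice between the two methods is made by C# overload resolution at compile time, so a seemingly cosmetic change at the call site (passing or omitting the token) selects a different virtual method.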

On top of that, I am getting an RpcException with {"Status(StatusCode="Unknown", Detail="Exception was thrown by handler.")"} locally and on Azure.

I would greatly appreciate some help getting this purge implementation up and running.

@dunxbc

dunxbc commented Apr 11, 2024

I've encountered this issue as well. Rolling back Microsoft.Azure.Functions.Worker.Extensions.DurableTask to 1.1.0 fixes the issue; it appears to be a regression in v1.1.2 of that package.

@kemmis

kemmis commented Apr 11, 2024

I ran into this too. Getting System.NotSupportedException: 'Microsoft.Azure.Functions.Worker.FunctionsDurableTaskClient does not support purging of orchestration instances.' when trying to call DurableTaskClient.PurgeInstanceAsync(), after upgrading from 1.0.3 -> 1.1.2.

@Fazer01

Fazer01 commented Apr 16, 2024

I would like to confirm the issue. Because of this, we can no longer delete orchestrator instances. These instances contain PII, so we are no longer GDPR-compliant.
We've just finished our migration, and in the latest package the DurableTaskClient class contains the following implementations:

public virtual Task<PurgeResult> PurgeInstanceAsync(
    string instanceId, PurgeInstanceOptions? options = null, CancellationToken cancellation = default)
{
    throw new NotSupportedException($"{this.GetType()} does not support purging of orchestration instances.");
}

public virtual Task<PurgeResult> PurgeAllInstancesAsync(
     PurgeInstancesFilter filter, PurgeInstanceOptions? options = null, CancellationToken cancellation = default)
 {
     throw new NotSupportedException($"{this.GetType()} does not support purging of orchestration instances.");
 }

For now, we are falling back to an earlier version (1.1.1 also fixes the issue), as @dunxbc mentioned in his post above.

@davidmrdavid
Member

Thanks all, sorry we did not catch this. I'll see if we can fix this in our upcoming release.

@davidmrdavid
Member

FYI @jviau and @cgillum (since you were already submitting some changes for .NET isolated)

@LockTar

LockTar commented Apr 26, 2024

For me it was enough to downgrade package Microsoft.Azure.Functions.Worker.Extensions.DurableTask from 1.1.2 to 1.1.1.

@SimonCull

SimonCull commented May 16, 2024

This fix doesn't appear to have made it into 1.1.3. Any idea when we'll get it? (Unless "Microsoft.Azure.Functions.Worker.FunctionsDurableTaskClient does not support orchestration termination." is, strictly speaking, a different issue.)

Edit: I just had a look, and the NotSupportedExceptions are still there for Purge too.

Edit 2: Just seen 1.2.3 release notes

@gfgw

gfgw commented May 22, 2024

I am running into the same sort of problem when upgrading to the latest version, but with a NotSupportedException on DurableTaskClient.TerminateInstanceAsync. Will this be fixed as well?

@davidmrdavid
Member

I'm able to reproduce the issue with Terminate - I was focused only on the purging issue (as in the title) so I missed Terminate had regressed on that same PR. Sorry folks, I'm looking to see if we can implement some kind of breaking change analyzer in the CI to prevent this in the future (not sure if this is the right approach yet) but I'm on it nonetheless.

@SimonCull

@gfgw I'm currently using TerminateInstanceAsync(string instanceId, object? output, CancellationToken cancellation = default(CancellationToken)) as a temporary fix. Only Task TerminateInstanceAsync(string instanceId, TerminateInstanceOptions? options = null, CancellationToken cancellation = default(CancellationToken)) throws the unintended exception in the latest version.

@BrunoCandia

Still getting the error in Net 8 (Isolated):

Result: Failure Exception: Grpc.Core.RpcException: Status(StatusCode="Unknown", Detail="Exception was thrown by handler.") at Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient.PurgeInstancesCoreAsync(PurgeInstancesRequest request, CancellationToken cancellation) at ...

Dependencies

<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.22.0" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.2" />
<PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.DurableTask" Version="1.1.4" />

We are getting this issue when executing:

    var filter = new PurgeInstancesFilter(
        DateTime.MinValue,
        createdToTimeResult,
        new List<OrchestrationRuntimeStatus>
        {
            OrchestrationRuntimeStatus.Completed,
            OrchestrationRuntimeStatus.Failed,
            OrchestrationRuntimeStatus.Terminated
        });

    await durableTaskClient.PurgeAllInstancesAsync(filter);

Is there a fix for this? Thanks in advance.

@stnick5

stnick5 commented Jul 10, 2024

I was also getting the same thing in .Net 8 Isolated. I've tried to work around it by getting a list of all the completed orchestrator instances and purging them individually. I'm no longer getting that error message, but it's also not finding all of the completed orchestrators.

            var createdTimeFrom = DateTime.UtcNow.AddYears(-1);
            var createdTimeTo = DateTime.UtcNow.AddMinutes(-5);
            var taskHubName = _config["FunctionHubName"];

            _logger.LogInformation(
                "Looking for completed orchestrators between {CleanupStartTime} and {CleanupEndTime} in Task Hub {TaskHubName}",
                createdTimeFrom, createdTimeTo, taskHubName);
            
            var instances = starter.GetAllInstancesAsync(new OrchestrationQuery(createdTimeFrom, createdTimeTo,
                new List<OrchestrationRuntimeStatus>
                {
                    OrchestrationRuntimeStatus.Completed
                }, new List<string>
                {
                    taskHubName
                }));

            
            var instanceIdsToDelete = new List<string>();

            await foreach (var page in instances.AsPages())
            {
                foreach (var item in page.Values)
                {
                    instanceIdsToDelete.Add(item.InstanceId);
                }
            }

            // List<string> has no built-in IsEmpty(); check Count instead.
            if (instanceIdsToDelete.Count == 0)
                return;

            _logger.LogInformation("Found {OrchestratorCount} completed orchestrators to purge",
                instanceIdsToDelete.Count);

            var pendingDeleteTasks = new List<Task>();
            foreach (var instance in instanceIdsToDelete)
            {
                pendingDeleteTasks.Add(starter.PurgeInstanceAsync(instance));
            }

            await Task.WhenAll(pendingDeleteTasks);

            _logger.LogInformation("Purged {OrchestratorCount} completed orchestrators", instanceIdsToDelete.Count);

@stnick5

stnick5 commented Jul 11, 2024

Just been doing some more debugging, it looks like if I remove the reference to the task hub when calling GetAllInstancesAsync() it will return all the completed orchestrators correctly. If I add the reference back in it doesn't find any. I've made sure I'm using the correct task hub name.

This finds all the completed orchestrators.

            var instances = starter.GetAllInstancesAsync(new OrchestrationQuery(createdTimeFrom, createdTimeTo,
                new List<OrchestrationRuntimeStatus>
                {
                    OrchestrationRuntimeStatus.Completed
                }));

This doesn't find any completed orchestrators.

            var instances = starter.GetAllInstancesAsync(new OrchestrationQuery(createdTimeFrom, createdTimeTo,
                new List<OrchestrationRuntimeStatus>
                {
                    OrchestrationRuntimeStatus.Completed
                }
                , new List<string>
                {
                    taskHubName
                }));

@Mivaweb

Mivaweb commented Jan 22, 2025

Why is this issue closed? Still an issue in version 1.2.2...

Using .NET 8 isolated worker method PurgeAllInstancesAsync.

Grpc.Core.RpcException: 'Status(StatusCode="Unknown", Detail="Exception was thrown by handler.")'

EDIT

Worked around for now by first querying the instances and then running PurgeInstanceAsync for each individual instance.

@davidmrdavid
Member

cc @cgillum, @jviau - in case this is still an issue ^.

@Mivaweb - are you saying that if you perform PurgeAllInstancesAsync without first performing a 'GET' to an instance, you get an exception? Any chance you could describe a minimal repro? Thanks!

@cgillum
Member

cgillum commented Jan 22, 2025

@andystaples let’s take a closer look at the issues mentioned here as part of the e2e testing work that you’re doing.

@cgillum cgillum reopened this Jan 22, 2025
@Mivaweb

Mivaweb commented Jan 24, 2025

@davidmrdavid yes correct.

Here is my code sample:

[Function(CleanupStarterName)]
public async Task Start(
    [TimerTrigger("%CleanupStarterSchedule%")] TimerInfo timerInfo,
    [DurableClient] DurableTaskClient starter)
{
    var createdTimeTo = DateTime.UtcNow.Subtract(TimeSpan.FromDays(_esbConfig.InstanceHistoryCleanupInDays));
    var runtimeStatus = new List<OrchestrationRuntimeStatus>
    {
        OrchestrationRuntimeStatus.Completed,
        OrchestrationRuntimeStatus.Failed,
        OrchestrationRuntimeStatus.Terminated
    };

    try
    {
        //var instances = starter.GetAllInstancesAsync(new OrchestrationQuery(null, createdTimeTo, runtimeStatus));
        //var pendingDeleteTasks = (await instances.AsPages().ToListAsync())
        //    .SelectMany(i => i.Values)
        //    .Select(i => i.InstanceId)
        //    .Select(i => starter.PurgeInstanceAsync(i))
        //    .ToList();

        //_logger.Info<CleanupStarterFunction>("Deleting '{Count}' instances.", parameters: [pendingDeleteTasks.Count]);

        //await Task.WhenAll(pendingDeleteTasks);

        var result = await starter.PurgeAllInstancesAsync(new PurgeInstancesFilter(
            null,
            createdTimeTo,
            runtimeStatus));
    }
    catch(Exception ex)
    {
        _logger.Error<CleanupStarterFunction>(ex, "Failed to cleanup instances.");
    }
}

Which results in:

Grpc.Core.RpcException
  HResult=0x80131500
  Message=Status(StatusCode="Unknown", Detail="Exception was thrown by handler.")
  Source=Microsoft.DurableTask.Client.Grpc
  StackTrace:
   at Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient.<PurgeInstancesCoreAsync>d__25.MoveNext()
   at ESB.Durable.SDK.Starters.CleanupStarterFunction.<Start>d__3.MoveNext() in ...\CleanupStarterFunction.cs:line 49

  This exception was originally thrown at this call stack:
    [External Code]
    ....CleanupStarterFunction.Start(Microsoft.Azure.Functions.Worker.TimerInfo, Microsoft.DurableTask.Client.DurableTaskClient) in CleanupStarterFunction.cs

@Mivaweb

Mivaweb commented Jan 24, 2025

Some more testing results:

  • When I provide a valid CreatedFrom datetime that is earlier than CreatedTo, it works.
  • When I provide CreatedFrom as DateTime.MinValue, I get an argument exception:
System.ArgumentOutOfRangeException
  HResult=0x80131502
  Message=The UTC time represented when the offset is applied must be between year 0 and 10,000. (Parameter 'offset')
  Source=System.Private.CoreLib
  StackTrace:
   at System.DateTimeOffset.ValidateDate(DateTime dateTime, TimeSpan offset)
   at System.DateTimeOffset..ctor(DateTime dateTime)
   at System.DateTimeOffset.op_Implicit(DateTime dateTime)
   at ESB.Durable.SDK.Starters.CleanupStarterFunction.<Start>d__3.MoveNext() in ...\CleanupStarterFunction.cs:line 50

  This exception was originally thrown at this call stack:
    [External Code]
    CleanupStarterFunction.Start(Microsoft.Azure.Functions.Worker.TimerInfo, Microsoft.DurableTask.Client.DurableTaskClient) in CleanupStarterFunction.cs
  • When I provide a value of new DateTime(1900, 1, 1, 0, 0, 0, 0, DateTimeKind.Utc) to CreatedFrom, it works

Also, when using the method PurgeInstancesAsync() I see the same behaviour as above. So when createdFrom is null we get an exception.

So I will switch back to the PurgeAllInstancesAsync but with a createdFrom of value '1900-01-01'
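
The ArgumentOutOfRangeException above comes from the DateTime to DateTimeOffset conversion itself, before any purge call is made. The sketch below is a standalone illustration using only `System` types (`CreatedFromDemo` and `CanConvert` are hypothetical names). It passes an explicit +01:00 offset to simulate a machine east of UTC, which is an assumption about the environment in which the exception was observed.

```csharp
using System;

public static class CreatedFromDemo
{
    // Converts with an explicit offset so the result does not depend on the machine's time zone.
    public static bool CanConvert(DateTime value, TimeSpan offset)
    {
        try
        {
            _ = new DateTimeOffset(value, offset);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // DateTime.MinValue has Kind == Unspecified, so the implicit conversion to
        // DateTimeOffset applies the local offset. With a positive offset (simulated
        // here as +01:00) the UTC instant would fall before year 1, so it throws.
        Console.WriteLine(CanConvert(DateTime.MinValue, TimeSpan.FromHours(1)));   // False

        // With a zero offset the same value is representable.
        Console.WriteLine(CanConvert(DateTime.MinValue, TimeSpan.Zero));           // True

        // A UTC-kind lower bound converts implicitly with a zero offset on any machine.
        DateTimeOffset safeFrom = new DateTime(1900, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(safeFrom.Offset == TimeSpan.Zero);                       // True
    }
}
```

This only explains the conversion failure for DateTime.MinValue. It does not explain why a null createdFrom leads to the RpcException reported earlier.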

@davidmrdavid
Member

Thanks for letting us know. @andystaples - please make sure this is tracked, thanks :-)
