Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(kubernetes): improve messaging in the delete (manifest) stage #5605

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
import com.netflix.spinnaker.clouddriver.data.task.TaskRepository;
import com.netflix.spinnaker.clouddriver.kubernetes.description.KubernetesCoordinates;
import com.netflix.spinnaker.clouddriver.kubernetes.description.manifest.KubernetesDeleteManifestDescription;
import com.netflix.spinnaker.clouddriver.kubernetes.description.manifest.KubernetesKind;
import com.netflix.spinnaker.clouddriver.kubernetes.op.OperationResult;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.security.KubernetesCredentials;
Expand Down Expand Up @@ -86,18 +87,28 @@ public OperationResult operate(List<OperationResult> priorOutputs) {
.updateStatus(OP_NAME, "Looking up resource properties for " + c.getKind() + "...");
KubernetesHandler deployer =
credentials.getResourcePropertyRegistry().get(c.getKind()).getHandler();
getTask().updateStatus(OP_NAME, "Calling delete operation...");
if (deployer.kind().equals(KubernetesKind.NONE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for dynamically registered CRDs (those that are not specified in the kubernetes account configuration)? I think it should work based on previous PRs to fix delete and patch manifest stages for CRDs, but I'd like to double check.

There's a risk of breaking backwards compatibility for an obscure use case: native kubernetes kinds are hardcoded in this file, and we're not keeping it up to date with changes in kubernetes API versions, which is ok, because the way clouddriver works is that if it finds a kind that is not a CRD and is not in that list, it assigns the NONE kind. Then the default implementation of delete uses the name of the manifest for the kubectl call, which becomes something like kubectl delete corekind, which works because it doesn't need the group name to be specified, because it's a core kind.

In the end, this if condition breaks the case of deleting core kinds which are not in the hardcoded list in clouddriver. For example, try deleting a ReplicationController or ResourceQuota.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for dynamically registered CRDs (those that are not specified in the kubernetes account configuration)? I think it should work based on previous PRs to fix delete and patch manifest stages for CRDs, but I'd like to double check.

Yes it does. This test still passes with this PR.

There's a risk of breaking backwards compatibility for an obscure use case: native kubernetes kinds are hardcoded in this file, and we're not keeping it up to date with changes in kubernetes API versions, which is ok, because the way clouddriver works is that if it finds a kind that is not a CRD and is not in that list, it assigns the NONE kind. Then the default implementation of delete uses the name of the manifest for the kubectl call, which becomes something like kubectl delete corekind, which works because it doesn't need the group name to be specified, because it's a core kind.

In the end, this if condition breaks the case of deleting core kinds which are not in the hardcoded list in clouddriver. For example, try deleting a ReplicationController or ResourceQuota.

Hmmm...any thoughts on a good way forward that doesn't break this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think on these options:

  1. Add missing core kubernetes kinds to the hardcoded list (so anything coming from kubectl api-resources in a newer version cluster). One problem is that this is an ongoing effort, if kubernetes adds another core kind in the future and clouddriver is not updated, spinnaker will not be able to delete those kinds.
  2. Catch the KubectlException here when running deployer.delete and parse the error message to know if it is because of a truly missing kind in the cluster. I don't know if the error message is clear enough to be parsed.
  3. Add a new method like isValidKind to KubectlJobExecutor that runs a kubectl api-resources, and call it before a delete. If the kind is valid, register it dynamically like CRDs, so subsequent deletes have the right KubernetesKind instance:
if kind == NONE
    if kubectlJobExcutor.isValidKind
        // register kind, same logic as dynamic CRDs
    else
        throw unknown kind exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these suggestions. Of these, I think the third one has the best chance of success. If we were using a java client library instead of invoking kubectl, perhaps option 2 would be more feasible, but I think it'll be tough to get right as it is. And I agree that the maintenance burden of option 1 is too big.

I started pondering how to implement this. Probably makes sense to discuss in the sig, but using #5334 as a model, it feels like adding code to KubernetesCredentials to periodically call a new KubectlJobExecutor method that runs kubectl api-resources makes sense. Not sure what subclass of KubernetesHandler will make sense...maybe one that already exists, or maybe a new one. At least by name, KubernetesCustomResourceHandler doesn't feel right, but it may do the right thing.

From there, adding a new member and mutator method to GlobalResourcePropertyRegistry and adjusting the get method to use it...and then I'm not sure whether KubernetesKindRegistry needs to know about these new kinds or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right, the only thing is that probably we can get away without another memoizer periodically calling kubectl api-resources, if we call it on demand when trying to process a kind NONE, or better yet, when creating the instance of KubernetesKind that assigns NONE when there's no known kind in the hardcoded list or in the CRD dynamic list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more link that I think is helpful is to KubectlJobExecutor::kubectlLookupInfo, which KubectlJobExecutor::delete uses to build the kubectl command.

@german-muzquiz I'm questioning this:

Then the default implementation of delete uses the name of the manifest for the kubectl call, which becomes something like kubectl delete corekind, which works because it doesn't need the group name to be specified, because it's a core kind.
because of this code in kubectlLookupInfo:

    if (!Strings.isNullOrEmpty(name)) {
      command.add(kind + "/" + name);
    } else {
      command.add(kind.toString());
    }

Doesn't this mean the kind is important / shows up in the resulting command in addition to the name? Instead of kubectl delete corekind we'd end up with kubectl delete none/corekind? Or maybe I'm missing something. Adding a test is the likely way forward, but still wanted to raise the question here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kind needs to be included in the command, so if the manifest name is not empty the command in the current implementation is kubectl delete corekind/name rather than kubectl delete none/corekind.
This is because the KubernetesKind is a dynamically created kind with the right name but with group NONE, so kind.toString() is corekind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, ok. I seem to have lost the place where that dynamically created kind comes from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in the kubernetes sig. See the minutes from 2022-03-08. Because deployer.delete gets the kind from the credentials (as opposed to the c object or c.getKind(), it's not clear how deleting core kinds not mentioned in KubernetesKindProperties::getGlobalKindProperties could ever work...but we generally think it does. Time to add some tests....

Copy link
Member

@mattgogerly mattgogerly Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually just run into a very similar sounding issue. If a custom Kind is not explicitly registered in the credentials definition, and there are no instances of that resource in the account, then the KubernetesUnregisteredCustomResourceHandler handler is used and returns a none kind, and the Delete (Manifest) stage fails.

If there are one or more the resources in the account, the KubernetesCustomResourceHandler is used and this returns the correct kind. This behaviour seems to have been introduced by #5334 and I'm really not sure how best to tackle it.

On the one hand it makes sense that the Delete stage fails if the thing you're trying to delete doesn't exist. On the other hand - does it matter? Perhaps we should introduce an option for the --ignore-not-found kubectl flag in the stage?

// No point in invoking kubectl with a resource type of none, as it fails with
//
// error: the server doesn't have a resource type "none"
//
// Throw an exception to fail the stage, similar to what happens when kubectl fails.
throw new IllegalStateException("Unable to delete unknown kind '" + c.getKind() + "'");
}
getTask().updateStatus(OP_NAME, "Calling delete operation for resource" + c + "...");
result.merge(
deployer.delete(
credentials,
c.getNamespace(),
c.getName(),
description.getLabelSelectors(),
deleteOptions));
getTask()
.updateStatus(OP_NAME, " delete operation completed successfully for " + c.getName());
getTask().updateStatus(OP_NAME, "Delete operation completed successfully for " + c);
});

getTask()
.updateStatus(
OP_NAME, "Delete operation completed successfully for all applicable resources");
return result;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,16 @@

package com.netflix.spinnaker.clouddriver.kubernetes.op;

import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

Expand All @@ -40,6 +43,7 @@
import com.netflix.spinnaker.clouddriver.kubernetes.description.manifest.KubernetesKind;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesCustomResourceDefinitionHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesCustomResourceHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesDeploymentHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesReplicaSetHandler;
import com.netflix.spinnaker.clouddriver.kubernetes.op.handler.KubernetesServiceHandler;
Expand Down Expand Up @@ -75,6 +79,7 @@ public class KubernetesDeleteManifestOperationTest {
private static final GlobalResourcePropertyRegistry resourcePropertyRegistry =
new GlobalResourcePropertyRegistry(
ImmutableList.of(
new KubernetesDeploymentHandler(),
new KubernetesReplicaSetHandler(),
new KubernetesServiceHandler(),
new KubernetesCustomResourceDefinitionHandler()),
Expand Down Expand Up @@ -227,6 +232,53 @@ public void deleteUnregisteredCustomResourceViaLabelSelector() throws IOExceptio
labelSelectorsCaptor.getValue());
}

@Test
public void deleteReallyUnregisteredCustomResourceViaManifestName() throws IOException {
// Beyond not configuring a spinnaker account with a particular kind (which
// is one definition of unregistered), this test exercises what happens when
// spinnaker has no idea about a particular kind, via configuration or
// discovery at runtime. In other words, nothing has called
// resourcePropertyRegistry.updateCrdProperties with information about the
// custom resource this test attempts to delete.
String unknownKind = "unknown.bar.com";
String pipelineJSON =
"{ "
+ " \"account\": \"kubernetes-account\","
+ " \"manifestName\": \""
+ unknownKind
+ " my-unknown-custom-resource\""
+ " }";

Map<String, Object> pipeline = mapper.readValue(pipelineJSON, mapType);

KubernetesDeleteManifestDescription description = buildDeleteManifestDescription(pipeline);

KubernetesCredentials mockKubernetesCreds = description.getCredentials().getCredentials();

IllegalStateException thrown =
assertThrows(
IllegalStateException.class,
() -> new KubernetesDeleteManifestOperation(description).operate(ImmutableList.of()));

assertThat(thrown.getMessage()).contains("Unable to delete unknown kind '" + unknownKind);

// With no handler for the given kind, GlobalResourcePropertyRegistry
// returns KubernetesUnregisteredCustomResourceHandler which has a kind of
// KubernetesKind.NONE. kubectl always fails with an invocation like:
//
// $ kubectl delete none
// error: the server doesn't have a resource type "none"
//
// so better to not invoke it and give a more helpful error message.
verify(mockKubernetesCreds, never())
.delete(
any(KubernetesKind.class),
anyString(),
anyString(),
any(KubernetesSelectorList.class),
any(V1DeleteOptions.class));
}

@Test
public void orphanDependentsTrueTest() throws IOException {
String pipelineJSON =
Expand Down