Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix fan out matrix task failed due to result ref #8327

Closed
wants to merge 1 commit into from

Conversation

chengjoey
Copy link
Member

@chengjoey chengjoey commented Oct 14, 2024

Changes

fix fan out matrix task failed due to result ref

fixes #8324

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Oct 14, 2024
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jerop after the PR has been reviewed.
You can assign the PR to them by writing /assign @jerop in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 14, 2024
@chengjoey
Copy link
Member Author

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 14, 2024
@chengjoey
Copy link
Member Author

/hold

this can successfully fan out the matrix task, but need to determine whether we should do this.

and need to add tests

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2024
@tekton-robot
Copy link
Collaborator

@chengjoey: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-tekton-pipeline-unit-tests c48ebd6 link true /test pull-tekton-pipeline-unit-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vdemeester
Copy link
Member

@chengjoey still working on this ?

@chengjoey
Copy link
Member Author

@chengjoey still working on this ?

@vdemeester ,Yes, since I'm changing jobs recently, I will continue this fix when I'm stable.

@fabricebrito
Copy link

+1 on this PR
I have the same behaviour as #8324

@afrittoli
Copy link
Member

+1 on this PR I have the same behaviour as #8324

Thanks @fabricebrito - I'm afraid @chengjoey may not be working on this PR anymore.
Would you be interested in picking this up and getting it ready to merge? We would need to add tests to check that the behaviour has been corrected and avoid future regressions.

@vdemeester do you know what concern @chengjoey had on this if any, see the comment

but need to determine whether we should do this.

@fabricebrito
Copy link

@afrittoli my Go skills are very limited so I don't think I can help here, I'm sorry.
It's just that this PR would allow using the fan-out results from a matrix Task and this is key in our Tekton pipelines.
I'm really looking forward for a fix :-)

@chengjoey
Copy link
Member Author

Hi @afrittoli , I recently went back to review the PR I mentioned before, but have not yet invested in this current PR (the logic is complex). If someone is willing to continue, I will provide assistance. If not, I will increase the priority of this PR.

@l-qing
Copy link
Contributor

l-qing commented Jan 14, 2025

Hi @afrittoli , I recently went back to review the PR I mentioned before, but have not yet invested in this current PR (the logic is complex). If someone is willing to continue, I will provide assistance. If not, I will increase the priority of this PR.

I will follow up on this issue. I should have time to deal with it within two weeks.

@l-qing
Copy link
Contributor

l-qing commented Jan 14, 2025

1. Reproducible steps

$ kubectl version

Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.28.8

$ tkn version

Client version: 0.39.0
Pipeline version: v0.66.0
cat <<'EOF' | kubectl replace -f -
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: array-emitter
spec:
  results:
  - name: array
    type: array
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      script: |
        echo -n "[\"linux\",\"max\",\"windows\"]" > $(results.array.path)

---
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: platform-browsers
spec:
  params:
    - name: platform
  results:
  - name: str
    type: string
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      script: |
        echo -n "$(params.platform)" | tee $(results.str.path)

---
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: printer
spec:
  params:
    - name: platform
      default: "default-platform"
    - name: platforms
      default: []
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      args:
        - "$(params.platforms)"
      script: |
        if [ -z "$(params.platform)" ]; then
          echo "platform: $(params.platform)"
        fi
        if [ $# -gt 0 ]; then
          echo "platforms: $@"
        fi

---
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: matrixed-pr
spec:
  taskRunTemplate:
    serviceAccountName: "default"
  pipelineSpec:
    tasks:

    - name: array-emitter
      taskRef:
        name: array-emitter

    - name: platforms
      params:
        - name: test
          value: test
      matrix:
        params:
          - name: platform
            value: $(tasks.array-emitter.results.array[*])
      taskRef:
        name: platform-browsers

    - name: printer-matrix
      taskRef:
        name: printer
      matrix:
        params:
          - name: platform
            value: $(tasks.platforms.results.str[*])

    - name: printer-all-platforms
      taskRef:
        name: printer
      params:
        - name: platforms
          value: $(tasks.platforms.results.str[*])
EOF

2. Error message

invalid result reference in pipeline task "printer-matrix": unable to validate result referencing pipeline task "platforms": task spec not found

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  creationTimestamp: "2025-01-14T07:40:10Z"
  generation: 1
  name: matrixed-pr
  namespace: default
  resourceVersion: "26861"
  uid: feba93ac-759d-4817-8512-700eb8eef059
spec:
  pipelineSpec:
    tasks:
    - name: array-emitter
      taskRef:
        kind: Task
        name: array-emitter
    - matrix:
        params:
        - name: platform
          value: $(tasks.array-emitter.results.array[*])
      name: platforms
      params:
      - name: test
        value: test
      taskRef:
        kind: Task
        name: platform-browsers
    - matrix:
        params:
        - name: platform
          value: $(tasks.platforms.results.str[*])
      name: printer-matrix
      taskRef:
        kind: Task
        name: printer
    - name: printer-all-platforms
      params:
      - name: platforms
        value: $(tasks.platforms.results.str[*])
      taskRef:
        kind: Task
        name: printer
  taskRunTemplate:
    serviceAccountName: default
  timeouts:
    pipeline: 1h0m0s
status:
  completionTime: "2025-01-14T07:40:12Z"
  conditions:
  - lastTransitionTime: "2025-01-14T07:40:12Z"
    message: 'invalid result reference in pipeline task "printer-matrix": unable to
      validate result referencing pipeline task "platforms": task spec not found'
    reason: InvalidTaskResultReference
    status: "False"
    type: Succeeded
  pipelineSpec:
    tasks:
    - name: array-emitter
      taskRef:
        kind: Task
        name: array-emitter
    - matrix:
        params:
        - name: platform
          value: $(tasks.array-emitter.results.array[*])
      name: platforms
      params:
      - name: test
        value: test
      taskRef:
        kind: Task
        name: platform-browsers
    - matrix:
        params:
        - name: platform
          value: $(tasks.platforms.results.str[*])
      name: printer-matrix
      taskRef:
        kind: Task
        name: printer
    - name: printer-all-platforms
      params:
      - name: platforms
        value: $(tasks.platforms.results.str[*])
      taskRef:
        kind: Task
        name: printer
  provenance:
    featureFlags:
      AwaitSidecarReadiness: true
      Coschedule: workspaces
      DisableAffinityAssistant: false
      DisableCredsInit: false
      DisableInlineSpec: ""
      EnableAPIFields: beta
      EnableArtifacts: false
      EnableCELInWhenExpression: false
      EnableConciseResolverSyntax: false
      EnableKeepPodOnCancel: false
      EnableKubernetesSidecar: false
      EnableParamEnum: false
      EnableProvenanceInStatus: true
      EnableStepActions: false
      EnforceNonfalsifiability: none
      MaxResultSize: 4096
      RequireGitSSHSecretKnownHosts: false
      ResultExtractionMethod: termination-message
      RunningInEnvWithInjectedSidecars: true
      SendCloudEventsForRuns: false
      SetSecurityContext: false
      VerificationNoMatchPolicy: ignore
  startTime: "2025-01-14T07:40:12Z"

3. Analysis

a. validateResultRef: unable to validate result referencing pipeline task

if ptMap[ref.PipelineTask].ResolvedTask == nil || ptMap[ref.PipelineTask].ResolvedTask.TaskSpec == nil {

if ptMap[ref.PipelineTask].ResolvedTask == nil || ptMap[ref.PipelineTask].ResolvedTask.TaskSpec == nil {
return fmt.Errorf("unable to validate result referencing pipeline task %q: task spec not found", ref.PipelineTask)
}

b. ValidatePipelineTaskResults: invalid result reference in pipeline task

func ValidatePipelineTaskResults(state PipelineRunState) error {
ptMap := state.ToMap()
for _, rpt := range state {
for _, ref := range v1.PipelineTaskResultRefs(rpt.PipelineTask) {
if err := validateResultRef(ref, ptMap); err != nil {
return pipelineErrors.WrapUserError(fmt.Errorf("invalid result reference in pipeline task %q: %w", rpt.PipelineTask.Name, err))

c. PipelineRun-Reconcile: call ValidatePipelineTaskResults

if pipelineRunFacts.State.IsBeforeFirstTaskRun() {
if err := resources.ValidatePipelineTaskResults(pipelineRunFacts.State); err != nil {
logger.Errorf("Failed to resolve task result reference for %q with error %v", pr.Name, err)
pr.Status.MarkFailed(v1.PipelineRunReasonInvalidTaskResultReference.String(), err.Error())
return controller.NewPermanentError(err)
}

d. pipelineRunFacts.State: come from resolvePipelineState

// Second iteration
pipelineRunState, err = c.resolvePipelineState(ctx, notStartedTasks, pipelineMeta.ObjectMeta, pr, pipelineRunState)
switch {
case errors.Is(err, remote.ErrRequestInProgress):
message := fmt.Sprintf("PipelineRun %s/%s awaiting remote resource", pr.Namespace, pr.Name)
pr.Status.MarkRunning(v1.TaskRunReasonResolvingTaskRef, message)
return nil
case err != nil:
return err
default:
}
// Build PipelineRunFacts with a list of resolved pipeline tasks,
// dag tasks graph and final tasks graph
pipelineRunFacts := &resources.PipelineRunFacts{
State: pipelineRunState,

e. resolvePipelineState: resolvedTask - ResolvePipelineTask

resolvedTask, err := resources.ResolvePipelineTask(ctx,
*pr,
fn,
func(name string) (*v1.TaskRun, error) {
return c.taskRunLister.TaskRuns(pr.Namespace).Get(name)
},
getCustomRunFunc,
task,
pst,
)

f. ResolvePipelineTask: call CountCombinations

numCombinations := 1
// We want to resolve all of the result references and ignore any errors at this point since there could be
// instances where result references are missing here, but will be later skipped and resolved in
// skipBecauseResultReferencesAreMissing. The final validation is handled in CheckMissingResultReferences.
resolvedResultRefs, _, _ := ResolveResultRefs(pst, PipelineRunState{&rpt})
if err := validateArrayResultsIndex(resolvedResultRefs); err != nil {
return nil, err
}
ApplyTaskResults(PipelineRunState{&rpt}, resolvedResultRefs)
if rpt.PipelineTask.IsMatrixed() {
numCombinations = rpt.PipelineTask.Matrix.CountCombinations()
}
if rpt.IsCustomTask() {
rpt.CustomRunNames = getNamesOfCustomRuns(pipelineRun.Status.ChildReferences, pipelineTask.Name, pipelineRun.Name, numCombinations)
for _, runName := range rpt.CustomRunNames {
run, err := getRun(runName)
if err != nil && !kerrors.IsNotFound(err) {
return nil, fmt.Errorf("error retrieving CustomRun %s: %w", runName, err)
}
if run != nil {
rpt.CustomRuns = append(rpt.CustomRuns, run)
}
}
} else {
rpt.TaskRunNames = GetNamesOfTaskRuns(pipelineRun.Status.ChildReferences, pipelineTask.Name, pipelineRun.Name, numCombinations)
for _, taskRunName := range rpt.TaskRunNames {
if err := rpt.setTaskRunsAndResolvedTask(ctx, taskRunName, getTask, getTaskRun, pipelineTask); err != nil {

h1. CountCombinations: Calculate the number of TaskRuns

// CountCombinations returns the count of Combinations of Parameters generated from the Matrix in PipelineTask.
func (m *Matrix) CountCombinations() int {
// Iterate over Matrix Parameters and compute count of all generated Combinations
count := m.countGeneratedCombinationsFromParams()
// Add any additional Combinations generated from Matrix Include Parameters
count += m.countNewCombinationsFromInclude()
return count
}
// countGeneratedCombinationsFromParams returns the count of Combinations of Parameters generated from the Matrix
// Parameters
func (m *Matrix) countGeneratedCombinationsFromParams() int {
if !m.HasParams() {
return 0
}
count := 1
for _, param := range m.Params {
count *= len(param.Value.ArrayVal)
}
return count
}

h2. Error: The calculated count is 0.

func (m *Matrix) countGeneratedCombinationsFromParams() int {
if !m.HasParams() {
return 0
}
count := 1
for _, param := range m.Params {
count *= len(param.Value.ArrayVal)

Because the param.Value.StringVale is $(tasks.platforms.results.str[*]) and param.Value.ArrayVal is empty.

j1. ResolvePipelineTask: call GetNamesOfTaskRuns

rpt.TaskRunNames = GetNamesOfTaskRuns(pipelineRun.Status.ChildReferences, pipelineTask.Name, pipelineRun.Name, numCombinations)
for _, taskRunName := range rpt.TaskRunNames {
if err := rpt.setTaskRunsAndResolvedTask(ctx, taskRunName, getTask, getTaskRun, pipelineTask); err != nil {

j2. GetNamesOfTaskRuns: call getNewRunNames the numberOfRuns is 0, the result taskRunNames is empty

func getNewRunNames(ptName, prName string, numberOfRuns int) []string {
var taskRunNames []string
// If it is a singular TaskRun/CustomRun, we only append the ptName
if numberOfRuns == 1 {
taskRunName := kmeta.ChildName(prName, "-"+ptName)
return append(taskRunNames, taskRunName)
}
// For a matrix we append i to then end of the fanned out TaskRuns "matrixed-pr-taskrun-0"
for i := range numberOfRuns {
taskRunName := kmeta.ChildName(prName, fmt.Sprintf("-%s-%d", ptName, i))
// check if the taskRun name ends with a matrix instance count
if !strings.HasSuffix(taskRunName, fmt.Sprintf("-%d", i)) {
taskRunName = kmeta.ChildName(prName, "-"+ptName)
// kmeta.ChildName limits the size of a name to max of 63 characters based on k8s guidelines
// truncate the name such that "-<matrix-id>" can be appended to the taskRun name
longest := 63 - len(fmt.Sprintf("-%d", numberOfRuns))
taskRunName = taskRunName[0:longest]
taskRunName = fmt.Sprintf("%s-%d", taskRunName, i)
}
taskRunNames = append(taskRunNames, taskRunName)
}
return taskRunNames
}

k. ResolvePipelineTask: setTaskRunsAndResolvedTask has not been called.

rpt.TaskRunNames = GetNamesOfTaskRuns(pipelineRun.Status.ChildReferences, pipelineTask.Name, pipelineRun.Name, numCombinations)
for _, taskRunName := range rpt.TaskRunNames {
if err := rpt.setTaskRunsAndResolvedTask(ctx, taskRunName, getTask, getTaskRun, pipelineTask); err != nil {
return nil, err
}

l. setTaskRunsAndResolvedTask: ResolvedTask has not been set.

func (t *ResolvedPipelineTask) setTaskRunsAndResolvedTask(
ctx context.Context,
taskRunName string,
getTask resources.GetTask,
getTaskRun resources.GetTaskRun,
pipelineTask v1.PipelineTask,
) error {
taskRun, err := getTaskRun(taskRunName)
if err != nil {
if !kerrors.IsNotFound(err) {
return fmt.Errorf("error retrieving TaskRun %s: %w", taskRunName, err)
}
}
if taskRun != nil {
t.TaskRuns = append(t.TaskRuns, taskRun)
}
rt, err := resolveTask(ctx, taskRun, getTask, pipelineTask)
if err != nil {
return err
}
t.ResolvedTask = rt

m. So it led to the error seen at the top.

4. Action

I will add the unit tests and corresponding integration tests later.

@l-qing
Copy link
Contributor

l-qing commented Jan 14, 2025

@chengjoey I am unable to push commits to that branch of your code repository.

Could you either grant me the appropriate permissions or merge this modification into your branch?

main...l-qing:pipeline:fix/matrix-result-ref

@l-qing
Copy link
Contributor

l-qing commented Jan 14, 2025

It seems that this usage might bypass the limitation of DefaultMaxMatrixCombinationsCount.

func (m *Matrix) validateCombinationsCount(ctx context.Context) (errs *apis.FieldError) {
matrixCombinationsCount := m.CountCombinations()
maxMatrixCombinationsCount := config.FromContextOrDefaults(ctx).Defaults.DefaultMaxMatrixCombinationsCount
if matrixCombinationsCount > maxMatrixCombinationsCount {
errs = errs.Also(apis.ErrOutOfBoundsValue(matrixCombinationsCount, 0, maxMatrixCombinationsCount, "matrix"))
}
return errs
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesnt merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

InvalidTaskResultsReference when dynamically generating subsequent tasks in a pipeline using matrix
6 participants