Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in terraformPluginSDKExternal causing provider restarts #472

Open
jake-ciolek opened this issue Feb 19, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@jake-ciolek
Copy link
Contributor

What happened?

This has been previously reported here and here. Unfortunately, it hasn't been resolved yet.

This issue is prevalent in big environments where the provider manages many resources. The end result is that the Upjet provider pod restarts and affects stability in large Crossplane-managed environments using Upjet generated providers.

How can we reproduce it?

This is observed in a large Upjet-managed environment, for example one with a few hundred SQS Queues.

Root cause

I looked into this a bit and ran a race detector on Upjet itself, it fails with:

==================
WARNING: DATA RACE
Write at 0x000103c39580 by goroutine 82:
  github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
      /Users/[email protected]/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
  github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
      <autogenerated>:1 +0x20
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Update()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_tfpluginsdk.go:718 +0x530
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update.func1()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:199 +0x2a0

Previous write at 0x000103c39580 by goroutine 79:
  github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
      /Users/[email protected]/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
  github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
      <autogenerated>:1 +0x20
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_tfpluginsdk.go:660 +0x5c0
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:166 +0x2a0

Goroutine 82 (running) created at:
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:178 +0x244
  github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKUpdate.func3()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:286 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
  testing.(*T).Run.gowrap1()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40

Goroutine 79 (finished) created at:
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:145 +0x244
  github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKCreate.func3()
      /Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:242 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
  testing.(*T).Run.gowrap1()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
==================
--- FAIL: TestConnect (0.00s)
    testing.go:1490: race detected during execution of test

It looks like it might be the cause of the problem, where the controller's Create and Update call the SetObservation method concurrently on a vanilla Go map which is not thread safe.

Solution

Introduce synchronization or use a thread-safe data structure like sync.Map.

@jake-ciolek jake-ciolek added the bug Something isn't working label Feb 19, 2025
@jake-ciolek
Copy link
Contributor Author

We've seen another failure related to this code, but now it's about writing to an uninitialized map. This occured in the S3 provider.

See:

2025/02/27 04:40:16 [DEBUG] Waiting for state to become: [success]
panic: assignment to entry in nil map

goroutine 292330486 [running]:
reflect.mapassign0(0xeb14160, 0xc002388020?, 0xc008c7dc38?, 0xc01880bd40?)
	runtime/map.go:1370 +0x1d
reflect.mapassign(0xc01880bd40?, 0xc005853b88?, 0xc005853bd8?, 0x40ffa9?)
	reflect/value.go:3828 +0x8d
github.com/modern-go/reflect2.(*UnsafeMapType).UnsafeSetIndex(...)
	github.com/modern-go/[email protected]/unsafe_map.go:76
github.com/json-iterator/go.(*mapDecoder).Decode(0xc006a17180, 0xc00f5114a0, 0xc01880bd40)
	github.com/json-iterator/[email protected]/reflect_map.go:191 +0x325
github.com/json-iterator/go.(*placeholderDecoder).Decode(0xc00c557f00?, 0x4159eb?, 0xc005853c60?)
	github.com/json-iterator/[email protected]/reflect.go:324 +0x1b
github.com/json-iterator/go.(*structFieldDecoder).Decode(0xc00b583640, 0xc00dba2000?, 0xc01880bd40)
	github.com/json-iterator/[email protected]/reflect_struct_decoder.go:1054 +0x50
github.com/json-iterator/go.(*generalStructDecoder).decodeOneField(0xc00b583a80, 0xc004e4ced0?, 0xc01880bd40)
	github.com/json-iterator/[email protected]/reflect_struct_decoder.go:552 +0x29c
github.com/json-iterator/go.(*generalStructDecoder).Decode(0xc00b583a80, 0xdb9d4e0?, 0xc01880bd40)
	github.com/json-iterator/[email protected]/reflect_struct_decoder.go:508 +0x8b
github.com/json-iterator/go.(*Iterator).ReadVal(0xc01880bd40, {0xf00b7c0, 0xc00f511398})
	github.com/json-iterator/[email protected]/reflect.go:79 +0x123
github.com/json-iterator/go.(*frozenConfig).Unmarshal(0xc0002394a0, {0xc00b42c700?, 0xc0173eeb70?, 0x10cb0ae0?}, {0xf00b7c0, 0xc00f511398})
	github.com/json-iterator/[email protected]/config.go:348 +0x99
github.com/upbound/provider-aws/apis/s3/v1beta1.(*Bucket).SetObservation(0xc00f511200, 0xc00e725750?)
	github.com/upbound/provider-aws/apis/s3/v1beta1/zz_generated_terraformed.go:47 +0x7e
github.com/crossplane/upjet/pkg/controller.(*noForkExternal).Update(0xc018819040, {0x12df5968, 0xc010945f80}, {0x12e4c5c0?, 0xc00f511200})
	github.com/crossplane/[email protected]/pkg/controller/external_nofork.go:673 +0x27b
github.com/crossplane/upjet/pkg/controller.(*noForkAsyncExternal).Update.func1()
	github.com/crossplane/[email protected]/pkg/controller/external_async_nofork.go:161 +0x145
created by github.com/crossplane/upjet/pkg/controller.(*noForkAsyncExternal).Update in goroutine 3936
	github.com/crossplane/[email protected]/pkg/controller/external_async_nofork.go:157 +0x168

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant