Node Version Heartbeat #15700

Merged
merged 8 commits into from
Jan 15, 2025
63 changes: 63 additions & 0 deletions core/services/chainlink/application.go
@@ -7,6 +7,7 @@ import (
"math/big"
"net/http"
"sync"
"time"

"github.com/ethereum/go-ethereum/common"
"github.com/ethereum/go-ethereum/core/types"
@@ -20,10 +21,12 @@ import (
"go.uber.org/multierr"
"go.uber.org/zap/zapcore"

"github.com/smartcontractkit/chainlink-common/pkg/beholder"
"github.com/smartcontractkit/chainlink-common/pkg/custmsg"
"github.com/smartcontractkit/chainlink-common/pkg/loop"
commonservices "github.com/smartcontractkit/chainlink-common/pkg/services"
"github.com/smartcontractkit/chainlink-common/pkg/sqlutil"
"github.com/smartcontractkit/chainlink-common/pkg/timeutil"
"github.com/smartcontractkit/chainlink-common/pkg/utils"
"github.com/smartcontractkit/chainlink-common/pkg/utils/jsonserializable"
"github.com/smartcontractkit/chainlink-common/pkg/utils/mailbox"
@@ -80,6 +83,8 @@ import (
"github.com/smartcontractkit/chainlink/v2/plugins"
)

const HeartbeatPeriod = time.Second

// Application implements the common functions used in the core node.
type Application interface {
Start(ctx context.Context) error
@@ -192,13 +197,71 @@ type ApplicationOpts struct {
NewOracleFactoryFn standardcapabilities.NewOracleFactoryFn
}

type Heartbeat struct {
commonservices.Service
eng *commonservices.Engine

beat time.Duration
}

func NewHeartbeat(lggr logger.Logger) Heartbeat {
h := Heartbeat{
beat: HeartbeatPeriod,
}
h.Service, h.eng = commonservices.Config{
Name: "Heartbeat",
Start: h.start,
}.NewServiceEngine(lggr)
return h
}

func (h *Heartbeat) start(_ context.Context) error {
// Setup beholder resources
gauge, err := beholder.GetMeter().Int64Gauge("heartbeat")
if err != nil {
return err
}
count, err := beholder.GetMeter().Int64Gauge("heartbeat_count")
if err != nil {
return err
}
Comment on lines +220 to +227
Contributor

Is it important to check these on start, and is it necessary to fail if they are not available? Or could we defer and re-call these on each tick, so that startup can proceed and recovery can happen later if necessary?

If it does make sense to move them inside the tick func, maybe consider making it a separate method too.

Contributor Author

These specific resources, no. I don't expect them to fail, and their production implementations don't return errors:

func (m *meter) Int64Gauge(name string, options ...metric.Int64GaugeOption) (metric.Int64Gauge, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.delegate != nil {
		return m.delegate.Int64Gauge(name, options...)
	}

	cfg := metric.NewInt64GaugeConfig(options...)
	id := instID{
		name:        name,
		kind:        reflect.TypeOf((*siGauge)(nil)),
		description: cfg.Description(),
		unit:        cfg.Unit(),
	}
	if f, ok := m.instruments[id]; ok {
		return f.(metric.Int64Gauge), nil
	}
	i := &siGauge{name: name, opts: options}
	m.instruments[id] = i
	return i, nil
}

I'd like to avoid putting resource init in the tick. I do think you've raised a good point re: recovery, and I think the generic heartbeat is the better place to handle that. Since the init func is passed in there as a consumer-defined method anyway, the generic heartbeat could handle a failure by recording that the resource init never completed successfully, and the next tick could then retry it.

Consumers would have to be aware of this and ensure that their resource inits are idempotent, though. Alternatively, the generic heartbeat could accept a user-defined onClose hook that erases unwanted side effects before the next tick invocation.


cme := custmsg.NewLabeler()

// Define tick functions
beatFn := func(ctx context.Context) {
// TODO allow override of tracer provider into engine for beholder
_, innerSpan := beholder.GetTracer().Start(ctx, "heartbeat.beat")
defer innerSpan.End()

gauge.Record(ctx, 1)
count.Record(ctx, 1)

err = cme.Emit(ctx, "heartbeat")
if err != nil {
h.eng.Errorw("heartbeat emit failed", "err", err)
}
}

h.eng.GoTick(timeutil.NewTicker(h.getBeat), beatFn)
return nil
}

func (h *Heartbeat) getBeat() time.Duration {
return h.beat
}

// NewApplication initializes a new store if one is not already
// present at the configured root directory (default: ~/.chainlink),
// the logger at the same directory and returns the Application to
// be used by the node.
// TODO: Inject more dependencies here to save booting up useless stuff in tests
func NewApplication(opts ApplicationOpts) (Application, error) {
var srvcs []services.ServiceCtx

heartbeat := NewHeartbeat(opts.Logger)
srvcs = append(srvcs, &heartbeat)

auditLogger := opts.AuditLogger
cfg := opts.Config
relayerChainInterops := opts.RelayerChainInteroperators
9 changes: 7 additions & 2 deletions core/services/chainlink/config_telemetry.go
@@ -1,6 +1,7 @@
package chainlink

import (
"fmt"
"time"

"github.com/smartcontractkit/chainlink/v2/core/config/toml"
@@ -42,9 +43,13 @@ func (b *telemetryConfig) OtelExporterGRPCEndpoint() string {
//
// These can be overridden by the TOML if the user so chooses
func (b *telemetryConfig) ResourceAttributes() map[string]string {
sha, ver := static.Short()

defaults := map[string]string{
- "service.name": "chainlink",
- "service.version": static.Version,
+ "service.name": "chainlink",
+ "service.version": static.Version,
+ "service.sha": static.Sha,
+ "service.shortversion": fmt.Sprintf("%s@%s", ver, sha),
}

for k, v := range b.s.ResourceAttributes {
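
The override behaviour mentioned in the comment above ("These can be overridden by the TOML") can be illustrated in isolation. The loop body is collapsed in this view, so the sketch below is an assumption about the general pattern rather than the actual implementation: `mergeAttributes` and `shortVersion` are hypothetical helpers, and only the attribute keys and the `%s@%s` format come from the diff.

```go
package main

import "fmt"

// mergeAttributes is a hypothetical stand-in for the override step: it copies
// the defaults, then lets every user-supplied key replace or extend them.
func mergeAttributes(defaults, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(defaults)+len(overrides))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

// shortVersion mirrors the "%s@%s" format used for service.shortversion.
func shortVersion(ver, sha string) string {
	return fmt.Sprintf("%s@%s", ver, sha)
}

func main() {
	defaults := map[string]string{
		"service.name":         "chainlink",
		"service.sha":          "unset",
		"service.shortversion": shortVersion("unset", "unset"),
	}
	overrides := map[string]string{
		"service.name": "my-node",      // replaces a default
		"custom.key":   "custom.value", // adds a new attribute
	}

	merged := mergeAttributes(defaults, overrides)
	fmt.Println(merged["service.name"], merged["service.shortversion"], merged["custom.key"])
}
```

Copying into a fresh map keeps the defaults untouched, which matters only if the caller reuses them; whether the real code does so is not visible here.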
14 changes: 9 additions & 5 deletions core/services/chainlink/config_telemetry_test.go
@@ -97,17 +97,21 @@ func TestTelemetryConfig_ResourceAttributes(t *testing.T) {
"DefaultAttributes",
toml.Telemetry{ResourceAttributes: nil},
map[string]string{
- "service.name": "chainlink",
- "service.version": static.Version,
+ "service.name": "chainlink",
+ "service.sha": "unset",
+ "service.shortversion": "unset@unset",
+ "service.version": static.Version,
},
},
{
"CustomAttributes",
toml.Telemetry{ResourceAttributes: map[string]string{"custom.key": "custom.value"}},
map[string]string{
- "service.name": "chainlink",
- "service.version": static.Version,
- "custom.key": "custom.value",
+ "service.name": "chainlink",
+ "service.sha": "unset",
+ "service.shortversion": "unset@unset",
+ "service.version": static.Version,
+ "custom.key": "custom.value",
},
},
}
3 changes: 3 additions & 0 deletions core/web/testdata/body/health.html
@@ -93,6 +93,9 @@
<details open>
<summary title="HeadReporter" class="noexpand"><span class="passing">HeadReporter</span></summary>
</details>
<details open>
<summary title="Heartbeat" class="noexpand"><span class="passing">Heartbeat</span></summary>
</details>
<details open>
<summary title="JobSpawner" class="noexpand"><span class="passing">JobSpawner</span></summary>
</details>
9 changes: 9 additions & 0 deletions core/web/testdata/body/health.json
@@ -153,6 +153,15 @@
"output": ""
}
},
{
"type": "checks",
"id": "Heartbeat",
"attributes": {
"name": "Heartbeat",
"status": "passing",
"output": ""
}
},
{
"type": "checks",
"id": "JobSpawner",
1 change: 1 addition & 0 deletions core/web/testdata/body/health.txt
@@ -16,6 +16,7 @@ ok EVM.0.Txm.Confirmer
ok EVM.0.Txm.Finalizer
ok EVM.0.Txm.WrappedEvmEstimator
ok HeadReporter
ok Heartbeat
ok JobSpawner
ok Mailbox.Monitor
ok Mercury.WSRPCPool
10 changes: 10 additions & 0 deletions testdata/scripts/health/default.txtar
@@ -32,6 +32,7 @@ HTTPPort = $PORT

-- out.txt --
ok HeadReporter
ok Heartbeat
ok JobSpawner
ok Mailbox.Monitor
ok Mercury.WSRPCPool
@@ -55,6 +56,15 @@ ok WorkflowDBStore
"output": ""
}
},
{
"type": "checks",
"id": "Heartbeat",
"attributes": {
"name": "Heartbeat",
"status": "passing",
"output": ""
}
},
{
"type": "checks",
"id": "JobSpawner",
10 changes: 10 additions & 0 deletions testdata/scripts/health/multi-chain.txtar
@@ -86,6 +86,7 @@ ok EVM.1.Txm.Confirmer
ok EVM.1.Txm.Finalizer
ok EVM.1.Txm.WrappedEvmEstimator
ok HeadReporter
ok Heartbeat
ok JobSpawner
ok Mailbox.Monitor
ok Mercury.WSRPCPool
@@ -263,6 +264,15 @@ ok WorkflowDBStore
"output": ""
}
},
{
"type": "checks",
"id": "Heartbeat",
"attributes": {
"name": "Heartbeat",
"status": "passing",
"output": ""
}
},
{
"type": "checks",
"id": "JobSpawner",