Skip to content

fix: better error handling when fetching the master node from the sentinels #3349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025

Conversation

ndyakov
Copy link
Member

@ndyakov ndyakov commented Apr 17, 2025

  • Make sure all sentinel errors are reported if the masterAddr was not resolved.
  • Resolving the masterAddr concurrently may result in read from errCh before read from done => first check if the masterAddr was resolved, and then proceed to report errors.

Fixes: #3172

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

sentinel.go:609

  • The for-range loop over errCh may block if errCh is not closed. Ensure that errCh is closed after all goroutines finish sending errors to allow proper aggregation.
for err := range errCh {

@ndyakov ndyakov requested a review from Copilot April 17, 2025 11:27
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

sentinel.go:590

  • Removal of the immediate return for context.Canceled and context.DeadlineExceeded errors alters the prioritization in error reporting; please verify that this behavior change is intentional.
errCh <- err

sentinel.go:613

  • [nitpick] The updated error message now combines multiple errors using errors.Join; consider including additional context or documentation to aid in debugging when multiple errors occur.
return "", fmt.Errorf("redis: all sentinels specified in configuration are unreachable: %w", errors.Join(errs...))

@kwenzh
Copy link

kwenzh commented Apr 17, 2025

current cancel() is also can stop goruntine.

Do we need to use select-case done in the goroutine to notify other goroutines to terminate the sentinel query early when the master address has already been queried?

the different is do cancel() more times;

@kwenzh
Copy link

kwenzh commented Apr 17, 2025

how about this :

https://github.com/redis/go-redis/pull/3350/files

@ndyakov
Copy link
Member Author

ndyakov commented Apr 17, 2025

current cancel() is also can stop goruntine.

Do we need to use select-case done in the goroutine to notify other goroutines to terminate the sentinel query early when the master address has already been queried?

the different is do cancel() more times;

The cancel for the context can be called only if the master is resolved and will close the other requests in this case, the error for closed context will be reported if the master was not resolve. Can you clarify what is the problem you are observing in this PR?

@kwenzh
Copy link

kwenzh commented Apr 17, 2025

The cancel for the context can be called only if the master is resolved and will close the other requests in this case, the error for closed context will be reported if the master was not resolve. Can you clarify what is the problem you are observing in this PR?

Well, I misunderstood

htemelski
htemelski previously approved these changes Apr 17, 2025
Copy link

@htemelski htemelski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good.
There's one minor nitpick.

Maybe the logic should be inverted with the happy path return being at the end of the function

@ndyakov
Copy link
Member Author

ndyakov commented Apr 17, 2025

Maybe the logic should be inverted with the happy path return being at the end of the function

In most cases - yes, but not here. If we were able to get the masterAddr from any of the sentinels, even if there are any errors for the rest (and there probably will be since if the context was canceled, there will be deadline errors anyway) the resolved masterAddrs is what we would like to be returned from this function.

@htemelski
Copy link

htemelski commented Apr 17, 2025

Maybe the logic should be inverted with the happy path return being at the end of the function

In most cases - yes, but not here. If we were able to get the masterAddr from any of the sentinels, even if there are any errors for the rest (and there probably will be since if the context was canceled, there will be deadline errors anyway) the resolved masterAddrs is what we would like to be returned from this function.

You are right, I didn't take into account the cancel() inside the once.Do block

@ndyakov ndyakov merged commit e191cf9 into master Apr 17, 2025
16 checks passed
ndyakov added a commit that referenced this pull request Apr 30, 2025
…tinels (#3349)

* Better error handling when fetching the master node from the sentinels

* fix error message generation

* close the errCh to not block

* use len over errCh
project-mirrors-bot-tu bot pushed a commit to project-mirrors/forgejo that referenced this pull request May 1, 2025
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) | require | minor | `v9.7.3` -> `v9.8.0` |

---

### Release Notes

<details>
<summary>redis/go-redis (github.com/redis/go-redis/v9)</summary>

### [`v9.8.0`](https://github.com/redis/go-redis/releases/tag/v9.8.0)

[Compare Source](redis/go-redis@v9.7.3...v9.8.0)

### 9.8.0 (2025-04-30)

#### 🚀 Highlights

-   **Redis 8 Support**: Full compatibility with Redis 8.0, including testing and CI integration
-   **Enhanced Hash Operations**: Added support for new hash commands (`HGETDEL`, `HGETEX`, `HSETEX`) and `HSTRLEN` command
-   **Search Improvements**: Enabled Search DIALECT 2 by default and added `CountOnly` argument for `FT.Search`

#### ✨ New Features

-   Added support for new hash commands: `HGETDEL`, `HGETEX`, `HSETEX` ([#&#8203;3305](redis/go-redis#3305))
-   Added `HSTRLEN` command for hash operations ([#&#8203;2843](redis/go-redis#2843))
-   Added `Do` method for raw query by single connection from `pool.Conn()` ([#&#8203;3182](redis/go-redis#3182))
-   Prevent false-positive marshaling by treating zero time.Time as empty in isEmptyValue ([#&#8203;3273](redis/go-redis#3273))
-   Added FailoverClusterClient support for Universal client ([#&#8203;2794](redis/go-redis#2794))
-   Added support for cluster mode with `IsClusterMode` config parameter ([#&#8203;3255](redis/go-redis#3255))
-   Added client name support in `HELLO` RESP handshake ([#&#8203;3294](redis/go-redis#3294))
-   **Enabled Search DIALECT 2 by default** ([#&#8203;3213](redis/go-redis#3213))
-   Added read-only option for failover configurations ([#&#8203;3281](redis/go-redis#3281))
-   Added `CountOnly` argument for `FT.Search` to use `LIMIT 0 0` ([#&#8203;3338](redis/go-redis#3338))
-   Added `DB` option support in `NewFailoverClusterClient` ([#&#8203;3342](redis/go-redis#3342))
-   Added `nil` check for the options when creating a client ([#&#8203;3363](redis/go-redis#3363))

#### 🐛 Bug Fixes

-   Fixed `PubSub` concurrency safety issues ([#&#8203;3360](redis/go-redis#3360))
-   Fixed panic caused when argument is `nil` ([#&#8203;3353](redis/go-redis#3353))
-   Improved error handling when fetching master node from sentinels ([#&#8203;3349](redis/go-redis#3349))
-   Fixed connection pool timeout issues and increased retries ([#&#8203;3298](redis/go-redis#3298))
-   Fixed context cancellation error leading to connection spikes on Primary instances ([#&#8203;3190](redis/go-redis#3190))
-   Fixed RedisCluster client to consider `MASTERDOWN` a retriable error ([#&#8203;3164](redis/go-redis#3164))
-   Fixed tracing to show complete commands instead of truncated versions ([#&#8203;3290](redis/go-redis#3290))
-   Fixed OpenTelemetry instrumentation to prevent multiple span reporting ([#&#8203;3168](redis/go-redis#3168))
-   Fixed `FT.Search` Limit argument and added `CountOnly` argument for limit 0 0 ([#&#8203;3338](redis/go-redis#3338))
-   Fixed missing command in interface ([#&#8203;3344](redis/go-redis#3344))
-   Fixed slot calculation for `COUNTKEYSINSLOT` command ([#&#8203;3327](redis/go-redis#3327))
-   Updated PubSub implementation with correct context ([#&#8203;3329](redis/go-redis#3329))

#### 📚 Documentation

-   Added hash search examples ([#&#8203;3357](redis/go-redis#3357))
-   Fixed documentation comments ([#&#8203;3351](redis/go-redis#3351))
-   Added `CountOnly` search example ([#&#8203;3345](redis/go-redis#3345))
-   Added examples for list commands: `LLEN`, `LPOP`, `LPUSH`, `LRANGE`, `RPOP`, `RPUSH` ([#&#8203;3234](redis/go-redis#3234))
-   Added `SADD` and `SMEMBERS` command examples ([#&#8203;3242](redis/go-redis#3242))
-   Updated `README.md` to use Redis Discord guild ([#&#8203;3331](redis/go-redis#3331))
-   Updated `HExpire` command documentation ([#&#8203;3355](redis/go-redis#3355))
-   Featured OpenTelemetry instrumentation more prominently ([#&#8203;3316](redis/go-redis#3316))
-   Updated `README.md` with additional information ([#&#8203;310ce55](redis/go-redis@310ce55))

#### ⚡ Performance and Reliability

-   Bound connection pool background dials to configured dial timeout ([#&#8203;3089](redis/go-redis#3089))
-   Ensured context isn't exhausted via concurrent query ([#&#8203;3334](redis/go-redis#3334))

#### 🔧 Dependencies and Infrastructure

-   Updated testing image to Redis 8.0-RC2 ([#&#8203;3361](redis/go-redis#3361))
-   Enabled CI for Redis CE 8.0 ([#&#8203;3274](redis/go-redis#3274))
-   Updated various dependencies:
    -   Bumped golangci/golangci-lint-action from 6.5.0 to 7.0.0 ([#&#8203;3354](redis/go-redis#3354))
    -   Bumped rojopolis/spellcheck-github-actions ([#&#8203;3336](redis/go-redis#3336))
    -   Bumped golang.org/x/net in example/otel ([#&#8203;3308](redis/go-redis#3308))
-   Migrated golangci-lint configuration to v2 format ([#&#8203;3354](redis/go-redis#3354))

#### ⚠️ Breaking Changes

-   **Enabled Search DIALECT 2 by default** ([#&#8203;3213](redis/go-redis#3213))
-   Dropped RedisGears (Triggers and Functions) support ([#&#8203;3321](redis/go-redis#3321))
-   Dropped FT.PROFILE command that was never enabled ([#&#8203;3323](redis/go-redis#3323))

#### 🔒 Security

-   Fixed network error handling on SETINFO (CVE-2025-29923) ([#&#8203;3295](redis/go-redis#3295))

#### 🧪 Testing

-   Added integration tests for Redis 8 behavior changes in Redis Search ([#&#8203;3337](redis/go-redis#3337))
-   Added vector types INT8 and UINT8 tests ([#&#8203;3299](redis/go-redis#3299))
-   Added test codes for search_commands.go ([#&#8203;3285](redis/go-redis#3285))
-   Fixed example test sorting ([#&#8203;3292](redis/go-redis#3292))

#### 👥 Contributors

We would like to thank all the contributors who made this release possible:

[@&#8203;alexander-menshchikov](https://github.com/alexander-menshchikov), [@&#8203;EXPEbdodla](https://github.com/EXPEbdodla), [@&#8203;afti](https://github.com/afti), [@&#8203;dmaier-redislabs](https://github.com/dmaier-redislabs), [@&#8203;four_leaf_clover](https://github.com/four_leaf_clover), [@&#8203;alohaglenn](https://github.com/alohaglenn), [@&#8203;gh73962](https://github.com/gh73962), [@&#8203;justinmir](https://github.com/justinmir), [@&#8203;LINKIWI](https://github.com/LINKIWI), [@&#8203;liushuangbill](https://github.com/liushuangbill), [@&#8203;golang88](https://github.com/golang88), [@&#8203;gnpaone](https://github.com/gnpaone), [@&#8203;ndyakov](https://github.com/ndyakov), [@&#8203;nikolaydubina](https://github.com/nikolaydubina), [@&#8203;oleglacto](https://github.com/oleglacto), [@&#8203;andy-stark-redis](https://github.com/andy-stark-redis), [@&#8203;rodneyosodo](https://github.com/rodneyosodo), [@&#8203;dependabot](https://github.com/dependabot), [@&#8203;rfyiamcool](https://github.com/rfyiamcool), [@&#8203;frankxjkuang](https://github.com/frankxjkuang), [@&#8203;fukua95](https://github.com/fukua95), [@&#8203;soleymani-milad](https://github.com/soleymani-milad), [@&#8203;ofekshenawa](https://github.com/ofekshenawa), [@&#8203;khasanovbi](https://github.com/khasanovbi)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC), Automerge - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yNjEuNCIsInVwZGF0ZWRJblZlciI6IjM5LjI2MS40IiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=-->

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7739
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants