[BUG] Gateway downtime due to repeated unlocking #3147

Open
soluty opened this issue Feb 18, 2025 · 0 comments

OpenIM Server Version

3.8.2 (the latest version is probably affected as well)

Operating System and CPU Architecture

Linux (AMD)

Deployment Method

Source Code Deployment

Bug Description and Steps to Reproduce

The error is as follows:

fatal error: sync: unlock of unlocked mutex
goroutine 460 [running]:
sync.fatal({0x16d89f0?, 0x121e2dc?})
	/root/.vmr/versions/go_versions/go/src/runtime/panic.go:1007 +0x18
sync.(*Mutex).unlockSlow(0xc02f5d34e0, 0xffffffff)
	/root/.vmr/versions/go_versions/go/src/sync/mutex.go:229 +0x35
sync.(*Mutex).Unlock(...)
	/root/.vmr/versions/go_versions/go/src/sync/mutex.go:223
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Client).writeBinaryMsg(0xc03b294820, {0x7d1, {0x0, 0x0}, {0xc03117c7b0, 0x22}, 0x0, {0x0, 0x0}, {0xc02d218000, ...}})
	/repo/open-im-server/internal/msggateway/client.go:369 +0x1a8
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Client).PushMessage(0xc03b294820, {0x19176d8, 0xc006be2630}, 0xc006be42c0)
	/repo/open-im-server/internal/msggateway/client.go:325 +0x36b
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Server).pushToUser(0xc0001d07e0, {0x19176d8, 0xc006be2630}, {0xc02b8b8f90, 0x14}, 0xc006be42c0)
	/repo/open-im-server/internal/msggateway/hub_server.go:150 +0x3b0
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Server).SuperGroupOnlineBatchPushOneMsg.func1()
	/repo/open-im-server/internal/msggateway/hub_server.go:177 +0x45
github.com/openimsdk/tools/mq/memamq.(*MemoryQueue).initialize.func1()
	/root/go/pkg/mod/github.com/openimsdk/[email protected]/mq/memamq/queue.go:54 +0x75
created by github.com/openimsdk/tools/mq/memamq.(*MemoryQueue).initialize in goroutine 1
	/root/go/pkg/mod/github.com/openimsdk/[email protected]/mq/memamq/queue.go:51 +0x65

The relevant code and my analysis are as follows.
In msggateway/client.go there is a lock, c.w, which is used by both the writeBinaryMsg method and the ResetClient method:

func (c *Client) writeBinaryMsg(resp Resp) error {
	if c.closed.Load() {
		return nil
	}

	encodedBuf, err := c.Encoder.Encode(resp)
	if err != nil {
		return err
	}

	c.w.Lock()
	defer c.w.Unlock()
	// ... (rest of the method omitted)
}

func (c *Client) ResetClient(ctx *UserConnContext, conn LongConn, longConnServer LongConnServer) {
	c.w = new(sync.Mutex)
	// ... (rest of the method omitted)
}

In writeBinaryMsg, c.w is locked, but ResetClient then resets c.w. The c.w that writeBinaryMsg refers to is now the newly created mutex, so when execution reaches the unlock it panics.
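The interleaving described above can be replayed sequentially in a small sketch. The `client` type, its `reset` method, and `swapDuringWrite` are hypothetical stand-ins, not the OpenIM code; only the `*sync.Mutex` field mirrors the excerpt. The sketch deliberately avoids calling `Unlock` on the new mutex, because that is the fatal error from the trace and cannot be recovered. It uses `TryLock`, which requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for the gateway's Client; only the
// mutex pointer field matters for this illustration.
type client struct {
	w *sync.Mutex
}

// reset mimics the reported behaviour of ResetClient: it always installs
// a brand-new mutex.
func (c *client) reset() {
	c.w = new(sync.Mutex)
}

// swapDuringWrite replays the race in a single goroutine. It reports
// whether c.w still points at the mutex that was locked, and whether
// that original mutex is still held after the reset.
func swapDuringWrite() (sameMutex, oldStillLocked bool) {
	c := &client{w: new(sync.Mutex)}

	locked := c.w // the mutex the writer actually locks
	locked.Lock()

	c.reset() // a "concurrent" reset lands here

	sameMutex = c.w == locked
	// TryLock fails on the old mutex because nobody has unlocked it.
	oldStillLocked = !locked.TryLock()

	// Calling c.w.Unlock() at this point would abort the process with
	// "fatal error: sync: unlock of unlocked mutex".
	locked.Unlock() // release the original mutex so the sketch ends cleanly
	return sameMutex, oldStillLocked
}

func main() {
	same, oldLocked := swapDuringWrite()
	// prints: same mutex: false, old mutex still locked: true
	fmt.Printf("same mutex: %v, old mutex still locked: %v\n", same, oldLocked)
}
```

After the reset, any code that reads `c.w` again sees a mutex that was never locked, while the one that was locked is no longer reachable through the client.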
My current change is as follows:

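func (c *Client) ResetClient(ctx *UserConnContext, conn LongConn, longConnServer LongConnServer) {
	// Allocate the mutex only once; later resets keep the same lock.
	if c.w == nil {
		c.w = new(sync.Mutex)
	}
	// ... (rest of the method unchanged)
}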

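One lock per Client object should be enough. I'm not sure whether this change could cause other problems, and I'd also like to ask how you plan to fix this bug.
The root cause here is also much the same as that of the concurrent write problem I ran into earlier.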

Screenshots Link

No response

@soluty soluty added the bug Categorizes issue or PR as related to a bug. label Feb 18, 2025
@OpenIM-Robot OpenIM-Robot changed the title [BUG] 网关由于重复的解锁导致宕机 [BUG] Gateway downtime due to repeated unlocking Feb 18, 2025