
Bug: Increasing Pod Memory Usage for Push Service #1945

Closed
Sylariam opened this issue Feb 22, 2024 · 5 comments
Labels
bug (Categorizes issue or PR as related to a bug.)

Comments

@Sylariam

Sylariam commented Feb 22, 2024

What happened?

Server version 3.5.0. The push service is consuming too much memory. Possible memory leak?
[screenshot: push service pod memory usage increasing over time]

What did you expect to happen?

stable memory usage

How can we reproduce it (as minimally and precisely as possible)?

This is a Prometheus metric, queried with:
sum (container_memory_working_set_bytes{image!="",pod_name=~"$Pod",namespace="$namespace"}) by (pod_name)

Anything else we need to know?

No response

version

```console
$ {name} version
# paste output here
```

Cloud provider

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

@Sylariam added the bug label on Feb 22, 2024
@cubxxw
Contributor

cubxxw commented Feb 22, 2024

This seems a bit unusual.
@FGadvancer

@Sylariam
Author

Sylariam commented Mar 6, 2024

Updates:
After integrating the push service with Pyroscope and running it for a week, I got these stats:
[Pyroscope memory profile screenshot]
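(For reference, wiring a Go service up to Pyroscope looks roughly like the sketch below. This assumes the github.com/grafana/pyroscope-go SDK; the application name and server address are placeholders, not my actual config.)

```go
package main

import "github.com/grafana/pyroscope-go"

// initProfiling starts continuous profiling and ships profiles to a
// Pyroscope server. Name and address below are placeholders.
func initProfiling() error {
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "openim.push",           // placeholder
		ServerAddress:   "http://pyroscope:4040", // placeholder
		ProfileTypes: []pyroscope.ProfileType{
			pyroscope.ProfileCPU,
			pyroscope.ProfileInuseSpace, // in-use heap, where the gRPC conns showed up
			pyroscope.ProfileAllocSpace,
		},
	})
	return err
}
```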
It looks like the push service accumulates a lot of gRPC connections over time, so I checked the code:

```go
func (p *Pusher) k8sOnlinePush(ctx context.Context, msg *sdkws.MsgData, pushToUserIDs []string) (wsResults []*msggateway.SingleMsgToUserResults, err error) {
	for host, userIds := range usersHost {
		tconn, _ := p.discov.GetConn(ctx, host)
		usersConns[tconn] = userIds
	}
```

Every push goes through p.discov.GetConn, so a new connection is created on each push and memory usage keeps growing.

@Sylariam
Author

Sylariam commented Mar 6, 2024

I made a temporary workaround:

```go
var usersConns = make(map[*grpc.ClientConn][]string)
for host, userIds := range usersHost {
	//tconn, _ := p.discov.GetConn(ctx, host)
	//usersConns[tconn] = userIds

	if conn, ok := onlinePusherConnMap[host]; ok {
		log.ZDebug(ctx, "DEBUG reuse local conn", "host", host)
		usersConns[conn] = userIds
	} else {
		log.ZDebug(ctx, "DEBUG no valid local conn", "host", host)
		tconn, _ := p.discov.GetConn(ctx, host)
		usersConns[tconn] = userIds
		onlinePusherConnMu.Lock()
		//defer onlinePusherConnMu.Unlock()
		log.ZDebug(ctx, "DEBUG add to local conn", "host", host)
		onlinePusherConnMap[host] = tconn
		onlinePusherConnMu.Unlock()
	}
}
```

This reuses a cached conn for the host when one exists; otherwise it calls GetConn and caches the result. Not sure if this is a safe solution. WDYT?
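For comparison, here is a minimal, untested sketch of a lock-safe version of this cache, with the lookup under a read lock and a re-check under the write lock. The connGetter interface and the variable declarations are placeholders mirroring the snippet above, not the actual OpenIM types.

```go
package push // placeholder package name

import (
	"context"
	"sync"

	"google.golang.org/grpc"
)

// connGetter mirrors how p.discov.GetConn is called above; the real
// discovery registry interface may have a different signature.
type connGetter interface {
	GetConn(ctx context.Context, host string) (*grpc.ClientConn, error)
}

var (
	onlinePusherConnMu  sync.RWMutex
	onlinePusherConnMap = make(map[string]*grpc.ClientConn)
)

// getCachedConn returns the cached connection for host, dialing at most
// once per host even when called from many goroutines concurrently.
func getCachedConn(ctx context.Context, discov connGetter, host string) (*grpc.ClientConn, error) {
	onlinePusherConnMu.RLock()
	conn, ok := onlinePusherConnMap[host]
	onlinePusherConnMu.RUnlock()
	if ok {
		return conn, nil
	}

	onlinePusherConnMu.Lock()
	defer onlinePusherConnMu.Unlock()
	// Re-check after taking the write lock: another goroutine may have
	// added the entry while we were waiting.
	if conn, ok := onlinePusherConnMap[host]; ok {
		return conn, nil
	}
	tconn, err := discov.GetConn(ctx, host)
	if err != nil {
		return nil, err
	}
	onlinePusherConnMap[host] = tconn
	return tconn, nil
}
```

The main difference from the workaround above is that the map lookup also happens under the mutex, whereas above the read of onlinePusherConnMap is done without holding onlinePusherConnMu.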

@kubbot
Contributor

kubbot commented May 13, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@skiffer-git
Member

This issue has been fixed in release-v3.8; I recommend updating to the new version. If you run into any new issues, please reopen this issue or create a new one.
