Fetching metadata on subscribe causing high memory on client and server #100

Open
cruickshankpg opened this issue Dec 8, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@cruickshankpg
Contributor

When subscribing to a stream, if the stream is not in the metadataCache, the whole cache is refreshed, which calls the FetchMetadata RPC. This happens whenever the stream is new or does not exist yet.

c.metadata.update(ctx)

We only create a stream when a message is first published to it, to avoid unnecessary creates. However, this means that if multiple subscribers try to subscribe before the publisher has created the stream, each of them triggers a FetchMetadata RPC. When there are thousands of streams in the Liftbridge cluster (I had 3,000 when I hit this), the metadata gets very large, and marshalling it this often made one of my Liftbridge servers unresponsive and used up all the memory on my client.
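
To make the cost concrete, here is a hypothetical, self-contained simulation (none of these types are go-liftbridge's): a fake cluster whose metadata call always returns every stream, and a client that, like the behaviour described above, refreshes everything on a cache miss. Because the stream is not created until the first publish, every subscriber misses, so the number of full fetches equals the number of subscribers and the amount of metadata serialized grows with subscribers × total streams.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeServer is a hypothetical stand-in for a Liftbridge cluster. Every
// metadata call returns every stream, so its cost grows with the stream count.
type fakeServer struct {
	mu          sync.Mutex
	streams     int // total streams known to the cluster
	calls       int // number of full metadata fetches served
	streamsSent int // total stream entries built across all calls
}

func (s *fakeServer) fetchAll() map[string]bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.calls++
	s.streamsSent += s.streams
	out := make(map[string]bool, s.streams)
	for i := 0; i < s.streams; i++ {
		out[fmt.Sprintf("stream-%d", i)] = true
	}
	return out
}

// client models the described behaviour: a cache miss on subscribe
// triggers a full refresh (analogous to c.metadata.update(ctx)).
type client struct {
	mu     sync.Mutex
	cache  map[string]bool
	server *fakeServer
}

func (c *client) subscribe(stream string) bool {
	c.mu.Lock()
	_, ok := c.cache[stream]
	c.mu.Unlock()
	if ok {
		return true
	}
	all := c.server.fetchAll() // full refresh on every miss
	c.mu.Lock()
	c.cache = all
	c.mu.Unlock()
	_, ok = all[stream]
	return ok
}

// simulate runs `subscribers` concurrent subscribes to a stream that the
// publisher has not created yet, so every one of them misses the cache.
func simulate(streams, subscribers int) (calls, streamsSent int) {
	srv := &fakeServer{streams: streams}
	c := &client{cache: map[string]bool{}, server: srv}
	var wg sync.WaitGroup
	for i := 0; i < subscribers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.subscribe("not-yet-created")
		}()
	}
	wg.Wait()
	return srv.calls, srv.streamsSent
}

func main() {
	calls, sent := simulate(3000, 50)
	fmt.Println(calls, sent) // 50 150000
}
```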

Our client service holds Liftbridge client connections to multiple Liftbridge clusters, so even storing the full metadata for each cluster uses more memory than we would like. Keeping track of the brokers for a cluster is obviously necessary, but do we need all the streams? Could individual stream partitions be fetched into the cache on demand?
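
A minimal sketch of the on-demand idea, assuming a hypothetical `fetchFunc` hook that asks the cluster for specific streams only (this is not a real go-liftbridge API, and `StreamMeta` is a trimmed-down illustrative type). The cache keeps the broker list but only the streams that have actually been requested, and it deliberately does not cache a miss, since the stream may be created by a later publish.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// StreamMeta is a hypothetical, trimmed-down view of one stream's metadata.
type StreamMeta struct {
	Name       string
	Partitions map[int32]string // partition ID -> leader broker ID
}

// fetchFunc is a hypothetical hook that requests only the named streams and
// returns metadata for those that exist.
type fetchFunc func(streams []string) (map[string]StreamMeta, error)

var ErrNoSuchStream = errors.New("stream does not exist")

// lazyCache keeps the broker list (always needed) but only the streams this
// client has actually asked for.
type lazyCache struct {
	mu      sync.RWMutex
	brokers []string
	streams map[string]StreamMeta
	fetch   fetchFunc
}

func newLazyCache(brokers []string, fetch fetchFunc) *lazyCache {
	return &lazyCache{brokers: brokers, streams: map[string]StreamMeta{}, fetch: fetch}
}

// Get returns cached metadata, fetching just this one stream on a miss.
func (c *lazyCache) Get(stream string) (StreamMeta, error) {
	c.mu.RLock()
	m, ok := c.streams[stream]
	c.mu.RUnlock()
	if ok {
		return m, nil
	}
	fetched, err := c.fetch([]string{stream})
	if err != nil {
		return StreamMeta{}, err
	}
	m, ok = fetched[stream]
	if !ok {
		// Not cached: the stream may be created by a later publish.
		return StreamMeta{}, ErrNoSuchStream
	}
	c.mu.Lock()
	c.streams[stream] = m
	c.mu.Unlock()
	return m, nil
}

func main() {
	requested := 0
	fetch := func(streams []string) (map[string]StreamMeta, error) {
		requested += len(streams)
		out := map[string]StreamMeta{}
		for _, s := range streams {
			if s == "orders" {
				out[s] = StreamMeta{Name: s, Partitions: map[int32]string{0: "broker-1"}}
			}
		}
		return out, nil
	}
	c := newLazyCache([]string{"broker-1", "broker-2"}, fetch)
	m, _ := c.Get("orders")
	_, _ = c.Get("orders") // served from the cache, no fetch
	_, err := c.Get("missing")
	fmt.Println(m.Partitions[0], requested, err) // broker-1 2 stream does not exist
}
```

This does not coalesce concurrent misses, so many early subscribers still each make a request, but each request now carries one stream's metadata instead of the whole cluster's.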

@cruickshankpg
Contributor Author

I captured some alloc_space pprof profiles to work out why my servers were broken:

liftbridge allocs profile

      File: liftbridge
Type: alloc_space
Time: Dec 8, 2020 at 12:36pm (GMT)
Showing nodes accounting for 80565.66MB, 100% of 80565.66MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                        16492.49MB   100% |   google.golang.org/grpc/encoding/proto.codec.Marshal /root/go/pkg/mod/google.golang.org/[email protected]/encoding/proto/proto.go:70
16492.49MB 20.47% 20.47% 16492.49MB 20.47%                | github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Marshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:4250
----------------------------------------------------------+-------------
                                         9612.19MB   100% |   syscall.BytePtrFromString /src/toolchain/_go/1.15.3/go/src/syscall/syscall.go:69
 9612.19MB 11.93% 32.40%  9612.19MB 11.93%                | syscall.ByteSliceFromString /src/toolchain/_go/1.15.3/go/src/syscall/syscall.go:53
----------------------------------------------------------+-------------
                                         7251.11MB   100% |   github.com/liftbridge-io/liftbridge/server.(*metadataAPI).createMetadataResponse /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:277
 7251.11MB  9.00% 41.40%  7251.11MB  9.00%                | github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1337
----------------------------------------------------------+-------------
                                         2230.10MB 33.76% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1348
                                         2195.60MB 33.24% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1347
                                         2180.10MB 33.00% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1346
 6605.80MB  8.20% 49.60%  6605.80MB  8.20%                | github.com/liftbridge-io/liftbridge/server.eventTimestampsToProto /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1320
----------------------------------------------------------+-------------
                                        20480.48MB   100% |   github.com/liftbridge-io/liftbridge/server.(*metadataAPI).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:123
 5148.55MB  6.39% 55.99% 20480.48MB 25.42%                | github.com/liftbridge-io/liftbridge/server.(*metadataAPI).createMetadataResponse /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:277
                                         7251.11MB 35.40% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1337
                                         2230.10MB 10.89% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1348
                                         2195.60MB 10.72% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1347
                                         2180.10MB 10.64% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1346
                                          758.01MB  3.70% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1340
                                          717.01MB  3.50% |   github.com/liftbridge-io/liftbridge/server.getPartitionMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:1341
----------------------------------------------------------+-------------
                                         4391.40MB   100% |   github.com/liftbridge-io/liftbridge/server.(*metadataAPI).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:123
 4391.40MB  5.45% 61.44%  4391.40MB  5.45%                | github.com/liftbridge-io/liftbridge/server.(*metadataAPI).createMetadataResponse /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:279
----------------------------------------------------------+-------------
                                         3173.56MB   100% |   strings.(*Builder).Grow /src/toolchain/_go/1.15.3/go/src/strings/builder.go:82 (inline)
 3173.56MB  3.94% 65.38%  3173.56MB  3.94%                | strings.(*Builder).grow /src/toolchain/_go/1.15.3/go/src/strings/builder.go:68
----------------------------------------------------------+-------------
                                         2706.48MB   100% |   github.com/liftbridge-io/liftbridge/server.(*metadataAPI).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:123
 2706.48MB  3.36% 68.74%  2706.48MB  3.36%                | github.com/liftbridge-io/liftbridge/server.(*metadataAPI).createMetadataResponse /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:260
----------------------------------------------------------+-------------
                                         2212.10MB   100% |   github.com/liftbridge-io/liftbridge/server.(*metadataAPI).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:123
 2212.10MB  2.75% 71.49%  2212.10MB  2.75%                | github.com/liftbridge-io/liftbridge/server.(*metadataAPI).createMetadataResponse /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/server/metadata.go:275

client profile:

      File: ems-coordinator
Type: alloc_space
Time: Dec 7, 2020 at 6:17pm (GMT)
Showing nodes accounting for 118732.17MB, 100% of 118732.17MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                        25840.43MB   100% |   google.golang.org/grpc.recvAndDecompress /root/go/pkg/mod/google.golang.org/[email protected]/rpc_util.go:689
25840.43MB 21.76% 21.76% 25840.43MB 21.76%                | google.golang.org/grpc.(*parser).recvMsg /root/go/pkg/mod/google.golang.org/[email protected]/rpc_util.go:576
----------------------------------------------------------+-------------
                                        15631.98MB 79.33% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1518
                                         4013.77MB 20.37% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1241
                                           58.51MB   0.3% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1543
19704.26MB 16.60% 38.36% 19704.26MB 16.60%                | github.com/liftbridge-io/go-liftbridge/v2.(*metadataCache).update /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/metadata.go:294
----------------------------------------------------------+-------------
                                         8983.87MB   100% |   github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:8604
 8983.87MB  7.57% 45.93%  8983.87MB  7.57%                | github.com/liftbridge-io/liftbridge-api/go.(*StreamMetadata).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:10692
----------------------------------------------------------+-------------
                                         7240.88MB   100% |   github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:8604
 7240.88MB  6.10% 52.02%  7240.88MB  6.10%                | github.com/liftbridge-io/liftbridge-api/go.(*StreamMetadata).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:10578
----------------------------------------------------------+-------------
                                         7120.87MB   100% |   github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:8604
 7120.87MB  6.00% 58.02%  7120.87MB  6.00%                | github.com/liftbridge-io/liftbridge-api/go.(*StreamMetadata).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:10546
----------------------------------------------------------+-------------
                                         7089.10MB   100% |   google.golang.org/grpc/encoding/proto.codec.Unmarshal /root/go/pkg/mod/google.golang.org/[email protected]/encoding/proto/proto.go:88
 7089.10MB  5.97% 63.99%  7089.10MB  5.97%                | github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:8603
----------------------------------------------------------+-------------
                                         6355.18MB   100% |   github.com/liftbridge-io/liftbridge-api/go.(*FetchMetadataResponse).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:8604
 6355.18MB  5.35% 69.34%  6355.18MB  5.35%                | github.com/liftbridge-io/liftbridge-api/go.(*StreamMetadata).Unmarshal /root/go/pkg/mod/github.com/liftbridge-io/[email protected]/go/api.pb.go:10712
----------------------------------------------------------+-------------
                                         4334.02MB 79.97% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1518
                                         1071.64MB 19.77% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1241
                                           13.81MB  0.25% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1543
 5419.47MB  4.56% 73.91%  5419.47MB  4.56%                | github.com/liftbridge-io/go-liftbridge/v2.(*metadataCache).update /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/metadata.go:303
----------------------------------------------------------+-------------
                                         2789.17MB 79.04% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1518
                                          724.54MB 20.53% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).FetchMetadata /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1241
                                              15MB  0.43% |   github.com/liftbridge-io/go-liftbridge/v2.(*client).subscribe /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/client.go:1543
 3528.72MB  2.97% 76.88%  3528.72MB  2.97%                | github.com/liftbridge-io/go-liftbridge/v2.(*metadataCache).update /root/go/pkg/mod/github.com/liftbridge-io/go-liftbridge/[email protected]/metadata.go:279

@tylertreat
Member

> Keeping track of the brokers for a cluster is obviously necessary but do we need all the streams? Could individual stream partitions be fetched into the cache on demand?

Yes, this is an area for improvement I've had in mind. The client should only fetch the streams it needs. Also, the FetchMetadata RPC already supports this. It just defaults to fetching everything if no streams are specified, so it should be a fairly simple change.
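
One client-side detail worth getting right: once a request names specific streams, the response only describes those, so applying it must merge into the cache rather than replace the cached stream set, or other streams would be evicted. A minimal sketch using hypothetical local types (`metadataRequest` and `metadataResponse` only loosely mirror the RPC; an empty stream list means "fetch everything", as described above):

```go
package main

import "fmt"

// Hypothetical stand-ins for the request/response shapes. An empty Streams
// list means "everything", matching the default described in the thread.
type metadataRequest struct {
	Streams []string
}

type metadataResponse struct {
	Brokers []string
	Streams map[string]int // stream name -> partition count (trimmed down)
}

type cache struct {
	brokers []string
	streams map[string]int
}

// apply folds a response into the cache. A full response (no streams
// requested) replaces the stream set; a targeted one only upserts the
// requested streams and drops any the server did not report.
func (c *cache) apply(req metadataRequest, resp metadataResponse) {
	c.brokers = resp.Brokers // broker list is always refreshed
	if len(req.Streams) == 0 {
		c.streams = resp.Streams
		return
	}
	if c.streams == nil {
		c.streams = map[string]int{}
	}
	for _, name := range req.Streams {
		if n, ok := resp.Streams[name]; ok {
			c.streams[name] = n
		} else {
			delete(c.streams, name)
		}
	}
}

func main() {
	c := &cache{}
	// A subscriber needs only "orders".
	c.apply(metadataRequest{Streams: []string{"orders"}}, metadataResponse{
		Brokers: []string{"b1"},
		Streams: map[string]int{"orders": 3},
	})
	// Later, another subscriber needs only "payments".
	c.apply(metadataRequest{Streams: []string{"payments"}}, metadataResponse{
		Brokers: []string{"b1", "b2"},
		Streams: map[string]int{"payments": 2},
	})
	fmt.Println(len(c.streams), c.streams["orders"], c.streams["payments"], len(c.brokers)) // 2 3 2 2
}
```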

tylertreat added the enhancement label on Dec 12, 2020