
SinkGroupSpec, DeploymentUnit, MaxBytesPerBatch #172

Merged
49 commits merged on Apr 1, 2021

Conversation

@alok87 alok87 commented Mar 21, 2021

Brings up a new specification to define sinkGroup-based batcher and loader configuration. Introduces the concept of a deployment unit, which the user can specify to solve the scaling issues described in #167. maxBytesPerBatch replaces maxSize: it makes all the topics operating in the same pod take the same amount of resources, paving the way for scaling, since scaling requires homogeneous resource consumption across multiple topics.
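A rough sketch of what the new spec shape could look like (field names here are illustrative, taken only from the terms in this PR; check the repository's CRD reference for the exact schema):

```yaml
# Hypothetical sketch, not the exact CRD schema.
sinkGroup:
  all:
    maxBytesPerBatch: 10485760   # 10 MiB; replaces maxSize
    maxWaitSeconds: 30
    deploymentUnit:
      podTemplate:
        resources:
          limits:
            memory: 1Gi          # every unit sized identically
```

Since every topic in a pod is capped at the same byte budget per batch, units become interchangeable and horizontal scaling is a matter of adding identical units.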

alok87 added 2 commits March 21, 2021 21:31
MaxSize gets deprecated in favour of MaxBytesPerBatch. This config makes all fat and lean tables behave the same way in the Redshift batcher: they all take the same amount of memory, which makes scaling easier.
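The difference can be sketched with a toy batching loop (illustrative only; flushByBytes is not the batcher's real API). A byte threshold flushes fat tables after a few rows and lean tables after many, so every batch occupies roughly the same memory:

```go
package main

import "fmt"

// flushByBytes is an illustrative sketch (not the batcher's real API): a
// batch flushes once its accumulated byte size crosses maxBytesPerBatch,
// regardless of how many messages it holds. Fat tables flush after a few
// rows, lean tables after many, but both occupy the same memory per batch.
func flushByBytes(msgSizes []int, maxBytesPerBatch int) [][]int {
	var batches [][]int
	var cur []int
	bytes := 0
	for _, s := range msgSizes {
		cur = append(cur, s)
		bytes += s
		if bytes >= maxBytesPerBatch {
			batches = append(batches, cur)
			cur, bytes = nil, 0
		}
	}
	if len(cur) > 0 { // flush the final partial batch
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	fat := flushByBytes([]int{500, 500, 500}, 1000)   // large rows: few per batch
	lean := flushByBytes([]int{10, 10, 10, 10}, 1000) // small rows: many per batch
	fmt.Println(len(fat), len(lean)) // 2 1
}
```

With a count-based maxSize instead, the fat table's batches would be maxSize times a large row size while the lean table's stay tiny, so pods hosting fat tables would need far more memory.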

Related #136
#167
@alok87 alok87 changed the title MaxBytesPerBatch is better than maxSize SinkGroupSpec, DeploymentUnit, MaxBytesPerBatch Mar 22, 2021
alok87 commented Mar 24, 2021

Live with backward compatibility

alok87 commented Mar 24, 2021

Old spec is compatible, tested OK.

alok87 commented Mar 24, 2021

The metrics are a problem: the values are Gauges, and since they are not set in some conditions, they never go down to 0.

alok87 commented Mar 25, 2021

Two problems have come out after testing with 300 tables:

  1. The distribution of topics across units is not uniform: big topics land in the same unit, making some units slow. Topics should be distributed uniformly across units.
  2. The batcher units keep running and waiting even after the batcher lag has come down, because they are waiting for the loader lag to come down.
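A common fix for problem 1 is a greedy least-loaded allocation. This sketch (hypothetical names, not the operator's code) sorts topics by descending lag and assigns each to the unit with the smallest total lag so far:

```go
package main

import (
	"fmt"
	"sort"
)

// topicLag pairs a topic with its lag (hypothetical type, not the
// operator's actual struct).
type topicLag struct {
	name string
	lag  int64
}

// allocateUnits sketches a greedy least-loaded allocator: topics are
// sorted by descending lag, and each topic goes to the unit with the
// smallest total lag so far, so big topics do not pile into one unit.
func allocateUnits(topics []topicLag, units int) [][]string {
	sort.Slice(topics, func(i, j int) bool { return topics[i].lag > topics[j].lag })
	alloc := make([][]string, units)
	load := make([]int64, units)
	for _, t := range topics {
		min := 0
		for u := 1; u < units; u++ {
			if load[u] < load[min] {
				min = u
			}
		}
		alloc[min] = append(alloc[min], t.name)
		load[min] += t.lag
	}
	return alloc
}

func main() {
	topics := []topicLag{{"big1", 900}, {"big2", 800}, {"small1", 10}, {"small2", 5}}
	fmt.Println(allocateUnits(topics, 2)) // big1 and big2 end up in different units
}
```

Greedy longest-processing-time placement like this keeps the per-unit load within a small factor of optimal, which is enough to stop the skew described above.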

alok87 commented Mar 25, 2021

Things are not auto-recovering from:

E0325 12:03:27.620298       1 batch_processor.go:498] ts.fabric.vn_call_forwardings, error(s) occured in processing (sending err)
I0325 12:03:27.620305       1 batch_processor.go:176] ts.fabric.vn_call_forwardings: batch processing gracefully shutingdown

alok87 commented Mar 25, 2021

Releases are getting stuck.

alok87 commented Mar 26, 2021

CPU profiling attached: cpu_pprof.pdf

Update: opened a separate issue to solve this #173

alok87 added 3 commits March 26, 2021 16:48
Separate out realtime calculation and sinkgroup (separation of concerns).
Main reason: batcher and loader lag are needed to allocateDeploymentUnits.
Also easier to debug and easier to delete this way.
alok87 added 6 commits March 31, 2021 12:18
The deadlock: current cannot be populated until the reload pods are there, and reload cannot be done until current is populated.
This is so that we only operate on the topics which are reloading, not the realtime ones. Not doing this makes the allocator generate duplicates, since it would also operate on the realtime topics while the current status still lists them. So whenever realtime updates happen, always fix the state of the batcher's reloading topics.
alok87 commented Mar 31, 2021

If maxReloadingTopics is reduced, it does not take effect.

alok87 commented Mar 31, 2021

Sometimes the batcher is stuck in a session-closure loop, mostly related to MaxProcessingTime; needs to be checked.
Looks like a dupe of #172 (comment).

Error while closing connection to broker: broken pipe

Update: trying internal Kafka listeners (Strimzi) strimzi/strimzi-kafka-operator#4688

This was fixed after switching to internal routing.

alok87 commented Mar 31, 2021

Loader optimizations

Time

  • maxSizePerBatch: a very low value causes slowness due to repeated merges.
  • maxWaitSeconds: if the batch size in MBs is small because maxWait was hit (due to First schema creation un-necessarily retrys #129), then very small batches are loaded, causing slowness.
    Update: maxWait should reset after processing is done, so that the next batch is not small.

Dividing loader into multiple pods

Group the loader into pods based on each topic's lag, so that there is minimal shuffling.
(Not required: small batches operate at the same speed.)

(Screenshot attached: 2021-04-01, 2:37 PM)

alok87 added 3 commits April 1, 2021 08:44
This is required so that big batches are made at the time of a full sink.
Solves the Time part of #172 (comment)
@alok87 alok87 merged commit 5952d7a into master Apr 1, 2021
alok87 added a commit that referenced this pull request Apr 14, 2021
@alok87 alok87 deleted the api-sinkgroups-maxBytes branch May 31, 2021 07:21