Skip to content

Commit

Permalink
bugfix: S3C-3955 ignore errors from readRecords during LogReader setup
Browse files Browse the repository at this point in the history
The getRaftLog route returns an error 500 when a queried raft session
is not in cache. This, when the queue populator had multiple raft
sessions to manage, caused the whole setup to fail if one of them
failed. Previously, before S3C-3835 was fixed, it happened during
regular cron job batches which only impacted the missing raft session
and logged an error without preventing progress from other raft
sessions.

The fix consists of not causing a global error on error 500 from one
raft session, and instead falling back to starting from log offset 1.

This looks acceptable as a quick fix, but it would be better to have
metadata return a proper success status whenever raft sessions are
missing from its cache, but the scope is larger hence this improvement
over the current fix is postponed for now. Another way could be to
internally retry fetching the offset while allowing the rest of the
log readers to fetch their respective raft session log, but this would
also be a significant change and does not look the best approach to me
either.
  • Loading branch information
jonathan-gramain committed Feb 8, 2021
1 parent 0d50c85 commit bf3525b
Show file tree
Hide file tree
Showing 2 changed files with 47 additions and 11 deletions.
22 changes: 17 additions & 5 deletions lib/queuePopulator/LogReader.js
Original file line number Diff line number Diff line change
Expand Up @@ -148,11 +148,23 @@ class LogReader {
});
this.logConsumer.readRecords({ limit: 1 }, (err, res) => {
if (err) {
this.log.error('error while reading log', {
method: 'LogReader._initializeLogOffset',
error: err,
});
return done(err);
// FIXME: getRaftLog metadata route returns 500 when
// its cache does not contain the queried raft
// session. For the sake of simplicity, in order to
// allow the populator to make progress, we choose to
// accept errors fetching the current offset during
// setup phase and fallback to starting from offset
// 1. It would be better to have metadata return
// special success statuses in such case.
this.log.warn(
'error reading initial log offset, ' +
'default to initial offset 1', {
method: 'LogReader._initializeLogOffset',
zkPath: pathToLogOffset,
logOffset: 1,
error: err,
});
return done(null, 1);
}
const logOffset = res.info.cseq + 1;
this.log.info('starting after latest log sequence', {
Expand Down
36 changes: 30 additions & 6 deletions tests/unit/lib/queuePopulator/LogReader.spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,29 @@ const assert = require('assert');

const ZookeeperMock = require('zookeeper-mock');

const { versioning } = require('arsenal');
const { errors, versioning } = require('arsenal');
const { DbPrefixes } = versioning.VersioningConstants;

const { Logger } = require('werelogs');

const LogReader = require('../../../../lib/queuePopulator/LogReader');

class MockLogConsumer {
constructor(params) {
this.params = params || {};
}

readRecords(params, cb) {
process.nextTick(() => {
cb(null, {
info: {
cseq: 12345,
},
});
if (this.params.readRecordsError) {
cb(this.params.readRecordsError);
} else {
cb(null, {
info: {
cseq: 12345,
},
});
}
});
}
}
Expand Down Expand Up @@ -78,4 +86,20 @@ describe('LogReader', () => {
done();
});
});

it('should start from offset 1 on log consumer readRecords error', done => {
const errorLogReader = new LogReader({
logId: 'test-log-reader',
zkClient: zkMock.createClient('localhost:2181'),
logConsumer: new MockLogConsumer({
readRecordsError: errors.InternalError,
}),
logger: new Logger('test:ErrorLogReader'),
});
errorLogReader.setup(err => {
assert.ifError(err);
assert.strictEqual(errorLogReader.logOffset, 1);
done();
});
});
});

0 comments on commit bf3525b

Please sign in to comment.