Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe allocations #15

Merged
merged 14 commits into from
Dec 7, 2020
Merged

Safe allocations #15

merged 14 commits into from
Dec 7, 2020

Conversation

bjosv
Copy link
Collaborator

@bjosv bjosv commented Dec 3, 2020

This PR handles vipshop/hiredis-vip#136 and the vulnerability report regarding handling failing allocations, see https://nvd.nist.gov/vuln/detail/CVE-2020-7105

  • Allocation results are now checked before the memory is used.
  • This PR also replaces existing allocation/free macros to use the same api as hiredis

Some details:
Hiredis uses its own allocator-functions which by default uses libc.
By using the same design in hiredis-cluster a user can replace the
allocator for both hiredis and hiredis-cluster at the same time.

Own allocator functions can be enabled by the API:
hiredisAllocFuncs hiredisSetAllocators(hiredisAllocFuncs *ha)

bjosv added 5 commits December 3, 2020 21:57
Hiredis uses its own allocator-functions which by default uses libc.
By using the same design in hiredis-cluster a user can replace the
allocator for both hiredis and hiredis-cluster at the same time.

Own allocator functions can be enabled by the API:
`hiredisAllocFuncs hiredisSetAllocators(hiredisAllocFuncs *ha)`

This commit only covers `free()` and cleanups of freeing data.
Hiredis uses its own allocator-functions which by default uses libc.
By using the same design in hiredis-cluster a user can replace the
allocator for both hiredis and hiredis-cluster at the same time.

Own allocator functions can be enabled by the API:
`hiredisAllocFuncs hiredisSetAllocators(hiredisAllocFuncs *ha)`

- This commit only covers `realloc()`
- Making sure that a realloc returning NULL is handled
Hiredis uses its own allocator-functions which by default uses libc.
By using the same design in hiredis-cluster a user can replace the
allocator for both hiredis and hiredis-cluster at the same time.

Own allocator functions can be enabled by the API:
`hiredisAllocFuncs hiredisSetAllocators(hiredisAllocFuncs *ha)`

- This commit only covers `calloc()`
- Making sure that a calloc() returning NULL is handled
There is no usage of strdup(), and therefor no need for hi_strdup()
Hiredis uses its own allocator-functions which by default uses libc.
By using the same design in hiredis-cluster a user can replace the
allocator for both hiredis and hiredis-cluster at the same time.

Own allocator functions can be enabled by the API:
`hiredisAllocFuncs hiredisSetAllocators(hiredisAllocFuncs *ha)`

- This commit only covers `malloc()`
- Making sure that a malloc() returning NULL is handled
@bjosv bjosv requested a review from zuiderkwast December 3, 2020 23:01
To have a common ground with hiredis regarding allocators the
not so commonly used hi_zalloc() is replaced with a hi_malloc() and
a memset().
Also making sure that hi_malloc() returning NULL is handled.
Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

dict.c Outdated
Comment on lines 176 to 187
free(ht->table);
hi_free(ht->table);
ht->table = NULL;

/* Re-initialize the table */
_dictReset(ht);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ht->table = NULL is necessary since _dictReset() does that, but if you think it's clearer style to do it just after hi_free(), we can keep it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good, can remove, always nice with code with a reason. This file actually also exists in hiredis, and without the NULL ine aswell.. I change this
Maybe we should use dict.c from hiredis instead..but that could be a future change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zuiderkwast
Copy link
Collaborator

Btw, can we add the CVE under Security? I've seen it there in other Github projects.

@zuiderkwast
Copy link
Collaborator

Please add "allocation failure handling" in the README, perhaps together with leak corrections. I seems like an important point.

bjosv added 2 commits December 4, 2020 15:35
The iterators performs a malloc under the hood that might
fail due to out of memory. This avoids using a null iterator.
Making sure to set the error string when there is an out of memory
and cleanup the temporary allocated data.
@bjosv bjosv marked this pull request as ready for review December 4, 2020 15:32
return NULL;
}

// A OOM failing realloc()calloc
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will delete "calloc"

hircluster.c Outdated
@@ -4146,7 +4171,6 @@ int redisClusterAsyncFormattedCommand(redisClusterAsyncContext *acc,

cad = cluster_async_data_get();
if (cad == NULL) {
__redisClusterAsyncSetError(acc, REDIS_ERR_OOM, "Out of memory");
goto error;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change this back to OOM, cluster_async_data_get() return NULL when its malloc() fails
goto error; will change to goto oom;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. That's not a very good name of that function. It just allocates and empties a struct... But let's not refactor more than necessary in the same PR.

cc, REDIS_ERR_OOM,
"Slot ref node failed: out of memory.");
goto error;
goto oom;
Copy link
Collaborator Author

@bjosv bjosv Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cluster_slot_ref_node() is not doing any allocations, and oom is checked on arguments before.
Will change back to goto error;"

Wrong, it does allocations, like when using listAddNodeTail.
Seem very unlikely to have wrong role aswell, going for oom as we started with.

__redisClusterSetError(
cc, REDIS_ERR_OOM,
"Slot ref node failed: out of memory.");
goto error;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_slot_ref_node() is not doing any allocations, and oom is checked on arguments before.
Will change back to goto error;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, it does allocations, like when using listAddNodeTail.
Seem very unlikely to have wrong role aswell, going for oom as we started with.

The tests are very static and only verifies the handling of the
first allocation in the API functions.
Next step could be to randomize when an allocation fail to make
sure that all allocation failures are handled.
Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very good. Sometimes the commit messages could contain more details or so. It's hard to verify what's happening without a lot of grepping (which I assume you have done) and probably we need to extend the tests as you noted.

hircluster.c Outdated
@@ -4146,7 +4171,6 @@ int redisClusterAsyncFormattedCommand(redisClusterAsyncContext *acc,

cad = cluster_async_data_get();
if (cad == NULL) {
__redisClusterAsyncSetError(acc, REDIS_ERR_OOM, "Out of memory");
goto error;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. That's not a very good name of that function. It just allocates and empties a struct... But let's not refactor more than necessary in the same PR.

hircluster.c Outdated
Comment on lines 859 to 862
ret = cluster_slot_ref_node(slot, master);
if (ret != REDIS_OK) {
goto oom;
goto error;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? It returns REDIS_ERR if listCreate() returns NULL (OOM?) or if slot or master is NULL (not OOM, but shouldn't happen?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a bit hard to follow this, the thing that I hesitate on was if master node always for certain was a master, so I went on error. The difference is the errstr

cluster_slot_ref_node()
..
if (node->role != REDIS_ROLE_MASTER) {
return REDIS_ERR;
}

Comment on lines 10 to 14
// A OOM failing malloc()
static void *hi_malloc_fail(size_t size) {
UNUSED(size);
return NULL;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be extended with 2 global variables: a counter and a setting fail_on_nth_allocation, so we can test failing on the Nth allocation. The counter can be common for all the allocator functions and the test can loop over N. Is there a way to get code coverage so that we can know if every OOM if statement is covered?

Another option is of course to randomize the failures, but it might be more difficult to write predictable test cases that way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, nice. I have done this, and fixed a few things while running. Will push the commit

@@ -32,21 +32,6 @@ void hiarray_destroy(struct hiarray *a) {
hi_free(a);
}

int hiarray_init(struct hiarray *a, uint32_t n, size_t size) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mention in the commit message that we use hiarray function from hiredis? (if that's what we do)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seems that hiarray.c was copied from hiredis, and this was an unused function that I removed.
hiarray_create() contains what it needs

hircluster.c Outdated
}

ret = cluster_slot_ref_node(slot, master);
if (ret != REDIS_OK) {
goto error;
goto error; // or possibly OOM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a review comment on a previous commit in this PR, you wrote "cluster_slot_ref_node() is not doing any allocations, and oom is checked on arguments before." here. Better write this in a comment here instead of "or possibly OOM".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cluster_slot_ref_node is doing allocations, but also checking if master has correct role. So its not obvious if role always has REDIS_ROLE_MASTER. Ill improve the comment.

Comment on lines 1031 to +1032
if (ret != DICT_OK) {
// Key already exists, but possibly an OOM error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to tell them apart? Can't we make dictAdd() return specific error codes such as DICT_OOM?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its possible, would have to make sure the internals in dict handles it correctly for every case. Can do it in another pr

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Also found some issues during the testcases regarding OOM
handling, like a leak during OOM.
Also fixed the order of setting timeout when reconnecting.
@bjosv
Copy link
Collaborator Author

bjosv commented Dec 7, 2020

Thats it.
I think we can wait with the DICT_OOM since it affects internal handling in dict, then since dict is a copy from hiredis we will diverge. Currently its very similar. The current tests make me quite certain that his is behaving as expected.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

Should we use dict and array from hiredis in another PR?

@bjosv
Copy link
Collaborator Author

bjosv commented Dec 7, 2020

That would be nice

@bjosv bjosv merged commit 0f2922a into master Dec 7, 2020
@bjosv bjosv deleted the safe_allocations branch December 7, 2020 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants