Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sairedis/syncd] Implement bulk get support #1509

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

stepanblyschak
Copy link
Contributor

@stepanblyschak stepanblyschak commented Jan 24, 2025

DEPENDS ON sonic-net/sonic-swss-common#966.

Implement bulk get support for SAIRedis, Syncd, SaiPlayer and VSlib.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).


status = meta_generic_validation_get(meta_key, attr_count[idx], attr_list[idx]);

// FIXME: This macro returns on failure.

Check notice

Code scanning / CodeQL

FIXME comment Note

FIXME comment: This macro returns on failure.
Comment on lines 2266 to 2359
switch (api)
{
case SAI_COMMON_API_BULK_CREATE:

{
sai_object_id_t switch_id = m_sai->switchIdQuery(local_ids[0]);
std::vector<sai_object_id_t> ids(object_count);

for (uint32_t it = 0; it < object_count; it++)
{
if (m_sai->switchIdQuery(local_ids[it]) != switch_id ||
switch_id == SAI_NULL_OBJECT_ID)
{
SWSS_LOG_THROW("invalid switch_id translated from VID %s",
sai_serialize_object_id(local_ids[it]).c_str());
}
}

std::vector<uint32_t> attr_counts(object_count);

std::vector<const sai_attribute_t*> attr_lists(object_count);

for (uint32_t idx = 0; idx < object_count; idx++)
{
attr_counts[idx] = attributes[idx]->get_attr_count();
attr_lists[idx] = attributes[idx]->get_attr_list();
}

switch_id = translate_local_to_redis(switch_id);

status = m_sai->bulkCreate(object_type,
switch_id,
object_count,
attr_counts.data(),
attr_lists.data(),
mode,
ids.data(),
statuses.data());

if (status == SAI_STATUS_SUCCESS)
{
for (uint32_t it = 0; it < object_count; it++)
{
match_redis_with_rec(ids[it], local_ids[it]);

SWSS_LOG_INFO("saved VID %s to RID %s",
sai_serialize_object_id(local_ids[it]).c_str(),
sai_serialize_object_id(ids[it]).c_str());
}
}

return status;
}
break;

case SAI_COMMON_API_BULK_REMOVE:

{
std::vector<sai_object_id_t> ids(object_count);

for (uint32_t it = 0; it < object_count; it++)
{
ids[it] = translate_local_to_redis(local_ids[it]);
}

status = m_sai->bulkRemove(object_type, object_count, ids.data(), mode, statuses.data());
}
break;

case SAI_COMMON_API_BULK_SET:

{
std::vector<sai_object_id_t> ids(object_count);

for (uint32_t it = 0; it < object_count; it++)
{
ids[it] = translate_local_to_redis(local_ids[it]);
}

std::vector<sai_attribute_t> attr_list;

// route can have multiple attributes, so we need to handle them all
for (const auto &alist: attributes)
{
attr_list.push_back(alist->get_attr_list()[0]);
}

status = m_sai->bulkSet(object_type, object_count, ids.data(), attr_list.data(), mode, statuses.data());
}
break;

case SAI_COMMON_API_BULK_GET:

{

Check notice

Code scanning / CodeQL

Long switch case Note

Switch has at least one case that is too long:
SAI_COMMON_API_BULK_CREATE (52 lines)
.
Comment on lines +2363 to +2371
switch (api)
{
case SAI_COMMON_API_BULK_GET:
sendBulkGetResponse(objectType, objectIds, all, attributes, statuses);
break;
default:
sendApiResponse(api, all, (uint32_t)objectIds.size(), statuses.data());
break;
}

Check notice

Code scanning / CodeQL

No trivial switch statements Note

This switch statement should either handle more cases, or be rewritten as an if statement.
Comment on lines +666 to +675
switch (object_type)
{
case SAI_OBJECT_TYPE_PORT:
ptr = m_apis.port_api->get_ports_attribute;
break;

SWSS_LOG_ERROR("not implemented, FIXME");
default:
SWSS_LOG_ERROR("not implemented %s, FIXME", sai_serialize_object_type(object_type).c_str());
return SAI_STATUS_NOT_IMPLEMENTED;
}

Check notice

Code scanning / CodeQL

No trivial switch statements Note

This switch statement should either handle more cases, or be rewritten as an if statement.

// FIXME: This macro returns on failure.
// When mode is SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR we should continue instead of return.
// This issue exists for all bulk operations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fix it here for this api ? since GET operation is specific, so maybe some attributes maybe not implemented and they will definitely return failure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify? Attribute may be not implemented with SET as well.

@@ -2522,17 +2629,17 @@ int SaiPlayer::replay()

SWSS_LOG_NOTICE("using file: %s", filename.c_str());

std::ifstream infile(filename);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this logic is changed ? not need for this variable to be member of class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to use m_infile in processBulk method

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see that, but why the reason ? since this is local variable no need to upgrade it to member

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SaiPlayer operates on input stream, the input stream is feeding multiple functions with data throuhgout this operation. From that point of view the lifetime of the input stream needs to match the lifetime of SaiPlayer, thus fits to be a member variable. I did this change for ergonomics primaraly - not having to pass the input stream explicitelly everywhere it is needed in downstream code. Besides, passing an input stream explicitelly implies a member function may operate on different input streams throughout SaiPlayer's lifetime which is not the case.
I can make a function parameter though if you prefer so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik can be resolved?

@@ -259,6 +260,8 @@ namespace saiplayer

std::shared_ptr<CommandLineOptions> m_commandLineOptions;

std::ifstream m_infile;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is moved to member ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to use m_infile in processBulk method

@@ -4,27 +4,3 @@
#include <gtest/gtest.h>

using namespace sairedis;

TEST(RedisRemoteSaiInterface, bulkGet)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't remove tests, implement correct TEST to validate your code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure what this test actually checks. It seems the test was added to satisfy code coverage rather then check functionality. As for functioanlity check and code coverage I added tests to TestSyncdBrcm and added player tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik Please share your thoughts on what should be in this test case instead and if needed at all since the code is covered by a set of other tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is not your tests, then why you are removing it ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you added code that changes ouyput of this tests, that now it can succeed, then please update test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcudnik Yes, the output of this call changes from returning SAI_STATUS_NOT_IMPLEMENTED to throwing an exception waitForBulkGetResponse: wrong number of statuses, got 0, expected 1 after a 60 sec timeout because no one is responding, so I don't get the reason for such a test. We have bulkGet covered by player test and syncd test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update this test to run correctly

Copy link
Collaborator

@kcudnik kcudnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address comments

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Contributor Author

Tests are failing due to build using older swss-common which is missing PR sonic-net/sonic-swss-common#966.

Test is using swss-common built here - https://dev.azure.com/mssonic/build/_build/results?buildId=757465 which is from a previous commit.

Waiting for a new swss-common build

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 13, 2025

please r esolve conflicts

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 20, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 20, 2025

since that PR is implementing Bulk Support, in test file: ~/sr/unittest/lib/TestClientServerSai.cpp there is this test:

213 SAIREDIS_DECLARE_EVERY_ENTRY(TEST_ENTRY)
214
215 #define TEST_BULK_ENTRY(OT,ot)                                                                                               \
216 TEST(ClientServerSai, bulk_ ## OT)                                                                                           \
217 {                                                                                                                            \
218     auto css = std::make_shared<ClientServerSai>();                                                                          \
219     sai_ ## ot ## _t e[2] = {};                                                                                              \
220     EXPECT_EQ(SAI_STATUS_SUCCESS, css->apiInitialize(0, &test_services));                                                       \
221     EXPECT_NE(SAI_STATUS_SUCCESS, css->bulkCreate(0, e, nullptr, nullptr, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));    \
222     EXPECT_NE(SAI_STATUS_SUCCESS, css->bulkSet(2, e, nullptr, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));                \
223     EXPECT_NE(SAI_STATUS_SUCCESS, css->bulkRemove(2, e, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));                      \
224                                                                                                                              \
225     auto ss = std::make_shared<ClientServerSai>();                                                                           \
226     EXPECT_EQ(SAI_STATUS_SUCCESS, ss->apiInitialize(0, &test_client_services));                                                 \
227     EXPECT_NE(SAI_STATUS_SUCCESS, ss->bulkCreate(0, e, nullptr, nullptr, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));     \
228     EXPECT_NE(SAI_STATUS_SUCCESS, ss->bulkSet(0, e, nullptr, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));                 \
229     EXPECT_NE(SAI_STATUS_SUCCESS, ss->bulkRemove(0, e, SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, nullptr));                       \
230 }

which expects that all bulk apis will fail, and seems like the changes to bulk PR causing this test to fail with:

[ RUN ] ClientServerSai.DIRECTION_LOOKUP_ENTRY
Assertion failed: _owned.empty () (src/own.cpp:197)
/bin/bash: line 6: 27699 Aborted (core dumped) ${dir}$tst
FAIL: tests

@Stepan Blyshchak can you check that locally ?

@mssonicbld
Copy link
Collaborator

/azp run

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 20, 2025

i narrowed this down to : https://gitea.atria-soft.org/generic-library/libzmq/src/commit/092ad50b0c32174cdffdd45b5516b50237a8315d/src/own.cpp
this line:

Assertion failed: _owned.empty () (src/own.cpp:197)

is from zmq library, but beats me why this assert is crashing 😕

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 20, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 21, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 21, 2025

some of the tests are not passing:

Replay BCM56850/port_bulk_get.rec
terminate called after throwing an instance of 'std::runtime_error'
  what():  :- waitForBulkGetResponse: wrong number of statuses, got 0, expected 2
player BCM56850/port_bulk_get.rec: exitcode: 134:
FAIL: BCM56850.pl

this seems like implementation bulk error

@stepanblyschak
Copy link
Contributor Author

@kcudnik I am familiar with this failure, it requires sonic-swss-common PR sonic-net/sonic-swss-common#966. The problem is - there is no new sonic-swss-common build that includes this PR as I pointed out in previous comment:

Tests are failing due to build using older swss-common which is missing PR sonic-net/sonic-swss-common#966.

Test is using swss-common built here - https://dev.azure.com/mssonic/build/_build/results?buildId=757465 which is from a previous commit.

Waiting for a new swss-common build

There's no new sonic-swss-common build for few weeks because of different failures in sonic-swss-common (one of which was the mentioned assertion in zmq lib) and I do not have a way to restart sonic-swss-common build:

image

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 21, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants