Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible SAI API struct *_api_t mismatch #1823

Open
kcudnik opened this issue Jul 13, 2023 · 0 comments
Open

Possible SAI API struct *_api_t mismatch #1823

kcudnik opened this issue Jul 13, 2023 · 0 comments
Assignees
Labels

Comments

@kcudnik
Copy link
Collaborator

kcudnik commented Jul 13, 2023

Since commit #1817 we have backward compatibility with binary ENUM values as well ass sizeof() structs and unions in SAI. One exception is structures sai_*_api_t which can still add additional members at the end of existing structures, which can expand possible API set.

We introduced enum/struct backward compatibility, to be needed in SONiC, since some times not all vendor keep up with implementation development in SONiC. So by introducing backward compatibility point, we can move forward in SONiC, let say use SAI headers v1.14.0, but be backward compatible with v1.12.0 on SAI headers, and supply v1.12.0 vendor binary SAI library and all should work fine. But his rises some problems:

  1. if some new enum attributes were added in v1.14 that are not supported in v1.12, then vendor lib should notify with specific sai_status_t error code, that this attribute is not supported. This is actually no issue.
  2. if some new enum value was introduced in v1.14 that are not supported in v1.12, then also vendor lib should notify with specific sai_status_t error code that this enum value is not supported. This also is not an issue.
  3. between version v1.14 and v1.12 there could be some changes in struct sizes like sai_attribute_value_t, and this is breaking change which would lead to undefined behavior and crash, but since v1.12 we introduced backward compatibility to prevent that. Except sai_*_api_t structs.
  4. main problem is when new api will be added to existing sai_*_api_t struct, let say in v1.14 but don't exist in v1.12. Then SONiC by calling this new api will read some undefined memory, see example below:
// let say at v1.12 vlan api looks like this:

typedef struct _sai_vlan_api_t
{
    sai_create_vlan_fn                  create_vlan;
    sai_remove_vlan_fn                  remove_vlan;
    sai_set_vlan_attribute_fn           set_vlan_attribute;
    sai_get_vlan_attribute_fn           get_vlan_attribute;
    sai_create_vlan_member_fn           create_vlan_member;
    sai_remove_vlan_member_fn           remove_vlan_member;
    sai_set_vlan_member_attribute_fn    set_vlan_member_attribute;
    sai_get_vlan_member_attribute_fn    get_vlan_member_attribute;

} sai_vlan_api_t;

// but on v1.14 looks like this:

typedef struct _sai_vlan_api_t
{
    sai_create_vlan_fn                  create_vlan;
    sai_remove_vlan_fn                  remove_vlan;
    sai_set_vlan_attribute_fn           set_vlan_attribute;
    sai_get_vlan_attribute_fn           get_vlan_attribute;
    sai_create_vlan_member_fn           create_vlan_member;
    sai_remove_vlan_member_fn           remove_vlan_member;
    sai_set_vlan_member_attribute_fn    set_vlan_member_attribute;
    sai_get_vlan_member_attribute_fn    get_vlan_member_attribute;

    // introducing new api members
    sai_get_vlan_stats_fn               get_vlan_stats;
    sai_get_vlan_stats_ext_fn           get_vlan_stats_ext;
    sai_clear_vlan_stats_fn             clear_vlan_stats;

} sai_vlan_api_t;

Now imagine situation, that SONiC is compiled with headers v1.14 which supports vlan_stats, and vendor provided library compiled with v1.12. At this point all enums are binary compatible, all structs (except api) are binary compatible). And since sai_api_query() api returns pointer to some vendor SAI lib memory that holds pointers to specific api functions, is SONiC will try to execute get_vlan_stats api thinking it's v1.14 but actuall it is v1.12, then it will read some unallocated/unpopulated/garbage memory from actual vendor SAI lib memory. This will lead to crash, since that vendor has no idea about future v1.14 api members.

So question arises here, how we will deal with this scenario.

  • we can't compare get_vlan_stats api pointer with NULL to check if it exists, since we already reading some undefined memory, and there is huge chance that this memory is already populated with some garbage values which are not null, which will lead to crash, or this struct is defined at the end of valid memory page, and will cause page fault exception when reading.
  • we could ask vendors to add zero padding for each api struct, so we could actually read that NULL after the struct, but how big this padding should be ? This solution seems ugly, since we waist some space, each vendor in each library can have different number of space padding allocated, and this will get hard to maintain in the long run.
  • only working solution (but not easy to execute) that i can think of, is to introduce new sai API, that will return size of the current api struct. for example:
sai_status_t sai_query_api_size(
    _In_ sai_api_t api
    _Out_ sai_size_t* size)
{
    // implementation
    switch(api)
    {
    case SAI_API_UNSPECIFIED:
        return SAI_STATUS_NOT_SUPPORTED;
    case SAI_API_SWITCH:
        *size = sizeof(sai_switch_api_t);
        return SAI_STATUS_SUCCESS;
    case SAI_API_PORT:
        *size = sizeof(sai_port_api_t);
        return SAI_STATUS_SUCCESS;
...
    default:
        return SAI_STATUS_NOT_SUPPORTED;
}

Downside of this solution is, that each vendor would need to have this implemented and keep track of each new API when added.
Since those sizeof() are executed at compile time at vendor side, they contain actual size of api struct that vendor was using.

Unfortunately use case is actually ugly to write:

size_t off = offsetof(sai_vlan_api_t, get_vlan_stats); // offset of specific api we want to call

bool safeToExecute = false;

sai_size_t size;

if (sai_query_api_size(SAI_API_VLAN, &size) == SAI_STATUSS_SUCCESS)
{
    safeToExecute = (size > off);
}

if (safeToExecute)
{
    // execute actual vlan stat api
}

So as we can see a lot of code, just to make sure we are safe, And we need to know 3 things here:

  • api name in the struct
  • struct name
  • and api enum name

At user level this is too much to add just to make sure, i was thinking in a way, that SONiC uses libsairedis.so anyway, so this library is affected, but syncd is linking against libsai.so directly, but it's not executed directly it make call's via VendorSai.cpp which is wrapper c++ wrapper for all C api functions exposed from vandor libsai.so,

So maybe there is a way to actually make this ugly check inside VendorSai.cpp and not pollute any other code. I am thinking on some even automated implementation that could be added into metadata, generated size structures based on api version like v1.12 v1.14 etc, so each metadata would contain each api struct with it size, so actually no vendor and SONiC would need to keep track of that, and actual check could be done inside auto generated metadata itself, if api would simply be too "new" then we would return not implemented. The only problem here would be that this check would be done deep deep inside syncd at actual api execution time, instead of orchagent level to actually make decision wheter we want to execute this non existing api or not.

Other approach which i was thinking, to have api struct sizes dumped via numbers by sai version, for example dump all sizeof structres for version v1.12 then v1.14 and generate those sizes in sai metadata during compilation. Then use new added metadata function to validate whether it's possible to execute specific api, using current header version and version returned from vendro sai_query_api_version() api. But it would still evolve around similar check, since somehow we still need to know the offset of the given api, whether it extend over given struct or not. That's why in the example we use offsetof() operator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants