You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Since commit #1817 we have backward compatibility with binary ENUM values as well ass sizeof() structs and unions in SAI. One exception is structures sai_*_api_t which can still add additional members at the end of existing structures, which can expand possible API set.
We introduced enum/struct backward compatibility, to be needed in SONiC, since some times not all vendor keep up with implementation development in SONiC. So by introducing backward compatibility point, we can move forward in SONiC, let say use SAI headers v1.14.0, but be backward compatible with v1.12.0 on SAI headers, and supply v1.12.0 vendor binary SAI library and all should work fine. But his rises some problems:
if some new enum attributes were added in v1.14 that are not supported in v1.12, then vendor lib should notify with specific sai_status_t error code, that this attribute is not supported. This is actually no issue.
if some new enum value was introduced in v1.14 that are not supported in v1.12, then also vendor lib should notify with specific sai_status_t error code that this enum value is not supported. This also is not an issue.
between version v1.14 and v1.12 there could be some changes in struct sizes like sai_attribute_value_t, and this is breaking change which would lead to undefined behavior and crash, but since v1.12 we introduced backward compatibility to prevent that. Except sai_*_api_t structs.
main problem is when new api will be added to existing sai_*_api_t struct, let say in v1.14 but don't exist in v1.12. Then SONiC by calling this new api will read some undefined memory, see example below:
// let say at v1.12 vlan api looks like this:typedefstruct_sai_vlan_api_t
{
sai_create_vlan_fncreate_vlan;
sai_remove_vlan_fnremove_vlan;
sai_set_vlan_attribute_fnset_vlan_attribute;
sai_get_vlan_attribute_fnget_vlan_attribute;
sai_create_vlan_member_fncreate_vlan_member;
sai_remove_vlan_member_fnremove_vlan_member;
sai_set_vlan_member_attribute_fnset_vlan_member_attribute;
sai_get_vlan_member_attribute_fnget_vlan_member_attribute;
} sai_vlan_api_t;
// but on v1.14 looks like this:typedefstruct_sai_vlan_api_t
{
sai_create_vlan_fncreate_vlan;
sai_remove_vlan_fnremove_vlan;
sai_set_vlan_attribute_fnset_vlan_attribute;
sai_get_vlan_attribute_fnget_vlan_attribute;
sai_create_vlan_member_fncreate_vlan_member;
sai_remove_vlan_member_fnremove_vlan_member;
sai_set_vlan_member_attribute_fnset_vlan_member_attribute;
sai_get_vlan_member_attribute_fnget_vlan_member_attribute;
// introducing new api memberssai_get_vlan_stats_fnget_vlan_stats;
sai_get_vlan_stats_ext_fnget_vlan_stats_ext;
sai_clear_vlan_stats_fnclear_vlan_stats;
} sai_vlan_api_t;
Now imagine situation, that SONiC is compiled with headers v1.14 which supports vlan_stats, and vendor provided library compiled with v1.12. At this point all enums are binary compatible, all structs (except api) are binary compatible). And since sai_api_query() api returns pointer to some vendor SAI lib memory that holds pointers to specific api functions, is SONiC will try to execute get_vlan_stats api thinking it's v1.14 but actuall it is v1.12, then it will read some unallocated/unpopulated/garbage memory from actual vendor SAI lib memory. This will lead to crash, since that vendor has no idea about future v1.14 api members.
So question arises here, how we will deal with this scenario.
we can't compare get_vlan_stats api pointer with NULL to check if it exists, since we already reading some undefined memory, and there is huge chance that this memory is already populated with some garbage values which are not null, which will lead to crash, or this struct is defined at the end of valid memory page, and will cause page fault exception when reading.
we could ask vendors to add zero padding for each api struct, so we could actually read that NULL after the struct, but how big this padding should be ? This solution seems ugly, since we waist some space, each vendor in each library can have different number of space padding allocated, and this will get hard to maintain in the long run.
only working solution (but not easy to execute) that i can think of, is to introduce new sai API, that will return size of the current api struct. for example:
Downside of this solution is, that each vendor would need to have this implemented and keep track of each new API when added.
Since those sizeof() are executed at compile time at vendor side, they contain actual size of api struct that vendor was using.
Unfortunately use case is actually ugly to write:
size_t off = offsetof(sai_vlan_api_t, get_vlan_stats); // offset of specific api we want to callbool safeToExecute = false;
sai_size_t size;
if (sai_query_api_size(SAI_API_VLAN, &size) == SAI_STATUSS_SUCCESS)
{
safeToExecute = (size > off);
}
if (safeToExecute)
{
// execute actual vlan stat api
}
So as we can see a lot of code, just to make sure we are safe, And we need to know 3 things here:
api name in the struct
struct name
and api enum name
At user level this is too much to add just to make sure, i was thinking in a way, that SONiC uses libsairedis.so anyway, so this library is affected, but syncd is linking against libsai.so directly, but it's not executed directly it make call's via VendorSai.cpp which is wrapper c++ wrapper for all C api functions exposed from vandor libsai.so,
So maybe there is a way to actually make this ugly check inside VendorSai.cpp and not pollute any other code. I am thinking on some even automated implementation that could be added into metadata, generated size structures based on api version like v1.12 v1.14 etc, so each metadata would contain each api struct with it size, so actually no vendor and SONiC would need to keep track of that, and actual check could be done inside auto generated metadata itself, if api would simply be too "new" then we would return not implemented. The only problem here would be that this check would be done deep deep inside syncd at actual api execution time, instead of orchagent level to actually make decision wheter we want to execute this non existing api or not.
Other approach which i was thinking, to have api struct sizes dumped via numbers by sai version, for example dump all sizeof structres for version v1.12 then v1.14 and generate those sizes in sai metadata during compilation. Then use new added metadata function to validate whether it's possible to execute specific api, using current header version and version returned from vendro sai_query_api_version() api. But it would still evolve around similar check, since somehow we still need to know the offset of the given api, whether it extend over given struct or not. That's why in the example we use offsetof() operator
The text was updated successfully, but these errors were encountered:
Since commit #1817 we have backward compatibility with binary ENUM values as well ass sizeof() structs and unions in SAI. One exception is structures sai_*_api_t which can still add additional members at the end of existing structures, which can expand possible API set.
We introduced enum/struct backward compatibility, to be needed in SONiC, since some times not all vendor keep up with implementation development in SONiC. So by introducing backward compatibility point, we can move forward in SONiC, let say use SAI headers v1.14.0, but be backward compatible with v1.12.0 on SAI headers, and supply v1.12.0 vendor binary SAI library and all should work fine. But his rises some problems:
Now imagine situation, that SONiC is compiled with headers v1.14 which supports vlan_stats, and vendor provided library compiled with v1.12. At this point all enums are binary compatible, all structs (except api) are binary compatible). And since sai_api_query() api returns pointer to some vendor SAI lib memory that holds pointers to specific api functions, is SONiC will try to execute get_vlan_stats api thinking it's v1.14 but actuall it is v1.12, then it will read some unallocated/unpopulated/garbage memory from actual vendor SAI lib memory. This will lead to crash, since that vendor has no idea about future v1.14 api members.
So question arises here, how we will deal with this scenario.
Downside of this solution is, that each vendor would need to have this implemented and keep track of each new API when added.
Since those sizeof() are executed at compile time at vendor side, they contain actual size of api struct that vendor was using.
Unfortunately use case is actually ugly to write:
So as we can see a lot of code, just to make sure we are safe, And we need to know 3 things here:
At user level this is too much to add just to make sure, i was thinking in a way, that SONiC uses libsairedis.so anyway, so this library is affected, but syncd is linking against libsai.so directly, but it's not executed directly it make call's via VendorSai.cpp which is wrapper c++ wrapper for all C api functions exposed from vandor libsai.so,
So maybe there is a way to actually make this ugly check inside VendorSai.cpp and not pollute any other code. I am thinking on some even automated implementation that could be added into metadata, generated size structures based on api version like v1.12 v1.14 etc, so each metadata would contain each api struct with it size, so actually no vendor and SONiC would need to keep track of that, and actual check could be done inside auto generated metadata itself, if api would simply be too "new" then we would return not implemented. The only problem here would be that this check would be done deep deep inside syncd at actual api execution time, instead of orchagent level to actually make decision wheter we want to execute this non existing api or not.
Other approach which i was thinking, to have api struct sizes dumped via numbers by sai version, for example dump all sizeof structres for version v1.12 then v1.14 and generate those sizes in sai metadata during compilation. Then use new added metadata function to validate whether it's possible to execute specific api, using current header version and version returned from vendro sai_query_api_version() api. But it would still evolve around similar check, since somehow we still need to know the offset of the given api, whether it extend over given struct or not. That's why in the example we use offsetof() operator
The text was updated successfully, but these errors were encountered: