-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for fatal error monitoring #98
base: integ_sp7
Are you sure you want to change the base?
Conversation
7287f07
to
8c8c3ab
Compare
@abinayaddhandapani Can you please run ./../openbmc-build-scripts/scripts/format-code.sh tool . In your devbox you can do docker setup and run this tool from docker. |
amd-bmc-ras$ ./../openbmc-build-scripts/scripts/format-code.sh |
@abinayaddhandapani Direction on Header file inclusion:
Refer https://github.com/openbmc/docs/blob/master/cpp-style-and-conventions.md for more details. |
2b48bca
to
a1de8ec
Compare
adhandap@ebcc56e935b6:/mnt/work2/adhandap/Code/amd-bmc-ras$ ../openbmc-build-scripts/scripts/format-code.sh |
@ojayanth adhandap@ebcc56e935b6:/mnt/work2/adhandap/Code/amd-bmc-ras$ ../openbmc-build-scripts/scripts/format-code.sh |
@ojayanth Moved the header files to the corresponding source files instead of adding it in ras.hpp file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review still in progress
endif | ||
|
||
executable( | ||
'amd-bmc-ras', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can get better executable name , not a project a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amd-bmc-ras seems to be more suitable for the application. Few more options would be ADDC, amd-crashdump.
If you have any better suggestion , please suggest
Still seeing lot of unused constants varible definition in the commit some |
inc/ras.hpp
Outdated
static const int EPYC_PROG_SEG_ID = 0x01; | ||
static const int MI_PROG_SEG_ID = 0x02; | ||
static const int MAX_ERROR_FILE = 10; | ||
static const std::string RAS_DIR = "/var/lib/amd-ras/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this as part of meson option , will help user to modify path if required. , and other varibales can define as exxtension of this
@abinayaddhandapani please run format-code.sh commit specific and strt fixing format issues specific to commit . Also move clang cofig file before adding source code commit. |
7a3942d
to
6a1908b
Compare
@@ -0,0 +1,61 @@ | |||
#include "config_manager.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- OpenBMC CPP Coding guide line available here https://github.com/openbmc/docs/blob/master/cpp-style-and-conventions.md, please try to follow the guidelines for File/Function/Variables Naming , prologue etc.
Eg:
/**
- @brief Sets the threshold value for the temperature sensor.
- This function sets the threshold value for the temperature sensor. The
- threshold value is used to trigger alerts when the temperature exceeds
- the specified limit.
- @param[in] sensorId The ID of the sensor.
- @param[in] threshold The threshold value to set.
- @return True if the threshold was set successfully, false otherwise.
*/
bool setTemperatureThreshold(int sensorId, int threshold)
{
// Function implementation
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed in the latest patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the comments as metioned in the above format wherever possible.
The comments where added as part of the function declaration in the classes.
src/main.cpp
Outdated
rasManagerObj->configure(); | ||
|
||
#endif | ||
/* TODO: Enable PLDM : Create rasManagerObj pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either you should report build failure or Log error message saying support is not available till PLDM support is in place . This code can messup , when user enabled PLDM falg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
# It is expected that these 4 sections will be listed in the order above and | ||
# data within them will be kept sorted. | ||
|
||
owners: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing. Add your's and Jayanth's name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@@ -0,0 +1,201 @@ | |||
Apache License |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache licensing is fine as we have used that before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
inc/ras.hpp
Outdated
static const int BLOCK_ID_38 = 38; | ||
static const int BLOCK_ID_39 = 39; | ||
static const int BLOCK_ID_40 = 40; | ||
static const int MI300A_MODEL_NUMBER = 0x90; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
inc/ras.hpp
Outdated
|
||
static const int RESERVED = 0; | ||
static const int BASE_16 = 16; | ||
static const int TWO_SOCKET = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
inc/ras.hpp
Outdated
static const int BLOCK_ID_39 = 39; | ||
static const int BLOCK_ID_40 = 40; | ||
static const int MI300A_MODEL_NUMBER = 0x90; | ||
static const int MI300C_MODEL_NUMBER = 0x80; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
inc/ras.hpp
Outdated
static const int BLOCK_ID_40 = 40; | ||
static const int MI300A_MODEL_NUMBER = 0x90; | ||
static const int MI300C_MODEL_NUMBER = 0x80; | ||
static const int TURIN_FAMILY_ID = 0x1A; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
inc/ras.hpp
Outdated
static const int MI300A_MODEL_NUMBER = 0x90; | ||
static const int MI300C_MODEL_NUMBER = 0x80; | ||
static const int TURIN_FAMILY_ID = 0x1A; | ||
static const int GENOA_FAMILY_ID = 0x19; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
inc/ras.hpp
Outdated
static const int MI300C_MODEL_NUMBER = 0x80; | ||
static const int TURIN_FAMILY_ID = 0x1A; | ||
static const int GENOA_FAMILY_ID = 0x19; | ||
static const int EPYC_PROG_SEG_ID = 0x01; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this.
inc/ras.hpp
Outdated
static const int TURIN_FAMILY_ID = 0x1A; | ||
static const int GENOA_FAMILY_ID = 0x19; | ||
static const int EPYC_PROG_SEG_ID = 0x01; | ||
static const int MI_PROG_SEG_ID = 0x02; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong in generic ras.hpp file. Need a platform file to define this
- Added README.md with project overview. - Added OWNERS file to define project ownership and contributors. - Added LICENSE file to specify project licensing terms. Signed-off-by: Abinaya Dhandapani <[email protected]>
Added clang-18 formatter. Signed-off-by: Abinaya Dhandapani <[email protected]>
6b63ae9
to
f831b26
Compare
d274822
to
d0b4bf4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like still missing Key reveiw comments from previosu reveiws , can you please look my comments and respond with your plan , like ack/reject ?
"ApmlRetries": { | ||
"Description": "Number of APML retry count", | ||
"Value": 10, | ||
"MaxBoundLimit": "50" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really going to try 50 times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should allow max bound limit as 50 as internal validation team may set that as high for some rare case failures and debug it.
} | ||
}, | ||
{ | ||
"SystemRecovery": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SystemRecoveryMode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RAS twiki page mentions SystemRecovery every wherehttps://twiki.amd.com/twiki/bin/view/SIG/Sp5OobRasSpec.
I hope we should also align with the same naming
}, | ||
{ | ||
"ResetSignal": { | ||
"Description": "Reset Signal Type", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it Type or Name ? if it is type it should support mutiple right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset signal can be of 2 types - SYS_RST or RSMRST reset. By default it is SYS_RST
"0x2c", | ||
"0x08", | ||
"0x0c", | ||
"null", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is null indicates here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the deafult values [30h, 34h, 28h, 2Ch, 08h, 0Ch, null, null], which specifies an ordered list of 4-byte aligned offsets to compute the failure signature ID
@@ -0,0 +1,61 @@ | |||
#include "config_manager.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed in the latest patch.
@@ -0,0 +1,48 @@ | |||
#include "cper_generator.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will continue reveiw Only after addressing all the previous comments related coding style specific.
d0b4bf4
to
639b848
Compare
- The application monitors APML_ALERT_L and upon assertion of the GPIO PIN , BMC collects the MCA MSR dump and creates CPER record. - Depending on the user configuration , the application initiates system recovery. - ras-config.json contains the configuration parameters with default value. User can get and set the configuration parameters using d-bus calls to the methods getAttribute and setAttribute. - The application also contains harvesting of last transaction address, debug log ID's. - The application is intended to be supported for 1P and 2P platforms. Tested fields: Tested in sp5 platform. root@sp5:/var/lib/amd-ras# ls current_index ras-error0.cper ras-error2.cper ras-error4.cper ras-error6.cper ras-error8.cper ras-config.json ras-error1.cper ras-error3.cper ras-error5.cper ras-error7.cper ras-error9.cper Signed-off-by: Abinaya Dhandapani <[email protected]>
639b848
to
a13485a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making some Good progress , still need some more changes to get OPenBMC coding style . Added high level functional comments. Still seeing updated function prologues, . better get organize the code now and you can start fixing the previous comments. Also noticed Not follwing Naming conventions. - refer https://grok.openbmc.org/xref/openbmc/bmcweb/.clang-tidy?r=0771a264#378 for expeceted format.
limitations under the License. | ||
*/ | ||
|
||
#include "apml_manager.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be part if #ifdef APML
|
||
#include <fstream> | ||
|
||
constexpr int EPYC_PROG_SEG_ID = 0x01; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General direction on constants handling ;
-
Use constexpr for Compile-Time Constants
Definition: Use constexpr for constants that can be evaluated at compile time.
Scope: Define constexpr constants in the smallest scope possible, such as within a function or a class, to limit their visibility. -
Use static constexpr for Class-Level Constants
Definition: Use static constexpr for constants that belong to a class but should not be tied to any instance of the class. -
Namespace-Level Constants
Definition: Use namespaces to group related constants together, which helps in organizing code and avoiding name conflicts. -
Inline Variables for Header Files (C++17 and Later)
Definition: Use inline with constexpr for constants defined in header files to ensure there is only one definition across translation units. -
Avoid Magic Numbers
Definition: Replace magic numbers with named constants to improve code readability and maintainability. -
Use const for Non-Compile-Time Constants
Definition: Use const for constants that cannot be evaluated at compile time but should not change during runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
Serviceability) configurations.*/ | ||
class RasManagerBase : public RasConfiguration | ||
{ | ||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define default constructors explicitly if needed, and delete constructors that should not be used. Refer OpenBMCsource code related to class definition header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
* | ||
* Virtual destructor ensures proper cleanup of derived class objects. | ||
*/ | ||
virtual ~RasManagerBase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It will be good to keep, just after std constructor list , refer openbmc code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
protected: | ||
boost::asio::io_service& io; | ||
uint8_t numOfCpu; | ||
CpuId* cpuId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General: Replaced raw pointers with std::unique_ptr for cpuId, uCode, and ppin to manage ownership and prevent memory leaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
*/ | ||
virtual ~RasManagerBase(); | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File Scope: This approach seems to be Not extendable design, Like creating processor specific function, What is the limitation for creating common function, and based on processor number act on it?
constexpr char CPU_INVENTORY_INTERFACE[] = | ||
"xyz.openbmc_project.Inventory.Item.Cpu"; | ||
constexpr char COMMAND_NUM_OF_CPU[] = "/sbin/fw_printenv -n num_of_cpu"; | ||
static const std::string COMMAND_BOARD_ID = "/sbin/fw_printenv -n board_id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced static const std::string with constexpr std::string_view for COMMAND_BOARD_ID. std::string_view is a lightweight, non-owning view of a string, which is more efficient for read-only string data2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
* | ||
* @return True if there is a match, false otherwise. | ||
*/ | ||
bool compare_with_bitwise_AND(const uint32_t* values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to utility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
* | ||
* @return A vector containing converted values. | ||
*/ | ||
std::vector<uint32_t> hexstring_to_vector(const std::string& hexString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to util
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@@ -0,0 +1,181 @@ | |||
#include "cper_generator.hpp" | |||
#include "interface_manager_base.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General: Enable Namespace based Code organization:
OpenBMC code is extensively used Namespaces in C++, a powerful feature for organizing code and preventing name conflicts, especially in large projects. Here’s a detailed look at namespace-based design in C++: Please refer any of the OpenBMC repo to see this approach .
Example:
// File: sensor_manager.h
namespace openbmc_project {
namespace sensor {
namespace manager {
class SensorManager {
public:
SensorManager();
void initialize();
private:
void readSensorData();
};
} // namespace manager
} // namespace sensor
} // names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
#include "cper_generator.hpp" | ||
#include "interface_manager_base.hpp" | ||
|
||
class ApmlInterfaceManager : public RasManagerBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should use for implementing Interface function Logic using APML library API's
APML Librarry API's should move to APML library class/namaespace based on implmention front. this approch will help to organize and manage beter way.
template <typename T> | ||
class CperGenerator | ||
{ | ||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be consistent on public/private/protectd definition sequence in the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
current_index ras-error0.cper ras-error2.cper ras-error4.cper ras-error6.cper ras-error8.cper
ras-config.json ras-error1.cper ras-error3.cper ras-error5.cper ras-error7.cper ras-error9.cper
root@sp5:/var/lib/amd-ras# hexdump -C ras-error0.cper
00000000 43 50 45 52 00 01 ff ff ff ff 02 00 01 00 00 00 |CPER............|
00000010 03 00 00 00 10 01 02 00 07 1a 11 01 0f 0b 18 15 |................|
00000020 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 ac 3f fa 61 80 cb 92 42 8b fb d6 43 b1 de 17 f4 |.?.a...B...C....|
00000050 fe 6f f5 e8 9c 91 c5 4c ba 88 65 ab e1 49 13 bb |.o.....L..e..I..|
00000060 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 10 01 00 00 00 00 01 00 01 11 03 00 01 00 00 00 |................|
00000090 78 0c ac 32 23 26 f6 48 b0 d0 73 65 72 5f d6 ae |x..2#&.H..ser_..|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 01 00 00 00 50 30 00 00 00 00 00 00 00 00 00 00 |....P0..........|
000000c0 00 00 00 00 00 00 00 00 10 01 01 00 00 00 01 00 |................|
000000d0 01 11 03 00 01 00 00 00 78 0c ac 32 23 26 f6 48 |........x..2#&.H|
000000e0 b0 d0 73 65 72 5f d6 ae 00 00 00 00 00 00 00 00 |..ser_..........|
000000f0 00 00 00 00 00 00 00 00 01 00 00 00 50 31 00 00 |............P1..|
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 07 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 11 00 00 00 0b 00 ff 00 00 00 00 00 00 00 00 00 |................|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000150 e0 00 00 4d 00 00 00 00 00 70 b1 13 18 00 00 00 |...M.....p......|
00000160 0b 01 00 00 00 00 a0 ba 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000190 01 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 3f 00 00 00 00 00 00 00 0b 01 00 00 00 00 a0 ba |?...............|
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 f6 0f 0c d0 |................|
000001c0 fd 01 00 00 27 00 00 00 00 70 b1 13 18 00 00 00 |....'....p......|
000001d0 e0 00 00 4d 00 00 00 00 00 00 00 00 00 00 00 00 |...M............|
000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00020110