Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AP_ESC_Telem: support Extended DShot Telemetry v2 #26321

Merged
merged 5 commits into from
May 15, 2024

Conversation

mbuzdalov
Copy link
Contributor

@mbuzdalov mbuzdalov commented Feb 25, 2024

Description [Edited 2024-05-04 to reflect the new changes]

This PR adds support for Extended DShot Telemetry v2, the specification for which is available here. Currently, recent versions of Bluejay support this specification, AM32 may support it soon, and BlHeli32 possibly has some limited support.

Notably, EDTv2 introduces two new types of messages, which are useful to investigate the performance of ESCs:

  • the "stress level" message that contains a number in the range [0..255], which is the greater the more effort an ESC had to make during commutation;
  • the "status" message contains three bits for different warning levels:
    • "alert": the lowest level, Bluejay relates it to demag events,
    • "warning": the middle level, Bluejay relates it to desync events,
    • "error": the highest level, Bluejay relates it to stall events,
      as well as the maximum stress level encountered, which is in the range [0..15].

The existing EDT parser handles these messages, but ignores them. This PR propagates them all the way through the AP_ESC_Telem machinery and logs the obtained information using a new log message, EDT2, which contains the following fields:

  • Instance: the ESC instance identifier;
  • Stress: the latest stress measurement, in the range [0..255];
  • MaxStress: the maximum stress level since latest arming, in the range [0..15];
  • Status: a combination of the status bits from above and of the info on what has been received:
    • bit 0: true if the message contains up-to-date stress data;
    • bit 1: true if the message contains up-to-date status data;
    • bit 2: true if the last status had the "alert" bit (e.g. demag);
    • bit 3: true if the last status had the "warning" bit (e.g. desync);
    • bit 4: true if the last status had the "error" bit (e.g. stall).

Design decisions

  • As the log message comes from two different messages, which come at different rates, the "up-to-date" bits were introduced to see how recently a certain value has been updated. An alternative would be some kind of stall data logic, but EDTv2 messages come really irregularly anyway, and there is no NaN marker for integral values.
  • Two different stress components had to be introduced, because they are related in a vendor-dependent non-trivial way (it is not linear scaling), and certain events appear to be noticed in either of these measurements.
  • If updates come more frequently than the logger writes them, the records are merged in a natural way, e.g. stress levels are maxed, status bits are OR-ed.
  • The new macro, EDTV2_SUPPORT, controls inclusion of all relevant code (apart from message declarations, which cannot be easily defined out). By default, the support is enabled when either the board itself, or the IOMCU, support bidirectional DShot, so that potential EDTv2 data can come in. One may forcibly set it to 0, so that the code does not get compiled. Note that in the case of boards with IOMCU, this also requires recompiling the IOMCU firmware, because the DShot telemetry structures are different (to save bandwidth if one does not want EDTv2).

Tests performed

This PR was tested on:

  • a quadcopter using the FlywooF405S-AIO controller, running Bluejay on ESCs;
  • a coax copter using the MatekH743-WLITE controller, running also Bluejay;
  • a test stand using the Pixhawk6C and two Holybro ESCs, all donated by Holybro. One ESC was running BlHeli32, another one AM32, with one motor from the coax copter also connected to the Pixhawk.

Example dataflash logs:

For Pixhawk, tests have been performed with all combinations of:

  • main firmware: Pixhawk6C, Pixhawk6C-bdshot
  • IOMCU options: BRD_IO_DSHOT=0,1
  • motors and servos connected to: FMU, IOMCU

In all cases, motors and servos behaved as expected, and EDTv2 logs were written when appropriate. The AM32 ESC had problems starting at lower voltages and in the PWM mode, which I mainly attribute to some exceptionally high magnet cogging of the corresponding motor. The only meaningful logs were produced by the Bluejay-driven motor (instance 3 or 11), which is somewhat expected.

@mbuzdalov
Copy link
Contributor Author

I have no idea why test ccache fails, what this means and whether it is my fault. Could someone maybe help me?

@andyp1per
Copy link
Collaborator

Looking good! In 4.5 the iomcu dshot firmware also now supports EDT so you will need to at a minimum recompile that for this PR and probably wire the new data through AP_IOMCU so that you can get EDT logging from there as well.

@mbuzdalov
Copy link
Contributor Author

@andyp1per I seem to have done what you requested, with an additional change to avoid modifying AP_Logger/LogStructure.h.

I am not sure whether the IOMCU part works though, because I don't have any hardware with IOMCU, so I cannot test the changes. What is more, I see that while the F103 MCU firmware increased in size, the other one did not (it likely decreased because I used a reference instead of a dozen of indexed accesses in one place).

And I still don't understand whether the test ccache test failure is a bad thing - or 75% is something out of thin air, and it is just time to get the safety threshold down to some 70%.

@andyp1per
Copy link
Collaborator

@mbuzdalov thanks - I can test at some point, although it might be easier to try and get hardware - be happy to approve a hardware request for a Pixhawk 6C and Holybro might send you one anyway.

@andyp1per
Copy link
Collaborator

I've put this up for devcall because @IamPete1 and others might have an opinion on the logging

Copy link
Member

@IamPete1 IamPete1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two log messages? Why not include the "stress" value in EDT2?

@mbuzdalov
Copy link
Contributor Author

Do we need two log messages?

First, because these two stress values cannot be converted one into another in a vendor-independent way.
Second, because stress messages come more frequently, and status messages come roughly at 1Hz, so if one hunts for a particular bad event, stress messages might be more helpful.

Comment on lines 619 to 621
alert : EDT2_ALERT_BIT_FROM_STATUS(edt2_status),
warning : EDT2_WARNING_BIT_FROM_STATUS(edt2_status),
error : EDT2_ERROR_BIT_FROM_STATUS(edt2_status),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be combined into a single uint8 flags field with our new logging bitmask documentation the log review tools should be able to pull out the data.

This is a example for PID:

enum class log_PID_Flags : uint8_t {
LIMIT = 1U<<0, // true if the output is saturated, I term anti windup is active
PD_SUM_LIMIT = 1U<<1, // true if the PD sum limit is active
RESET = 1U<<2, // true if the controller was reset
I_TERM_SET = 1U<<3, // true if the I term has been set externally including reseting to 0
};
uint8_t flags = 0;
if (info.limit) {
flags |= (uint8_t)log_PID_Flags::LIMIT;
}
if (info.PD_limit) {
flags |= (uint8_t)log_PID_Flags::PD_SUM_LIMIT;
}
if (info.reset) {
flags |= (uint8_t)log_PID_Flags::RESET;
}
if (info.I_term_set) {
flags |= (uint8_t)log_PID_Flags::I_TERM_SET;
}

The enum is then referenced here:

// @Field: Flags: bitmask of PID state flags
// @FieldBitmaskEnum: Flags: log_PID_Flags

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work for multi bit values too. So you could do the max_stress in the same way, but it might be a little odd having that under "flags".

Copy link
Contributor Author

@mbuzdalov mbuzdalov Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mapping is not written anywhere in the logs, right? So anything except for Mission Planner, which has these mappings compiled-in, is unlikely to interpret that correctly.

Normally, as a Linux user, I use APMPlanner2, which relies on in-log headers only. But even the web-based log viewer, which, I think, is supposed to follow the recent developments quicker and more precisely than anything desktop-based, cannot handle PID flags in the way described above - just checked. And based on what I have observed during this test, it also relies mostly on in-log headers, as it interpreted EDT2 and EDTS messages correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is still there, it is just not broken out in to fields nicely. Your right, we should support it in other tools as well. For me it makes little difference as I already don't know what "error" "warning" and "alert" really mean so we might as well save the log space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether the savings will be large, as all these uint8_ts take 40 bits compared to the timestamp of 64 bits. And given the message rate of EDT2, which is roughly once per second, this is a tiny fraction of everything else logged, I think.

(I actually tried to log the entire status as a byte in the early stages of this development, and it was really hard to understand what is happening from this byte, the mental bit juggling involved was somewhat too much for me)

Overall, to me it makes more sense to keep this piece in its current form now, and convert it to a more compressed form once the tools are ready.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having the four bit max_stress in there makes it too confusing for manual review. But the three single bit values are not too bad.

My point is really that you already need some extra knowledge that it not present in the log. Without knowledge of what the fields actual mean the information is not useful. If I get a "error" do I need to replace my ESC? Maybe if I get a "warning" I should land immediately? This extra information is provided by the log documentation, in this case it might be ESC dependent, but the log documentation can tell me to go and look at the ESC documentation. So if I have to look at the documentation anyway we can document the bitmask too.

The documentation is also used for the wiki, going back to the PID example:
https://ardupilot.org/copter/docs/logmessages.html#pidp

So although it would be nice if the tools did it automatically, that they don't does not put the information completely out of reach.

@IamPete1
Copy link
Member

First, because these two stress values cannot be converted one into another in a vendor-independent way. Second, because stress messages come more frequently, and status messages come roughly at 1Hz, so if one hunts for a particular bad event, stress messages might be more helpful.

It does depend on the relative rate of the two messages, each message has quite a big overhead. The header, timestamp and instance give 12 bytes before we even get to the data.

In your example log ESC is at 20hz, EDTS around 8 (but quite inconsistent) and EDT2 hardly at all. Message rates from this WebTool. Note that this is the message rate in the log, so a quad logging at 5hz is 20hz message because of the four instances. So in your example it would be a smaller log to include the stress in ESC (1 * 20 vs 13 * 8). The cross over point would be 1/13, so for EDTS at 8hz ESC would have to be above 260Hz.

However I guess since we won't always have the new values using a new message means we only pay that extra cost if the data is there.

More data in the log is always good, so I certainly support this PR but thinking about how we can minimize the cost in log size means we can keep on adding more logging for more things in the future.

@tridge
Copy link
Contributor

tridge commented Feb 27, 2024

@andyp1per added for EU call

@IamPete1
Copy link
Member

I didn't really think about it before, but I guess the "stress" is sort of a duty cycle? I ask because in DroneCAN we have a ESC telem field "power percentage". Annoyingly both definitions are a bit fuzzy and dependent on ESC manufacturer. But I have a commit witch adds logging for that (admittedly on a older branch) IamPete1@88be9ba. I just added that to the existing ESC log. Could we share a field for both users?

@mbuzdalov
Copy link
Contributor Author

mbuzdalov commented Feb 27, 2024

I guess the "stress" is sort of a duty cycle?

Bluejay understands this as follows (copy from the EDT spec description):

Stress level: Indicates the effort the ESC has to make to keep commutation going. In Blheli_S/Bluejay for example it is related to desync events. When it reaches the maximum indicates that the ESC cannot make reliable commutation. Zero is the ideal value, when ESC commutation is perfect.

Putting aside that currently Bluejay's 8-bit stress value is always >= 120 (the 4-bit one can indeed be zero though), this is possibly similar in spirit to what @IamPete1 says about DroneCAN's "power percentage", but not completely identical.

It can be good to share this kind of data in a single field, I think, but my main concern is about the possible difference in message rates. In the case of EDT: ESC messages are logged at 100 Hz if not filtered (and possibly come more often), "stress" messages come at most at some 30 Hz, and not regularly at all (which is consistent with the spec, as these are logged if the ESC chip can do it without suffering the performance). What shall we log if "stress" is not updated? Putting a zero seems a poor choice, as the plot will be much saw-shaped without any underlying physical reason. Keeping the previous value seems not well-motivated as well. As this is not a floating-point value, we cannot log a NaN.

Copy link
Contributor

@peterbarker peterbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is costing everyone 132 bytes even if they're not doing bdshot. That's not good - why is that?

@@ -42,6 +46,10 @@ class AP_ESC_Telem_Backend {
USAGE = 1 << 5,
TEMPERATURE_EXTERNAL = 1 << 6,
MOTOR_TEMPERATURE_EXTERNAL = 1 << 7,
#ifdef HAL_WITH_BIDIR_DSHOT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a separate define for this, V2 functionality, please?

Suggested change
#ifdef HAL_WITH_BIDIR_DSHOT
#ifdef HAL_WITH_BIDIR_DSHOT_V2

It can be defined as

#define HAL_WITH_BIDIR_DSHOT_V2 HAL_WITH_BIDIR_DSHOT

... but it would be nice to have the option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address this next time I update, following checking the IOMCU functionality and syncing everyone's thoughts on where exactly logging should go.

@tridge
Copy link
Contributor

tridge commented Feb 28, 2024

@andyp1per will organise some testing and discuss logging with @IamPete1

@tridge tridge removed the DevCallEU label Feb 28, 2024
@andyp1per
Copy link
Collaborator

@mbuzdalov please put in a funding request here for a Pixhawk6c - https://discuss.ardupilot.org/c/proposals/125 - it will get approved and make it easier for you to test stuff.

@mbuzdalov
Copy link
Contributor Author

mbuzdalov commented Feb 29, 2024

please put in a funding request

@andyp1per here it is, hopefully I am not too far away from the guidelines...

@mbuzdalov mbuzdalov force-pushed the edt-v2-support branch 2 times, most recently from 54f831c to 6f0a31c Compare March 26, 2024 17:03
@mbuzdalov mbuzdalov force-pushed the edt-v2-support branch 3 times, most recently from a11fc0d to c972943 Compare April 6, 2024 22:38
@mbuzdalov mbuzdalov requested a review from andyp1per May 4, 2024 21:07
@Hwurzburg Hwurzburg added the WikiNeeded needs wiki update label May 5, 2024
Copy link
Collaborator

@andyp1per andyp1per left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this now. Basic flight test was fine.

@tridge
Copy link
Contributor

tridge commented May 13, 2024

we need to split the commits

@tridge
Copy link
Contributor

tridge commented May 13, 2024

discussed on dev call, needs a different define name, maybe AP_EXTENDED_DSHOT_V2_ENABLED ?

@tridge
Copy link
Contributor

tridge commented May 13, 2024

@mbuzdalov I've fixed the commit msgs and the define to follow our code style

@mbuzdalov
Copy link
Contributor Author

mbuzdalov commented May 14, 2024

@mbuzdalov I've fixed the commit msgs and the define to follow our code style

@tridge Thank you! Maybe, if it's not too late, you fix commit messages once again, so that they don't designate subsystems multiple times?

(like "AP_HAL_ChibiOS: aP_ESC_Telem, AP_IOMCU: add support..." => "AP_HAL_ChibiOS: add support")

@andyp1per
Copy link
Collaborator

@mbuzdalov if you want you can redo the commit messages and re-push - it's not yet been merged

@andyp1per
Copy link
Collaborator

@mbuzdalov I changed the commit messages for you

@mbuzdalov
Copy link
Contributor Author

mbuzdalov commented May 14, 2024

@mbuzdalov I changed the commit messages for you

Thank you! I saw that strange stuff was happening with CI this morning, so I thought it would be wise to ask.

@mbuzdalov
Copy link
Contributor Author

@andyp1per @tridge this one seems ready to merge finally.

#endif
return false;
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that it matters but we don't need the break right at the bottom do we?

@tridge tridge merged commit ce04c33 into ArduPilot:master May 15, 2024
91 checks passed
@tridge
Copy link
Contributor

tridge commented May 15, 2024

@mbuzdalov great work, thanks!

@mbuzdalov mbuzdalov deleted the edt-v2-support branch May 15, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement WikiNeeded needs wiki update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants