From 3e9cefee1bce92deadd17fb6279af06f1f889ecd Mon Sep 17 00:00:00 2001
From: "Robert Chyla (MIPS)"
Date: Wed, 27 Mar 2024 00:33:17 +0100
Subject: [PATCH] More changes (with Jay). Only few left.

---
 docs/RISC-V-N-Trace.adoc | 96 ++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/docs/RISC-V-N-Trace.adoc b/docs/RISC-V-N-Trace.adoc
index c2792ce..b3d0855 100644
--- a/docs/RISC-V-N-Trace.adoc
+++ b/docs/RISC-V-N-Trace.adoc
@@ -701,7 +701,10 @@ IEEE-5001 Nexus Standard does not define limits for variable-length fields, but
 |HIST|NTRACE_MAX_HIST|32|It includes stop-bit. This size is optimal for not wasting any bits in very often used <> messages.
 [[NTRACE_MAX_TSTAMP]]
 |TSTAMP|NTRACE_MAX_TSTAMP|64|It is certainly big enough. It corresponds to architecture defined timer and cycle count registers.
-|HREPEAT|NTRACE_MAX_HREPEAT|18|Assure some trace is generated for long loops.
+[[NTRACE_MAX_HREPEAT]]
+|HREPEAT|NTRACE_MAX_HREPEAT|18|Assure some trace is periodically generated for very long loops.
+[[NTRACE_MAX_BCNT]]
+|B-CNT|NTRACE_MAX_BCNT|18|Assure some trace is periodically generated for very long loops.
 |======================================================================================================

 == N-Trace Messages (Details)
@@ -727,6 +730,18 @@ This message furnishes the requisite context (privileged mode and Context ID, as
 segments associated with various programs. Activation of this feature requires explicit enabling of the <> control bit. Reporting of this information occurs under one of the following three conditions:
+* Upon the retirement of an instruction that writes to the scontext/hcontext CSR.
+2. Following any trace synchronizing message, specifically any
+message that includes the SYNC field. Importantly:
+ Should hcontext be implemented, the protocol requires two
+consecutive messages: the first presenting hcontext information
+and the second scontext information.
This sequence is critical
+for enabling the decoder to precisely identify the code
+associated with a specific process.
+3. In the event of a trap or trap return that results in a change in
+privilege mode.
+
+
 * When an instruction which is changing privilege mode or *scontext/hcontext* CSR write instruction retired (as reported via 'priv' and 'context' field on an ingress port).
 * As the next message following any trace <> (any message that includes the <> field).
 ** If *hcontext* is implemented two messages must follow (first providing *hcontext* and second providing *scontext*). It is necessary so the decoder will be able to locate the code for a specific process.
@@ -867,7 +882,7 @@ An error message must be generated in the event of an internal messages FIFO ove

 *Explanations and Notes*

-Error Message must be sent immediately prior to a <> as soon as space is available in the Trace Encoder output queue. It is suggested to have a timestamp at the moment when the first trace messages got dropped, but it is not required.
+Error Message must be sent immediately prior to a <> as soon as space is available in the Trace Encoder output queue. It is recommended that the timestamp reported in the message corresponds to the moment when the first trace message was dropped; however, this is not a requirement.

 [NOTE]
 ====
@@ -875,8 +890,11 @@ This message *is required* as otherwise decoder (despite the fact that restart a

 * Trace is turned off by trigger (or from any other reason).
 * Message reporting 'trace off' event is lost (due to lack of space for it).
+** Here an Error Message should be generated (as soon as there is room).
 * Trace is never restarted.
-* Trace is stopped (this will not generate any trace as trace is turned off)
+* Trace is stopped (this will not generate any trace as trace is turned off).
+
+In the above case, the Error Message will be the last message in the trace stream.
 ====

 [[msg2_ProgTraceSync]]
@@ -947,7 +965,7 @@ However, it further includes details on the reason for synchronization via the S
 This message is generated in the same conditions as <> message, but additionally provides a reason for synchronization (SYNC field) and full PC (F-ADDR field).

 [[msg2_ResourceFull]]
-=== Resource Full Message
+=== ResourceFull Message

 This message is emitted when either the HIST register is full or the I-CNT counter became full for a given encoder implementation. This mechanism ensures that no information is lost, as it enables the decoder to reconstruct larger I-CNT and HIST fields by concatenating or adding the emitted values.

@@ -1048,7 +1066,7 @@ Number of times the previous branch message (without a <> field

 *Explanations and Notes*

-This message is reported when an identical branch message is encountered (just to save trace bandwidth). Trace decoder should just repeat handling of previous branch message B-CNT times.
+This message is reported when an identical (direct or indirect) branch message is encountered (just to save trace bandwidth). Trace decoder should just repeat handling of previous branch message B-CNT times.

 [[msg2_ProgTraceCorrelation]]
 === ProgTraceCorrelation Message
@@ -1094,14 +1112,14 @@ This chapter describes in detail how key fields (I-CNT, HIST, U-ADDR/F-ADDR and

 === Address Compression

-Address transmissions is compliant with the IEEE-5001 Nexus Standard (most significant bit 0-s skipped) with optional extension allowing to skip identical most significant bits (following Sv39/Sv48/Sv57 address generation rules). See <> chapter below for clarifications.
+Address transmission is compliant with the IEEE-5001 Nexus Standard (most significant bit 0-s skipped) with an optional extension allowing to skip identical most significant bits. See <> chapter below for clarifications.

 Rules when generating addresses:

 * Only execution addresses (as seen by the hart) are reported.
When virtual memory system is enabled these are virtual addresses.
 * The <> field is the full address associated with the trace event, provides a starting point for reconstructing relative addresses.
 * The <> field is a compressed address that is relative to the previous trace message with an address field. It is generated by XORing the address with the previous message.
-* To decode the full address from the relative address (U-ADDR) can be XORed with the previously decoded full address.
+** To decode the full address, the relative address (U-ADDR) can be XORed with the previously decoded full address.
 * Address fields are sent beginning with bit 1 since all execution addresses are on a 2-byte boundaries (the least significant bit is always 0 and never sent).

 Example:
@@ -1126,12 +1144,23 @@ This optimization must be enabled by <> control bit is implemented and set:
-* If F-ADDR/U-ADDR field is sent then last (most significant) bit of the very last MDO record must be extended up to bit#63 or bit#31 (depending of XLEN of the core). It is similar to sign-extension, but it is NOT a sign bit.
-* This method does NOT require trace decoder to know what is a size of virtual address or if an address is physical or virtual. Decoder must look at most significant bit of last MDO in F-ADDR/U-ADDR field and either extend or not.
-* Simple implementations may not implement an enable bit and always send full address.
-** Benefits of using it on 32-bit cores is small, so it may not be implemented.
+* The encoder may skip any number of most significant identical bits in the U-ADDR/F-ADDR fields. However, it must ensure that if any bits are skipped, then the number of transmitted bits is a multiple of the MDO size. Additionally, the most significant transmitted bit must have the same value as the skipped bits.
+
+* If an F-ADDR/U-ADDR field is received by the decoder, then the last (most significant) bit of the very last MDO record must be extended up to bit#63 or bit#31 (depending on XLEN of the core). It is similar to sign-extension, but it is NOT a sign bit.
+
+* This method does NOT require a trace decoder to know the size of a virtual address or whether an address is physical or virtual. The decoder must look at the most significant bit of the last MDO in the F-ADDR/U-ADDR field and either extend or not.
+
+* Simple implementations may not implement an enable bit and always send the full address. The benefit of using it on 32-bit cores is small, so it may not be implemented.

 This way of encodign allows an encoder to efficiently send:

@@ -1201,11 +1230,11 @@ Trace encoder must implement a most significant bit detection (skipping identica
 #10: 000101_01 <- Last MDO+MSEO byte. Most significant bit #5 is 0, so NO extension.
 F-ADDR field=0x5FFF_FFFF_FFFF_FFFF, Encoded address=0xBFFF_FFFF_FFFF_FFFE

-NOTE: Address *0xBFFF_FFFF_FFFF_FFFF* is NOT a legal address in any Sv39/Sv48/Sv57 modes as it does not have all most significant bits identical. But such an address may be encountered as result of a bug and as such should be reported.
+NOTE: Address *0xBFFF_FFFF_FFFF_FFFF* is NOT a legal address in any RISC-V virtual memory modes as it does not have all most significant bits identical. But such an address may be encountered as result of a bug and as such should be reported.

 === HIST Field Generation

-When the encoder is operating in <> mode direct conditional branches do NOT generate any messages. Each conditional branch (taken or not-taken direct) adding a single bit to the internal HIST register/accumulator.If a direct conditional branch is taken, bit=1 is added at the least significant position. If a direct conditional branch is not-taken, bit=0 is added at the least significant position. HIST field accumulator may be implemented as left-shift register.
+When the encoder is operating in <> mode direct conditional branches do NOT generate any messages. Each conditional branch (taken or not-taken direct) adds a single bit to the internal HIST register/accumulator. If a direct conditional branch is taken, bit=1 is added at the least significant position. If a direct conditional branch is not-taken, bit=0 is added at the least significant position. HIST field accumulator may be implemented as left-shift register.

 Most significant bit value 1 in the HIST field is used as a stop-bit. It allows the HIST field to be transmitted as a variable-length field efficiently (as most significant 0-bits are not transmitted).

@@ -1229,19 +1258,18 @@ If this is happening, a <> with the HIST field (
 NOTE: Trace decoders do not have to be aware about the actual size of the HIST field implemented by the encoder, however in order to allow efficient implementation of trace encoders (and also allowing HIST pattern detection) this N-Trace specification limits HIST field size to max 32-bits. Longer HIST fields would not provide much of a gain and would make repeated HIST field detection more costly (in terms of hardware resources).

 When a HIST buffer is identical in two or more consecutive <> messages, it can be detected and reported using the HIST + HREPEAT (History Repeat Counter) instead of many identical messages.
-
 See <> chapter for more details.

 === I-CNT Details

-Field I-CNT (present in most messages) is counting the number of halfwords for the instruction units reported as retired.
+The I-CNT field, present in most messages, transmits the value of the I-CNT counter, which counts the number of halfwords used to encode retired instructions.

-I-CNT counter is reset to 0 in one of these two situations (as defined by IEEE-5001 Nexus Standard):
+The I-CNT counter in the trace encoder is reset to 0, in accordance with the IEEE-5001 Nexus Standard, under one of the following two conditions:

-* When a trace starts or is restarted (for any reason).
-* After I-CNT field is sent in a message.
+* When tracing starts or is restarted for any reason.
+* After the I-CNT counter value has been transmitted in a message.

-Every retired instruction MUST increment I-CNT by 1 (for 16-bit instruction) or by 2 (for 32-bit instruction). Specifically:
+Every retired instruction MUST increment the I-CNT counter by 1 (for 16-bit instruction) or by 2 (for 32-bit instruction). Specifically:

 * If an instruction is explicitly changing the PC (as jump or return), that instruction itself MUST update the I-CNT.
 * An exception or interrupt before retirement of an instruction CANNOT update the I-CNT.
@@ -1249,12 +1277,10 @@ Every retired instruction MUST increment I-CNT by 1 (for 16-bit instruction) or

 NOTE: In case of longer instructions (48-bit, 64-bit, ...) (future ISA standards or custom) I-CNT may increment by 3 or more.

-When I-CNT counter is full (reaches it's maximum value or overflow bit is set) it may be reported in one of two ways:
+When the I-CNT counter is full (reaches its maximum value or the overflow bit is set) it can be reported in one of two ways:

-* The <> message with <>=0 should be generated.
-* Optionally I-CNT counter full may be reported using a <> with *SYNC=4 (Sequential Instruction Counter)*.
-** This method may be only used in <> mode.
-** Reporting HIST overflow requires to use <> message (as corresponding SYNC code is not defined) so I-CNT overflows should be reported in the same way.
+* By using a <> message with <>=0. This method is applicable to both BTM and HTM.
+* Optionally, by using a <> with *SYNC=4 (Sequential Instruction Counter)*. It may only be used in <> mode.
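
The I-CNT rules above (count halfwords per retired instruction, reset after every transmission, report a full counter) can be illustrated with a small model. This is only a sketch under stated assumptions, not part of the specification: the class `ICntModel`, the capacity `max_value`, and the message name `"branch"` are hypothetical, and a real encoder may report a full counter at a smaller or bigger value.

```python
from dataclasses import dataclass, field


@dataclass
class ICntModel:
    """Toy model of an encoder-side I-CNT counter (all names are hypothetical)."""

    max_value: int = 255  # assumed capacity; the real limit is implementation-specific
    value: int = 0
    messages: list = field(default_factory=list)  # (message name, I-CNT value) pairs

    def restart(self) -> None:
        """Tracing started or restarted: the counter is reset to 0."""
        self.value = 0

    def emit(self, name: str) -> None:
        """Transmit the current I-CNT value in a message, then reset the counter."""
        self.messages.append((name, self.value))
        self.value = 0

    def retire(self, instr_bits: int) -> None:
        """Count one retired instruction in halfwords (16-bit units)."""
        self.value += instr_bits // 16  # 16-bit -> 1, 32-bit -> 2, 48-bit -> 3
        if self.value >= self.max_value:
            # Counter is full: report it so that no halfwords are lost.
            self.emit("ResourceFull")


def total_halfwords(messages) -> int:
    """Decoder side: the emitted I-CNT values add up to the full count."""
    return sum(icnt for _, icnt in messages)


model = ICntModel(max_value=8)
model.restart()
for bits in (16, 32, 32):      # 1 + 2 + 2 = 5 halfwords
    model.retire(bits)
model.emit("branch")           # placeholder for any message that carries I-CNT
for bits in (32, 32, 32, 32):  # 2 + 2 + 2 + 2 = 8 halfwords, counter becomes full
    model.retire(bits)
print(model.messages, total_halfwords(model.messages))
```

Resetting the counter inside `emit` keeps the two reset conditions in one place, so the decoder can recover the total simply by adding the values it receives.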

 NOTE: Overflow bit allows efficient handling of cases, when single ingress port cycle reports bigger I-CNT (several instructions retired). Reporting maximum value (exactly) is not required and smaller or bigger value may be reported instead.

@@ -1631,15 +1657,21 @@ Nonetheless, the guidelines provided herein are applicable to any instruction si

 *Main Rules*

-. Instructions which are not control transfer instructions and direct unconditional jumps generate no trace.
-** These are called inferable instructions, where the next PC can be known from static analysis of binary code.
-. Only direct conditional branches, indirect unconditional flow transfer instructions and exceptions/interrupts generate trace.
-** These are called non-inferable instructions, where the next PC cannot be known from static analysis of binary code.
+. *Inferable Instructions*: This category includes instructions that do not perform control transfers or
+are direct jumps. The subsequent program counter (PC) for these instructions can be determined through static analysis of the binary code. Because these instructions exhibit a predictable
+execution flow, they are termed "inferable," and no trace is generated for them.
+
+. *Non-Inferable Instructions*: This category comprises conditional branches and indirect jumps. Due to the unpredictability of the next PC as determined through
+static analysis alone, non-inferable instructions require trace generation.
+
+. *Interrupts and Exceptions*: Control flow changes caused by interrupts and exceptions necessitate
+trace generation. These events alter the flow in an unpredictable manner, similar to non-inferable
+instructions, thereby requiring their occurrences to be traced.

 *Detailed Rules*

-. If tracing is started (after it was disabled), a <> message is generated.
-** This message includes the reason for a start (<> field) and full address (<> field).
+. If tracing is started (or restarted after it was disabled), a <> message is generated.
+** This message specifies the reason for the start in the <> field and includes the full address in the <> field.
 . A retired 16-bit instruction increments the <> counter by 1, while a retired 32-bit instruction increments it by 2.
 . The following types of instructions allow trace decoders to know the next PC (encoder should not generate any trace for them).
 ** Instruction which is not control transfer instructions => PC is at the next instruction (+2 or +4).
@@ -1647,7 +1679,7 @@ Nonetheless, the guidelines provided herein are applicable to any instruction si
 ** Not taken direct conditional branch (in BTM mode) => PC is next instruction (+2 or +4).
 . Indirect, unconditional jump instruction is handled as:
 ** In BTM mode, an <> message is generated.
-** In HTM mode. an <> message is generated. Should the <> field be empty, an <> message may optionally be generated instead.
+** In HTM mode, an <> message is generated. Should the <> field be empty, an <> message may optionally be generated instead.
 . Direct, conditional branch instruction is handled as:
 ** In BTM mode, a <> message is generated, but only if the branch is taken.
 ** In HTM mode, the outcome of the branch (1 for taken or 0 for not-taken) is appended as a single bit into the branch history buffer (<> register).
@@ -1933,7 +1965,7 @@ SRC field (if enabled) may change the otherwise optimal layout of <