The result would be incorrect if the target is the same as the first operand. The target == this case would also require this to be column-major. I modified the function so that this requirement is no longer needed:
void NVMatrix::rightMult(const NVMatrix &b, float scaleAB, NVMatrix &target) const {
    assert(isContiguous() && b.isContiguous() && target.isContiguous());
    // assert(&target != &b);
    assert(_numCols == b.getNumRows());
    if (&target != this) {
        target.resize(_numRows, b.getNumCols());
        //target.setTrans(true); // default column major
    }
    assert(target.getNumRows() == _numRows);
    assert(target.getNumCols() == b.getNumCols());
    if (_numRows % 64 != 0 || _numCols % 64 != 0 || b.getNumCols() % 64 != 0) {
        WARN("Matrix dimensions not divisible by 64 -- cublasSgemm performance may suffer.");
    }
    cublasSgemm(getTransChar(), b.getTransChar(), _numRows, b.getNumCols(), _numCols,
                scaleAB, _devData, getLeadingDim(), b.getDevData(), b.getLeadingDim(),
                0, target.getDevData(), getNumRows());
    target.setTrans(true); // added isTrans specification
    checkCublasError("cublasSgemm failed");
    // cudaThreadSynchronize();
}
Original issue reported on code.google.com by [email protected] on 12 Jul 2013 at 3:47