ScalarLiveOut
When you write to a scalar in a device kernel and then need to read that value on the host, you have a "scalar live-out" situation. Some compilers will even tell you when this happens, though not all do. It arises most often in the following scenario:
- Testing routines: you pass through some data, determine whether it is realistic, and record the result in a `bool` that is read on the host later to report the error.
These situations are reductions in nature, but often it's not convenient or efficient to express them as reductions.
Done by hand, this takes four steps: allocate the scalar in device memory, copy its initial value from host to device memory, modify it in the kernel, and copy it from device back to host memory after the kernel. `ScalarLiveOut` handles all of this for you:
```cpp
// Creates a bool scalar that is allocated in device memory
// and has an initial value of false (which is transferred
// to device memory for you in the constructor)
yakl::ScalarLiveOut<bool> dataIsBad(false);
yakl::c::parallel_for( yakl::c::Bounds<2>(ny,nx) ,
                       YAKL_LAMBDA (int j, int i) {
  // The ScalarLiveOut class overloads operator=, so you can
  // simply assign to it like any other scalar inside a kernel
  if (density(j,i) < 0 || pressure(j,i) < 0) {
    dataIsBad = true;
  }
});
// To read on the host after a kernel, use the hostRead()
// member function, which transfers the value to the host for you
if (dataIsBad.hostRead()) {
  std::cout << "ERROR: Invalid density or pressure!\n";
  // Replace with whatever error handling your code uses
  throw std::runtime_error("Invalid density or pressure");
}
```
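To make the four manual steps concrete, here is a minimal host-only sketch in standard C++. It is a simulation under stated assumptions, not YAKL's implementation: `SimScalarLiveOut`, `parallelFor2D`, and `anyBadCells` are hypothetical names, "device memory" is just a shared heap cell, and plain assignments stand in for the host-to-device and device-to-host copies. One point it does illustrate faithfully is a C++ language rule: a lambda that captures by value (such as `[=]`) sees a `const` copy of the object, so the assignment operator has to be a `const` member that writes through a shared handle to the single device-side value.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical host-only stand-in for ScalarLiveOut. The shared heap cell plays
// the role of device memory; copies of this object all refer to the same cell.
template <class T>
class SimScalarLiveOut {
 public:
  // Steps 1 and 2: "allocate" the scalar and copy the initial value in
  explicit SimScalarLiveOut(T init) : dev_(std::make_shared<T>(init)) {}

  // Step 3: a by-value lambda capture is const, so assignment must be const;
  // it writes through the shared pointer to the one "device" value
  SimScalarLiveOut const &operator=(T rhs) const {
    *dev_ = rhs;
    return *this;
  }

  // Step 4: copy the value back to the host after the kernel
  T hostRead() const { return *dev_; }

 private:
  std::shared_ptr<T> dev_;
};

// Hypothetical serial stand-in for parallel_for over Bounds<2>(ny,nx)
template <class F>
void parallelFor2D(int ny, int nx, F const &kernel) {
  for (int j = 0; j < ny; j++)
    for (int i = 0; i < nx; i++) kernel(j, i);
}

// Same check as the YAKL example, with (j,i) arrays stored row-major
bool anyBadCells(int ny, int nx, std::vector<double> const &density,
                 std::vector<double> const &pressure) {
  assert(density.size() == static_cast<std::size_t>(ny) * nx);
  assert(pressure.size() == density.size());
  SimScalarLiveOut<bool> flag(false);
  double const *d = density.data();
  double const *p = pressure.data();
  parallelFor2D(ny, nx, [=](int j, int i) {
    if (d[j * nx + i] < 0 || p[j * nx + i] < 0) flag = true;
  });
  return flag.hostRead();
}
```

For example, `anyBadCells(2, 3, density, pressure)` returns `true` if any of the six cells holds a negative density or pressure. On a real GPU, several threads may write `true` to the same location at once; because they all write the same value, the result read back is still `true`.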
**When not to use `ScalarLiveOut`**: If you find yourself wanting to use atomics on a scalar, you are often better off using a reduction instead, because all of the data is being reduced to a single scalar value. To do this, it's best to first compute the per-element values into a temporary array and then perform a reduction on that array. `ScalarLiveOut` does provide an `operator()` that exposes the scalar for reading on the GPU, but if you need it, there is often (though not always) an easier solution to your problem.
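The temporary-array-plus-reduction pattern can be sketched in standard C++ as follows. Counting invalid cells is used as the example because a count, unlike a yes/no flag, is where an atomic increment on one shared scalar is tempting. The function name `countBadCells` is hypothetical, and the serial loop and `std::accumulate` stand in for a device kernel and a device reduction; this is not YAKL's API.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Counts cells with a negative density or pressure without any atomics:
// each cell writes only its own slot of a temporary array, and a single
// reduction then collapses that array to one scalar.
int countBadCells(int ny, int nx, std::vector<double> const &density,
                  std::vector<double> const &pressure) {
  assert(density.size() == static_cast<std::size_t>(ny) * nx);
  assert(pressure.size() == density.size());

  // Temporary array holding the per-cell result (1 = bad, 0 = fine)
  std::vector<int> isBad(density.size());

  // "Kernel": independent writes, one per (j,i), stored row-major
  for (int j = 0; j < ny; j++) {
    for (int i = 0; i < nx; i++) {
      int k = j * nx + i;
      isBad[k] = (density[k] < 0 || pressure[k] < 0) ? 1 : 0;
    }
  }

  // Reduction: sum the temporary array into a single value
  return std::accumulate(isBad.begin(), isBad.end(), 0);
}
```

Because no two cells touch the same memory in the first loop, that loop parallelizes trivially, and the only cross-thread combination happens inside the reduction, which GPU libraries implement efficiently.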