Generalize BVP classes to handle either real or complex numbers (Done by P.Palacios) #31

Open
wants to merge 17 commits into base: v2.x
30 changes: 21 additions & 9 deletions docs/BoundaryValueProblemSolver.rst
@@ -19,8 +19,7 @@ With the boundary conditions:

Here :math:`\alpha_n,\beta_n` are arbitrary parameters. They can take any value (including zero) as long as the resulting BVP remains well defined; for instance, setting :math:`\alpha_1` and :math:`\beta_1` to zero at the same time results in a system with no unique solution.

Both :math:`y(z)` and :math:`f(z)` can be complex-valued. In the main branches of the UAMMD repository only :math:`\alpha_2,\beta_2` can be complex numbers (the rest of the parameters, including :math:`k`, must be real valued). The branch complex_bvp contains an adaptation of the solver that allows every parameter to be complex. As a side note, should the need arise in the future, it would be fairly easy to modify the solver as to make the type of the parameters a template argument.

Both :math:`y(z)` and :math:`f(z)` can be real- or complex-valued, as can :math:`\alpha_2,\beta_2,k` and any other parameter.

.. hint::

Initialization of the batched BVP solver requires inverting a dense matrix for each value of :math:`k`, which can become quite expensive. The solver tries to mitigate this cost by inverting these matrices in parallel, but experience suggests that letting it use more than 4 cores is counter-productive.
@@ -37,10 +36,9 @@ Usage
------

Every function and class in the BVP solver source code lies under the :cpp:`uammd::BVP` namespace.
The BVP solver library exposes two main classes:

The BVP solver library exposes two main template classes, designed to handle both real- and complex-valued problems. To ensure consistent behavior and avoid unexpected results, it is recommended to use the complex number type defined in the Thrust library, :cpp:`thrust::complex<real>`, when working with complex numbers.

.. cpp:class:: BatchedBVPHandler
.. cpp:class:: template<class T> BatchedBVPHandler_impl<T>

Used to initialize and hold the information for a group of boundary value problems. Each subproblem can have different parameters (mainly :math:`\alpha_n,\beta_n,k` in the above equations).
This class cannot be used in device code; it is only used for initialization. See :cpp:any:`BatchedBVPGPUSolver_impl`.
@@ -56,14 +54,14 @@ The BVP solver library exposes two main classes:
:param H: The location of the boundary conditions (goes from -H to H).
:param nz: Number of elements of a subproblem (all subproblems must have the same size).

.. cpp:function:: BatchedBVPGPUSolver getGPUSolver();
.. cpp:function:: template<class T> BatchedBVPGPUSolver_impl<T> getGPUSolver();

Provides an instance of the solver to be used in the GPU.

.. cpp:class:: BatchedBVPGPUSolver
.. cpp:class:: template<class T> BatchedBVPGPUSolver_impl<T>

While :cpp:any:`BatchedBVPHandler` is used to initialize and store the different subproblems, this class is used to actually solve the subproblems in a CUDA kernel.
This class has no public constructors, the only way to get an instance to it is via :cpp:any:`BatchedBVPHandler::getGPUSolver`.
While :cpp:any:`BatchedBVPHandler_impl<T>` is used to initialize and store the different subproblems, this class is used to actually solve them in a CUDA kernel.
This class has no public constructors; the only way to obtain an instance is via :cpp:any:`BatchedBVPHandler_impl<T>::getGPUSolver`.

.. cpp:function:: template<class T, class FnIterator, class AnIterator, class CnIterator> __device__ void solve(int instance, const FnIterator& fn, T alpha_2, T beta_2, AnIterator& an, CnIterator& cn);

@@ -93,6 +91,20 @@ Initialization requires an iterator to a special type of functor that provides t

Returns :math:`\alpha_1` or :math:`\beta_1`, depending on which BC this class represents.
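The :cpp:any:`solve` signature above can be exercised with one thread per subproblem. The following is only a sketch (the kernel name, the raw-pointer iterators and the contiguous per-instance data layout are assumptions for illustration, not part of UAMMD):

```cpp
// Sketch: `solver` is obtained on the host via getGPUSolver() and passed by
// value to the kernel. Plain pointers are valid iterator types; instance i
// reads nz right-hand-side values from `fn` and writes nz coefficients
// through `an` and `cn`.
template<class T, class Solver>
__global__ void solveAllBVPs(Solver solver, const T* fn, T* an, T* cn,
                             T alpha_2, T beta_2, int nz, int numSystems){
  const int instance = blockIdx.x*blockDim.x + threadIdx.x;
  if(instance >= numSystems) return;
  const T* fn_i = fn + instance*nz; // assumed contiguous per-instance layout
  T* an_i = an + instance*nz;
  T* cn_i = cn + instance*nz;
  solver.solve(instance, fn_i, alpha_2, beta_2, an_i, cn_i);
}
```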

Aliases for Real and Complex Types
----------------------------------

To facilitate the use of the BVP solver with real and complex numbers, the following aliases are defined:

.. code-block:: cpp

   using BatchedBVPHandlerReal = BatchedBVPHandler_impl<real>;
   using BatchedBVPHandlerComplex = BatchedBVPHandler_impl<thrust::complex<real>>;

   using BatchedBVPGPUSolverReal = BatchedBVPGPUSolver_impl<real>;
   using BatchedBVPGPUSolverComplex = BatchedBVPGPUSolver_impl<thrust::complex<real>>;

These aliases allow for a more intuitive and type-safe way to work with the BVP solver for different numerical types.
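For example, the handler construction used elsewhere in this PR (the DPStokes initialization) reads as follows with the real alias; this is a sketch, since `klist`, `topBC`, `botBC`, `numberSystems`, `halfH` and `nz` must be prepared beforehand as the constructor requires:

```cpp
// Real-valued batch of BVPs (mirrors the DPStokes initialization call):
auto bvpSolver = std::make_shared<BVP::BatchedBVPHandlerReal>(
    klist, topBC, botBC, numberSystems, halfH, nz);
// Switching to complex-valued problems only changes the alias:
// auto bvpSolverC = std::make_shared<BVP::BatchedBVPHandlerComplex>(...);
// A device-side solver is then obtained for use inside CUDA kernels:
auto gpuSolver = bvpSolver->getGPUSolver();
```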

Example
++++++++
13 changes: 10 additions & 3 deletions src/Integrator/BDHI/BDHI_FCM.cu
@@ -4,6 +4,7 @@
#include"BDHI_FCM.cuh"
namespace uammd{
namespace BDHI{
inline
auto FCMIntegrator::computeHydrodynamicDisplacements(){
auto pos = pd->getPos(access::location::gpu, access::mode::read);
auto force = pd->getForce(access::location::gpu, access::mode::read);
@@ -15,6 +16,7 @@ namespace uammd{
temperature, 1.0/sqrt(dt), st);
}

inline
void FCMIntegrator::updateInteractors(){
for(auto forceComp: interactors) forceComp->updateSimulationTime(steps*dt);
if(steps==1){
@@ -26,6 +28,7 @@
}
}

inline
void FCMIntegrator::resetForces(){
int numberParticles = pg->getNumberParticles();
auto force = pd->getForce(access::location::gpu, access::mode::write);
@@ -34,6 +37,7 @@
CudaCheckError();
}

inline
void FCMIntegrator::resetTorques(){
int numberParticles = pg->getNumberParticles();
auto torque = pd->getTorque(access::location::gpu, access::mode::write);
@@ -42,6 +46,7 @@
CudaCheckError();
}

inline
void FCMIntegrator::computeCurrentForces(){
resetForces();
if (pd->isDirAllocated()) resetTorques();
@@ -56,6 +61,7 @@
/*With all the terms computed, update the positions*/
/*T=0 case is templated*/
template<class IndexIterator>
inline
__global__ void integrateEulerMaruyamaD(real4* pos,
real4* dir,
IndexIterator indexIterator,
@@ -75,7 +81,7 @@ namespace uammd{
/*Write to global memory*/
pos[i] = make_real4(p,c);
/*Update the orientation*/
if(dir){
if(dir){
Quat dirc = dir[i];
//printf("W %f %f %f\n", angularV[id].x, angularV[id].y, angularV[id].z);
//printf("V %f %f %f\n", linearV[id].x, linearV[id].y, linearV[id].z);
@@ -86,6 +92,7 @@
}
}

inline
void FCMIntegrator::forwardTime(){
steps++;
sys->log<System::DEBUG1>("[BDHI::FCM] Performing integration step %d", steps);
@@ -97,12 +104,12 @@
auto angularVelocities = disp.second;
auto indexIter = pg->getIndexIterator(access::location::gpu);
auto pos = pd->getPos(access::location::gpu, access::mode::readwrite);
auto dir = pd->getDirIfAllocated(access::location::gpu, access::mode::readwrite);
auto dir = pd->getDirIfAllocated(access::location::gpu, access::mode::readwrite);
real3* d_linearV = thrust::raw_pointer_cast(linearVelocities.data());
real3* d_angularV = dir.raw()?thrust::raw_pointer_cast(angularVelocities.data()):nullptr;
int BLOCKSIZE = 128; /*threads per block*/
int nthreads = BLOCKSIZE<numberParticles?BLOCKSIZE:numberParticles;
int nblocks = numberParticles/nthreads + ((numberParticles%nthreads!=0)?1:0);
int nblocks = numberParticles/nthreads + ((numberParticles%nthreads!=0)?1:0);
FCM_ns::integrateEulerMaruyamaD<<<nblocks, nthreads, 0, st>>>(pos.raw(),
dir.raw(),
indexIter,
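The `inline` keyword added to each out-of-class definition in BDHI_FCM.cu above matters because the file can be included from several translation units; without `inline`, each inclusion would emit its own definition and break the link with duplicate symbols (a One Definition Rule violation). A minimal self-contained illustration (`timesTwo` is a hypothetical stand-in, not UAMMD code):

```cpp
#include <cassert>

// Without `inline`, defining this function at namespace scope in a file
// included by two translation units produces a duplicate-symbol link error.
// `inline` permits identical definitions in several TUs; the linker keeps one.
inline int timesTwo(int x){ return 2*x; }
```

The same reasoning applies to the `__global__` and `__device__` definitions further down in this PR.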
2 changes: 1 addition & 1 deletion src/Integrator/BDHI/DoublyPeriodic/DPStokesSlab.cuh
@@ -169,7 +169,7 @@ else throw std::runtime_error("[DPStokesSlab] Can only average in direction X (0
real gw;
real tolerance;
WallMode mode;
shared_ptr<BVP::BatchedBVPHandler> bvpSolver;
shared_ptr<BVP::BatchedBVPHandlerReal> bvpSolver;

};

@@ -506,7 +506,7 @@ namespace uammd{
}

auto computeZeroModeBoundaryConditions(int nz, real H, WallMode mode){
BVP::SchurBoundaryCondition bcs(nz, H);
BVP::SchurBoundaryConditionReal bcs(nz, H);
if(mode == WallMode::bottom){
correction_ns::TopBoundaryConditions top(0, H);
correction_ns::BottomBoundaryConditions bot(0, H);
@@ -144,7 +144,7 @@ namespace uammd{
auto botBC = thrust::make_transform_iterator(thrust::make_counting_iterator<int>(0), botdispatch);
int numberSystems = (nk.x/2+1)*nk.y;
int nz = grid.cellDim.z;
this->bvpSolver = std::make_shared<BVP::BatchedBVPHandler>(klist, topBC, botBC, numberSystems, halfH, nz);
this->bvpSolver = std::make_shared<BVP::BatchedBVPHandlerReal>(klist, topBC, botBC, numberSystems, halfH, nz);
CudaCheckError();
}

17 changes: 17 additions & 0 deletions src/Integrator/BDHI/FCM/FCM_impl.cuh
@@ -224,6 +224,7 @@ namespace uammd{
};

template<class IterPos, class IterForce, class Kernel>
inline
cached_vector<real3> spreadForces(IterPos& pos, IterForce& force,
int numberParticles,
std::shared_ptr<Kernel> kernel,
@@ -248,6 +249,7 @@
};

template<class T, class T3, class Container>
inline
auto getCoordinateVector(Container &v, int coord){
cached_vector<T> v_a(v.size());
T* ptr= (T*)thrust::raw_pointer_cast(v.data());
@@ -264,14 +266,17 @@
return {thrust::get<0>(a), thrust::get<1>(a),thrust::get<2>(a)};
}
};

template<class T3, class Container>
inline
auto interleave(Container &a, Container &b, Container &c){
auto zip = thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin(), c.begin()));
cached_vector<T3> res(a.size());
thrust::transform(zip, zip+a.size(), res.begin(), Zip23());
return res;
}

inline
cached_vector<complex3> forwardTransform(cached_vector<real3>& gridReal,
int3 n,
cufftHandle plan, cudaStream_t st){
@@ -286,6 +291,7 @@
}


inline
__global__ void addTorqueCurl(complex3 *gridTorquesFourier, complex3* gridVelsFourier, Grid grid){
int id = blockDim.x*blockIdx.x + threadIdx.x;
const int3 nk = grid.cellDim;
@@ -307,6 +313,7 @@


template<class IterPos, class IterTorque, class Kernel>
inline
void addSpreadTorquesFourier(IterPos& pos, IterTorque& torque, int numberParticles,
Grid grid,
std::shared_ptr<Kernel> kernel,
@@ -339,6 +346,7 @@ namespace uammd{
Output:gridVels = B·FFTf·S·F -> B \propto (I-k^k/|k|^2)
*/
/*A thread per fourier node*/
inline
__global__ void forceFourier2Vel(const complex3 * gridForces, /*Input array*/
complex3 * gridVels, /*Output array, can be the same as input*/
real vis,
@@ -361,6 +369,7 @@
gridVels[id] = projectFourier(k2, dk, factor)*(B/real(ncells.x*ncells.y*ncells.z));
}

inline
void convolveFourier(cached_vector<complex3>& gridVelsFourier, real viscosity, Grid grid, cudaStream_t st){
System::log<System::DEBUG2>("[BDHI::FCM] Wave space velocity scaling");
/*Scale the wave space grid forces, transforming in velocities -> B·FFT·S·F*/
@@ -379,6 +388,7 @@
This kernel gets v_k = gridVelsFourier = B·FFtt·S·F as input and adds 1/√σ·√B(k)·dWw.
Keeping special care that v_k = v*_{N-k}, which implies that dWw_k = dWw*_{N-k}
*/
inline
__global__ void fourierBrownianNoise(complex3 * gridVelsFourier,
Grid grid,
real prefactor,/* sqrt(2·T/dt)*/
@@ -451,6 +461,7 @@
}
}

inline
void addBrownianNoise(cached_vector<complex3>& gridVelsFourier,
real temperature, real viscosity, real prefactor,
uint seed,
@@ -487,6 +498,7 @@
}
}

inline
cached_vector<real3> inverseTransform(cached_vector<complex3>& gridFourier,
int3 n, cufftHandle plan, cudaStream_t st){
int nx = 2*(n.x/2+1);
@@ -501,6 +513,7 @@
}

template<class IterPos, class Kernel>
inline
cached_vector<real3> interpolateVelocity(IterPos& pos, cached_vector<real3>& gridVels,
Grid grid, std::shared_ptr<Kernel> kernel,
int numberParticles, cudaStream_t st){
@@ -524,6 +537,7 @@
// = 0.5( i*k_y*V_z - i*k_z(V_y), i*k_z(V_x) - i*k_x*V_z, i*k_x*V_y - i*k_y*V_x)
//Overwrite the output vector with the angular velocities in Fourier space
//The input velocity vector is overwritten
inline
__global__ void computeVelocityCurlFourier(const complex3 *gridVelsFourier,
complex3* gridAngVelsFourier,
Grid grid){
@@ -549,6 +563,7 @@
gridAngVelsFourier[id] = gridAng;
}

inline
cached_vector<complex3> computeGridAngularVelocityFourier(cached_vector<complex3>& gridVelsFourier,
Grid grid, cudaStream_t st){
const int3 n = grid.cellDim;
@@ -565,6 +580,7 @@
}

template<class IterPos, class Kernel>
inline
cached_vector<real3> interpolateAngularVelocity(IterPos& pos,
cached_vector<real3>& gridAngVels,
Grid grid,
@@ -584,6 +600,7 @@
}

template<class Kernel, class KernelTorque>
inline
std::pair<cached_vector<real3>, cached_vector<real3>>
FCM_impl<Kernel, KernelTorque>::computeHydrodynamicDisplacements(real4* pos,
real4* force, real4* torque,
7 changes: 7 additions & 0 deletions src/Integrator/BDHI/FCM/utils.cuh
@@ -22,6 +22,7 @@ namespace uammd{
using complex = cufftComplex_t<real>;
using complex3 = cufftComplex3_t<real>;

inline
__device__ int3 indexToWaveNumber(int i, int3 nk){
int ikx = i%(nk.x/2+1);
int iky = (i/(nk.x/2+1))%nk.y;
@@ -32,10 +33,12 @@
return make_int3(ikx, iky, ikz);
}

inline
__device__ real3 waveNumberToWaveVector(int3 ik, real3 L){
return (real(2.0)*real(M_PI)/L)*make_real3(ik.x, ik.y, ik.z);
}

inline
__device__ real3 getGradientFourier(int3 ik, int3 nk, real3 L){
const bool isUnpairedX = ik.x == (nk.x - ik.x);
const bool isUnpairedY = ik.y == (nk.y - ik.y);
@@ -55,6 +58,7 @@
unpaired modes set to zero
fr is the factor to project
*/
inline
__device__ real3 projectFourier(real k2, real3 dk, real3 fr){
const real invk2 = real(1.0)/k2;
real3 vr = fr - dk*dot(fr, dk*invk2);
@@ -68,6 +72,7 @@
unpaired modes set to zero
fr is the factor to project
*/
inline
__device__ complex3 projectFourier(real k2, real3 dk, complex3 factor){
real3 re = projectFourier(k2, dk, make_real3(factor.x.x, factor.y.x, factor.z.x));
real3 imag = projectFourier(k2, dk, make_real3(factor.x.y, factor.y.y, factor.z.y));
@@ -77,6 +82,7 @@

/*Compute gaussian complex noise dW, std = prefactor -> ||z||^2 = <x^2>/sqrt(2)+<y^2>/sqrt(2) = prefactor*/
/*A complex random number for each direction*/
inline
__device__ complex3 generateNoise(real prefactor, uint id, uint seed1, uint seed2){ //Uncomment to use uniform numbers instead of gaussian
Saru saru(id, seed1, seed2);
complex3 noise;
@@ -91,6 +97,7 @@
return noise;
}

inline
__device__ bool isNyquistWaveNumber(int3 cell, int3 ncells){
/*Beware of nyquist points! They only appear with even cell dimensions
There are 8 nyquist points at most (cell=0,0,0 is excluded at the start of the kernel)
17 changes: 9 additions & 8 deletions src/Integrator/BDHI/PSE/initialization.cu
@@ -6,7 +6,7 @@ Initialization
#include"Integrator/BDHI/BDHI_PSE.cuh"
namespace uammd{
namespace BDHI{
namespace pse_ns{
namespace pse_ns{
void checkInputValidity(BDHI::PSE::Parameters par){
real3 L = par.box.boxSize;
if(L.x == real(0.0) && L.y == real(0.0) && L.z == real(0.0)){
@@ -16,10 +16,10 @@
if(L.x != L.y || L.y != L.z || L.x != L.z){
System::log<System::WARNING>("[BDHI::PSE] Non cubic boxes are not really tested!");
}


}

long double computeSelfMobility(PSE::Parameters par){
//O(a^8) accuracy. See Hashimoto 1959.
//With a Gaussian this expression has a minimum deviation from measurements of 7e-7*rh at L=64*rh.
@@ -34,11 +34,12 @@ namespace uammd{
long double a6pref = 16.0l*M_PIl*M_PIl/45.0l + 630.0L*b*b;
return 1.0l/(6.0l*M_PIl*par.viscosity*rh)*(1.0l-c*a+(4.0l/3.0l)*M_PIl*a3-a6pref*a3*a3);
}

}

PSE::PSE(shared_ptr<ParticleGroup> pg, Parameters par):
pg(pg), hydrodynamicRadius(par.hydrodynamicRadius){

PSE::PSE(shared_ptr<ParticleGroup> pg, Parameters par):
pg(pg), hydrodynamicRadius(par.hydrodynamicRadius),
temperature(par.temperature), dt(par.dt){
System::log<System::MESSAGE>("[BDHI::PSE] Initialized");
this->M0 = pse_ns::computeSelfMobility(par);
System::log<System::MESSAGE>("[BDHI::PSE] Self mobility: %f", M0);
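As a closing note on the PSE diff above: written out, the value returned by `computeSelfMobility` corresponds to the following expression. This is a reconstruction from the code, assuming `\eta` is `par.viscosity`, `r_h` the hydrodynamic radius, and `a`, `b` the dimensionless quantities defined in the partially elided function body:

```latex
M_0 = \frac{1}{6\pi\eta r_h}\left[\,1 - c\,a + \frac{4\pi}{3}a^{3}
      - \left(\frac{16\pi^{2}}{45} + 630\,b^{2}\right)a^{6}\right],
\qquad c \approx 2.83729747948062
```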