Generalize BVP classes to handle either real or complex numbers (Done by P.Palacios) #31

Open
wants to merge 17 commits into base: v2.x
30 changes: 21 additions & 9 deletions docs/BoundaryValueProblemSolver.rst
@@ -19,8 +19,7 @@ With the boundary conditions:

Here :math:`\alpha_n,\beta_n` are arbitrary parameters. They can take any value (including zero) as long as the resulting BVP remains well defined; for instance, setting :math:`\alpha_1` and :math:`\beta_1` to zero at the same time results in a system with no unique solution.

Both :math:`y(z)` and :math:`f(z)` can be complex-valued. In the main branches of the UAMMD repository only :math:`\alpha_2,\beta_2` can be complex numbers (the rest of the parameters, including :math:`k`, must be real valued). The branch complex_bvp contains an adaptation of the solver that allows every parameter to be complex. As a side note, should the need arise in the future, it would be fairly easy to modify the solver as to make the type of the parameters a template argument.

Both :math:`y(z)` and :math:`f(z)` can be real- or complex-valued, as can :math:`\alpha_2,\beta_2,k` and any other parameter.

.. hint::

Initialization of the batched BVP solver requires inverting a dense matrix for each value of :math:`k`, which can become quite expensive. The solver tries to mitigate this cost by inverting these matrices in parallel, but experience suggests that letting it use more than 4 cores is counter-productive.
@@ -37,10 +36,9 @@ Usage
------

Every function and class in the BVP solver source code lies under the :cpp:`uammd::BVP` namespace.
The BVP solver library exposes two main classes:

The BVP solver library exposes two main template classes, designed to handle both real- and complex-valued problems. To ensure consistent behavior and avoid unexpected results, it is recommended to use the complex number type defined in the Thrust library, :cpp:`thrust::complex<real>`, when working with complex numbers.

.. cpp:class:: BatchedBVPHandler
.. cpp:class:: template<class T> BatchedBVPHandler_impl<T>

Used to initialize and hold the information for a group of boundary value problems. Each subproblem can have different parameters (mainly :math:`\alpha_n,\beta_n,k` in the above equations).
This class cannot be used in device code; it is only used for initialization. See :cpp:any:`BatchedBVPGPUSolver_impl`.
@@ -56,14 +54,14 @@ The BVP solver library exposes two main classes:
:param H: The location of the boundary conditions (goes from -H to H).
:param nz: Number of elements of a subproblem (all subproblems must have the same size).

.. cpp:function:: BatchedBVPGPUSolver getGPUSolver();
.. cpp:function:: template<class T> BatchedBVPGPUSolver_impl<T> getGPUSolver();

Provides an instance of the solver to be used in the GPU.

.. cpp:class:: BatchedBVPGPUSolver
.. cpp:class:: template<class T> BatchedBVPGPUSolver_impl<T>

While :cpp:any:`BatchedBVPHandler` is used to initialize and store the different subproblems, this class is used to actually solve the subproblems in a CUDA kernel.
This class has no public constructors, the only way to get an instance to it is via :cpp:any:`BatchedBVPHandler::getGPUSolver`.
While :cpp:any:`BatchedBVPHandler_impl<T>` is used to initialize and store the different subproblems, this class is used to actually solve them in a CUDA kernel.
This class has no public constructors; the only way to obtain an instance is via :cpp:any:`BatchedBVPHandler_impl<T>::getGPUSolver`.

.. cpp:function:: template<class T, class FnIterator, class AnIterator, class CnIterator> __device__ void solve(int instance, const FnIterator& fn, T alpha_2, T beta_2, AnIterator& an, CnIterator& cn);

@@ -93,6 +91,20 @@ Initialization requires an iterator to a special type of functor that provides t

Returns :math:`\alpha_1` or :math:`\beta_1`, depending on which BC this class represents.
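The :cpp:any:`solve` signature above can be exercised with one thread per subproblem. The following is only a sketch (the kernel name, the raw-pointer iterators and the contiguous per-instance data layout are assumptions for illustration, not part of UAMMD):

```cpp
// Sketch: `solver` is obtained on the host via getGPUSolver() and passed by
// value to the kernel. Plain pointers are valid iterator types; instance i
// reads nz right-hand-side values from `fn` and writes nz coefficients
// through `an` and `cn`.
template<class T, class Solver>
__global__ void solveAllBVPs(Solver solver, const T* fn, T* an, T* cn,
                             T alpha_2, T beta_2, int nz, int numSystems){
  const int instance = blockIdx.x*blockDim.x + threadIdx.x;
  if(instance >= numSystems) return;
  const T* fn_i = fn + instance*nz; // assumed contiguous per-instance layout
  T* an_i = an + instance*nz;
  T* cn_i = cn + instance*nz;
  solver.solve(instance, fn_i, alpha_2, beta_2, an_i, cn_i);
}
```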

Aliases for Real and Complex Types
----------------------------------

To facilitate the use of the BVP solver with real and complex numbers, the following aliases are defined:

.. code-block:: cpp

   using BatchedBVPHandlerReal = BatchedBVPHandler_impl<real>;
   using BatchedBVPHandlerComplex = BatchedBVPHandler_impl<thrust::complex<real>>;

   using BatchedBVPGPUSolverReal = BatchedBVPGPUSolver_impl<real>;
   using BatchedBVPGPUSolverComplex = BatchedBVPGPUSolver_impl<thrust::complex<real>>;

These aliases allow for a more intuitive and type-safe way to work with the BVP solver for different numerical types.
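For example, the handler construction used elsewhere in this PR (the DPStokes initialization) reads as follows with the real alias; this is a sketch, since `klist`, `topBC`, `botBC`, `numberSystems`, `halfH` and `nz` must be prepared beforehand as the constructor requires:

```cpp
// Real-valued batch of BVPs (mirrors the DPStokes initialization call):
auto bvpSolver = std::make_shared<BVP::BatchedBVPHandlerReal>(
    klist, topBC, botBC, numberSystems, halfH, nz);
// Switching to complex-valued problems only changes the alias:
// auto bvpSolverC = std::make_shared<BVP::BatchedBVPHandlerComplex>(...);
// A device-side solver is then obtained for use inside CUDA kernels:
auto gpuSolver = bvpSolver->getGPUSolver();
```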

Example
++++++++
13 changes: 10 additions & 3 deletions src/Integrator/BDHI/BDHI_FCM.cu
@@ -4,6 +4,7 @@
#include"BDHI_FCM.cuh"
namespace uammd{
namespace BDHI{
inline
auto FCMIntegrator::computeHydrodynamicDisplacements(){
auto pos = pd->getPos(access::location::gpu, access::mode::read);
auto force = pd->getForce(access::location::gpu, access::mode::read);
@@ -15,6 +16,7 @@ namespace uammd{
temperature, 1.0/sqrt(dt), st);
}

inline
void FCMIntegrator::updateInteractors(){
for(auto forceComp: interactors) forceComp->updateSimulationTime(steps*dt);
if(steps==1){
@@ -26,6 +28,7 @@
}
}

inline
void FCMIntegrator::resetForces(){
int numberParticles = pg->getNumberParticles();
auto force = pd->getForce(access::location::gpu, access::mode::write);
@@ -34,6 +37,7 @@
CudaCheckError();
}

inline
void FCMIntegrator::resetTorques(){
int numberParticles = pg->getNumberParticles();
auto torque = pd->getTorque(access::location::gpu, access::mode::write);
@@ -42,6 +46,7 @@
CudaCheckError();
}

inline
void FCMIntegrator::computeCurrentForces(){
resetForces();
if (pd->isDirAllocated()) resetTorques();
@@ -56,6 +61,7 @@
/*With all the terms computed, update the positions*/
/*T=0 case is templated*/
template<class IndexIterator>
inline
__global__ void integrateEulerMaruyamaD(real4* pos,
real4* dir,
IndexIterator indexIterator,
@@ -75,7 +81,7 @@ namespace uammd{
/*Write to global memory*/
pos[i] = make_real4(p,c);
/*Update the orientation*/
if(dir){
if(dir){
Quat dirc = dir[i];
//printf("W %f %f %f\n", angularV[id].x, angularV[id].y, angularV[id].z);
//printf("V %f %f %f\n", linearV[id].x, linearV[id].y, linearV[id].z);
@@ -86,6 +92,7 @@
}
}

inline
void FCMIntegrator::forwardTime(){
steps++;
sys->log<System::DEBUG1>("[BDHI::FCM] Performing integration step %d", steps);
@@ -97,12 +104,12 @@
auto angularVelocities = disp.second;
auto indexIter = pg->getIndexIterator(access::location::gpu);
auto pos = pd->getPos(access::location::gpu, access::mode::readwrite);
auto dir = pd->getDirIfAllocated(access::location::gpu, access::mode::readwrite);
auto dir = pd->getDirIfAllocated(access::location::gpu, access::mode::readwrite);
real3* d_linearV = thrust::raw_pointer_cast(linearVelocities.data());
real3* d_angularV = dir.raw()?thrust::raw_pointer_cast(angularVelocities.data()):nullptr;
int BLOCKSIZE = 128; /*threads per block*/
int nthreads = BLOCKSIZE<numberParticles?BLOCKSIZE:numberParticles;
int nblocks = numberParticles/nthreads + ((numberParticles%nthreads!=0)?1:0);
int nblocks = numberParticles/nthreads + ((numberParticles%nthreads!=0)?1:0);
FCM_ns::integrateEulerMaruyamaD<<<nblocks, nthreads, 0, st>>>(pos.raw(),
dir.raw(),
indexIter,
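The `inline` keyword added to each out-of-class definition in BDHI_FCM.cu above matters because the file can be included from several translation units; without `inline`, each inclusion would emit its own definition and break the link with duplicate symbols (a One Definition Rule violation). A minimal self-contained illustration (`timesTwo` is a hypothetical stand-in, not UAMMD code):

```cpp
#include <cassert>

// Without `inline`, defining this function at namespace scope in a file
// included by two translation units produces a duplicate-symbol link error.
// `inline` permits identical definitions in several TUs; the linker keeps one.
inline int timesTwo(int x){ return 2*x; }
```

The same reasoning applies to the `__global__` and `__device__` definitions further down in this PR.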
2 changes: 1 addition & 1 deletion src/Integrator/BDHI/DoublyPeriodic/DPStokesSlab.cuh
@@ -169,7 +169,7 @@ else throw std::runtime_error("[DPStokesSlab] Can only average in direction X (0
real gw;
real tolerance;
WallMode mode;
shared_ptr<BVP::BatchedBVPHandler> bvpSolver;
shared_ptr<BVP::BatchedBVPHandlerReal> bvpSolver;

};

@@ -506,7 +506,7 @@ namespace uammd{
}

auto computeZeroModeBoundaryConditions(int nz, real H, WallMode mode){
BVP::SchurBoundaryCondition bcs(nz, H);
BVP::SchurBoundaryConditionReal bcs(nz, H);
if(mode == WallMode::bottom){
correction_ns::TopBoundaryConditions top(0, H);
correction_ns::BottomBoundaryConditions bot(0, H);
@@ -144,7 +144,7 @@ namespace uammd{
auto botBC = thrust::make_transform_iterator(thrust::make_counting_iterator<int>(0), botdispatch);
int numberSystems = (nk.x/2+1)*nk.y;
int nz = grid.cellDim.z;
this->bvpSolver = std::make_shared<BVP::BatchedBVPHandler>(klist, topBC, botBC, numberSystems, halfH, nz);
this->bvpSolver = std::make_shared<BVP::BatchedBVPHandlerReal>(klist, topBC, botBC, numberSystems, halfH, nz);
CudaCheckError();
}

17 changes: 17 additions & 0 deletions src/Integrator/BDHI/FCM/FCM_impl.cuh
@@ -224,6 +224,7 @@ namespace uammd{
};

template<class IterPos, class IterForce, class Kernel>
inline
cached_vector<real3> spreadForces(IterPos& pos, IterForce& force,
int numberParticles,
std::shared_ptr<Kernel> kernel,
@@ -248,6 +249,7 @@
};

template<class T, class T3, class Container>
inline
auto getCoordinateVector(Container &v, int coord){
cached_vector<T> v_a(v.size());
T* ptr= (T*)thrust::raw_pointer_cast(v.data());
@@ -264,14 +266,17 @@
return {thrust::get<0>(a), thrust::get<1>(a),thrust::get<2>(a)};
}
};

template<class T3, class Container>
inline
auto interleave(Container &a, Container &b, Container &c){
auto zip = thrust::make_zip_iterator(thrust::make_tuple(a.begin(), b.begin(), c.begin()));
cached_vector<T3> res(a.size());
thrust::transform(zip, zip+a.size(), res.begin(), Zip23());
return res;
}

inline
cached_vector<complex3> forwardTransform(cached_vector<real3>& gridReal,
int3 n,
cufftHandle plan, cudaStream_t st){
@@ -286,6 +291,7 @@
}


inline
__global__ void addTorqueCurl(complex3 *gridTorquesFourier, complex3* gridVelsFourier, Grid grid){
int id = blockDim.x*blockIdx.x + threadIdx.x;
const int3 nk = grid.cellDim;
@@ -307,6 +313,7 @@


template<class IterPos, class IterTorque, class Kernel>
inline
void addSpreadTorquesFourier(IterPos& pos, IterTorque& torque, int numberParticles,
Grid grid,
std::shared_ptr<Kernel> kernel,
@@ -339,6 +346,7 @@ namespace uammd{
Output:gridVels = B·FFTf·S·F -> B \propto (I-k^k/|k|^2)
*/
/*A thread per fourier node*/
inline
__global__ void forceFourier2Vel(const complex3 * gridForces, /*Input array*/
complex3 * gridVels, /*Output array, can be the same as input*/
real vis,
@@ -361,6 +369,7 @@
gridVels[id] = projectFourier(k2, dk, factor)*(B/real(ncells.x*ncells.y*ncells.z));
}

inline
void convolveFourier(cached_vector<complex3>& gridVelsFourier, real viscosity, Grid grid, cudaStream_t st){
System::log<System::DEBUG2>("[BDHI::FCM] Wave space velocity scaling");
/*Scale the wave space grid forces, transforming in velocities -> B·FFT·S·F*/
@@ -379,6 +388,7 @@
This kernel gets v_k = gridVelsFourier = B·FFtt·S·F as input and adds 1/√σ·√B(k)·dWw.
Keeping special care that v_k = v*_{N-k}, which implies that dWw_k = dWw*_{N-k}
*/
inline
__global__ void fourierBrownianNoise(complex3 * gridVelsFourier,
Grid grid,
real prefactor,/* sqrt(2·T/dt)*/
@@ -451,6 +461,7 @@
}
}

inline
void addBrownianNoise(cached_vector<complex3>& gridVelsFourier,
real temperature, real viscosity, real prefactor,
uint seed,
@@ -487,6 +498,7 @@
}
}

inline
cached_vector<real3> inverseTransform(cached_vector<complex3>& gridFourier,
int3 n, cufftHandle plan, cudaStream_t st){
int nx = 2*(n.x/2+1);
@@ -501,6 +513,7 @@
}

template<class IterPos, class Kernel>
inline
cached_vector<real3> interpolateVelocity(IterPos& pos, cached_vector<real3>& gridVels,
Grid grid, std::shared_ptr<Kernel> kernel,
int numberParticles, cudaStream_t st){
@@ -524,6 +537,7 @@
// = 0.5( i*k_y*V_z - i*k_z(V_y), i*k_z(V_x) - i*k_x*V_z, i*k_x*V_y - i*k_y*V_x)
//Overwrite the output vector with the angular velocities in Fourier space
//The input velocity vector is overwritten
inline
__global__ void computeVelocityCurlFourier(const complex3 *gridVelsFourier,
complex3* gridAngVelsFourier,
Grid grid){
@@ -549,6 +563,7 @@
gridAngVelsFourier[id] = gridAng;
}

inline
cached_vector<complex3> computeGridAngularVelocityFourier(cached_vector<complex3>& gridVelsFourier,
Grid grid, cudaStream_t st){
const int3 n = grid.cellDim;
@@ -565,6 +580,7 @@
}

template<class IterPos, class Kernel>
inline
cached_vector<real3> interpolateAngularVelocity(IterPos& pos,
cached_vector<real3>& gridAngVels,
Grid grid,
@@ -584,6 +600,7 @@
}

template<class Kernel, class KernelTorque>
inline
std::pair<cached_vector<real3>, cached_vector<real3>>
FCM_impl<Kernel, KernelTorque>::computeHydrodynamicDisplacements(real4* pos,
real4* force, real4* torque,
7 changes: 7 additions & 0 deletions src/Integrator/BDHI/FCM/utils.cuh
@@ -22,6 +22,7 @@ namespace uammd{
using complex = cufftComplex_t<real>;
using complex3 = cufftComplex3_t<real>;

inline
__device__ int3 indexToWaveNumber(int i, int3 nk){
int ikx = i%(nk.x/2+1);
int iky = (i/(nk.x/2+1))%nk.y;
@@ -32,10 +33,12 @@
return make_int3(ikx, iky, ikz);
}

inline
__device__ real3 waveNumberToWaveVector(int3 ik, real3 L){
return (real(2.0)*real(M_PI)/L)*make_real3(ik.x, ik.y, ik.z);
}

inline
__device__ real3 getGradientFourier(int3 ik, int3 nk, real3 L){
const bool isUnpairedX = ik.x == (nk.x - ik.x);
const bool isUnpairedY = ik.y == (nk.y - ik.y);
@@ -55,6 +58,7 @@
unpaired modes set to zero
fr is the factor to project
*/
inline
__device__ real3 projectFourier(real k2, real3 dk, real3 fr){
const real invk2 = real(1.0)/k2;
real3 vr = fr - dk*dot(fr, dk*invk2);
@@ -68,6 +72,7 @@
unpaired modes set to zero
fr is the factor to project
*/
inline
__device__ complex3 projectFourier(real k2, real3 dk, complex3 factor){
real3 re = projectFourier(k2, dk, make_real3(factor.x.x, factor.y.x, factor.z.x));
real3 imag = projectFourier(k2, dk, make_real3(factor.x.y, factor.y.y, factor.z.y));
@@ -77,6 +82,7 @@

/*Compute gaussian complex noise dW, std = prefactor -> ||z||^2 = <x^2>/sqrt(2)+<y^2>/sqrt(2) = prefactor*/
/*A complex random number for each direction*/
inline
__device__ complex3 generateNoise(real prefactor, uint id, uint seed1, uint seed2){ //Uncomment to use uniform numbers instead of gaussian
Saru saru(id, seed1, seed2);
complex3 noise;
@@ -91,6 +97,7 @@
return noise;
}

inline
__device__ bool isNyquistWaveNumber(int3 cell, int3 ncells){
/*Beware of nyquist points! They only appear with even cell dimensions
There are 8 nyquist points at most (cell=0,0,0 is excluded at the start of the kernel)
17 changes: 9 additions & 8 deletions src/Integrator/BDHI/PSE/initialization.cu
@@ -6,7 +6,7 @@ Initialization
#include"Integrator/BDHI/BDHI_PSE.cuh"
namespace uammd{
namespace BDHI{
namespace pse_ns{
namespace pse_ns{
void checkInputValidity(BDHI::PSE::Parameters par){
real3 L = par.box.boxSize;
if(L.x == real(0.0) && L.y == real(0.0) && L.z == real(0.0)){
@@ -16,10 +16,10 @@
if(L.x != L.y || L.y != L.z || L.x != L.z){
System::log<System::WARNING>("[BDHI::PSE] Non cubic boxes are not really tested!");
}


}

long double computeSelfMobility(PSE::Parameters par){
//O(a^8) accuracy. See Hashimoto 1959.
//With a Gaussian this expression has a minimum deviation from measurements of 7e-7*rh at L=64*rh.
@@ -34,11 +34,12 @@ namespace uammd{
long double a6pref = 16.0l*M_PIl*M_PIl/45.0l + 630.0L*b*b;
return 1.0l/(6.0l*M_PIl*par.viscosity*rh)*(1.0l-c*a+(4.0l/3.0l)*M_PIl*a3-a6pref*a3*a3);
}

}

PSE::PSE(shared_ptr<ParticleGroup> pg, Parameters par):
pg(pg), hydrodynamicRadius(par.hydrodynamicRadius){

PSE::PSE(shared_ptr<ParticleGroup> pg, Parameters par):
pg(pg), hydrodynamicRadius(par.hydrodynamicRadius),
temperature(par.temperature), dt(par.dt){
System::log<System::MESSAGE>("[BDHI::PSE] Initialized");
this->M0 = pse_ns::computeSelfMobility(par);
System::log<System::MESSAGE>("[BDHI::PSE] Self mobility: %f", M0);
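As a closing note on the PSE diff above: written out, the value returned by `computeSelfMobility` corresponds to the following expression. This is a reconstruction from the code, assuming `\eta` is `par.viscosity`, `r_h` the hydrodynamic radius, and `a`, `b` the dimensionless quantities defined in the partially elided function body:

```latex
M_0 = \frac{1}{6\pi\eta r_h}\left[\,1 - c\,a + \frac{4\pi}{3}a^{3}
      - \left(\frac{16\pi^{2}}{45} + 630\,b^{2}\right)a^{6}\right],
\qquad c \approx 2.83729747948062
```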