This repository contains my MSc thesis on the subject of using GPUs to solve the Navier-Stokes equations using the Finite Element Method. For a brief summary check below, for the full document check MSc_Vasil_Pashov.pdf. This repo contains both CPU and GPU implementation of the described methods.
The Navier-Stokes equations are a system of non-linear partial differential equations which describe the flow incompressible Newtonian fluids.
The first equation is known as the Conservation of momentum. It can be derived from the Caucy's equations (Newton's second law). It consists of three major terms, diffusion, advection and pressure.
The second equation is known as the Continuity equation. It enforces the incompressibility of the fluid.
The FEM is a method used to solve differential equations, on of the advantages of the method is that it can be used in complex geometries. Not all elements are appropriate for solving the Navier-Stokes equations. In order for an element to be stable if has to conform to the LBB condition. In this work we use P2-P1 Tylor-Hood element. It has 6 degrees of freedom (one in each vertex and one in the middle of each side), the pressure has 3 degrees of freedom (on in each vertex).
Solving the equations as-is is a difficult problem, applying the FEM directly would require to solve non-linear system of equations. Thus we split the system of equations into three parts:
- Advection - solved via the semi-Lagrangian method
- Diffusion - solved via FEM, the linear system is solved via Conjugate Gradient Method
- Pressure Projection - imposes the incompressibility, the advection must always be performed in a divergence free velocity field. This basically requires solving a linear system of equations via the Conjugate Gradient Method
- The semi-Lagrangian method used for the advection is perfect for GPU implementation, however only small portion of the time is spent in it.
- Most of the time is spent in the Conjugate Gradient method. Megakernel implementation of the method (with a global thread lock inside the kernel) is far superior to multikernel implementation.
- Better algorithm for building the KD Tree for the advection
- Use second order operator splitting (Strang splitting)
- Research suitable preconditioners. The IL0 and Incomplete Cholesky are slow to build and apply and are not trivial to implement in multithreaded environment.
- Only Dirichlet and homogeneous second order boundary conditions are allowed.
- Cannot mesh the geometry it has to be split into triangles
Check the conanfile.txt for list of dependencies. In order to download all dependencies you can use the following:
mkdir conan
cd conan
conan install ../ --build=missing -s build_type=Release
conan install ../ --build=missing -s build_type=Debug
This will prepare all required packages. After preparing the pachages don't forget to point the CMAKE_INSTALL_PREFIX
variable to the folder where conan packages were installed (i.e. the conan
folder from the above example).
- The Sparse Matrix Math library is added as a submodule. In order to fetch the submodule run:
git submodule update --init --recursive
You can start reading from main.cpp, the assembling of the FEM matrices and the solution of the equations happens in assembly.h. The CPU implementation of the Conjugate Gradient method is implemented in the Sparse Matrix Math library. The GPU implementation is in gpu, gpu_cpu_shared contains headers with structures which can be used both on CPU and on GPU. Most of the GPU device setup happens in gpu_host_common.cpp and gpu_simulation_device.cpp, expression.h and expression.cpp contain expression tree which parses simple math formulas and can then plug variables into them and compute the result, it's used for the boundary conditions.