This repository has been archived by the owner on May 3, 2024. It is now read-only.

GPU prover #16

Open
Brechtpd opened this issue Sep 12, 2022 · 5 comments

Comments

@Brechtpd

Brechtpd commented Sep 12, 2022

Look into using the GPU to speed up certain prover work:

  • FFT
  • MSM
  • Custom gates?
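To make the FFT item concrete, here is a minimal sketch of an iterative radix-2 number-theoretic transform (NTT), the finite-field FFT that provers run. It uses the toy NTT-friendly prime p = 998244353 (119·2^23 + 1) with primitive root 3 instead of a real proof-system field such as the BN254 scalar field. `ntt` is an illustrative name, not an API from any library under consideration. Within each stage every butterfly is independent, and that independence is the parallelism a GPU kernel would map onto threads.

```rust
/// Toy NTT-friendly prime (119 * 2^23 + 1) with primitive root 3.
/// Assumption for illustration only: a real prover uses e.g. the BN254 scalar field.
const P: u64 = 998_244_353;
const G: u64 = 3;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// In-place iterative Cooley-Tukey NTT. `a.len()` must be a power of two
/// that divides 2^23. Set `invert` to compute the inverse transform.
fn ntt(a: &mut [u64], invert: bool) {
    let n = a.len();
    assert!(n.is_power_of_two());
    // Bit-reversal permutation so the butterflies can run in place.
    let mut j = 0;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
        if i < j {
            a.swap(i, j);
        }
    }
    // log2(n) stages. Every butterfly inside a stage is independent.
    let mut len = 2;
    while len <= n {
        // Primitive len-th root of unity (or its inverse for the inverse NTT).
        let mut w_len = pow_mod(G, (P - 1) / len as u64);
        if invert {
            w_len = pow_mod(w_len, P - 2);
        }
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = mul_mod(a[start + k + len / 2], w);
                a[start + k] = (u + v) % P;
                a[start + k + len / 2] = (u + P - v) % P;
                w = mul_mod(w, w_len);
            }
        }
        len <<= 1;
    }
    // The inverse transform is scaled by n^-1.
    if invert {
        let n_inv = pow_mod(n as u64, P - 2);
        for x in a.iter_mut() {
            *x = mul_mod(*x, n_inv);
        }
    }
}
```

Transforming `[1, 2, 3, 4, 0, 0, 0, 0]` forward and then back returns the original vector. The first forward output is the coefficient sum, 10, because it is the evaluation at the root 1.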

Libraries:

@mratsim

mratsim commented Jun 14, 2023

Others:

See also my quick analysis at: mratsim/constantine#92

There are 2 additional backends that might be interesting:

  • AMD GPUs, in particular because AMD offers significantly more memory than Nvidia (see AMD teasing: https://community.amd.com/t5/gaming/building-an-enthusiast-pc/ba-p/599407), but they aren't available on cloud machines
  • Apple Metal: thanks to unified memory, Mac Studios and the Mac Pro can access up to 192GB of memory, enough to fit the super-circuit. However, Metal assembly is closed source. I tried to look into reverse-engineering efforts, either from Apple LLVM or Asahi Linux, to at least find add-with-carry, but I'm not hopeful.
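Add-with-carry matters because field elements are multi-limb integers, and even a modular addition ripples a carry through every limb. Below is a portable Rust sketch that assumes four little-endian 64-bit limbs (256 bits). `adc`, `sbb`, `add_u256` and `add_mod` are illustrative names. On CPUs, and in CUDA PTX through `add.cc`/`addc`, this carry chain maps to native instructions. The sketch computes each carry portably with `u64::overflowing_add` and does not rely on any particular instruction.

```rust
/// 256-bit unsigned integer as four little-endian 64-bit limbs.
type U256 = [u64; 4];

/// Add two limbs plus an incoming carry (0 or 1); returns (sum, carry_out).
/// This is the add-with-carry primitive a GPU backend wants natively.
fn adc(a: u64, b: u64, carry: u64) -> (u64, u64) {
    let (s1, c1) = a.overflowing_add(b);
    let (s2, c2) = s1.overflowing_add(carry);
    (s2, (c1 | c2) as u64)
}

/// Subtract with an incoming borrow (0 or 1); returns (diff, borrow_out).
fn sbb(a: u64, b: u64, borrow: u64) -> (u64, u64) {
    let (d1, b1) = a.overflowing_sub(b);
    let (d2, b2) = d1.overflowing_sub(borrow);
    (d2, (b1 | b2) as u64)
}

/// Multi-limb addition: the carry ripples from the lowest to the highest limb.
fn add_u256(a: &U256, b: &U256) -> (U256, u64) {
    let mut out = [0u64; 4];
    let mut carry = 0;
    for i in 0..4 {
        let (s, c) = adc(a[i], b[i], carry);
        out[i] = s;
        carry = c;
    }
    (out, carry)
}

/// Modular addition for a, b < modulus: compute a + b, then subtract the
/// modulus once if the sum overflowed 256 bits or is >= modulus.
fn add_mod(a: &U256, b: &U256, modulus: &U256) -> U256 {
    let (sum, carry) = add_u256(a, b);
    let mut reduced = [0u64; 4];
    let mut borrow = 0;
    for i in 0..4 {
        let (d, bo) = sbb(sum[i], modulus[i], borrow);
        reduced[i] = d;
        borrow = bo;
    }
    // sum - modulus is the answer unless it borrowed and a + b did not overflow.
    if carry == 1 || borrow == 0 { reduced } else { sum }
}
```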

Intel integrated GPUs also have unified memory, but they are not powerful enough. If we want to use them, we need to wait for an LLVM release whose SPIR-V support is no longer experimental; otherwise LLVM needs to be built from source along with a couple of other LLVM+SPIR-V translators.

@hugo-blue

> Look into using the GPU to speed up certain prover work:
>
>   • FFT
>   • MSM
>   • Custom gates?
>
> Libraries:

The evaluation parts of the lookup and permutation arguments also deserve optimization.
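For the permutation side, a PLONK-style permutation argument (halo2 uses a variant of it) has the prover build a running product z with z_0 = 1 and z_{i+1} = z_i · (a_i + β·id_i + γ) / (a_i + β·σ_i + γ). Below is a minimal single-column sketch. It uses the toy prime 998244353 in place of the proof system's field, uses the row index as a hypothetical identity label, and picks hypothetical β, γ, values and permutation. The running product is an associative prefix-product scan. After the per-row ratios are batch-computed, a GPU can evaluate it with a parallel scan. The chunked function mirrors that two-phase structure, and the names are illustrative.

```rust
/// Toy prime standing in for the proof system's scalar field (assumption).
const P: u64 = 998_244_353;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Fermat inverse: a^(p-2) = a^-1 mod p for a != 0.
fn inv_mod(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Per-row ratio (a_i + beta*i + gamma) / (a_i + beta*sigma_i + gamma).
/// Hypothetical simplification: the row index i serves as the identity label.
/// Inputs a_i, beta, gamma are assumed < P and denominators nonzero.
fn ratios(a: &[u64], sigma: &[usize], beta: u64, gamma: u64) -> Vec<u64> {
    a.iter()
        .zip(sigma)
        .enumerate()
        .map(|(i, (&ai, &si))| {
            let num = (ai + mul_mod(beta, i as u64) + gamma) % P;
            let den = (ai + mul_mod(beta, si as u64) + gamma) % P;
            mul_mod(num, inv_mod(den))
        })
        .collect()
}

/// Sequential grand product: z[0] = 1, z[i+1] = z[i] * r[i].
fn grand_product(r: &[u64]) -> Vec<u64> {
    let mut z = Vec::with_capacity(r.len() + 1);
    z.push(1);
    for &x in r {
        let last = *z.last().unwrap();
        z.push(mul_mod(last, x));
    }
    z
}

/// Same result in two phases: independent local prefix products per chunk
/// (the parallel part), then a pass that scales each chunk by the running total.
fn grand_product_chunked(r: &[u64], chunk: usize) -> Vec<u64> {
    let locals: Vec<Vec<u64>> = r
        .chunks(chunk)
        .map(|c| grand_product(c)[1..].to_vec())
        .collect();
    let mut z = vec![1];
    let mut offset = 1;
    for local in &locals {
        for &v in local {
            z.push(mul_mod(offset, v));
        }
        offset = *z.last().unwrap();
    }
    z
}
```

Take a = [5, 7, 5, 7] and σ = [2, 3, 0, 1], where rows 0↔2 and 1↔3 hold equal values. The final entry z[4] is then 1, as the permutation argument requires, and the chunked version matches the sequential one.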

@hugo-blue

> Others:
>
> See also my quick analysis at: mratsim/constantine#92
>
> There are 2 additional backends that might be interesting:
>
>   • AMD GPUs, in particular because AMD offers significantly more memory than Nvidia (see AMD teasing: https://community.amd.com/t5/gaming/building-an-enthusiast-pc/ba-p/599407), but they aren't available on cloud machines
>   • Apple Metal: thanks to unified memory, Mac Studios and the Mac Pro can access up to 192GB of memory, enough to fit the super-circuit. However, Metal assembly is closed source. I tried to look into reverse-engineering efforts, either from Apple LLVM or Asahi Linux, to at least find add-with-carry, but I'm not hopeful.
>
> Intel integrated GPUs also have unified memory, but they are not powerful enough. If we want to use them, we need to wait for an LLVM release whose SPIR-V support is no longer experimental; otherwise LLVM needs to be built from source along with a couple of other LLVM+SPIR-V translators.

Since many Nvidia GPUs are available on the crypto-mining market, focusing on Nvidia GPUs should be enough.

For each ZKP project, to reduce data-copy time and save memory, there should also be a common memory management module shared by MSM, FFT, and so on.

@mratsim

mratsim commented Jun 26, 2023

> Since many Nvidia GPUs are available on the crypto-mining market, focusing on Nvidia GPUs should be enough.

Miners optimized for megahash per watt first, a metric dominated by AMD GPUs, and then moved to Nvidia GPUs. However, GPUs with large amounts of VRAM consume more power (and cost more), while the extra memory is useless for parallel SHA256 computation.

Concretely, they bought a lot of AMD RX 480 and Nvidia GTX 1080 Ti cards, but those had only 8GB and 11GB of RAM respectively.

And Nvidia is still gimping the RAM of its GPUs (there are AMD consumer GPUs with 24GB).
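To put these VRAM sizes in perspective, here is back-of-the-envelope arithmetic for the raw input size of a single MSM. The curve is assumed to be BN254, with 32-byte scalars and affine points stored uncompressed as two 32-byte coordinates (64 bytes). The count ignores bucket memory, precomputed tables, witness columns and everything else a prover keeps resident, so the numbers are illustrative lower bounds, not measurements. The function names are hypothetical.

```rust
/// Bytes per BN254 scalar (254-bit field element stored in 32 bytes). Assumption.
const SCALAR_BYTES: u64 = 32;
/// Bytes per BN254 G1 affine point stored uncompressed (x, y). Assumption.
const POINT_BYTES: u64 = 64;

/// Raw input size in bytes of an MSM over 2^log_n (point, scalar) pairs.
/// Bucket storage and precomputation are deliberately not counted.
fn msm_input_bytes(log_n: u32) -> u64 {
    (1u64 << log_n) * (SCALAR_BYTES + POINT_BYTES)
}

/// Bytes to GiB (2^30 bytes) for display.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

/// Largest log_n whose raw MSM inputs fit in `vram_gib` GiB.
fn max_log_n(vram_gib: u64) -> u32 {
    let budget = vram_gib << 30;
    let mut log_n = 0;
    while msm_input_bytes(log_n + 1) <= budget {
        log_n += 1;
    }
    log_n
}
```

Under these assumptions, 2^26 pairs already take 6 GiB. An 8GB or 11GB card therefore tops out at 2^26 pairs, while a 24GB card fits 2^28.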

> For each ZKP project, to reduce data-copy time and save memory, there should also be a common memory management module shared by MSM, FFT, and so on.

Do you have an example of this? Even on CPUs.

@hugo-blue

hugo-blue commented Jun 26, 2023

>> Since many Nvidia GPUs are available on the crypto-mining market, focusing on Nvidia GPUs should be enough.
>
> Miners optimized for megahash per watt first, a metric dominated by AMD GPUs, and then moved to Nvidia GPUs. However, GPUs with large amounts of VRAM consume more power (and cost more), while the extra memory is useless for parallel SHA256 computation.
>
> Concretely, they bought a lot of AMD RX 480 and Nvidia GTX 1080 Ti cards, but those had only 8GB and 11GB of RAM respectively.
>
> And Nvidia is still gimping the RAM of its GPUs (there are AMD consumer GPUs with 24GB).

I see. So the challenge is to let low-end machines with GPUs like the 1080 do ZKP proving.

>> For each ZKP project, to reduce data-copy time and save memory, there should also be a common memory management module shared by MSM, FFT, and so on.
>
> Do you have an example of this? Even on CPUs.

On CPUs, the system DDR memory is shared by all the computation, so there is no need to care about this. On GPUs, memory is limited and smaller than system DDR, so memory management is essential.
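One possible shape for such a shared memory manager is a pool that caches released buffers by size, so that successive MSM and FFT calls reuse one allocation instead of reallocating under a tight budget. The sketch below simulates device memory with host `Vec<u64>` buffers and a fixed byte budget. `DevicePool`, `alloc`, `release` and `trim` are hypothetical names. A real version would wrap the GPU runtime's allocator (for example `cudaMalloc`/`cudaFree` in CUDA) and would also need to manage host-device copies.

```rust
use std::collections::HashMap;

/// Hypothetical pool standing in for GPU memory: buffers are host Vec<u64>s
/// and `budget_bytes` plays the role of limited VRAM.
struct DevicePool {
    budget_bytes: usize,
    reserved_bytes: usize,               // in-use plus cached buffers
    free: HashMap<usize, Vec<Vec<u64>>>, // released buffers, keyed by length
}

impl DevicePool {
    fn new(budget_bytes: usize) -> Self {
        DevicePool { budget_bytes, reserved_bytes: 0, free: HashMap::new() }
    }

    /// Reuse a cached buffer of the same length if there is one. Otherwise
    /// allocate a new one, evicting the cache first if the budget is tight.
    /// Recycled buffers keep their old contents; callers must overwrite them.
    fn alloc(&mut self, len: usize) -> Result<Vec<u64>, String> {
        if let Some(buf) = self.free.get_mut(&len).and_then(|bufs| bufs.pop()) {
            return Ok(buf);
        }
        let bytes = len * std::mem::size_of::<u64>();
        if self.reserved_bytes + bytes > self.budget_bytes {
            self.trim();
            if self.reserved_bytes + bytes > self.budget_bytes {
                return Err(format!("out of device memory: {} bytes requested", bytes));
            }
        }
        self.reserved_bytes += bytes;
        Ok(vec![0; len])
    }

    /// Return a buffer to the cache for reuse instead of freeing it.
    fn release(&mut self, buf: Vec<u64>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }

    /// Drop every cached buffer and give its bytes back to the budget.
    fn trim(&mut self) {
        for (len, bufs) in self.free.drain() {
            self.reserved_bytes -= len * bufs.len() * std::mem::size_of::<u64>();
        }
    }
}
```

With a 1024-byte budget, releasing a 64-element buffer and then requesting another one returns the same allocation. After two live 64-element buffers, a further request fails until they are released and trimmed.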

Projects
Status: 📝 Todo
Development

No branches or pull requests

5 participants