Same NaN error / not a solution. RX 6800 #7

Open
InkyZima opened this issue May 7, 2023 · 9 comments

Comments

InkyZima commented May 7, 2023

Context: I am here from AUTOMATIC1111/stable-diffusion-webui#5468.
@hydrian I tried this repo / Docker setup; it does not work for me. Hardware: AMD RX 6800, on a clean Lubuntu host (5.19 kernel). I also tried --precision full, --no-half, and "Upcast cross attention layer to float32".
--disable-nan-check just produces black images.
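
A note on why --disable-nan-check cannot fix this: the flag only skips the check that reports NaNs; it does not remove them from the computation. The stdlib-only sketch below illustrates the general behaviour. The function names are hypothetical and this is not webui code; it only shows that a single NaN poisons any arithmetic that touches it, so turning off the check leaves an unusable result.

```python
import math

def mean(values):
    """Plain arithmetic mean; any NaN in the input makes the result NaN."""
    return sum(values) / len(values)

def has_nan(values):
    """What a NaN check does: it reports the problem, it does not repair it."""
    return any(math.isnan(v) for v in values)

clean = [0.2, 0.5, 0.9]
poisoned = [0.2, float("nan"), 0.9]  # one bad value among good ones

print(has_nan(clean), math.isnan(mean(clean)))        # False False
print(has_nan(poisoned), math.isnan(mean(poisoned)))  # True True
```

Skipping has_nan would not change what mean returns for the second list, which is consistent with the black images reported above.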

Owner

hydrian commented May 7, 2023

Last I knew, the 5.19 kernel was not supported by ROCm. Try downgrading to the 5.17 kernel.

Owner

hydrian commented May 7, 2023

Actually, ROCm 5.5 was just released and supports kernel 5.19. You can try installing that on the host system.

Author

InkyZima commented May 8, 2023

Very interesting! Thanks a lot for the info. Will try ASAP, this weekend at the latest, and let you know. Fingers crossed (:

Btw, the README is missing a closing ' at the end of the line "Run on the command docker build . -t 'stable-diffusion-webui-rocm".

Author

InkyZima commented May 9, 2023

Tried; it didn't work with kernel 5.17.15 and ROCm 5.4.2. It keeps producing NaNs / black images only.
Regarding ROCm 5.5: I don't know how to get that to work. I can install ROCm 5.5 from AMD on my host, but there is no torch build for rocm5.5.
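
The ROCm version installed on the host and the ROCm version a PyTorch wheel was built against are two separate things, and the second can be read from torch itself. A minimal sketch, assuming it is run in the same Python environment the webui uses; torch_build_info is a hypothetical helper, while torch.version.hip and torch.cuda.is_available() are existing PyTorch attributes (ROCm builds expose the GPU through the torch.cuda namespace).

```python
def torch_build_info():
    """Report the installed torch version and the HIP/ROCm version it targets.

    All values stay None when torch is not installed; "hip" stays None
    when the installed build is not a ROCm build.
    """
    info = {"torch": None, "hip": None, "gpu_visible": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__                  # version string of the wheel
    info["hip"] = getattr(torch.version, "hip", None)  # None on CPU/CUDA builds
    info["gpu_visible"] = torch.cuda.is_available()    # also covers ROCm devices
    return info

print(torch_build_info())
```

If "hip" is None, the environment is using a wheel that was not built for ROCm, which would be worth ruling out before debugging kernel versions.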

Owner

hydrian commented May 9, 2023

How are you installing ROCm on the host system?

I'm using the deb installation, and the ROCm packages are very picky. You can't just use mainline/ukuu to install a kernel of the 'supported' version.

With ROCm 5.4.2, I had to install the kernel through the linux-oem-22.04 deb package. This gives ROCm the 5.17 kernel package it expects. PyTorch wants this version too.

With ROCm 5.5, things get messier. Last I knew, PyTorch only officially supported up to 5.4.2. They haven't added 5.4.3 or 5.5 support officially yet. I'm assuming ROCm 5.5 is based off the linux-image-generic-hwe-22.04 deb kernel package. I'm testing it now. Can't say I'm holding my breath here. So we can try mixed versions. Not great, but if it works it could be helpful for people.

We really need more ROCm development / testing. It feels like ROCm is a second-class citizen to CUDA.
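
The kernel pairing described in this comment can be checked quickly on the host. The sketch below is only illustrative: the KERNEL_FOR_ROCM table holds the pairings as reported in this thread (5.17 for ROCm 5.4.2, 5.19 for ROCm 5.5), not an official compatibility matrix, and both function names are hypothetical. It also checks only the version series, while the comment above stresses that the specific deb kernel package matters as well.

```python
import platform
import re

# Pairings as reported in this thread, NOT an official compatibility matrix.
KERNEL_FOR_ROCM = {
    "5.4.2": (5, 17),
    "5.5": (5, 19),
}

def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '5.17.15-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def kernel_matches(rocm_version, release=None):
    """True if the kernel series equals the one the thread pairs with rocm_version."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    return kernel_major_minor(release) == KERNEL_FOR_ROCM[rocm_version]

print(kernel_matches("5.4.2", "5.17.15"))  # True
print(kernel_matches("5.4.2", "5.19.0"))   # False
```

Calling kernel_matches("5.4.2") with no second argument checks the running kernel. A True result does not guarantee a working setup, since the 5.17.15 / ROCm 5.4.2 combination reported earlier in the thread still produced NaNs.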

Author

InkyZima commented May 9, 2023

Thanks for the info. I'll try to spend some more time testing this weekend, though it might be wise to just wait a few weeks until pytorch+rocm5.5 is out. Related: vladmandic/automatic#741 (reply in thread)

Owner

hydrian commented May 11, 2023

I just updated the rocm5.5 branch. That loads the rocm 5.5 deb packages but still uses the SDW 5.4.2rocm build. I haven't had any issues with the mixed version so far.

You can easily build the image by using the command bash build.sh rocm5.5 and deploy it with the standard docker-compose command.

See how it works for you.

Author

InkyZima commented May 14, 2023

Hi, thanks for the effort. Unfortunately no luck; same NaN error.
As a side note (I'm sure there's a way to do this, I'm just not skilled enough with Docker): when I want to change COMMANDLINE_ARGS (for example, to see if it works with --precision full --no-half), I edit docker-compose.yml (e.g. uncommenting that env variable), and that leads to a re-download of PyTorch (1.5 GB of data) on the next docker-compose up, which is annoying. I think this could be avoided.
Thanks again for the effort.

Owner

hydrian commented May 14, 2023

That is sort of how the SDW application works. I can't really help that; it's inside the application. When you update docker-compose.yml, Docker redeploys the whole container, so SDW can't find the previous download and thus redownloads it.

The other option is to make it part of the container image, which isn't ideal.
