What this does:
This PR resolves all deprecation warnings and hard incompatibilities with current versions of PyTorch-Lightning, and pins `1.7.7` as the requested version in `environment.yaml`.

It switches from the now completely dead and gone `TestTubeLogger` to `TensorBoardLogger`, which is now the Lightning default. If you need to use another logger, you should be able to set it up in the configs. It seems to work more or less the same as `TestTubeLogger` did, insofar as how Textual Inversion was using it.

There are a few other minor code adjustments:
- Updated `.gitignore` to exclude a few things that appeared while using TI and probably shouldn't be committed.

What this does not do:
The reason for all these deprecation warnings is that Lightning has been shifting toward a hardware-agnostic API. Becoming fully hardware agnostic would require a few additional changes here and there, which I'm leaving to someone else to resolve, if it interests them. I don't know whether this will run on accelerators other than GPU (and maybe CPU) in its current state.
Why?
My AMD system is a house of cards when it comes to compute, and it was having difficulty interoperating with Anaconda. Stuck running on the global installation of Python and its abysmal package management, I needed to bring Textual Inversion up to date so it would stop fighting with other Stable Diffusion libraries that were keeping up with their dependencies.
There have been some big updates to the ROCm stack lately, so maybe I can now use Anaconda!? ...but that was after I had already started this journey. Doing dependency upkeep (especially on your core framework) is a good thing anyway, so here's the PR!
Additionally, after updating and using the new strategies and accelerators system, I got a 60% performance boost, because the hard-coded DDP mode was detrimental to my single-GPU setup. PyTorch-Lightning has gotten quite good at auto-detecting the best execution method for compute code, so I left it un-opinionated and updated the readme to demonstrate the `--accelerator gpu` flag, which probably isn't even needed... but since I have not tested accelerators other than GPU, it's the one to put in the demo. If you really want to force the DDP strategy, you can pass `--strategy ddp` to set it up.

What needs special review attention:
I am not set up for anything besides Stable Diffusion, and I'm frankly afraid to jostle this fragile setup by trying to test training for other models. I would appreciate it if someone who is set up to test the `autoencoder` and `latent-diffusion` configs could give this branch a try and make sure no deprecation warnings appear for an epoch or two of training.
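For anyone willing to test, a rough sketch of the kind of run I mean, in the style of the upstream invocation (the config filename here is only an example; substitute whichever autoencoder or latent-diffusion config you normally train with):

```shell
# Example only: pick your usual config file.
# --accelerator gpu is the flag this PR documents in the readme;
# add --strategy ddp only if you want to force the old DDP behavior.
python main.py \
  --base configs/autoencoder/autoencoder_kl_64x64x3.yaml \
  -t \
  --accelerator gpu \
  --gpus 0,
```

Then just watch the console output for deprecation warnings during the first epoch or two.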