-
Describe the task: Can I use GPU for training? I looked at the code and it doesn't seem to have this interface.
Acceptance Criteria:
Priority: High
Related Epic: No response
Estimated Time: No response
Current Status: Not Started
Additional Information: No response
-
Hi. The Engine is based on the Lightning Trainer, so you can pass any Trainer argument to the Engine. Keep in mind that multi-GPU training is not currently supported, but you can use a single GPU just as you would with the Trainer.
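For instance, a minimal sketch of what that looks like, assuming the Engine forwards the usual Lightning Trainer arguments such as accelerator and devices:

# Illustrative sketch: pass Lightning Trainer arguments through the Engine.
# Assumes Engine forwards accelerator/devices to the underlying Trainer.
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Padim

datamodule = MVTec()
model = Padim()

# Single-GPU training; multi-GPU (e.g. devices=[0, 1]) is not supported yet.
engine = Engine(accelerator="gpu", devices=1)
engine.fit(datamodule=datamodule, model=model)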
-
Then you can't pass any Trainer argument to the Engine... devices=[0,1,2,3,4,5,6,7], for example. I've honestly found anomalib very difficult to use.
-
@vmiller987 can you check that you installed torch with the CUDA option? The GPU should be picked up automatically during Anomalib training if you have the correct torch. That said, please note that multi-GPU is currently not supported, but we are working on enabling it in v2. I would love to get your feedback on which parts of anomalib you find difficult to work with.
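As a quick sanity check (plain PyTorch, nothing anomalib-specific), you can verify that your torch build actually sees the GPU:

import torch

# True only if torch was built with CUDA support and a GPU driver is visible
print(torch.cuda.is_available())
# Number of GPUs torch can see
print(torch.cuda.device_count())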
-
@samet-akcay I have the correct torch. I can get it to run on one GPU, but I have to use an environment variable to assign Anomalib to a specific GPU; I can't pass it to the Engine. I am a novice when it comes to unsupervised learning. I am trying to learn, as I mainly have experience with supervised learning. Anomalib doesn't have a good place that explains its models and how they should be used. The notebooks mainly revolve around Padim, it seems. I am looking through the core papers/repos for the other models to try to understand them.
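For context, a minimal sketch of that environment-variable workaround, assuming the goal is to pin training to one specific GPU:

import os

# Restrict the GPUs visible to this process before torch/anomalib touch CUDA.
# Here only physical GPU 1 is exposed, so it appears as cuda:0 inside the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Padim

engine = Engine()
engine.fit(datamodule=MVTec(), model=Padim())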
-
I'm going to retract this part. I was doing something very silly and have fixed it. I'm able to get quite a few of them to run, including GANomaly.
-
This is a known issue. I've created a PR for this, which has not been merged yet. We are also working on a better solution, where you will be able to choose the device ID or train on multiple GPUs.
-
1. Because I wanted to use the GPU, I changed it to the following code snippet: datamodule.setup()
2. Then I reinstalled CUDA and a torch build that supports GPU training.
3. Finally, this appeared.
4. I would like to ask: what would the code look like if I used GPU training correctly? And which torch version is required?
-
Hello, thank you for your reply! I found that I don't know how to use the GPU, and the torch version I downloaded seems to have problems as well. The default torch build only supports CPU training, so I installed other torch versions that support CUDA, but in the end it still doesn't work. How did you solve this problem?
-
How did you install torch? Regarding the version, anomalib has the torch requirement referenced below, and your torch version could also be one of the issues: Line 52 in 6ed0067
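One quick way to tell whether the installed wheel is a CPU-only build (an illustrative check, not anomalib-specific):

import torch

# A CPU-only wheel typically reports a version like "2.1.0+cpu"
print(torch.__version__)
# None on CPU-only builds; a CUDA version string (e.g. "12.1") on CUDA builds
print(torch.version.cuda)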
-
You don't need to specify the GPU as the accelerator, as it is picked up automatically. For example, here is the setup I tried.

Available GPU

❯ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:17:00.0 Off | N/A |
| 31% 38C P8 24W / 350W | 3062MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:65:00.0 Off | N/A |
| 30% 41C P8 18W / 350W | 283MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

Code

# Import the required modules
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore
# Initialize the datamodule, model and engine
datamodule = MVTec()
model = Patchcore()
engine = Engine()
# Train the model
engine.fit(datamodule=datamodule, model=model)

Output

FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
>>> # Look here to see if you have a GPU installed and are using it
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<<<
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/home/sakcay/.pyenv/versions/3.11.8/envs/anomalib/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py:181: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
| Name | Type | Params
------------------------------------------------------------
0 | pre_processor | PreProcessor | 0
1 | post_processor | OneClassPostProcessor | 0
2 | model | PatchcoreModel | 24.9 M
3 | image_metrics | AnomalibMetricCollection | 0
4 | pixel_metrics | AnomalibMetricCollection | 0
------------------------------------------------------------
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
Epoch 0: 0%| | 0/7 [00:00<?, ?it/s]/home/sakcay/.pyenv/versions/3.11.8/envs/anomalib/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py:132: `training_step` returned `None`. If this was on purpose, ignore this warning...
Epoch 0: 100%|██████████████████████████████████████████████| 7/7 [00:01<00:00, 4.84it/s]
Selecting Coreset Indices.: 16%|███ | 2685/16385 [00:03<00:17, 795.60it/s]
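As a side note, if you want to act on the Tensor Cores warning in the output above, a one-line sketch (it trades a little precision for speed, per the PyTorch docs linked in the warning):

import torch

# Allow lower-precision matmul on Tensor Core GPUs such as the RTX 3090
torch.set_float32_matmul_precision("high")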
-
I'm moving this to the Q&A section, as I don't think this is a bug in Anomalib but an installation issue on your end. Feel free to ask your questions there. Thanks.
Hello, thank you for your reply!
I know where the problem lies now. Initially, I installed on a computer with only a CPU, i.e. pip install anomalib. After it ran successfully, I copied the environment to a computer with a GPU, only to find that the GPU could not be used for training. Reinstalling has now resolved the issue.