Pytorch Lightning Model Question #107

Closed
calvinp0 opened this issue Jun 24, 2024 · 6 comments · Fixed by #98

@calvinp0

Hi!

I have built my model using PyTorch Lightning, so it has the functions training_step, validation_step, etc. I attempted to follow the tutorial here: https://torch-uncertainty.github.io/auto_tutorials/tutorial_der_cubic.html#gathering-everything-and-training-the-model

But it fails with NotImplementedError: Module [CMPNNModel] is missing the required "forward" function (which I guess may be obvious). Does this mean that to use this package I will need to change my model from a PyTorch Lightning one to a plain PyTorch one? Or have I done something incorrectly?

Thank you!

@o-laurent o-laurent self-assigned this Jun 24, 2024
@o-laurent
Contributor

Hi @calvinp0!

Thank you for your feedback!

You need to define the forward(self, x: Tensor) -> Tensor method of your Lightning module (as shown in the starter example). A LightningModule is a subclass of nn.Module, and therefore should define forward. Once it does, you will be able to train and test your model with the RegressionRoutine.
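A minimal sketch of why the error appears, using plain PyTorch only. The class names WithoutForward and WithForward are hypothetical stand-ins, not the CMPNNModel from this thread. Calling any nn.Module invokes its forward method, and the base class raises NotImplementedError when a subclass does not define one. Since a LightningModule is also an nn.Module, the same rule should apply to it.

```python
import torch
from torch import nn


class WithoutForward(nn.Module):
    """Hypothetical module that defines layers but no forward method."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 1)


class WithForward(WithoutForward):
    """Same layers, plus the forward method that nn.Module expects."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


x = torch.randn(8, 4)

# Calling the module dispatches to forward; without it the base class raises.
try:
    WithoutForward()(x)
except NotImplementedError as err:
    print(f"NotImplementedError: {err}")

# With forward defined, the call returns one prediction per input row.
print(WithForward()(x).shape)
```

Methods such as training_step and validation_step do not replace forward: a routine that calls the model directly only ever uses forward.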

However, if you use trainer.fit or trainer.test, it will not use your own loops but those of the RegressionRoutine, so it may not work depending on your model. In the general supervised case, we would advise wrapping a simple torch.nn.Module in the RegressionRoutine.

🚧 Our implementation for regression is still unstable but we have made progress in the soon-to-come 0.2.1 version that we will merge in the following days. Reach out and raise issues if you have other questions or concerns. 🚧

If you want to keep your LightningModule:

You will lose the metric computation that comes with the RegressionRoutine, but you can directly use the DERLoss from torch_uncertainty.losses and the NormalInverseGammaLayer from torch_uncertainty.layers.distributions. Either way, you will need the forward(self, x: Tensor) -> Tensor method.

In any case, don't hesitate to give us more details here or contact us through Discord.

(written with @alafage)

@o-laurent o-laurent added the question Further information is requested label Jun 24, 2024
@o-laurent o-laurent reopened this Jun 26, 2024
@calvinp0
Author

calvinp0 commented Jul 8, 2024

Thanks @o-laurent!

In the end, I decided to create a simple torch.nn.Module version of my model. However, I want to clarify: would the package be able to accommodate a custom LightningDataModule, for example:

class DataModule(pl.LightningDataModule):
    def __init__(self, data_dir: str, features_generator: List[str], batch_size: int, num_workers: int, persistent_workers: bool = False):
        super().__init__()
        self.data_dir = data_dir
        self.features_generator = features_generator
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers

    def prepare_data(self):
        """
        This method is called only once and on only one GPU. It's used to perform any data download or preparation steps.
        """
        print("Preparing data...")

    def setup(self, stage: Optional[str] = None):
        """
        Call in SmilesDataset
        Need to consider if splitting via scaffolding or 5 fold etc.
        
        Multiple GPU
        """
        self.data = SmilesDataset(f'{self.data_dir}/delaney-processed.csv', features_generator=self.features_generator)
        if stage == 'fit' or stage is None:
            self.train_data = self.data.get_split('train')
            self.val_data = self.data.get_split('val')
            self.test_data = self.data.get_split('test')
        if stage == 'test':
            self.test_data = self.data.get_split('test')

    def train_dataloader(self):
        return DataLoader(self.train_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

    def val_dataloader(self):
        return DataLoader(self.val_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

    def test_dataloader(self):
        return DataLoader(self.test_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

I ask because I have attempted to follow the tutorial here with my own model and dataset, and I now run this code:

from lightning.pytorch import Trainer
trainer = Trainer(max_epochs=5) #, enable_progress_bar=False)
trainer.fit(model=routine, datamodule=data_module)

and receive this error:

ValueError                                Traceback (most recent call last)
Cell In[27], line 3
      1 from lightning.pytorch import Trainer
      2 trainer = Trainer(max_epochs=5) #, enable_progress_bar=False)
----> 3 trainer.fit(model=routine, datamodule=data_module)

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:543, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    541 self.state.status = TrainerStatus.RUNNING
    542 self.training = True
--> 543 call._call_and_handle_interrupt(
    544     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    545 )

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42     if trainer.strategy.launcher is not None:
     43         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 44     return trainer_fn(*args, **kwargs)
     46 except _TunerExitException:
     47     _call_teardown_hook(trainer)

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:579, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    572 assert self.state.fn is not None
    573 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    574     self.state.fn,
    575     ckpt_path,
    576     model_provided=True,
    577     model_connected=self.lightning_module is not None,
    578 )
--> 579 self._run(model, ckpt_path=ckpt_path)
    581 assert self.state.stopped
    582 self.training = False

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:946, in Trainer._run(self, model, ckpt_path)
    943 self.__setup_profiler()
    945 log.debug(f"{self.__class__.__name__}: preparing data")
--> 946 self._data_connector.prepare_data()
    948 call._call_setup_hook(self)  # allow user to set up LightningModule in accelerator environment
    949 log.debug(f"{self.__class__.__name__}: configuring model")

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:89, in _DataConnector.prepare_data(self)
     87 lightning_module = trainer.lightning_module
     88 # handle datamodule prepare data:
---> 89 if datamodule is not None and is_overridden("prepare_data", datamodule):
     90     prepare_data_per_node = datamodule.prepare_data_per_node
     91     with _InfiniteBarrier():

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/utilities/model_helpers.py:42, in is_overridden(method_name, instance, parent)
     40     if parent is None:
     41         _check_mixed_imports(instance)
---> 42         raise ValueError("Expected a parent")
     44 from lightning_utilities.core.overrides import is_overridden as _is_overridden
     46 return _is_overridden(method_name, instance, parent)

ValueError: Expected a parent

Actually, I discovered that the issue was caused by importing PyTorch Lightning inconsistently, as reported here: Lightning-AI/pytorch-lightning#17485
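A small diagnostic sketch for this kind of mismatch. It assumes the cause is a DataModule subclassing the class from the standalone pytorch_lightning package while the Trainer comes from lightning.pytorch. The traceback above hints at this through _check_mixed_imports, but the thread does not show the import lines. The helper name lightning_roots and the stand-in base class are hypothetical. The function only inspects class metadata, so it runs without either package installed.

```python
def lightning_roots(obj: object) -> set[str]:
    """Return the Lightning package roots found in the class hierarchy of obj."""
    roots = set()
    for cls in type(obj).__mro__:
        module = cls.__module__
        if module == "pytorch_lightning" or module.startswith("pytorch_lightning."):
            roots.add("pytorch_lightning")
        elif module == "lightning.pytorch" or module.startswith("lightning.pytorch."):
            roots.add("lightning.pytorch")
    return roots


# Stand-in for a base class imported from the standalone package
# (hypothetical; a real check would be run on the actual data module).
FakeBase = type(
    "LightningDataModule",
    (),
    {"__module__": "pytorch_lightning.core.datamodule"},
)


class DemoDataModule(FakeBase):
    pass


# The Trainer in this thread is imported from lightning.pytorch, so a
# data module rooted in pytorch_lightning would not be recognised by it.
print(lightning_roots(DemoDataModule()))
```

If the reported root differs from the package the Trainer is imported from, importing LightningDataModule from that same package (here, lightning.pytorch) is the likely fix.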

@calvinp0 calvinp0 closed this as completed Jul 8, 2024
@o-laurent
Contributor

Hi @calvinp0,

Thanks for the details! We could add a comment to advise users to use lightning.pytorch if you find it relevant. As a side note, you could use our slightly modified version of the Trainer called TUTrainer in the utils folder to have improved metric printing. We also plan to improve this side of the library in the following months.

Please don't hesitate to let us know if we can help you in any way.

@calvinp0
Author

Hi @o-laurent, yes, I think it would be great to add that advice for future users.

I will try the TUTrainer, thanks! On that note, and please tell me if I should open another thread in discussions, is there a tutorial or any information on the Monte Carlo Dropout wrapper? https://github.com/ENSTA-U2IS-AI/torch-uncertainty/blob/main/torch_uncertainty/models/wrappers/mc_dropout.py

@o-laurent
Contributor

Hi again @calvinp0,

Thanks, we'll find a place to highlight this when we improve the documentation.

We can create a discussion thread or chat on Discord if you have more specific questions. Otherwise, I've just slightly improved the wrapper, its documentation, and the MC-Dropout tutorial on the dev branch.

NB: Since the modified version of the tutorial is not yet pushed on main, our website's tutorial page remains outdated.
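For readers unfamiliar with the technique behind the wrapper, here is a minimal plain-PyTorch sketch of Monte Carlo Dropout. It is not TorchUncertainty's wrapper and does not reproduce its API. The function name mc_dropout_predict and the parameter num_samples are hypothetical. The idea is to keep the dropout layers stochastic at inference time, run several forward passes, and use the spread of the predictions as an uncertainty estimate.

```python
import torch
from torch import nn


def mc_dropout_predict(
    model: nn.Module, x: torch.Tensor, num_samples: int = 20
) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the mean and standard deviation over stochastic forward passes."""
    # Put everything in eval mode first (e.g. to freeze batch-norm statistics).
    model.eval()
    # Then re-enable only the dropout layers so each pass samples a new mask.
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            module.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.std(dim=0)


# Toy regression network with one dropout layer (sizes chosen arbitrarily).
net = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 1),
)
mean, std = mc_dropout_predict(net, torch.randn(8, 4), num_samples=10)
print(mean.shape, std.shape)
```

The function leaves the dropout layers in training mode after it returns, so callers that want deterministic predictions afterwards should call model.eval() again.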
