Sweeps #22

Merged
merged 20 commits into from
Jul 5, 2022

Conversation

cosmo3769
Owner

@cosmo3769 cosmo3769 commented Jun 4, 2022

Sweeps Code here. This is to resolve issue 23, issue 24, issue 20, issue 26.

@cosmo3769
Owner Author

cosmo3769 commented Jun 4, 2022

@ayulockin I managed to produce some results. Please review the code. Things I am unsure of:

  • I ran the main function, which also runs the train function. In the train function, the config sets epochs to only 1 so that results appear quickly. But when running sweeps, should the number of epochs be more than 1? Does the epoch setting in the train config also depend on the epoch values in sweep_config?
  • Do the sweep results show only at the end of the wandb run? I got one sweep result at the end of the run. Here is the link showing the result: sweep result
  • Why is every sweep I ran showing in a pending state? See: all sweeps
  • I don't have permission to delete the sweeps, and they are increasing in number. Could you check from your side whether you can delete them?
  • The whole process runs to completion with !python sweep_train.py --configs configs/config.py, with no errors at first glance. But when I look closely at the output, I see a peculiar error. I am attaching a screenshot of it below. Because of this error, I doubt whether I am getting the correct sweep result. Have a look at the error (can't find '__main__' module in ''):

[Screenshot: 1]

@cosmo3769 cosmo3769 requested a review from ayulockin June 4, 2022 20:30
@cosmo3769
Owner Author

cosmo3769 commented Jun 4, 2022

  • The whole process runs to completion with !python sweep_train.py --configs configs/config.py, with no errors at first glance. But when I look closely at the output, I see a peculiar error. I am attaching a screenshot of it below. Because of this error, I doubt whether I am getting the correct sweep result. Have a look at the error (can't find '__main__' module in ''):

[Screenshot: 1]

I think there is an error (ERROR - Detected 5 failed runs in a row, shutting down):

[Screenshot: 2]

@cosmo3769
Owner Author

cosmo3769 commented Jun 4, 2022

I wrote the code keeping the main function. I think I will have to try another approach. But I also think there is a slight error in how the main function is used, and that it can be fixed. I am not very sure about this.

@ayulockin
Collaborator

First things first: we will have to remove the code from #21 from this PR.

@ayulockin
Collaborator

I don't have the permission to delete the sweeps. The sweeps are increasing in number. Could you check from your side if you can do so?

You have admin access now, you can delete the runs/sweeps/artifacts.

@cosmo3769
Owner Author

Manual configuring sweep error

[Screenshot: sweep]

@cosmo3769
Owner Author

Will this example work while using FLAGS?

@ayulockin
Collaborator

I don't know actually.

@cosmo3769
Owner Author

cosmo3769 commented Jun 26, 2022

Resolved the wandb.config usage so that sweeps work as described in the documentation and this example:

with wandb.init(config=CONFIG.value.to_dict(), entity="wandb_fc", project="ssl-study"):
    config = wandb.config
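
For context, here is a minimal sketch of how a training function like the one above could be wired to a sweep. It is not the code from this PR: `SWEEP_CONFIG`, `DEFAULTS`, `run_one_epoch`, `launch_sweep`, and the parameter values are hypothetical stand-ins, and `DEFAULTS` takes the place of `CONFIG.value.to_dict()`. Only `wandb.init`, `wandb.config`, `wandb.log`, `wandb.sweep`, and `wandb.agent` are real W&B calls.

```python
# Hypothetical sweep definition; parameter names and values are illustrative only.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "val_eval_loss", "goal": "minimize"},
    "parameters": {
        "epochs": {"values": [5, 10]},
        "learning_rate": {"values": [1e-3, 1e-4]},
    },
}

# Stand-in for CONFIG.value.to_dict() from the main config file (assumption).
DEFAULTS = {"epochs": 3, "learning_rate": 1e-3}


def run_one_epoch(config, epoch):
    """Dummy training step so the sketch has something to log."""
    return 1.0 / (epoch + 1)


def train():
    """Entry point that the agent calls once per trial."""
    import wandb  # imported lazily so this module loads without wandb installed

    with wandb.init(config=DEFAULTS, entity="wandb_fc", project="ssl-study"):
        config = wandb.config  # sweep-chosen values are read from here
        for epoch in range(config.epochs):
            loss = run_one_epoch(config, epoch)
            wandb.log({"epoch": epoch, "val_eval_loss": loss})


def launch_sweep(count=2):
    """Register the sweep and run `count` trials in this process."""
    import wandb

    sweep_id = wandb.sweep(SWEEP_CONFIG, entity="wandb_fc", project="ssl-study")
    wandb.agent(sweep_id, function=train, count=count)
    return sweep_id
```

Calling `launch_sweep()` from a script would start the agent. Passing `train` as a plain function (rather than relying on command-line flags) is one way to let the agent invoke it directly.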

@cosmo3769
Owner Author

Now, the remaining issue is to fix the sweep config file so that it can take the parameter values from the .yaml file or the .py file.

@cosmo3769
Owner Author

cosmo3769 commented Jun 26, 2022

Manual configuring sweep error

Now, the issue to resolve is to fix the sweeps config file so it can take the parameters value from the .yaml file or the .py file.

The most recent commit resolves these issues. wandb.agent now works properly with no errors. I had to remove FLAGS from sweep_train.py to make it work.

With this, I still have some questions:

  • Are the sweeps showing in the W&B portal correct?
  • wandb.agent is taking the parameter values from the sweep_config.yaml file. The main config file has epochs of 3, but wandb.agent is taking an epoch value of 5. So should it run for the 3 epochs given in the main config file config.py, or the 5 epochs given in the sweep_config.yaml file?

@ayulockin
Collaborator

Is the sweeps showing in the w&b portal correct?

Can you share the sweep dashboard?

The main config file has epochs of 3. wandb.agent is taking epoch value of 5

The agent will pick from epoch values assigned in the sweep_config.yaml. In your sweep yaml file you have [5, 10, etc].
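
The precedence described here can be modeled with a plain dictionary merge. This is only a sketch of the behavior as stated in this thread, not W&B's internal implementation; `resolve_config` and the `batch_size` key are hypothetical.

```python
def resolve_config(defaults, sweep_values):
    """Return the effective config: sweep-chosen values win over defaults."""
    merged = dict(defaults)       # start from the main config (e.g. config.py)
    merged.update(sweep_values)   # values picked by the agent take precedence
    return merged


defaults = {"epochs": 3, "batch_size": 64}   # main config file
sweep_values = {"epochs": 5}                 # one value drawn from [5, 10, ...]

resolved = resolve_config(defaults, sweep_values)
print(resolved)  # {'epochs': 5, 'batch_size': 64}
```

Keys that the sweep does not list keep their default values, which is why only `epochs` changes in this example.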

@cosmo3769
Owner Author

cosmo3769 commented Jun 26, 2022

can you share the sweep dashboard?

@ayulockin Here is the sweep dashboard.

@ayulockin
Collaborator

This looks perfect. @cosmo3769

Sabash (well done).

@ayulockin
Collaborator

If you think code refactoring is required, do it.

@cosmo3769
Owner Author

If you think code refactoring is required, do it.

Done.

@cosmo3769
Owner Author

@ayulockin Should we merge this branch into the master branch now, or after fixing the issue where wandb.Table is logged every time?

int(tmp_df.label),
int(np.argmax(evaluation[i], axis = 0))
)

if wandb.run is not None:
    wandb.log({
Owner Author

Should we do something like this to fix the issue of wandb.Table being logged every time?

if wandb.run is not None:
    # Log the validation table only when the config flag asks for it.
    if self.args.train_config.use_validation_table_log:
        wandb.log({
            'val_eval_loss': val_eval_loss,
            'val_top@1': val_top_1_acc,
            'val_top@5': val_top_5_acc,
            'val_table': validation_table
        })
    else:
        wandb.log({
            'val_eval_loss': val_eval_loss,
            'val_top@1': val_top_1_acc,
            'val_top@5': val_top_5_acc,
        })

Owner Author

@cosmo3769 cosmo3769 Jun 27, 2022

Should we do something like this to fix the issue of wandb.Table being logged every time?

Yes, it works.

  • Fixed the issue of wandb.Table being logged every time.

@ayulockin Is there any other way to do it?
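
One possible alternative, offered only as a sketch: build the metrics dictionary once and add the table key conditionally, which avoids repeating the three scalar metrics in both branches. `build_val_log` and its `log_table` parameter are hypothetical names; `log_table` stands in for `self.args.train_config.use_validation_table_log`.

```python
def build_val_log(val_eval_loss, val_top_1_acc, val_top_5_acc,
                  validation_table=None, log_table=False):
    """Assemble the validation payload, attaching the table only on request."""
    payload = {
        "val_eval_loss": val_eval_loss,
        "val_top@1": val_top_1_acc,
        "val_top@5": val_top_5_acc,
    }
    if log_table and validation_table is not None:
        payload["val_table"] = validation_table
    return payload


# Intended call site (assumes wandb is imported and a run is active):
#     if wandb.run is not None:
#         wandb.log(build_val_log(loss, top1, top5, validation_table,
#                                 log_table=use_validation_table_log))
```

The resulting behavior matches the if/else version above: the table is sent only when the flag is set.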

@cosmo3769
Owner Author

cosmo3769 commented Jun 27, 2022

Work done:

  • Sweeps
  • wandb.Table
  • Code Refactored
  • Some augmentations used (we have to create a more robust pipeline)
  • Class Weights
  • LR Scheduling (Not working correctly)
  • README updated

@ayulockin
Collaborator

So in favor of this PR, should we close PR #21, given that the code overlaps and everything in #21 is also present here?

Also LGTM. I will give your code a try and merge it.

@cosmo3769
Owner Author

So in favor of this PR we should close #21 PR?

I think when we merge this PR (#22), PR #21 will get closed too.

@ayulockin
Collaborator

I think the way you are doing sweep is correct. It's not working with train.py as stated. LGTM.

I am merging the PR, and we will fix edge cases if we encounter any (we haven't so far), one PR at a time. :D

@ayulockin ayulockin merged commit bf40a75 into main Jul 5, 2022
@cosmo3769 cosmo3769 deleted the sweeps branch July 5, 2022 14:44
Development

Successfully merging this pull request may close these issues.

Make Sweeps work
2 participants