116 - Update CUDA in Dockerfile and bump requirement versions #117

Merged
samuelkuehnel merged 9 commits into main from 116-update-cuda-and-requirements
Jan 16, 2024

Conversation

MaxJa4
Collaborator

@MaxJa4 MaxJa4 commented Nov 30, 2023

Description

Update CUDA in Dockerfile and bump requirement versions. See issue below for reasons why.

Fixes #116

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Does this PR introduce a breaking change?

No

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes (might be obsolete with CI later on)

@MaxJa4 MaxJa4 added the infrastructure Docker, Project setup, ... label Nov 30, 2023
@MaxJa4 MaxJa4 added this to the Average Driving Score of 0.10 milestone Nov 30, 2023
@MaxJa4 MaxJa4 requested a review from samuelkuehnel November 30, 2023 10:04
@MaxJa4 MaxJa4 self-assigned this Nov 30, 2023
@MaxJa4
Collaborator Author

MaxJa4 commented Nov 30, 2023

@okrusch You can use this branch to check whether #102 is fixed with this. Be aware that this now installs Torch 2.1.1 instead of 1.13.1


@MaxJa4 MaxJa4 mentioned this pull request Dec 2, 2023
7 tasks
@samuelkuehnel
Collaborator

[Screenshot of the error output]
I had an error when cloning the repo and running the command b5 install.

@MaxJa4
Collaborator Author

MaxJa4 commented Dec 8, 2023

> I had an error when cloning the repo and running the command b5 install.

It would be great to see the actual source of the error, which seems to be cut off at the top :)

@samuelkuehnel
Collaborator

Sorry for the bad screenshot, I will add a better one tomorrow.

@samuelkuehnel
Collaborator

This one contains the whole error message:
[Screenshot of the full error message]

@MaxJa4
Collaborator Author

MaxJa4 commented Dec 12, 2023

@samuelkuehnel Thanks. I changed the installation method for CUDA (the new one is also faster and simpler). Maybe that helps. I also removed libgit, as it was a workaround for DVC and things seem to be fine without it now.

@samuelkuehnel
Collaborator

The error seems to be specific to the PC, as the GitHub Action that builds the image completes without an error. I still get an error when running b5 install.
[Screenshot of the error output]

@JuliusMiller also tried to run b5 install after cloning your branch and he sees the same error.


@MaxJa4
Collaborator Author

MaxJa4 commented Dec 12, 2023

> The error seems to be on the PC as the git action that builds the image completes without an error. I still get an error when running b5 install.
>
> [Screenshot of the error output]
>
> @JuliusMiller also tried to run b5 install after cloning your branch and he sees the same error.

Thanks for testing it. Hmm, maybe the lab PCs need an apt upgrade, as they may use an old kernel.
I'll try to find a workaround.
It also works for me on my machine (Ubuntu).

@MaxJa4
Collaborator Author

MaxJa4 commented Dec 12, 2023

Found a possible workaround for the error. Can you or @JuliusMiller try again? Thanks!


@samuelkuehnel
Collaborator

Error sadly still exists 😢
[Screenshot of the error output]


@MaxJa4
Collaborator Author

MaxJa4 commented Dec 12, 2023

Did some more research, as there is no helpful error message anymore.
Turns out that the lab PC graphics cards are too "old" for CUDA 12.x, so I downgraded to 11.8 (which still works with PyTorch 2).
Hope it works now. I won't be able to work on PAF tomorrow, so no hurry from my side :) Thanks!
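
For anyone verifying the downgrade locally, here is a minimal sketch (not part of this PR) that reports which CUDA version the installed PyTorch build targets and whether a GPU is usable. The function name `describe_torch_cuda` is hypothetical; `torch.version.cuda` and `torch.cuda.is_available()` are regular PyTorch attributes. Whether this output pinpoints the lab PC failure is an assumption, since the actual error only appears in the screenshots.

```python
def describe_torch_cuda():
    """Return a small report on the installed PyTorch build and its CUDA support.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,       # CUDA version the PyTorch wheel was built against
        "cuda_available": False,  # True only if driver and GPU are usable at runtime
    }
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its native libraries failed to load.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `cuda_build` shows a CUDA version but `cuda_available` is False inside the container, the mismatch is likely between the image's CUDA version and what the host driver or GPU supports, which would be consistent with the downgrade described above.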


@samuelkuehnel
Collaborator

I tried the b5 install command again and it still fails 😢
[Screenshot of the error output]
I am in the laboratory today until 4:30 pm if you want me to test other solutions.

Collaborator

@samuelkuehnel samuelkuehnel left a comment

The update works. The image is built and the container starts as expected.

@samuelkuehnel samuelkuehnel merged commit cef2a50 into main Jan 16, 2024
2 of 3 checks passed
@MaxJa4 MaxJa4 deleted the 116-update-cuda-and-requirements branch January 16, 2024 16:25
Development

Successfully merging this pull request may close these issues.

[Feature]: Update CUDA and Requirements to newest supported version
2 participants