
GPU Power required to run GeoTrackNet #20

Open
mikoloay opened this issue May 14, 2022 · 2 comments
Comments

@mikoloay

Hi! I'm working on anomaly detection on AIS data and I'm trying to replicate the results from the GeoTrackNet article.
While training the VRNN on my local machine I get stuck because of an Out of Memory error. My GPU is a laptop RTX 3060 with 6144 MiB of memory.

  1. On what machine did you run the model? What was the GPU?
  2. What part of the code should I change in order to lower the batch size?
@samsy44

samsy44 commented Jul 25, 2022

I added the following code just before the call to "tf.train.MonitoredTrainingSession" on line 319 of runners.py.

gpu_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5),  # cap this process at 50% of GPU memory
    allow_soft_placement=True)                                       # let TF fall back to another device if needed
gpu_config.gpu_options.allow_growth = True                           # allocate memory on demand instead of all upfront
gpu_config.gpu_options.polling_inactive_delay_msecs = 10

Then simply add it as a config parameter as shown below.

with tf.train.MonitoredTrainingSession(master=config.master,
                                       is_chief=config.task == 0,
                                       config=gpu_config,  # NEW PARAMETER ADDED
                                       hooks=[log_hook],
                                       checkpoint_dir=config.logdir,
                                       save_checkpoint_secs=120,
                                       save_summaries_steps=config.summarize_every,
                                       log_step_count_steps=config.summarize_every) as sess:

This caps how much GPU memory the process can use while training. Run your training after this change while keeping an eye on your task manager (GPU usage) to see whether the process still peaks or not.
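As for your second question, about lowering the batch size: I have not checked this against the current repo, but in FIVO-style codebases like this one the batch size is normally exposed as a TensorFlow flag rather than hard-coded. Here is a minimal sketch, assuming the flag is named batch_size (the flag name and default below are my assumption, not verified against runners.py):

# Hypothetical flag definition (TF 1.x style), as it would appear near the other flags:
tf.app.flags.DEFINE_integer("batch_size", 50,
                            "Number of AIS trajectories per training batch.")

# If so, you can lower it from the command line without editing the model code, e.g.:
#   python runners.py --mode=train --batch_size=8 [other flags]

If the value really is hard-coded, search runners.py for where the dataset is batched and reduce the number there.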

@samsy44

samsy44 commented Jul 25, 2022

Let me know if it works for you after these fixes, because I am experiencing similar issues on an RTX 3090 with CUDA 11.7.
Training still crashes even after fixing my "Out of memory" errors. :/
