
MemoryError when training DistributedRL after about an hour #104

Open
JazzTao opened this issue Apr 15, 2019 · 1 comment

Comments

JazzTao commented Apr 15, 2019


Problem description

When I train DistributedRL:
https://github.com/Microsoft/AutonomousDrivingCookbook/blob/master/DistributedRL/LaunchLocalTrainingJob.ipynb

it works at first, but after about one hour I get the error below.
(PS: To get it to run the first time I ran train.bat, I had changed "threshold=np.nan" to "threshold=sys.maxsize" on line 609 of distributed_agent.py. I don't know if that matters; a sketch of the change follows.)
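
For reference, here is roughly what I changed; I'm assuming line 609 is the usual NumPy print-options call (that is where a "threshold" keyword normally appears), so this is a sketch rather than the exact repository code:

```python
import sys
import numpy as np

# Original line (newer NumPy rejects threshold=np.nan with
# "threshold must be numeric and non-NAN"):
# np.set_printoptions(threshold=np.nan)

# My replacement so train.bat would start:
np.set_printoptions(threshold=sys.maxsize)
```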

My English is not very good, so I hope I have explained this clearly.

Problem details

Start time: 2019-04-15 07:23:33.036246, end time: 2019-04-15 07:23:45.755073
Percent random actions: 0.10204081632653061
Num total actions: 98
Generating 98 minibatches...
Sampling Experiences.
Publishing AirSim Epoch.
Publishing epoch data and getting latest model from parameter server...
Traceback (most recent call last):
  File "distributed_agent.py", line 643, in <module>
    agent.start()
  File "distributed_agent.py", line 80, in start
    self.__run_function()
  File "distributed_agent.py", line 175, in __run_function
    self.__publish_batch_and_update_model(sampled_experiences, frame_count)
  File "distributed_agent.py", line 401, in __publish_batch_and_update_model
    gradients = self.__model.get_gradient_update_from_batches(batches)
  File "E:\File\Train_Airsim\AD_Cookbook_AirSim\python36_DRL\Share\scripts_downpour\app\rl_model.py", line 135, in get_gradient_update_from_batches
    post_states = np.array(batches['post_states'])
MemoryError

Experiment/Environment details

  • Tutorial used: DistributedRL
  • Environment used: neighborhood
  • Versions of artifacts used (if applicable): TensorFlow 1.13.1, Python 3.6.2, Keras 2.1.2, NumPy 1.16.2
  • State of my hard disk: C: 9.48 GB available, E: 25.4 GB available (DistributedRL's workspace)
  • My computer: GPU GTX 960M (4 GB), RAM 8 GB, CPU i5-6300HQ
@Zhenlin-Xu

What was your solution? I ran into the same issue on the newest version. Instead of changing "threshold=np.nan" to "threshold=sys.maxsize", I changed it to "threshold=np.inf", and the script runs without that error (see the sketch below).

Thank you!
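
For reference, this is the variant I used; again I'm assuming the call at that line is NumPy's print-options setup, so treat it as a sketch rather than the exact repository code:

```python
import numpy as np

# np.inf is numeric and not NaN, so it passes the validation that
# newer NumPy applies to the print threshold.
np.set_printoptions(threshold=np.inf)
```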
