[Request] Please provide run instructions #2

Open
Nostrademous opened this issue Jan 6, 2019 · 7 comments
Comments

@Nostrademous
Collaborator

So that we can collaborate: since I'm not very familiar with K8s provisioning, configuration, and the steps needed to deploy a workload on it, would you be kind enough to provide a README, or at least links to instructions you found useful, so that I can set up my own cluster, credentials, account, etc.?

I would like to be able to throw as much money as I want at the problem and replicate your current setup and work (I have corporate time and money to throw at the problem coming up soon). Once I'm up and running there is a lot more value I should be able to bring.

@Nostrademous
Collaborator Author

Nostrademous commented Jan 8, 2019

Things I had to do so far... (although I'm still not up and running locally with latest restructure).

  1. Install rabbitmq
    brew install rabbitmq

  2. Enable recent_history_exchange
    /usr/local/opt/rabbitmq/sbin/rabbitmq-plugins enable rabbitmq_recent_history_exchange

  3. Install some more python modules
    pprint
    aioamqp
    pika

  4. Comment out some GCP and Cloud Storage code (blob-related) since I don't have a properly configured account yet.
AGMP:dotaclient andrzej.gorski$ git diff
diff --git a/optimizer.py b/optimizer.py
index 08c21c7..ba3fa7b 100644
--- a/optimizer.py
+++ b/optimizer.py
@@ -67,8 +67,8 @@ class DotaOptimizer:
             # TODO(tzaman): Set logdir ourselves?
             self.writer = SummaryWriter()
             logger.info('Checkpointing to: {}'.format(self.log_dir))
-            client = storage.Client()
-            self.bucket = client.get_bucket(self.BUCKET_NAME)
+            #client = storage.Client()
+            #self.bucket = client.get_bucket(self.BUCKET_NAME)
 
             if pretrained_model is not None:
                 logger.info('Downloading: {}'.format(pretrained_model))
@@ -257,8 +257,8 @@ class DotaOptimizer:
                 self.writer.add_scalar(name, metric, self.episode)
 
             # Upload events to GCS
-            blob = self.bucket.blob(self.events_filename)
-            blob.upload_from_filename(filename=self.events_filename)
+            #blob = self.bucket.blob(self.events_filename)
+            #blob.upload_from_filename(filename=self.events_filename)
 
             self.upload_model()
 
@@ -305,8 +305,8 @@ class DotaOptimizer:
             logger.exception('Failed pushing latest weights to RMQ')
 
         # Upload to GCP.
-        blob = self.bucket.blob(rel_path)
-        blob.upload_from_string(data=state_dict_b)  # Model
+        #blob = self.bucket.blob(rel_path)
+        #blob.upload_from_string(data=state_dict_b)  # Model
  5. Currently you have to run three processes: python3.7 optimizer.py, python3.7 agent.py, and python3.7 -m dotaservice
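
The three processes from the last step can be started from a single script. This is a minimal sketch, assuming RabbitMQ is already running and the script is launched from the dotaclient checkout. The start order (dotaservice, then optimizer, then agent) and the helper names `launch_all` and `wait_all` are assumptions for illustration, not something the repository prescribes.

```python
import subprocess

# Commands taken from the step above; the ordering is an assumption.
COMMANDS = [
    ["python3.7", "-m", "dotaservice"],
    ["python3.7", "optimizer.py"],
    ["python3.7", "agent.py"],
]


def launch_all(commands=COMMANDS, popen=subprocess.Popen):
    """Start every command as a child process and return the handles."""
    return [popen(cmd) for cmd in commands]


def wait_all(procs):
    """Block until every child exits and return the exit codes in order."""
    return [proc.wait() for proc in procs]
```

Calling `wait_all(launch_all())` starts all three and blocks until they exit. The `popen` parameter exists only so the launcher can be exercised without spawning real processes.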

Currently things run but it doesn't seem to be doing anything (all rewards are 0) ... not sure what I'm missing yet.

2019-01-08 09:19:17,422 INFO     === Starting Episode 0.
2019-01-08 09:19:17,423 INFO     Starting game.
2019-01-08 09:19:17,429 INFO     Received new model: version=0, size=1207690b
2019-01-08 09:19:17,432 INFO     Updated weights to version 0
2019-01-08 09:19:47,400 INFO     Player 0 rollout.
2019-01-08 09:19:47,401 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:19:47,528 INFO     Player 5 rollout.
2019-01-08 09:19:47,529 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:00,962 INFO     Player 0 rollout.
2019-01-08 09:20:00,963 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:01,039 INFO     Player 5 rollout.
2019-01-08 09:20:01,040 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:14,576 INFO     Player 0 rollout.
2019-01-08 09:20:14,577 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:14,651 INFO     Player 5 rollout.
2019-01-08 09:20:14,652 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:29,547 INFO     Player 0 rollout.
2019-01-08 09:20:29,548 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:29,619 INFO     Player 5 rollout.
2019-01-08 09:20:29,620 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:42,469 INFO     Received new model: version=1, size=1207690b
2019-01-08 09:20:42,473 INFO     Updated weights to version 1
2019-01-08 09:20:44,718 INFO     Player 0 rollout.
2019-01-08 09:20:44,719 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:44,802 INFO     Player 5 rollout.
2019-01-08 09:20:44,803 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,055 INFO     Player 0 rollout.
2019-01-08 09:20:50,056 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,088 INFO     Player 5 rollout.
2019-01-08 09:20:50,089 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,122 INFO     Game finished.
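
To confirm at a glance whether every reward really is zero, the agent log can be summed per player. This is a rough sketch that assumes the log keeps the format shown above (a "reward sum" line followed by a dict of subrewards on the next line); `summarize_rewards` and `all_zero` are hypothetical helpers, not part of dotaclient.

```python
import ast
import re
from collections import defaultdict

# Matches e.g. "Player 0 reward sum: 0.00 subrewards:"
REWARD_RE = re.compile(r"Player (\d+) reward sum: (-?\d+(?:\.\d+)?)")

# Two rollouts copied from the log above.
SAMPLE_LOG = """\
2019-01-08 09:19:47,401 INFO     Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:19:47,529 INFO     Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
"""


def summarize_rewards(log_text):
    """Return (total reward per player, summed subrewards per player)."""
    totals = defaultdict(float)
    subtotals = defaultdict(lambda: defaultdict(float))
    player = None  # player whose subreward dict is expected on the next line
    for line in log_text.splitlines():
        match = REWARD_RE.search(line)
        if match:
            player = int(match.group(1))
            totals[player] += float(match.group(2))
            continue
        if player is not None and line.lstrip().startswith("{"):
            for name, value in ast.literal_eval(line.strip()).items():
                subtotals[player][name] += value
            player = None
    return dict(totals), {p: dict(s) for p, s in subtotals.items()}


def all_zero(totals):
    """True when no player has accumulated any reward."""
    return all(value == 0 for value in totals.values())
```

Running `summarize_rewards` over a longer log and checking `all_zero` on the totals shows whether the agent has started collecting any reward yet.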

@Nostrademous
Collaborator Author

Well, it looks like it is working. We just don't have any "driver" to make it go to Location(0,0,0), so initially it wanders around aimlessly.

I do see it moving around in GUI mode.

@Nostrademous
Collaborator Author

@TimZaman - thanks for the continued updates to README.md. They definitely help, although for someone who hasn't created accounts or used Google Cloud Platform, many things are still very unclear.

So far I have installed:

  1. ksonnet : brew install ksonnet/tap/ks
  2. kubernetes : brew install kubernetes-cli

Regarding GKE - Seems like I need to enable a bunch of GCP APIs before I can do anything. Browsing around the internet I finally got to:
https://www.kubeflow.org/docs/started/getting-started-gke/

That lists the following APIs as needed:

  • Compute Engine
  • GKE
  • Identity and Access Management (IAM)
  • Deployment Manager
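
The four APIs above could be enabled in one go with `gcloud services enable`. This sketch only builds and runs that command; the mapping from the names in the list to service IDs is my assumption and worth checking against the GCP console, and the project ID is whatever your own project is called.

```python
import subprocess

# Assumed mapping from the API names listed above to gcloud service IDs.
SERVICES = {
    "Compute Engine": "compute.googleapis.com",
    "GKE": "container.googleapis.com",
    "Identity and Access Management (IAM)": "iam.googleapis.com",
    "Deployment Manager": "deploymentmanager.googleapis.com",
}


def enable_command(project_id):
    """Build the gcloud invocation that enables every service at once."""
    return ["gcloud", "services", "enable", *SERVICES.values(),
            "--project", project_id]


def enable_apis(project_id, run=subprocess.run):
    """Run the command; raises CalledProcessError if gcloud fails."""
    return run(enable_command(project_id), check=True)
```

`enable_apis("my-project")` (a placeholder project ID) requires the gcloud CLI to be installed and authenticated.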

I can work on getting those enabled next. What "realistic" monthly pricing should I expect here?

@TimZaman
Owner

TimZaman commented Jan 17, 2019

On GCP you get $300 for free, which is equal to around 40 cores for a month (20 agents). But you don't really need it. You can just install k8s locally (minikube). I do currently use Google Cloud Storage (GCS, part of GCP) to save and resume the model/tensorboard. I find that pretty handy. GCS itself is super cheap (so you can use the $300 towards that goal). Alternatively, we could make GCS optional; feel free to add support for that.
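
As a back-of-the-envelope answer to the pricing question, the figures in this comment ($300 for roughly 40 cores and 20 agents over a month) can be turned into a small estimator. This only rescales those three numbers; real GCP pricing depends on machine type and region, and the function names are made up for illustration.

```python
# Figures quoted in the comment above.
FREE_CREDIT_USD = 300.0
CORES_PER_MONTH = 40
AGENTS_PER_MONTH = 20


def usd_per_core_month():
    """Implied cost of one core for one month."""
    return FREE_CREDIT_USD / CORES_PER_MONTH


def monthly_cost_usd(agents):
    """Scale the quoted figures linearly to a given number of agents."""
    cores = agents * CORES_PER_MONTH / AGENTS_PER_MONTH
    return cores * usd_per_core_month()
```

Under these assumptions the implied rate is $7.50 per core per month, and 20 agents use up exactly the $300 credit.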
I also have K8s running on a Raspberry Pi cluster of 4 machines. However, those are ARM chips so they cannot run Dota. You can also set up your own k8s cluster with a few old machines. Dota needs around 1 core per agent. I was running 8 agents on my Mac Pro (6 cores) while using a total of 80% of all CPU.
Alternatively, you can run all agents in multiple Docker containers with port forwarding to your local machine, where RMQ and the optimizer are running.
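
One way GCS could be made optional, instead of commenting out the blob code, is to fall back to "no bucket" when the client cannot be created and to skip uploads in that case. This is a sketch of the idea only, not the change that went into optimizer.py. The helpers `get_bucket_or_none` and `upload_file_if_enabled` are hypothetical; the `storage.Client()`, `get_bucket`, `blob`, and `upload_from_filename` calls are the ones already used in the diff earlier in this thread.

```python
import logging

logger = logging.getLogger(__name__)


def get_bucket_or_none(bucket_name):
    """Return a GCS bucket, or None if GCS is not installed or configured."""
    try:
        from google.cloud import storage  # optional dependency
        client = storage.Client()
        return client.get_bucket(bucket_name)
    except Exception:  # missing package, missing credentials, no such bucket
        logger.warning("GCS unavailable; checkpoints stay local only")
        return None


def upload_file_if_enabled(bucket, blob_name, filename):
    """Upload a local file when a bucket is available; report whether it ran."""
    if bucket is None:
        return False
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(filename=filename)
    return True
```

With this shape, the optimizer would keep a bucket that may be `None` and route each upload through the guard, so a machine without a GCP account still runs end to end.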

@TimZaman
Owner

Oh btw, nice that you got it working! Yeah, what you posted is exactly how it should work! Nice! It will take a few hours for it to go towards mid, but then it will accelerate because of the XP, then the last hits, etc.

@TimZaman
Owner

Here is a model with the latest dota trained last night on top-of-tree (0.3.4) [your code!]
exp1_job2_model_000001576.pt.zip

@billypoggi

brew install gcc (prereq for rabbitmq)
xcode-select --install (because running on Mac requires Xcode)
