Baseline Comparison? #26
In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong. |
Hi @vincentlooi
I use the iOS app at https://itunes.apple.com/ca/app/id574915961 as the benchmark.
Yes.
I didn't know about GRhino. |
Hi @evalon32
The app has levels 1~99.
Could you tell me, what is RAZ? (I couldn't find it on Google...)
I also think that is a good feature. |
Oh sorry, RAZ = reversi-alpha-zero :) |
Oh, I see! (^^ |
FYI:
|
I just had the newest model play a match of 10 games vs grhino L2 (took forever, since I don't have a GPU). |
That's good!
FYI: |
I managed to make some progress in training the model. I played the model against grhino lv2 5 times: 4 wins, 1 loss. Still lost vs grhino lv3 though. I also played the model against the newest/best model in your download script, and had a win rate of ~85% over roughly 25 games.

I managed to train this model over the course of a week from scratch (on a 1080 GPU), by constantly and manually removing old data (data older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

The current training method in your script trains on all data in the folder regardless of when the data was created, which means training per epoch iteration will always become longer as self-play generates more and more data. I'm not sure this is necessary, since old data reflects an older policy and not necessarily the newest one, and hence could be redundant at the cost of more training steps and potential overfitting.

Perhaps it would be a good idea to weight the data based on how recently it was played, i.e. how much it reflects the latest policy, or to turn the data into a fixed-size buffer (perhaps 250k-300k samples) that discards old data as new data is generated.

EDIT: |
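A fixed-size buffer along those lines could look roughly like the sketch below (illustrative only: the class name, the 300k-sample cap, and the (state, policy, value) sample format are assumptions, not the repo's actual API):

```python
from collections import deque

# Minimal sketch of a fixed-size self-play buffer (illustrative only).
# Newly generated samples push out the oldest ones, so training mostly
# sees data produced by recent versions of the policy.
class SelfPlayBuffer:
    def __init__(self, max_samples=300_000):
        # deque with maxlen silently discards the oldest entries
        self.samples = deque(maxlen=max_samples)

    def add_game(self, game_samples):
        # game_samples: list of (state, policy, value) tuples from one self-play game
        self.samples.extend(game_samples)

    def all_samples(self):
        return list(self.samples)
```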
@vincentlooi Thank you for sharing this exciting information!
That's great!!
Nice experiment! I will change the parameter in my training. |
What is the best reversi game? |
I use GRhino via Docker on a Mac. |
@mokemokechicken @vincentlooi @evalon32 When playing with GRhino, besides the "level" setting, what is your "open book variation" setting? I am playing Ubuntu GRhino against my model, and want to do an (indirect) comparison with yours. Thanks. |
My model (black) beats GRhino lv5 with open book variation "Low" and randomness 0 now. |
@gooooloo My open book variation is "Low". |
@mokemokechicken gotcha. Thanks. |
I see "Online Reversi" on the Microsoft Store is very excellent. |
Hi everyone, I found that http://www.orbanova.com/nboard/ is very strong. It also supports many levels to play at, so it would be a good baseline to compare against. |
@gooooloo it's great! Thank you very much! |
I implemented the NBoard protocol. |
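For readers unfamiliar with it, the NBoard protocol is a line-based text protocol spoken over stdin/stdout between the GUI and the engine. A minimal engine loop might look something like the sketch below; the command names and the `===` move-response format are recalled from the NBoard/NTest documentation and should be double-checked, and `pick_move` is a hypothetical hook into the model:

```python
import sys

def pick_move(ggf_game):
    # Hypothetical hook: run MCTS on the position described by the GGF record
    # and return a move in coordinate form such as "F5".
    return "F5"

def nboard_loop():
    game = ""
    for line in sys.stdin:
        line = line.strip()
        if line.startswith("nboard"):
            # Handshake: announce the engine's name to the GUI.
            print("set myname reversi-alpha-zero", flush=True)
        elif line.startswith("ping"):
            # The GUI expects the same token echoed back as "pong".
            print("pong " + line.split()[1], flush=True)
        elif line.startswith("set game"):
            # Full game record (GGF); a real engine rebuilds its board from it.
            game = line[len("set game "):]
        elif line == "go":
            # Reply with the engine's chosen move.
            print("=== " + pick_move(game), flush=True)

if __name__ == "__main__":
    nboard_loop()
```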
@mokemokechicken Just a report: my model beats Lv99 using an 800 simulations per move setting. See https://play.lobi.co/video/17f52b6e921be174057239d39d239b6061d3c1c9. The AlphaGoZero method works.

I am also using 800 simulations per move during self-play. I keep the evaluator alive, with the best-model replacing condition: ELO rating >= 150 over 400 games (draw games are counted in the ELO rating). I am using 2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8.

Besides, when playing against the App, I found that a setting of 40 or 100 simulations per move is already quite strong. The 100-sims setting beats Lv98 easily. But Lv99 is more difficult than Lv98: I tested 40/100/400 sims and all of them lost, until I changed to 800 sims. |
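As a side note, an ELO-based replacement condition like the one in this report can be estimated from evaluation results with the standard logistic ELO formula, counting draws as half a point. The sketch below uses the 150-point / 400-game numbers from the comment; the function name and the example results are assumptions:

```python
import math

def elo_diff(wins, draws, losses):
    # Estimate the ELO gain of the candidate over the current best model
    # from an evaluation match, counting each draw as half a win.
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Clamp perfect/zero scores, where the inverse formula diverges.
    score = min(max(score, 1e-6), 1 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

# Example: promote the candidate when the estimated gain is at least 150 ELO
# over a 400-game evaluation match.
if elo_diff(wins=280, draws=20, losses=100) >= 150:
    print("promote candidate to best model")
```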
Great! Congratulations!! I am surprised by this report!
After all, in order to become strong, it may be necessary to use a large "simulations per move" in self-play, right?
It is very interesting. |
@mokemokechicken it is partly because of your great implementation. So thank you :)
I also think so. At first, I was using 100 sims per move because I wanted fast self-play. After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving. Then I changed to 800 sims, and at about 200k steps it had become quite strong. My final model, the one beating Lv99, is at 300k+ steps.

I think it is also worth mentioning that although I changed to 800 sims, I didn't make the overall self-play too much slower. I did this by separating MCTS and the Neural Network into different processes; they communicate via named pipes. Then I can run several MCTS processes and only 1 Neural Network process at the same time. This idea is borrowed from this repo (thanks @Akababa). By doing this, I make full use of both GPU and CPU. Although a single game gets slower due to 800 sims, multi-game parallelization saves back a lot. I am mentioning this because I think self-play speed does matter in the AlphaGoZero method.
That is partly because I happened to see this reddit post from David Silver @ DeepMind. This is the quote:
I have used this implementation from the beginning and didn't test the 3 * 8 * 8 shape, so I don't have the experience to say. But I believe it is possible to introduce a kind of "attention" signal (by subtracting the previous board). Maybe it helps. Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) + 1 Tesla M40 (for the evaluator). Maybe it is mostly because of the computational power... |
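Separating MCTS from a single neural-network process, as described a couple of comments above, could be sketched roughly as follows (illustrative only: it uses multiprocessing.Pipe rather than OS named pipes for brevity, a dummy model stands in for the real network, and the worker loop is a placeholder):

```python
import multiprocessing as mp
from multiprocessing.connection import wait
import numpy as np

class DummyModel:
    # Stand-in for the real policy/value network (illustrative only).
    def predict(self, batch):
        n = batch.shape[0]
        return np.full((n, 64), 1.0 / 64), np.zeros((n, 1))

def prediction_server(conns):
    # One process owns the (GPU) model and batches whatever evaluation
    # requests are pending from the self-play workers at each step.
    model = DummyModel()
    while True:
        ready = wait(conns)                          # block until some worker sent a state
        states = [c.recv() for c in ready]
        policies, values = model.predict(np.stack(states))
        for c, p, v in zip(ready, policies, values):
            c.send((p, v))

def self_play_worker(conn, n_evals=10):
    # Each worker runs MCTS; every leaf evaluation is delegated to the server.
    for _ in range(n_evals):
        leaf_state = np.zeros((5, 8, 8), dtype=np.float32)   # placeholder encoding
        conn.send(leaf_state)
        policy, value = conn.recv()
        # ...expand the MCTS node with (policy, value) and continue the search...

if __name__ == "__main__":
    pipes = [mp.Pipe() for _ in range(4)]            # 4 parallel self-play workers
    mp.Process(target=prediction_server,
               args=([srv for srv, _ in pipes],), daemon=True).start()
    workers = [mp.Process(target=self_play_worker, args=(wk,)) for _, wk in pipes]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```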
Thank you for your reply.
Great. I think it is the best implementation.
I see.
That's very powerful!! :) |
@gooooloo Um... Is history really useful? |
@apollo-time do you mean the first step of the game? As the AlphaGoZero paper mentions, all-zero boards are used if there are not enough history boards.
"t < 0" is the case here. |
@gooooloo No, I mean that some games can be played from a board state that is not the initial state, for example Chess puzzles. |
@gooooloo can you beat the Windows Online Reversi game level 5? |
I see. I don't consider that case.
I don't have a Windows system (I will try to find one). But I can't beat NBoard's Novello level 20 (I can beat level 10 though, with 1600 sims per move), nor NTest level 30. |
@gooooloo thanks, my question is the same as Cassandra120's. |
I just played with it, using the same model and simulations_per_move (which is 800) as in the Lv99 game, and I beat Online Reversi level 5 (2:0) and lose to level 6 (1:3). |
@gooooloo My model (simulations_per_move=800) beats Online Reversi level 4 now, and my model doesn't use history. |
@apollo-time I got another new-generation model the day before yesterday, but haven't gotten any better model in the last two days. Let's wait some more days and see. |
That's also the case in AlphaZero. The performance more or less stagnated after that point. But they achieved an already strong performance (with a different board game) at 100k iters not only due to using 800 sims/move but also due to their large architecture and large buffer. Also, they did one update iteration for every 30 or so games (3M games after 100k iters), which may not be the case in the implementations of @mokemokechicken, Zeta36 and Akababa. How about your case? Did you use the "normal" setting instead of "mini" in the config? |
my config (the network architecture is the same as @mokemokechicken's original implementation):
mine:
I also changed the sampling method. I did this because I found that in my case (much more play data), @mokemokechicken's original implementation takes too long waiting for all loaded data to be trained on at least once before new play data gets loaded and before a new candidate model gets generated.
So basically, I am using the "normal" config, but changed a lot of things.
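The changed sampling might, for example, look like the sketch below: each training step draws a random mini-batch from a window of the most recent samples instead of sweeping the whole loaded dataset once before reloading. The window size, function name, and Keras-style train_on_batch call are illustrative assumptions, not the actual code:

```python
import numpy as np

def train_random_batches(model, states, policies, values,
                         steps, batch_size=3072, window=500_000):
    # Keep only the most recent `window` samples, then train for a fixed
    # number of steps on randomly sampled mini-batches.
    states, policies, values = states[-window:], policies[-window:], values[-window:]
    for _ in range(steps):
        idx = np.random.randint(0, len(states), size=batch_size)
        model.train_on_batch(states[idx], [policies[idx], values[idx]])
```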
|
@gooooloo Thanks so much for the detailed information. It looks like you don't have self.search_threads for multi-threading. Did you find multiprocessing alone to be sufficient? It's impressive that your sampling method enabled you to finish 200k iters with your large architecture. It looks like Akababa's multiprocessing is very powerful. But I failed to see how many self-play games you finished up until 100~200k iters. Have you tracked the number of games? |
@gooooloo @apollo-time @evalon32 @vincentlooi @AranKomat I created Performance Reports for sharing our achievements, and linked it from the top of the README. |
No I have not. I wish I had. |
Hi everyone, my code for getting the model is here: https://github.com/gooooloo/reversi-alpha-zero, if you are interested. |
Is there a baseline for comparing the learned model, e.g. benchmark software to evaluate against? It would be useful for us to know how effective the learning algorithm actually is.
For example, what do you mean by "Won the App LV x"? Does it mean that if the model beats the app even once, it counts as a win even if it loses the other times?
I downloaded your "best model" and "newest model", and played both networks against the grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful. |