Baseline Comparison? #26
In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong. |
Hi @vincentlooi
I use the iOS app at https://itunes.apple.com/ca/app/id574915961 as the benchmark.
Yes.
I didn't know about GRhino. |
Hi @evalon32
The app has levels 1~99.
Could you tell me, what is RAZ? (I couldn't find it on Google...)
I also think that is a good feature. |
Oh sorry, RAZ = reversi-alpha-zero :) |
Oh, I see! (^^ |
FYI:
|
I just had the newest model play a match of 10 games vs grhino L2 (took forever, since I don't have a GPU). |
That's good!
FYI: |
I managed to make some progress in training the model. I played the model against grhino lv2 5 times: 4 wins, 1 loss. Still lost vs grhino lv3 though. I also played the model against the newest/best model in your download script, and had a win rate of ~85% over roughly 25 games.

I managed to train this model over the course of a week from scratch (on a 1080 GPU), by constantly and manually removing old data (data older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

The current training method in your script trains on all data in the folder regardless of when the data was created, which means training per epoch iteration will always become longer as self-play generates more and more data. I'm not sure this is necessary, since old data reflects an older policy and not necessarily the newest one, and hence could be redundant at the cost of more training steps and potential overfitting.

Perhaps it would be a good idea to weight the data based on how recently it was played, i.e. how much it reflects the latest policy, or to turn the data into a fixed-size buffer (perhaps 250k-300k samples) that discards old data as new data is generated.

EDIT: |
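A fixed-size buffer along those lines could look roughly like the sketch below (illustrative only: the class name, the 300k-sample cap, and the (state, policy, value) sample format are assumptions, not the repo's actual API):

```python
from collections import deque

# Minimal sketch of a fixed-size self-play buffer (illustrative only).
# Newly generated samples push out the oldest ones, so training mostly
# sees data produced by recent versions of the policy.
class SelfPlayBuffer:
    def __init__(self, max_samples=300_000):
        # deque with maxlen silently discards the oldest entries
        self.samples = deque(maxlen=max_samples)

    def add_game(self, game_samples):
        # game_samples: list of (state, policy, value) tuples from one self-play game
        self.samples.extend(game_samples)

    def all_samples(self):
        return list(self.samples)
```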
@vincentlooi Thank you for sharing this exciting information!
That's great!!
Nice experiment! I will change the parameter in my training. |
What is the best reversi game? |
I use GRhino via Docker on a Mac. |
@mokemokechicken @vincentlooi @evalon32 When playing with GRhino, besides the "level" setting, what is your "open book variation" setting? I am playing Ubuntu GRhino against my model, and want to do an (indirect) comparison with yours. Thanks. |
My model (black) beats GRhino lv5 with open book variation "Low" and randomness 0 now. |
@gooooloo My open book variation is "Low". |
@mokemokechicken gotcha. Thanks. |
I see "Online Reversi" on the Microsoft Store is very excellent. |
Hi everyone, I found that http://www.orbanova.com/nboard/ is very strong. It also supports many levels to play at, so it would be a good baseline to compare against. |
@gooooloo it's great! Thank you very much! |
I implemented the NBoard protocol. |
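For readers unfamiliar with it, the NBoard protocol is a line-based text protocol spoken over stdin/stdout between the GUI and the engine. A minimal engine loop might look something like the sketch below; the command names and the `===` move-response format are recalled from the NBoard/NTest documentation and should be double-checked, and `pick_move` is a hypothetical hook into the model:

```python
import sys

def pick_move(ggf_game):
    # Hypothetical hook: run MCTS on the position described by the GGF record
    # and return a move in coordinate form such as "F5".
    return "F5"

def nboard_loop():
    game = ""
    for line in sys.stdin:
        line = line.strip()
        if line.startswith("nboard"):
            # Handshake: announce the engine's name to the GUI.
            print("set myname reversi-alpha-zero", flush=True)
        elif line.startswith("ping"):
            # The GUI expects the same token echoed back as "pong".
            print("pong " + line.split()[1], flush=True)
        elif line.startswith("set game"):
            # Full game record (GGF); a real engine rebuilds its board from it.
            game = line[len("set game "):]
        elif line == "go":
            # Reply with the engine's chosen move.
            print("=== " + pick_move(game), flush=True)

if __name__ == "__main__":
    nboard_loop()
```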
@mokemokechicken Just a report: my model beats Lv99 using an 800 simulations per move setting. See https://play.lobi.co/video/17f52b6e921be174057239d39d239b6061d3c1c9. The AlphaGoZero method works.

I am also using 800 simulations per move during self-play. I keep the evaluator alive, with the best-model replacing condition: ELO rating >= 150 over 400 games (draw games are counted in the ELO rating). I am using 2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8.

Besides, when playing against the App, I found that a setting of 40 or 100 simulations per move is already quite strong. The 100-sims setting beats Lv98 easily. But Lv99 is more difficult than Lv98: I tested 40/100/400 sims and all of them lost, until I changed to 800 sims. |
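As a side note, an ELO-based replacement condition like the one in this report can be estimated from evaluation results with the standard logistic ELO formula, counting draws as half a point. The sketch below uses the 150-point / 400-game numbers from the comment; the function name and the example results are assumptions:

```python
import math

def elo_diff(wins, draws, losses):
    # Estimate the ELO gain of the candidate over the current best model
    # from an evaluation match, counting each draw as half a win.
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Clamp perfect/zero scores, where the inverse formula diverges.
    score = min(max(score, 1e-6), 1 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

# Example: promote the candidate when the estimated gain is at least 150 ELO
# over a 400-game evaluation match.
if elo_diff(wins=280, draws=20, losses=100) >= 150:
    print("promote candidate to best model")
```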
Great! Congratulations!! I am surprised by this report!
After all, in order to become strong, it may be necessary to use a large "simulations per move" in self-play, right?
It is very interesting. |
@mokemokechicken it is partly because of your great implementation. So thank you :)
I also think so. At first, I was using 100 sims per move because I wanted fast self-play. After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving. Then I changed to 800 sims, and at about 200k steps it had become quite strong. My final model, the one beating Lv99, is at 300k+ steps.

I think it is also worth mentioning that although I changed to 800 sims, I didn't make the overall self-play too much slower. I did this by separating MCTS and the Neural Network into different processes; they communicate via named pipes. Then I can run several MCTS processes and only 1 Neural Network process at the same time. This idea is borrowed from this repo (thanks @Akababa). By doing this, I make full use of both GPU and CPU. Although a single game gets slower due to 800 sims, multi-game parallelization saves back a lot. I am mentioning this because I think self-play speed does matter in the AlphaGoZero method.
That is partly because I happened to see this reddit post from David Silver @ DeepMind. This is the quote:
I have used this implementation from the beginning and didn't test the 3 * 8 * 8 shape, so I don't have the experience to say. But I believe it is possible to introduce a kind of "attention" signal (by subtracting the previous board). Maybe it helps. Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) + 1 Tesla M40 (for the evaluator). Maybe it is mostly because of the computational power... |
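Separating MCTS from a single neural-network process, as described a couple of comments above, could be sketched roughly as follows (illustrative only: it uses multiprocessing.Pipe rather than OS named pipes for brevity, a dummy model stands in for the real network, and the worker loop is a placeholder):

```python
import multiprocessing as mp
from multiprocessing.connection import wait
import numpy as np

class DummyModel:
    # Stand-in for the real policy/value network (illustrative only).
    def predict(self, batch):
        n = batch.shape[0]
        return np.full((n, 64), 1.0 / 64), np.zeros((n, 1))

def prediction_server(conns):
    # One process owns the (GPU) model and batches whatever evaluation
    # requests are pending from the self-play workers at each step.
    model = DummyModel()
    while True:
        ready = wait(conns)                          # block until some worker sent a state
        states = [c.recv() for c in ready]
        policies, values = model.predict(np.stack(states))
        for c, p, v in zip(ready, policies, values):
            c.send((p, v))

def self_play_worker(conn, n_evals=10):
    # Each worker runs MCTS; every leaf evaluation is delegated to the server.
    for _ in range(n_evals):
        leaf_state = np.zeros((5, 8, 8), dtype=np.float32)   # placeholder encoding
        conn.send(leaf_state)
        policy, value = conn.recv()
        # ...expand the MCTS node with (policy, value) and continue the search...

if __name__ == "__main__":
    pipes = [mp.Pipe() for _ in range(4)]            # 4 parallel self-play workers
    mp.Process(target=prediction_server,
               args=([srv for srv, _ in pipes],), daemon=True).start()
    workers = [mp.Process(target=self_play_worker, args=(wk,)) for _, wk in pipes]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```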
Thank you for your reply.
Great. I think it is the best implementation.
I see.
That's very powerful!! :) |
@gooooloo Um... Is history really useful? |
@apollo-time do you mean the first step of the game? As the AlphaGoZero paper mentions, all-zero boards are used if there are not enough history boards.
"t < 0" is the case here. |
@gooooloo No, I mean that some games can be played from a board state that is not the initial state, for example Chess puzzles. |
@gooooloo can you beat the Windows Online Reversi game level 5? |
I see. I don't consider that case.
I don't have a Windows system (I will try to find one). But I can't beat NBoard's Novello level 20 (I can beat level 10 though, with 1600 sims per move), nor NTest level 30. |
@gooooloo thanks, my question is the same as Cassandra120's. |
I just played with it, using the same model and simulations_per_move (which is 800) as in the Lv99 game, and I beat Online Reversi level 5 (2:0) and lose to level 6 (1:3). |
@gooooloo My model (simulations_per_move=800) beats Online Reversi level 4 now, and my model doesn't use history. |
@apollo-time I got another new-generation model the day before yesterday, but haven't gotten any better model in the last two days. Let's wait some more days and see. |
That's also the case in AlphaZero. The performance more or less stagnated after that point. But they achieved an already strong performance (with a different board game) at 100k iters not only due to using 800 sims/move but also due to their large architecture and large buffer. Also, they did one update iteration for every 30 or so games (3M games after 100k iters), which may not be the case in the implementations of @mokemokechicken, Zeta36 and Akababa. How about your case? Did you use the "normal" setting instead of "mini" in the config? |
my config (the network architecture is the same as @mokemokechicken's original implementation):
mine:
I also changed the sampling method. I did this because I found that in my case (much more play data), @mokemokechicken's original implementation takes too long waiting for all loaded data to be trained on at least once before new play data gets loaded and before a new candidate model gets generated.
So basically, I am using the "normal" config, but changed a lot of things.
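The changed sampling might, for example, look like the sketch below: each training step draws a random mini-batch from a window of the most recent samples instead of sweeping the whole loaded dataset once before reloading. The window size, function name, and Keras-style train_on_batch call are illustrative assumptions, not the actual code:

```python
import numpy as np

def train_random_batches(model, states, policies, values,
                         steps, batch_size=3072, window=500_000):
    # Keep only the most recent `window` samples, then train for a fixed
    # number of steps on randomly sampled mini-batches.
    states, policies, values = states[-window:], policies[-window:], values[-window:]
    for _ in range(steps):
        idx = np.random.randint(0, len(states), size=batch_size)
        model.train_on_batch(states[idx], [policies[idx], values[idx]])
```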
|
@gooooloo Thanks so much for the detailed information. It looks like you don't have self.search_threads for multi-threading. Did you find multiprocessing alone to be sufficient? It's impressive that your sampling method enabled you to finish 200k iters with your large architecture. It looks like Akababa's multiprocessing is very powerful. But I failed to see how many self-play games you finished up until 100~200k iters. Have you tracked the number of games? |
@gooooloo @apollo-time @evalon32 @vincentlooi @AranKomat I created Performance Reports for sharing our achievements, and linked it from the top of the README. |
No I have not. I wish I had. |
Hi everyone, my code for getting the model is here: https://github.com/gooooloo/reversi-alpha-zero, if you are interested. |
Is there a baseline for comparing the learned model, e.g. benchmark software to evaluate against? It would be useful for us to know how effective the learning algorithm actually is.
For example, what do you mean by "Won the App LV x"? Does it mean that if the model beats the app even once, it counts as a win even if it loses the other times?
I downloaded your "best model" and "newest model", and played both networks against the grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful. |