Multi-armed bandit? #108
Is it possible to use a multi-armed bandit algorithm? (Like http://stevehanov.ca/blog/index.php?id=132) |
Comments
Yes, most definitely. The easiest thing to do is batch-based Thompson sampling, where you start with an experimental design along the lines of the sketch below.
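For instance, using SimpleExperiment from the Python reference implementation (the arm names, three-arm setup, and uniform starting weights here are illustrative assumptions):

```python
from planout.experiment import SimpleExperiment
from planout.ops.random import WeightedChoice

class BanditBatch1(SimpleExperiment):
    def assign(self, params, userid):
        # First batch: uniform weights over the arms.
        params.arm = WeightedChoice(
            choices=['arm_a', 'arm_b', 'arm_c'],
            weights=[1, 1, 1],
            unit=userid)
```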
Observe some data, then use Thompson sampling to generate a distribution of winners, and use that as the weights for a new batch. For example, you can make multiple draws from the beta posterior over arms and tally how many times each arm is the winner. You then use that tabulation as the weights for the new batch, yielding a re-weighted design like the one sketched below.
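A sketch of the tabulation step, assuming a binary success metric per arm (numpy-based; the counts and resulting weights are made up):

```python
import numpy as np

def thompson_weights(successes, failures, num_draws=1000):
    # One Beta(successes + 1, failures + 1) posterior per arm,
    # i.e. a uniform Beta(1, 1) prior on each arm's success rate.
    a = np.asarray(successes) + 1
    b = np.asarray(failures) + 1
    # Sample every arm num_draws times and count how often each arm
    # has the highest draw; the win frequencies become the new weights.
    draws = np.random.beta(a, b, size=(num_draws, len(a)))
    wins = np.bincount(np.argmax(draws, axis=1), minlength=len(a))
    return wins / num_draws

# e.g. thompson_weights([10, 30, 60], [90, 70, 40]) might yield
# something like array([0.00, 0.09, 0.91])
```

Plugging those win frequencies back in as weights, the next batch's design might look like:

```python
# Continuing the earlier sketch (same imports); weights illustrative.
class BanditBatch2(SimpleExperiment):
    def assign(self, params, userid):
        params.arm = WeightedChoice(
            choices=['arm_a', 'arm_b', 'arm_c'],
            weights=[0.00, 0.09, 0.91],  # output of thompson_weights(...)
            unit=userid)
```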
You would then repeat this process once a day or a few times per day, perhaps using namespaces to manage the experiment. With the above method you directly represent policies as PlanOut scripts, but you could also use an external service to store / manage the policies. The latest version of the PlanOut reference implementation makes it easy to add your own operators if you wanted to do something like this. HTH. |
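As a sketch of that namespace-based rotation, using SimpleNamespace from the Python reference implementation (the names and segment split are illustrative; BanditBatch1 and BanditBatch2 are the sketches above):

```python
from planout.experiment import DefaultExperiment
from planout.namespace import SimpleNamespace

class NoBandit(DefaultExperiment):
    # Served to users in segments not (yet) allocated to any batch.
    def get_default_params(self):
        return {'arm': 'arm_a'}

class BanditNamespace(SimpleNamespace):
    def setup(self):
        self.name = 'bandit_demo'
        self.primary_unit = 'userid'
        self.num_segments = 100

    def setup_defaults(self):
        self.default_experiment_class = NoBandit

    def setup_experiments(self):
        # One experiment per batch, each on its own slice of segments,
        # so users assigned in an earlier batch keep their assignments.
        self.add_experiment('batch_1', BanditBatch1, 50)
        self.add_experiment('batch_2', BanditBatch2, 50)
```

Callers would then request `BanditNamespace(userid=42).get('arm')`, and each re-weighted batch is rolled out as a new experiment over fresh segments.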
Thanks! I think your batched approach makes sense, particularly for scalability when logs are big. I'm thinking of writing an operator that continuously recalculates the weights. (I suppose with an external service, you could store the previous weights.) Would this be a reasonable PR, if other people find this potentially useful? |
Hi @kevinmickey, apologies for letting this fall off the map. It would be great to have such functionality in contrib/, but since it requires an external service, I would not want to include it in the core reference implementation. In case you are developing a custom operator, we do have a mechanism for doing that without needing to modify the package itself (see https://github.com/facebook/planout/blob/master/python/planout/test/test_interpreter.py#L41). This offer might be a little too late, but I'd be happy to review and provide feedback on any bandit-related things involving PlanOut. |
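For reference, a sketch of what such a custom operator could look like, subclassing PlanOutOpSimple from planout.ops.base; the policy-store URL and the 'policy_id' parameter are assumptions, and the exact registration call should be checked against the linked test:

```python
import json
import urllib.request

from planout.ops.base import PlanOutOpSimple

class ExternalWeights(PlanOutOpSimple):
    """Hypothetical operator that fetches the current bandit weights
    for a policy from an external service."""

    def simpleExecute(self):
        # PlanOutOpSimple evaluates the operator's arguments before
        # simpleExecute runs; 'policy_id' and the URL are made up.
        policy_id = self.parameters['policy_id']
        url = 'https://policy-store.example.com/weights/%s' % policy_id
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)  # e.g. [0.0, 0.09, 0.91]
```

A PlanOut script could then compose it with the built-in operator, e.g. `arm = weightedChoice(choices=arms, weights=externalWeights(policy_id='exp1'), unit=userid);` (operator and parameter names again illustrative; the linked test shows how custom operators are registered with the interpreter).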
Based on how WeightedChoice is implemented, it seems that if you redistribute the weights, the variation served across multiple requests is no longer deterministic. If a user with a particular id/hash is assigned one variation under a given set of weights and those weights change, that same user might subsequently be assigned a different variation if you ask for their assignment again. This seems particularly problematic for any experiment that doesn't conclude quickly or on a single page, like a multi-page funnel where different pages need to re-request the assignment. I guess this puts the onus on the caller to cache the assignment, rather than relying on the library to return consistent assignments? |
Hi Javid,

The assignment is deterministic as long as your input IDs and experimental design (the PlanOut scripts) remain the same. It's assumed that you won't change the experiment while it's running. Changing experiments while they are running is a huge source of error in practice, and we recommend using namespaces to manage changes. See S5 of https://hci.stanford.edu/publications/2014/planout/planout-www2014.pdf for details.

E
|
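To make that concrete, a quick check against the BanditBatch1 sketch from earlier in the thread:

```python
# Same unit + same script => the same assignment on every request.
assert BanditBatch1(userid=42).get('arm') == \
       BanditBatch1(userid=42).get('arm')

# What breaks determinism is editing weights in place within a running
# experiment: the user's hash position stays fixed while the weight
# boundaries move, so the same user can land in a different arm. Hence
# the advice to ship each re-weighted batch as a new experiment under
# a namespace rather than mutating a live script.
```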
Saw this 4 years late. |