This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Multi-armed bandit? #108

Open
kevinmickey opened this issue Jun 12, 2016 · 6 comments

Comments

@kevinmickey

Is it possible to use a multi-armed bandit algorithm? (Like http://stevehanov.ca/blog/index.php?id=132)

@eytan
Contributor

eytan commented Jun 13, 2016

Yes, most definitely. The easiest thing to do is batch-based Thompson sampling, where you start with an experimental design that looks like:

x <- uniformChoice(choices=['a','b','c','d'], unit=userid);

Observe some data, then use Thompson sampling to generate a distribution of winners, and use that as weights for a new batch. For example, you can make multiple draws from the beta posterior over arms, and mark how many times each arm is the winner. Then you just use that tabulation as the weights for the new batch, yielding something that might look like:

x <- weightedChoice(choices=['a','b','c','d'], weights=[10,100,200,690], unit=userid);
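The draw-and-tabulate step can be sketched in plain Python. This is a hedged illustration, not part of PlanOut itself, and the per-arm success/trial counts below are made up for the example:

```python
import random

def thompson_weights(successes, trials, num_draws=1000, seed=42):
    """Draw from each arm's Beta posterior num_draws times and
    count how often each arm wins; the win counts become the
    weights for the next batch's weightedChoice."""
    rng = random.Random(seed)
    wins = {arm: 0 for arm in successes}
    for _ in range(num_draws):
        # Beta(successes + 1, failures + 1) posterior under a uniform prior
        samples = {
            arm: rng.betavariate(successes[arm] + 1,
                                 trials[arm] - successes[arm] + 1)
            for arm in successes
        }
        best = max(samples, key=samples.get)
        wins[best] += 1
    return wins

# Hypothetical batch statistics for arms a-d:
weights = thompson_weights(
    successes={'a': 10, 'b': 20, 'c': 40, 'd': 80},
    trials={'a': 100, 'b': 100, 'c': 100, 'd': 100},
)
```

The resulting `weights` dict can be written straight into the next batch's `weightedChoice` line, as in the script above.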

You would then repeat this process once a day or a few times per day, perhaps using namespaces to manage the experiment.

With the above method you directly represent policies as PlanOut scripts, but you could also use an external service to store / manage the policies. The latest version of the PlanOut reference implementation makes it easy to add your own operators if you wanted to do something like this.

HTH.

@kevinmickey
Author

Thanks! I think your batched approach makes sense, particularly for scalability when logs are big. I'm thinking of writing an operator that continuously recalculates the weights. (I suppose with an external service, you could store the previous weights....) Would this be a reasonable PR, if other people would find it useful?

@eytan
Contributor

eytan commented Feb 2, 2017

Hi @kevinmickey --- apologies for letting this fall off the map. It would be great to have such functionality in contrib/, but since it requires an external service I would not want to include it in the core reference implementation. If you are developing a custom operator, there is a mechanism for doing that without modifying the package itself (see https://github.com/facebook/planout/blob/master/python/planout/test/test_interpreter.py#L41). This offer might be a little too late, but I'd be happy to review / provide feedback on any bandit-related work involving PlanOut.

@javidjamae

Based on how the WeightedChoice is implemented, it seems that if you redistribute the weights, the variation that is served across multiple requests is no longer deterministic.

If a user with a particular id/hash is assigned one variation with a given set of weights and those weights change, that same user might subsequently get assigned to a different variation if you ask for their assignment again.

It seems like this could be particularly problematic for any experiment that doesn't conclude quickly or on a single page, like a multi-page funnel where different pages need to re-request the assignment.

I guess this puts the onus on the caller to cache the assignment, rather than relying on the library to return consistent assignments?
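A small pure-Python sketch makes the problem concrete. This is a simplified stand-in for PlanOut's actual hashing (the hash scheme and weights below are assumptions for illustration): each user's assignment is stable for a fixed set of weights, but some users land in a different arm once the weights move.

```python
import hashlib

def weighted_choice(choices, weights, unit, salt='x'):
    """Deterministic weighted choice: hash (salt, unit) to a point on
    the cumulative weight line. Simplified illustration, not PlanOut's
    actual implementation."""
    h = int(hashlib.sha1(f'{salt}.{unit}'.encode()).hexdigest()[:15], 16)
    point = (h / float(0xfffffffffffffff)) * sum(weights)
    cumulative = 0.0
    for choice, weight in zip(choices, weights):
        cumulative += weight
        if point <= cumulative:
            return choice
    return choices[-1]

users = range(1000)
# Same users, two different weight vectors (e.g. before/after a bandit update):
before = {u: weighted_choice(['a', 'b', 'c', 'd'], [25, 25, 25, 25], u) for u in users}
after = {u: weighted_choice(['a', 'b', 'c', 'd'], [1, 1, 1, 97], u) for u in users}
moved = sum(1 for u in users if before[u] != after[u])
```

For a fixed weight vector the assignment is a pure function of the unit, but `moved` will be nonzero here, which is exactly the multi-page-funnel hazard described above.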

@eytan
Contributor

eytan commented Sep 14, 2020 via email

@Amitg1

Amitg1 commented Dec 9, 2020

Saw this 4 years late, but it seems they managed to do it here:
https://engineering.ezcater.com/multi-armed-bandit-experimentation
