Multi-armed bandit? #108
Is it possible to use a multi-armed bandit algorithm? (Like http://stevehanov.ca/blog/index.php?id=132) |
Comments
Yes, most definitely. The easiest thing to do is batch-based Thompson sampling, where you start with an experimental design along the lines of the sketch below.
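For instance, using SimpleExperiment from the Python reference implementation (the arm names, three-arm setup, and uniform starting weights here are illustrative assumptions):

```python
from planout.experiment import SimpleExperiment
from planout.ops.random import WeightedChoice

class BanditBatch1(SimpleExperiment):
    def assign(self, params, userid):
        # First batch: uniform weights over the arms.
        params.arm = WeightedChoice(
            choices=['arm_a', 'arm_b', 'arm_c'],
            weights=[1, 1, 1],
            unit=userid)
```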
Observe some data, then use Thompson sampling to generate a distribution of winners, and use that as the weights for a new batch. For example, you can make multiple draws from the beta posterior over arms and tally how many times each arm is the winner. You then use that tabulation as the weights for the new batch, yielding a re-weighted design like the one sketched below.
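A sketch of the tabulation step, assuming a binary success metric per arm (numpy-based; the counts and resulting weights are made up):

```python
import numpy as np

def thompson_weights(successes, failures, num_draws=1000):
    # One Beta(successes + 1, failures + 1) posterior per arm,
    # i.e. a uniform Beta(1, 1) prior on each arm's success rate.
    a = np.asarray(successes) + 1
    b = np.asarray(failures) + 1
    # Sample every arm num_draws times and count how often each arm
    # has the highest draw; the win frequencies become the new weights.
    draws = np.random.beta(a, b, size=(num_draws, len(a)))
    wins = np.bincount(np.argmax(draws, axis=1), minlength=len(a))
    return wins / num_draws

# e.g. thompson_weights([10, 30, 60], [90, 70, 40]) might yield
# something like array([0.00, 0.09, 0.91])
```

Plugging those win frequencies back in as weights, the next batch's design might look like:

```python
# Continuing the earlier sketch (same imports); weights illustrative.
class BanditBatch2(SimpleExperiment):
    def assign(self, params, userid):
        params.arm = WeightedChoice(
            choices=['arm_a', 'arm_b', 'arm_c'],
            weights=[0.00, 0.09, 0.91],  # output of thompson_weights(...)
            unit=userid)
```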
You would then repeat this process once a day or a few times per day, perhaps using namespaces to manage the experiment. With the above method you directly represent policies as PlanOut scripts, but you could also use an external service to store / manage the policies. The latest version of the PlanOut reference implementation makes it easy to add your own operators if you wanted to do something like this. HTH. |
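As a sketch of that namespace-based rotation, using SimpleNamespace from the Python reference implementation (the names and segment split are illustrative; BanditBatch1 and BanditBatch2 are the sketches above):

```python
from planout.experiment import DefaultExperiment
from planout.namespace import SimpleNamespace

class NoBandit(DefaultExperiment):
    # Served to users in segments not (yet) allocated to any batch.
    def get_default_params(self):
        return {'arm': 'arm_a'}

class BanditNamespace(SimpleNamespace):
    def setup(self):
        self.name = 'bandit_demo'
        self.primary_unit = 'userid'
        self.num_segments = 100

    def setup_defaults(self):
        self.default_experiment_class = NoBandit

    def setup_experiments(self):
        # One experiment per batch, each on its own slice of segments,
        # so users assigned in an earlier batch keep their assignments.
        self.add_experiment('batch_1', BanditBatch1, 50)
        self.add_experiment('batch_2', BanditBatch2, 50)
```

Callers would then request `BanditNamespace(userid=42).get('arm')`, and each re-weighted batch is rolled out as a new experiment over fresh segments.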
Thanks! I think your batched approach makes sense, particularly for scalability when logs are big. I'm thinking of writing an operator that continuously recalculates the weights. (I suppose with an external service, you could store the previous weights.) Would this be a reasonable PR, if other people find this potentially useful? |
Hi @kevinmickey, apologies for letting this fall off the map. It would be great to have such functionality in contrib/, but since it requires an external service, I would not want to include it in the core reference implementation. In case you are developing a custom operator, we do have a mechanism for doing that without needing to modify the package itself (see https://github.com/facebook/planout/blob/master/python/planout/test/test_interpreter.py#L41). This offer might be a little too late, but I'd be happy to review and provide feedback on any bandit-related things involving PlanOut. |
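For reference, a sketch of what such a custom operator could look like, subclassing PlanOutOpSimple from planout.ops.base; the policy-store URL and the 'policy_id' parameter are assumptions, and the exact registration call should be checked against the linked test:

```python
import json
import urllib.request

from planout.ops.base import PlanOutOpSimple

class ExternalWeights(PlanOutOpSimple):
    """Hypothetical operator that fetches the current bandit weights
    for a policy from an external service."""

    def simpleExecute(self):
        # PlanOutOpSimple evaluates the operator's arguments before
        # simpleExecute runs; 'policy_id' and the URL are made up.
        policy_id = self.parameters['policy_id']
        url = 'https://policy-store.example.com/weights/%s' % policy_id
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)  # e.g. [0.0, 0.09, 0.91]
```

A PlanOut script could then compose it with the built-in operator, e.g. `arm = weightedChoice(choices=arms, weights=externalWeights(policy_id='exp1'), unit=userid);` (operator and parameter names again illustrative; the linked test shows how custom operators are registered with the interpreter).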
Based on how WeightedChoice is implemented, it seems that if you redistribute the weights, the variation served across multiple requests is no longer deterministic. If a user with a particular id/hash is assigned one variation under a given set of weights and those weights change, that same user might subsequently be assigned a different variation if you ask for their assignment again. This seems particularly problematic for any experiment that doesn't conclude quickly or on a single page, like a multi-page funnel where different pages need to re-request the assignment. I guess this puts the onus on the caller to cache the assignment, rather than relying on the library to return consistent assignments? |
Hi Javid,

The assignment is deterministic as long as your input IDs and experimental design (the PlanOut scripts) remain the same. It's assumed that you won't change the experiment while it's running. Changing experiments while they are running is a huge source of error in practice, and we recommend using namespaces to manage changes. See S5 of https://hci.stanford.edu/publications/2014/planout/planout-www2014.pdf for details.

E
|
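To make that concrete, a quick check against the BanditBatch1 sketch from earlier in the thread:

```python
# Same unit + same script => the same assignment on every request.
assert BanditBatch1(userid=42).get('arm') == \
       BanditBatch1(userid=42).get('arm')

# What breaks determinism is editing weights in place within a running
# experiment: the user's hash position stays fixed while the weight
# boundaries move, so the same user can land in a different arm. Hence
# the advice to ship each re-weighted batch as a new experiment under
# a namespace rather than mutating a live script.
```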
Saw this 4 years late. |