
lengthBonus as a Summation of Difficulty Instead of Object Count #64

Open

Xexxar opened this issue Oct 20, 2018 · 99 comments

@Xexxar
Contributor

Xexxar commented Oct 20, 2018

Hello,

In collaboration with MBmasher, I present a new solution to the length problem.

The Problem
Currently the algorithm treats length as the total object count of a map. While this idea certainly has merit, it has several problems. For example, a 2-minute stream map with lots of 1/4 rhythm can be treated as longer than a 5-minute jump map simply because its objects are denser. In addition, the difficulty of these objects is not assessed: a map with relatively consistent difficulty gets the same length bonus as a map with difficulty spikes, assuming the object count and other factors are equal.

Thus, something new is needed.

The Approach / The Math
In order to properly balance length, we need a system that assesses the aggregate difficulty of a map relative to its star rating. To make this a bit more visual for those who aren't seeing where I'm going with this, here are some graphs that display the difficulty curves of both speed.cs and aim.cs:

http://puu.sh/BNTOR/cf7fa8a2b2.png
http://puu.sh/BNTPh/cd6a05b751.png

With these images, it should be clear what I intend to do. By summing up the area under the curve, we can give reward for length not only in terms of real time, but also balance for the difficulty of the section. Without any more delay, the math for this suggestion is shown below.

Formal: http://puu.sh/BNTWD/e5090554ca.png
Code:

from math import log10

total = 0
for strain in strains:
    total += strain ** d  # run separately over the aim strains and the speed strains
length_bonus = -c + b * ((2 + log10(aim_total + speed_total)) / (2 + log10(stars)) - 1)

Here, the loop over strains is run separately for the aim and speed strains, producing aim_total and speed_total; stars is the map's total star rating.
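Written out, the code above corresponds to

\text{lengthBonus} = -c + b\left(\frac{2+\log_{10}\left(\sum_i \text{strain}_i^{\,d}\right)}{2+\log_{10}(\text{stars})}-1\right)

with the sum taken over the aim and speed strains (i.e. aim_total + speed_total).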

In essence, this function takes the log of the aggregate difficulty with the log base being the star rating of the map (using the identity log_x(y) = log(y) / log(x)). This makes it so length is rewarded regardless of the star rating. Some people may have questions as to why I have 2 + log and so on; for the most part these constants are there to prevent huge adverse scaling for low-SR maps. In the testing I have done, the following values produce satisfactory results.

b = 1.6
c = -1.28
d = 1.12

I will say that the only major issue I have with this method currently is the slightly too large buff for normal and hard maps (because these types of maps don't spike at all, so they are buffed in comparison). That being said, the issues are minor compared to the benefits.

Some Results
Ah yes, the juicy part that everyone cares about. I'm a bit concerned about the calculator we developed to do the testing, as there are a few anomalies, so some values might be slightly inaccurate. (I apologize for the small sample size.) Please note that these values are still subject to change.

Within Temptation - The Unforgiven[Marathon] https://puu.sh/BNUBj/255ac14540.png
Fairy FORE - Vivid[Insane] https://puu.sh/BNUBM/583c3a33b5.png
Will Stetson - Harumachi Clover [Oh no!] +DT http://puu.sh/BNVvK/64b1959c7e.png
Will Stetson - Harumachi Clover [Oh no!] https://puu.sh/BNUFw/d1aac4c924.png
VINXIS - Sidetracked Day[THREE DIMENSIONS] +HR http://puu.sh/BNVB6/c2378d6cb4.png
MuryokuP - Sweet Sweet Cendrillon Drug[Normal] https://puu.sh/BNVbx/39f1d87226.png
MuryokuP - Sweet Sweet Cendrillon Drug[Cendrillon] https://puu.sh/BNVbe/62217919a5.png
Imperial Circus Dead Decadence - Uta[Himei] +HR https://puu.sh/BNUWU/ecdeab1163.png
Yamajet feat. Hiura Masako - Sunglow[Melody] http://puu.sh/BNVFx/3894582b87.png
Yamajet feat. Hiura Masako - Sunglow[Melody] +DT http://puu.sh/BNVGz/238f09cb50.png
GYZE - HONESTY[DISHONEST] +HR https://puu.sh/BNUVB/2ccf2a7da0.png
GYZE - HONESTY[DISHONEST] http://puu.sh/BNVIu/93fc534305.png
Fujijo Seitokai Shikkou-bu - Best FriendS (GoldenWolf)[Extreme] http://puu.sh/BNVRr/df180fdef9.png
Fujijo Seitokai Shikkou-bu - Best FriendS (GoldenWolf)[Extreme] +DT http://puu.sh/BNVSk/042ead7be8.png

The astute among you might have realized that DT universally causes the lengthBonus to be reduced. This is indeed intended and one of the best features of this algorithm: lengthBonus varies across mods. In fact, DT lowers the bonus and HT raises it, universally.

Conclusion
While these values aren't all perfect, I believe them to be leaps and bounds better than the current system. In the coming days MBmasher will likely release an updated calculator for those who are interested in testing some values on their own.

With that all said, I hope you all are interested in pursuing this change. Hopefully an official PR can be produced and testing can begin on the official code soon. Thank you for reading and remember to drop a 👍 if you like the change!

@Iryya

Iryya commented Oct 20, 2018

It's true that short maps with DT (+HD) are somewhat overweighted, but I think the current math nerfs them a bit too hard.

@CrappySalami

I like the idea of this rebalance. Researched very well. I pray that peppy will actually see this, and not ignore it like he does with actually good suggestions.

@smoogipoo
Contributor

smoogipoo commented Oct 20, 2018

How are you using lengthBonus to determine the difficulty of the map? I.e. how are you factoring it into the star rating? Furthermore, if it's also supposed to be a completely separate difficulty attribute alongside aim/speed, is there a similar formula for pp that you can provide?

Right now I basically have:
image

Where total strain is the summation of exponentiated strains, and difficulty value is the summation of weighted strains. As you can see I've just added it into the final SR, but idk if that's what you intend.

Also, you mention stars in the code but the sum of weighted strains as per your formula. Which one is correct? Because SR is effectively the sum of scaled weighted strains.

Also one thing I noticed as part of your formula is that you weigh the first strain by (.9)^1, where it's actually (.9)^0. Probably doesn't have much of an effect, just a discrepancy to note.
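In other words, the intended weighting starts the decay at 0.9^0 = 1 for the hardest strain, i.e. something like (illustrative Python, not the actual calculator code):

weighted = sum(strain * 0.9 ** i for i, strain in enumerate(sorted(strains, reverse=True)))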

@MBmasher
Member

lengthBonus will not affect the SR of the map; it'll only affect its pp.
aim_total and speed_total refer to the weighted strains BEFORE scaling, while stars refers to the TOTAL star rating AFTER scaling.

@xDololow

What about this map? https://osu.ppy.sh/beatmapsets/429411/#osu/939197

@smoogipoo
Contributor

Ah I see, so this is a pp processor implementation that essentially requires two new attributes.

@TheBicPen

TheBicPen commented Oct 20, 2018

Is there any way to adjust the bonus relative to the length of the whole map? Maybe changing the 2 in 2+log into a constant or something similar. Edit: just to be clear, I mean changing it from 2+log( to a+log(, where a is a constant like b, c, and d.
The reason I bring this up is because of this:

In fact, DT lowers the bonus and HT raises it, universally.

I personally have no issues with this result, but I think the ability to fine-tune the extent to which this happens would make the algorithm much better for the long term. You mentioned that the algorithm buffs normal/hard maps too much, so if I understand the purpose of that 2 correctly, changing this value could help fix the problem.

@johnmave126

johnmave126 commented Oct 20, 2018

The length bonus function you propose is not a monotone function (as a set function), even though monotonicity is a desirable property.
We can regard the length bonus function as a multiset function, or collection function, which takes in a collection of difficulty values and outputs one real value.
Basically, we want the length bonus function f to have the property that:
If X\subset Y, then f(X)\leq f(Y)
In other words, if we add any chunk to a beatmap, the length bonus should not decrease.
But the function you propose violates this property.
I quickly implemented it as

function func(arr, c, b, d) {
    // Sort strains hardest-first (numeric comparator; the default sort compares as strings).
    const seq = arr.slice().sort((x, y) => y - x);
    // Sum of strain^d (the aim_total + speed_total term in the proposal).
    const total = Math.log10(seq.reduce((s, a) => s + Math.pow(a, d), 0));
    // Weighted sum standing in for the star rating (note: the first strain gets weight 0.9, not 0.9^0).
    const sr = Math.log10(seq.reduce(([s, mul], a) => [s + mul * a, mul * 0.9], [0, 0.9])[0]);
    return c + b * ((2 + total) / (2 + sr) - 1);
}

And we have

> func([1,1,1,1], 1.28, 1.6, 1.12)
1.3515534994438259
> func([1,1,1,1,3], 1.28, 1.6, 1.12)
1.3418421687793718

So I don't think it is a good idea.

In fact, I was checking another property I think the function should have when I found that your function doesn't satisfy basic monotonicity. I think the function should also satisfy the submodular property, which says:
If X\subset Y, and x' is any chunk, then f(X\cup\{x'\}) - f(X) \geq f(Y\cup\{x'\}) - f(Y).
In other words, suppose we have a short map X, and a long map Y obtained by adding more chunks to the short map. If we add the same new chunk x' to the two maps, the length bonus gain for the short map should be no less than the gain for the long map; i.e., adding a chunk adds more length bonus to the short map.

I believe the designer of a length bonus function should keep these two properties in mind.
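For concreteness, here are generic numeric checks of these two properties (plain Python, independent of any particular bonus formula; the helper names are mine):

def is_monotone(f, X, extra):
    # Monotonicity: appending the chunks in `extra` to the collection X should not lower the bonus.
    return f(list(X) + list(extra)) >= f(list(X))

def is_submodular(f, X, Y, x):
    # Submodularity: X is a sub-collection of Y; adding the same chunk x should gain X at least as much as Y.
    return f(list(X) + [x]) - f(list(X)) >= f(list(Y) + [x]) - f(list(Y))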

@Xexxar
Contributor Author

Xexxar commented Oct 20, 2018

The effect you're observing in your first point was entirely intended. Since the second collection you display includes a 3 (i.e. a new hardest point of the map), the star rating increases, which causes the length bonus to decrease. This is designed so that maps whose difficulty curves spike don't get overly high length bonuses: compared to the diff spike, the other sections are so easy that they barely affect the difficulty of FCing for a player capable of FCing the spike.

As for the second property, a short map obviously would receive more bonus for the section being added. If it didn't, then length would scale infinitely and maps like The Unforgiven would give 1000pp.

I don't see why these properties would be desired.

@Xexxar Xexxar closed this as completed Oct 20, 2018
@Xexxar Xexxar reopened this Oct 20, 2018
@Tom94
Collaborator

Tom94 commented Oct 20, 2018 via email

@johnmave126

Seconding Tom94. The reason I want the length bonus function to be monotone and submodular is that, in that case, the final pp function is naturally monotone and submodular as well.

Now let me elaborate on submodularity.
This property is desired because it ensures that, when you slap a new section into your map, it has a larger impact on a short map than on a long map. Otherwise, the pp gained from a map would be super-linear in its length, and in that case I guess we would see a new meta of length fillers or something like that.

@Xexxar
Contributor Author

Xexxar commented Oct 20, 2018

The only case in which additional content added to the end of the map would cause the lengthBonus to drop is if that content increases the star rating, thereby increasing the log base. This would therefore theoretically raise the overall pp even though the lengthBonus itself has decreased. I'll see if I can either formally prove that this property holds or modify the algorithm until such a result is achieved.

@akainocode

Would it be possible to scale strain for each subsection off of the maximum aim/speed up to the subsection compared to the aim/speed of the current subsection?

It'd allow for maps with a difficulty spike in the middle to still maintain the amount of pp from the beginning of the map to that point, but it'd lessen the pp gained afterwards (but still be greater than the amount from the beginning to the diff spike)

@kszlim

kszlim commented Oct 21, 2018

Is it not the case that length is already implicitly accounted for in the algorithm (since you're summing the strains with a decay factor)? It looks to me like you could reduce the decay if you wanted to explicitly benefit longer maps, and remove the length bonus altogether.

This would require re-tuning all of the parameters, but it might be the best way to go?

@johnmave126

I personally suggest looking at the class of Archimedean t-norm or Archimedean t-conorm functions. I think it is easy to turn a t-conorm into a set function, and we then naturally get monotonicity. Apart from that, we also naturally get null-additivity. It is a large class, though, and many such functions are too trivial, so this is just a personal suggestion.

@o-x-e-y

o-x-e-y commented Oct 31, 2018

Is there a chance that this will be implemented in Catch too? I feel like it might partially fix maps like Image Material.

@peppy
Member

peppy commented Nov 6, 2018

Any updates?

@EbombaGang

When?

@camellirite

camellirite commented Nov 18, 2018

When it's ready.

@Xexxar
Contributor Author

Xexxar commented Nov 24, 2018

Hello,

Good news! Updates! While there is still balancing and testing to be done, I believe I have an amended algorithm that should help solve the following issues.

Separation of lengthBonus
Instead of the previous suggestion where both speed and aim scale by the same lengthBonus that encompasses both skills, the new algorithm will split lengthBonus into two separate bonuses that can be balanced separately (as these two curves have different properties). In addition, this will prevent cases where maps with highly consistent aim difficulty inaccurately scale a speed spike.

Universal Star Rating Balance
In the previous version of this algorithm, I noticed an issue regarding lengthBonus and a map's star rating. Imagine two difficulties with perfectly consistent difficulty curves, where one is 2 stars and the other is 5 stars. Under the previous algorithm, these two maps would have different bonuses (in fact, the easier map would get more of a bonus). Now, the bonus is perfectly balanced to be independent of the star rating of the map.

Spikes Do Not Lower PP
There were some previous fears that since lengthBonus can drop in value when a new section is added (a property that is still maintained in this version because it is desired), the pp of a map might also drop when such a section is added. From my testing so far, it is not possible for the pp to drop as a result of adding a new section to a map. The only case in which lengthBonus drops is when a new spike is added to the difficulty. This spike causes the star rating of the map to increase, thereby lowering the weight of the other sections of the map relative to the star rating. Thus, while the lengthBonus decreases, the map's star rating must rise as a result, and since the star rating has increased, the net pp change from adding such a section is positive. Spikes should, and will, always outweigh the effect of the lengthBonus's change from the new section.

Expect an update to the original post sometime in the future (soon) with the amended algorithm and new data supporting its implementation.

Thank you for reading,

Xexxar

(For those who care about the math, log(aggregateSR / scaledAggregateSR) did the trick. Using a separate log base caused more problems than it was worth (aggregateSR is the sum of all difficulty sections, scaledAggregateSR is the same sum with the .9^n scaling factor))
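A minimal sketch of that parenthetical, just to make the two quantities concrete (the function name is mine, and the balancing constants around it are omitted):

import math

def amended_length_factor(section_difficulties):
    # aggregateSR: the plain sum of all difficulty sections.
    aggregate_sr = sum(section_difficulties)
    # scaledAggregateSR: the same sum with the 0.9^n weighting, hardest section first.
    ordered = sorted(section_difficulties, reverse=True)
    scaled_aggregate_sr = sum(s * 0.9 ** n for n, s in enumerate(ordered))
    # The log base is left unspecified above, so plain log is used here.
    return math.log(aggregate_sr / scaled_aggregate_sr)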

@diamondburned

I actually support this change, though I'm doubtful about some of the vaguer parts of your latest comment, such as:

Under the previous algorithm, these two maps would have different bonuses (in fact, the easier map would get more of a bonus). Now, the bonus is perfectly balanced to be independent of the star rating of the map.

I'm not sure how the bonus balances out. If the easier map gets the bonus in the previous algorithm, that would make it an issue. However, being independent of the star rating doesn't sound like it will fix it?

@Feuerholz

I'm not sure how the bonus balances out. If the easier map gets the bonus in the previous algorithm, that would make it an issue. However, being independent of the star rating doesn't sound like it will fix it?

The SR already scales the pp itself, so the absolute increase in pp will be higher for harder maps while the percentage stays the same. The percentage shouldn't differ, since it is only meant to reflect consistent difficulty, regardless of whether the map is high or low SR.
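For illustration: a 10% bonus is worth 30pp on top of a 300pp play but 60pp on top of a 600pp play, even though the percentage is identical.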

@o-x-e-y

o-x-e-y commented Nov 24, 2018

Great to see we're getting updates on this! :)

@smoogipoo
Contributor

Here's some sample data with this change:

Original (reference, no changes):
http://original.smgi.me

This change + speed bonus
http://xexxar-lb-sb.smgi.me

Note that I've been told the length bonus and speed bonus changes are supposed to be included together.

@Xexxar
Contributor Author

Xexxar commented Dec 9, 2018

I would like to introduce an amended version of the algorithm that no longer has a minimum value.

In my testing, d = 1.2, beta = 0.45, c = 0.38 seem to work very well, keeping the changes in a ~85%-110% range. Overall I would say that stream maps are going to see a pretty sizable nerf, but this is to be expected since object count is no longer used for length. I tried to balance TV-size maps so there would be no change for them.

At this point I think this is ready to be merged. Other than minor balancing I believe this is as good as it can get. As always, if there are questions let me know.

@smoogipoo
Contributor

smoogipoo commented Dec 9, 2018

Have updated the sample with the proposed changes ^^

(code)

@joseph-ireland

joseph-ireland commented Apr 17, 2019

@abraker95 I just read through your example in more detail, and I really can't see why that format is any better. You're basically defining a function step by step that you can tweak until it looks about right to you. There's nothing special about it; it has no interesting mathematical properties other than its limits. As far as I'm concerned, that's no different from the current pp system, which was created in the same way. There's no way to statistically test anything either, so everything is just a hypothesis with no test possible.

I've defined a function which does make verifiable predictions, and does have interesting mathematical properties, then I go on to show how those properties improve the PP system. It's still an arbitrary choice though, and I don't think there's any possible sound justification without collecting a lot of data and analysing it. I don't have a lot of data. I do have the existing PP system though, which, like it or not, is actually not bad, so I don't think it's a terrible idea to use it as a benchmark.

@abraker95

abraker95 commented Apr 17, 2019

@joseph-ireland I went very verbose with my process of explaining. While it helps greatly to understand why the functions are the way they are, and gives insight into model construction, you don't really need to be that detailed. Just detailed enough to understand where the whole thing comes from.

Interesting properties and testable predictions aside, if you can't trace the logic that leads to that model, then that sends up red flags for me that the model is wrong. tr3sleches seems to be more lenient in that regard; he ignored logical traceability and went ahead to state faults within your model. I usually don't bother continuing that far if a model is created from thin air. The existing pp system generally works, but it is very, very faulty down to the core and needs to be redone from scratch. I don't mind that people are adding on to it, but I do mind further additions being created from thin air and by trial and error, like much of the existing pp formula.

@joseph-ireland

He erroneously (and unceremoniously) stated faults that were at most a miscommunication (I believe).

And there is no justification for the skill->hit probability function except that it looks about right, has desirable properties, and gives good results. In the same way, there's no logical justification for most of the pp system, and there's no logical justification for whatever it is that you posted earlier, except that it looks about right (maybe) and gives good results (maybe). What exactly do you want to see?

And there is logical justification for almost everything else I've written.

As far as I'm concerned, nothing we can do will ever be fully "right", since there'll always be more factors to model. All that matters is whether it's a better model than the current best, and the only real way to judge that is to look at the results. I'll admit that what I've done isn't good enough to publish in a journal; for that I'd need to get lots of data and look at the stats to see how accurate the model is. But I'm not publishing this in a journal, and the fact that this is even possible is a big step forward from anything we've already got, including anything you've done.

To be honest, I think that the only reason you are so opposed to this is because it mostly achieves something you wanted to do yourself but haven't succeeded so far because you set your standards too high and wanted to start again. All your comments reek of second-system syndrome and not-invented-here syndrome, both of which are a big problem for inexperienced engineering teams and present a huge barrier to progress in any system.

Also your standards aren't even how things are done in academia. Things usually go from vague intuition to definitions, then theorems, and finally evaluation. An analogy to your position is to see "assuming that the population is normally distributed" at the start of a paper, and then just discard the whole thing because they haven't logically deduced that the population is actually normally distributed, even if the results show that the population is in fact approximately normally distributed.

I'm not interested in going round in any more circles regarding this.

@joseph-ireland

Here's another analogy: Newton's law of gravity.

Is there any logical justification for that? Not really; it just seems to work, and experiments show that it does. It's actually completely wrong: it doesn't take a lot of things into account. General relativity is an improved model, but it's still wrong; it doesn't take quantum mechanics into account.

NASA still used Newton's laws to get to the Moon, though.

@abraker95

abraker95 commented Apr 17, 2019

There are always more factors to model, but you can still be correct without including all of them by declaring them outside the scope of what you are focusing on. However, the material you presented does not derive from sound logic and is therefore very likely not correct. There is a chance it is correct, but the odds of landing on the correct formula blindly are basically none. That is why there needs to be justification for the model you are proposing, via an explanation of how you need the model to behave, rather than stumbling across something that behaves similarly to what is desired. All the borderline flaming aside, I am not speaking against you; I am simply trying to help fix the proposal.

I may address the rest in DMs, otherwise this will devolve into off-topic.

@joseph-ireland

You keep saying "wrong" and "not correct", but it's a model; it's not supposed to be correct. The only thing it can be is inaccurate. I'm treating the strains themselves as a black box: they're just numbers, and I've not done any reasoning on their behaviour other than bigger numbers = more difficult. I think you'd probably agree that it'd be hard to give them any "correct" theoretical significance other than as an arbitrary measure of difficulty, so I don't see how you can say anything derived from them is right or wrong.

What I can say is that if the tail behaviour of the generating function f(s) is regular in some way, then it's likely to be closely related to the "first attempt". Assuming the tail behaviour of f(s) is similar to (1-x^-k), then the function I've given is a good approximation for all cases we care about, and is easy to calculate. Here's a graph showing the difference between two different models with the same tail behaviour for a 1.5 star map. They match really well, and they only match more as star rating increases.

If the tail behaviour is similar to 1-e^-kx, then you get the "first attempt". Any other tail behaviour can be calculated by just taking p_u(d)(u(x)) for some substitution function u. From my initial guess, I took u(x) = log(x) because it seemed to work, and because it's simple and the end result has nice mathematical properties.

The actual tail behaviour will depend on the strains, which are just arbitrary difficulty measures, so I don't see how you can possibly get anything more "correct" than an educated guess.

@tr3sleches

tr3sleches commented Apr 18, 2019

Honestly, you are better off summing and/or integrating (depending on whether you are doing this discretely or continuously) to get the expected time to FC, and normalizing that to find the skill required.
Something like
normalizedTimetoFC = sum(delta_time/Pr(success of strain s at time t|player skill x))
Solve for player skill x.
*Set this normalized time to FC to some fixed value (e.g. 1 billion milliseconds).

I personally solved it using the MGF of the data with respect to player skill, because these can be easily approximated with said data (all you have to do is find the moments).

@joseph-ireland

I am already summing for the expected time to FC.

The formula you've given isn't correct: the expected number of times you attempt a note while going for an FC is equal to 1/(probability you hit that note and every note after it), since every time you miss a later note, you need to retry and play it again.
It should be

normalizedTimetoFC = sum(delta_time/Pr(success of all strains s(t) from time t to t_last|player skill x))

which is what I've already got.
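A minimal sketch of that sum, assuming independent per-note hit probabilities (the names delta_times and hit_probs are illustrative, not taken from any existing calculator):

def expected_time_to_fc(delta_times, hit_probs):
    # Retry-from-the-start model: the time segment before note i is replayed
    # 1 / P(hit note i and every note after it) times on average.
    total = 0.0
    p_rest = 1.0  # probability of hitting the current note and everything after it
    for dt, p in zip(reversed(delta_times), reversed(hit_probs)):
        p_rest *= p
        total += dt / p_rest
    return total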

I don't really see how the MGF will help solve for player skill: the distribution of time to FC has a lot of parameters, and it changes when you change player skill. We've already calculated the expectation, and we don't need the variance or any other moments, so I don't see any use for the MGF.

The actual equation to solve looks something like this

WolframAlpha is better at solving this stuff than I am, and even with just 4 terms it chooses to solve it numerically, so I'm not too hopeful for an exact solution, although if you can find one, and it can be computed faster than the numerical solution, then fantastic.
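For what it's worth, once expected time to FC is monotone in skill, the numerical solve is just a root find; a bisection sketch (time_to_fc is any such model, e.g. the sum above, and the bracket values are arbitrary):

def solve_skill(time_to_fc, target_time, lo=0.0, hi=20.0, iters=60):
    # Assumes time_to_fc(skill) strictly decreases as skill increases.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if time_to_fc(mid) > target_time:
            lo = mid  # this skill still takes too long, so the answer lies higher
        else:
            hi = mid
    return (lo + hi) / 2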

@tr3sleches

If you use that formula, then difficulty at the end will be worth more than difficulty at the beginning, which doesn't make sense to me. I get how it can be used to find the expected time to FC, but it seems counterproductive to have the impact of a difficulty change depend on where that difficulty is in the song (looks like I'm assuming independence now lol).
I used a different probability model, so that might be why I was able to use the MGF (I used a modified exponential distribution). The MGF makes the integration and/or summation infinitely easier (you don't have to deal with gamma functions and the like, though having to shift the PDF was annoying).

@joseph-ireland

It does make sense, you can just spam retry a diff spike at the start until you get it, but you can't spam retry a diff spike at the end of a 10 min map.

@tr3sleches

I’m talking about in reference to difficulty calculation. As long as the values are the same, anything pertaining to difficulty should not change. Having a diff spike at the beginning or the end should not make the map register as easier or harder if everything else stays the same. Adding objects could potentially decrease your difficulty, which should almost never happen (it would have to be some outrageous circumstances for this to happen).
It would end up like the abomination Xexxar’s angle bonus is.

@joseph-ireland

Adding objects won't decrease difficulty, it can only increase time spent waiting around and increase the probability of failure, both make difficulty increase.

Moving a huge diff spike from the end of a 10 minute map to the start absolutely makes it easier to FC by quite a bit and the calculation reflects that.

@abraker95

abraker95 commented Apr 18, 2019

What I can say is that if the tail behaviour of the generating function f(s) is regular in some way, then it's likely to be closely related to the "first attempt". Assuming the tail behaviour of f(s) is similar to (1-x^-k), then the function I've given is a good approximation for all cases we care about, and is easy to calculate.

If the tail behaviour is similar to 1-e^-kx, then you get the "first attempt". Any other tail behaviour can be calculated by just taking p_u(d)(u(x)) for some substitution function u. From my initial guess, I took u(x) = log(x) because it seemed to work, and because it's simple and the end result has nice mathematical properties.

It needs to be more than just a guess that ends up being a coincidence. Please come up with a reason to use log(x) prior to examining the mathematical properties that result from it. I want to know how you came up with

What made you decide to raise e to the -e power, and that to the power of another value? What made you decide to apply log to s and d to produce

That (e^-e)^-x function you made up behaves very similarly to an S-curve, a function whose uses and properties are known well enough to understand what it is being used for:

Is this what you were going for? Or were you going for the error function? There is a whole list of possible functions that match the behavior you might have been intending, each of which is used for specific applications.

I want to see where your formulas come from. They have to derive from some observation that concludes the curve changes in a particular way that rules out any other function.

@joseph-ireland

The graphs you've posted there are the generating function f(s). For difficulty to be fully characterised by a single number, and for the system to be closed under multiplication, all other functions p_d(s) need to be a power of that function. Try raising both of those to the nth power and see what happens (hint: they're almost exactly the same - the difference is that mine doesn't change shape for very small n).
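Explicitly, raising it to the nth power only shifts the curve:

\left(e^{-e^{-x}}\right)^{n} = e^{-n e^{-x}} = e^{-e^{-(x-\ln n)}}

so the shape is preserved and the curve just moves along the x axis by ln n.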

The way I originally found e^-e^-(x-d) was by calculating the limit as n->infty of 1/(1+(e^-x)/n)^n, since I identified that as the shape produced by a whole bunch of generating functions I tried; here's a visualisation of that from my original post.
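In symbols, that limit is

\lim_{n\to\infty}\frac{1}{\left(1+\frac{e^{-x}}{n}\right)^{n}} = e^{-e^{-x}}

using (1+a/n)^n \to e^{a} with a = e^{-x}.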

It is precisely the S curve I intended.

All other S curves can be transformed into that one using an increasing bijection u(x), so you can easily perform all the calculations for any S curve without any major code changes. I originally chose log(x) because it's simple, and because it maps R+ into R, and the strains are in R+. I didn't find out until later how nice the interpretation was and how well it reflects the current PP system.

Any theoretical justification for a choice of any u(x) would have to reference properties of the strains themselves, and I don't see a way forward there.

@abraker95

abraker95 commented Apr 20, 2019

The way I originally found e^-e^-(x-d) was by calculating the limit as n->infty of 1/(1+(e^-x)/n)^n, since I identified that as the shape produced by a whole bunch of generating functions I tried; here's a visualisation of that from my original post.

All other S curves can be transformed into that one using an increasing bijection u(x), so you can easily perform all the calculations for any S curve without any major code changes. I originally chose log(x) because it's simple, and because it maps R+ into R, and the strains are in R+.

That sounds like something that REALLY should be in the paper before the First Attempt section. Also, going off the graph visualization you provided, you have omitted the information explaining how your function approaches the sigmoid function for higher values of n, and your intent to use that property to calculate the products. You actually show the use, but fail to explain the intention beforehand, making it seem like a coincidence. I suggest calling your function a sigmoid function approximation, with the description that it is designed to be a more accurate approximation for larger powers of n.

Also, when dealing with probabilities the error function is used, not the sigmoid function. While using the sigmoid function should still give you somewhat reasonable results for strains in relation to probability, it unfortunately can't be used in relation to actual probabilities. Idk whether you are going for such capability, but I'm just pointing out the trade-off between calculation optimization and having more accurate values that have a closer relation to real-world results. At the very most you can say those are pseudo-probabilities.

Right after the final formula for the second attempt:

making the difficulty multiplier per length double 2^(1/10.58)=1.067.

This shows me that k has a quantifiable meaning, which you mention only at this point. Previously, when first introducing k, you just refer to it as some constant. So this use of k comes across as unexpected, and it would help if you described the role of this variable prior to using it like that.

@tr3sleches

That's great and all, but your probability function should be based on observations and conclusions from the map data itself, e.g.:
No matter what the player skill, the probability of FCing nothing is 100%.
The probability to FC decreases with increasing strain, because of increasing difficulty.
As the probability to FC itself decreases, the change in probability decreases (a sharper decrease at 95% than at 1%; relative or % change could be the same, but that requires further proving).

This would be an example of a start to something like what I'm talking about. I would model this with an exponential distribution with a moving failure rate (that's just me though).

If you build these things up, I think this can really work (much better than the suggestions above with that log BS).

You should be able to construct hazard, survival and failure rate functions and have them make sense as well. With this, you’ll just get functions with no reasoning behind them, which defeats the whole point of this exercise.

@joseph-ireland

Sorry I've been on holiday with no PC, didn't get a chance to address this stuff.

@abraker95
I agree, that stuff should probably be added in; I didn't want to make it too bloated with unnecessary details, but I guess it doesn't have to be too long.
For the properties of k, I thought the n^(1/k) stuff explained that earlier; just put n=2 to get the multiplier for playing the map twice. I don't think calculating that backwards would add anything.

@tr3sleches

As the probability to FC itself decreases, the change in probability decreases (a sharper decrease at 95% than at 1%; relative or % change could be the same, but that requires further proving).

I don't think this is true, I think some kind of S curve is more natural. The function I've given satisfies the other properties you've listed, and I really think it's the simplest and most natural function describing FC probability.

I would model this with an exponential distribution with a moving failure rate (that's just me though).

This doesn't make sense to me. You haven't said what is exponentially distributed, or what "moving failure rate" means, and it isn't immediately obvious to me. The probability functions I've been working with are just functions from skill->FC probability, not probability distributions.

Just taking a wild guess: if you're saying to take the CDF of an exponential distribution, f(x) = 1-e^-kx, and use that as a generating function, then that would give similar results to the "first attempt", as addressed in my last post. It's something you could try, but I don't think it'd work any better, and it's definitely not any more elegant or well reasoned.

If you want to do something else then that probably won't be closed under multiplication. That doesn't necessarily make it wrong, but it does make everything more complicated and potentially hard to compute efficiently.

With this, you’ll just get functions with no reasoning behind them, which defeats the whole point of this exercise.

There is reasoning behind everything, including the initial assumptions, but they can't be logically airtight like you want, in the same way your little list of axioms isn't, or any list of axioms for that matter. You need to start somewhere, and I think I started somewhere reasonable, as I tried to explain in the previous post, but only proper experimentation can verify or disprove that.

You're always welcome to improve things further if you can, but maybe think things through more and test it before calling other stuff BS.

@tr3sleches

I meant much better than Xexxar's length bonus above; yours was alright with some tweaks.
I can now see where you're coming from with the probability, but I still think the justification for the probability function needs work. E.g. to make the limit work you have to divide the e^(-x) by n, but when multiplying probabilities together you can't just do that; it would be 1/(1+e^(-x))^n, which turns out differently.
Honestly, your first failed function will probably work better reasoning-wise.
Putting aside your probability function, you can do something like this:
That relationship between your d's resembles the cumulant generating function of the strains.
Take the cumulant function of our strain data. You can use the zeros of it and its derivatives (which are just its central moments) and its end behavior to approximate its derivative, and use Simpson's rule or solve the integration to find the cumulant generating function. Use its properties to separate the length aspect from the total (just split the sum into num_objects * expected value, use log rules to separate out their logarithms, and subtract the log of num_objects from the total). You can then do whatever length stuff you want in the pp section. If I had the data in vectors I could find the values and run a hypothesis test to ensure the mean of the new pp relative to the old stays the same (i.e. no wholesale increase or decrease), but more on that later. Either that, or you can find the weighted avg. strain with that and use it to find the legacy strain (far easier and works well too).

Best of all, this converts into a continuous function even more easily than the decay weight system (status quo) does.
With the MGFs, you wouldn’t need to mess with the probability functions.

Going forward with your second plan would be much harder (although admittedly more holistic). You can't just put that delta time on top, divide by that probability, and be done with it. The probability depends on time too (discrete or not). You cannot just put a constant all through there and expect it to be fine; otherwise, why bother? What I meant by a moving failure rate is that it changes with strain, because higher strains have a higher chance of failing and hence a higher rate of failure. The constant in that (I use the λ one) is your failure rate, which is the probability of FCing the next note given that you FC'ed the rest, because of independence. Find that probability (you already can, based on the probability per note). Then use it to build your running probability based on strain and time (it's a little more difficult than I've stated here, but plenty doable). Luckily, there are a lot of probability rules that can help.
For this too, the integration is doable: since exponents will most likely be involved somehow in the probabilities (because of how going from note to note works), you can use the MGF and CGF of the data to ease the integration. Also, you can use Simpson's rule.

Tl;dr
For the first method, use the MGF or CGF to separate length from the data (sum in the discrete case or integrate in the continuous case), approximate the CGF using the data, and get the legacy strain from there. With this, you can make the length bonus however you want, outside of the SR calculation.
For the second method, you need to both find the reasoning behind your probability functions and more adequately calculate your time to FC. Your reasoning for the time to FC is okay, but the execution needs work in both the discrete and continuous cases.

Lol this is hella long.

@joseph-ireland

I only divide by n in the limit to stop the positive part moving off to infinity, otherwise the limit would be zero. You could also think of it as 1/(1+e^-(x+ln(n)))^n, which makes it clearer that it's only shifting the curve along the x axis, so it's not changing shape.
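Spelled out, since e^{-x}/n = e^{-x}e^{-\ln n}:

\frac{1}{\left(1+\frac{e^{-x}}{n}\right)^{n}} = \frac{1}{\left(1+e^{-(x+\ln n)}\right)^{n}}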

I really don't see why you'd think that the amount of time you've been playing could possibly affect the probability you hit a note like you seem to be suggesting. I assumed at the start somewhere that the probability of hitting a note is fixed, I'm not trying to model player fatigue or anything. Probability of hitting a note does affect time to FC, but that's expected, and doesn't falsify any calculations.

You can just put delta time on top, divide by probability, and be done with it, and I explained why quite thoroughly IMO. I feel like you're looking at the end result and saying "that's wrong" without looking at the reasoning. You can apply the same reasoning to simpler problems, like the expected number of coin flips to get n heads, and you get the right answer. This guy's derivation almost matches my reasoning, and using his reasoning for this problem gives the same answer as I've got. If you don't like the derivation, tell me what about it you don't like, or try to derive it yourself; don't just say "that's wrong".

The rest of your post seems to just be suggesting loads of alternatives not saying what's wrong with what I've got.

@tr3sleches

Even if it is before, the problem still stands. If you have a long pause before a huge diff-spike section, it can still be heavily overweighted.
I did look at the reasoning, and while the idea of expected time to FC being that sum is fine, it's the execution that's the problem.
I wish I could show you an example.
This would be fine in the 400ms chunk system, because all the times are the same.
Looking at the rest of it again, even though this expected-value part is kinda janky, the reason why is that your probability function isn't adequately defined through time (should have seen that earlier tbh). E.g. the same strain over twice the time has a lower probability of FCing than that strain over the initial time. That's why I suggested a joint distribution between the probability of hitting notes and the time over which they are laid out (the easiest way is for time to be exponentially distributed, with the strain or difficulty part weighting the failure rate). Then you multiply those together to get the probability of FCing. I'll post more on this in a bit.

@tr3sleches

Here is a fix of the time problem I was talking about:
image

@tr3sleches

While I like this way better, if you just take the expected value of p(s) over each time interval (which is p in the paper above), and use that as the p in T_FC that would be fine too.
T_FC = sum(Δt/p)
Since p(s) will likely be exponential because of how difficulty to hit stuff works, this integral solution will be some kind of expint function (if we don’t have that kind of function we can just approximate it using Taylor series or Simpson’s rule).

@joseph-ireland

joseph-ireland commented May 10, 2019

What's the expected value of a function? You mean the mean value, i.e. the integral between t0 and t1 divided by t1-t0?

I think that doesn't make sense by itself: if there's one section with p(success)=1 and another with p(success)=0, then the mean will be 0.5, but the actual success probability will be 0.

If we say p(s) represents the probability of success for a given strain over 1 second, then the probability of success for a given strain over T seconds is p(s)^T, i.e. the product of p(s) for each second played.

For continuously varying strain, it seems that the natural extension is this:

\lim_{\Delta t\rightarrow 0} \prod_{i=0}^{\frac{T}{\Delta t}} p(s(i\Delta t))^{\Delta t}

That corresponds to the integral I've given already in the "converting strain to difficulty" section (which people didn't like the results from).
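Explicitly, taking logs turns the product into a Riemann sum, so the limit equals

\exp\left(\int_{0}^{T}\ln p\left(s(t)\right)\,dt\right)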

@tr3sleches

tr3sleches commented May 20, 2019

You're half right. What integration method did you use? For me, the values turn out just fine.
In direct response to your comment: you are right about the calculation of the arithmetic mean, it doesn't make much sense in this context. Instead it would be similar to how you have it above, like a geometric mean. The concept doesn't change, just the computation.
In regards to the addition to the expected time to FC: with each retry, you restart the whole map, so the change in time to FC is map_time*(1/Pr(FC until map_time))*dt.
Assuming independence between objects, and that the probability is memoryless, means that you can take the computed probability of the last object outside the integral.
image

This seems like a humongous number, but if you take out the t from the integral, then you are just restarting from the last object instead of from the beginning of the map, which is not your intention.
Sorry for the long time between responses, been busy.

Edits: trying to load this goddamn image without the browser restarting, and changing some wording. Missed a parenthesis.
Edit from my comment in Issue #92: I just saw a flaw in the integration I just did. It's vastly overstated, because the properties don't really work since this expected value isn't really calculated properly (which is okay, since "proper" doesn't really matter given how it's used right now).
Multiplying the integral part by 2/(t_(n+1) - t_n) for each object n fixes it, but it bugs me XD.

@joseph-ireland

OK, here's the full derivation of that integral I mentioned earlier:
integral

As I said, I tried it and it wasn't an improvement, so it's not currently in there.

It's possible to follow the integral all the way through the "expected time to FC" calculation, but you've got an error there. Say you've got a 10s empty map: the expected time to FC should be 10s for everyone. In that case your integral simplifies down to the integral of t dt from 0 to 10, i.e. not 10s.

I think the expected time per play is this:

image

and you can use that to calculate expected time to FC.

It's not going to be significantly different though, since the discrete case is a good approximation. The discrete case essentially takes the failure probability spread between time t_i and t_(i+1) and concentrates it all on time t_i. The average failure time of each note will likely shift forward about 100ms in the continuous case, so the difference is going to be similar to adding 100ms of time to the start of the map, which doesn't make a big difference.

@tr3sleches

I noticed the problem with the squares too and fixed that in the last sentence.
Multiplying each segment by 2/(time_interval) fixes the error.
Taking the t out would give you more pleasant numbers, but would have you restarting the map at each interval instead of at the beginning. Think about it: at the end of the map, say you have a diff spike that requires 10 tries; you will have to restart the map 10 times just due to that, and you are going to have to go through the rest of the map, possibly retrying at other points. That is why the t is in the integral. Without the t, if you divide dt by the probability to FC, you are going over that dt 1/Pr(FC) times instead of restarting the map 1/Pr(FC) times, which I thought was the idea. Honestly, I think getting rid of the t is better, but then you will be giving up the whole "restart the map when you miss" thing (which I am totally ok with).

Expected time per play would be the weighted average of the times until you fail (obviously). The problem here is that your probability function technically isn’t a valid probability function, so the survival function method of finding the expected value (what you are doing here) won’t work. You might get a number you can work with, but it won’t be the expected time per play.

You have to fix the domain over which these probabilities and such are defined. Just shift your time a bit; sure, that can work.

@tr3sleches

tr3sleches commented May 20, 2019

Ignore my fix in the last comment. The reason my integral is off is that the number of attempts is 1 + 1/odds, not 1/P, so my integral is int[(1 + (1-P(t))/P(t))*t)*dt]
where P(t) is the probability to FC until time t.
T_FC = int[(1 + (1-P(t))/P(t))*t)*dt]
This simplifies to:
T_map - 0.5*T_map^2 + int(t/P(t)*dt)

@joseph-ireland

You've got mismatched parens, but it seems like that actually simplifies to just the integral of t/P(t) dt, no?
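(Since 1 + (1-P)/P = 1/P, the integrand (1 + (1-P(t))/P(t)) * t is just t/P(t).)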

@joseph-ireland

You've been making a new suggestion every few days for a while now, and none have made sense to me so far, including this one. If I were you and I wanted to make a contribution, I'd put more effort into actually understanding what's there already. Don't trust some intuition-based argument I made? Try to make it more rigorous or use a different method to get the same result; I don't see how making some other hand-wavey hypothesis and rolling with it until you find an error is a good strategy.

I honestly think, given the assumptions I've made at the start, that most of what's there is just objectively right, and I don't see loads of room for improvement. I've derived things with different methods and still get the same answer; I've checked the reasoning against more common problems, such as the number of coin flips to get n heads in a row, and I get the right answer; I quite rigorously take the limit as delta t goes to zero, and I get the same result as just converting the sums and delta t's into integrals and dt's. Everything just makes sense as is.

TBH I'm quite surprised by how well everything has just worked itself out. I went in completely blind with pure theory and all of the math just clicked into place, and as I added more things to the model people say pp values are improving. From my point of view, random PP changes popped out. I was just pleased that they kind of matched what was already there, but top players say the results are better than before.

And I'm by no means having a go at you here. This stuff was floating around in my head for months before I even wrote anything down, and I've had a lot of time to think about things since then. And it's always easier to understand your own ideas than it is to understand someone else's. I'm just getting tired of all these comments that, to me, completely miss the point.
