lengthBonus as a Summation of Difficulty Instead of Object Count #64
It's true that short maps with DT (+HD) are somewhat overweighted, but I think the current math nerfs them a bit too hard. |
I like the idea of this rebalance. Researched very well. I pray that peppy will actually see this, and not ignore it like he does with actually good suggestions. |
How are you using lengthBonus to determine the difficulty of the map? I.e. how are you factoring it into the star rating? Furthermore, if it's also supposed to be a completely separate difficulty attribute alongside aim/speed, is there a similar formula for pp that you can provide? Where total strain is the summation of exponentiated strains, and difficulty value is the summation of weighted strains. As you can see I've just added it into the final SR, but idk if that's what you intend. Also, you mention stars in the code but the sum of weighted strains as per your formula; which one is correct? Because SR is effectively the sum of scaled weighted strains. Also, one thing I noticed as part of your formula is that you weigh the first strain by |
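(For context, a minimal sketch of the "sum of weighted strains" that the current difficulty value refers to; the descending sort and the 0.9 decay weight are assumptions based on the live difficulty calculator rather than anything stated in this thread.)

```python
def difficulty_value(strains, decay_weight=0.9):
    """Sum of weighted strains: sort the section strain peaks from hardest
    to easiest and sum them with a geometrically decaying weight, so the
    hardest sections dominate the result."""
    total = 0.0
    weight = 1.0
    for strain in sorted(strains, reverse=True):
        total += strain * weight
        weight *= decay_weight
    return total
```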
lengthBonus will not affect the SR of the map, it'll only affect the pp of it |
What about this map? https://osu.ppy.sh/beatmapsets/429411/#osu/939197 |
Ah I see, so this is a pp processor implementation that essentially requires two new attributes. |
Is there any way to adjust the bonus relative to the length of the whole map? Maybe changing the 2 from 2+log into a constant or something similar. Edit: just to be clear, I mean changing it from 2+log( to a+log(, where a is a constant like b, c, and d.
I personally have no issues with this result, but I think the ability to fine-tune the extent to which this happens would make the algorithm much better for the long term. You mentioned that the algorithm buffs normal/hard maps too much, so if I understand the purpose of that 2 correctly, changing this value could help fix the problem. |
The effect you're observing in your first point was entirely intended. As the bottom function you display includes a 3 (aka the new hardest point of the map), the star rating increases, which would cause the length bonus to decrease. This is designed so that maps with difficulty curves that spike don't get overly high length bonuses, because the sections, compared to the diff spikes, would be so easy that they would not affect the difficulty to FC for a player capable of FCing the spike. As for the second property, a short map obviously would receive more bonus for the section being added. If it didn't, then length would scale infinitely and maps like The Unforgiven would give 1000pp. I don't see why these properties would be desired. |
While I agree that the length bonus itself does not _absolutely need_ to be monotonic when adding hit objects to the end or beginning of a map, being monotonic would also ensure that the final pp value itself is monotonic, which _must_ be the case. Otherwise the algorithm is broken.
If additional stuff is added to the end or beginning of a map, then the pp for SSing it should _never, ever_ decrease.
If the length bonus is _not_ monotonic (as your suggested algorithm is not), then you need to be extra careful to make sure the final pp value still is. Ideally with a formal proof, because this absolutely must not go wrong.
|
Seconding Tom94. The reason I hope the length bonus function is monotone and submodular is that, in that case, the final pp function is naturally monotone and submodular as well. Now let me elaborate on submodularity. |
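(For reference, a sketch of the two properties being requested here, written as conditions on a set function f over a map's hit objects; the set-function framing is a paraphrase of the comments above.)

```latex
% Monotone: adding hit objects never decreases the value.
A \subseteq B \implies f(A) \le f(B)

% Submodular: the marginal gain from one extra hit object shrinks as the map grows.
A \subseteq B,\ x \notin B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)
```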
The only case in which the additional stuff being added to the end of the map would cause the |
Would it be possible to scale strain for each subsection off of the maximum aim/speed up to the subsection compared to the aim/speed of the current subsection? It'd allow for maps with a difficulty spike in the middle to still maintain the amount of pp from the beginning of the map to that point, but it'd lessen the pp gained afterwards (but still be greater than the amount from the beginning to the diff spike) |
Is it not the case that length is already implicitly accounted for in the algorithm (since you're summing the strains + a decay factor)? It looks to me like you could reduce the decay if you wanted to explicitly benefit longer maps, and remove the length bonus altogether. This would require re-tuning all the parameters though, but it might be the best way to go? |
I personally suggest looking at the class of Archimedean t-norm or Archimedean t-conorm functions. I think it is easy to turn a t-conorm function into a set function, and we will naturally have monotonicity. Apart from that, we will also naturally have null-additivity. It is a large class though, and many such functions are too trivial, so it is just a personal suggestion. |
Is there a chance that this will be implemented in Catch too? I feel like it might partially fix maps like Image Material |
Any updates? |
When? |
When it's ready. |
Hello, Good news! Updates! While there is still balancing and testing to be done, I believe I have an amended algorithm that should help solve the following issues:
- Separation of lengthBonus
- Universal Star Rating Balance
- Spikes Do Not Lower PP

Expect an update to the original post sometime in the future (soon) with the amended algorithm and new data supporting its implementation. Thank you for reading, Xexxar (For those who care about the math, |
I actually support this change, though I'm doubting some of the vague parts of your latest comment, such as
I'm not sure how the bonus balances out. If the easier map gets the bonus in the previous algorithm, that would make it an issue. However, being independent of the star rating doesn't sound like it will fix it? |
The SR already scales the pp itself, and therefore the absolute increase in pp will be higher for harder maps; the percentage stays the same. The percentage shouldn't be different, as it is just meant to reflect consistent difficulty, no matter whether the map is high or low SR. |
Great to see we're getting updates on this! :) |
Here's some sample data with this change: Original (reference, no changes): This change + speed bonus Note that I've been told the length bonus and speed bonus changes are supposed to be included together. |
@abraker95 just read through your example in more detail, I really can't see why that format is any better. You're basically defining a function step by step that you can tweak so it looks about right to you. There's nothing special about it, it has no interesting mathematical properties other than its limits. As far as I'm concerned, that's no different to the current PP system which was created in the same way. There's no way to statistically test anything either, so everything is just hypothesis with no test possible. I've defined a function which does make verifiable predictions, and does have interesting mathematical properties, then I go on to show how those properties improve the PP system. It's still an arbitrary choice though, and I don't think there's any possible sound justification without collecting a lot of data and analysing it. I don't have a lot of data. I do have the existing PP system though, which, like it or not, is actually not bad, so I don't think it's a terrible idea to use it as a benchmark. |
@joseph-ireland I went very verbose with my process of explaining. While it helps greatly to understand why the functions are the way they are, and gives insight into model construction, you don't really need to be that detailed. Just detailed enough to understand where the whole thing comes from. Interesting properties and testable predictions aside, if you can't trace the logic that leads to the conclusion of that model, then that sends up red flags that the model is wrong. tr3sleches seems to be more lenient in that regard, and he ignored logical traceability and went ahead to state faults within your model. I usually don't bother continuing that far if the models are created from thin air. The existing pp system generally works, but is very, very faulty down to the core and needs to be redone from scratch. I don't mind that people are adding on to it, but I do mind if further additions are created from thin air and trial and error like much of the existing pp formula. |
He erroneously (and unceremoniously) stated faults that were at most a miscommunication (I believe). And there is no justification for the skill->hit probability function except that it looks about right, has desirable properties, and gives good results, in the same way that there's no logical justification for most of the PP system and there's no logical justification for whatever it is that you posted earlier except that it looks about right (maybe) and gives good results (maybe). What exactly do you want to see? And there is logical justification for almost everything else I've written. As far as I'm concerned, nothing we can do will ever be fully "right", since there'll always be more factors to model. All that matters is whether it's a better model than the current best, and the only real way to judge that is to look at the results. I'll admit that what I've done isn't good enough to publish in a journal; for that I'd need to get lots of data and look at the stats to see how accurate the model is. But I'm not publishing this in a journal, and the fact that this is even possible is a big step forward from anything we've already got, including anything you've done. To be honest, I think the only reason you are so opposed to this is that it mostly achieves something you wanted to do yourself but haven't succeeded at so far, because you set your standards too high and wanted to start again. All your comments reek of second-system syndrome and not-invented-here syndrome, both of which are a big problem for inexperienced engineering teams and present a huge barrier to progress in any system. Also, your standards aren't even how things are done in academia. Things usually go from vague intuition to definitions, then theorems, and finally evaluation. An analogy to your position is seeing "assuming that the population is normally distributed" at the start of a paper, and then discarding the whole thing because they haven't logically deduced that the population is actually normally distributed, even if the results show that the population is in fact approximately normally distributed. I'm not interested in going round in any more circles regarding this. |
Here's another analogy: Newton's law of gravity. Is there any logical justification for that? Not really, it just seems to work, and experiments show that it does. It's actually completely wrong; it doesn't take a lot of things into account. General relativity is an improved model, but it's still wrong; it doesn't take quantum mechanics into account. NASA still used Newton's laws to get to the moon though. |
There are always more factors to model, but you can still be correct without factoring in everything, by stating that those factors are outside the scope of what you are focusing on. However, the stuff you presented does not derive from sound logic and is therefore very likely not correct. There is a chance it can be correct, but the odds of landing on the correct formula blindly are basically none. Which is why there needs to be justification for the model you are proposing, via an explanation of what you need the model to behave like, instead of stumbling across something that behaves similarly to what is desired. All the borderline flaming aside, I am not speaking against you, I am simply trying to help fix the proposal. I may address the rest in DMs, otherwise this will devolve into off-topic. |
You keep saying "wrong" and "not correct" but it's a model, it's not supposed to be correct. The only thing it can be is inaccurate. I'm treating the strains themselves as a black box, they're just a number, I've not done any reasoning on their behaviour other than bigger numbers = more difficult. I think you'd probably agree that it'd be hard to give them any "correct" theoretical significance other than an arbitrary measure of difficulty, so I don't see how you can say anything derived from them is right or wrong. What I can say is that if the tail behaviour of the generating function f(s) is regular in some way, then it's likely to be closely related to the "first attempt". Assuming the tail behaviour of f(s) is similar to (1-x^-k), then the function I've given is a good approximation for all cases we care about, and is easy to calculate. Here's a graph showing the difference between two different models with the same tail behaviour for a 1.5 star map. They match really well, and they only match more as star rating increases. If the tail behaviour is similar to 1-e^-kx, then you get the "first attempt". Any other tail behaviour can be calculated by just taking p_u(d)(u(x)) for some substitution function u. From my initial guess, I took u(x) = log(x) because it seemed to work, and because it's simple and the end result has nice mathematical properties. The actual tail behaviour will depend on the strains, which are just arbitrary difficulty measures, so I don't see how you can possibly get anything more "correct" than an educated guess. |
Honestly, you are better off summing and/or integrating (depending on whether you are doing this continuously or discretely) for the expected time to FC and normalizing that to find the skill required. I personally solved it using the MGF of the data with respect to player skill, because these can be easily approximated with said data (all you have to do is find the moments). |
I am already summing for the expected time to FC. The formula you've given isn't correct: the expected number of times you attempt a note while going for an FC is equal to 1/(probability you hit that note and every note after it), since every time you miss a note later than it, you need to retry and play it again. normalizedTimetoFC = sum(delta_time/Pr(success of all strains s(t) from time t to t_last | player skill x)), which is what I've already got. I don't really see how the MGF will help solve for player skill; the distribution of time to FC has a lot of parameters, and it changes when you change player skill. We've already calculated the expectation, and we don't need to find the variance or any other moments, so I don't see any use for the MGF. The actual equation to solve looks something like this: Wolframalpha is better at solving stuff than me, and even with just 4 terms it chooses to solve it numerically, so I'm not too hopeful for an exact solution, although if you can find one, and it can be computed faster than the numerical solution, then fantastic. |
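(A minimal sketch of the summation described above; `delta_times` and `hit_probs` are hypothetical inputs, with `hit_probs[i]` standing for the probability of hitting note i at the given player skill.)

```python
def normalized_time_to_fc(delta_times, hit_probs):
    """Expected time to FC: each note's delta time is divided by the
    probability of hitting it and every note after it, since a later miss
    forces the player to replay this note."""
    n = len(hit_probs)
    tail_probs = [0.0] * n
    survive_to_end = 1.0
    for i in range(n - 1, -1, -1):
        survive_to_end *= hit_probs[i]   # P(hit note i and all later notes)
        tail_probs[i] = survive_to_end
    return sum(dt / p for dt, p in zip(delta_times, tail_probs))
```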
If you do that formula, then difficulty at the end will be worth more than difficulty at the beginning, which doesn't make sense to me. I get how that can be used to find the expected time to FC, but it seems counterproductive to have the impact on difficulty change depending on where the difficulty is in the song (looks like I'm assuming independence now lol). |
It does make sense, you can just spam retry a diff spike at the start until you get it, but you can't spam retry a diff spike at the end of a 10 min map. |
I’m talking about in reference to difficulty calculation. As long as the values are the same, anything pertaining to difficulty should not change. Having a diff spike at the beginning or the end should not make the map register as easier or harder if everything else stays the same. Adding objects could potentially decrease your difficulty, which should almost never happen (it would have to be some outrageous circumstances for this to happen). |
Adding objects won't decrease difficulty, it can only increase time spent waiting around and increase the probability of failure, both make difficulty increase. Moving a huge diff spike from the end of a 10 minute map to the start absolutely makes it easier to FC by quite a bit and the calculation reflects that. |
It needs to be more than just a guess that ends up with a coincidence. Please come up with a reason to use log(x) prior to examining the mathematical properties that result from it. I want to know how you came up with it. That e^-e^-x function you made up behaves very similarly to an S-curve, a function whose uses and properties are known well enough to understand what it is being used for: I want to see where your formulas come from. They have to derive from some observation that concludes the curve changes in a particular way that rules out any other function. |
The graphs you've posted there are the generating function f(s). For difficulty to be fully characterised by a single number, and for the system to be closed under multiplication, all other functions p_d(s) need to be a power of that function. Try raising both of those to the nth power and see what happens (hint: they're almost exactly the same; the difference is that mine doesn't change shape for very small n). The way I originally found e^-e^-(x-d) was by calculating the limit as n->infty of 1/(1+(e^-x)/n)^n, since I identified that as the shape produced by a whole bunch of generating functions I tried; here's a visualisation of that from my original post. It is precisely the S curve I intended. All other S curves can be transformed into that one using an increasing bijection u(x), so you can easily perform all the calculations for any S curve without any major code changes. I originally chose log(x) because it's simple, and because it maps R+ into R, and the strains are in R+. I didn't find out until later how nice the interpretation was and how well it reflects the current PP system. Any theoretical justification for a choice of any u(x) would have to reference properties of the strains themselves, and I don't see a way forward there. |
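(A quick numerical check of the limit mentioned above, i.e. that 1/(1+(e^-x)/n)^n approaches e^-e^-x as n grows; the sample x and n values are arbitrary.)

```python
import math

def finite_n_curve(x, n):
    # the family of curves before taking the limit
    return 1.0 / (1.0 + math.exp(-x) / n) ** n

def limit_curve(x):
    # the limiting S-curve e^-e^-x
    return math.exp(-math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    # the finite-n values approach the limit curve as n increases
    print(x, finite_n_curve(x, 10), finite_n_curve(x, 1000), limit_curve(x))
```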
That’s great and all but your probability function should be based on observations and conclusions from the map data itself, This would be an example of a start to something like what I’m talking about. I would model this to an exponential distribution with a moving failure rate (that’s just me though). If you build these things up, I think this can really work (much better than the selections above you with that log BS). You should be able to construct hazard, survival and failure rate functions and have them make sense as well. With this, you’ll just get functions with no reasoning behind them, which defeats the whole point of this exercise. |
Sorry I've been on holiday with no PC, didn't get a chance to address this stuff. @abraker95
I don't think this is true, I think some kind of S curve is more natural. The function I've given satisfies the other properties you've listed, and I really think it's the simplest and most natural function describing FC probability.
This doesn't make sense to me. You haven't said what is exponentially distributed, or what "moving failure rate" means, and it isn't immediately obvious to me. The probability functions I've been working with are just functions from skill->FC probability, not probability distributions. Just taking a wild guess, if you're saying to take the CDF of an exponential distribution, f(x) = 1-e^-kx, and use that as a generating function, then that would give similar results to the "first attempt" as addressed in my last post. It's something you could try, but I don't think it'd work any better, and it's definitely not any more elegant or well reasoned. If you want to do something else, then that probably won't be closed under multiplication. That doesn't necessarily make it wrong, but it does make everything more complicated and potentially hard to compute efficiently.
There is reasoning behind everything, including the initial assumptions, but they can't be logically airtight like you want, in the same way your little list of axioms isn't, or any list of axioms for that matter. You need to start somewhere, and I think I started somewhere reasonable, as I tried to explain in the previous post, but only proper experimentation can verify or disprove that. You're always welcome to improve things further if you can, but maybe think things through more and test it before calling other stuff BS. |
I meant much better than Xexxar's length bonus above; yours was alright with some tweaks. Best of all, this converts into a continuous function even more easily than the decay weight system (status quo) does. Going forward with your second plan would be much harder (although admittedly more holistic). You can't just put that delta time on top, divide by that probability and be done with it. The probability depends on time too (discrete or not). You cannot just put a constant all through there and expect it to be fine. Otherwise why bother? What I meant by moving failure rate is that it changes with strain, because higher strains have a higher chance of failing and hence a higher rate of failing. The constant in that (I use the λ one) is your failure rate, which is the probability of FC-ing the next note given that you FC'ed the rest, because of independence. Find that probability (you already can, based on the probability per note). Then use it to make your running probability based on strain and time (it's a little more difficult than I've stated here, but plenty doable). Luckily, there are a lot of probability rules that can help. Tl;dr: lol, this is hella long. |
I only divide by n in the limit to stop the positive part moving off to infinity, otherwise the limit will be zero. You could also think of it as 1/(1+e^-(x+ln(n)))^n, which makes it clearer that it's only shifting the curve along the x axis, so it's not changing shape. I really don't see why you'd think that the amount of time you've been playing could possibly affect the probability you hit a note, like you seem to be suggesting. I assumed at the start somewhere that the probability of hitting a note is fixed; I'm not trying to model player fatigue or anything. Probability of hitting a note does affect time to FC, but that's expected, and doesn't falsify any calculations. You can just put delta time on top, divide by probability and be done with it, and I explained why quite thoroughly IMO. I feel like you're looking at the end result and saying "that's wrong" without looking at the reasoning. You can apply the same reasoning to simpler problems like the expected number of coin flips to get n heads and you get the right answer. This guy's derivation almost matches my reasoning, and using his reasoning for this problem gives the same answer as I've got. If you don't like the derivation, tell me what about it you don't like, or try to derive it yourself; don't just say "that's wrong". The rest of your post seems to just be suggesting loads of alternatives, not saying what's wrong with what I've got. |
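(For reference, the coin-flip analogue mentioned above: with per-flip success probability p, the expected number of flips needed to see n consecutive heads has the same "1 / probability of the whole remaining run" structure, since flip position i is attempted an expected 1/p^(n-i+1) times.)

```latex
E[\text{flips until } n \text{ consecutive heads}]
  = \sum_{i=1}^{n} \frac{1}{p^{\,n-i+1}}
  = \frac{1 - p^{n}}{(1 - p)\,p^{n}}
```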
Even if it is before, the problem still stands. If you have a long pause before a huge diff spike section, it can still be heavily overweighted. |
While I like this way better, if you just take the expected value of p(s) over each time interval (which is p in the paper above) and use that as the p in T_FC, that would be fine too. |
What's the expected value of a function? You mean the mean value, i.e. the integral between t0 and t1 divided by t1-t0? I think that doesn't make sense by itself: if there's one section with p(success)=1 and another with p(success)=0, then the mean will be 0.5, but the actual success probability will be 0. If we say p(s) represents the probability of success for a given strain over 1 second, then the probability of success for a given strain over T seconds is p(s)^T, i.e. the product of p(s) for each second played. For continuously varying strain, it seems that the natural extension is this: That corresponds to the integral I've given already in the "converting strain to difficulty" section (which people didn't like the results from). |
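(A minimal sketch of the continuous limit just described, treating the map as a product of per-second success probabilities and letting the slice width go to zero; this is a reconstruction under the assumptions stated above, not necessarily the exact integral from the original post.)

```latex
P(\text{FC}) \;=\; \lim_{\Delta t \to 0} \prod_{i} p\big(s(t_i)\big)^{\Delta t}
          \;=\; \exp\!\left( \int_{0}^{T} \ln p\big(s(t)\big)\, dt \right)
```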
You’re half right. What integration method did you use? For me, the values turn out just fine. This seems like a humongous number, but if you take out the t from the integral, then you are just restarting from the last object instead of from the beginning of the map, which is not your intention. Edits: Trying to load this goddamn image without the browser restarting and changing some wording. Missed a parenthesis |
I noticed the problem with the squares too and fixed that in the last sentence. Expected time per play would be the weighted average of the times until you fail (obviously). The problem here is that your probability function technically isn't a valid probability function, so the survival function method of finding the expected value (what you are doing here) won't work. You might get a number you can work with, but it won't be the expected time per play. You have to fix the domain over which these probabilities and such are defined. Just shift your time a bit; sure, that can work. |
Ignore my fix in the last comment. The reason my integral is off is that attempts is 1 + 1/odds, not 1/P, so my integral is |
You've got mismatched parens, but seems like that actually simplifies to just |
You've been making a new suggestion every few days for a while now, and none have made sense to me so far, including this one. If I were you and I wanted to make a contribution, I'd put more effort into actually understanding what's there already. Don't trust some intuition-based argument I made? Try to make it more rigorous or use a different method to get that result; I don't see how making some other hand-wavey hypothesis and rolling with it until you find an error is a good strategy. I honestly think, given the assumptions I've made at the start, that most of what's there is just objectively right, and I don't see loads of room for improvements. I've derived things with different methods and still get the same answer. I've checked the reasoning against more common problems, such as the number of coinflips to get n heads in a row, and I get the right answer. I quite rigorously take the limit as delta t goes towards zero and I get the same result as just converting the sums and delta ts into integrals and dts. Everything just makes sense as is. TBH I'm quite surprised by how well everything has just worked itself out. I went in completely blind with pure theory and all of the math just clicked into place, and as I added more things to the model people say pp values are improving. From my point of view, random PP changes popped out. I was just pleased that they kind of matched what was already there, but top players say the results are better than before. And I'm by no means having a go at you here. This stuff was floating around in my head for months before I even wrote anything down, and I've had a lot of time to think about things since then. And it's always easier to understand your own ideas than it is to understand someone else's. I'm just getting tired of all these comments that to me just completely miss the point. |
Hello,
In collaboration with MBmasher, I present a new solution to the length problem.
The Problem
Currently the algorithm treats length as the total object count of a map. While this idea certainly has merit, it has several problems. For example, a 2 minute stream map that has lots of 1/4 rhythm can be treated as longer than a 5 minute jump map simply because its objects are more densely packed. In addition, the difficulty of these objects is not assessed. A map that has relatively consistent difficulty would get the same length bonus as a map that has difficulty spikes, assuming the object count and other factors were equal.
Thus, something new is needed.
The Approach / The Math
In order to properly balance length, we need a system that assesses the aggregate difficulty of a map relative to its star rating. To make this a bit more visual for those who aren't seeing where I'm going with this, here are some graphs that display the difficulty curves from both speed.cs and aim.cs:
http://puu.sh/BNTOR/cf7fa8a2b2.png
http://puu.sh/BNTPh/cd6a05b751.png
With these images, it should be clear what I intend to do. By summing up the area under the curve, we can reward length not only in terms of real time, but also balance the reward against the difficulty of each section. Without any more delay, the math for this suggestion is shown below.
Formal: http://puu.sh/BNTWD/e5090554ca.png
Code:
Where `strain in strains` refers to the sum / calculation for both `aim_total` and `speed_total`, and `stars` refers to the star rating of the map.

In essence, this function takes the log of the aggregate difficulty with the log base being the star rating of the map (as I'm using the formula `log_x(y) = log(y) / log(x)`). This makes it so length is rewarded regardless of the star rating. Some people may have questions as to why I have `2+log` etc., but for the most part these constants are there to prevent huge adverse scaling for low SR maps. With the testing that I have done, the following values have been tested and I believe they produce satisfactory results.

I will say that the only major issue I have with this method currently is the slightly too large buff for Normal and Hard maps (because these types of maps don't spike at all, so they are buffed in comparison). That being said, the issues are minor compared to the benefits.
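A rough sketch of the shape of the formula described above (the exact constants and scaling from the original post are not reproduced here; `a` simply stands in for the additive constant, the 2 in `2+log`):

```python
import math

def length_bonus_sketch(strains, stars, a=2.0):
    """Sketch only: aggregate difficulty is the sum of the per-section
    strains, measured on a log scale whose base is the map's star rating,
    using log_stars(y) = log(y) / log(stars)."""
    aggregate = sum(strains)
    return a + math.log(aggregate) / math.log(stars)
```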
Some Results
Ah yes the juicy part that everyone cares about. I'm a bit concerned with the calculator that we developed to do the testing as there are a few anomalies, so some values might be slightly inaccurate. (I apologize for the small sample size). Please note that these values are still subject to change.
Within Temptation - The Unforgiven[Marathon] https://puu.sh/BNUBj/255ac14540.png
Fairy FORE - Vivid[Insane] https://puu.sh/BNUBM/583c3a33b5.png
Will Stetson - Harumachi Clover [Oh no!] +DT http://puu.sh/BNVvK/64b1959c7e.png
Will Stetson - Harumachi Clover [Oh no!] https://puu.sh/BNUFw/d1aac4c924.png
VINXIS - Sidetracked Day[THREE DIMENSIONS] +HR http://puu.sh/BNVB6/c2378d6cb4.png
MuryokuP - Sweet Sweet Cendrillon Drug[Normal] https://puu.sh/BNVbx/39f1d87226.png
MuryokuP - Sweet Sweet Cendrillon Drug[Cendrillon] https://puu.sh/BNVbe/62217919a5.png
Imperial Circus Dead Decadence - Uta[Himei] +HR https://puu.sh/BNUWU/ecdeab1163.png
Yamajet feat. Hiura Masako - Sunglow[Melody] http://puu.sh/BNVFx/3894582b87.png
Yamajet feat. Hiura Masako - Sunglow[Melody] +DT http://puu.sh/BNVGz/238f09cb50.png
GYZE - HONESTY[DISHONEST] +HR https://puu.sh/BNUVB/2ccf2a7da0.png
GYZE - HONESTY[DISHONEST] http://puu.sh/BNVIu/93fc534305.png
Fujijo Seitokai Shikkou-bu - Best FriendS (GoldenWolf)[Extreme] http://puu.sh/BNVRr/df180fdef9.png
Fujijo Seitokai Shikkou-bu - Best FriendS (GoldenWolf)[Extreme] +DT http://puu.sh/BNVSk/042ead7be8.png
The astute among you might have realized that DT universally causes the lengthBonus to be reduced. This is indeed intended and is one of the best features of this algorithm: lengthBonus varies across mods. In fact, DT universally lowers the bonus and HT universally raises it.
Conclusion
While these values aren't all perfect, I believe them to be leaps and bounds better than the current system. In the coming days MBmasher will likely release an updated calculator for those who are interested in testing some values on their own.
With that all said, I hope you all are interested in pursuing this change. Hopefully an official PR can be produced and testing can begin on the official code soon. Thank you for reading and remember to drop a 👍 if you like the change!