Try harder to plot all areas of overlap #84
Comments
Yes, this is basically related to #67, and I'm definitely open to alternatives to the existing least-squares loss function. It is, however, often impossible to plot all intersections, so no loss function can guarantee that every overlap appears in the plot. Do you have any particular loss function in mind? Maybe just some combination of proportional and absolute loss, I guess. |
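To make the "combination of proportional and absolute loss" idea concrete, here is a minimal sketch. It is not part of eulerr: the function name `combined_loss`, the blend weight `w`, and the guard `eps` are all hypothetical, and it works on plain vectors of target and fitted areas rather than on the solver's parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not eulerr code.
// Blends squared absolute error with squared proportional (relative) error.
// w = 0 gives the plain sum of squared differences; w = 1 gives a purely
// proportional loss, which punishes losing a small region much harder.
double combined_loss(const std::vector<double>& areas,
                     const std::vector<double>& fit,
                     double w = 0.5,
                     double eps = 1e-12)
{
  assert(areas.size() == fit.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < areas.size(); ++i) {
    const double abs_err = fit[i] - areas[i];
    // eps guards against division by zero for empty target regions
    const double rel_err = abs_err / std::max(areas[i], eps);
    loss += (1.0 - w) * abs_err * abs_err + w * rel_err * rel_err;
  }
  return loss;
}
```

With target areas `{100, 1}`, the purely absolute loss (`w = 0`) scores "large region off by 1" and "small region dropped entirely" identically, while the purely proportional loss (`w = 1`) rates dropping the small region far worse.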
I just came here to look for a similar workaround, wondering if I could circumvent the loss function to "push harder" to display overlaps. It looks like the magic happens inside |
The part in the code where this is computed is here: Lines 125 to 131 in 625d7ff
So adding new loss functions should be quite easy. |
With this:
// compute loss between the actual and desired areas
// original is minimizing sum of square of area difference
// modification:
// Let x_ be the estimate of x -- the region area
// given 0 < x <= 1, x_ >= 0, with sum(all x) == 1
// A perfect fit would have
// x = some constant * x_ for every x_,x pair
// Rewrite to
// x / x_ = some constant r, r > 0
// The estimated r_ is sum(x)/sum(x_)
// Minimizing the sum of squares of (x/x_ - r_) is not good, as each fit
// has a different r_. So better to use the expression (x/x_ - r_)/r_,
// so the loss is: sum of squares of ((x/x_ - r_)/r_)
//
// Note that x_ can be zero. In that case, a small value is used instead.
// This tweak is by design, to discourage/remove empty regions in the
// final solution. This is why x/x_ is used instead of x_/x.
//
// [[Rcpp::export]]
double optim_final_loss(const std::vector<double>& par,
const std::vector<double>& areas,
const bool circle)
{
const auto small_value = 1e-10/areas.size();
auto fit = intersect_ellipses(par, circle, false);
auto sum_areas = std::accumulate(areas.begin(),areas.end(),0.0);
auto sum_fit = std::accumulate(fit.begin(),fit.end(),0.0);
auto x = areas; std::transform(x.begin(),x.end(),x.begin(),[sum_areas](double x){ return x/sum_areas; });
auto x_ = fit; std::transform(x_.begin(),x_.end(),x_.begin(),[sum_fit](double x){ return x/sum_fit; });
auto r_ = std::accumulate(x.begin(),x.end(),0.0)/std::accumulate(x_.begin(),x_.end(),0.0);
// now raise tiny values in x to small_value * r_, so that the ratio is r_ when x_ is also close to 0
// Also keep this loss function continuous
std::transform(x.begin(),x.end(),x.begin(),[small_value,r_](double a)->double{return std::max(a,small_value * r_);});
// now adjust x_
std::transform(x_.begin(),x_.end(),x_.begin(),[small_value](double a)->double{return std::max(a,small_value);});
auto ratios = x; std::transform(ratios.begin(),ratios.end(),x_.begin(),ratios.begin(),std::divides<double>());
std::transform(ratios.begin(),ratios.end(),ratios.begin(),[r_](double r)->double{auto diff=(r-r_)/r_; return diff*diff;});
return std::accumulate(ratios.begin(),ratios.end(),0.0);
}
It seems to improve plots with 4 sets with respect to plotting all areas, but gives no good results with 5 or more sets. My test cases are randomly generated. I am not sure whether no much better solution exists or the solver simply cannot find it.
I am working on this as a low-priority task, so I am not ready to go deep into the solver yet. |
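To see why the ratio-based loss above pushes against empty regions, here is a standalone sketch that applies the same formula to already-computed region areas instead of calling `intersect_ellipses`. The helper names `ratio_loss` and `squared_loss` are hypothetical. One simplification is assumed: since both vectors are normalized to sum to 1, `r_` evaluates to 1 (up to rounding), so the sketch uses 1 directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: the ratio-based loss from the comment above, applied
// to fitted region areas directly. Assumes both inputs have a positive sum.
double ratio_loss(const std::vector<double>& areas,
                  const std::vector<double>& fit)
{
  assert(!areas.empty() && areas.size() == fit.size());
  const double small_value = 1e-10 / areas.size();
  const double sum_areas = std::accumulate(areas.begin(), areas.end(), 0.0);
  const double sum_fit = std::accumulate(fit.begin(), fit.end(), 0.0);
  assert(sum_areas > 0.0 && sum_fit > 0.0);
  double loss = 0.0;
  for (std::size_t i = 0; i < areas.size(); ++i) {
    // normalize, then clamp tiny values so the ratio stays finite
    const double x = std::max(areas[i] / sum_areas, small_value);
    const double x_ = std::max(fit[i] / sum_fit, small_value);
    const double diff = x / x_ - 1.0; // r_ is 1 after normalization
    loss += diff * diff;
  }
  return loss;
}

// Plain sum of squared differences, for comparison.
double squared_loss(const std::vector<double>& areas,
                    const std::vector<double>& fit)
{
  assert(areas.size() == fit.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < areas.size(); ++i) {
    const double d = areas[i] - fit[i];
    loss += d * d;
  }
  return loss;
}
```

For target areas `{0.5, 0.3, 0.2}` and a fit `{0.5, 0.5, 0.0}` that drops the third region, `squared_loss` stays small (about 0.08), while `ratio_loss` becomes enormous because the missing region's ratio is divided by the tiny clamp value. A fit that is merely a rescaled copy of the targets gives a `ratio_loss` of essentially zero.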
It seems that eulerr simply tries to minimize the residuals when laying out the plot. For plots with many areas of overlap, some of which can be small, eulerr thus fails to plot some of these small overlaps, which can be problematic.
It might make sense to add a mode where plotting all areas of overlap is given priority over making all area sizes as close as possible to the data?
Here's an example:
The B&C&D overlap is completely missing from the plot (A is Path1, B=2, C=3, D=Volunteers):