Failure to merge multiple pipelines automatically in query when pipeline parameters are involved #145
@gaow Can you create a small DSC example that illustrates this behaviour so I can inspect the output? |
@pcarbo sure! that'd also be the first thing I should do. I'll try to do this later today and if it is straightforward enough to fix I'll just fix it. Otherwise I'll upload the example here, or add to our test, for you to inspect it and discuss desired behavior. |
@gaow Thanks. It would be great if I could first understand this example before you attempt to fix it. |
@pcarbo It has not been straightforward, so I could not fix it earlier this week. I'm still working on it. I'll skip irrelevant details, discuss what I've got so far, and see how to finalize it. Currently, after running separate queries I end up with this table A:
which should further be processed to this table B:
to present it more concisely. I currently struggle with a reliable implementation to go from table A to table B. Previously I relied on SQL |
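Since tables A and B are not shown here, a rough base-R sketch of the kind of collapse being discussed (complementary NA blocks folded into one row per key, without SQL) might look like this; the data and column names are made up for illustration:

```r
# Toy stand-in for "table A": two rows per DSC value whose non-NA
# columns are complementary.
a <- data.frame(DSC  = c(1, 1, 2, 2),
                aa.n = c(5, NA, 6, NA),
                bb.p = c(NA, 0.1, NA, 0.2))

# Collapse to "table B": for each DSC value, keep the single non-NA
# entry found in each column.
b <- aggregate(. ~ DSC, data = a, na.action = na.pass,
               FUN = function(v) v[!is.na(v)][1])
```

This works only when each key has exactly one non-NA value per column, which is part of why a reliable general implementation is hard.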
Another question is: when the two parameters are different in length (as described here), or even in nature (see below), should we be merging them at all? For example:

```
  DSC aa.n bb.p
0   1    1   NA
1   1    2   NA
2   1    3   NA
3   1   NA  0.1
```

does it make sense to present it as:

```
  DSC aa.n bb.p
0   1    1  0.1
1   1    2   NA
2   1    3   NA
```

or does it make more sense to leave it without merging? Keeping it as is seems to make sense. I think it is a difficult question to answer without knowing the context of the DSC -- for example, to compare two fine-mapping tools I'd like to set the --max-causal parameter for both of them to, say, 1,2,3 and have them merged to compare 1 vs 1, 2 vs 2 and 3 vs 3 -- it makes sense to merge. But if I set the other one to 4,5,6 then it does not make sense to merge, e.g.:

```
  DSC method1.max_n method2.max_n
0   1             1             4
1   1             2             5
2   1             3             6
```

From the engineering point of view I wonder if it should be a post-processing step of the dsc-query result done at the R level, considering that whether or not to merge is context dependent and we cannot easily tell by guessing from parameter names or values. |
I think I don't understand this issue well enough. But let me say my understanding/suggestion is that the output of dsc_query should be one row per pipeline instance. Maybe discuss in person?
Matthew
|
Yes, this is the current behavior. But sometimes it is convenient to further merge the outcome into a more compact, easier-to-use table -- only if we know how to do this appropriately! Sure, let's talk about it on Wednesday. |
once you have one row per pipeline you can merge using the many available
tools in R.
No need to do that in dscquery
|
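Matthew's point -- merge the per-method results with ordinary R tools after querying -- can be sketched in a few lines of base R. The data frames below are stand-ins for the results of two separate dscquery calls, with made-up column names:

```r
# Stand-ins for two separate query results: method A has one row per
# replicate, method B has one row per parameter value per replicate.
res_a <- data.frame(DSC   = 1:3,
                    a.est = c(0.11, 0.22, 0.33))
res_b <- data.frame(DSC   = rep(1:3, each = 3),
                    b.k   = rep(c(0.5, 1.5, 5), times = 3),
                    b.est = seq(0.1, 0.9, by = 0.1))

# Join on the shared replicate column: one row per (replicate, b.k) pair.
both <- merge(res_a, res_b, by = "DSC")
```

This yields the nine-row "one row per pipeline instance" shape discussed above, without any merging logic inside dscquery itself.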
@gaow Like Matthew, I'm confused by your example---I don't see how dsc could automatically know when to merge and when not to merge. It seems this would require an understanding of how the parameters are used in the code, which would require a human-level understanding of the benchmark. |
Sorry, I am just looking at this discussion. @gaow, @pcarbo, @stephens999 I think the problem is actually that we don't get one row per pipeline run. That would be a fine solution. I made an example illustrating the behavior; I am attaching the dsc file.

Basically the problem is this. I run a pipeline simulate*(A, B)*summary where method B has three possible parameters it can run with. I run three replicates. Extracting method A results only, I expect a data frame with three rows -- this is observed. Extracting method B results only, I expect a data frame with 9 rows (one row per parameter value per replicate) -- this is also observed. Extracting both, the ideal solution would be to also have nine rows. This might be complicated, however, and it doesn't easily extend to when both methods have multiple parameters.

One solution, described by @gaow, would be to have 12 rows, with NAs for method A on nine of them and NAs for method B on three of them. However, what is observed is a data frame with three rows where only the first parameter value for method B is extracted. Another possible solution would be to go to wide format, where the results for method B give three columns, each corresponding to a different parameter value. |
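The wide-format option mentioned at the end of that comment can be sketched with base R's reshape; the data and column names here are toy stand-ins for method B's output:

```r
# Long format: one row per (replicate, parameter value) for method B.
long <- data.frame(DSC   = rep(1:3, each = 3),
                   b.k   = rep(c(0.5, 1.5, 5), times = 3),
                   b.est = 1:9)

# Wide format: one row per replicate, one b.est column per parameter value.
wide <- reshape(long, idvar = "DSC", timevar = "b.k", direction = "wide")
```

This gives three rows and three `b.est.*` columns, one per value of the parameter.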
Thank you @jean997 for clarifying it! Would you mind updating to the current
Would you verify if it is the case? And ideally, yes, we want to be able to reliably achieve this:
But I'm having trouble with this, because it is logically difficult to properly determine and merge those. Since for a simple case such as yours the merging would make sense, I'm now more inclined to add a step in the R utility to post-process it depending on the user's need. RE the |
@jean997 I fixed the
You'll need to install the current @pcarbo Your previous assessment is pretty much what happens, but to get a more concrete idea you can still update DSC, run Jean's example, and see for yourself. This is a pretty good example that Jean put together. The DSC part is just
|
Thanks @pcarbo for confirming it! To motivate my claim that we should provide some post-processing utility, I updated my post above with an example from Jean's data that we actually may want to merge. Posting it again here:

```
> dscout
  DSC mean.est_mean huber_mean.est_mean
1   1   0.008447484                  NA
2   2   0.075343752                  NA
3   3   0.024550986                  NA
4   1            NA         0.009774250
5   2            NA         0.061133496
6   3            NA         0.025887638
```

is nicer presented as (or post-processed to):

```
> dscout
  DSC mean.est_mean huber_mean.est_mean
1   1   0.008447484         0.009774250
2   2   0.075343752         0.061133496
3   3   0.024550986         0.025887638
```

This was initially also requested by @jean997 via slack a while ago, and I responded by doing this automatic merging assuming there is always an equal number of NA in each missing block -- an assumption that has led to the issues reported in this ticket. Now I think we all agree not to auto-merge, for reasons discussed previously. But should we provide utility functions for post-processing it like above, if users deem it proper? @jean997 do you have some utility functions in R that you used for this sort of merger before I responded to your slack request? |
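The merge gaow describes can be done in a few lines of base R, but only under the equal-sized-NA-blocks assumption that caused this issue in the first place; the values below are the ones from Jean's example in this thread:

```r
dscout <- data.frame(
  DSC = c(1, 2, 3, 1, 2, 3),
  mean.est_mean       = c(0.008447484, 0.075343752, 0.024550986, NA, NA, NA),
  huber_mean.est_mean = c(NA, NA, NA, 0.009774250, 0.061133496, 0.025887638))

# Collapse the complementary NA blocks: keep each column's non-NA values,
# pairing them up row by row. Only valid when the blocks line up exactly.
merged <- data.frame(
  DSC                 = unique(dscout$DSC),
  mean.est_mean       = as.numeric(na.omit(dscout$mean.est_mean)),
  huber_mean.est_mean = as.numeric(na.omit(dscout$huber_mean.est_mean)))
```

If the blocks are not the same length (the case discussed earlier in this thread), this silently misaligns or fails, which is exactly why auto-merging was abandoned.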
I'll repeat my resistance to helping users with what is basically a programming task. It's up to them how to deal with results once they are in a data frame.

However, this process can be made easier by using dsc in particular ways. Here the "problem" is created by not defining a group mean_est: (mean, huber_mean). With that group suitably defined I believe you would have columns that look something like:

```
DSC, mean_est, k, est_mean
1, huber_mean, 0.5, xx
1, huber_mean, 1.5, xx
1, huber_mean, 5, xx
1, mean, NA, xx
```

etc.

Put another way, I agree that:

```
dscout
  DSC mean.est_mean huber_mean.est_mean
1   1   0.008447484                  NA
2   2   0.075343752                  NA
3   3   0.024550986                  NA
4   1            NA         0.009774250
5   2            NA         0.061133496
6   3            NA         0.025887638
```

is not how we want things. We want a single column "est_mean" and another column that contains the method used (mean/huber_mean). I believe the use of a module group would have achieved this?

Matthew
|
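The single est_mean column plus a method column that Matthew describes can be sketched in base R from the two-column table above; this is an illustration built by hand, not dscquery output:

```r
dscout <- data.frame(
  DSC = c(1, 2, 3, 1, 2, 3),
  mean.est_mean       = c(0.008447484, 0.075343752, 0.024550986, NA, NA, NA),
  huber_mean.est_mean = c(NA, NA, NA, 0.009774250, 0.061133496, 0.025887638))

# One row per pipeline instance: a method label plus a single est_mean
# column, taking whichever of the two source columns is non-missing.
method   <- ifelse(is.na(dscout$mean.est_mean), "huber_mean", "mean")
est_mean <- ifelse(method == "mean",
                   dscout$mean.est_mean, dscout$huber_mean.est_mean)
long <- data.frame(DSC = dscout$DSC, method = method, est_mean = est_mean)
```

This is the shape a module group produces directly, which is why the group is the preferred fix.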
Thank you @stephens999 ! The reasons I did not advertise too much about
@jean997 indeed, for this particular example, what Matthew has proposed can already be achieved dynamically at the query level, that is:
and you get:
Not exactly what you asked for, but it is a viable alternative. Of course we might need to document it better, which I just did:
along with a minor format adjustment. Please update |
@gaow @stephens999 @jean997 I agree that groups is the best way to resolve this. Here's my implementation of Matthew's suggestion:

```
simulate: R(dat <- rnorm(n = 1000,mean = 0,sd = 1) +
            rbinom(n = 1000,size = 1,prob = 0.05) *
            rexp(n = 1000,rate = 3))
  $dat: dat

unbiased_mean: R(m <- mean(x))
  x: $dat
  $est_mean: m

huber_mean: R(library(MASS); m <- huber(x,k = k)$mu)
  x: $dat
  k: 0.5, 1.5, 5
  $est_mean: m

sq_error: R(sq_error <- est^2)
  $sq_err: sq_error
  est: $est_mean

DSC:
  define:
    mean: unbiased_mean, huber_mean
  run: simulate * mean * sq_error
  replicate: 3
```

For example,

```
dscrutils::dscquery(dsc.outdir = "example2",
                    targets = c("mean.est_mean","huber_mean.k","sq_error.sq_err"))
```

returns the following data frame:
|
I think a group is clearly the right answer for this example. This came up for me initially in a context where groups would not have been the right choice, because methods A and B produce totally different summaries. I think @stephens999 is right that that is probably a situation where the user needs to solve their own problem, but dsc also shouldn't produce nonsensical output if it is provided with a poorly thought-out query. I think a lot of users will do what I did first, before they figure out that they need to query the results in two steps and then merge. One option would be to just have @pcarbo In your example above, should the first three rows have |
I dunno how you would solve this---if you use a programming language to develop poorly thought-out code, you will get poor results! This is a human problem, not a programming language problem. :)
@jean997 You are absolutely right. @gaow Any idea why the "mean" column only shows |
I just had a chance to re-run my example using the new version of
I think the last nine rows of this data frame should not appear? |
@jean997 Yes, this seems like another bug or undesirable outcome. |
I might be misunderstanding, but this looks correct to me. For each pipeline it has extracted the variables you asked for where it can, and filled the others with NA.

The way to exclude those rows is to filter on pipelines, I think. That is, run the query only on a subset of pipelines. I'm not sure what filter facilities we currently have in place, though.

Matthew
|
Yes, exactly, @stephens999 is correct. And this is when row filtering is needed, via conditions:

```
dscout <- dscquery(dsc.outdir="results",
                   targets=c("mean.est_mean", "sq_error.sq_err"),
                   conditions="mean.est_mean not null")
```

(need to verify) But I was mostly concerned that if we find this confusing in house, we perhaps should do something about it. I think for this situation it might make sense to automatically add an additional column:

```
mean
unbiased_mean
huber_mean
...
```

I did not find a chance to implement this during the week but will try to fix this ticket in the next couple of days. |
NULL and NA are different in R...
Not sure if that makes the syntax confusing. Presumably that is SQL syntax?
Matthew
|
I'm not convinced that if we find it confusing in house then adding a column automatically under certain circumstances is a good response. Such fixes, by adding new rules, make things harder to reason about, not easier. My suggestion would be to try to make the logic clearer... In this example that is achieved by using groups, I think.

I understand that there exists another example where the groups do not fix it, but I have not seen that example so don't really understand the issue.
|
Yes. I understand it is confusing. In particular, R has the most diverse types of missing data. Anyway, I think we should not dwell too much on that exact syntax to remove
I think adding a column automatically whenever groups appear in the query should be made the default behavior, whether there is missing data or not -- I think that's a proper fix and a general solution, not one that deals only with this specific circumstance. |
Okay, current behavior adjusted to:

```
dscrutils::dscquery(dsc.outdir = "test",
                    targets = c("mean.est_mean","huber_mean.k","sq_error.sq_err"))
```

Notice here the change of
I think making rules is not bad (in fact, I personally feel I make fewer rules than much of the software I use). I think what's important is consistency in behavior. In principle I would not propose, or tolerate, solutions that lack generality.

```
dscrutils::dscquery(dsc.outdir="test",
                    targets=c("unbiased_mean.est_mean", "huber_mean.est_mean", "sq_error.sq_err"))
```

Compare above to below: again introducing

```
dscrutils::dscquery(dsc.outdir="test",
                    targets=c("unbiased_mean.est_mean", "huber_mean.est_mean", "mean.k", "sq_error.sq_err"))
```

```
dscrutils::dscquery(dsc.outdir="test",
                    targets=c("huber_mean.k", "sq_error.sq_err"))
```

compare above to below:

```
dscrutils::dscquery(dsc.outdir="test",
                    targets=c("mean.k", "sq_error.sq_err"))
```
|
@gaow I don't have any fundamental issues with your proposal, but I do worry that it might be difficult to implement your rule in general, or to explain it. Here is a simpler rule that I think is more intuitive, easier to implement, and easier to explain: When (1) An error when (2) (3) The output will not always be as attractive or concise as it could be, but from the user's point of view it is nice when all the targets you requested appear in the output table (unless it generates an error). |
In principle yes, but in practice we do not get to filter the table until we have the complete table. We would thus have to invent a customized convention in the R code to distinguish these two types of
It is a compromise, but I'm always hesitant to add more options like this simply because we did not agree on the expected behavior ourselves. I am leaning towards filtering it, even though it involves more work than it appears. I just want to make sure there are no bad side effects. |
If it's unclear what the best way forward is, then maybe we should just wait until we have more user feedback. I am satisfied to use |
@jdblischak @gaow This is one of those situations where we didn't properly anticipate all the use cases when designing the internal data structures. |
I don't understand the example.
Is there still a conditions parameter that can be used to filter out
pipelines?
|
Yes, and I think a carefully crafted condition parameter can give clean results. But more often than not we tend to use dscquery without condition constraints, then use R to post-process it, so that we only query it once. The condition query is perhaps only useful for very large benchmarks, when it is too much computation to load every file generated for all conditions.

The pipelines are:

```
run:
  pipe_de: de * analyze * (auc, true_fdr)
  pipe_zeros: zeros * analyze * type_one_error
```

then the query targets are de, analyze, true_fdr. You see de and true_fdr are only relevant to the first pipeline, but analyze is relevant to both pipelines. As explained earlier, there is no concept of pipeline in queries. As a result, lines 36 to 45 are in fact analyze from the 2nd pipeline, where de and true_fdr are missing. |
I'm not sure I understand the query behavior. Why are there results returned for the pipe_zeros pipelines but not for the pipelines of the form de, analyze, auc? Can you point me to the docs that describe the query behavior?
|
Sorry, in the ticket I truncated the output to highlight the problem. The complete output is here:

```
> df_auc <- dscquery("soneson2013/",
                     targets = c("de",
                                 "analyze",
                                 "auc.auc"))
```

Here is the DSC file. Two pipelines are executed. Lines before 35 are output for the
The "official" approach is to use dscquery in R; see the docs here. It is a wrapper of this program. However, I don't think this documentation is very helpful for this specific issue. |
Thanks @gaow. You say two pipelines are executed, although of course it is many more than that... In any case I have a question. In the results I see there are 35 pipelines created by (de * analyze * auc) and 20 by Where are the results for the other 35 pipelines created by |
They are not available because they are not part of the query. When I query this:

```
library(dscrutils)
df_auc <- dscquery("soneson2013/",
                   targets = c("de",
                               "analyze",
                               "auc.auc",
                               "true_fdr.true_fdr"))
```

I get:
We can discuss it when we meet, but here is a minimal working example. To run this example, first update DSC:
then run it via:
It should work without having to install any packages. The results can be queried using the R code at the beginning of this post. |
@stephens999 to answer again your question "Where are the results for the other 35 pipelines created by
where the last column is I agree one would expect it. In fact you will see it if I modify this line: Line 273 in 7d26602
by just removing the function
will completely overlap with pipeline
but in pipeline
Since the first pipeline instance However, the story with overlapping
In sum the behavior of |
@jdblischak With this pull request you can change your query to:

```
library(dscrutils)
df_auc <- dscquery("soneson2013/",
                   targets = "auc.auc",
                   others = c("de", "analyze"),
                   omit.file.columns = TRUE)
```

and get what you actually wanted. Notice you can now use a new option @stephens999 I used the convention
|
I close this ticket now since the current behavior is what we've discussed. I'm still open to interface changes -- we can use separate tickets for it. |
@gaow I'm finally catching up on DSC. From
First, this description should be given in the "target" argument. Second, this behaviour doesn't seem like quite what we discussed; from my understanding, if any of the targets have a missing value, then the row should be removed or, equivalently, the row is kept only if all targets have non-missing values. Third, can we change the names of the "targets" and "others" arguments? In particular, "others" is not a great name for an argument. What about "target.cols" and "other.cols", for example? |
Sure, please feel free to make changes.
We have not discussed what to do with multiple targets. @stephens999 initially suggested limiting it to only one column. I used
Again, there is a backwards-compatibility issue, which might impact me, Jean, Kaiqian, Yuxin and possibly John -- perhaps mostly me. I'm not against changing it, particularly at this point; I'm only saying that we should be careful if we want to make interface changes. |
Sounds like we weren't precise enough in our conversation. I'm open to the "or" behaviour, but my understanding in our discussion is that we want the "and" behaviour. Let's see what @stephens999 says. I think if we name the arguments in a clever way, we can remain backward compatible. @stephens999 Can we reopen this so that we remember to check in with Matthew about this? |
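The "and" versus "or" behaviours under discussion differ by one line of base R; a sketch on toy data, with hypothetical column names:

```r
d <- data.frame(DSC = 1:4,
                a   = c(1, NA, 3, NA),
                b   = c(NA, 2, 3, NA))
targets <- c("a", "b")

# "and": keep a row only if every requested target is non-missing.
and_kept <- d[complete.cases(d[, targets]), ]

# "or": keep a row if at least one requested target is non-missing.
or_kept <- d[rowSums(!is.na(d[, targets])) > 0, ]
```

On this toy frame the "and" rule keeps one row (the one where both targets were produced) while the "or" rule keeps three, which is the backward-compatibility gap being debated.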
I don't really have a use case for multiple targets in mind, so I suggested to only allow one, to avoid having to make a decision until we have a use case... Backwards compatibility is of course nice.
Matthew
|
I'm okay with that. But to ensure backward compatibility, what we would have to do is add a new argument to
Here is an idea that could be more user-friendly: many functions in R that operate on vectors or matrices have to deal with NAs as special cases, and there is often an argument to specify the behaviour for missing values; e.g.,
Maybe we could have an argument na.rm that accepts one (or more) of the names specified in targets. |
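A sketch of how an na.rm-style argument might behave; the function and its argument are a proposal being discussed here, not part of the dscrutils API:

```r
# Hypothetical filter: drop rows with NA in the named target columns only,
# leaving NAs in other columns untouched.
filter_targets <- function(d, na.rm = NULL) {
  if (is.null(na.rm)) return(d)
  d[complete.cases(d[, na.rm, drop = FALSE]), , drop = FALSE]
}

d <- data.frame(auc = c(0.9, NA, 0.8), fdr = c(NA, 0.1, 0.2))
```

Calling filter_targets(d, "auc") would keep only the rows where auc has a value, mirroring the per-target missing-value handling being proposed.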
I like this proposal! Are there any obvious limitations? |
It kind of goes back to the difference between NA and not run. I thought the intended behavior was to remove pipelines that did not run auc (or whatever is in targets), rather than removing pipelines that ran it but produced NA?
.
|
This is important to distinguish. Currently the intended behavior is as you suggested above. It can still be the case when we implement the
@stephens999 this is currently already the case when |
@gaow None that I see.
I thought we recognized that it is currently not possible to distinguish between the two based on how the DSC results are stored. My understanding is that sometimes DSC assigns NA to auc when the module generating a value for auc is not run. |
I don't fully understand the process, but I believe they are distinguishable before result extraction. They are just not distinguishable after extraction into a data frame, as they are both represented as NA. But during extraction it should be possible to not extract results from things that were not run. Gao can correct me if I'm wrong.
Matthew
|
Yes, @stephens999 is correct. This is exactly what happened -- thanks to #167, all filtering is now in R and we can filter as many rounds as we want. We now filter twice: before extracting results, and after extracting values. The "not run" filter only applies to the first round. |
@gaow And only the |
Columns involved in |
The issue was reported by @jean997 over slack:

Let's say I have two tracks that get run: AB and AC. Let's say that C has three possible parameter values and B only has 1. Right now if I do something like
I will only get results for the first parameter value for C, but if I do
I get all three. This makes sense, but it is awkward because I have to merge data frames.

My response:

This is a result of joining tables by a common key when constructing the SQL query. But I suppose I can break it up into multiple SQL queries more properly, join separate tables, and then merge the outcome internally, so that you'll not have to merge tables outside the query function.