My current environment is fairly irrelevant to the question at hand, so I have omitted it.
Problem description
I am currently trying to generate "novel" (out-of-distribution) data to evaluate a novelty-detection classification framework. I am comparing several generation methods from the SDV library, but these generators are, of course, designed to produce data that closely resembles the original. I would like to open a discussion on whether it is possible to generate this kind of data using SDV.
What I already tried
Although I have yet to try it, I am tentatively assuming that specifying probability distributions different from those of the original set could produce results that fall outside the original distribution. Am I assuming correctly, and is this how the library is intended to work?
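A hedged note on the distribution idea: as far as I know, SDV's GaussianCopulaSynthesizer lets you choose the distribution family per column (`numerical_distributions`), but the parameters are still fitted to the real data, and the synthesizers default to `enforce_min_max_values=True`, which keeps numerical outputs inside the observed range. Changing the family alone may therefore not move samples far out of distribution. A more direct alternative, sketched below outside SDV, is to fit a simple per-column model and deliberately sample from a widened version of it, keeping only values outside the observed range. `sample_outside_range` and its parameters are hypothetical; a Gaussian is assumed for simplicity and each column is treated independently.

```python
import random
import statistics


def sample_outside_range(values, n, scale=3.0, seed=None, max_draws=100_000):
    """Hypothetical helper: draw n values from a normal fitted to `values`
    but with its standard deviation multiplied by `scale`, keeping only
    draws that fall outside the observed [min, max] range."""
    rng = random.Random(seed)
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # needs at least two values
    lo, hi = min(values), max(values)
    out = []
    for _ in range(max_draws):
        x = rng.gauss(mu, scale * sigma)
        # Rejection step: discard anything the real data already covers.
        if x < lo or x > hi:
            out.append(x)
            if len(out) == n:
                return out
    raise RuntimeError("not enough out-of-range draws; try a larger scale")


# Usage: a toy numeric column standing in for real data.
real_amounts = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0]
novel_amounts = sample_outside_range(real_amounts, n=5, seed=0)
```

Values produced this way could replace one column of otherwise in-distribution synthetic rows. Note that this only creates per-column extremes, not unusual combinations of in-range values.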
Secondly, I am looking at conditional sampling to ensure that a certain number of particular samples are included, in the hope of changing the statistical properties of the new set so that it differs from the original. I expect this method to be beneficial, as it is partly what I need to do anyway.
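For the conditional-sampling route, SDV 1.x provides `sdv.sampling.Condition` and a synthesizer's `sample_from_conditions` method. One way to shift the statistics is to choose row counts per category so that the category mix differs from the real data (a label or prior shift). The sketch below only computes those counts in plain Python; the helper names and the `label` column are hypothetical, and the SDV call appears in comments as an assumption about an already fitted synthesizer.

```python
from collections import Counter


def category_mix(labels):
    """Fraction of rows per category in a column of the real data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def rows_per_category(target_mix, total_rows):
    """Hypothetical helper: convert a target mix (fractions summing to 1)
    into integer row counts summing to total_rows, using largest-remainder
    rounding."""
    if abs(sum(target_mix.values()) - 1.0) > 1e-9:
        raise ValueError("target_mix fractions must sum to 1")
    raw = {k: p * total_rows for k, p in target_mix.items()}
    counts = {k: int(v) for k, v in raw.items()}
    leftover = total_rows - sum(counts.values())
    # Give the remaining rows to the categories with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts


# Usage: the real data is 90% "a", but the new set should be mostly "b".
real_labels = ["a"] * 90 + ["b"] * 10
print(category_mix(real_labels))                   # {'a': 0.9, 'b': 0.1}
counts = rows_per_category({"a": 0.3, "b": 0.7}, 100)

# Assumed SDV 1.x usage with a fitted `synthesizer` (not executed here):
#   from sdv.sampling import Condition
#   conditions = [Condition(column_values={"label": k}, num_rows=n)
#                 for k, n in counts.items() if n > 0]
#   shifted = synthesizer.sample_from_conditions(conditions=conditions)
```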
In case I am missing something (perhaps a generator that could be tuned to do this), I would appreciate any further input anyone can provide. It has been suggested that I use a VAE that samples from rarely visited areas of the latent space, but as far as I am aware, this is likely not possible with SDV without modifying the source code.
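The VAE suggestion can be sketched independently of SDV. As far as I know, SDV's TVAESynthesizer does not expose a public way to choose where in its latent space to sample, so this assumes you train your own VAE with an accessible encoder and decoder (both hypothetical here). One heuristic for "rarely visited areas" is to encode the training data, draw candidate latent vectors from a widened standard-normal prior, and keep the candidates farthest from their nearest encoded training points. Only that selection step is shown; `low_density_latents` and its parameters are illustrative.

```python
import math
import random


def knn_distance(z, points, k):
    """Mean Euclidean distance from z to its k nearest neighbours in points."""
    k = min(k, len(points))
    dists = sorted(math.dist(z, p) for p in points)
    return sum(dists[:k]) / k


def low_density_latents(train_latents, n_out, n_candidates=1000,
                        widen=2.0, k=5, seed=None):
    """Hypothetical helper: return n_out latent vectors from regions that the
    encoded training data rarely visits.

    train_latents: latent vectors (tuples) from a VAE encoder (assumed).
    Candidates are drawn from N(0, widen^2 * I) and ranked by k-NN distance
    to the training latents; the most isolated candidates are kept."""
    rng = random.Random(seed)
    dim = len(train_latents[0])
    candidates = [tuple(rng.gauss(0.0, widen) for _ in range(dim))
                  for _ in range(n_candidates)]
    candidates.sort(key=lambda z: knn_distance(z, train_latents, k), reverse=True)
    return candidates[:n_out]


# Usage: toy 2-D "encoded" training data standing in for real encoder output.
rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
novel_z = low_density_latents(train, n_out=10, seed=1)
# Each vector in novel_z would then go through the (hypothetical) decoder.
```

Decoded samples from far outside the training latents can be unrealistic, so inspecting a few before using them as "novel" data is worthwhile.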
Any input is greatly appreciated. Thanks for reading!
Hi @weirdfishs, this sounds like an interesting project. You're correct that SDV synthesizers are designed to learn and mimic the patterns in the real data, not to generate data that is dissimilar.
Your best bet might be to use SDV for the subset of samples you need that are statistically similar to the real data, and to use a different approach for the "novel" samples you want, as you suggested.
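A minimal sketch of that split, assuming the in-distribution rows come from an SDV synthesizer's `sample` (a pandas DataFrame, converted with `to_dict("records")`) and the novel rows come from some other method: label both groups and shuffle them into one evaluation set for the novelty detector. `build_eval_set` is a hypothetical helper.

```python
import random


def build_eval_set(in_dist_rows, novel_rows, seed=None):
    """Hypothetical helper: label rows (0 = in-distribution, 1 = novel) and
    shuffle them together. Rows can be any objects, e.g. dicts."""
    labelled = [(row, 0) for row in in_dist_rows] + [(row, 1) for row in novel_rows]
    random.Random(seed).shuffle(labelled)
    rows = [row for row, _ in labelled]
    labels = [label for _, label in labelled]
    return rows, labels


# Usage with toy rows; in practice in_dist might be
# synthesizer.sample(num_rows=...).to_dict("records").
in_dist = [{"x": 1}, {"x": 2}]
novel = [{"x": 99}]
rows, labels = build_eval_set(in_dist, novel, seed=0)
```

Keeping the labels separate from the rows makes it straightforward to score the detector's predictions against them afterwards.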
Unfortunately, I can't provide much guidance on generating out-of-distribution data to produce outliers. I know this is probably an unsatisfying answer!