-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: framework agnostic loaders #682
Conversation
333bd90
to
0560bb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge amount of work, well done!! Things feel much cleaner now!
I think there are a few issues with the webapp right now:
- Clicking next on a task without connecting data doesn't display anthing anymore at the Training step (while it used to display the training board)
- Connecting data, clicking next to the Training board now displays the training things correctly but if I go back I can't clear the connected files anymore.
- Connecting the LUS COVID dataset and training fails with error
Error: Based on the provided shape, [224,224,3], the tensor should have 150528 values but has 200704
, maybe linked to the new image loading function? A similar errors occurs on simple face:Error: Based on the provided shape, [200,200,3], the tensor should have 120000 values but has 160000
- Training on titanic works well, however the webapp fails to display the testing page correctly afterward and the Test button doesn't appear. (Sidenote titanic training currently fails on develop with error:
Error: A Dataset iterator for fitDataset() is expected to generate objects of the form {xs: xVal, ys: yVal}, where the two values may be tf.Tensor, an array of Tensors, or a map of string to Tensor. The provided Dataset instead generates Tensor
so you seemed to have fixed this error)
- Training on the Skin condition and CIFAR10 tasks fails with error
TypeError: valueScalar is undefined
, you can find the sample skin condition dataset here. - There is no TextDatasetInput.vue so there is no way to connect data to the wikitext task in the webapp
I've been terribly confused with the naming of Dataset
, Data
, DataSplit
and tf.data.Dataset
. A list of specific things I don't like about the current abstractions:
- Dataset is not a data structure composed of Data elements
- Data is actually a dataset (and it is
Data
that wrapstf.data.Dataset
rather thanDataset
) DataSplit
represents a train-test split ofData
rather than a dataset- dataset ends up being an attribute of data like in
data.train.preprocess().batch().dataset
Can we either add more class documentation to clarify the relationships or rework the oriented-object architecture in a way that makes things self-explanatory? This is a bit beyond the scope of this PR but it feels like the right time to address it? Feel free to ignore otherwise I'm also happy to create an additional PR myself.
0560bb0
to
a58143c
Compare
as no dataset was entered (it's undef), one can't train anything. I added a small message to tell users to enter some. one more functional way would be to rework the steps to disable the next step until data are entered.
good catch! I indeed forgot to propagate the "clear file" event
indeed, I mixed RGBA & RGB images, added a real
right, thanks for catching that, I missplaced a
ho oupsi, I'm generating an empty dataset when uploading a folder, should have added a throwing TODO, thanks for catching that.
correct, there isn't any. we can add one in a next iteration but that out of scope of the PR. from what I remember, one student should have worked on that but it wasn't clear if it happened in the end.
I totally agree with you, it's not understandable. that's also why this PR is introducing a new one that should replace all of theses in its next iterations (when I reworked the convertors).
that would be the central (and only) way to have a serie of data (image/tabular/text/...) with transformations applied on it.
it will be dropped, replaced by
it will be replaced by the return type of
it will be real values, not fields, by shaping
as I'm hoping to remove most of theses soon, it seems a bit premature to update the doc now. I'll update the doc accordingly in the next one, but feel free to setup a PR if you feel that's it's taking too long. |
Sounds good! Is there a quick fix to support text dataset? Merging this PR means that wikitext will not be available on discolab.ai until a new PR implements text datasets. |
I quite liked simply displaying an error message when training without data ("input a dataset first"). That allowed new users to walk around and see things when they don't want to train an actual model. What do you think of it? |
88b3514
to
d1437de
Compare
hum, I think that's doable, not for the webapp Tester (it's not really supported now anyway) but for the others parts, yes.
hum, users are weird, but yeah, make sense, it's very demo-y; adding it back. |
d1437de
to
fbbd48b
Compare
976b163
to
649c7a0
Compare
649c7a0
to
15a2a27
Compare
3c1b6ef
to
667cf26
Compare
667cf26
to
6c4ce8a
Compare
e1f1ff4
to
aae81eb
Compare
6c4ce8a
to
54478b6
Compare
aae81eb
to
bb30a7e
Compare
54478b6
to
1a96493
Compare
so, it's alive again! way smaller to review thanks to extracted scraps. description updated. |
There seems to be something wrong with the image data loaders: connecting images (by group for lus_covid and by csv for mnist) results in an error ( |
webapp/training: back to stop-by-throwing This reverts commit 7b259af.
0c8cc80
to
0fd9781
Compare
Yay we've done it! 🎉 |
congrats! |
on the road of supporting ONNX, we first need to extract TFJS from the code base, making discojs framework agnostic.
based on #735.
disco.fit
,validator.{assess,predict}
, requires more type informations (viaTypedDataset
) as theses are supporting any dataset for any task. it should be enforced at some point viaTask
splitting so that we can drop it.Datasplit
,Data
,tf.*
internal to disco (and removing theses when preprocessing and task typing are reworked)Image
,Tabular
&Text
convertors
as a potential new way to do preprocessingtf.TensorContainer
(pretty much equivalent toany
)Data
component emitting changes