✨ Add 7 workflow examples for MLJFlux #256
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##              dev     #256   +/-  ##
=======================================
  Coverage   92.08%   92.08%
=======================================
  Files          12       12
  Lines         316      316
=======================================
  Hits          291      291
  Misses         25       25

☔ View full report in Codecov by Sentry.
Added commit for another tutorial.
docs/src/workflow examples/Basic Neural Architecture Search/Project.toml
Since you propose a tutorial based on an RNN, you should probably study #225. See, in particular, the point about the observation dimension near the end.
Looks like the Early Stopping markdown did not generate (I'm seeing "empty file").
Herculean effort, thanks!
(I haven't checked the notebooks or markdown).
After some more thought, I wonder if the spam example isn't a little too long and challenging for documentation. This is my understanding: you're representing messages as sequences of integers, one integer per word in some fixed vocabulary (and a special integer for out-of-vocabulary words). To train an RNN-based classifier you need to one-hot encode, but to do that lazily you want to use Flux's embedding layer, rather than an external one-hot encoding transformer, which would just produce a very large table. However, an obstacle in your approach is that MLJFlux's NeuralNetworkClassifier does not support categorical data, so you have to fool it by representing the integers as floats, and then reverse this in a pre-processing layer of the network preceding the embedding layer. Do I have this right?

I can see how all this should work, and it's great to see this solution; however, it's definitely not MLJ-idiomatic, which adds a cognitive burden for users who want to first understand the "proper" way of doing things. Of course, after the summer project you will have extended MLJFlux to overcome this kind of issue, but with something probably more useful than one-hot encoding for high-cardinality features (we'll be learning an embedding directly into a lower-dimensional space) and something that can be transferred to other classifiers. So maybe a new version of this example would be appropriate then.
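(For illustration, a minimal sketch of the workaround described above as I understand it. This is not code from the tutorial; `vocab_size`, `embed_dim`, and the toy sentence are hypothetical placeholders.)

```julia
using Flux

vocab_size = 1000   # hypothetical vocabulary size (including an out-of-vocabulary token)
embed_dim  = 16     # hypothetical embedding dimension

# The classifier only accepts continuous input, so the word indices arrive as
# floats; a leading layer converts them back to integers before the embedding.
chain = Chain(
    x -> round.(Int, x),                      # undo the float "disguise"
    Flux.Embedding(vocab_size => embed_dim),  # lazy one-hot: one dense column per word index
)

x = Float32[5, 2, 17]   # a toy "sentence" of word indices, stored as floats
chain(x)                # 16×3 matrix: one embedding column per word
```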
Co-authored-by: Anthony Blaom, PhD <[email protected]>
@ablaom for some reason, I can see the full files in the GitHub review log.
It's specifically in the tutorials section of the documentation, which I think doesn't have to be limited to less challenging tutorials (especially with the rest of the docs and the workflow examples being present). As for it being potentially long, I think it's shorter than the Boston and MNIST tutorials, and I tried to be systematic in explaining it so one does not get lost. As for it being potentially challenging, I would say that's unlikely except for someone who has not come across RNNs before; in that case it helps them learn more about RNNs, but explaining more here would feel like a lecture and be quite long for those who have already used them and just want to see more applications. Also, tokenization, embedding, and related concepts are standard material in NLP courses.
Yes, the embedding layer reproduces what would otherwise be done with one-hot vectors, but operates directly on integer indices, which makes it more efficient.
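(A toy illustration of that equivalence, not taken from the tutorial: an embedding lookup returns the same column that a one-hot vector would select from the weight matrix, without ever materialising the one-hot vector.)

```julia
using Flux

emb = Flux.Embedding(10 => 4)   # toy vocabulary of 10 words, 4-dimensional embeddings
i   = 7                         # index of some word

via_onehot = emb.weight * Flux.onehot(i, 1:10)  # explicit one-hot, then matrix multiply
via_lookup = emb(i)                             # direct column lookup, no one-hot vector

via_onehot ≈ via_lookup   # true: identical result, obtained more cheaply
```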
Yes, correct. A sentence in this case is viewed as a sequence of categorical columns, each of which takes values from the same set (the vocabulary).
Yes, I agree, but if we consider the degree of deviation from MLJ-idiomatic code, it comes down to the workaround above. Minor comment on "this should work": yes, it has worked as expected, as shown in the executed notebook/markdown.
After some reflection, I would like to emphasize that what follows the three dots may not be exactly precise. In particular, the difference between the embedding layer used in this tutorial and the custom layer that will be added to deal with categorical data is that the former shares its weights among all words (and that's useful), whereas the latter treats each column of the input vector as a separate categorical variable with its own categories and, by default, uses a distinct embedding layer per column (see the sketch after this comment for the contrast). I have therefore thought of two resolutions for this issue:

Resolution A:
Why A makes sense:
Resolution B:
Why B makes sense:
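(For concreteness, a hedged sketch of the distinction drawn above, with toy dimensions and no claim about the eventual MLJFlux API: a single embedding table shared across every position of a sentence, versus one independent table per column.)

```julia
using Flux

vocab_size, embed_dim, seq_len = 10, 4, 3   # purely illustrative sizes

# Shared weights: one embedding table applied at every word position.
shared   = Flux.Embedding(vocab_size => embed_dim)
sentence = [3, 7, 3]        # word indices; note the repeated word
shared(sentence)            # 4×3 matrix; columns 1 and 3 are identical (same word, same weights)

# Per-column weights: an independent table for each position, as if each column
# were a distinct categorical feature with its own categories.
per_column = [Flux.Embedding(vocab_size => embed_dim) for _ in 1:seq_len]
hcat((per_column[j](sentence[j]) for j in 1:seq_len)...)   # 4×3 matrix, no weight sharing
```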
Thank you so much for the high-quality review. Much appreciated. No need to check the notebooks and markdown, as they are automatically generated from the Julia file. I will also quickly glance over them in the documentation to check that everything is alright. (For example, the reason some comments had misformed indentation is that I was trying to reverse the already-misformed representation in Documenter; that is now fixed thanks to your "tab is four spaces" advice.)
I see you agree with my statement above, and for this reason I will push back against inclusion in the manual on this crucial point: including non-idiomatic examples in core documentation is not a good idea, in my opinion. I do agree we could immediately expand the allowed scitype for input to include ….

Sorry, I'm not sure I understand the distinction between A and B, but I will say that weight sharing for categorical feature encoding sounds like a substantial addition of complexity from a user-interface point of view, because users would now have to specify exactly what that looks like. But we can discuss on our next call.
Alrighty, I see how this makes sense from a perfectionist's perspective, or with the beginner experience in mind. I have hidden that tutorial accordingly. We may merge once you have seen the EarlyStopping tutorial. Sure, we can talk about the distinction between A and B in the next call.
Thanks. Merge when ready. Note that I have just merged #251 onto ….
Added the following workflow examples:
Sorry if I didn't make the PR in the expected time. I encountered interruptions and was stuck in live training for longer than I wanted. In particular, I was trying hard to get it to work in Jupyter notebooks.
Will make another PR for a single tutorial soon.