possible future directions #43
Comments
Thanks a lot for your interest @schlichtanders! I look forward to seeing where your ideas will lead. Now about your questions:
Quite some time has passed already, so I fear it is time for me to say "see you some day". Reading into the code base and starting to contribute is a rather big project for me, so I cannot promise when my circumstances will allow for it. I keep it in the back of my mind and wish you success! It is so awesome that Julia is able to include automatic differentiation at its meta level!
This is an awesome project and I hope it will mature even further together with Julia's growth! Thanks for bringing it to the state it is in now.
Some Introductory Words
I would like to contribute in the long run!
I probably won't have much time soon, as I am about to start my first job as a Data Scientist. But I truly love this general direction. In fact, I was once motivated to implement something almost exactly like it myself. And where there is motivation, there is time.
My motivation
Over the last half year I worked extensively with Python's Theano and loved its idea from the very beginning. Soon I got to know many of its drawbacks: its awesome basic ideas of graph optimization and automatic differentiation are not exposed in a straightforward way, but are wrapped in an enormously big code base.
The major drawback was that I had to rely completely on Theano and its generated C code. Combining Theano with native Python always means that your code is not optimal in terms of performance. I don't like that, as it takes away much of the control I would like to have as a programmer. Also, the implementation of Theano is rather horrible: it has a custom tree type with hidden references from one node to another, which makes it really hard to cut, merge, or otherwise manipulate (sub)trees. (By tree I just mean graph.)
I wanted a clear way to interact with the computation graphs themselves, without giving up control.
Julia
Julia is one of the ideal languages to implement such ideas, as there is no need to compile the graphs into C code or something similar (as is done in Python's Theano), thereby taking away much of the programmer's control. Instead, in Julia everything can be compiled to Julia itself because of its meta system.
Furthermore, Julia's meta system already gives a default graph structure, which even allows for annotations as far as I understood it (for instance quote println("hello world") end or dump(quote println("hello world") end) already show annotations). As far as I have read, Julia's LLVM-based compiler even performs certain graph optimizations, although I still haven't understood the details here. As I see from your code in https://github.com/JuliaDiff/ReverseDiffSource.jl/blob/master/src/frdiff.jl, there is already a way to extract the source graph from function objects programmatically, which enables one to work on the Julia expression graphs directly, as you do.
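For example, a minimal illustration of what I mean by the default graph structure and its annotations (nothing here is specific to ReverseDiffSource; this is plain Julia):

```julia
ex = quote
    println("hello world")
end

typeof(ex)    # Expr
ex.head       # :block
ex.args       # the line-number annotation(s) plus the nested :call expression

dump(ex)      # prints the whole expression tree with its annotations
```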
Some Suggestions
Suggesting use of common Graph structures
From my experience, the interfaces of Theano and TensorFlow are useful up to a certain point, especially for quickly scripting some high-level stuff. But for any serious gradient-based machine learning work, the programmer wants to have control over the computation graph directly (at least this is my experience).
I thought a lot about which graph type would be preferable for representing all this. You made your own type; because of the reasons above I would recommend using Julia's expression types directly, I think, and there are additional reasons for this as well.
However, there seem to be certain graph optimizations which need a doubly-linked graph. At least in Python's Theano there are both: a singly-linked graph (like Julia's expression graphs) and a doubly-linked one which is used for the graph optimizations. I also thought a lot about which doubly-linked graph structure would be preferable. My conclusion was to follow Julia's parametric-types suggestion for containers, i.e. a generic doubly-linked tree whose value type is parametrized. (Next to generalizability and style, a direct advantage of this direction is that such trees can easily be copied and combined, something which seems to be an internal mess within Theano.)
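A minimal sketch of what such a parametric doubly-linked tree node could look like (hypothetical names, written in current Julia syntax; the 0.5 syntax of the time would use type instead of mutable struct):

```julia
# Hypothetical sketch: a generic doubly-linked tree whose payload type is
# parametrized, so the same container works for Exprs, symbols, or anything else.
mutable struct TreeNode{T}
    value::T
    parent::Union{TreeNode{T}, Nothing}   # upward link (nothing for the root)
    children::Vector{TreeNode{T}}         # downward links
end

TreeNode(value::T) where {T} = TreeNode{T}(value, nothing, TreeNode{T}[])

# Attach a child while keeping both link directions consistent; copying or
# re-combining subtrees then only needs to touch these two references.
function add_child!(parent::TreeNode{T}, child::TreeNode{T}) where {T}
    child.parent = parent
    push!(parent.children, child)
    return child
end

root = TreeNode(:call)
add_child!(root, TreeNode(:+))
```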
This tree structure would of course be made into its own DoubleLinkedTree.jl git repository. Such a collection type doesn't exist yet as far as I can see. (Or maybe it is better to take the time here and build a Tree.jl or Graph.jl with both a DoubleLinkedTree and a SingleLinkedTree. Everything for Julia.) As I see in your code, this is not yet built this way; however, you seem to have separated the tree structure from the values and from the derivation/optimization functionalities (please correct me here, as I was not yet able to read into your code in depth). Furthermore, in your code you even seem to use a triple-linked tree, by referring to precedences in https://github.com/JuliaDiff/ReverseDiffSource.jl/blob/master/src/node.jl. It would be great if you could explain why you need this, as I haven't seen anything like it in Theano's tree structures.
Suggesting re-implementing Theano's graph optimizations in a separate repository
Theano (and probably also TensorFlow) has already collected a lot of optimization methods. So there is no need to be creative; the work is almost done already, but it needs to be rewritten in Julia.
See https://github.com/Theano/Theano/blob/master/theano/tensor/opt.py for the main script of Theano's graph optimizer (if I got that right).
There are other suggested optimizations which I found while reading Julia's documentation; see http://docs.julialang.org/en/release-0.5/manual/performance-tips/#tweaks and http://docs.julialang.org/en/release-0.5/stdlib/math/#Base.muladd for instance.
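As a toy illustration of what such an optimization could look like as a rewrite on plain Julia Exprs (my own sketch; this is not how ReverseDiffSource or Theano implement it), here is the muladd tweak:

```julia
# Rewrite a*x + b patterns into muladd(a, x, b) on a plain Expr tree.
# Purely illustrative; a real optimizer would carry many more rules.
function fuse_muladd(ex)
    ex isa Expr || return ex
    args = map(fuse_muladd, ex.args)            # optimize sub-expressions first
    if ex.head == :call && length(args) == 3 && args[1] == :+ &&
       args[2] isa Expr && args[2].head == :call &&
       length(args[2].args) == 3 && args[2].args[1] == :*
        a, x = args[2].args[2], args[2].args[3]
        return :(muladd($a, $x, $(args[3])))
    end
    return Expr(ex.head, args...)
end

fuse_muladd(:(a * x + b))    # returns :(muladd(a, x, b))
```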
It would be great to expose this graph optimization to the programmer as a functionality in its own right.
In this vision, graph optimization is complementary to automatic differentiation, hence the suggestion to keep those two functionalities separate. It is probably even apt to create a new repository for this (this would be in line with the intention of JuliaDiff, I think). Graph optimization would be useful to many Julia users, even if they don't need AD.
Suggestion for replicating Theano's / TensorFlow's functionality, once the graph interface is there
There is an easy way to replicate the machine learning functionality implemented in Theano/TensorFlow by simply bookkeeping certain references into the complete computation graph, i.e. references to sub-expressions.
I implemented such an approach on top of Theano, called theano_models (note that the documentation still needs improvement).
The central element is just a dictionary-like object. For example, let
y = :($a * $x + $b)
where x, a, b refer to sub-expressions. Then a dictionary collecting references to y and to those sub-expressions (see the sketch below) would nicely represent the structure needed for machine learning. What is done in TensorFlow by labeling expressions as variables is implemented much more generally this way, by simply merging the references of different sub-models. One advantage I already made use of is that you can simply add new keys, e.g. :parameters_positive, which might collect all references to parameters that need to stay positive. Having all the references to expressions, plus the ability to automatically derive the respective expressions for the gradients, is all that is needed to implement an elegant and maximally generic machine learning interface.
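To make this concrete, here is a minimal sketch of such a dictionary of references (the key names are my own invention, not taken from theano_models or from this package):

```julia
# Hypothetical "model": a plain dictionary holding references into one
# computation graph -- the full expression plus named sub-expressions.
a, b, x = :a, :b, :x
y = :($a * $x + $b)          # :(a * x + b)

model = Dict{Symbol,Any}(
    :expression => y,        # the complete computation graph
    :inputs     => [x],      # references to input sub-expressions
    :parameters => [a, b],   # references to learnable parameters
)

# New bookkeeping is just a new key, e.g. parameters that must stay positive:
model[:parameters_positive] = [b]
```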
Indeed, I would like to implement this on top of your package. I have already thought a lot about what a nice implementation would look like, and I think I found one in my mentioned Python version theano_models. Nevertheless, it will still take time to re-implement it in Julia.
Suggestion for current improvement
From my perspective, the implementation of rdiff as returning a single expression or function which in turn returns a tuple seems surprisingly unintuitive. I would suggest returning a tuple of expressions / a tuple of functions instead. For instance, in machine learning it is often neither needed nor wanted to evaluate the function value itself during learning; the gradient alone is enough.
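To illustrate the shape of the suggested return value (hypothetical toy functions only; this is not the actual rdiff signature):

```julia
# Current style, as I understand it: one function that always computes and
# returns the value together with the gradient.
f_and_grad = (a, b, x) -> (a * x + b, a)    # (value, d/dx)

# Suggested style: a tuple of separate functions, so during learning one can
# evaluate only the gradient and skip the function value entirely.
f    = (a, b, x) -> a * x + b
dfdx = (a, b, x) -> a
```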
Questions
What is precedences for?
Why have you called it *Reverse*DiffSource.jl? I am just curious.
Looking forward to your comments, and to helping you in the future.