Linear2d layer #197
Conversation
Thank you! Will review this week.
Thank you for this effort and sorry that I missed the issue you opened a few days ago. Great work figuring out the plumbing of neural-fortran and getting stuff running. I have two questions for now:
I will think a little more about the crutch and the need for 3-d shapes. It's not yet obvious to me that it is needed, but it's late over here.
Thank you for your response!
Regarding the plumbing, I think a bit more documentation could have helped. I might write up a description of the OOP structure of the project with a diagram and send a PR.

BTW, looking at my PR in the morning reveals that I forgot a couple of things: put the layer's logic into its submodule and update the CMake stuff. I'll do that as well.
Thanks for the updates, Michael. When you get a chance, please review the Input2d layer in #198. If it's sound, we'll merge that into main and then merge it into this PR and remove the crutch. From your research and experience, is an input layer the only kind that precedes Linear2d in applications, or do other kinds of layers do so too?
Wow, thank you for your work! Answering your question may not be that simple. I mostly work in Natural Language Processing, and in this area the need to perform linear transformations on matrices appears in several places. The general idea is that vectors for inputs are stored in a transformed lookup table with dimensions
So, the linear2d layer will appear at every step of this: to rearrange the input, to perform architecture-specific calculations, and then to format the output, which I am currently working on. But all of this is not really about now, but rather about the future. At this point
Everything seems to be resolved. @milancurcic, can you take a look? Any suggestions?
Great, thanks! Will review and test locally tonight.
Here is a sample: https://gist.github.com/OneAdder/1090c066d8a9e3c0557c2968000ba463
Thanks, Michael. You'll notice that I added your simple linear2d program as an example. I also left a comment about it not converging. We should get it to converge.
I think this PR is almost complete. One key thing stands out that I would like to discuss changing.
Currently, the user is required to provide:
linear2d(sequence_length, in_features, out_features)
An analogy to the dense (1d) layer would be:
dense(in_features, out_features)
However, notice that in NF, you only need to write dense(out_features), and the internal shape of the dense layer, which is (in_features, out_features), is determined from the layer that precedes it. For example:
input(5), &
dense(10)
would initialize a dense layer with internal shape of (5, 10).
That said, I would like to be able to do simply:
net = network([ &
input(sequence_length, in_features), &
linear2d(out_features), &
  flatten() &
])
to get the same behavior that we currently have. Basically, at layer construction time we only know out_features, but sequence_length and in_features are obtained during layer % init() from the preceding layer.
This way, the user doesn't need to pass redundant information when constructing the network, which makes for a nicer API but also reduces the chance of user error.
What do you think?
@@ -148,4 +150,13 @@ module function reshape(output_shape) result(res)
  end function reshape

  module function linear2d(sequence_length, in_features, out_features) result(res)
Here we should be requesting only out_features at layer constructor invocation; sequence_length and in_features are obtained later from the layer that feeds into this one.
- module function linear2d(sequence_length, in_features, out_features) result(res)
+ module function linear2d(out_features) result(res)
Yes, makes sense
It turns out that we cannot avoid passing sequence_length: we need it to determine the output shape. I think it's not a big deal and it can be left like this:
module function linear2d(sequence_length, out_features) result(res)
  integer, intent(in) :: sequence_length, out_features
  type(layer) :: res
  res % name = 'linear2d'
  res % layer_shape = [sequence_length, out_features]
  allocate(res % p, source=linear2d_layer(out_features))
end function linear2d
I just gave it a try; unless I'm mistaken, I think it can work. See this commit: 678b2c0
I'll try to explain; the process goes like this:
- linear2d(out_features) constructs the generic layer instance, which at this time does not yet know its layer_shape.
- The network constructor, given an array of generic layers, loops over each layer in order and calls layer % init(prev_layer).
- Inside layer % init, the output shape of the previous layer is passed to the concrete linear2d_layer % init(input_shape), and inside the concrete init all parameters are now known (sequence_length, in_features, out_features).
- After the concrete layer init call, back inside the generic layer % init, we set the generic layer % layer_shape to be the same as the shape of the concrete linear2d_layer % output.
I hope this makes sense, and I'm sorry that it had to be so complicated. However, we are essentially hacking around Fortran's very limited generic features.
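To see the whole chain end to end, here is a small self-contained sketch of the same mechanism. It is not the neural-fortran source: the names layer, linear2d_layer, layer_shape, output, and init come from the steps above, while the simplified payload component p and all sizes are made up for illustration.

! Self-contained sketch of the deferred shape inference described above.
! Names (layer, linear2d_layer, layer_shape, output, init) follow this
! thread; the simplified payload component p and the sizes are invented.
module shape_inference_sketch
  implicit none
  private
  public :: layer, linear2d_layer

  type :: linear2d_layer
    integer :: sequence_length = 0, in_features = 0, out_features = 0
    real, allocatable :: output(:,:)
  contains
    procedure :: init => linear2d_init
  end type linear2d_layer

  type :: layer
    type(linear2d_layer), allocatable :: p   ! stand-in for the polymorphic payload
    integer, allocatable :: layer_shape(:)
  contains
    procedure :: init => layer_init
  end type layer

contains

  subroutine linear2d_init(self, input_shape)
    class(linear2d_layer), intent(inout) :: self
    integer, intent(in) :: input_shape(:)
    ! Only here do sequence_length and in_features become known,
    ! from the output shape of the preceding layer.
    self % sequence_length = input_shape(1)
    self % in_features = input_shape(2)
    allocate(self % output(self % sequence_length, self % out_features))
  end subroutine linear2d_init

  subroutine layer_init(self, prev)
    class(layer), intent(inout) :: self
    class(layer), intent(in) :: prev
    ! Pass the previous layer's output shape to the concrete init.
    call self % p % init(prev % layer_shape)
    ! Mirror the concrete output shape back into the generic layer.
    self % layer_shape = shape(self % p % output)
  end subroutine layer_init

end module shape_inference_sketch

program init_chain_demo
  use shape_inference_sketch
  implicit none
  type(layer) :: prev, this

  prev % layer_shape = [3, 4]      ! a preceding layer with output shape (3, 4)

  allocate(this % p)
  this % p % out_features = 2      ! construction time: only out_features is known

  call this % init(prev)           ! init time: the rest is inferred
  print *, this % layer_shape      ! prints: 3 2
end program init_chain_demo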
@milancurcic done
One last issue remains. I don't think the example actually converges. In the previous iteration, the loss (MSE) tolerance was 0.01, which for these outputs, on the order of 0.1, becomes very easily satisfied. If you lower the loss tolerance to, say, 1e-4, run for a large number of iterations, and print the output of each step, you'll see that the outputs change (and they change gradually, with a sufficiently low learning rate), but they don't actually converge toward the expected values. Are we still not doing something correctly in this example?
Off Topic
I think I found an unrelated bug: gradients become zero between

Example

I don't think that there's an issue, in fact. What is happening here is actually linear regression. It can be visualized as trying to draw a line as close to each point as possible.
OK, thanks for that explanation. In that case, maybe the example should simply run for a number of iterations and print the outputs to the screen, rather than stopping on some first accidental local minimum. And related to this, as I understand it, this
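For concreteness, the kind of loop meant here might look like the sketch below. It is not the example from the gist; the module and procedure names (nf, input, linear2d, flatten, sgd, forward, backward, update, predict) and the rank-2 input handling are assumptions based on this thread and typical neural-fortran usage, and the sizes are arbitrary.

program linear2d_print_demo
  ! Sketch only: run a fixed number of iterations and print the outputs,
  ! instead of stopping at a loss tolerance. Names and sizes are assumed,
  ! not taken from the actual example.
  use nf, only: network, input, linear2d, flatten, sgd
  implicit none
  type(network) :: net
  real :: x(3, 4)   ! (sequence_length, in_features)
  real :: y(6)      ! flattened target, sequence_length * out_features
  integer :: n

  call random_number(x)
  call random_number(y)

  net = network([ &
    input(3, 4), &
    linear2d(2), &
    flatten() &
  ])

  do n = 1, 1000
    call net % forward(x)
    call net % backward(y)
    call net % update(optimizer=sgd(learning_rate=0.01))
    ! Print the current outputs every 100 iterations to watch them evolve.
    if (mod(n, 100) == 0) print *, n, net % predict(x)
  end do
end program linear2d_print_demo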
Great! Then we can just remove it. I'm working on a text classification example with the IMDB dataset. I'll add it later when there are more 2D layers.
Looks great, thank you!
Two-Dimensional Linear Layer
Reason
Modern-day Machine Learning techniques often require being able to work with 2D-shaped data, especially when solving Natural Language Processing tasks. For example, the self-attention matrix is of shape (sequence_length, sequence_length), the scaled dot product attention matrix is of shape (sequence_length, head_size), transformer embeddings are usually stored in a 2D lookup table, and so on.

To support this, a linear layer with trainable parameters that transforms such data is necessary. I plan on implementing MultiHead Attention and later a Transformer Encoder, so I decided to add the 2D layer first, as it will be required in every step of the transformer architecture.
Description
linear2d_layer implements both forward and backward passes and, unintuitively, accepts inputs of 3D shape (reasons in the Crutches section), with the first dimension being reserved for the batch size. The linear2d constructor accepts four arguments: batch_size, sequence_length, in_features, out_features, requiring the input shape to be (batch_size, sequence_length, in_features). The output shape (layer_shape) is (batch_size, sequence_length, out_features).

The layer and network classes are modified to support linear2d_layer. At this stage, the layer is restricted to be preceded by input3d_layer. The flatten layer is now allowed to be the last layer of the network as a placeholder.
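For illustration, a hypothetical construction following this description might look like the sketch below. The sizes are made up, and the top-level module name and constructor calls are assumptions rather than code taken from this PR; the actual sample is in the Sample Code section below.

program linear2d_usage_sketch
  ! Hypothetical usage of the constructor described above. Sizes are made
  ! up; the module name nf and the 3D input constructor are assumptions.
  use nf, only: network, input, linear2d, flatten
  implicit none
  type(network) :: net

  net = network([ &
    input(8, 3, 4), &          ! (batch_size, sequence_length, in_features)
    linear2d(8, 3, 4, 2), &    ! out_features = 2; output is (8, 3, 2)
    flatten() &                ! placeholder last layer
  ])
end program linear2d_usage_sketch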
Crutches
Input2D Layer
It appears that it is impossible to implement an input2d_layer in the current paradigm. The problem is that Fortran cannot resolve generics by an array's shape, and since an input3d_layer that accepts a real array is already present, it is impossible to add functions that accept different shapes. So, we are stuck with an extra dimension. I decided to use it for storing the batch size, similarly to how PyTorch does it.

Dense Layer
To an extent, this layer can be used as simply another interface for dense if we make it generic. But it can create another problem: it will require reconsidering restrictions on layer ordering. It can be done, but I would like a maintainer's opinion on that.

Tests
I made tests in test_linear2d_layer.f90.
Sample Code
Fortran NN code:
print_info output:

NN output:
PyTorch Reference
I used PyTorch to make sure that everything works. Here is the snippet of my Python code:
Output: