Derivative of Residual Sum-of-Square: Where's the sum? #1294

Closed
CMoebus opened this issue Oct 4, 2024 · 13 comments

Comments

@CMoebus

CMoebus commented Oct 4, 2024

Hi, I am new to Symbolics. I want to derive the gradient of the residual sum of squares. My code is here:
using Symbolics

let
    #---------------------------------------------------------------------
    # residual sum of squares
    function mySumOfSquares(ys, xs, ϕ0, ϕ1)
        ysHat = ϕ0 .+ ϕ1 .* xs
        residuals = ys - ysHat
        mySoSq = sum(residual -> residual^2, residuals, init=0.0)
    end # function mySumOfSquares
    #---------------------------------------------------------------------
    @variables xss, yss, ϕ0s, ϕ1s
    Dϕ0 = Differential(ϕ0s)
    Dϕ1 = Differential(ϕ1s)
    println(simplify(expand_derivatives(Dϕ0(mySumOfSquares(yss, xss, ϕ0s, ϕ1s)))))
    println(simplify(expand_derivatives(Dϕ1(mySumOfSquares(yss, xss, ϕ0s, ϕ1s)))))
    #---------------------------------------------------------------------
end # let
The LaTeX for the hand-calculated gradient is here:
Now let's compute the gradient vector for a given set of parameters of the loss function $L$:

$L = \sum_{i=1}^N \left(y_i - \hat y_i\right)^2 = \sum_{i=1}^N \left(y_i - (\phi_0 + \phi_1 x_i)\right)^2 = \sum_{i=1}^N \left(y_i - \phi_0 - \phi_1 x_i\right)^2$


$\nabla L = \frac{\partial L}{\partial \boldsymbol{\phi}} =
\left[
\begin{array}{c}
\frac{\partial L}{\partial \phi_0} \\
\frac{\partial L}{\partial \phi_1}
\end{array}
\right] =
\left[
\begin{array}{c}
\sum_{i=1}^N \left(-2 y_i + 2\phi_0 + 2\phi_1 x_i\right) \\
\sum_{i=1}^N \left(-2 x_i y_i + 2\phi_0 x_i + 2\phi_1 x_i^2\right)
\end{array}
\right] =
\left[
\begin{array}{c}
-2 \sum_{i=1}^N \left(y_i - (\phi_0 + \phi_1 x_i)\right) \\
-2 \sum_{i=1}^N x_i \left(y_i - (\phi_0 + \phi_1 x_i)\right)
\end{array}
\right]$
The partial derivatives computed by Symbolics.jl are correct, but the sum is missing. What was my mistake?
All the best, Claus Möbus

@ChrisRackauckas
Member

The sum of a scalar variable is just the scalar.
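
(A minimal sketch of what that means for the code above, assuming nothing beyond the original post: with scalar @variables the broadcasts are no-ops, so the "residuals" are a single scalar and the sum collapses to one term.)

using Symbolics

@variables xss, yss, ϕ0s, ϕ1s               # scalar symbolic variables, as in the original post
ysHat = ϕ0s .+ ϕ1s .* xss                    # broadcasting over scalars is a no-op: still a scalar
residuals = yss - ysHat                      # a single scalar residual, not a vector
rss = sum(r -> r^2, residuals, init=0.0)     # iterating a scalar yields itself once, so rss == residuals^2
Dϕ0 = Differential(ϕ0s)
println(expand_derivatives(Dϕ0(rss)))        # derivative of one squared residual: correct, but no Σ over i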

@CMoebus
Author

CMoebus commented Oct 5, 2024

Sorry Chris, the gradient vector of the residual sum of squares of a simple univariate regression contains two elements, and both are sums (see here: https://uol.de/f/2/dept/informatik/ag/lks/download/Probabilistic_Programming/JULIA/Pluto.jl/Machine_Learning/UnderstandingDeepLearning/UDL_20240920_6_2_2_GradientDescent_II.html?v=1728143690, and Prince's book, eqs. 6.7 and 6.8, p. 80: https://github.com/udlbook/udlbook/releases/download/v4.0.4/Understanding_Deep_Learning.pdf).
All the best, Claus

@ChrisRackauckas
Member

@variables xss, yss, ϕ0s, ϕ1s: these are scalar variables. Did you mean to make any of them arrays?

@CMoebus
Author

CMoebus commented Oct 5, 2024 via email

@ChrisRackauckas
Member

Define n and then @variables xss[1:n], yss[1:n], ϕ0s, ϕ1s?

@CMoebus
Author

CMoebus commented Oct 6, 2024

Sorry, the error message is now: "Differentiation with array expressions is not yet supported". I uploaded the Julia/Pluto code here: https://uol.de/f/2/dept/informatik/ag/lks/download/Probabilistic_Programming/JULIA/Pluto.jl/Machine_Learning/UnderstandingDeepLearning/UDL_20240920_6_2_2_GradientDescent_II.html?v=1728143690

@CMoebus
Author

CMoebus commented Oct 9, 2024

I tried various code variants (https://uol.de/f/2/dept/informatik/ag/lks/download/Probabilistic_Programming/JULIA/Pluto.jl/Machine_Learning/UnderstandingDeepLearning/UDL_20240920_6_2_2_GradientDescent_II.html?v=1728406870), but my impression is that Symbolics.jl is presently unable to calculate derivatives when array expressions appear in the code. And that is always the case when you have statistical models at hand. Sorry for that. All the best. Claus

@ChrisRackauckas
Member

You'd have to scalarize it. Symbolics.scalarize(expr)
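
(A hedged sketch of what scalarizing might look like here, with an assumed small N; Symbolics.scalarize expands an array expression into its elementwise scalar terms, after which the gradient can be taken.)

using Symbolics

N = 3                                                    # assumed illustrative size
@variables x[1:N], y[1:N], ϕ0, ϕ1
residuals = Symbolics.scalarize(y .- (ϕ0 .+ ϕ1 .* x))    # Vector{Num} of N scalar residuals
rSS = sum(r -> r^2, residuals)                           # scalar residual sum of squares
grad_rSS = Symbolics.gradient(rSS, [ϕ0, ϕ1])             # two partials, each a sum of N terms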

@CMoebus
Author

CMoebus commented Oct 9, 2024 via email

@CMoebus
Author

CMoebus commented Oct 12, 2024


After some trial and error I came up with this code snippet, which generated a correct but very low-level answer that has to be abstracted by hand:
let
    @variables x[1:N], y[1:N], ϕ0, ϕ1
    rSS = sum((y[i] - (ϕ0 + ϕ1*x[i]))^2 for i in 1:N)
    grad_rSS = Symbolics.gradient(rSS, [ϕ0, ϕ1])
    simplify(grad_rSS)
end # let

But this abstraction should be provided by Symbolics.jl! How can I activate this process? Do you know an answer?

@ChrisRackauckas
Member

julia> using Symbolics

julia> N = 10
10

julia> @variables x[1:N], y[1:N], ϕ0, ϕ1
4-element Vector{Any}:
   x[1:10]
   y[1:10]
 ϕ0
 ϕ1

julia> rSS = sum((y[i]- (ϕ0 + ϕ1*x[i]))^2 for i in 1:N)
(y[1] - ϕ0 - x[1]*ϕ1)^2 + (y[10] - ϕ0 - x[10]*ϕ1)^2 + (y[2] - ϕ0 - x[2]*ϕ1)^2 + (y[3] - ϕ0 - x[3]*ϕ1)^2 + (y[4] - ϕ0 - x[4]*ϕ1)^2 + (y[5] - ϕ0 - x[5]*ϕ1)^2 + (y[6] - ϕ0 - x[6]*ϕ1)^2 + (y[7] - ϕ0 - x[7]*ϕ1)^2 + (y[8] - ϕ0 - x[8]*ϕ1)^2 + (y[9] - ϕ0 - x[9]*ϕ1)^2

julia> grad_rSS = Symbolics.gradient(rSS, [ϕ0, ϕ1])
2-element Vector{Num}:
                                                    -2(y[1] - ϕ0 - x[1]*ϕ1) - 2(y[10] - ϕ0 - x[10]*ϕ1) - 2(y[2] - ϕ0 - x[2]*ϕ1) - 2(y[3] - ϕ0 - x[3]*ϕ1) - 2(y[4] - ϕ0 - x[4]*ϕ1) - 2(y[5] - ϕ0 - x[5]*ϕ1) - 2(y[6] - ϕ0 - x[6]*ϕ1) - 2(y[7] - ϕ0 - x[7]*ϕ1) - 2(y[8] - ϕ0 - x[8]*ϕ1) - 2(y[9] - ϕ0 - x[9]*ϕ1)
 -2x[1]*(y[1] - ϕ0 - x[1]*ϕ1) - 2x[10]*(y[10] - ϕ0 - x[10]*ϕ1) - 2x[2]*(y[2] - ϕ0 - x[2]*ϕ1) - 2x[3]*(y[3] - ϕ0 - x[3]*ϕ1) - 2x[4]*(y[4] - ϕ0 - x[4]*ϕ1) - 2x[5]*(y[5] - ϕ0 - x[5]*ϕ1) - 2x[6]*(y[6] - ϕ0 - x[6]*ϕ1) - 2x[7]*(y[7] - ϕ0 - x[7]*ϕ1) - 2x[8]*(y[8] - ϕ0 - x[8]*ϕ1) - 2x[9]*(y[9] - ϕ0 - x[9]*ϕ1)

julia> simplify(grad_rSS)
2-element Vector{Num}:
                                                    -2(y[1] - ϕ0 - x[1]*ϕ1) - 2(y[10] - ϕ0 - x[10]*ϕ1) - 2(y[2] - ϕ0 - x[2]*ϕ1) - 2(y[3] - ϕ0 - x[3]*ϕ1) - 2(y[4] - ϕ0 - x[4]*ϕ1) - 2(y[5] - ϕ0 - x[5]*ϕ1) - 2(y[6] - ϕ0 - x[6]*ϕ1) - 2(y[7] - ϕ0 - x[7]*ϕ1) - 2(y[8] - ϕ0 - x[8]*ϕ1) - 2(y[9] - ϕ0 - x[9]*ϕ1)
 -2x[1]*(y[1] - ϕ0 - x[1]*ϕ1) - 2x[10]*(y[10] - ϕ0 - x[10]*ϕ1) - 2x[2]*(y[2] - ϕ0 - x[2]*ϕ1) - 2x[3]*(y[3] - ϕ0 - x[3]*ϕ1) - 2x[4]*(y[4] - ϕ0 - x[4]*ϕ1) - 2x[5]*(y[5] - ϕ0 - x[5]*ϕ1) - 2x[6]*(y[6] - ϕ0 - x[6]*ϕ1) - 2x[7]*(y[7] - ϕ0 - x[7]*ϕ1) - 2x[8]*(y[8] - ϕ0 - x[8]*ϕ1) - 2x[9]*(y[9] - ϕ0 - x[9]*ϕ1)

@CMoebus
Author

CMoebus commented Oct 13, 2024 via email

@ChrisRackauckas
Copy link
Member

I have no idea what you're saying.

CMoebus closed this as completed Oct 13, 2024