- Newton's Method in Optimization
- Gauss-Newton Algorithm
- Curve Fitting
- Non-linear Least Squares
- Non-linear Regression
- Levenberg-Marquardt
Newton's method is an iterative method for finding the roots of a differentiable function $f$. Applying Newton's method to the derivative $f'$ of a twice-differentiable function $f$ to find the roots of the derivative (solutions to $f'(x)=0$) gives us the critical points of $f$ (minima, maxima, or saddle points).

The second-order Taylor expansion of $f$ around $x_k$ is:

$$f(x_k + t) \approx f(x_k) + f'(x_k)\,t + \frac{1}{2} f''(x_k)\,t^2$$

Setting the derivative of this approximation with respect to $t$ to zero:

$$\frac{d}{dt}\left(f(x_k) + f'(x_k)\,t + \frac{1}{2} f''(x_k)\,t^2\right) = f'(x_k) + f''(x_k)\,t = 0$$

This gives us:

$$t = -\frac{f'(x_k)}{f''(x_k)}$$

We place the next point where this second-order approximation has its stationary point, so $x_{k+1} = x_k + t$.

Putting everything together:

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$

We iterate this update until the change between iterations is small enough.
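As a quick illustration, here is a minimal C++ sketch of this iteration for a single-variable function. The example objective $f(x)=x^4-3x^3+2$, the starting point, and the tolerance are arbitrary choices for this sketch, not something taken from this page:

#include <cmath>
#include <iostream>

int main()
{
    // Hypothetical objective: f(x) = x^4 - 3x^3 + 2
    auto f_prime        = [](double x) { return 4.0*x*x*x - 9.0*x*x; };   // f'(x)
    auto f_double_prime = [](double x) { return 12.0*x*x - 18.0*x; };     // f''(x)

    double x = 6.0;            // initial guess
    double tolerance = 1e-8;
    for (int k = 0; k < 100; ++k)
    {
        double step = f_prime(x) / f_double_prime(x);   // t = f'(x_k)/f''(x_k)
        x = x - step;                                   // x_{k+1} = x_k - f'(x_k)/f''(x_k)
        if (std::abs(step) < tolerance)                 // stop when the change is small enough
            break;
    }
    std::cout << "critical point near x = " << x << std::endl;  // converges to x = 2.25
}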
The Gauss-Newton algorithm is a modification of Newton's method for finding a minimum of a function. Unlike Newton's method, second derivatives (which can be challenging to compute) are not required.

Starting with an initial guess $\beta^{(0)}$ for the minimum, the method proceeds by the iterations:

$$\beta^{(k+1)} = \beta^{(k)} - \left(J_r^\top J_r\right)^{-1} J_r^\top r\left(\beta^{(k)}\right)$$

where $J_r$ is the Jacobian of the residual vector $r$. The matrix $J_r^{+} = \left(J_r^\top J_r\right)^{-1} J_r^\top$ is called the pseudoinverse. To compute this matrix we use the singular value decomposition $J_r = U \Sigma V^\top$ and the following properties:

$$U^\top U = I \quad \text{and} \quad V^\top V = V V^\top = I$$

Then, we can write the pseudoinverse as:

$$J_r^{+} = V \Sigma^{+} U^\top$$

Proof:

$$\left(J_r^\top J_r\right)^{-1} J_r^\top
= \left(V \Sigma^\top U^\top U \Sigma V^\top\right)^{-1} V \Sigma^\top U^\top
= V \left(\Sigma^\top \Sigma\right)^{-1} V^\top V \Sigma^\top U^\top
= V \left(\Sigma^\top \Sigma\right)^{-1} \Sigma^\top U^\top
= V \Sigma^{+} U^\top$$
Refs: 1
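As a reference, here is a minimal sketch of how such a pseudoinverse can be computed with Eigen via the SVD. The helper name `pseudoInverseSVD` and the tolerance are assumptions of this sketch, not necessarily the `pseudoInverse()` used in the code below:

#include <Eigen/Dense>
#include <iostream>

// Moore-Penrose pseudoinverse via SVD: J+ = V * Sigma+ * U^T
Eigen::MatrixXd pseudoInverseSVD(const Eigen::MatrixXd &J, double tol = 1e-9)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(J, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd sigma = svd.singularValues();
    // Invert the non-zero singular values; zero out the (numerically) zero ones.
    Eigen::VectorXd sigmaInv = sigma;
    for (int i = 0; i < sigma.size(); ++i)
        sigmaInv(i) = (sigma(i) > tol) ? 1.0 / sigma(i) : 0.0;
    return svd.matrixV() * sigmaInv.asDiagonal() * svd.matrixU().transpose();
}

int main()
{
    Eigen::MatrixXd J(3, 2);
    J << 1, 2,
         3, 4,
         5, 6;
    // For a full-column-rank J this equals (J^T J)^{-1} J^T.
    std::cout << pseudoInverseSVD(J) << std::endl;
}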
In the following we solve the inverse kinematics of a 3-link planar robot:
vector_t q=q_start;          // joint angles, initialized with the start configuration
vector_t delta_q(3);         // joint-space update
double epsilon=1e-3;         // stop when the squared task-space error drops below this
int i=0;                     // iteration counter
double gamma;                // step-size variables (unused in this excerpt)
double stepSize=10;
while( (distanceError(goal,forward_kinematics(q)).squaredNorm()>epsilon) && (i<200) )
{
    // Jacobian of the forward kinematics, computed numerically at the current q
    Eigen::MatrixXd jacobian=numericalDifferentiationFK(q);
    Eigen::MatrixXd j_pinv=pseudoInverse(jacobian);
    // task-space error between the goal pose and the current end-effector pose
    Eigen::VectorXd delta_p=transformationMatrixToPose(goal)-transformationMatrixToPose(forward_kinematics(q));
    // Gauss-Newton step: map the task-space error into joint space
    delta_q=j_pinv*delta_p;
    q=q+delta_q;
    i++;
}
Full source code here
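The helpers `forward_kinematics`, `numericalDifferentiationFK`, `distanceError`, and `transformationMatrixToPose` are defined in the full source. Purely as an illustration, here is a minimal sketch of a central-difference Jacobian for a 3-link planar arm, under assumptions of my own (unit link lengths, and a forward-kinematics function that returns the end-effector pose directly as $(x, y, \theta)$ instead of a transformation matrix):

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

// Assumed forward kinematics of a 3-link planar arm with unit link lengths,
// returning the end-effector pose (x, y, theta).
Eigen::Vector3d fkPlanar3Link(const Eigen::Vector3d &q)
{
    double x = std::cos(q(0)) + std::cos(q(0)+q(1)) + std::cos(q(0)+q(1)+q(2));
    double y = std::sin(q(0)) + std::sin(q(0)+q(1)) + std::sin(q(0)+q(1)+q(2));
    double theta = q(0) + q(1) + q(2);
    return Eigen::Vector3d(x, y, theta);
}

// Central-difference Jacobian of the pose with respect to the joint angles.
Eigen::Matrix3d numericalJacobian(const Eigen::Vector3d &q, double h = 1e-6)
{
    Eigen::Matrix3d J;
    for (int j = 0; j < 3; ++j)
    {
        Eigen::Vector3d qPlus = q, qMinus = q;
        qPlus(j)  += h;
        qMinus(j) -= h;
        J.col(j) = (fkPlanar3Link(qPlus) - fkPlanar3Link(qMinus)) / (2.0 * h);
    }
    return J;
}

int main()
{
    Eigen::Vector3d q(0.1, 0.2, 0.3);
    std::cout << numericalJacobian(q) << std::endl;
}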
You have a function $f(x, \boldsymbol{\beta})$ with a vector of unknown parameters $\boldsymbol{\beta}=(\beta_1,\dots,\beta_n)$, and a set of sample data points $(x_i, y_i)$ (possibly contaminated with noise) from that function, and you are interested in finding the unknown parameters such that the residuals (the differences between the output of your function and the sample data) become minimal. We construct a new function, the sum of squares of all residuals:

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i^2(\boldsymbol{\beta})$$

The residuals are:

$$r_i(\boldsymbol{\beta}) = y_i - f(x_i, \boldsymbol{\beta})$$

We start with an initial guess $\boldsymbol{\beta}^{(0)}$ and then proceed by the iterations:

$$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} - \left(J_r^\top J_r\right)^{-1} J_r^\top r\left(\boldsymbol{\beta}^{(k)}\right)$$
Let's say we have the following dataset:
i | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
[S] | 0.038 | 0.194 | 0.425 | 0.626 | 1.253 | 2.500 | 3.740 |
Rate | 0.050 | 0.127 | 0.094 | 0.2122 | 0.2729 | 0.2665 | 0.3317 |
And we have the following function to fit the data:

$$\text{Rate} = f(x, \boldsymbol{\beta}) = \frac{\beta_1 x}{\beta_2 + x}$$

where $x$ is the substrate concentration $[S]$. Now let's implement it with Eigen. First, a functor that calculates the residuals $r$ given $\boldsymbol{\beta}$:
struct SubstrateConcentrationFunctor : Functor<double>
{
    // points: one observation per row, column 0 is x ([S]), column 1 is y (Rate)
    SubstrateConcentrationFunctor(Eigen::MatrixXd points): Functor<double>(points.cols(),points.rows())
    {
        this->Points = points;
    }

    // Fills r with the residuals r_i = y_i - beta1*x_i/(beta2+x_i) for the parameters z=(beta1,beta2)
    int operator()(const Eigen::VectorXd &z, Eigen::VectorXd &r) const
    {
        double x_i,y_i,beta1,beta2;
        beta1=z(0);
        beta2=z(1);
        for(int i = 0; i < this->Points.rows(); ++i)
        {
            x_i=this->Points.row(i)(0);
            y_i=this->Points.row(i)(1);
            r(i)=y_i-(beta1*x_i)/(beta2+x_i);
        }
        return 0;
    }

    Eigen::MatrixXd Points;

    int inputs() const { return 2; }                      // two model parameters: beta1, beta2
    int values() const { return this->Points.rows(); }    // the number of observations
};
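The `Functor<double>` base class the struct above derives from is not shown on this page; it lives in the full source. A commonly used minimal version of this helper for Eigen's unsupported NumericalDiff module (an assumption here, the full source may define it slightly differently) looks like this:

template<typename _Scalar, int NX = Eigen::Dynamic, int NY = Eigen::Dynamic>
struct Functor
{
    typedef _Scalar Scalar;
    enum { InputsAtCompileTime = NX, ValuesAtCompileTime = NY };
    typedef Eigen::Matrix<Scalar, InputsAtCompileTime, 1> InputType;
    typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, 1> ValueType;
    typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, InputsAtCompileTime> JacobianType;

    int m_inputs, m_values;

    Functor() : m_inputs(InputsAtCompileTime), m_values(ValuesAtCompileTime) {}
    Functor(int inputs, int values) : m_inputs(inputs), m_values(values) {}

    int inputs() const { return m_inputs; }   // number of parameters
    int values() const { return m_values; }   // number of residuals/observations
};

It mainly provides the typedefs (in particular JacobianType) that Eigen::NumericalDiff expects.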
Now we have to set the data:
//the last column in the matrix should be "y"
Eigen::MatrixXd points(7,2);
points.row(0)<<0.038,0.050;
points.row(1)<<0.194,0.127;
points.row(2)<<0.425,0.094;
points.row(3)<<0.626,0.2122;
points.row(4)<<1.253,0.2729;
points.row(5)<<2.500,0.2665;
points.row(6)<<3.740,0.3317;
Let's create an instance of our functor and compute the derivatives numerically:
SubstrateConcentrationFunctor functor(points);
Eigen::NumericalDiff<SubstrateConcentrationFunctor,Eigen::NumericalDiffMode::Central> numDiff(functor);
Set the initial values for beta:
Eigen::VectorXd beta(2);
beta<<0.9,0.2;
And the main loop. Note that the Jacobian J and the residual vector r have to be allocated with the right sizes before they are passed to numDiff.df() and to the functor:

//βᵏ⁺¹=βᵏ-(JᵀJ)⁻¹Jᵀr(βᵏ)
Eigen::MatrixXd J(functor.values(),functor.inputs());
Eigen::VectorXd r(functor.values());
for(int i=0;i<10;i++)
{
    numDiff.df(beta,J);                                      // numerical Jacobian of the residuals at beta
    std::cout<<"J: \n"<<J<<std::endl;
    functor(beta,r);                                         // residuals at beta
    std::cout<<"r: \n"<<r<<std::endl;
    beta=beta-(J.transpose()*J).inverse()*J.transpose()*r;   // Gauss-Newton update
}
std::cout<<"beta: \n"<<beta<<std::endl;
The Levenberg-Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. The LMA is used in many software applications for solving generic curve-fitting problems. The LMA finds only a local minimum (which may not be the global minimum). The LMA interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. The LMA can also be viewed as Gauss-Newton using a trust region approach.
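Reusing the SubstrateConcentrationFunctor and numDiff from above, here is a minimal sketch of the damped update with a very simple λ-adaptation rule. The initial λ, the factor of 2, and the iteration count are arbitrary choices for this sketch, not a tuned implementation (Eigen also ships a ready-made Eigen::LevenbergMarquardt in its unsupported modules):

//βᵏ⁺¹=βᵏ-(JᵀJ+λI)⁻¹Jᵀr(βᵏ)
Eigen::MatrixXd J(functor.values(),functor.inputs());
Eigen::VectorXd r(functor.values());
Eigen::VectorXd r_new(functor.values());
double lambda=1e-2;                                   // damping factor (assumed starting value)
Eigen::VectorXd beta_lm(2);
beta_lm<<0.9,0.2;

for(int k=0;k<50;k++)
{
    numDiff.df(beta_lm,J);                            // numerical Jacobian at the current estimate
    functor(beta_lm,r);                               // current residuals
    Eigen::MatrixXd A=J.transpose()*J+lambda*Eigen::MatrixXd::Identity(2,2);
    Eigen::VectorXd delta=A.ldlt().solve(J.transpose()*r);
    Eigen::VectorXd beta_try=beta_lm-delta;

    functor(beta_try,r_new);                          // residuals at the candidate step
    if(r_new.squaredNorm()<r.squaredNorm())
    {
        beta_lm=beta_try;                             // improvement: accept the step, behave more like Gauss-Newton
        lambda/=2.0;
    }
    else
    {
        lambda*=2.0;                                  // no improvement: increase damping, behave more like gradient descent
    }
}
std::cout<<"beta (LM): \n"<<beta_lm<<std::endl;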