IRLS
endymecy committed Jan 25, 2017
1 parent 0108159 commit 7db4132
Showing 2 changed files with 43 additions and 2 deletions.
2 changes: 1 addition & 1 deletion 分类和回归/线性模型/广义线性回归/glr.md
@@ -96,7 +96,7 @@ println(s"Intercept: ${model.intercept}")
irlsModel.diagInvAtWA.toArray, irlsModel.numIterations, getSolver)
model.setSummary(Some(trainingSummary))
```
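For an analysis of iteratively reweighted least squares, see the optimization chapter: [Iteratively Reweighted Least Squares](../分类和回归/线性模型/广义线性回归/IRLS.md)
For an analysis of iteratively reweighted least squares, see the optimization chapter: [Iteratively Reweighted Least Squares](../../../最优化算法/IRLS.md)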

### 3.3 Link Functions

43 changes: 42 additions & 1 deletion 最优化算法/IRLS.md
@@ -1,6 +1,6 @@
# Iteratively Reweighted Least Squares

## Principle
## 1 Principle

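Iteratively reweighted least squares (`IRLS`) is used to solve a particular class of optimization problems. The objective function of this problem is: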

@@ -13,3 +13,44 @@ $$\beta ^{t+1} = argmin_{\beta} \sum_{i=1}^{n} w_{i}(\beta^{(t)}))|y_{i} - f_{i}
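In this formula, $W^{(t)}$ is the diagonal weight matrix, and all of its elements are initialized to 1. At each iteration, they are updated with the following formula: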

$$W_{i}^{(t)} = |y_{i} - X_{i}\beta^{(t)}|^{p-2}$$
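The update above can be sketched for a linear model $f_i(\beta) = X_i\beta$. The following is a minimal, self-contained sketch, not Spark code. `LpIrls` and `weightedLeastSquares` are hypothetical names, and so is the small Gauss-Jordan solver. The `eps` floor on $|r_i|$ is also an assumption: it keeps $|r_i|^{p-2}$ finite when $p < 2$ and a residual reaches zero.

```scala
// Hypothetical sketch (not Spark code): IRLS for L_p-norm linear regression,
// minimizing sum_i |y_i - X_i beta|^p through repeated weighted least squares.
object LpIrls {
  // Solve the weighted normal equations (X^T W X) beta = X^T W y
  // with Gauss-Jordan elimination and partial pivoting.
  def weightedLeastSquares(x: Array[Array[Double]], y: Array[Double],
                           w: Array[Double]): Array[Double] = {
    val d = x.head.length
    val a = Array.ofDim[Double](d, d + 1) // augmented matrix [X^T W X | X^T W y]
    for (i <- x.indices; j <- 0 until d) {
      for (k <- 0 until d) a(j)(k) += w(i) * x(i)(j) * x(i)(k)
      a(j)(d) += w(i) * x(i)(j) * y(i)
    }
    for (col <- 0 until d) {
      val pivot = (col until d).maxBy(r => math.abs(a(r)(col)))
      val tmp = a(col); a(col) = a(pivot); a(pivot) = tmp
      for (r <- 0 until d if r != col) {
        val factor = a(r)(col) / a(col)(col)
        for (k <- col to d) a(r)(k) -= factor * a(col)(k)
      }
    }
    Array.tabulate(d)(j => a(j)(d) / a(j)(j))
  }

  def fit(x: Array[Array[Double]], y: Array[Double], p: Double,
          maxIter: Int = 100, tol: Double = 1e-8,
          eps: Double = 1e-8): Array[Double] = {
    var w = Array.fill(y.length)(1.0) // W^(0): every weight starts at 1
    var beta = weightedLeastSquares(x, y, w)
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      // w_i = |y_i - X_i beta|^(p-2), with |r_i| floored at eps (assumption)
      w = x.indices.map { i =>
        val r = y(i) - x(i).indices.map(j => x(i)(j) * beta(j)).sum
        math.pow(math.max(math.abs(r), eps), p - 2)
      }.toArray
      val next = weightedLeastSquares(x, y, w)
      // stop once every coefficient changes by less than tol
      converged = next.indices.forall(j => math.abs(next(j) - beta(j)) < tol)
      beta = next
      iter += 1
    }
    beta
  }
}
```

As a usage example, take rows `Array(1.0, xi)` for `xi` in 0 to 4 and labels `Array(1.0, 3.0, 5.0, 7.0, 100.0)`, which contain an outlier at the last point. With `p = 2.0`, the weights stay at 1 and the fit is ordinary least squares (intercept -17.2, slope 20.2), which is dragged toward the outlier. With `p = 1.0` (least absolute deviations), the iterates should approach intercept 1 and slope 2, the line through the four inliers.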

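## 2 Source Code Analysis

In `spark ml`, iteratively reweighted least squares is mainly used to solve generalized linear regression problems. Let's look at the implementation.

### 2.1 Updating the Weights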

```scala
// Update offsets and weights using reweightFunc
val newInstances = instances.map { instance =>
  val (newOffset, newWeight) = reweightFunc(instance, oldModel)
  Instance(newOffset, newWeight, instance.features)
}
```
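Here the `reweightFunc` method updates the offset (the new label) and the weight of every instance. It is implemented in the generalized linear regression code.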
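The loop body above runs inside an iteration that alternates a weighted least-squares fit with a reweighting pass. Below is a minimal, self-contained sketch of that outer loop for one feature plus an intercept. `IrlsLoop`, `Inst`, `LinModel`, and the closed-form `wls` solver are illustrative assumptions, not Spark's classes. The stopping rule (largest absolute parameter change below `tol`) is also an assumption. As in the excerpt, each pass maps over the original `instances`, because `reweightFunc` reads the true label and the prior weight.

```scala
// Hypothetical sketch of the IRLS outer loop. Inst, LinModel and the
// closed-form single-feature WLS solver are illustrative, not Spark APIs.
object IrlsLoop {
  final case class Inst(label: Double, weight: Double, x: Double)

  final case class LinModel(intercept: Double, slope: Double) {
    def predict(x: Double): Double = intercept + slope * x
  }

  // Weighted least squares for label ≈ intercept + slope * x, in closed form.
  def wls(data: Seq[Inst]): LinModel = {
    val sw = data.map(_.weight).sum
    val mx = data.map(i => i.weight * i.x).sum / sw
    val my = data.map(i => i.weight * i.label).sum / sw
    val sxy = data.map(i => i.weight * (i.x - mx) * (i.label - my)).sum
    val sxx = data.map(i => i.weight * (i.x - mx) * (i.x - mx)).sum
    val slope = sxy / sxx
    LinModel(my - slope * mx, slope)
  }

  // Returns the final model and the number of iterations performed.
  def fit(instances: Seq[Inst],
          reweightFunc: (Inst, LinModel) => (Double, Double),
          initialModel: LinModel,
          maxIter: Int = 25,
          tol: Double = 1e-10): (LinModel, Int) = {
    var model = initialModel
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      val oldModel = model
      // Reweight the ORIGINAL instances: the new offset becomes the label.
      val newInstances = instances.map { instance =>
        val (newOffset, newWeight) = reweightFunc(instance, oldModel)
        Inst(newOffset, newWeight, instance.x)
      }
      model = wls(newInstances)
      // Converged when every parameter changed by less than tol.
      val maxChange = math.max(
        math.abs(model.intercept - oldModel.intercept),
        math.abs(model.slope - oldModel.slope))
      converged = maxChange < tol
      iter += 1
    }
    (model, iter)
  }
}
```

For a Poisson model with log link, you can pass a `reweightFunc` that returns `(eta + (y - mu) / mu, w * mu)` with `mu = exp(eta)`. On noise-free data $y_i = e^{0.5 + 0.3x_i}$, starting from intercept $\log\bar{y}$ and slope 0, the loop should recover intercept 0.5 and slope 0.3.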

```scala
/**
 * The reweight function used to update offsets and weights
 * at each iteration of [[IterativelyReweightedLeastSquares]].
 */
val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double) = {
  (instance: Instance, model: WeightedLeastSquaresModel) => {
    val eta = model.predict(instance.features)
    val mu = fitted(eta)
    val offset = eta + (instance.label - mu) * link.deriv(mu)
    val weight = instance.weight / (math.pow(this.link.deriv(mu), 2.0) * family.variance(mu))
    (offset, weight)
  }
}

def fitted(eta: Double): Double = family.project(link.unlink(eta))
```
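Here `model.predict` uses the weighted least squares model to compute the linear predictor $\eta$ of an instance. `fitted` then maps $\eta$ through the inverse link (`link.unlink`) to obtain the mean $\mu$. `offset` is the updated label, and `weight` is the updated weight. For the computations involving the link function, see the analysis in [Generalized Linear Regression](../分类和回归/线性模型/广义线性回归/glr.md).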
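To make the formulas in `reweightFunc` concrete, here is a self-contained mirror of it for a Poisson family with log link, where $g'(\mu) = 1/\mu$ and $V(\mu) = \mu$. The names `PoissonReweight`, `Link`, `Family`, `LogLink`, and `Poisson` are assumptions for illustration, not Spark's classes. The small positive floor in `project` is an assumption as well.

```scala
// Hypothetical mirror of reweightFunc for a Poisson family with log link.
// Link, Family, LogLink and Poisson are illustrative, not Spark's classes.
object PoissonReweight {
  trait Link {
    def link(mu: Double): Double    // eta = g(mu)
    def deriv(mu: Double): Double   // g'(mu)
    def unlink(eta: Double): Double // mu = g^{-1}(eta)
  }
  object LogLink extends Link {
    def link(mu: Double): Double = math.log(mu)
    def deriv(mu: Double): Double = 1.0 / mu
    def unlink(eta: Double): Double = math.exp(eta)
  }

  trait Family {
    def variance(mu: Double): Double // V(mu)
    def project(mu: Double): Double  // keep mu inside the valid range
  }
  object Poisson extends Family {
    private val epsilon = 1e-16
    def variance(mu: Double): Double = mu
    // assumed floor: keeps mu > 0 so 1/mu and V(mu) stay finite
    def project(mu: Double): Double = if (mu < epsilon) epsilon else mu
  }

  val link: Link = LogLink
  val family: Family = Poisson

  def fitted(eta: Double): Double = family.project(link.unlink(eta))

  /** (offset, weight) for label y, prior weight w and linear predictor eta. */
  def reweight(y: Double, w: Double, eta: Double): (Double, Double) = {
    val mu = fitted(eta)
    val offset = eta + (y - mu) * link.deriv(mu)
    val weight = w / (math.pow(link.deriv(mu), 2.0) * family.variance(mu))
    (offset, weight)
  }
}
```

For example, with $\eta = 0$ (so $\mu = 1$), label $y = 3$, and prior weight 1, `reweight(3.0, 1.0, 0.0)` gives offset $0 + (3 - 1)\cdot 1 = 2$ and weight $1/(1^2 \cdot 1) = 1$. For this family and link, the weight always simplifies to $w\mu$.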

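Note that the label and weight updates in this code do not follow the weight formula given in the principle section. That section describes IRLS for $L_p$-norm regression. Generalized linear regression instead uses the Fisher-scoring form of IRLS, where the working labels and weights are derived from the link function and the variance function.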
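The standard Fisher-scoring argument shows why `offset` and `weight` take this form. This is textbook GLM theory, not taken from the Spark source. Here $g$ is the link, $V$ the variance function, $w_i$ the prior weight, $\eta_i = X_i\beta^{(t)}$ and $\mu_i = g^{-1}(\eta_i)$. The dispersion $\phi$ cancels in the update.

$$
\begin{aligned}
U(\beta^{(t)}) &= \frac{1}{\phi}\sum_{i=1}^{n} \frac{w_i\,(y_i-\mu_i)}{V(\mu_i)\,g'(\mu_i)}\,X_i^{T}, \qquad
\mathcal{I}(\beta^{(t)}) = \frac{1}{\phi}\,X^{T} W^{(t)} X, \qquad
W^{(t)}_{ii} = \frac{w_i}{g'(\mu_i)^2\,V(\mu_i)} \\
\beta^{(t+1)} &= \beta^{(t)} + \mathcal{I}^{-1} U
= \left(X^{T} W^{(t)} X\right)^{-1} X^{T} W^{(t)} z^{(t)}, \qquad
z^{(t)}_i = \eta_i + (y_i-\mu_i)\,g'(\mu_i)
\end{aligned}
$$

Each Fisher-scoring step is therefore a weighted least-squares fit of the working response $z^{(t)}$ on $X$ with weights $W^{(t)}_{ii}$. These are exactly `offset` and `weight` in `reweightFunc`.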

## 3 References

【1】[Iteratively reweighted least squares](https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares)
