Optimization Homework 3

1

Minimizing the quadratic model along the steepest-descent direction $s = -\tau g_k$:

$$
\begin{aligned}
f(x+s) &= f(x - \tau g_k)\\
m(\tau) &= f(x) - \tau g_k^Tg_k + \frac{1}{2}\tau^2 g_k^TBg_k\\
\frac{\partial m}{\partial \tau} &= -g_k^Tg_k + \tau g_k^TBg_k = 0\\
\tau &= \frac{g_k^Tg_k}{g_k^TBg_k}\\
s &= -\frac{g_k^Tg_k}{g_k^TBg_k}g_k
\end{aligned}
$$
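As a quick numerical sanity check (a minimal sketch; the particular $B$ and $g_k$ below are arbitrary illustrative choices), the closed-form $\tau$ should beat any nearby step length:

```python
import numpy as np

# Illustrative data: any symmetric positive definite B and nonzero g work here
B = np.array([[2.0, 0.5], [0.5, 4.0]])
g = np.array([1.0, -2.0])

# Closed-form minimizer of the model from the derivation above
tau_star = (g @ g) / (g @ B @ g)

# m(tau) with the constant f(x) dropped
m = lambda tau: -tau * (g @ g) + 0.5 * tau**2 * (g @ B @ g)

# tau_star should achieve the smallest model value on a fine grid
taus = np.linspace(0.0, 2.0 * tau_star, 1001)
assert m(tau_star) <= min(m(t) for t in taus) + 1e-12
print("tau* =", tau_star, "m(tau*) =", m(tau_star))
```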

2

Case 1: $g^TBg \leq 0$.
We have:

$$s = -\frac{\Delta}{\|g\|}g$$

$$\begin{aligned} m(0) - m(s) &= \frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &= \Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &\geq \Delta\|g\|\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$

Case 2: $g^TBg > 0$. This case splits into two subcases.
First:

$$\frac{\|g\|^3}{\Delta g^TBg} \leq 1$$

We have:

$$\tau = \frac{\|g\|^3}{\Delta g^TBg}$$

$$s = -\frac{\|g\|^2}{g^TBg}g$$

$$\begin{aligned} m(0) - m(s) &= -g^Ts - \frac{1}{2}s^TBs\\ &= \frac{1}{2}\frac{\|g\|^4}{g^TBg}\\ &\geq \frac{1}{2}\frac{\|g\|^4}{\|B\|\|g\|^2}\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$

Second:

$$\frac{\|g\|^3}{\Delta g^TBg} > 1$$

We have $\tau = 1$; similar to Case 1:

$$\begin{aligned} m(0) - m(s) &= \frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &= \Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &\geq \Delta\|g\| - \frac{1}{2}\Delta^2\frac{\|g\|^3}{\Delta\|g\|^2}\\ &= \frac{1}{2}\Delta\|g\|\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$
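The bound can also be spot-checked numerically. The sketch below (random symmetric $B$, possibly indefinite, and random $g$, $\Delta$; all choices are purely illustrative) computes the Cauchy step via the case split above and checks $m(0)-m(s) \geq \frac{1}{2}\|g\|\min\{\Delta, \|g\|/\|B\|\}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def cauchy_decrease(B, g, Delta):
    # Step length tau according to the case split above
    gBg = g @ B @ g
    if gBg <= 0:                                   # Case 1: go to the boundary
        tau = 1.0
    else:                                          # Case 2: clip the interior minimizer
        tau = min(np.linalg.norm(g)**3 / (Delta * gBg), 1.0)
    s = -tau * Delta / np.linalg.norm(g) * g
    return -(g @ s) - 0.5 * (s @ B @ s)            # m(0) - m(s)

for _ in range(1000):
    M = rng.standard_normal((3, 3))
    B = (M + M.T) / 2                              # symmetric, possibly indefinite
    g = rng.standard_normal(3)
    Delta = rng.uniform(0.1, 2.0)
    lhs = cauchy_decrease(B, g, Delta)
    rhs = 0.5 * np.linalg.norm(g) * min(Delta, np.linalg.norm(g) / np.linalg.norm(B, 2))
    assert lhs >= rhs - 1e-9
print("decrease bound held on 1000 random instances")
```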

3

(1)

A direct computation gives $p^TAq = 0$; therefore $p$ and $q$ are conjugate with respect to $A$.

(2)

Let $\mathbf{x} = (x_1,x_2,x_3)^T$ and $\mathbf{y} = (y_1,y_2,y_3)^T$; we require $\mathbf{x}^TA\mathbf{y} = 0$:
$$\mathbf{x}^TA\mathbf{y} = (x_1+2x_2+x_3)y_1 + (2x_1+6x_2)y_2 + (x_1+4x_3)y_3 = 0$$
Letting $\mathbf{x} = (0,0,1)^T$, the condition reduces to $y_1 + 4y_3 = 0$, so one valid choice is $\mathbf{y} = (-4,0,1)^T$.
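Reading the matrix off the expansion above gives $A = \begin{pmatrix}1&2&1\\2&6&0\\1&0&4\end{pmatrix}$, and the choice is easy to verify numerically:

```python
import numpy as np

# A reconstructed from the expansion of x^T A y above
A = np.array([[1, 2, 1],
              [2, 6, 0],
              [1, 0, 4]])
x = np.array([0, 0, 1])
y = np.array([-4, 0, 1])
print(x @ A @ y)  # prints 0: x and y are conjugate with respect to A
```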

4

Suppose $x^TAy = 0$ with $x, y \neq 0$. If $x$ and $y$ were not linearly independent, there would exist $\lambda\in\mathbb{R}$, $\lambda\neq 0$, such that $x = \lambda y$. Hence $x^TAy = \frac{1}{\lambda}x^TAx = 0$, which contradicts the positive definiteness of $A$ (since $x \neq 0$). Thus $x$ and $y$ are linearly independent.
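A concrete illustration of the argument (the matrix and vectors below are arbitrary): if $x$ is a multiple of $y$, then $x^TAy$ is a nonzero multiple of $y^TAy > 0$, so it cannot vanish.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # positive definite (illustrative)
y = np.array([1.0, -1.0])
x = 2.0 * y                              # x and y linearly dependent
print(x @ A @ y, "=", 2.0 * (y @ A @ y)) # nonzero, so x^T A y = 0 is impossible
```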

5

This is verified numerically in Python:

```python
import numpy as np

def FR_acc():
    # Define G, b, the starting point, and the stopping tolerance
    G = np.array([[2.0, 0.0], [0.0, 8.0]])
    b = np.array([[-4.0], [-8.0]])
    x = np.array([[0.0], [0.0]])
    eps = 1e-9

    # Gradient g(x) = G x + b of f(x) = 1/2 x^T G x + b^T x
    def gradient(x):
        return G @ x + b

    # Initialization: the first search direction is steepest descent
    g = gradient(x)
    d = -g
    count = 0
    d_list = []
    g_list = []

    # Iterate until the gradient (numerically) vanishes
    while np.linalg.norm(g) > eps:
        # Exact line search along d
        alpha = (-(g.T @ d) / (d.T @ G @ d)).item()
        # Record the direction and gradient for the analysis below
        d_list.append(d)
        g_list.append(g)
        # Update x
        x = x + alpha * d
        # New gradient and Fletcher-Reeves beta
        new_g = gradient(x)
        beta = ((new_g.T @ new_g) / (g.T @ g)).item()
        # Update the direction
        d = -new_g + beta * d
        g = new_g
        count += 1

    return x, count, d_list, g_list

# Run the method
result, count, d_list, g_list = FR_acc()
print("optimal x =", result)
print("d:", np.array(d_list))
print("g:", np.array(g_list))

# Analysis: G-conjugacy of the directions and orthogonality of the gradients
G = np.array([[2.0, 0.0], [0.0, 8.0]])
for item in d_list:
    print(item.T @ G @ d_list[-1])

for item in g_list:
    print(item.T @ g_list[-1])

# Check d_i^T g_i + g_i^T g_i = 0
check = []
eps = 1e-13
for i in range(1, len(d_list)):
    check.append(abs((d_list[i].T @ g_list[i]
                      + g_list[i].T @ g_list[i]).item()) < eps)
print(check)
```
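If the run is consistent with the theory, all but the last value printed by each of the first two loops should be numerically zero: the search directions are mutually $G$-conjugate and the gradients mutually orthogonal. The final check confirms $d_i^Tg_i = -g_i^Tg_i$, which follows from $d_i = -g_i + \beta_{i-1}d_{i-1}$ together with $d_{i-1}^Tg_i = 0$ from the exact line search.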

6

(1)

For

$$(\alpha_k,\beta_k) = \arg\min_{\alpha,\beta} f(x_k + \alpha d_k + \beta s_{k-1}),$$

take the partial derivatives with respect to $\alpha$ and $\beta$:

$$\frac{\partial f}{\partial \alpha} = g_{k+1}^Td_k = 0$$

$$\frac{\partial f}{\partial \beta} = g_{k+1}^Ts_{k-1} = 0$$

Since $s_k = x_{k+1} - x_k = \alpha_k d_k + \beta_k s_{k-1}$, it follows that

$$g_{k+1}^Ts_k = g_{k+1}^T(\alpha_k d_k + \beta_k s_{k-1}) = \alpha_k g_{k+1}^Td_k + \beta_k g_{k+1}^Ts_{k-1} = 0$$

(2)

When $f = \frac{1}{2}x^TGx + b^Tx$, we have $g(x) = Gx + b$. Plugging $g(x)$ into $\frac{\partial f}{\partial \alpha} = 0$ and $\frac{\partial f}{\partial \beta} = 0$, we get:

$$d_k^Tg_k + \alpha d_k^TGd_k + \beta d_k^TGs_{k-1} = 0\\
s_{k-1}^Tg_k + \alpha s_{k-1}^TGd_k + \beta s_{k-1}^TGs_{k-1} = 0$$

Rewriting this in the matrix form $Ax = b$:

$$\begin{pmatrix} d_k^TGd_k & d_k^TGs_{k-1}\\ s_{k-1}^TGd_k & s_{k-1}^TGs_{k-1} \end{pmatrix}\begin{pmatrix} \alpha\\ \beta \end{pmatrix} = \begin{pmatrix} -d_k^Tg_k\\ -s_{k-1}^Tg_k \end{pmatrix}$$

Using Cramer's Rule with $\Delta_k = \det(A)$, we obtain $\alpha_k$ and $\beta_k$.
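Spelled out (and simplified using $s_{k-1}^Tg_k = 0$, which is (1) applied at step $k-1$), Cramer's Rule yields the expressions used in (3) below:

$$\alpha_k = -\frac{(d_k^Tg_k)(s_{k-1}^TGs_{k-1})}{\Delta_k},\qquad \beta_k = \frac{(d_k^Tg_k)(s_{k-1}^TGd_k)}{\Delta_k},\qquad \Delta_k = (d_k^TGd_k)(s_{k-1}^TGs_{k-1}) - (d_k^TGs_{k-1})^2$$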
Applying the results from (1), we get:

$$\begin{aligned} f_{k+1}-f_k &= \frac{1}{2}(x_k+\alpha_k d_k+\beta_k s_{k-1})^TG(x_k+\alpha_k d_k+\beta_k s_{k-1}) - \frac{1}{2}x_k^TGx_k + b^T(\alpha_k d_k+\beta_k s_{k-1})\\ &= \frac{1}{2}(x_k^TG+b^T)(\alpha_k d_k+\beta_k s_{k-1}) + \frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(G(x_k+\alpha_k d_k+\beta_k s_{k-1})+b\right)\\ &= \frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(g_k+G(\alpha_k d_k+\beta_k s_{k-1})\right)\\ &= \frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}s_k^Tg_{k+1}\\ &= \frac{1}{2}\alpha_k g_k^Td_k \end{aligned}$$
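Both (1) and this decrease formula are easy to verify numerically. A sketch (random positive definite $G$ and arbitrary data, for illustration only; $s_{k-1}$ is drawn orthogonal to $g_k$, as (1) guarantees for the actual iteration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random positive definite quadratic f(x) = 1/2 x^T G x + b^T x
M = rng.standard_normal((4, 4))
G = M @ M.T + 4.0 * np.eye(4)
b = rng.standard_normal(4)
f = lambda x: 0.5 * x @ G @ x + b @ x
grad = lambda x: G @ x + b

# Current point, gradient, and direction d_k = -g_k
x_k = rng.standard_normal(4)
g_k = grad(x_k)
d_k = -g_k

# Previous step s_{k-1}, made orthogonal to g_k as (1) guarantees
v = rng.standard_normal(4)
s_prev = v - (v @ g_k) / (g_k @ g_k) * g_k

# Solve the 2x2 system from (2) for (alpha_k, beta_k)
A2 = np.array([[d_k @ G @ d_k,    d_k @ G @ s_prev],
               [s_prev @ G @ d_k, s_prev @ G @ s_prev]])
rhs = -np.array([d_k @ g_k, s_prev @ g_k])
alpha_k, beta_k = np.linalg.solve(A2, rhs)

x_next = x_k + alpha_k * d_k + beta_k * s_prev
g_next = grad(x_next)

print(g_next @ d_k, g_next @ s_prev)                    # both ~0, as in (1)
print(f(x_next) - f(x_k), 0.5 * alpha_k * (g_k @ d_k))  # these two agree
```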

(3)

Consider a positive definite quadratic:

$$f(x) = \frac{1}{2}x^TGx + b^Tx$$

For the FR method we mark all quantities with a hat ($\hat{\cdot}$); for the method given in the problem we use the original symbols.
FR method: $\hat x_{k+1} = \hat x_k + \hat\alpha_k \hat d_k$, where $\hat\alpha_k$ minimizes $f(x)$ along the direction $\hat d_k$, with

$$\hat\beta_{k-1} = \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}\qquad \hat d_k = -g_k + \hat\beta_{k-1}\hat d_{k-1}$$

$$\begin{aligned} \hat x_{k+1} &= \hat x_k + \hat\alpha_k \hat d_k\\ &= x_k + \hat\alpha_k\left(-g_k + \hat\beta_{k-1}\hat d_{k-1}\right)\\ &= x_k + \hat\alpha_k\left(-g_k + \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}\hat d_{k-1}\right) \end{aligned}$$

Method in this problem: $x_{k+1} = x_k + \alpha_k d_k + \beta_k s_{k-1}$, where $\alpha_k$ and $\beta_k$ minimize $f(x)$ over the two-dimensional affine set $x_k + \mathrm{span}\{d_k, s_{k-1}\}$, which is the same as $x_k + \mathrm{span}\{g_k, \hat d_{k-1}\}$.

$$\begin{aligned} x_{k+1} &= x_k + \alpha_k d_k + \beta_k s_{k-1}\\ &= x_k - \alpha_k g_k + \beta_k\hat\alpha_{k-1}\hat d_{k-1}\\ &= x_k + \frac{(g_k^Td_k)(s_{k-1}^TGs_{k-1})}{\Delta_k}g_k + \frac{(g_k^Td_k)(d_k^TGs_{k-1})}{\Delta_k}\hat\alpha_{k-1}\hat d_{k-1}\\ &= x_k + \tilde\alpha_k\left(-g_k + \frac{d_k^TGs_{k-1}}{s_{k-1}^TGs_{k-1}}\hat\alpha_{k-1}\hat d_{k-1}\right)\\ &= x_k + \tilde\alpha_k\left(-g_k + \frac{d_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}\hat d_{k-1}\right) \end{aligned}$$

As Theorem 4.5 states, searching for the minimum along the directions $d_j$, $j = 1,\ldots,k$, one step at a time actually finds the minimum over $x_0 + \mathrm{span}\{d_1,\ldots,d_k\}$, provided $\{d_j\}$ is a $G$-conjugate set. Restricting the search to a single step, the theorem still holds.

P.S. I was not able to complete the proof of (3). I believe one should show that the FR method's search direction is exactly the direction determined by the result of (2), i.e. that the two coefficients of $\hat d_{k-1}$ coincide:

$$\frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}} = \frac{d_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}$$
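This conjecture can at least be probed numerically: run FR on a small quadratic and print the two sides of the claimed identity at every step. The sketch below is only an experiment under illustrative data (random positive definite $G$; $d_k = -g_k$ as in the problem's method), not a proof:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
G = M @ M.T + 5.0 * np.eye(5)      # random positive definite G (illustrative)
b = rng.standard_normal(5)

x = np.zeros(5)
g = G @ x + b
d_hat = -g                                            # FR direction \hat d_0
for k in range(1, 5):
    alpha = -(g @ d_hat) / (d_hat @ G @ d_hat)        # exact line search
    x = x + alpha * d_hat
    g_new = G @ x + b                                 # g_k
    lhs = (g_new @ g_new) / (g @ g)                   # g_k^T g_k / g_{k-1}^T g_{k-1}
    rhs = (-g_new @ G @ d_hat) / (d_hat @ G @ d_hat)  # d_k^T G \hat d_{k-1} / \hat d_{k-1}^T G \hat d_{k-1}
    print(k, lhs, rhs)                                # compare the two sides
    d_hat = -g_new + lhs * d_hat                      # FR update
    g = g_new
```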

