Optimization Homework 3

1

Minimizing the quadratic model along the steepest-descent direction $s = -\tau g_k$:

$$
\begin{aligned}
f(x+s) &= f(x - \tau g_k)\\
m(\tau) &= f(x) - \tau g_k^Tg_k + \frac{1}{2}\tau^2 g_k^TBg_k\\
\frac{\partial m}{\partial \tau} &= -g_k^Tg_k + \tau g_k^TBg_k = 0\\
\tau &= \frac{g_k^Tg_k}{g_k^TBg_k}\\
s &= -\frac{g_k^Tg_k}{g_k^TBg_k}g_k
\end{aligned}
$$
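As a quick numerical sanity check (a minimal sketch; the particular $B$ and $g_k$ below are arbitrary illustrative choices), the closed-form $\tau$ should beat any nearby step length:

```python
import numpy as np

# Illustrative data: any symmetric positive definite B and nonzero g work here
B = np.array([[2.0, 0.5], [0.5, 4.0]])
g = np.array([1.0, -2.0])

# Closed-form minimizer of the model from the derivation above
tau_star = (g @ g) / (g @ B @ g)

# m(tau) with the constant f(x) dropped
m = lambda tau: -tau * (g @ g) + 0.5 * tau**2 * (g @ B @ g)

# tau_star should achieve the smallest model value on a fine grid
taus = np.linspace(0.0, 2.0 * tau_star, 1001)
assert m(tau_star) <= min(m(t) for t in taus) + 1e-12
print("tau* =", tau_star, "m(tau*) =", m(tau_star))
```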

2

Case 1: $g^TBg \leq 0$.
We have:

$$s = -\frac{\Delta}{\|g\|}g$$

$$\begin{aligned} m(0) - m(s) &= \frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &= \Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &\geq \Delta\|g\|\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$

Case 2: $g^TBg > 0$. This case splits into two subcases.
First:

$$\frac{\|g\|^3}{\Delta g^TBg} \leq 1$$

We have:

$$\tau = \frac{\|g\|^3}{\Delta g^TBg}$$

$$s = -\frac{\|g\|^2}{g^TBg}g$$

$$\begin{aligned} m(0) - m(s) &= -g^Ts - \frac{1}{2}s^TBs\\ &= \frac{1}{2}\frac{\|g\|^4}{g^TBg}\\ &\geq \frac{1}{2}\frac{\|g\|^4}{\|B\|\|g\|^2}\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$

Second:

$$\frac{\|g\|^3}{\Delta g^TBg} > 1$$

We have $\tau = 1$; similar to Case 1:

$$\begin{aligned} m(0) - m(s) &= \frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &= \Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\ &\geq \Delta\|g\| - \frac{1}{2}\Delta^2\frac{\|g\|^3}{\Delta\|g\|^2}\\ &= \frac{1}{2}\Delta\|g\|\\ &\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\} \end{aligned}$$
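The bound can also be spot-checked numerically. The sketch below (random symmetric $B$, possibly indefinite, and random $g$, $\Delta$; all choices are purely illustrative) computes the Cauchy step via the case split above and checks $m(0)-m(s) \geq \frac{1}{2}\|g\|\min\{\Delta, \|g\|/\|B\|\}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def cauchy_decrease(B, g, Delta):
    # Step length tau according to the case split above
    gBg = g @ B @ g
    if gBg <= 0:                                   # Case 1: go to the boundary
        tau = 1.0
    else:                                          # Case 2: clip the interior minimizer
        tau = min(np.linalg.norm(g)**3 / (Delta * gBg), 1.0)
    s = -tau * Delta / np.linalg.norm(g) * g
    return -(g @ s) - 0.5 * (s @ B @ s)            # m(0) - m(s)

for _ in range(1000):
    M = rng.standard_normal((3, 3))
    B = (M + M.T) / 2                              # symmetric, possibly indefinite
    g = rng.standard_normal(3)
    Delta = rng.uniform(0.1, 2.0)
    lhs = cauchy_decrease(B, g, Delta)
    rhs = 0.5 * np.linalg.norm(g) * min(Delta, np.linalg.norm(g) / np.linalg.norm(B, 2))
    assert lhs >= rhs - 1e-9
print("decrease bound held on 1000 random instances")
```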

3

(1)

A direct computation gives $p^TAq = 0$; therefore $p$ and $q$ are conjugate with respect to $A$.

(2)

Let $\mathbf{x} = (x_1,x_2,x_3)^T$ and $\mathbf{y} = (y_1,y_2,y_3)^T$; we require $\mathbf{x}^TA\mathbf{y} = 0$:
$$\mathbf{x}^TA\mathbf{y} = (x_1+2x_2+x_3)y_1 + (2x_1+6x_2)y_2 + (x_1+4x_3)y_3 = 0$$
Letting $\mathbf{x} = (0,0,1)^T$, the condition reduces to $y_1 + 4y_3 = 0$, so one valid choice is $\mathbf{y} = (-4,0,1)^T$.
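Reading the matrix off the expansion above gives $A = \begin{pmatrix}1&2&1\\2&6&0\\1&0&4\end{pmatrix}$, and the choice is easy to verify numerically:

```python
import numpy as np

# A reconstructed from the expansion of x^T A y above
A = np.array([[1, 2, 1],
              [2, 6, 0],
              [1, 0, 4]])
x = np.array([0, 0, 1])
y = np.array([-4, 0, 1])
print(x @ A @ y)  # prints 0: x and y are conjugate with respect to A
```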

4

Suppose $x^TAy = 0$ with $x, y \neq 0$. If $x$ and $y$ were not linearly independent, there would exist $\lambda\in\mathbb{R}$, $\lambda\neq 0$, such that $x = \lambda y$. Hence $x^TAy = \frac{1}{\lambda}x^TAx = 0$, which contradicts the positive definiteness of $A$ (since $x \neq 0$). Thus $x$ and $y$ are linearly independent.
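A concrete illustration of the argument (the matrix and vectors below are arbitrary): if $x$ is a multiple of $y$, then $x^TAy$ is a nonzero multiple of $y^TAy > 0$, so it cannot vanish.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # positive definite (illustrative)
y = np.array([1.0, -1.0])
x = 2.0 * y                              # x and y linearly dependent
print(x @ A @ y, "=", 2.0 * (y @ A @ y)) # nonzero, so x^T A y = 0 is impossible
```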

5

This is verified numerically in Python:

```python
import numpy as np

def FR_acc():
    # Define G, b, the starting point, and the stopping tolerance
    G = np.array([[2.0, 0.0], [0.0, 8.0]])
    b = np.array([[-4.0], [-8.0]])
    x = np.array([[0.0], [0.0]])
    eps = 1e-9

    # Gradient g(x) = G x + b of f(x) = 1/2 x^T G x + b^T x
    def gradient(x):
        return G @ x + b

    # Initialization: the first search direction is steepest descent
    g = gradient(x)
    d = -g
    count = 0
    d_list = []
    g_list = []

    # Iterate until the gradient (numerically) vanishes
    while np.linalg.norm(g) > eps:
        # Exact line search along d
        alpha = (-(g.T @ d) / (d.T @ G @ d)).item()
        # Record the direction and gradient for the analysis below
        d_list.append(d)
        g_list.append(g)
        # Update x
        x = x + alpha * d
        # New gradient and Fletcher-Reeves beta
        new_g = gradient(x)
        beta = ((new_g.T @ new_g) / (g.T @ g)).item()
        # Update the direction
        d = -new_g + beta * d
        g = new_g
        count += 1

    return x, count, d_list, g_list

# Run the method
result, count, d_list, g_list = FR_acc()
print("optimal x =", result)
print("d:", np.array(d_list))
print("g:", np.array(g_list))

# Analysis: G-conjugacy of the directions and orthogonality of the gradients
G = np.array([[2.0, 0.0], [0.0, 8.0]])
for item in d_list:
    print(item.T @ G @ d_list[-1])

for item in g_list:
    print(item.T @ g_list[-1])

# Check d_i^T g_i + g_i^T g_i = 0
check = []
eps = 1e-13
for i in range(1, len(d_list)):
    check.append(abs((d_list[i].T @ g_list[i]
                      + g_list[i].T @ g_list[i]).item()) < eps)
print(check)
```
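If the run is consistent with the theory, all but the last value printed by each of the first two loops should be numerically zero: the search directions are mutually $G$-conjugate and the gradients mutually orthogonal. The final check confirms $d_i^Tg_i = -g_i^Tg_i$, which follows from $d_i = -g_i + \beta_{i-1}d_{i-1}$ together with $d_{i-1}^Tg_i = 0$ from the exact line search.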

6

(1)

For

$$(\alpha_k,\beta_k) = \arg\min_{\alpha,\beta} f(x_k + \alpha d_k + \beta s_{k-1}),$$

take the partial derivatives with respect to $\alpha$ and $\beta$:

$$\frac{\partial f}{\partial \alpha} = g_{k+1}^Td_k = 0$$

$$\frac{\partial f}{\partial \beta} = g_{k+1}^Ts_{k-1} = 0$$

Since $s_k = x_{k+1} - x_k = \alpha_k d_k + \beta_k s_{k-1}$, it follows that

$$g_{k+1}^Ts_k = g_{k+1}^T(\alpha_k d_k + \beta_k s_{k-1}) = \alpha_k g_{k+1}^Td_k + \beta_k g_{k+1}^Ts_{k-1} = 0$$

(2)

When $f = \frac{1}{2}x^TGx + b^Tx$, we have $g(x) = Gx + b$. Plugging $g(x)$ into $\frac{\partial f}{\partial \alpha} = 0$ and $\frac{\partial f}{\partial \beta} = 0$, we get:

$$d_k^Tg_k + \alpha d_k^TGd_k + \beta d_k^TGs_{k-1} = 0\\
s_{k-1}^Tg_k + \alpha s_{k-1}^TGd_k + \beta s_{k-1}^TGs_{k-1} = 0$$

Rewriting this in the matrix form $Ax = b$:

$$\begin{pmatrix} d_k^TGd_k & d_k^TGs_{k-1}\\ s_{k-1}^TGd_k & s_{k-1}^TGs_{k-1} \end{pmatrix}\begin{pmatrix} \alpha\\ \beta \end{pmatrix} = \begin{pmatrix} -d_k^Tg_k\\ -s_{k-1}^Tg_k \end{pmatrix}$$

Using Cramer's Rule with $\Delta_k = \det(A)$, we obtain $\alpha_k$ and $\beta_k$.
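Spelled out (and simplified using $s_{k-1}^Tg_k = 0$, which is (1) applied at step $k-1$), Cramer's Rule yields the expressions used in (3) below:

$$\alpha_k = -\frac{(d_k^Tg_k)(s_{k-1}^TGs_{k-1})}{\Delta_k},\qquad \beta_k = \frac{(d_k^Tg_k)(s_{k-1}^TGd_k)}{\Delta_k},\qquad \Delta_k = (d_k^TGd_k)(s_{k-1}^TGs_{k-1}) - (d_k^TGs_{k-1})^2$$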
Applying the results from (1), we get:

$$\begin{aligned} f_{k+1}-f_k &= \frac{1}{2}(x_k+\alpha_k d_k+\beta_k s_{k-1})^TG(x_k+\alpha_k d_k+\beta_k s_{k-1}) - \frac{1}{2}x_k^TGx_k + b^T(\alpha_k d_k+\beta_k s_{k-1})\\ &= \frac{1}{2}(x_k^TG+b^T)(\alpha_k d_k+\beta_k s_{k-1}) + \frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(G(x_k+\alpha_k d_k+\beta_k s_{k-1})+b\right)\\ &= \frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(g_k+G(\alpha_k d_k+\beta_k s_{k-1})\right)\\ &= \frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}s_k^Tg_{k+1}\\ &= \frac{1}{2}\alpha_k g_k^Td_k \end{aligned}$$
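Both (1) and this decrease formula are easy to verify numerically. A sketch (random positive definite $G$ and arbitrary data, for illustration only; $s_{k-1}$ is drawn orthogonal to $g_k$, as (1) guarantees for the actual iteration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random positive definite quadratic f(x) = 1/2 x^T G x + b^T x
M = rng.standard_normal((4, 4))
G = M @ M.T + 4.0 * np.eye(4)
b = rng.standard_normal(4)
f = lambda x: 0.5 * x @ G @ x + b @ x
grad = lambda x: G @ x + b

# Current point, gradient, and direction d_k = -g_k
x_k = rng.standard_normal(4)
g_k = grad(x_k)
d_k = -g_k

# Previous step s_{k-1}, made orthogonal to g_k as (1) guarantees
v = rng.standard_normal(4)
s_prev = v - (v @ g_k) / (g_k @ g_k) * g_k

# Solve the 2x2 system from (2) for (alpha_k, beta_k)
A2 = np.array([[d_k @ G @ d_k,    d_k @ G @ s_prev],
               [s_prev @ G @ d_k, s_prev @ G @ s_prev]])
rhs = -np.array([d_k @ g_k, s_prev @ g_k])
alpha_k, beta_k = np.linalg.solve(A2, rhs)

x_next = x_k + alpha_k * d_k + beta_k * s_prev
g_next = grad(x_next)

print(g_next @ d_k, g_next @ s_prev)                    # both ~0, as in (1)
print(f(x_next) - f(x_k), 0.5 * alpha_k * (g_k @ d_k))  # these two agree
```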

(3)

Consider a positive definite quadratic:

$$f(x) = \frac{1}{2}x^TGx + b^Tx$$

For the FR method we mark all quantities with a hat ($\hat{\cdot}$); for the method given in the problem we use the original symbols.
FR method: $\hat x_{k+1} = \hat x_k + \hat\alpha_k \hat d_k$, where $\hat\alpha_k$ minimizes $f(x)$ along the direction $\hat d_k$, with

$$\hat\beta_{k-1} = \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}\qquad \hat d_k = -g_k + \hat\beta_{k-1}\hat d_{k-1}$$

$$\begin{aligned} \hat x_{k+1} &= \hat x_k + \hat\alpha_k \hat d_k\\ &= x_k + \hat\alpha_k\left(-g_k + \hat\beta_{k-1}\hat d_{k-1}\right)\\ &= x_k + \hat\alpha_k\left(-g_k + \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}\hat d_{k-1}\right) \end{aligned}$$

Method in this problem: $x_{k+1} = x_k + \alpha_k d_k + \beta_k s_{k-1}$, where $\alpha_k$ and $\beta_k$ minimize $f(x)$ over the two-dimensional affine set $x_k + \mathrm{span}\{d_k, s_{k-1}\}$, which is the same as $x_k + \mathrm{span}\{g_k, \hat d_{k-1}\}$.

$$\begin{aligned} x_{k+1} &= x_k + \alpha_k d_k + \beta_k s_{k-1}\\ &= x_k - \alpha_k g_k + \beta_k\hat\alpha_{k-1}\hat d_{k-1}\\ &= x_k + \frac{(g_k^Td_k)(s_{k-1}^TGs_{k-1})}{\Delta_k}g_k + \frac{(g_k^Td_k)(d_k^TGs_{k-1})}{\Delta_k}\hat\alpha_{k-1}\hat d_{k-1}\\ &= x_k + \tilde\alpha_k\left(-g_k + \frac{d_k^TGs_{k-1}}{s_{k-1}^TGs_{k-1}}\hat\alpha_{k-1}\hat d_{k-1}\right)\\ &= x_k + \tilde\alpha_k\left(-g_k + \frac{d_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}\hat d_{k-1}\right) \end{aligned}$$

As Theorem 4.5 states, searching for the minimum along the directions $d_j$, $j = 1,\ldots,k$, one step at a time actually finds the minimum over $x_0 + \mathrm{span}\{d_1,\ldots,d_k\}$, provided $\{d_j\}$ is a $G$-conjugate set. Restricting the search to a single step, the theorem still holds.

P.S. I was not able to complete the proof of (3). I believe one should show that the FR method's search direction is exactly the direction determined by the result of (2), i.e. that the two coefficients of $\hat d_{k-1}$ coincide:

$$\frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}} = \frac{d_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}$$
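This conjecture can at least be probed numerically: run FR on a small quadratic and print the two sides of the claimed identity at every step. The sketch below is only an experiment under illustrative data (random positive definite $G$; $d_k = -g_k$ as in the problem's method), not a proof:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
G = M @ M.T + 5.0 * np.eye(5)      # random positive definite G (illustrative)
b = rng.standard_normal(5)

x = np.zeros(5)
g = G @ x + b
d_hat = -g                                            # FR direction \hat d_0
for k in range(1, 5):
    alpha = -(g @ d_hat) / (d_hat @ G @ d_hat)        # exact line search
    x = x + alpha * d_hat
    g_new = G @ x + b                                 # g_k
    lhs = (g_new @ g_new) / (g @ g)                   # g_k^T g_k / g_{k-1}^T g_{k-1}
    rhs = (-g_new @ G @ d_hat) / (d_hat @ G @ d_hat)  # d_k^T G \hat d_{k-1} / \hat d_{k-1}^T G \hat d_{k-1}
    print(k, lhs, rhs)                                # compare the two sides
    d_hat = -g_new + lhs * d_hat                      # FR update
    g = g_new
```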

