1
$$
\begin{aligned}
f(x+s) &= f(x - \tau g_k)\\
m(\tau) &= f(x) - \tau g_k^Tg_k + \frac{1}{2}\tau^2 g_k^TBg_k\\
\frac{\partial m}{\partial \tau} &= -g_k^Tg_k + \tau g_k^TBg_k = 0\\
\tau &= \frac{g_k^Tg_k}{g_k^TBg_k}\\
s &= -\frac{g_k^Tg_k}{g_k^TBg_k}g_k
\end{aligned}
$$
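As a quick numerical sanity check (my own addition, not part of the assignment), the closed-form step length can be compared against nearby step lengths on a randomly generated positive definite model:

```python
import numpy as np

# Sanity check: for a random positive definite B and gradient g, the
# closed-form tau = (g^T g)/(g^T B g) should minimize the model m(tau).
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
B = M @ M.T + 3 * np.eye(3)       # positive definite
g = rng.standard_normal(3)

tau = (g @ g) / (g @ B @ g)
m = lambda t: -t * (g @ g) + 0.5 * t**2 * (g @ B @ g)   # m(tau) up to the constant f(x)

assert m(tau) <= min(m(0.9 * tau), m(1.1 * tau))
print("tau =", tau)
```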
2
Case 1: $g^TBg<0$
We have:
$$
s = -\frac{\Delta g}{\|g\|}
$$
$$
\begin{aligned}
m(0) - m(s) &=\frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\
&=\Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\
&\geq \Delta\|g\|\\
&\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\}
\end{aligned}
$$
where the first inequality holds because $g^TBg<0$ makes the subtracted term negative.
Case 2: $g^TBg\geq 0$. For this case, we have two subcases.
First:
$$
\frac{\|g\|^3}{\Delta g^TBg}\leq 1
$$
We have:
$$
\tau = \frac{\|g\|^3}{\Delta g^TBg}
$$

$$
s = -\frac{\|g\|^2}{g^TBg}g
$$
$$
\begin{aligned}
m(0) - m(s) &=-g^Ts - \frac{1}{2}s^TBs\\
&=\frac{1}{2}\frac{\|g\|^4}{g^TBg}\\
&\geq \frac{1}{2}\frac{\|g\|^4}{\|B\|\|g\|^2}\\
&\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\}
\end{aligned}
$$
where the first inequality uses $g^TBg\leq\|B\|\|g\|^2$.
Second:
$$
\frac{\|g\|^3}{\Delta g^TBg}> 1
$$
We have $\tau = 1$; similar to Case 1:
$$
\begin{aligned}
m(0) - m(s) &=\frac{\Delta g^Tg}{\|g\|} - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\
&=\Delta\|g\| - \frac{1}{2}\Delta^2\frac{g^TBg}{\|g\|^2}\\
&\geq \Delta\|g\| - \frac{1}{2}\Delta^2\frac{\|g\|^3}{\Delta\|g\|^2}\\
&=\frac{1}{2}\Delta\|g\|\\
&\geq \frac{1}{2}\|g\|\min\left\{\Delta,\frac{\|g\|}{\|B\|}\right\}
\end{aligned}
$$
where the inequality uses the subcase condition $g^TBg < \|g\|^3/\Delta$.
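The bound just proved can be spot-checked numerically (my own addition): sample random symmetric, possibly indefinite $B$, pick the step according to the case analysis above, and verify the decrease.

```python
import numpy as np

# Spot-check of m(0) - m(s) >= (1/2)||g|| min{Delta, ||g||/||B||}.
rng = np.random.default_rng(1)

def cauchy_decrease(B, g, delta):
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                             # Case 1
    else:
        tau = min(np.linalg.norm(g) ** 3 / (delta * gBg), 1)  # Case 2
    s = -tau * delta * g / np.linalg.norm(g)
    return -(g @ s) - 0.5 * s @ B @ s                         # m(0) - m(s)

for _ in range(1000):
    M = rng.standard_normal((4, 4))
    B = (M + M.T) / 2                 # symmetric, possibly indefinite
    g = rng.standard_normal(4)
    delta = rng.uniform(0.1, 2.0)
    bound = 0.5 * np.linalg.norm(g) * min(delta, np.linalg.norm(g) / np.linalg.norm(B, 2))
    assert cauchy_decrease(B, g, delta) >= bound - 1e-12
print("bound holds on all samples")
```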
3
(1)
$p^TAq = 0$; therefore $p$ and $q$ are conjugate.
(2)
Let $\mathbf x = (x_1,x_2,x_3)^T$ and $\mathbf y = (y_1,y_2,y_3)^T$; we need $\mathbf x^TA\mathbf y = 0$:
$$
\mathbf x^TA\mathbf y = (x_1+2x_2+x_3)y_1+(2x_1+6x_2)y_2+(x_1+4x_3)y_3 = 0
$$
Taking $\mathbf x = (0,0,1)^T$, the condition becomes $y_1 + 4y_3 = 0$, so $\mathbf y = (-4,0,1)^T$ works.
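A quick verification (my own addition), with $A$ read off from the bilinear form above:

```python
import numpy as np

# A is read off from the bilinear form x^T A y above.
A = np.array([[1, 2, 1],
              [2, 6, 0],
              [1, 0, 4]])
x = np.array([0, 0, 1])
y = np.array([-4, 0, 1])
print(x @ A @ y)   # prints 0: x and y are conjugate with respect to A
```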
4
Suppose $x^TAy = 0$ with $x,y\neq 0$. If $x$ and $y$ are not linearly independent, there exists $\lambda\in\mathbb{R}$, $\lambda\neq 0$, such that $x = \lambda y$. Hence $x^TAy = \frac{1}{\lambda}x^TAx = 0$, i.e. $x^TAx = 0$. However, $A$ is positive definite, so $x^TAx > 0$, a contradiction. Thus $x$ and $y$ are linearly independent.
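As a concrete illustration (my own addition): take $A = I$ and $x = 2y$ for any $y\neq 0$; then $x^TAy = 2\|y\|^2 > 0$, so linearly dependent nonzero vectors can never be conjugate with respect to a positive definite matrix.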
5
This is verified numerically in Python.

```python
import numpy as np

def FR_acc():
    G = np.mat(np.array([[2, 0], [0, 8]]))
    b = np.mat(np.array([[-4], [-8]]))
    x0 = np.mat(np.array([[0], [0]]))
    eps = 1e-9

    def gradient(x):
        return G @ x + b

    x = x0
    old_g = gradient(x)
    old_d = -old_g
    old_x = np.mat(np.array([[100], [0]]))  # dummy previous iterate to enter the loop
    count = 0
    d_list = []
    g_list = []
    while np.linalg.norm(old_x - x) > eps:
        new_g = gradient(x)
        # Fletcher-Reeves beta; on the first pass new_g == old_g, so beta = 1
        # and d is just a scaled steepest-descent direction.
        beta = (new_g.T @ new_g) / (old_g.T @ old_g)
        d = -new_g + float(beta) * old_d
        alpha = -(new_g.T @ d) / (d.T @ G @ d)  # exact line search on the quadratic
        d_list.append(d)
        g_list.append(new_g)
        old_d = d
        old_g = new_g
        old_x = x
        x = x + float(alpha) * d
        count += 1
    return x, count, d_list, g_list

result, count, d_list, g_list = FR_acc()
print("optimal x =", result)
print("d:", np.array(d_list))
print("g:", np.array(g_list))

# Check conjugacy of the directions and orthogonality of the gradients
G = np.mat(np.array([[2, 0], [0, 8]]))
for item in d_list:
    print(item.T @ G @ d_list[-1])
for item in g_list:
    print(item.T @ g_list[-1])

# Check d_i^T g_i = -g_i^T g_i
check = []
eps = 1e-13
for i in range(1, len(d_list)):
    check.append(float(np.abs(d_list[i].T @ g_list[i] + g_list[i].T @ g_list[i])) < eps)
print(check)
```
6
(1)
For
$$
(\alpha_k,\beta_k) = \arg\min_{\alpha,\beta} f(x_k + \alpha d_k + \beta s_{k-1}),
$$
take the partial derivatives (writing $g_{k+1}$ for the gradient at the minimizer $x_{k+1} = x_k + \alpha_k d_k + \beta_k s_{k-1}$):
$$
\frac{\partial f}{\partial \alpha} = g_{k+1}^Td_k = 0
$$
$$
\frac{\partial f}{\partial \beta} = g_{k+1}^Ts_{k-1} = 0
$$
Hence, with $s_k = \alpha_k d_k + \beta_k s_{k-1}$:
$$
g_{k+1}^Ts_k = g_{k+1}^T(\alpha_k d_k + \beta_k s_{k-1}) = \alpha_k g_{k+1}^Td_k + \beta_k g_{k+1}^Ts_{k-1} = 0
$$
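This can be checked numerically (my own addition): on a random positive definite quadratic, solving the $2\times 2$ system for $(\alpha,\beta)$ (anticipating the quadratic case of part (2)) makes the new gradient orthogonal to $d_k$, $s_{k-1}$, and hence to $s_k$:

```python
import numpy as np

# Minimizing over x_k + span{d_k, s_{k-1}} makes g_{k+1} orthogonal
# to d_k, s_{k-1}, and s_k = alpha*d_k + beta*s_{k-1}.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
G = M @ M.T + 3 * np.eye(3)          # positive definite
b = rng.standard_normal(3)
xk, dk, sk1 = (rng.standard_normal(3) for _ in range(3))

gk = G @ xk + b
A2 = np.array([[dk @ G @ dk,  dk @ G @ sk1],
               [sk1 @ G @ dk, sk1 @ G @ sk1]])
alpha, beta = np.linalg.solve(A2, [-dk @ gk, -sk1 @ gk])

sk = alpha * dk + beta * sk1
g_next = G @ (xk + sk) + b
print(g_next @ dk, g_next @ sk1, g_next @ sk)   # all ~0
```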
(2)
When $f = \frac{1}{2}x^TGx+b^Tx$, $g(x) = Gx + b$. Plugging $g(x_{k+1})$ into $\frac{\partial f}{\partial \alpha} = 0$ and $\frac{\partial f}{\partial \beta} = 0$, we get:
$$
d_k^Tg_k + \alpha d_k^TGd_k + \beta d_k^TGs_{k-1} = 0\\
s_{k-1}^Tg_k + \alpha s_{k-1}^TGd_k + \beta s_{k-1}^TGs_{k-1} = 0
$$
Rewriting this in the form $Ax = b$:
$$
\begin{pmatrix}
d_k^TGd_k & d_k^TGs_{k-1}\\
s_{k-1}^TGd_k & s_{k-1}^TGs_{k-1}
\end{pmatrix}
\begin{pmatrix}
\alpha\\
\beta
\end{pmatrix}=
\begin{pmatrix}
-d_k^Tg_k\\
-s_{k-1}^Tg_k
\end{pmatrix}
$$
Using Cramer's rule with $\Delta_k = \det(A)$, we obtain $\alpha_k$ and $\beta_k$.
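Written out (my own expansion of the Cramer step; it uses $s_{k-1}^Tg_k = 0$ from (1) applied at the previous iteration, which the derivation in (3) below also relies on):

$$
\Delta_k = (d_k^TGd_k)(s_{k-1}^TGs_{k-1}) - (d_k^TGs_{k-1})^2,\qquad
\alpha_k = -\frac{(d_k^Tg_k)(s_{k-1}^TGs_{k-1})}{\Delta_k},\qquad
\beta_k = \frac{(d_k^Tg_k)(s_{k-1}^TGd_k)}{\Delta_k}
$$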
Applying the results from (1) (in particular $g_k^Ts_{k-1} = 0$ from the previous step and $g_{k+1}^Ts_k = 0$), we get:
$$
\begin{aligned}
f_{k+1}-f_k &=\frac{1}{2}(x_k +\alpha_k d_k +\beta_k s_{k-1})^TG(x_k +\alpha_k d_k +\beta_k s_{k-1}) - \frac{1}{2}x_k^TGx_k + b^T(\alpha_k d_k + \beta_k s_{k-1})\\
&=\frac{1}{2}(x_k^TG+b^T)(\alpha_k d_k+\beta_k s_{k-1})+\frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(G(x_k+\alpha_k d_k+\beta_k s_{k-1})+b\right)\\
&=\frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}(\alpha_k d_k+\beta_k s_{k-1})^T\left(g_k + G(\alpha_k d_k+\beta_k s_{k-1})\right)\\
&=\frac{1}{2}\alpha_k g_k^Td_k + \frac{1}{2}s_k^Tg_{k+1}\\
&=\frac{1}{2}\alpha_k g_k^Td_k
\end{aligned}
$$
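A numerical check of this identity (my own addition); note it relies on $g_k^Ts_{k-1} = 0$, which we enforce here by projecting $s_{k-1}$ against $g_k$:

```python
import numpy as np

# Check f_{k+1} - f_k = (1/2) alpha_k g_k^T d_k on a random PD quadratic.
rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
G = M @ M.T + 3 * np.eye(3)          # positive definite
b = rng.standard_normal(3)
f = lambda x: 0.5 * x @ G @ x + b @ x

xk = rng.standard_normal(3)
gk = G @ xk + b
dk = rng.standard_normal(3)
sk1 = rng.standard_normal(3)
sk1 -= (sk1 @ gk) / (gk @ gk) * gk   # enforce g_k^T s_{k-1} = 0

A2 = np.array([[dk @ G @ dk,  dk @ G @ sk1],
               [sk1 @ G @ dk, sk1 @ G @ sk1]])
alpha, beta = np.linalg.solve(A2, [-dk @ gk, -sk1 @ gk])
x_next = xk + alpha * dk + beta * sk1
print(f(x_next) - f(xk), 0.5 * alpha * (gk @ dk))   # the two values agree
```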
(3)
Consider the positive definite quadratic:
$$
f(x) = \frac{1}{2}x^TGx + b^Tx
$$
For the FR method we mark quantities with a hat ($\hat{\cdot}$); for the method given in the problem we use plain symbols. Suppose, inductively, that both methods have reached the same point, so $\hat x_k = x_k$ and the gradients $g_k$ coincide.

FR method: $\hat x_{k+1} = \hat x_k + \hat\alpha_k \hat d_k$, where $\hat\alpha_k$ minimizes $f(x)$ along the direction $\hat d_k$, and
$$
\hat\beta_{k-1} = \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}},\qquad \hat d_k = -g_k + \hat\beta_{k-1}\hat d_{k-1}
$$
$$
\begin{aligned}
\hat x_{k+1} &= \hat x_k + \hat\alpha_k \hat d_k\\
&= x_k + \hat\alpha_k\left(-g_k + \hat\beta_{k-1}\hat d_{k-1}\right)\\
&= x_k + \hat\alpha_k\left(-g_k + \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}\hat d_{k-1}\right)
\end{aligned}
$$
Method in this problem: $x_{k+1} = x_k + \alpha_k d_k + \beta_k s_{k-1}$, where $\alpha_k$ and $\beta_k$ minimize $f(x)$ over the affine set $x_k + \mathrm{span}\{d_k, s_{k-1}\}$, which is the same as $x_k + \mathrm{span}\{g_k, \hat d_{k-1}\}$ (since $d_k = -g_k$ and $s_{k-1} = \hat\alpha_{k-1}\hat d_{k-1}$).
$$
\begin{aligned}
x_{k+1} &= x_k + \alpha_k d_k + \beta_k s_{k-1}\\
&= x_k - \alpha_k g_k + \beta_k \hat\alpha_{k-1}\hat d_{k-1}\\
&= x_k + \frac{(g_k^Td_k)(s_{k-1}^TGs_{k-1})}{\Delta_k}g_k + \frac{(g_k^Td_k)(d_k^TGs_{k-1})}{\Delta_k}\hat\alpha_{k-1}\hat d_{k-1}\\
&= x_k + \tilde\alpha_k\left(-g_k - \frac{d_k^TGs_{k-1}}{s_{k-1}^TGs_{k-1}}\hat\alpha_{k-1}\hat d_{k-1}\right)\\
&= x_k + \tilde\alpha_k\left(-g_k + \frac{g_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}\hat d_{k-1}\right)
\end{aligned}
$$
where $\tilde\alpha_k = -\frac{(g_k^Td_k)(s_{k-1}^TGs_{k-1})}{\Delta_k}$ and the last step uses $s_{k-1} = \hat\alpha_{k-1}\hat d_{k-1}$ and $d_k = -g_k$.
As Theorem 4.5 states, minimizing along the directions $d_j, j = 1,\ldots,k$ one at a time actually finds the minimum over $x_0 + \mathrm{span}\{d_1,\ldots,d_k\}$ when the $\{d_j\}$ are $G$-conjugate. Restricted to a single step, the theorem still holds.
P.S. I was not able to complete the proof of (3). I believe one should show that the FR search direction coincides with the direction determined by the result of (2), i.e., that the two coefficients of $\hat d_{k-1}$ agree:
$$
\frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}} = \frac{g_k^TG\hat d_{k-1}}{\hat d_{k-1}^TG\hat d_{k-1}}
$$
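For what it's worth, this identity does hold on a quadratic. A sketch (my own addition), using the exact-line-search facts $g_k^T\hat d_{k-1} = 0$ and $g_k^Tg_{k-1} = 0$, together with $G\hat d_{k-1} = (g_k - g_{k-1})/\hat\alpha_{k-1}$:

$$
g_k^TG\hat d_{k-1} = \frac{g_k^T(g_k - g_{k-1})}{\hat\alpha_{k-1}} = \frac{g_k^Tg_k}{\hat\alpha_{k-1}},\qquad
\hat d_{k-1}^TG\hat d_{k-1} = \frac{\hat d_{k-1}^T(g_k - g_{k-1})}{\hat\alpha_{k-1}} = \frac{g_{k-1}^Tg_{k-1}}{\hat\alpha_{k-1}},
$$

where the last equality uses $\hat d_{k-1}^Tg_{k-1} = -g_{k-1}^Tg_{k-1}$ (from $\hat d_{k-1} = -g_{k-1} + \hat\beta_{k-2}\hat d_{k-2}$ and $\hat d_{k-2}^Tg_{k-1} = 0$). Dividing the two gives exactly $\hat\beta_{k-1} = \frac{g_k^Tg_k}{g_{k-1}^Tg_{k-1}}$.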