Matlab实现FR共轭梯度法

前一段时间学习了无约束最优化方法,今天用Matlab实现了求解无约束最优化问题的FR共轭梯度法。关于共轭梯度法的理论介绍,请参考我的另一篇文章 无约束最优化方法学习笔记

文件testConjungateGradient.m用于测试共轭梯度法函数。测试文件需要定义函数$f$和自变量$x$,给定迭代初值$x_0$和允许误差$\epsilon$。函数设置了show_detail变量用于控制是否显示每一步的迭代信息。

% test conjungate gradient method
% by TomHeaven, hanlin_tan@nudt.edu.cn, 2015.08.25

%% define function and variable
syms x1 x2;
%f = xs^2+2*ys^2-2*xs*ys + 2*ys + 2;
f = (x1-1)^4 + (x1 - x2)^2;
%f = (1-x1)^2 + 2*(x2 - x1^2)^2;
x = {x1, x2};

% initial value
x0 = [0 0];
% tolerance
epsilon = 1e-1;

%% call conjungate gradient method
show_detail = true;
[bestf, bestx, count] = conjungate_gradient(f, x, x0, epsilon, show_detail);
% print result
fprintf('bestx = %s, bestf = %f, count = %d\n', num2str(bestx), bestf, count);


文件conjungate_gradient.m是共轭梯度法的实现函数。变量nf表示函数$f$的梯度$\nabla f$(梯度的希腊字母是nabla,故用nf)。

function [fv, bestx, iter_num] = conjungate_gradient(f, x, x0, epsilon, show_detail)
%% conjungate gradient method
% by TomHeaven, hanlin_tan@nudt.edu.cn, 2015.08.25
% Input: 
%   f - syms function
%   x - row cell arrow for input syms variables
%   $x_0$ - init point
%   epsilon - tolerance
%   show_detail - a boolean value for wether to print details
% Output:
%   fv - minimum f value
%   bestx - mimimum point
%   iter_num - iteration count

%% init
syms lambdas  % suffix s indicates this is a symbol variable

% n is the dimension
n = length(x);

% compute differential of function f stored in cell nf
nf = cell(1, n); % using row cells, column cells will result in error
for i = 1 : n
    nf{i} = diff(f, x{i});
end

% $\nabla f(x_0)$
nfv = subs(nf, x, x0);
% init $\nabla f(x_k)$
nfv_pre = nfv;
% init count, k and xv for x value.
count = 0;
k = 0;
xv = x0;
% initial search direction
d = - nfv; 
% show initial info
if show_detail
    fprintf('Initial:\n');
    fprintf('f = %s, x0 = %s, epsilon = %f\n\n', char(f), num2str(x0), epsilon);
end

%% loop
while (norm(nfv) > epsilon)
    %% one-dimensional search
    % define $x_{k+1} = x_{k} + \lambda d$
    xv = xv+lambdas*d;
    % define $\phi$ and do 1-dim search
    phi = subs(f, x, xv);
    nphi = diff(phi); % $\nabla \phi$
    lambda = solve(nphi);
    % get rid of complex and minus solution
    lambda = double(lambda);  
    if length(lambda) > 1
        lambda = lambda(abs(imag(lambda)) < 1e-5);
        lambda = lambda(lambda > 0);
        lambda = lambda(1);
    end
    % if $\lambda$ is too small, stop iteration
    if lambda < 1e-5
        break;
    end
    
    %% update
    % update $x_{k+1} = x_{k} + \lambda d$
    xv = subs(xv, lambdas, lambda); 
    % convert sym to double
    xv = double(xv);
    % compute the differential
    nfv = subs(nf, x, xv);   
    % update counters
    count = count + 1;
    k = k + 1; 
    % compute alpha based on FR formula
    alpha = sumsqr(nfv) / sumsqr(nfv_pre);
    
    % show iteration info
    if show_detail
        fprintf('Iteration: %d\n', count);
        fprintf('x(%d) = %s, lambda = %f\n', count, num2str(xv), lambda);
        fprintf('nf(x) = %s, norm(nf) = %f\n', num2str(double(nfv)), norm(double(nfv)));
        fprintf('d = %s, alpha = %f\n', num2str(double(d)), double(alpha));
        fprintf('\n');
    end
    
    % update conjungate direction
    d = -nfv + alpha * d;
    % save the previous $$\nabla f(x_k)$$
    nfv_pre = nfv;
    % reset the conjungate direction and k if k >= n
    if k >= n
        k = 0;
        d = - nfv;
    end
end % while

%% output
fv = double(subs(f, x, xv));
bestx = double(xv);
iter_num = count;

end

运行testConjungateGradient后输出结果如下:

>> testConjungateGradient
Initial:
f = (x1 - x2)^2 + (x1 - 1)^4, x0 = 0  0, epsilon = 0.100000

Iteration: 1
x(1) = 0.41025           0, lambda = 0.102561
nf(x) = 1.08e-16    -0.82049, norm(nf) = 0.820491
d = 4  0, alpha = 0.042075

Iteration: 2
x(2) = 0.52994     0.58355, lambda = 0.711218
nf(x) = -0.52265     0.10721, norm(nf) = 0.533528
d = 0.1683     0.82049, alpha = 0.422831

Iteration: 3
x(3) = 0.63914     0.56115, lambda = 0.208923
nf(x) = -0.031994    -0.15597, norm(nf) = 0.159223
d = 0.52265    -0.10721, alpha = 0.089062

Iteration: 4
x(4) = 0.76439     0.79465, lambda = 1.594673
nf(x) = -0.11285    0.060533, norm(nf) = 0.128062
d = 0.078542     0.14643, alpha = 0.646892

Iteration: 5
x(5) = 0.79174     0.77998, lambda = 0.242379
nf(x) = -0.012614   -0.023517, norm(nf) = 0.026686
d = 0.11285   -0.060533, alpha = 0.043425

bestx = 0.79174     0.77998, bestf = 0.002019, count = 5

修改允许误差为

epsilon = 1e-8;

则可以得到更加精确的结果:

Iteration: 6
x(6) = 0.9026      0.9122, lambda = 6.329707
nf(x) = -0.022884    0.019188, norm(nf) = 0.029864
d = 0.017515    0.020888, alpha = 1.252319

Iteration: 7
x(7) = 0.90828     0.90744, lambda = 0.247992
nf(x) = -0.0014077  -0.0016788, norm(nf) = 0.002191
d = 0.022884   -0.019188, alpha = 0.005382

Iteration: 8
x(8) = 0.97476     0.97586, lambda = 43.429293
nf(x) = -0.0022668   0.0022025, norm(nf) = 0.003161
d = 0.0015309   0.0015756, alpha = 2.080989

Iteration: 9
x(9) = 0.97533     0.97531, lambda = 0.249812
nf(x) = -2.9597e-05 -3.0461e-05, norm(nf) = 0.000042
d = 0.0022668  -0.0022025, alpha = 0.000181

Iteration: 10
x(10) = 0.99709     0.99712, lambda = 725.188481
nf(x) = -5.2106e-05  5.2008e-05, norm(nf) = 0.000074
d = 3.0006e-05  3.0063e-05, alpha = 3.004594

Iteration: 11
x(11) = 0.9971      0.9971, lambda = 0.249997
nf(x) = -4.8571e-08 -4.8663e-08, norm(nf) = 0.000000
d = 5.2106e-05 -5.2008e-05, alpha = 0.000001

Iteration: 12
x(12) = 0.99992     0.99992, lambda = 57856.826721
nf(x) = -9.3751e-08  9.3748e-08, norm(nf) = 0.000000
d = 4.8616e-08  4.8617e-08, alpha = 3.718503

Iteration: 13
x(13) = 0.99992     0.99992, lambda = 0.250000
nf(x) = -1.1858e-12 -1.1855e-12, norm(nf) = 0.000000
d = 9.3751e-08 -9.3748e-08, alpha = 0.000000

bestx = 0.99992     0.99992, bestf = 0.000000, count = 13

这与问题的最优解$(1,1)^T$已经非常接近了。

算法实现没有经过大量测试,实际使用可能会有BUG。这里只是用于说明基本实现原理,有兴趣的读者可以在此基础上改进。