前一段时间学习了无约束最优化方法,今天用Matlab实现了求解无约束最优化问题的FR共轭梯度法。关于共轭梯度法的理论介绍,请参考我的另一篇文章 无约束最优化方法学习笔记。
文件testConjungateGradient.m用于测试共轭梯度法函数。测试文件需要定义函数$f$和自变量$x$,给定迭代初值$x_0$和允许误差$\epsilon$。函数设置了show_detail变量用于控制是否显示每一步的迭代信息。
% test conjungate gradient method
% by TomHeaven, hanlin_tan@nudt.edu.cn, 2015.08.25
%% define function and variable
syms x1 x2;
%f = xs^2+2*ys^2-2*xs*ys + 2*ys + 2;
f = (x1-1)^4 + (x1 - x2)^2;
%f = (1-x1)^2 + 2*(x2 - x1^2)^2;
x = {x1, x2};
% initial value
x0 = [0 0];
% tolerance
epsilon = 1e-1;
%% call conjungate gradient method
show_detail = true;
[bestf, bestx, count] = conjungate_gradient(f, x, x0, epsilon, show_detail);
% print result
fprintf('bestx = %s, bestf = %f, count = %d\n', num2str(bestx), bestf, count);
文件conjungate_gradient.m是共轭梯度法的实现函数。变量nf表示函数$f$的梯度$\nabla f$(梯度的希腊字母是nabla,故用nf)。
function [fv, bestx, iter_num] = conjungate_gradient(f, x, x0, epsilon, show_detail)
%% conjungate gradient method
% by TomHeaven, hanlin_tan@nudt.edu.cn, 2015.08.25
% Input:
% f - syms function
% x - row cell arrow for input syms variables
% $x_0$ - init point
% epsilon - tolerance
% show_detail - a boolean value for wether to print details
% Output:
% fv - minimum f value
% bestx - mimimum point
% iter_num - iteration count
%% init
syms lambdas % suffix s indicates this is a symbol variable
% n is the dimension
n = length(x);
% compute differential of function f stored in cell nf
nf = cell(1, n); % using row cells, column cells will result in error
for i = 1 : n
nf{i} = diff(f, x{i});
end
% $\nabla f(x_0)$
nfv = subs(nf, x, x0);
% init $\nabla f(x_k)$
nfv_pre = nfv;
% init count, k and xv for x value.
count = 0;
k = 0;
xv = x0;
% initial search direction
d = - nfv;
% show initial info
if show_detail
fprintf('Initial:\n');
fprintf('f = %s, x0 = %s, epsilon = %f\n\n', char(f), num2str(x0), epsilon);
end
%% loop
while (norm(nfv) > epsilon)
%% one-dimensional search
% define $x_{k+1} = x_{k} + \lambda d$
xv = xv+lambdas*d;
% define $\phi$ and do 1-dim search
phi = subs(f, x, xv);
nphi = diff(phi); % $\nabla \phi$
lambda = solve(nphi);
% get rid of complex and minus solution
lambda = double(lambda);
if length(lambda) > 1
lambda = lambda(abs(imag(lambda)) < 1e-5);
lambda = lambda(lambda > 0);
lambda = lambda(1);
end
% if $\lambda$ is too small, stop iteration
if lambda < 1e-5
break;
end
%% update
% update $x_{k+1} = x_{k} + \lambda d$
xv = subs(xv, lambdas, lambda);
% convert sym to double
xv = double(xv);
% compute the differential
nfv = subs(nf, x, xv);
% update counters
count = count + 1;
k = k + 1;
% compute alpha based on FR formula
alpha = sumsqr(nfv) / sumsqr(nfv_pre);
% show iteration info
if show_detail
fprintf('Iteration: %d\n', count);
fprintf('x(%d) = %s, lambda = %f\n', count, num2str(xv), lambda);
fprintf('nf(x) = %s, norm(nf) = %f\n', num2str(double(nfv)), norm(double(nfv)));
fprintf('d = %s, alpha = %f\n', num2str(double(d)), double(alpha));
fprintf('\n');
end
% update conjungate direction
d = -nfv + alpha * d;
% save the previous $$\nabla f(x_k)$$
nfv_pre = nfv;
% reset the conjungate direction and k if k >= n
if k >= n
k = 0;
d = - nfv;
end
end % while
%% output
fv = double(subs(f, x, xv));
bestx = double(xv);
iter_num = count;
end
运行testConjungateGradient后输出结果如下:
>> testConjungateGradient
Initial:
f = (x1 - x2)^2 + (x1 - 1)^4, x0 = 0 0, epsilon = 0.100000
Iteration: 1
x(1) = 0.41025 0, lambda = 0.102561
nf(x) = 1.08e-16 -0.82049, norm(nf) = 0.820491
d = 4 0, alpha = 0.042075
Iteration: 2
x(2) = 0.52994 0.58355, lambda = 0.711218
nf(x) = -0.52265 0.10721, norm(nf) = 0.533528
d = 0.1683 0.82049, alpha = 0.422831
Iteration: 3
x(3) = 0.63914 0.56115, lambda = 0.208923
nf(x) = -0.031994 -0.15597, norm(nf) = 0.159223
d = 0.52265 -0.10721, alpha = 0.089062
Iteration: 4
x(4) = 0.76439 0.79465, lambda = 1.594673
nf(x) = -0.11285 0.060533, norm(nf) = 0.128062
d = 0.078542 0.14643, alpha = 0.646892
Iteration: 5
x(5) = 0.79174 0.77998, lambda = 0.242379
nf(x) = -0.012614 -0.023517, norm(nf) = 0.026686
d = 0.11285 -0.060533, alpha = 0.043425
bestx = 0.79174 0.77998, bestf = 0.002019, count = 5
修改允许误差为
epsilon = 1e-8;
则可以得到更加精确的结果:
Iteration: 6
x(6) = 0.9026 0.9122, lambda = 6.329707
nf(x) = -0.022884 0.019188, norm(nf) = 0.029864
d = 0.017515 0.020888, alpha = 1.252319
Iteration: 7
x(7) = 0.90828 0.90744, lambda = 0.247992
nf(x) = -0.0014077 -0.0016788, norm(nf) = 0.002191
d = 0.022884 -0.019188, alpha = 0.005382
Iteration: 8
x(8) = 0.97476 0.97586, lambda = 43.429293
nf(x) = -0.0022668 0.0022025, norm(nf) = 0.003161
d = 0.0015309 0.0015756, alpha = 2.080989
Iteration: 9
x(9) = 0.97533 0.97531, lambda = 0.249812
nf(x) = -2.9597e-05 -3.0461e-05, norm(nf) = 0.000042
d = 0.0022668 -0.0022025, alpha = 0.000181
Iteration: 10
x(10) = 0.99709 0.99712, lambda = 725.188481
nf(x) = -5.2106e-05 5.2008e-05, norm(nf) = 0.000074
d = 3.0006e-05 3.0063e-05, alpha = 3.004594
Iteration: 11
x(11) = 0.9971 0.9971, lambda = 0.249997
nf(x) = -4.8571e-08 -4.8663e-08, norm(nf) = 0.000000
d = 5.2106e-05 -5.2008e-05, alpha = 0.000001
Iteration: 12
x(12) = 0.99992 0.99992, lambda = 57856.826721
nf(x) = -9.3751e-08 9.3748e-08, norm(nf) = 0.000000
d = 4.8616e-08 4.8617e-08, alpha = 3.718503
Iteration: 13
x(13) = 0.99992 0.99992, lambda = 0.250000
nf(x) = -1.1858e-12 -1.1855e-12, norm(nf) = 0.000000
d = 9.3751e-08 -9.3748e-08, alpha = 0.000000
bestx = 0.99992 0.99992, bestf = 0.000000, count = 13
这与问题的最优解$(1,1)^T$已经非常接近了。
算法实现没有经过大量测试,实际使用可能会有BUG。这里只是用于说明基本实现原理,有兴趣的读者可以在此基础上改进。