optimization/preconditioned_conjugate_gradient_method.m

function [x, k] = preconditioned_conjugate_gradient_method(Q, ...
                                                           M, ...
                                                           b, ...
                                                           x0, ...
                                                           tolerance, ...
                                                           max_iterations)
  %
  % Solve,
  %
  %   Qx = b
  %
  % or equivalently,
  %
  %   min [phi(x) = (1/2)*<Qx,x> - <b,x>]
  %
  % using the preconditioned conjugate gradient method (14.56 in
  % Guler). If ``M`` is the identity matrix, the slightly faster
  % implementation in conjugate_gradient_method.m can be used instead.
  %
  % INPUT:
  %
  %   - ``Q`` -- The coefficient matrix of the system to solve. Must
  %     be symmetric positive definite.
  %
  %   - ``M`` -- The preconditioning matrix. If the actual matrix used
  %     to precondition ``Q`` is called ``C``, i.e. ``C^(-1) * Q *
  %     C^(-T) == \bar{Q}``, then ``M = C*C^T``. However, the matrix
  %     ``C`` is never itself needed; only linear systems with ``M``
  %     are solved. This is explained in Guler, section 14.9.
  %
  %   - ``b`` -- The right-hand side of the system to solve.
  %
  %   - ``x0`` -- The starting point for the search.
  %
  %   - ``tolerance`` -- The stopping threshold: we stop as soon as
  %     the norm of the residual ``Q*x - b`` falls below ``tolerance``.
  %
  %   - ``max_iterations`` -- The maximum number of iterations to
  %     perform.
  %
  % OUTPUT:
  %
  %   - ``x`` -- The computed solution to Qx=b.
  %
  %   - ``k`` -- The ending value of k; that is, the number of
  %     iterations that were performed. If the method fails to
  %     converge, ``k`` will be greater than ``max_iterations``.
  %
  % NOTES:
  %
  % All vectors are assumed to be *column* vectors.
  %
  % The cited algorithm contains a typo; in "The Preconditioned
  % Conjugate-Gradient Method", we are supposed to define
  % d_{0} = -z_{0}, not -r_{0} as written.
  %
  % The rather verbose name of this function was chosen to avoid
  % conflicts with other implementations.
  %
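  % To see why only ``M`` (and never ``C``) is needed, here is a brief
  % sketch of the standard change of variables; ``y`` and the barred
  % symbols are notation introduced for this note only. With
  % y = C^T * x, the system Qx = b becomes
  %
  %   \bar{Q}y = \bar{b},  \bar{Q} = C^(-1)*Q*C^(-T),  \bar{b} = C^(-1)*b,
  %
  % whose residual is \bar{r} = \bar{Q}y - \bar{b} = C^(-1)*(Qx - b)
  % = C^(-1)*r. Defining z = M^(-1)*r = C^(-T)*\bar{r}, we get
  %
  %   <\bar{r},\bar{r}> = r^T * C^(-T) * C^(-1) * r = <r,z>,
  %
  % so the inner products used by the method can be computed from r
  % and z alone, where z is obtained by solving M*z = r. Likewise,
  % the search direction in the original variables is
  % d = C^(-T)*\bar{d}, so \bar{d}_{0} = -\bar{r}_{0} corresponds to
  % d_{0} = -z_{0}.
  %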
  % REFERENCES:
  %
  %   1. Guler, Osman. Foundations of Optimization. New York, Springer,
  %      2010.
  %
  %   2. Shewchuk, Jonathan Richard. An Introduction to the Conjugate
  %      Gradient Method Without the Agonizing Pain, Edition 1.25.
  %      August 4, 1994.
  %
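  % EXAMPLE:
  %
  % A minimal usage sketch. The system, the Jacobi (diagonal)
  % preconditioner, and the tolerance below are illustrative
  % assumptions, not taken from the references:
  %
  %   Q = [4, 1; 1, 3];      % symmetric positive definite
  %   b = [1; 2];
  %   M = diag(diag(Q));     % Jacobi preconditioner (an assumption)
  %   x0 = [2; 1];
  %   [x, k] = preconditioned_conjugate_gradient_method(Q, M, b, ...
  %                                                     x0, 1e-10, 100);
  %
  % The exact solution of this system is [1/11; 7/11], so ``x``
  % should be close to it, and ``k`` should not exceed 100.
  %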

  % The residual is recomputed from its definition whenever sqrt_n
  % divides k; see the main loop.
  sqrt_n = floor(sqrt(length(x0)));

  % Set k=0 first so that the references to xk,rk,zk,dk which
  % immediately follow correspond (semantically) to x0,r0,z0,d0.
  k = 0;

  xk = x0;
  rk = Q*xk - b;
  zk = M \ rk;
  dk = -zk;

  while (k <= max_iterations)

    if (norm(rk) < tolerance)
      % Check our stopping condition. This should catch the k=0 case.
      x = xk;
      return;
    end

    % Used twice, avoid recomputation.
    rkzk = rk' * zk;

    % The matrix-vector product Q*dk is needed for both alpha_k and
    % the recursive update of the residual, and it is far more
    % expensive than the scalar-vector product alpha_k*dk, so we
    % compute it once and reuse it.
    Qdk = Q * dk;

    alpha_k = rkzk/(dk' * Qdk);
    x_next = xk + (alpha_k * dk);

    % The recursive definition of r_next is prone to accumulating
    % roundoff error. When floor(sqrt(n)) divides k, we recompute the
    % residual from its definition to limit this error. This
    % modification is due to the second reference.
    if (mod(k, sqrt_n) == 0)
      r_next = Q*x_next - b;
    else
      r_next = rk + (alpha_k * Qdk);
    end

    z_next = M \ r_next;
    beta_next = (r_next' * z_next)/rkzk;
    d_next = -z_next + beta_next*dk;

    % We potentially just performed one more iteration than necessary
    % in order to simplify the loop. Note that due to the structure of
    % our loop, we will have k > max_iterations when we fail to
    % converge.
    k = k + 1;
    xk = x_next;
    rk = r_next;
    zk = z_next;
    dk = d_next;
  end

  % The algorithm didn't converge, but we still want to return the
  % terminal value of xk.
  x = xk;
end
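
% A minimal self-test sketch, assuming GNU Octave, whose ``test``
% command runs the ``%!`` blocks below (MATLAB sees them as plain
% comments). The test system, the Jacobi preconditioner, and the
% tolerances are illustrative assumptions, not taken from the
% references. Run with:
%
%   test preconditioned_conjugate_gradient_method

%!test
%! % Strictly diagonally dominant, symmetric tridiagonal matrix with a
%! % positive diagonal, hence symmetric positive definite.
%! n = 10;
%! Q = diag(3:(n+2)) - diag(ones(n-1,1), 1) - diag(ones(n-1,1), -1);
%! b = ones(n, 1);
%! x0 = zeros(n, 1);
%! M = diag(diag(Q));  % Jacobi (diagonal) preconditioner
%! tol = 1e-10;
%! N = 100;            % max_iterations
%! [x, k] = preconditioned_conjugate_gradient_method(Q, M, b, x0, tol, N);
%! % Convergence is signalled by k <= max_iterations.
%! assert(k <= N);
%! assert(norm(Q*x - b) < 1e-8);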