Sherman–Morrison formula

Formula computing the inverse of the sum of a matrix and the outer product of two vectors

In linear algebra, the Sherman–Morrison formula, named after Jack Sherman and Winifred J. Morrison, computes the inverse of a "rank-1 update" to a matrix whose inverse has previously been computed.[1][2][3] That is, given an invertible matrix $A$ and the outer product $uv^\mathsf{T}$ of vectors $u$ and $v$, the formula cheaply computes an updated matrix inverse $\left(A + uv^\mathsf{T}\right)^{-1}$.

The Sherman–Morrison formula is a special case of the Woodbury formula. Though named after Sherman and Morrison, it had already appeared in earlier publications.[4]

Statement

Suppose $A \in \mathbb{R}^{n \times n}$ is an invertible square matrix and $u, v \in \mathbb{R}^n$ are column vectors. Then $A + uv^\mathsf{T}$ is invertible if and only if $1 + v^\mathsf{T} A^{-1} u \neq 0$. In this case,

$$\left(A + uv^\mathsf{T}\right)^{-1} = A^{-1} - \frac{A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u}.$$

Here, $uv^\mathsf{T}$ is the outer product of the two vectors $u$ and $v$. The general form shown here is the one published by Bartlett.[5]
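
As a quick numerical illustration (a minimal sketch, not from the original article; the dimension, seed, and diagonal shift used to keep $A$ well-conditioned are arbitrary choices), the statement can be checked directly with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shift keeps A invertible in practice
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)
denom = 1.0 + v @ A_inv @ u          # invertibility condition: denom != 0

# Sherman–Morrison: (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / denom
B_inv = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

assert np.allclose(B_inv, np.linalg.inv(A + np.outer(u, v)))
```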

Proof

($\Leftarrow$) To prove the backward direction ($1 + v^\mathsf{T} A^{-1} u \neq 0$ implies that $A + uv^\mathsf{T}$ is invertible with the inverse given above), we verify the properties of the inverse. A matrix $Y$ (in this case the right-hand side of the Sherman–Morrison formula) is the inverse of a matrix $X$ (in this case $A + uv^\mathsf{T}$) if and only if $XY = YX = I$.

We first verify that the right-hand side ($Y$) satisfies $XY = I$:

$$\begin{aligned}
XY &= \left(A + uv^\mathsf{T}\right)\left(A^{-1} - \frac{A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u}\right) \\
&= AA^{-1} + uv^\mathsf{T} A^{-1} - \frac{AA^{-1} u v^\mathsf{T} A^{-1} + uv^\mathsf{T} A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u} \\
&= I + uv^\mathsf{T} A^{-1} - \frac{u v^\mathsf{T} A^{-1} + u v^\mathsf{T} A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u} \\
&= I + uv^\mathsf{T} A^{-1} - \frac{u \left(1 + v^\mathsf{T} A^{-1} u\right) v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u} \\
&= I + uv^\mathsf{T} A^{-1} - u v^\mathsf{T} A^{-1} \\
&= I
\end{aligned}$$

To finish the proof of this direction, we similarly show that $YX = I$:

$$YX = \left(A^{-1} - \frac{A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u}\right)\left(A + uv^\mathsf{T}\right) = I.$$

(In fact, the last step can be avoided, since for square matrices $X$ and $Y$, $XY = I$ is equivalent to $YX = I$.)

($\Rightarrow$) Conversely, if $1 + v^\mathsf{T} A^{-1} u = 0$, then by the matrix determinant lemma, $\det\left(A + uv^\mathsf{T}\right) = \left(1 + v^\mathsf{T} A^{-1} u\right) \det(A) = 0$, so $A + uv^\mathsf{T}$ is not invertible.
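
For a concrete instance of this direction (an illustrative example, not from the article): with $A = I$, $u = e_1$, and $v = -e_1$ we get $1 + v^\mathsf{T} A^{-1} u = 0$, and $A + uv^\mathsf{T} = I - e_1 e_1^\mathsf{T}$ is visibly singular:

```python
import numpy as np

n = 3
A = np.eye(n)
u = np.eye(n)[:, 0]   # e1
v = -u                # so that 1 + v^T A^{-1} u = 1 - 1 = 0

print(1.0 + v @ np.linalg.inv(A) @ u)     # 0.0
print(np.linalg.det(A + np.outer(u, v)))  # 0.0: A + u v^T is singular
```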

Application

If the inverse of $A$ is already known, the formula provides a numerically cheap way to compute the inverse of $A$ corrected by the matrix $uv^\mathsf{T}$ (depending on the point of view, the correction may be seen as a perturbation or as a rank-1 update). The computation is relatively cheap because the inverse of $A + uv^\mathsf{T}$ does not have to be computed from scratch (which in general is expensive), but can be computed by correcting (or perturbing) $A^{-1}$.
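
In practice the update is often applied without forming any explicit inverse: to solve $(A + uv^\mathsf{T})x = b$, one can reuse a factorization of $A$ and perform two solves. The sketch below (illustrative function name and tolerance; assumes SciPy) follows this pattern:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_rank1_updated(lu_piv, u, v, b):
    """Solve (A + u v^T) x = b given an LU factorization of A.

    A sketch of the standard Sherman–Morrison pattern: two solves
    with A replace a refactorization of the updated matrix.
    """
    y = lu_solve(lu_piv, b)   # y = A^{-1} b
    z = lu_solve(lu_piv, u)   # z = A^{-1} u
    denom = 1.0 + v @ z
    if abs(denom) < 1e-14:    # tolerance is an arbitrary choice
        raise np.linalg.LinAlgError("A + u v^T is numerically singular")
    return y - z * ((v @ y) / denom)

# Example usage
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
u, v, b = rng.standard_normal((3, n))
x = solve_rank1_updated(lu_factor(A), u, v, b)
assert np.allclose((A + np.outer(u, v)) @ x, b)
```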

Using unit columns (columns from the identity matrix) for $u$ or $v$, individual columns or rows of $A$ may be manipulated and a correspondingly updated inverse computed relatively cheaply in this way.[6] In the general case, where $A^{-1}$ is an $n \times n$ matrix and $u$ and $v$ are arbitrary vectors of dimension $n$, the whole matrix is updated[5] and the computation takes $3n^2$ scalar multiplications.[7] If $u$ is a unit column, the computation takes only $2n^2$ scalar multiplications; the same holds if $v$ is a unit column. If both $u$ and $v$ are unit columns, the computation takes only $n^2$ scalar multiplications.
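
The $3n^2$ count can be read off from a direct implementation: one matrix–vector product for $A^{-1}u$, one for $v^\mathsf{T}A^{-1}$, and one scaled outer product, each costing about $n^2$ multiplications (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, in O(n^2) operations."""
    w = A_inv @ u                                    # ~n^2 multiplications
    z = v @ A_inv                                    # ~n^2 multiplications (v^T A^{-1})
    return A_inv - np.outer(w, z) / (1.0 + v @ w)    # ~n^2 multiplications
```

When $u$ (or $v$) is a unit column, the corresponding product is just a column (or row) of $A^{-1}$ and costs no multiplications, which accounts for the reduced $2n^2$ and $n^2$ counts above.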

This formula also has application in theoretical physics. Namely, in quantum field theory, one uses this formula to calculate the propagator of a spin-1 field.[8] The inverse propagator (as it appears in the Lagrangian) has the form $A + uv^\mathsf{T}$. One uses the Sherman–Morrison formula to calculate the inverse (satisfying certain time-ordering boundary conditions) of the inverse propagator, or simply the (Feynman) propagator, which is needed to perform any perturbative calculation[9] involving the spin-1 field.

One of the issues with the formula is that little is known about its numerical stability. There are no published results concerning its error bounds. Anecdotal evidence[10] suggests that the Woodbury matrix identity (a generalization of the Sherman–Morrison formula) may diverge even for seemingly benign examples (when both the original and modified matrices are well-conditioned).

Alternative verification

The following is an alternative verification of the Sherman–Morrison formula, using the easily verified identity

$$\left(I + wv^\mathsf{T}\right)^{-1} = I - \frac{wv^\mathsf{T}}{1 + v^\mathsf{T} w}.$$

Let

$$u = Aw, \quad \text{and} \quad A + uv^\mathsf{T} = A\left(I + wv^\mathsf{T}\right),$$

then

$$\left(A + uv^\mathsf{T}\right)^{-1} = \left(I + wv^\mathsf{T}\right)^{-1} A^{-1} = \left(I - \frac{wv^\mathsf{T}}{1 + v^\mathsf{T} w}\right) A^{-1}.$$

Substituting $w = A^{-1} u$ gives

$$\left(A + uv^\mathsf{T}\right)^{-1} = \left(I - \frac{A^{-1} u v^\mathsf{T}}{1 + v^\mathsf{T} A^{-1} u}\right) A^{-1} = A^{-1} - \frac{A^{-1} u v^\mathsf{T} A^{-1}}{1 + v^\mathsf{T} A^{-1} u}.$$

Generalization (Woodbury matrix identity)

Given a square invertible $n \times n$ matrix $A$, an $n \times k$ matrix $U$, and a $k \times n$ matrix $V$, let $B$ be the $n \times n$ matrix $B = A + UV$. Then, assuming $I_k + VA^{-1}U$ is invertible, we have

$$B^{-1} = A^{-1} - A^{-1} U \left(I_k + V A^{-1} U\right)^{-1} V A^{-1}.$$
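
A minimal NumPy sketch of this rank-$k$ case (illustrative names, not from the article; only a $k \times k$ system is solved, which is cheap when $k \ll n$):

```python
import numpy as np

def woodbury_update(A_inv, U, V):
    """Return (A + U V)^{-1} given A^{-1}, with U (n x k) and V (k x n)."""
    k = U.shape[1]
    K = np.eye(k) + V @ A_inv @ U    # k x k "capacitance" matrix
    return A_inv - A_inv @ U @ np.linalg.solve(K, V @ A_inv)
```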

References

  1. ^ Sherman, Jack; Morrison, Winifred J. (1949). "Adjustment of an Inverse Matrix Corresponding to Changes in the Elements of a Given Column or a Given Row of the Original Matrix (abstract)". Annals of Mathematical Statistics. 20: 621. doi:10.1214/aoms/1177729959.
  2. ^ Sherman, Jack; Morrison, Winifred J. (1950). "Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix". Annals of Mathematical Statistics. 21 (1): 124–127. doi:10.1214/aoms/1177729893. MR 0035118. Zbl 0037.00901.
  3. ^ Press, William H.; Teukolsky, Saul A.; Vetterling, William T.; Flannery, Brian P. (2007), "Section 2.7.1 Sherman–Morrison Formula", Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8, archived from the original on 2012-03-19, retrieved 2011-08-08
  4. ^ Hager, William W. (1989). "Updating the inverse of a matrix". SIAM Review. 31 (2): 221–239. doi:10.1137/1031049. JSTOR 2030425. MR 0997457. S2CID 7967459.
  5. ^ Bartlett, Maurice S. (1951). "An Inverse Matrix Adjustment Arising in Discriminant Analysis". Annals of Mathematical Statistics. 22 (1): 107–111. doi:10.1214/aoms/1177729698. MR 0040068. Zbl 0042.38203.
  6. ^ Langville, Amy N.; Meyer, Carl D. (2006). Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press. p. 156.
  7. ^ Update of the inverse matrix by the Sherman–Morrison formula
  8. ^ Propagator#Spin 1
  9. ^ "Perturbative quantum field theory".
  10. ^ "MathOverflow discussion". MathOverflow.