Matrix Calculus - Intro

Matrix Calculus

Applied math very often involves matrix calculus, so it is a good idea to review some matrix calculus basics before diving into project-specific solutions. It is easier to learn the basics first, and they are surprisingly simple to learn. A lot can be accomplished with a few intuitions and some notation.

Tensors, matrices, vectors and scalars

You can view all of these variables as tensors of differing rank. A rank $0$ tensor is a scalar, a rank $1$ tensor is a vector and a rank $2$ tensor is a matrix.
$$ \begin{array}{c c c} & tensor \ rank & example \\ scalar & 0 & x\\ vector & 1 & \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\ matrix & 2 & \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \\ tensor & 3+ &\\ \end{array} $$
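As a rough illustration of the rank table, here is a minimal sketch assuming NumPy is available. The variable names and values are made up for the example. Note that NumPy's `ndim` counts the number of indices of an array, which is the tensor rank meant here, and is not the same thing as the linear algebra rank of a matrix.

```python
import numpy as np

# Illustrative values only; these names are not from the text above.
scalar = np.array(3.0)                        # rank 0: no index needed
vector = np.array([1.0, 2.0])                 # rank 1: one index
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # rank 2: two indices
tensor3 = np.zeros((2, 2, 2))                 # rank 3: three indices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor3)]:
    # ndim is the number of indices needed to address a single element
    print(name, "rank:", t.ndim, "shape:", t.shape)
```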

Vector and scalar valued functions

We are likely all familiar with functions of one variable: $$ f(x) $$ The value of the function $f$ depends on the input variable $x$, also known as the argument or independent variable. Because the function value is a single number, this is called a scalar valued function. A scalar valued multivariable function has more than one input variable. For example, the input may be a position, as in $f(x,y)$. A vector valued function, in contrast, has more than one function value. In this case the function value is a vector, like a force with an $x$ and a $y$ component: $\mathbf{f} = \begin{pmatrix} f_x \\ f_y \end{pmatrix}$
$$ \begin{array}{c c c} type & notation & example\\ \hline scalar \ valued \ single \ variable & f(x) & f(x)\\ scalar \ valued \ multivariable & f(\mathbf{x}) & f(x,y) \\ vector \ valued \ single \ variable & \mathbf{f}(x) & \begin{pmatrix} f_x(x) \\ f_y(x) \end{pmatrix}\\ vector \ valued \ multivariable & \mathbf{f}(\mathbf{x}) & \begin{pmatrix} f_x(x,y) \\ f_y(x,y) \end{pmatrix}\\ matrix \ valued \ multivariable & F(\mathbf{x}) & \begin{pmatrix} f_{11}(x,y) & f_{12}(x,y) \\ f_{21}(x,y) & f_{22}(x,y) \end{pmatrix} \end{array} $$
In notation, the use of $(x,y)=(x_1,x_2)=\mathbf{x}$ can sometimes cause confusion. Just keep in mind that $\mathbf{x} \neq x$. The bold font indicates a vector, or equivalently $\mathbf{x} = \vec{x} \neq x$. Often $x$ is a component of $\mathbf{x}$. Also, there is no standard notation for tensor/matrix valued functions or even for vector valued functions. Capital letters often indicate a matrix but may be used for vectors as well. In any case, a vector is a single column or single row matrix and is a rank 1 tensor.
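The function types in the table can be sketched as ordinary Python functions, again assuming NumPy. The function names and formulas below are hypothetical examples chosen only to show the shapes of the inputs and outputs.

```python
import numpy as np

def f_scalar_single(x):
    """Scalar valued, single variable: number in, number out."""
    return x ** 2

def f_scalar_multi(x):
    """Scalar valued, multivariable: vector in, number out."""
    return x[0] ** 2 + x[1] ** 2

def f_vector_single(x):
    """Vector valued, single variable: number in, vector out."""
    return np.array([np.cos(x), np.sin(x)])

def f_vector_multi(x):
    """Vector valued, multivariable: vector in, vector out."""
    return np.array([x[0] * x[1], x[0] + x[1]])

def f_matrix_multi(x):
    """Matrix valued, multivariable: vector in, matrix out."""
    return np.outer(x, x)

s = 2.0                      # a scalar argument x
v = np.array([1.0, 2.0])     # a vector argument (x, y)

# np.shape returns () for a scalar, (2,) for a vector, (2, 2) for a matrix
print(np.shape(f_scalar_single(s)))
print(np.shape(f_scalar_multi(v)))
print(np.shape(f_vector_single(s)))
print(np.shape(f_vector_multi(v)))
print(np.shape(f_matrix_multi(v)))
```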

Derivatives

In case you are not yet familiar with derivatives and are wondering why you should bother reading this, the importance of derivatives will soon become apparent. For example, in robotics we can use the transpose of the Jacobian to calculate inverse kinematics trajectories. And in case you have not taken a calculus course before: a derivative is just a slope, which is not a difficult concept. It is a measure of how rapidly something changes in some direction. If you are driving in the mountains and see a gradient sign, you are seeing the derivative of the elevation, a measure of how quickly the elevation changes along that road. If the change $\Delta x$ is small, the derivative of $f(x)$ is approximately equal to the change in function value $\Delta f$ divided by the change in argument $\Delta x$.
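The idea that the slope is roughly $\Delta f / \Delta x$ for a small change can be tried numerically. The following is a minimal sketch in plain Python. The helper name `difference_quotient` and the choice of $\sin$ as the test function are assumptions made for the illustration.

```python
import math

def difference_quotient(f, x, dx):
    """Approximate df/dx at x by (change in f) / (change in x)."""
    return (f(x + dx) - f(x)) / dx

# Hypothetical test function: f(x) = sin(x), whose exact derivative is cos(x).
x0 = 1.0
exact = math.cos(x0)
for dx in (1e-1, 1e-3, 1e-5):
    approx = difference_quotient(math.sin, x0, dx)
    # The error shrinks as the change dx gets smaller
    print(f"dx={dx:g}  approx={approx:.6f}  error={abs(approx - exact):.2e}")
```

For very small `dx` floating point rounding eventually dominates, so the step should be small but not arbitrarily small.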
$$ scalar \ function, \ scalar \ argument \ \ f(x) $$ $$ derivative \ \frac{d f}{d x} \simeq \frac{\Delta f}{\Delta x} $$ Now what if our function is a function of more than one variable? This means the function argument is a vector, maybe a position $\mathbf{x}$, where we denote a vector in bold font. If $\mathbf{x}$ is a position, $\mathbf{x} = (x_1,x_2,x_3) = (x,y,z)$.
$$ scalar \ function, \ vector \ argument \ \ f(\mathbf{x}) = f(x_1,x_2,\dots) $$ $$ derivative \ \frac{\partial f}{\partial \mathbf{x}} = \nabla f (\mathbf{x}) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \dots \end{pmatrix} \simeq \begin{pmatrix} \frac{\Delta f}{\Delta x_1} \\ \frac{\Delta f}{\Delta x_2} \\ \dots \end{pmatrix} $$ This is still a scalar valued function. The value of the function $f$ has a single component, along the real line. It is the argument that is a vector. The resulting derivative is a vector of partial derivatives. Partial derivatives are the slopes in each direction of the basis corresponding to the input. The gradient along a road in the example above is actually a directional derivative in the arbitrary direction of the road. Partial derivatives generalize one dimensional derivatives. Directional derivatives are linear combinations of partial derivatives and generalize the partial derivative, but they are not discussed here. The first derivative of a multivariable scalar valued function is usually called the $\textbf{gradient}$.

We can also have a function with multiple value components: a vector valued function. The vector function may have a scalar argument, $\mathbf{f}(x)$, but may also have a higher rank argument, like $\mathbf{f}(\mathbf{x})$. Consider first a vector valued function with a scalar argument: $$ vector \ function, \ scalar \ argument \ \ \mathbf{f}(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \\ \dots \end{pmatrix} $$ $$ derivative \ \frac{ d \mathbf{f}}{ dx} = \begin{pmatrix} \frac{d f_1}{d x} \\ \frac{d f_2}{d x} \\ \dots \end{pmatrix} $$ The first derivative of a multivariable vector valued function is the Jacobian.
$$ vector \ function, \ vector \ argument \ \ \mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \dots \end{pmatrix} = \begin{pmatrix} f_1(x_1,x_2,\dots) \\ f_2(x_1,x_2,\dots) \\ \dots \end{pmatrix} $$ $$ derivative \ \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \dots & \dots &\dots \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial \mathbf{f} }{\partial x_1} & \frac{\partial \mathbf{f} }{\partial x_2} & \dots & \frac{\partial \mathbf{f} }{\partial x_n} \end{pmatrix} = \begin{pmatrix} \nabla^T f_1\\ \nabla^T f_2\\ \dots\\ \nabla^T f_m \end{pmatrix} $$ The above matrix $J$ of first order partial derivatives is called the $\textbf{Jacobian}$. The Jacobian generalizes the first derivative to vector valued functions. The rows of the Jacobian of a multivariable vector valued function are the transposes of the gradients of the function components with respect to the vector input. Technically, the Jacobian is a tensor and may be a tensor field. The Jacobian is not limited to rank $2$, and most first derivatives are generalized by the Jacobian tensor. So if the derivative is a scalar, it is a Jacobian of rank zero, while the gradient of a scalar valued multivariable function is the transpose of a Jacobian of rank one. You may use $J$ in each case.
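The structure of $J$ can be checked numerically with finite differences. The sketch below assumes NumPy. The helper `jacobian_fd`, the step size `h` and the example function are all hypothetical choices for illustration, not a reference implementation. It builds the Jacobian column by column (one partial derivative $\partial \mathbf{f} / \partial x_j$ per column) and then compares the first row with the gradient of $f_1$.

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Central-difference approximation of the m x n Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j: partial derivative of every output component w.r.t. x_j
        J[:, j] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * h)
    return J

# Hypothetical example: f(x, y) = (x * y, x + y**2)
def f(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

x0 = np.array([2.0, 3.0])
J = jacobian_fd(f, x0)

# Analytic Jacobian [[y, x], [1, 2y]] evaluated at (2, 3)
J_exact = np.array([[3.0, 2.0], [1.0, 6.0]])

# Gradient of the first component f_1 alone, as a flat vector
grad_f1 = jacobian_fd(lambda x: f(x)[0], x0).ravel()

print(J)
print("matches analytic Jacobian:", np.allclose(J, J_exact, atol=1e-6))
print("row 1 equals gradient of f_1:", np.allclose(J[0], grad_f1))
```

In practice an automatic differentiation library would usually replace the finite difference loop, but the layout of rows and columns is the same.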

Tensors generalize all derivative results. A derivative is always a tensor of some rank. The terms scalar, vector, matrix and tensor derivative are generally used to indicate the derivative with respect to that variable type, e.g., a matrix derivative is a derivative with respect to a matrix. But that means every derivative is a tensor derivative of some rank, and every derivative evaluates to a tensor of some rank.

Summary

$$ \begin{array}{c c c} & notation & result \\ \hline \frac{scalar}{scalar} & \frac{d f }{ dx } & scalar\\ \frac{scalar}{vector} & \frac{\partial f}{\partial \mathbf{x}} & \nabla f \ (vector)\\ \frac{vector}{vector} & \frac{\partial \mathbf{f} }{\partial \mathbf{x}} & J \ (matrix)\\ \frac{matrix}{scalar} & \frac{\partial F(x)}{\partial x} & matrix \\ \frac{matrix}{vector} & \frac{\partial F(\mathbf{x})}{\partial \mathbf{x} } & tensor \\ \frac{scalar}{matrix} & \frac{\partial f(X)}{\partial X } & matrix \end{array} $$
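The ranks in the summary can be checked with a small numerical experiment. This is only a sketch, assuming NumPy. The helper `derivative_fd` and all example functions are hypothetical, and the layout used here (output indices first, then input indices) is one convention among several.

```python
import numpy as np

def derivative_fd(f, x, h=1e-6):
    """Central-difference derivative of a tensor valued f w.r.t. a tensor x.

    The result has shape (output shape) + (input shape).
    """
    x = np.asarray(x, dtype=float)
    out_shape = np.shape(f(x))
    slices = []
    for k in range(x.size):
        # Perturb one input entry at a time
        step = np.zeros(x.size)
        step[k] = h
        step = step.reshape(x.shape)
        slices.append((np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * h))
    return np.stack(slices, axis=-1).reshape(out_shape + x.shape)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 2.0])

# (label, example function, point of evaluation), all chosen for illustration
cases = [
    ("scalar/scalar", lambda x: x ** 3, 2.0),
    ("scalar/vector", lambda x: x @ x, v),
    ("vector/vector", lambda x: np.array([x[0] * x[1], x[0] + x[1]]), v),
    ("matrix/scalar", lambda x: x * A, 2.0),
    ("matrix/vector", lambda x: np.outer(x, x), v),
    ("scalar/matrix", lambda X: np.trace(X), A),
]

results = {}
for label, func, point in cases:
    D = derivative_fd(func, point)
    results[label] = D
    print(f"{label:14s} rank {D.ndim}  shape {D.shape}")
```

With these examples the ranks come out as 0, 1, 2, 2, 3 and 2, in the same order as the rows of the table.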
