Computing local moments

As a pre-processing step for shape analysis and registration, one often needs to compute local features such as point normals and curvatures. In this context, local point moments are a useful building block. We now explain in depth the implementation of our point_moments() method.

Definition

Let us consider a point cloud \(x_1, \dots, x_N\) in dimension \(D = 2\) or \(3\). We associate to every point \(x_i\) a local neighborhood \(\nu_i\), understood as a distribution of mass on the \(N\) points. The full point_neighborhoods() structure is encoded in the \(N \times N\) matrix:

\[\begin{split}\mathcal{V} ~=~ \left( \begin{array}{ccc} ~ & \nu_1 & ~ \\ \hline ~ & \vdots & \\ \hline ~ & \nu_N & ~ \end{array} \right) ~=~ \begin{pmatrix} \nu_1(x_1) & \dots & \nu_1(x_N) \\ \vdots & \ddots & \vdots \\ \nu_N(x_1) & \dots & \nu_N(x_N) \end{pmatrix}~.\end{split}\]

Order 0. The local mass \(m_i\) around point \(x_i\) is defined by:

\[m_i ~=~ \sum_{j=1}^N \nu_i(x_j)~.\]

Order 1. The local mean \(\overline{x}_i\) around point \(x_i\) is defined by:

\[\overline{x}_i ~=~ \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) x_j~.\]

Order 2. The local covariance matrix \(\Sigma_i\) around point \(x_i\) is defined by:

\[\Sigma_i ~=~ \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) \, (x_j - \overline{x}_i) (x_j - \overline{x}_i)^\intercal~.\]

Just like other feature transforms or shape contexts, point moments encode some information about the local geometry of the shape. They are popular in shape analysis as precursors to rotation-invariant features: see the vast literature on the Riesz transform, steerable filters and Hu’s geometric moment invariants.

Implementation

Local masses and means. Moments of order 0 and 1 are easy to compute using matrix-vector and matrix-matrix products with the neighborhood operator \(\mathcal{V}\). The vector of local masses \(m = (m_1, \dots, m_N)\) is given by:

\[\begin{split}m ~=~ \begin{pmatrix} m_1 \\ \vdots \\ m_N \end{pmatrix} ~=~ \mathcal{V} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}~,\end{split}\]

and the \(N\times D\) matrix of local means \((\overline{x}_1, \dots, \overline{x}_N)\) is given by:

\[\begin{split}\left( \begin{array}{ccc} ~ & \overline{x}_1 & ~ \\ \hline ~ & \vdots & \\ \hline ~ & \overline{x}_N & ~ \end{array} \right) ~=~ \frac{1}{m} \mathcal{V} \left( \begin{array}{ccc} ~ & x_1 & ~ \\ \hline ~ & \vdots & \\ \hline ~ & x_N & ~ \end{array} \right)~.\end{split}\]

On the other hand, computing local covariances at scale is surprisingly tricky.

Local covariances. To compute the local covariance matrix \(\Sigma_i\) around point \(x_i\), one must perform the summation:

\[\Sigma_i ~=~ \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) \,(x_j - \overline{x}_i) (x_j - \overline{x}_i)^\intercal\]

in parallel over the indices \(i\). This is doable if the neighborhood weights \(\nu_i(x_j)\) are known in closed form. For instance, if we use a Gaussian window of radius \(\sigma\):

\[\nu_i(x_j) ~=~ \exp\left(-\tfrac{1}{2\sigma^2} \|x_i - x_j\|^2\right)~,\]

the KeOps library performs the summation in a fraction of a second on a GPU, even for large clouds of \(N > 100k\) points.

Limitations of the brute force approach. Unfortunately, this approach is not tractable in all settings:

  • The Nyström approximation, the Fast Multipole Method and DiffusionNet layers rely on low-rank approximations of the neighborhood matrix \(\mathcal{V}\).

  • The heat method for geodesic distance computations relies on an implicit definition. The neighborhood operator \(\mathcal{V}\) is defined as:

    \[\mathcal{V} ~=~ (\text{Id}_{N} - \tfrac{\sigma^2}{2}\Delta)^{-1}~ \simeq~ \exp(\tfrac{\sigma^2}{2}\Delta)~,\]

    where \(\Delta\) is a discrete \(N\times N\) Laplacian matrix and matrix-matrix products with \(\mathcal{V}\) are implemented via a sparse Cholesky factorization.

In both settings, while matrix multiplication with \(\mathcal{V}\) is cheap, accessing all of the \(N^2\) entries \(\nu_i(x_j)\) of \(\mathcal{V}\) is not tractable when \(N > 10k\).

Naive quadratic expansion

Expanding the squared difference. A simple workaround would be to remark that:

\[\begin{split}\Sigma_i ~ &=~ \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) \,(x_j - \overline{x}_i) (x_j - \overline{x}_i)^\intercal \\ &=~ \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) \, x_j x_j^\intercal - \overline{x}_i \overline{x}_i^\intercal~.\end{split}\]

This is the standard statistical formula:

\[\text{Var}(X)~=~ \mathbb{E}(X^2) - \mathbb{E}(X)^2~\geqslant ~0~,\]

which quantifies the spread of a random variable \(X \sim \nu_i\) by computing the gap in Jensen’s inequality when applied to the convex function \(x \mapsto x^2\).

Numerical accuracy. Theoretically, we could use this identity to compute the \(D\times D\) covariance matrices \(\Sigma_i\) efficiently with:

\[\begin{split}\left( \begin{array}{ccc} ~ & \Sigma_1 & ~ \\ \hline ~ & \vdots & \\ \hline ~ & \Sigma_N & ~ \end{array} \right) ~=~ \frac{1}{m} \mathcal{V} \left( \begin{array}{ccc} ~ & x_1 x_1^\intercal & ~ \\ \hline ~ & \vdots & \\ \hline ~ & x_N x_N^\intercal & ~ \end{array} \right) ~-~ \left( \begin{array}{ccc} ~ & \overline{x}_1 \overline{x}_1^\intercal & ~ \\ \hline ~ & \vdots & \\ \hline ~ & \overline{x}_N \overline{x}_N^\intercal & ~ \end{array} \right)~.\end{split}\]

Unfortunately, this formula runs into catastrophic cancellation when it is implemented in floating-point arithmetic. With neighborhoods \(\nu_i\) defined at scale \(\sigma > 0\), the coefficients of the covariance matrix \(\Sigma_i\) are typically of order \(\sigma^2\), while \(x_jx_j^\intercal\) and \(\overline{x}_i\overline{x}_i^\intercal\) are of order \(\max_j \|x_j\|^2\).

In common shape processing scenarios, even if the point cloud \((x_1, \dots, x_N)\) is centered and normalized, the neighborhood scale \(\sigma\) is 100 to 1,000 times smaller than the diameter of the shape. We are interested in fine details and don’t want to over-smooth our shapes!

As a consequence, the coefficients of the covariance matrix \(\Sigma_i\) are usually 4 to 6 orders of magnitude smaller than both terms in the difference, which scale like \(\|x_j\|^2\). They cannot be computed reliably from the identity above if the terms \(x_jx_j^\intercal\) and \(\overline{x}_i\overline{x}_i^\intercal\) are stored in single-precision float32 format, which carries roughly 7 significant decimal digits.

Portability. Performing the computation in double-precision float64, including the matrix product with the neighborhood operator \(\mathcal{V}\), could provide sufficient accuracy. However, while float64 arithmetic is supported by most CPUs and high-end GPUs, it is not implemented at the hardware level on consumer-grade GPUs. A typical “gaming” GPU such as an Nvidia RTX 4090 suffers a 64× slow-down when switching from float32 to float64 arithmetic.

Stable trigonometric expansion

Approximating x with sin(x). To provide a fast, general and portable implementation of point moments, we rely on the fact that if \(x_j - \overline{x}_i\) is of order \(\sigma\) and \(\ell = 10\, \sigma\), then:

\[\begin{split}x_j - \overline{x}_i~ &\simeq~ \ell \, \sin \left( \frac{x_j - \overline{x}_i}{\ell} \right) \\ &=~\ell\, \sin(X_j - \overline{X}_i)~,\end{split}\]

where \(X_j = x_j / \ell\) and \(\overline{X}_i = \overline{x}_i / \ell\).

This implies that if the support of the neighborhood \(\nu_i:x_j\mapsto \nu_i(x_j)\) of point \(x_i\) is concentrated on points \(x_j\) that are at distance at most \(5\,\sigma\) from \(x_i\), then:

\[\begin{split}\Sigma_i~ &=~\frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j) \,(x_j - \overline{x}_i) (x_j - \overline{x}_i)^\intercal \\ &\simeq~ \frac{\ell^2}{m_i} \sum_{j=1}^N \nu_i(x_j) \,\sin(X_j - \overline{X}_i) \sin(X_j - \overline{X}_i)^\intercal~.\end{split}\]

Trigonometric expansion. Denoting by \(\imath\) the imaginary unit and by \(\overline{z}\) the complex conjugate of \(z\) (not to be confused with the local means \(\overline{x}_i\)), standard trigonometric identities let us rewrite this equation as:

\[\Sigma_i~ \simeq~ \frac{\ell^2}{m_i} \frac{1}{(2\imath)^2} \sum_{j=1}^N \nu_i(x_j) \,(z_{ij} - \overline{z}_{ij}) (z_{ij}^\intercal - \overline{z}_{ij}^\intercal)~,\]

where \(z_{ij} = \exp[\imath (X_j - \overline{X}_i)]\) is a \(D\)-dimensional vector of unit-modulus complex numbers.

We know that for any complex-valued vector \(z\):

\[(z - \overline{z}) (z^\intercal - \overline{z}^\intercal) ~=~ 2 \,[ \text{Re}(zz^\intercal) - \text{Re}(z\overline{z}^\intercal) ]~.\]

We can thus write the relevant terms as products of \(i\) and \(j\) factors:

\[\begin{split}z_{ij} z^\intercal_{ij} ~&=~ \exp[\imath (X_j - \overline{X}_i)]\,\cdot\, \exp[\imath (X_j^\intercal - \overline{X}_i^\intercal)]\\ ~&=~ \exp[-\imath (\overline{X}_i + \overline{X}_i^\intercal)]\,\cdot\, \exp[\imath (X_j + X_j^\intercal)] \\ z_{ij} \overline{z}^\intercal_{ij} ~&=~ \exp[\imath (X_j - \overline{X}_i)]\,\cdot\, \exp[-\imath (X_j^\intercal - \overline{X}_i^\intercal)]\\ ~&=~ \exp[-\imath (\overline{X}_i - \overline{X}_i^\intercal)]\,\cdot\, \exp[\imath (X_j - X_j^\intercal)]~.\end{split}\]

If we denote by \(\langle w, z \rangle = \text{Re}(\overline{w}\,z)\) the real dot product, applied coordinate-wise to our \(2\times 2\) or \(3 \times 3\) matrices, we can now rewrite the covariance matrix \(\Sigma_i\) as:

\[\begin{split}\Sigma_i~ \simeq~ \frac{\ell^2}{2} \bigg[\, &\Big\langle \exp\big[ \imath (\overline{X}_i - \overline{X}_i^\intercal) \big], \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j)\, \exp\big[ \imath (X_j - X_j^\intercal) \big] \Big\rangle \\ ~-~ &\Big\langle \exp \big[\imath (\overline{X}_i + \overline{X}_i^\intercal)\big], \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j)\, \exp \big[\imath (X_j + X_j^\intercal) \big] \Big\rangle \,\bigg]\end{split}\]

Intuition. Looking at the diagonal coefficients of the covariance matrix \(\Sigma_i\), we have \(\overline{X}_i - \overline{X}_i^\intercal = 0\) and \(X_j - X_j^\intercal = 0\), while \(\overline{X}_i + \overline{X}_i^\intercal = 2\overline{X}_i\) and \(X_j + X_j^\intercal = 2X_j\) (with a slight abuse of notation), so the formula simply reads:

\[\text{diag}(\Sigma_i) ~\simeq~ \frac{\ell^2}{2} \bigg[\, 1 ~-~\Big\langle \exp \big[2 \imath \overline{X}_i\big], \frac{1}{m_i} \sum_{j=1}^N \nu_i(x_j)\, \exp \big[2\imath X_j \big] \Big\rangle \,\bigg]~.\]

This formula estimates the spread of \(X_j = x_j / \ell\) in the neighborhood of \(\overline{X}_i = \overline{x}_i / \ell\) using a generalized Jensen’s inequality on the boundary of the (convex) unit disk of the complex plane.

Numerical analysis. To perform this computation, one simply needs to use a matrix-matrix product with the neighborhood operator \(\mathcal{V}\) to smooth the coefficients of the \(D\times D\) tensors:

\[\cos(X_j + X_j^\intercal), ~~\sin(X_j + X_j^\intercal), ~~\cos(X_j - X_j^\intercal)~~\text{and} ~~\sin(X_j - X_j^\intercal)~.\]

The first three tensors are symmetric, while the last one is skew-symmetric. This leaves us with \(3 \cdot 3 + 1 = 10\) channels in dimension \(D=2\), and \(3\cdot 6 + 3 = 21\) channels in dimension \(D=3\).

Crucially, the complex exponentials are all of order 1. Using this trigonometric expansion, we are estimating a quantity of order \(\sigma^2\) as the difference of two terms of order \(\ell^2 = 100\, \sigma^2\). Even if the computation is performed in float32 arithmetic, this still leaves approximately \(7-2 = 5\) digits of decimal precision for \(\Sigma_i\), which is suitable for our purposes. We thus favor this method by default in point_moments() and all downstream methods.