Trivialization¶
arxiv-link Trivializations for Gradient-Based Optimization on Manifolds
arxiv-link Geometric Optimisation on Manifolds with Applications to Deep Learning
Trivialization: given a manifold $\mathcal{M}$, a trivialization is a surjective map from Euclidean space onto the manifold:
$$ \phi:\mathbb{R}^n\to \mathcal{M} $$
A constrained optimization problem over some manifold $\mathcal{M}$
$$ \min_{x\in\mathcal{M}} f(x) $$
can be converted to an unconstrained optimization problem via a trivialization:
$$ \min_{\theta\in\mathbb{R}^n} f(\phi(\theta)) $$
In this notebook, we will list some common manifolds and their trivializations.
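As a warm-up, here is a minimal sketch of the whole pattern in plain torch (not using numqi; the learning rate and step count are arbitrary choices): minimize $f(x)=x+1/x$ over $\mathbb{R}_+$ using the softplus trivialization introduced below. The unconstrained parameter $\theta$ is optimized by vanilla gradient descent and $\phi(\theta)$ converges to the constrained minimizer $x^*=1$.
import torch

# minimize f(x) = x + 1/x over R_+; the constrained minimizer is x* = 1
theta = torch.tensor(0.3, requires_grad=True)  # unconstrained parameter in R
optimizer = torch.optim.SGD([theta], lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    x = torch.nn.functional.softplus(theta)  # trivialization phi: R -> R_+
    loss = x + 1/x
    loss.backward()
    optimizer.step()
print('x:', torch.nn.functional.softplus(theta).item())  # close to 1.0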
Math notation¶
- $\mathbb{R}$: real numbers
- $\mathbb{R}_+$: positive real numbers
- $\mathbb{R}^d$: $d$-dimensional real vectors
- $x\in\mathbb{R}^d,x> 0$: $x$ is a $d$-dimensional real vector and all elements are positive
- $x\in\mathbb{R}^{d},x\succeq 0$: $x$ is a $d$-dimensional real vector and all elements are non-negative
- $\mathbb{R}^{m\times n}$: $m\times n$ real matrices
- $x\in\mathbb{R}^{m\times m},x\succ 0$: $x$ is an $m\times m$ real positive definite matrix (all eigenvalues are positive)
- $x\in\mathbb{R}^{m\times m},x\succeq 0$: $x$ is an $m\times m$ real positive semi-definite matrix (all eigenvalues are non-negative)
import numpy as np
import torch
try:
    import numqi
except ImportError:
    %pip install numqi
    import numqi
Positive Real Numbers¶
$$ \mathbb{R}_+ = \{x\in\mathbb{R}:x>0\} $$
One trivialization is the SoftPlus function:
$$ \phi(\theta) = \log(1+\exp(\theta)):\mathbb{R}\to\mathbb{R}_+ $$
Another trivialization is the Exponential function:
$$ \phi(\theta) = \exp(\theta):\mathbb{R}\to\mathbb{R}_+ $$
manifold = numqi.manifold.PositiveReal(batch_size=5, method='softplus')
point = manifold().detach().numpy() #random point
print('softplus:', point)
manifold = numqi.manifold.PositiveReal(batch_size=5, method='exp')
point = manifold().detach().numpy() #random point
print('exp:', point)
softplus: [0.5348025  0.88817727 0.61018362 0.51604116 0.61954426]
exp: [1.01374331 0.85613584 0.77393395 0.98120348 0.63255657]
Discrete Probability Simplex¶
$$ \Delta^{n-1}_+ = \{x\in\mathbb{R}^n:x_i>0,x_1+x_2+\cdots +x_n = 1\} $$
Trivializations can be composed: let $g$ be any trivialization of $\mathbb{R}_+$; then
$$ \phi(\theta)=(x_1,x_2,\cdots,x_n):\mathrm{dom}(g)^n\to\Delta_+^{n-1} $$
with
$$ x_i = \frac{g(\theta_i)}{\sum_j g(\theta_j)} $$
gives a trivialization of $\Delta_+^{n-1}$. Specifically, the SoftMax function corresponds to $g(\theta) = \exp(\theta)$.
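For the $g(\theta)=\exp(\theta)$ case this is exactly `torch.softmax`; a minimal check in plain torch (independent of numqi):
import torch

theta = torch.randn(5)
x = torch.softmax(theta, dim=0)  # x_i = exp(theta_i) / sum_j exp(theta_j)
print('positive:', bool((x > 0).all()))
print('sum:', x.sum().item())  # 1.0 up to machine precision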
manifold = numqi.manifold.DiscreteProbability(5)
point = manifold().detach().numpy()
print('point:', point)
print('sum(point):', np.sum(point))
point: [0.16649143 0.172548   0.21787707 0.27329013 0.16979337]
sum(point): 1.0
Sphere¶
The sphere $S^n$ is defined as
$$ S^n = \{x\in\mathbb{R}^{n+1}:\lVert x\rVert_2=1\} $$
The following quotient map is a trivialization of the sphere:
$$\phi(\theta)=\frac{\theta}{\lVert \theta\rVert}: \mathbb{R}^{n+1}\to S^n$$
PS: the map is undefined (divergent) at the origin, so we should be careful with initialization. We simply hope the optimization never jumps to the origin; if it does, we reinitialize (such cases seem rare if random initialization is used).
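A classic use case, sketched in plain torch (the matrix $A$, the optimizer, and the step count are arbitrary choices): minimizing the Rayleigh quotient $x^TAx$ over the sphere recovers the smallest eigenvalue of $A$.
import torch

torch.manual_seed(0)
A = torch.randn(6, 6)
A = (A + A.T) / 2  # random symmetric matrix

theta = torch.randn(6, requires_grad=True)  # unconstrained parameter
optimizer = torch.optim.Adam([theta], lr=0.03)
for _ in range(1000):
    optimizer.zero_grad()
    x = theta / torch.linalg.norm(theta)  # quotient-map trivialization
    loss = x @ (A @ x)
    loss.backward()
    optimizer.step()
print('optimized value:', loss.item())
print('smallest eigenvalue:', torch.linalg.eigvalsh(A)[0].item())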
dim = 5
manifold = numqi.manifold.Sphere(dim, method='quotient')
point = manifold().detach().numpy()
print('point:', point)
print('norm:', np.linalg.norm(point))
point: [-0.17008636 -0.47148146 -0.23835158  0.5384511   0.63406214]
norm: 1.0
Stiefel Manifold¶
The Stiefel manifold $\mathrm{St}(n,r)$ is defined as
$$ \mathrm{St}(n,r) = \{X\in\mathbb{R}^{n\times r}:X^TX=I_r\} $$
QR decomposition is a trivialization of the Stiefel manifold:
$$ \phi(\theta)=Q: \mathbb{R}^{n\times r}\to \mathrm{St}(n,r) $$
where $\theta=QR$ is the QR decomposition of $\theta$, and $Q\in\mathbb{R}^{n\times r}$ has orthonormal columns.
PS: if the rank of $\theta$ is smaller than $r$, the QR decomposition fails to produce a point on the manifold. As with the sphere, we can only hope the optimization avoids this singular set, and reinitialize if it does not (such cases seem rare if random initialization is used).
TODO doi-link A Global Cayley Parametrization of Stiefel Manifold for Direct Utilization of Optimization Mechanisms Over Vector Spaces
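A minimal sketch of the QR trivialization in plain torch (the sign fix below is one common convention for making the map single-valued; numqi's internal convention may differ):
import torch

theta = torch.randn(5, 3)
Q, R = torch.linalg.qr(theta)
Q = Q * torch.sign(torch.diagonal(R))  # flip column signs so diag(R) > 0
print(Q.T @ Q)  # identity up to machine precision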
manifold = numqi.manifold.Stiefel(dim=5, rank=3, method='qr')
point = manifold().detach().numpy()
print('point:', point, sep='\n')
print('\nX^T X:', point.T @ point, sep='\n')
point:
[[-0.69681231  0.61859369 -0.2822582 ]
 [-0.19427928 -0.2126762   0.45589068]
 [-0.540945   -0.37250132 -0.03307628]
 [-0.33462373 -0.64933924 -0.26582702]
 [ 0.26853983 -0.108222   -0.80045984]]

X^T X:
[[ 1.00000000e+00  6.78308274e-17 -3.16196540e-17]
 [ 6.78308274e-17  1.00000000e+00  8.72700945e-17]
 [-3.16196540e-17  8.72700945e-17  1.00000000e+00]]
Symmetric and Hermitian Matrices¶
The set of symmetric matrices $\mathrm{Sym}^n$ is defined as
$$ \mathrm{Sym}^n = \{X\in\mathbb{R}^{n\times n}:X=X^T\} $$
The set of Hermitian matrices $\mathrm{Herm}^n$ is defined as
$$ \mathrm{Herm}^n = \{X\in\mathbb{C}^{n\times n}:X=X^\dagger\} $$
Both of them are vector spaces, so we can choose a basis $\{E_i\}$. E.g., for Hermitian matrices, a basis is given by the Gell-Mann matrices (see tutorial/gellmann) together with the identity matrix. The trivialization is then simply the linear expansion in this basis:
$$\phi(\theta)=\sum_i\theta_iE_i $$
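A minimal sketch for the real symmetric case in plain torch, using the elementary-matrix basis of the upper triangle instead of Gell-Mann matrices (any basis of the vector space works):
import torch

n = 3
theta = torch.randn(n * (n + 1) // 2)  # one parameter per independent entry
rows, cols = torch.triu_indices(n, n)
X = torch.zeros(n, n)
X[rows, cols] = theta  # fill the upper triangle (including the diagonal)
X = X + X.T - torch.diag(X.diagonal())  # mirror without doubling the diagonal
print(torch.allclose(X, X.T))  # True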
manifold_sym = numqi.manifold.SymmetricMatrix(3)
point = manifold_sym().detach().numpy()
print('point:', point, sep='\n')
point:
[[-0.60857755  0.17283328 -0.1575151 ]
 [ 0.17283328  0.91546264 -0.03090367]
 [-0.1575151  -0.03090367  0.42752979]]
manifold_herm = numqi.manifold.SymmetricMatrix(3, dtype=torch.complex128)
point = manifold_herm().detach().numpy()
print('point:', point, sep='\n')
point:
[[-0.81167828+0.j         -0.25937764+0.27170628j -0.03662005+0.28889551j]
 [-0.25937764-0.27170628j -0.5829548 +0.j         -0.22975185-0.32249453j]
 [-0.03662005-0.28889551j -0.22975185+0.32249453j -0.81029508+0.j        ]]
Rank-$r$ Positive Semi-Definite Matrices¶
The set of real rank-$r$ positive semi-definite matrices $\mathrm{Sym}^{(n,r)}_+$ is defined as
$$ \mathrm{Sym}^{(n,r)}_+ = \{X\in\mathbb{R}^{n\times n}:X\succeq 0,\mathrm{rank}(X)=r\} $$
To make the set bounded, numqi additionally constrains the trace of the matrix to be 1. The trivialization is the reverse of the Cholesky decomposition, with a trace normalization:
$$ \phi(\theta)=\frac{g(\theta)g(\theta)^T}{\mathrm{Tr}[g(\theta)g(\theta)^T]}: \mathrm{dom}(g)\to \mathrm{Sym}^{(n,r)}_+ $$
where $g$ is a trivialization map of the set of lower-trapezoidal matrices with positive diagonal, $\mathrm{image}(g)=L^{(n,r)}_+$:
$$ L^{(n,r)}_+=\{X\in\mathbb{R}^{n\times r}: X_{ii}>0,\;X_{ij}=0\;\mathrm{for}\;j>i\} $$
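A minimal sketch of this construction in plain torch (softplus on the diagonal is one possible choice inside $g$; numqi's actual parametrization may differ):
import torch

n, r = 4, 2
theta = torch.randn(n, r)
# g(theta): lower-trapezoidal matrix with strictly positive diagonal
L = torch.tril(theta, diagonal=-1) + torch.eye(n, r) * torch.nn.functional.softplus(theta)
X = L @ L.T              # positive semi-definite with rank r
X = X / torch.trace(X)   # normalize to trace 1
print(torch.linalg.eigvalsh(X))  # r positive eigenvalues, the rest ~0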
manifold = numqi.manifold.Trace1PSD(dim=4, rank=2, method='cholesky')
point = manifold().detach().numpy()
print(point)
print('eigenvalue:', np.linalg.eigvalsh(point)) #eigenvalues are positive (up to machine precision)
print('trace(point):', np.trace(point))
[[ 0.42083295  0.00695834 -0.17905983  0.24755122]
 [ 0.00695834  0.31096244 -0.08273368 -0.08567651]
 [-0.17905983 -0.08273368  0.0966602  -0.08229269]
 [ 0.24755122 -0.08567651 -0.08229269  0.17154441]]
eigenvalue: [-1.23139719e-16 -4.77305432e-17  3.56353880e-01  6.43646120e-01]
trace(point): 0.9999999999999998
Similarly, we can define the set of trace-1 complex rank-$r$ positive semi-definite matrices $\mathrm{Herm}^{(n,r)}_+$
$$ \mathrm{Herm}^{(n,r)}_+ = \{X\in\mathbb{C}^{n\times n}:X\succeq 0,\mathrm{rank}(X)=r,\mathrm{Tr}(X)=1\} $$
manifold = numqi.manifold.Trace1PSD(dim=3, rank=2, method='cholesky', dtype=torch.complex128)
point = manifold().detach().numpy()
print(point)
print('eigenvalue:', np.linalg.eigvalsh(point))
print('trace(point):', np.trace(point))
[[ 0.33199846+0.j          0.25246373+0.00519178j -0.11691709+0.26132899j]
 [ 0.25246373-0.00519178j  0.40687444+0.j         -0.10650486+0.25145544j]
 [-0.11691709-0.26132899j -0.10650486-0.25145544j  0.2611271 +0.j        ]]
eigenvalue: [-4.02306919e-17  1.22801064e-01  8.77198936e-01]
trace(point): (1.0000000000000002+0j)
Special Orthogonal Group¶
The special orthogonal group $\mathrm{SO}(n)$ is defined as
$$ \mathrm{SO}(n) = \{X\in\mathbb{R}^{n\times n}:X^TX=I_n,\det(X)=1\} $$
The trivialization can be the Cayley transform wiki-link $\phi(A)=(I-A)^{-1}(I+A)$ or the matrix exponential $\phi(A)=e^{A}$, where $A$ ranges over the skew-symmetric matrices ($A^T=-A$).
Similarly, for the special unitary group $\mathrm{SU}(n)$ (with $A$ ranging over traceless anti-Hermitian matrices)
$$ \mathrm{SU}(n) = \{X\in\mathbb{C}^{n\times n}:X^\dagger X=I_n,\det(X)=1\} $$
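A minimal sketch of the matrix-exponential trivialization for $\mathrm{SO}(n)$ in plain torch (how numqi parametrizes the skew-symmetric matrix is not shown here):
import torch

n = 4
theta = torch.randn(n, n)
A = theta - theta.T          # skew-symmetric matrix, A^T = -A
Q = torch.matrix_exp(A)      # matrix exponential lands in SO(n)
print(Q.T @ Q)               # identity up to machine precision
print(torch.linalg.det(Q))   # 1.0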
manifold_so = numqi.manifold.SpecialOrthogonal(dim=4)
point = manifold_so().detach().numpy()
print('point:', point, sep='\n')
print('point.T @ point:', point.T @ point, sep='\n')
point:
[[ 0.80338859 -0.46371606 -0.03703177  0.37170262]
 [ 0.30677981  0.85514863 -0.15406215  0.38842223]
 [-0.03434166  0.06930311  0.96340062  0.25666512]
 [-0.50918913 -0.22110025 -0.21622385  0.80317393]]
point.T @ point:
[[ 1.00000000e+00 -4.74862492e-17  5.35387492e-17 -2.37106426e-16]
 [-4.74862492e-17  1.00000000e+00  3.02221161e-18  6.07411330e-17]
 [ 5.35387492e-17  3.02221161e-18  1.00000000e+00 -7.73839678e-17]
 [-2.37106426e-16  6.07411330e-17 -7.73839678e-17  1.00000000e+00]]
manifold_su = numqi.manifold.SpecialOrthogonal(dim=3, dtype=torch.complex128)
point = manifold_su().detach().numpy()
print('point:', point, sep='\n')
print('point.H @ point:', point.T.conj() @ point, sep='\n')
point:
[[ 0.76377358+0.07724118j  0.09369224+0.44669632j  0.29201585-0.34219093j]
 [-0.19586652+0.41313087j  0.81761993+0.28939664j -0.11466684+0.15986853j]
 [-0.41994296-0.15903091j  0.18410145-0.07441815j  0.80501837-0.33297312j]]
point.H @ point:
[[1.00000000e+00+0.00000000e+00j 5.55111512e-17-2.77555756e-16j 0.00000000e+00+5.55111512e-17j]
 [5.55111512e-17+2.77555756e-16j 1.00000000e+00+0.00000000e+00j 2.77555756e-17-4.85722573e-17j]
 [0.00000000e+00-5.55111512e-17j 2.77555756e-17+4.85722573e-17j 1.00000000e+00+0.00000000e+00j]]
Connection with Quantum Information¶
TODO
- pure quantum states
- Hamiltonian
- density matrix
- quantum gate
- quantum channel
fig,ax = numqi.manifold.plot_qobject_trivialization_map()
fig,ax = numqi.manifold.plot.plot_cha_trivialization_map()
fig,ax = numqi.manifold.plot.plot_tensor_rank_sigmar_trivialization_map()
fig,ax = numqi.manifold.plot.plot_pureb_trivialization_map()
fig,ax = numqi.manifold.plot.plot_uda_trivialization_map()
fig,ax = numqi.manifold.plot.plot_udp_trivialization_map()