Imaginary time evolution
$ \require{physics} \def\bm{\boldsymbol} $
from copy import deepcopy
import numpy as np
import pandas as pd
from scipy.linalg import expm
from scipy.sparse.linalg import eigsh
import matplotlib.pyplot as plt
%matplotlib inline
from ph121c_lxvm import tensor, basis, models
What we are doing
In this assignment we simulate the dynamics of quantum systems with local Hamiltonians in the matrix product state (MPS) representation, via the Time-Evolving Block Decimation (TEBD) algorithm.
If, like me, you are learning this for the first time, here is a brief collection of learning materials I found helpful:
- Roger Penrose tensor diagram notation (1971)
- Vidal’s exposition of TEBD (2004)
- Tensor Network TEBD
- TeNPy (see literature for more)
- Help with tensor networks
- White’s exposition of DMRG (1992)
Overview
In TEBD, we are interested in applying a propagator $U(t)$ to evolve a quantum
state. In this section, we will use imaginary time evolution generated by the Hamiltonian:
$$ U(\tau) = e^{-H \tau}, $$
but next section we will consider real time evolution:
$$ U(t) = e^{-i H t}. $$
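To see why the imaginary time propagator finds ground states, here is a toy illustration (my own, not part of the assignment code; the matrix is a random Hermitian one rather than the TFIM): repeatedly applying $e^{-H \, \delta\tau}$ to a generic vector and renormalizing suppresses the excited states exponentially in $\tau$, leaving the lowest-energy eigenvector.
# Toy check: imaginary time evolution projects onto the ground state
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 6))
H = (H + H.T) / 2                          # a random Hermitian "Hamiltonian"
evals, evecs = np.linalg.eigh(H)           # eigenvalues sorted ascending
v = rng.standard_normal(6)                 # generic initial vector
U = expm(-0.1 * H)                         # one imaginary time step
for _ in range(1000):
    v = U @ v
    v /= np.linalg.norm(v)                 # e^{-H tau} is not unitary: renormalize
print(abs(v @ evecs[:, 0]))                # overlap with the ground state: close to 1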
We will use the following TFIM Hamiltonian with open boundary conditions
parametrized by $h^x, h^z$:
$$
H =
-\sum_{i=1}^{L-1} \sigma_i^z \sigma_{i+1}^z
-h^x \sum_{i=1}^L \sigma_i^x
-h^z \sum_{i=1}^L \sigma_i^z
.
$$
Because the Hamiltonian is 2-local, we can group its terms:
\begin{align}
H &=
-\sum_{i=1}^L \left( h^x \sigma_i^x + h^z \sigma_i^z \right)
- \left( \sum_{i=1}^{L // 2} \sigma_{2i - 1}^z \sigma_{2i}^z \right)
- \left( \sum_{i=1}^{L // 2-1+L\%2} \sigma_{2i}^z \sigma_{2i+1}^z \right)
\\
&= H_1 + H_2^\text{odd} + H_2^\text{even}
.
\end{align}
Now all of the terms within each group commute with one another, but terms from different groups need not: the field group contains $\sigma^x$, which does not commute with the $\sigma^z \sigma^z$ bond terms (a quick numerical check follows below).
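Here is a small dense-matrix check of that statement (my own illustration at $L=4$, with the $h^x, h^z$ values used later in the notebook): the two $\sigma^z \sigma^z$ bond groups are diagonal and commute with each other, but neither commutes with the field group.
# Check which groups of terms commute, using explicit Kronecker products at L=4
L, hx, hz = 4, 1.05, 0.5
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def embed (op, site, L):
    """Kronecker-embed a 1-site operator at `site` (1-indexed, leftmost factor first)."""
    out = np.array([[1.]])
    for i in range(1, L + 1):
        out = np.kron(out, op if i == site else np.eye(2))
    return out

H_1 = -sum(hx * embed(sx, i, L) + hz * embed(sz, i, L) for i in range(1, L + 1))
H_2_odd = -sum(embed(sz, i, L) @ embed(sz, i + 1, L) for i in range(1, L, 2))
H_2_even = -sum(embed(sz, i, L) @ embed(sz, i + 1, L) for i in range(2, L, 2))

print(np.allclose(H_2_odd @ H_2_even, H_2_even @ H_2_odd))  # True: both are diagonal
print(np.allclose(H_1 @ H_2_odd, H_2_odd @ H_1))            # False: sigma^x spoils it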
The Zassenhaus formula (proven here)
allows a separation of $U(t)$ into a product of matrix exponentials of these
local terms, grouped in powers of $t$.
To lowest order in $\tau$, the formula gives:
$$
U(\tau) =
e^{-H_1 \tau} e^{-H_2^\text{odd} \tau} e^{-H_2^\text{even} \tau}
e^{\mathcal O (\tau^2)}
.
$$
Note: the Zassenhaus formula looks like the Baker-Campbell-Hausdorff formula,
but they group terms differently. The former groups terms in powers of $t$, and
thus lends itself to perturbative approximations.
A corollary of the Zassenhaus formula is the Lie product formula: $$ e^{A + B} = \lim_{N \to \infty} \left( e^{A/N} e^{B/N} \right)^N . $$ In fact, this result was known to Sophus Lie, the namesake of Lie algebras, as early as 1893! The work is published in ISBN 0828402329 and on the web. Thus for some finite $N$, we are in a good position to make the Suzuki-Trotter decomposition, which approximates the time-evolution operator over a discrete number of time steps: $$ U(\tau) \approx \left( e^{-H_1 \tau/N} e^{-H_2^\text{odd} \tau/N} e^{-H_2^\text{even} \tau/N} \right)^N . $$ This provides yet another example of mathematicians exploring areas relevant to physics more than a century before they emerged in physics.
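As a quick sanity check of the Lie product formula (my own illustration with random Hermitian matrices, not the TFIM terms), the error of the first-order splitting shrinks roughly like $1/N$:
# Numerical check of the Lie product formula / first-order Trotter splitting
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2
B = rng.standard_normal((4, 4))
B = (B + B.T) / 2
exact = expm(A + B)
for N in (1, 10, 100, 1000):
    trotter = np.linalg.matrix_power(expm(A / N) @ expm(B / N), N)
    print(N, np.linalg.norm(trotter - exact))   # error decreases roughly like 1/N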
In practice, this is how we will simulate time evolution of quantum systems, both on classical computers (using the MPS representation) and on quantum computers, where the gates are applied to qudits in the same fashion as on the tensor network. For higher-order Suzuki-Trotter decompositions, read this (cited by Vitali).
Matrix calculus
I haven't given you enough details yet to apply this to a tensor network: we still need to exponentiate matrices that are written as Kronecker products.
In general, matrix exponentials can be calculated by diagonalizing the matrix and exponentiating its eigenvalues, or via a power-series representation. For a review of matrix exponentials, see these course notes written by my summer research mentor, or this paper on the many ways of calculating the matrix exponential.
My questions about Kronecker products and matrix exponentials center on whether the two operations commute, both for 1-site and 2-site operators. Let's begin.
sx = np.array([[0, 1], [1, 0]])
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1, -1])
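As a quick check of the diagonalization route mentioned above (an illustration using the Pauli matrices just defined and the field values that appear later):
# Exponentiate a Hermitian matrix by diagonalizing and exponentiating eigenvalues
A = 1.05 * sx + 0.5 * sz
evals, evecs = np.linalg.eigh(A)
print(np.allclose(evecs @ np.diag(np.exp(evals)) @ evecs.conj().T, expm(A)))  # True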
Two sites
For the diagonal matrix $\sigma^z$ we shall see that the matrix exponential and Kronecker product do not commute for a 2-site operator:
expm(np.kron(sz, sz))
array([[2.71828183, 0. , 0. , 0. ],
[0. , 0.36787944, 0. , 0. ],
[0. , 0. , 0.36787944, 0. ],
[0. , 0. , 0. , 2.71828183]])
np.kron(expm(sz), expm(sz))
array([[7.3890561 , 0. , 0. , 0. ],
[0. , 1. , 0. , 0. ],
[0. , 0. , 1. , 0. ],
[0. , 0. , 0. , 0.13533528]])
Not the same! This means we actually have to exponentiate the matrix, which for a diagonal matrix is equivalent to exponentiating the diagonal:
np.exp(np.diag(np.kron(sz, sz)))
array([2.71828183, 0.36787944, 0.36787944, 2.71828183])
The consequence for MPS is that this gate cannot be represented as a one-site operator, so we will have to contract a virtual index (merging the two sites) before applying the transformation directly. Afterwards, we will again do an SVD to return to MPS form, as sketched below.
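For concreteness, here is a minimal sketch of that step in plain numpy, with my own index conventions (left bond, physical, right bond) rather than the ph121c_lxvm.tensor API: contract the shared bond, apply the gate, and SVD back into MPS form with a truncation.
def apply_two_site_gate (A, B, gate, chi_max=None):
    """Apply a 2-site gate to neighboring MPS tensors A (Dl, d, Dm) and B (Dm, d, Dr).

    `gate` is a (d*d, d*d) matrix acting on the physical pair (s_i, s_{i+1}).
    """
    Dl, d, _ = A.shape
    Dr = B.shape[2]
    # contract the shared virtual index into a 2-site tensor
    theta = np.tensordot(A, B, axes=(2, 0))             # (Dl, s_i, s_{i+1}, Dr)
    # apply the gate to the two physical indices
    gate4 = gate.reshape(d, d, d, d)                    # (s'_i, s'_{i+1}, s_i, s_{i+1})
    theta = np.einsum('stuv,auvb->astb', gate4, theta)  # (Dl, s'_i, s'_{i+1}, Dr)
    # SVD back into two tensors, truncating the new bond
    u, s, vh = np.linalg.svd(theta.reshape(Dl * d, d * Dr), full_matrices=False)
    chi = len(s) if chi_max is None else min(chi_max, len(s))
    A_new = u[:, :chi].reshape(Dl, d, chi)                  # left-canonical tensor
    B_new = (s[:chi, None] * vh[:chi]).reshape(chi, d, Dr)  # carries the weights
    return A_new, B_new
Sweeping such an update over the odd bonds and then the even bonds, together with the 1-site field gates (which need no SVD), makes up one first-order TEBD step.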
Alternatively, we could represent the operator as acting on individual sites by taking an SVD of it and disaggregating the physical indices, introducing a virtual index. This option might not be viable for diagonal matrices with large coefficients (maybe better for random unitaries):
np.linalg.svd(expm(np.kron(sz, sz)))
(array([[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]]),
array([2.71828183, 2.71828183, 0.36787944, 0.36787944]),
array([[1., 0., 0., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.],
[0., 0., 1., 0.]]))
That is, if we can write the elements of a 2-site operator $U(t)$ as $U_{\sigma'_i \sigma'_{i+1}}^{\sigma_i \sigma_{i+1}}$, then we should reshape the matrix so that $U_{\sigma'_i \sigma_i}^{\sigma'_{i+1} \sigma_{i+1}}$ groups the physical indices by site. Then we should do an SVD on this matrix, which introduces a virtual index $\alpha$, leading to: $$ U_{\sigma'_i \sigma_i}^{\sigma'_{i+1} \sigma_{i+1}} = \sum_\alpha U_{\sigma'_i \sigma_i}^\alpha S_\alpha^\alpha (V^\dagger)_{\alpha}^{\sigma'_{i+1} \sigma_{i+1}} . $$ We should then reshape $U_{\sigma'_i \sigma_i}^\alpha \to U_{\sigma'_i}^{\sigma_i \alpha}$ and $(V^\dagger)_{\alpha}^{\sigma'_{i+1} \sigma_{i+1}} \to (V^\dagger)_{\alpha \sigma'_{i+1}}^{\sigma_{i+1}}$ so that we can apply these operators to the physical indices as a matrix multiplication. This opens the door to matrix product operators, which have properties very similar to MPS, except for the additional physical index.
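Here is that recipe sketched in plain numpy (my own conventions; in np.kron(a, b) the first factor carries the slower index), using the $\sigma^z \sigma^z$ gate from above:
# Split a 2-site gate into two 1-site tensors joined by a virtual index
gate = expm(np.kron(sz, sz))                   # indices (s'_i s'_{i+1}; s_i s_{i+1})
t = gate.reshape(2, 2, 2, 2)                   # (s'_i, s'_{i+1}, s_i, s_{i+1})
t = t.transpose(0, 2, 1, 3).reshape(4, 4)      # group (s'_i, s_i) x (s'_{i+1}, s_{i+1})
u, s, vh = np.linalg.svd(t)
left = (u * np.sqrt(s)).reshape(2, 2, -1)              # (s'_i, s_i, alpha)
right = (np.sqrt(s)[:, None] * vh).reshape(-1, 2, 2)   # (alpha, s'_{i+1}, s_{i+1})
# recombine on alpha and undo the regrouping: we recover the original gate
recon = np.einsum('abx,xcd->abcd', left, right).transpose(0, 2, 1, 3).reshape(4, 4)
print(np.allclose(recon, gate))                # True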
One site
Diagonal matrix
We shall see that for a 1-site operator, the matrix exponential commutes with a Kronecker product by the identity:
expm(np.kron(np.eye(2), sz))
array([[2.71828183, 0. , 0. , 0. ],
[0. , 0.36787944, 0. , 0. ],
[0. , 0. , 2.71828183, 0. ],
[0. , 0. , 0. , 0.36787944]])
np.kron(np.eye(2), expm(sz))
array([[2.71828183, 0. , 0. , 0. ],
[0. , 0.36787944, 0. , 0. ],
[0. , 0. , 2.71828183, 0. ],
[0. , 0. , 0. , 0.36787944]])
Non-diagonal matrix
For the off-diagonal matrix $\sigma^x$, the operations still commute across Kronecker products with the identity:
expm(np.kron(np.eye(2), sx))
array([[1.54308063, 1.17520119, 0. , 0. ],
[1.17520119, 1.54308063, 0. , 0. ],
[0. , 0. , 1.54308063, 1.17520119],
[0. , 0. , 1.17520119, 1.54308063]])
np.kron(np.eye(2), expm(sx))
array([[1.54308063, 1.17520119, 0. , 0. ],
[1.17520119, 1.54308063, 0. , 0. ],
[0. , 0. , 1.54308063, 1.17520119],
[0. , 0. , 1.17520119, 1.54308063]])
All in all, this means we can time-evolve local operators efficiently by calculating their matrix exponentials locally.
In fact, since $\sigma^x$ is related to $\sigma^z$ by a Hadamard rotation $T$ (which satisfies $T^2 = 1$), we can compute: $$ \exp(\phi \sigma^x) = \exp(\phi T \sigma^z T) = T \exp(\phi \sigma^z) T . $$ Let's demonstrate:
hd = np.array([[1, 1], [1, -1]]) * np.sqrt(0.5)
expm(sx)
array([[1.54308063, 1.17520119],
[1.17520119, 1.54308063]])
expm(hd @ sz @ hd)
array([[1.54308063, 1.17520119],
[1.17520119, 1.54308063]])
hd @ expm(sz) @ hd
array([[1.54308063, 1.17520119],
[1.17520119, 1.54308063]])
We might prefer to use the result in the assignment that for a 1-site operator, with $\bm n$ a unit vector: $$ \exp(i t \bm n \cdot \bm \sigma) = \cos(t) \, \mathbb{1} + i \sin(t) \, \bm n \cdot \bm \sigma . $$ For imaginary time evolution: $$ \exp(\tau \bm n \cdot \bm \sigma) = \cos(-i \tau) \, \mathbb{1} + i \sin(-i \tau) \, \bm n \cdot \bm \sigma = \cosh(\tau) \, \mathbb{1} + \sinh(\tau) \, \bm n \cdot \bm \sigma . $$
# Verify the exponentiation formula
hx, hz = 1.05, 0.5
hn = np.sqrt(hx ** 2 + hz ** 2)
np.allclose(
expm((hx * sx + hz * sz) / hn),
np.cosh(1) * np.eye(2) + np.sinh(1) * ((hx * sx + hz * sz) / hn)
)
True
Action
At this stage, I have laid out the Suzuki-Trotter decomposition as well as how
to represent the gates within each term, so we can go ahead and do TEBD.
We are told to evolve a ferromagnetic state:
\begin{align}
\ket{\psi (t=0)}
&= \ket{\downarrow} \otimes \cdots \otimes \ket{\downarrow}
,
\\
\ket{\phi (t=0)}
&= \ket{\uparrow} \otimes \cdots \otimes \ket{\uparrow}
.
\end{align}
L = 12
d = 2
hx = 1.05
hz = 0.5
down = np.array([1., 0.]).reshape(2, 1)
up = down[::-1].reshape((2, 1))
# build wavefunction in MPS representation
ψ = tensor.mps(L=L, d=d)
ψ.from_arr([ down for _ in range(L) ], center=-1)
ϕ = tensor.mps(L=L, d=d)
ϕ.from_arr([ up for _ in range(L) ], center=-1)
We are asked to calculate the energy of this state which requires a Hamiltonian.
sx = np.diag([1., 1.])[::-1]
sz = np.diag([1.,-1.])
# Build pieces of Hamiltonian in gate representation
def build_pieces_of_H (L, d, hx, hz):
"""Build the field, odd, and even term Hamiltonians and also their union."""
H_field = np.empty(L, dtype='O')
for i in range(H_field.size):
H_field[i] = tensor.mpo(L=L, d=d)
H_field[i].set_local_oper(-(hx * sx + hz * sz), i + 1)
H_odd = np.empty(L//2, dtype='O')
for i in range(H_odd.size):
H_odd[i] = tensor.mpo(L=L, d=d)
H_odd[i].set_local_oper(-sz, 2*i + 1)
H_odd[i].set_local_oper(sz, 2*i + 1 + 1)
H_even = np.empty(L//2 - 1 + L%2, dtype='O')
for i in range(H_even.size):
H_even[i] = tensor.mpo(L=L, d=d)
H_even[i].set_local_oper(-sz, 2*(i + 1))
H_even[i].set_local_oper(sz, 2*(i + 1) + 1)
H_full = np.array([*H_field, *H_odd, *H_even], dtype='O')
return (H_field, H_odd, H_even, H_full)
H_field, H_odd, H_even, H_full = build_pieces_of_H(L=L, d=d, hx=hx, hz=hz)
for name, wave in [('psi', ψ), ('phi', ϕ)]:
print('MPS Energy of', name, sum(e.expval(wave) for e in H_full))
MPS Energy of psi -17.0
MPS Energy of phi -5.0
As a verification, let’s compare this to the direct calculation from the Hamiltonian.
ψ.merge_bonds()
ψ.reset_pos()
psic = ψ[0].mat.reshape(ψ.size())
np.inner(psic, models.tfim_z.H_vec(psic, L, hx, 'o', hz))
-17.0
ϕ.merge_bonds()
ϕ.reset_pos()
phic = ϕ[0].mat.reshape(ϕ.size())
np.inner(phic, models.tfim_z.H_vec(phic, L, hx, 'o', hz))
-5.0
# Restore MPS compression to these product states
ψ.split_quanta(-1, trim = 1)
ϕ.split_quanta(-1, trim = 1)
Implementation
My code was influenced by the interface of the Julia ITensor package. I haven't used the package, but seeing that their code defines everything in terms of indices and operations on them convinced me I should write classes which do the same. Keeping track of indices is a pain, so reducing everything to index operations on my MPS representation (which is basically a list of matrices, where each matrix has a multi-index for both rows and columns) hopefully makes algorithms less difficult to program once the essential methods work.
In order to do this, I made many backwards-incompatible changes to my previous MPS code, which was rather simple to begin with. Now there is a TON of stuff going on under the hood to provide a composable set of elementary tensor operations that are abstracted away from the storage and representation details of the tensors themselves. The basic problems are: how do you keep track of a set of indices as a data structure, and how do you selectively operate on those indices with operations such as contractions, reshapes, and reorderings? A lot of this comes down to giving bonds pointers to sites. When these tasks can, to some extent, be automated, they open the door to splitting multi-indices with SVDs, to MPS canonical forms, and to tensor network contractions. The icing on the cake, so to speak, is taking these operations and composing them into algorithms for quantum simulation, such as TEBD, which we are about to undertake, as well as DMRG (maybe next time?). The difficulty of these algorithms is in applying gates repeatedly to different groups of indices, which creates a lot of book-keeping. There are a lot of books to keep.
To be clear, there is no attempt to optimize the code; it is mainly a framework for matrix operations where the row and column indices are replaced by multi-indices. It maybe succeeds in reducing code repetition when actually expressing algorithms. For the programmer, the price of abstraction is repaid in usability: I want to be able to apply an operator on a specific set of indices without having to manually manipulate the matrices each time I do an experiment.
I also started to 1-index the sites in the tensor trains, which is bound to create confusion!
Greatest achievement of this course
The following function is the greatest function I’ve written:
def multi_index_perm (dims, perm):
"""Return a slice to permute a multi-index in the computational basis.
`dims` and `perm` enumerate 0 to N-1 as the fastest-changing dit to slowest.
E.g. perm = np.arange(N) is the identity permutation.
Arguments:
dims :: np.ndarray(N) :: dimensions of each index
perm :: np.ndarray(N) :: perm[i] gives the new position of i
"""
assert dims.size == perm.size
iperm = np.argsort(perm)
new_dims = dims[iperm]
index = np.zeros(np.prod(dims), dtype='int64')
for i in range(len(dims)):
index += np.tile(
np.repeat(
np.prod(dims[:iperm[i]], dtype='int64')
* np.arange(dims[iperm[i]]),
repeats=np.prod(new_dims[:i], dtype='int64')
),
reps=np.prod(new_dims[i+1:], dtype='int64')
)
return index
It performs any arbitrary permutation of a multi-index (with any dimensions).
It’s great to have this function because it gives unlimited flexibility in the
manipulation of indices within a multi-index in order to sort indices after
sophisticated tensor manipulations.
Whoever uses this function has to be highly aware of the issue of endianness, which I believe asserts itself both in the order of the array dims and in the order of np.tile and np.repeat.
This function generalizes ph121c_lxvm.basis.schmidt.permute, which can only be applied to multi-indices where each index has dimension 2. That was useful earlier for qubit chains, but cannot be generalized because its algorithm is based on bit swaps, and bit indices always have dimension 2.
The function is great because it works: it is the endgame of indices. I have no idea if np.ravel_multi_index does the same thing: I suspect it doesn't. In principle, this function should overcome the limitations imposed by the maximal ndim set by numpy, though of course too many dimensions will lead to dangerously large arrays.
Here is an example of how to create and invert a permutation with both methods:
tensor.multi_index_perm(
np.array([2,2,2,2,2,2]),
np.array([5,2,0,3,1,4])
)[tensor.multi_index_perm(
np.array([2,2,2,2,2,2]),
np.array([2,4,1,3,5,0])
)]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])
basis.schmidt.permute(
basis.schmidt.permute(
np.arange(64), [], 6, perm=np.array([5,2,0,3,1,4])
), [], 6, perm=np.array([2,4,1,3,5,0])
)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])
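To make the convention concrete, here is a tiny hand-checkable case as I understand it (dit 0 is the fastest-changing): with dims = [2, 3] and perm = [1, 0], the dimension-2 dit moves from the fast position to the slow one.
dims = np.array([2, 3])
perm = np.array([1, 0])    # move dit 0 to position 1 and dit 1 to position 0
v = np.array([f'a{k % 2}b{k // 2}' for k in range(6)])   # dit a fastest, dit b slowest
idx = tensor.multi_index_perm(dims, perm)
print(idx)      # [0 2 4 1 3 5]
print(v[idx])   # ['a0b0' 'a0b1' 'a0b2' 'a1b0' 'a1b1' 'a1b2'] -- now b is fastest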
Everything else
If you read the source code in ph121c_lxvm.tensor, you might think that it is disgusting, and I would agree with you. The code mainly enforces the topology of a tensor network and automates operations on the network. Should it be 10 lines of code? Probably. It's over 1500.
Also, Python does not have a compiler that enforces types, so it is up to the programmer to enforce type stability. For me, that meant writing code in a LBYL (look before you leap) style, whereas some Python code is better in the EAFP (it’s easier to ask for forgiveness than permission) style.
The structure of the tensor module is the following: an index class is defined by subclassing collections.UserList and adding two attributes and some methods. Then multi_index, bond, and quanta are subclasses of index that enforce some restrictions on the types of elements and attributes in the index. Separately, the site type is a collection of a np.ndarray with ndim=2 and a multi_index of two multi_indexes representing the row and column indices. The site implements SVD, contraction, and sorting operations on the matrix indices. Then the train type is a UserList of sites with methods to manipulate these. Lastly, mps and mpo are subclasses of train with particular methods.
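As a schematic, here is my own sketch of that hierarchy (not the actual ph121c_lxvm.tensor source; attributes and method names are only illustrative):
from collections import UserList

class index (UserList):
    """A list of index elements carrying a dimension and a tag."""
    def __init__ (self, data=(), dim=1, tag=None):
        super().__init__(data)
        self.dim = dim
        self.tag = tag

class multi_index (index): pass   # a composite of indices (e.g. all row indices)
class bond (index): pass          # a virtual index pointing at the sites it joins
class quanta (index): pass        # a physical index

class site:
    """An ndim=2 array together with its row and column multi-indices."""
    def __init__ (self, mat, ind):
        self.mat = mat            # np.ndarray with ndim == 2
        self.ind = ind            # multi_index of (row multi_index, column multi_index)
    # methods: SVD, contraction, and index permutation/sorting

class train (UserList):
    """A list of sites, with methods to contract bonds, split sites, canonize, etc."""

class mps (train): pass           # one physical index per site
class mpo (train): pass           # raised and lowered physical indices per site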
I'll also mention that preserving the topology of the network (i.e. the pointers between sites) during operations that are not in-place, such as contractions, requires a bit of wizardry to keep the references intact. The process behaves a fair bit like DNA replication. I only implemented some things this way because the deepcopy function in the copy module apparently knows how to do this correctly.
By the way, the module I wrote is neither complete nor robust. You cannot yet compose things as I had hoped for, but time evolution should hopefully work reliably.
Direct approach
We will apply the gates in the Hamiltonian directly, which necessitates regrouping the physical indices into sites which match the gates.
We will first exponentiate each group of terms in the Hamiltonian:
# Construct propagators
def build_propagators (L, d, δτ, H_field, H_odd, H_even):
"""Exponentiate each non-commuting piece of the Hamiltonian"""
U_field = tensor.mpo(L=L, d=d)
for i, e in enumerate(H_field):
U_field.set_local_oper(expm(-δτ * e[0].mat), i+1)
U_odd = tensor.mpo(L=L, d=d)
for i, e in enumerate(H_odd):
U_odd.set_local_oper(
expm(-δτ * np.kron(e[1].mat, e[0].mat)),
2 * i + 1
)
U_even = tensor.mpo(L=L, d=d)
for i, e in enumerate(H_even):
U_even.set_local_oper(
expm(-δτ * np.kron(e[1].mat, e[0].mat)),
2 * (i + 1)
)
return (U_field, U_odd, U_even)
Let's evolve in imaginary time:
L = 12
d = 2
hx = 1.05
hz = 0.5
δτ = 0.1
%%time
# States
wave = deepcopy(ψ)
cave = deepcopy(ϕ)
# Max rank
chi = 16
# Operators
H_field, H_odd, H_even, H_full = build_pieces_of_H(L=L, d=d, hx=hx, hz=hz)
U_field, U_odd, U_even = build_propagators(
L=L, d=d, δτ=δτ, H_field=H_field, H_odd=H_odd, H_even=H_even
)
# Results
ψ_energies = [sum(e.expval(wave) for e in H_full)]
ϕ_energies = [sum(e.expval(cave) for e in H_full)]
# TEBD pattern
Nstp = 10
for _ in range(Nstp):
for e in [U_field, U_even, U_odd]:
e.oper(wave, inplace=True)
wave.trim_bonds(chi)
wave.normalize()
ψ_energies.append(sum(e.expval(wave) for e in H_full))
for _ in range(Nstp):
for e in [U_field, U_even, U_odd]:
e.oper(cave, inplace=True)
cave.trim_bonds(chi)
cave.normalize()
ϕ_energies.append(sum(e.expval(cave) for e in H_full))
CPU times: user 2min 23s, sys: 1.75 s, total: 2min 25s
Wall time: 27 s
# Imaginary time evolution: the energy should decrease toward the ground state
plt.plot(ψ_energies, label='$\psi$')
plt.plot(ϕ_energies, label='$\phi$')
plt.xlabel(f'Number of imaginary time steps at size {δτ}')
plt.ylabel('Expectation value of Hamiltonian')
plt.legend()
plt.show()
Simulation worked! The states are losing energy and converging to the ground state. We'll come back to this after the next section to do some physics.
MPO approach
Here I outline an approach that was relayed to me by Gil. There is also an excellent explanation of this on the Tensor Network website.
Start by taking the set of propagators, splitting them into site tensors, and stacking them into a single operator. Then do TEBD by applying the stacked operator. Potentially the bond dimension is larger than in the other case, but analytically the result should be the same.
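As a plain-numpy illustration of the stacking idea (my own conventions, not the ph121c_lxvm API), composing two MPO site tensors amounts to contracting the physical index between them and merging their virtual indices; for trivial virtual dimensions this reduces to an ordinary matrix product:
def stack_mpo_site (W2, W1):
    """Compose MPO site tensors so the result acts as W2 applied after W1.

    Each tensor has indices (left bond, right bond, s_out, s_in).
    """
    a2, b2, d, _ = W2.shape
    a1, b1, _, _ = W1.shape
    W = np.einsum('cdtj,abjs->cadbts', W2, W1)   # contract W2's s_in with W1's s_out
    return W.reshape(a2 * a1, b2 * b1, d, d)

# sanity check with trivial (dimension-1) virtual indices
W1 = expm(-0.1 * sz).reshape(1, 1, 2, 2)
W2 = expm(-0.1 * sx).reshape(1, 1, 2, 2)
stacked = stack_mpo_site(W2, W1).reshape(2, 2)
print(np.allclose(stacked, expm(-0.1 * sx) @ expm(-0.1 * sz)))   # True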
U_field.split_quanta(-1)
U_odd.split_quanta(-1)
U_even.split_quanta(-1)
U_full = U_field.oper(U_odd.oper(U_even))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-a23f6b796b43> in <module>
2 U_odd.split_quanta(-1)
3 U_even.split_quanta(-1)
----> 4 U_full = U_field.oper(U_odd.oper(U_even))
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/mpo.py in oper(self, tren, inplace)
166 for i in range(len(self)):
167 tags = [ e.tag for e in self[i].get_type(quantum) if (e.tag > 0) ]
--> 168 output.groupby_quanta_tag(tags)
169 center_pos = output.index(output.center)
170 # Here is where we apply the operator to the new center
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/mpo.py in groupby_quanta_tag(self, tag_group)
140 if chunk:
141 self.set_local_oper(np.eye(self.d ** len(chunk)), min(chunk))
--> 142 return super().groupby_quanta_tag(tag_group)
143
144 ## These methods must not modify the mps or mpo instance (except reshape)!
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in groupby_quanta_tag(self, tag_group)
263 new_center = list(self.get_sites(tag_group))
264 assert (len(new_center) == 1), 'unable to distinguish site.'
--> 265 self.canonize(new_center[0])
266
267 def contract_quanta (self, raised, lowered):
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in canonize(self, center)
160 bnd = next(self[i].ind[1].get_type(bond))
161 self.split_site(self[i], center_tag)
--> 162 self.contract_bond(bnd)
163 except StopIteration:
164 pass
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in contract_bond(self, bnd)
83 assert isinstance(bnd, bond), \
84 f'received type {type(bnd)} but need index.bond'
---> 85 left_pos, right_pos = sorted( self.index(e) for e in bnd.tag )
86 assert ((right_pos - left_pos) == 1), \
87 'bond should connect adjacent sites.'
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in <genexpr>(.0)
83 assert isinstance(bnd, bond), \
84 f'received type {type(bnd)} but need index.bond'
---> 85 left_pos, right_pos = sorted( self.index(e) for e in bnd.tag )
86 assert ((right_pos - left_pos) == 1), \
87 'bond should connect adjacent sites.'
~/.conda/envs/ph121c/lib/python3.7/collections/__init__.py in index(self, item, *args)
1138 def copy(self): return self.__class__(self)
1139 def count(self, item): return self.data.count(item)
-> 1140 def index(self, item, *args): return self.data.index(item, *args)
1141 def reverse(self): self.data.reverse()
1142 def sort(self, *args, **kwds): self.data.sort(*args, **kwds)
ValueError: site( # at 0x7f5a3b00a3d0
array([[1.10517092, 0. , 0. , 0. ],
[0. , 0.90483742, 0. , 0. ],
[0. , 0. , 0.90483742, 0. ],
[0. , 0. , 0. , 1.10517092]]),
multi_index(dim=16, tag=2, shape=(4, 4)) # at 0x7f5a380cbe90
) is not in list
Well, debugging is hard, and that looks like a bond reference failed to update.
This means my mpo.oper method that I use on mps types doesn't generalize to mpo types. I thought I should write a generic function on train types, but it doesn't seem necessary.
Convergence to ground state
We will study the convergence of the MPS wavefunction $\ket{\psi}$ to the ground state, first at a small system size where we can compare to exact diagonalization, and then at larger sizes, where we will make some measurements on the wavefunctions. We will take \begin{align} \ket{\psi (t=0)} &= \ket{\uparrow} \otimes \cdots \otimes \ket{\uparrow} . \end{align}
Comparing to ED
Let's set up a quick calculation of the ground state energy at the small system size using a sparse eigensolver and our exact methods from earlier in the course.
L = 12
d = 2
hx = 1.05
hz = 0.5
%time Eₒ = eigsh(models.tfim_z.H_oper(L, hx, 'o', hz), k=1, return_eigenvectors=False)[0]
CPU times: user 160 ms, sys: 0 ns, total: 160 ms
Wall time: 28.4 ms
Eₒ
-19.94577803903926
Well, that was quick (thank you Fortran), and it is really close to the minimum reached by our earlier time evolution. We'll do the same run, this time terminating the simulation when the relative energy difference $|(E_n - E_{n-1}) / E_n|$ drops under $0.001$. We will also compare time step sizes of $\delta\tau = 0.1, 0.01, 0.001$.
%%time
# State
ψ = tensor.mps(L=L, d=d)
ψ.from_arr([ up for _ in range(L) ], center=-1)
# Operators
H_field, H_odd, H_even, H_full = build_pieces_of_H(L=L, d=d, hx=hx, hz=hz)
# Results
ψ_converged = []
ψ_data = dict(dtau=[], tau=[], E=[], relerr=[])
E_initial = sum(e.expval(ψ) for e in H_full)
# TEBD pattern
rtol = 0.001
step_sizes = [0.1, 0.01, 0.001]
for δτ in step_sizes:
U_field, U_odd, U_even = build_propagators(
L=L, d=d, δτ=δτ, H_field=H_field, H_odd=H_odd, H_even=H_even
)
i = 0
relerr = 1
wave = deepcopy(ψ)
# Max rank
chi = 16
ψ_data['E'].append(E_initial)
ψ_data['relerr'].append(relerr)
ψ_data['tau'].append(0)
ψ_data['dtau'].append(δτ)
while (relerr > rtol):
for e in [U_field, U_even, U_odd]:
e.oper(wave, inplace=True)
wave.trim_bonds(chi)
wave.normalize()
i += 1
ψ_data['E'].append(sum(e.expval(wave) for e in H_full))
relerr = abs((ψ_data['E'][-1] - ψ_data['E'][-2]) / ψ_data['E'][-1])
ψ_data['relerr'].append(relerr)
ψ_data['dtau'].append(δτ)
ψ_data['tau'].append(δτ * i)
if ((i % 10) == 0):
print(f'Completed step {i} :: dtau={δτ} :: relative error {relerr}')
ψ_converged.append(wave)
print(f'dtau={δτ} converged within rtol={rtol} in {i} steps')
Completed step 10 :: dtau=0.1 :: relative error 0.11052898331065479
dtau=0.1 converged within rtol=0.001 in 18 steps
Completed step 10 :: dtau=0.01 :: relative error 0.022332604616300347
Completed step 20 :: dtau=0.01 :: relative error 0.011880294254134427
Completed step 30 :: dtau=0.01 :: relative error 0.007629698816700093
Completed step 40 :: dtau=0.01 :: relative error 0.005785861024167228
Completed step 50 :: dtau=0.01 :: relative error 0.005081225438163214
Completed step 60 :: dtau=0.01 :: relative error 0.00508097227745834
Completed step 70 :: dtau=0.01 :: relative error 0.00569205153874006
Completed step 80 :: dtau=0.01 :: relative error 0.0069837469402630695
Completed step 90 :: dtau=0.01 :: relative error 0.008954310702568432
Completed step 100 :: dtau=0.01 :: relative error 0.010836955888001671
Completed step 110 :: dtau=0.01 :: relative error 0.01064547975484463
Completed step 120 :: dtau=0.01 :: relative error 0.00777394581004741
Completed step 130 :: dtau=0.01 :: relative error 0.004476012311521265
Completed step 140 :: dtau=0.01 :: relative error 0.0022887756487684255
Completed step 150 :: dtau=0.01 :: relative error 0.0011280931348894825
dtau=0.01 converged within rtol=0.001 in 152 steps
Completed step 10 :: dtau=0.001 :: relative error 0.004774673781280013
Completed step 20 :: dtau=0.001 :: relative error 0.004323138641314433
Completed step 30 :: dtau=0.001 :: relative error 0.00392955707201135
Completed step 40 :: dtau=0.001 :: relative error 0.0035844021354098947
Completed step 50 :: dtau=0.001 :: relative error 0.0032801300378205926
Completed step 60 :: dtau=0.001 :: relative error 0.003010679087888506
Completed step 70 :: dtau=0.001 :: relative error 0.0027711150845904963
Completed step 80 :: dtau=0.001 :: relative error 0.0025573753974013425
Completed step 90 :: dtau=0.001 :: relative error 0.0023660809762896203
Completed step 100 :: dtau=0.001 :: relative error 0.0021943960112396467
Completed step 110 :: dtau=0.001 :: relative error 0.0020399215959311257
Completed step 120 :: dtau=0.001 :: relative error 0.001900614044379694
Completed step 130 :: dtau=0.001 :: relative error 0.0017747213431069935
Completed step 140 :: dtau=0.001 :: relative error 0.001660733126377698
Completed step 150 :: dtau=0.001 :: relative error 0.0015573408639144618
Completed step 160 :: dtau=0.001 :: relative error 0.001463405852678402
Completed step 170 :: dtau=0.001 :: relative error 0.0013779332397855937
Completed step 180 :: dtau=0.001 :: relative error 0.0013000507557396686
Completed step 190 :: dtau=0.001 :: relative error 0.0012289911630306628
Completed step 200 :: dtau=0.001 :: relative error 0.0011640776634128044
Completed step 210 :: dtau=0.001 :: relative error 0.0011047116817713394
Completed step 220 :: dtau=0.001 :: relative error 0.0010503625757645177
Completed step 230 :: dtau=0.001 :: relative error 0.001000558917304315
dtau=0.001 converged within rtol=0.001 in 231 steps
CPU times: user 53min 4s, sys: 36.8 s, total: 53min 41s
Wall time: 9min
df = pd.DataFrame(ψ_data)
fig, ax = plt.subplots()
for i, δτ in enumerate(step_sizes):
ax.plot(
df.tau[df.dtau == δτ].values,
df.E[df.dtau == δτ].values,
label = f'$\\delta \\tau={δτ}$'
)
ax.axhline(Eₒ, label='$E_o$')
ax.set_title('Cooling during imaginary time evolution')
ax.set_xlabel('Imaginary time $\\tau$')
ax.set_ylabel('Expectation value of Hamiltonian')
ax.legend()
plt.show()
Clearly the smallest step size hit the stopping criterion after much less imaginary time than the others, because the relative energy change per step is tiny when the step itself is tiny while the tolerance was kept the same across step sizes. I really don't want to wait an hour to evolve all that time anyway, so I'm glad that it stopped itself. Anyway, we can take a look at the final converged energies to get a more precise view of this:
print('ED ground state energy ::', Eₒ)
for δτ in step_sizes:
print(
f'converged ground state energy, δτ={δτ} ::',
df.E[df.dtau == δτ].values[-1]
)
ED ground state energy :: -19.94577803903926
converged ground state energy, δτ=0.1 :: -19.877286728955642
converged ground state energy, δτ=0.01 :: -19.679317693465716
converged ground state energy, δτ=0.001 :: -8.535746649468903
So actually the larger step size gave us a better estimate of the energy, just because it did not trigger the stopping criterion too early! Another question is how close the MPS ground state is to the ED ground state:
_, gs = eigsh(models.tfim_z.H_oper(L, hx, 'o', hz), k=1)
ψ = deepcopy(ψ_converged[0])
ψ.merge_bonds()
ψ.reset_pos()
np.inner(ψ[0].mat.reshape(gs.size), gs.reshape(gs.size))
-0.9942849322190765
The overlap of the two states is surprisingly good! The negative sign due to phase is not meaningful.
Larger system sizes
Let's move to larger systems and compute their approximate ground states, on which we can measure correlation functions and compare energies. We will not measure the energy every step; instead I will just simulate 20 time steps of each with $\delta\tau = 0.1$, because it is expensive to evaluate a Hamiltonian of 200 terms. I'll be testing systems of size $L = 10n$ for $n = 1, \dots, 10$.
%%time
L_list = list(range(10, 110, 10))
ψ_converged = []
ψ_energies = []
δτ = 0.1
for L in L_list:
ψ = tensor.mps(L=L, d=d)
ψ.from_arr([ up for _ in range(L) ], center=-1)
H_field, H_odd, H_even, H_full = build_pieces_of_H(
L=L, d=d, hx=hx, hz=hz
)
U_field, U_odd, U_even = build_propagators(
L=L, d=d, δτ=δτ, H_field=H_field, H_odd=H_odd, H_even=H_even
)
# Max rank
chi = 16
Nstp = 20
# TEBD
for i in range(Nstp):
for e in [U_field, U_even, U_odd]:
e.oper(ψ, inplace=True)
ψ.trim_bonds(chi)
ψ.normalize()
ψ_converged.append(ψ)
ψ_energies.append(sum( e.expval(ψ) for e in H_full ))
print(f'Completed {Nstp} steps at system size {L}')
Completed 20 steps at system size 10
Completed 20 steps at system size 20
Completed 20 steps at system size 30
Completed 20 steps at system size 40
Completed 20 steps at system size 50
Completed 20 steps at system size 60
Completed 20 steps at system size 70
Completed 20 steps at system size 80
Completed 20 steps at system size 90
Completed 20 steps at system size 100
CPU times: user 5h 56min 54s, sys: 4min 14s, total: 6h 1min 9s
Wall time: 1h 4min 33s
Wow, that took a really long time to simulate :(. The really dumb thing that someone should be scolding me for is: why am I using the all spin-up wavefunction as an ansatz? At the beginning of this notebook, I used the all spin-down wavefunction, which starts out closer to the ground state, so I wouldn't have had to evolve the state for so long.
Anyway, we have the wavefunctions, so let's measure interesting things on them. First, let's display the difference $(E(L+10) - E(L))/10$:
fig, ax = plt.subplots()
ax.plot(
L_list[:-1],
[ ((ψ_energies[i] - ψ_energies[i + 1]) / (L_list[i+1] - L))
for i, L in enumerate(L_list[:-1]) ],
)
ax.set_title('Convergence of estimated $E_{gs}$')
ax.set_xlabel('$L$')
ax.set_ylabel('(E(L+10) - E(L))/10')
plt.show()
i=7; ((ψ_energies[i] - ψ_energies[i + 1]) / (L_list[i+1] - L))
-1.7171464427371574
Great, it looks like this quantity reaches a stable value -1.71714 in the $L \to \infty$ limit. Now let’s measure some of our favorite correlation functions on all of these wavefunctions. We can’t really measure our 2-point correlation functions from the first assignment without incurring some large tensor contractions, so we’ll stick to measuring $\ev{\sigma_{L/2}^x}, \ev{\sigma_1^x}, \ev{\sigma_{L/2}^z}, \ev{\sigma_1^z}$.
def make_observables (L, d):
"""Create the observable operators of interest."""
s1x = tensor.mpo(L=L, d=d)
s1x.set_local_oper(sx, 1)
sL2x = tensor.mpo(L=L, d=d)
sL2x.set_local_oper(sx, L//2)
s1z = tensor.mpo(L=L, d=d)
s1z.set_local_oper(sz, 1)
sL2z = tensor.mpo(L=L, d=d)
sL2z.set_local_oper(sz, L//2)
return (s1x, sL2x, s1z, sL2z)
%%time
observables = dict(L=[], s1x=[], sL2x=[], s1z=[], sL2z=[])
for i, L in enumerate(L_list):
s1x, sL2x, s1z, sL2z = make_observables(L, d)
observables['s1x'].append(s1x.expval(ψ_converged[i]))
observables['sL2x'].append(sL2x.expval(ψ_converged[i]))
observables['s1z'].append(s1z.expval(ψ_converged[i]))
observables['sL2z'].append(sL2z.expval(ψ_converged[i]))
observables['L'].append(L)
CPU times: user 16 s, sys: 128 ms, total: 16.1 s
Wall time: 2.69 s
df = pd.DataFrame(observables)
nrow, ncol = 2, 2
fig, axes = plt.subplots(nrow, ncol, sharex=True)
names = [('s1x', '$\\sigma_1^x$'), ('sL2x', '$\\sigma_{L/2}^x$'),
('s1z', '$\\sigma_1^z$'), ('sL2z', '$\\sigma_{L/2}^z$')]
for i, row in enumerate(axes):
for j, ax in enumerate(row):
ax.plot(L_list, df[names[ncol*i + j][0]].values)
ax.set_title(names[ncol*i + j][1])
ax.set_xlabel('L')
ax.set_ylabel('$\\langle$' + names[ncol*i + j][1] + '$\\rangle$')
fig.tight_layout()
It is interesting to see that $\ev{\sigma_i^x}$ is of the same order of magnitude at the center and at the boundary of the spin chain, whereas $\ev{\sigma_i^z}$ nearly vanishes at the boundary but is nearly saturated at the center.
I find it surprising that for such a large system these results aren't riddled with numerical noise.
a Néel-ing
Let’s see the convergence to the ground state of an alternating spin chain.
\begin{align} \ket{\psi (t=0)} &= \ket{\uparrow} \otimes \ket{\downarrow} \otimes \cdots . \end{align} We'll choose a system size twice as large as what I did in assignment 1 with my sparse Lanczos solver, so that it is larger but won't take forever. Voilà: $L=48$.
%%time
L = 48
ψ = tensor.mps(L=L, d=d)
wave = []
for i in range(L // 2):
wave.append(up)
wave.append(down)
assert (len(wave) == L)
ψ.from_arr(wave, center=-1)
H_field, H_odd, H_even, H_full = build_pieces_of_H(L=L, d=d, hx=hx, hz=hz)
# Results
ψ_energies = [sum(e.expval(ψ) for e in H_full)]
# TEBD pattern
rtol = 0.001
δτ = 0.1
U_field, U_odd, U_even = build_propagators(
L=L, d=d, δτ=δτ, H_field=H_field, H_odd=H_odd, H_even=H_even
)
chi = 16
i = 0
relerr = 1
while (relerr > rtol):
for e in [U_field, U_even, U_odd]:
e.oper(ψ, inplace=True)
ψ.trim_bonds(chi)
ψ.normalize()
ψ_energies.append(sum(e.expval(ψ) for e in H_full))
relerr = abs((ψ_energies[-1] - ψ_energies[-2]) / ψ_energies[-1])
i += 1
if ((i % 10) == 0):
print(f'Completed step {i} :: dtau={δτ} :: relative error {relerr}')
print(f'dtau={δτ} converged within rtol={rtol} in {i} steps')
Completed step 10 :: dtau=0.1 :: relative error 0.0021793080500920876
dtau=0.1 converged within rtol=0.001 in 11 steps
CPU times: user 22min 38s, sys: 15.8 s, total: 22min 54s
Wall time: 4min 12s
plt.plot(δτ * np.arange(len(ψ_energies)), ψ_energies)
plt.xlabel('Imaginary time $\\tau$')
plt.ylabel('Expectation value of Hamiltonian')
plt.show()
It looks like this state cools in less than half the time of the first simulation, which had a system size one quarter of this one. I didn't realize that larger systems could potentially converge faster, especially since this state starts with positive energy; but it might also be that the relative-error stopping criterion simply triggered early. In this system, the converged ground state energy is:
ψ_energies[-1]
-81.72930885147767
Let’s measure some correlators again
observables = dict(L=[], s1x=[], sL2x=[], s1z=[], sL2z=[])
s1x, sL2x, s1z, sL2z = make_observables(L, d)
observables['s1x'].append(s1x.expval(ψ))
observables['sL2x'].append(sL2x.expval(ψ))
observables['s1z'].append(s1z.expval(ψ))
observables['sL2z'].append(sL2z.expval(ψ))
observables['L'].append(L)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-170-aeb7d931d025> in <module>
2 s1x, sL2x, s1z, sL2z = make_observables(L, d)
3 observables['s1x'].append(s1x.expval(ψ))
----> 4 observables['sL2x'].append(sL2x.expval(ψ))
5 observables['s1z'].append(s1z.expval(ψ))
6 observables['sL2z'].append(sL2z.expval(ψ))
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/mpo.py in expval(self, tren)
201 else:
202 raise IndexError('hard coded sizes do not implement this')
--> 203 tren.groupby_quanta_tag(tags)
204 output = train([tren.center.copy(), oper])
205 output[0].transpose()
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in groupby_quanta_tag(self, tag_group)
260 if sites:
261 self.split_quanta(self.center, sites, N_list)
--> 262 self.merge_bonds(list(self.get_sites(tag_group)))
263 new_center = list(self.get_sites(tag_group))
264 assert (len(new_center) == 1), 'unable to distinguish site.'
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in merge_bonds(self, sites)
101 for i in chunk[:-1]:
102 for bnd in self[i - count_contractions].ind[1].get_type(bond):
--> 103 self.contract_bond(bnd)
104 count_contractions += 1
105 break
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/train.py in contract_bond(self, bnd)
86 assert ((right_pos - left_pos) == 1), \
87 'bond should connect adjacent sites.'
---> 88 self[left_pos].contract(self[right_pos])
89 # If contracting the center, move it
90 if (self.center == self[right_pos]):
~/Documents/repos/ph121c/ph121c_lxvm/ph121c_lxvm/tensor/site.py in contract(self, other)
406 other.permute(lowered + bonds, 0)
407 self.mat = np.kron(np.eye(lowered.dim), self.mat) \
--> 408 @ np.kron(other.mat, np.eye(raised.dim))
409 self.ind[0] = lowered + self.ind[0]
410 self.ind[1] = other.ind[1] + raised
<__array_function__ internals> in kron(*args, **kwargs)
~/.conda/envs/ph121c/lib/python3.7/site-packages/numpy/lib/shape_base.py in kron(a, b)
1152 bs = (1,)*(nda-ndb) + bs
1153 nd = nda
-> 1154 result = outer(a, b).reshape(as_+bs)
1155 axis = nd-1
1156 for _ in range(nd):
<__array_function__ internals> in outer(*args, **kwargs)
~/.conda/envs/ph121c/lib/python3.7/site-packages/numpy/core/numeric.py in outer(a, b, out)
940 a = asarray(a)
941 b = asarray(b)
--> 942 return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
943
944
MemoryError: Unable to allocate 32.0 GiB for an array with shape (4096, 1048576) and data type float64
Darn, there's an issue in my code. What probably happened is that a bond got lost somewhere, so what should have been contractions turned into full Kronecker products. I'd rather move on to the rest of the assignment than rerun the 4-minute simulation and troubleshoot what is going on.
Extra reading
You might also be interested in reading