Stochastic Convergence of Persistence Landscapes and Silhouettes

Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram. It is difficult to apply statistical theory directly to a random sample of diagrams. Instead, we can summarize the persistent homology with the persistence landscape, introduced by Bubenik, which converts a diagram into a well-behaved real-valued function. We investigate the statistical properties of landscapes, such as weak convergence of the average landscapes and convergence of the bootstrap. In addition, we introduce an alternate functional summary of persistent homology, which we call the silhouette, and derive an analogous statistical theory.


Introduction
Often, data can be represented as point clouds that carry specific topological and geometric structures. Identifying, extracting, and exploiting these underlying geometric structures has become a problem of fundamental importance for data analysis and statistical learning. With the emergence of new geometric inference and algebraic topology tools, computational topology has recently seen an important development toward data analysis, giving birth to the field of Topological Data Analysis, whose aim is to infer relevant, multiscale, qualitative, and quantitative topological structures directly from the data.
Persistent homology (Edelsbrunner et al. (2002); Zomorodian and Carlsson (2005)) is a fundamental tool for providing multi-scale homology descriptors of data. More precisely, it provides a framework and efficient algorithms to quantify the evolution of the topology of a family of nested topological spaces, {X(t)} t∈R , built on top of the data and indexed by a set of real numbers -that can be seen as scale parameters -such that X(t) ⊆ X(s) for all t ≤ s. At the homology level 1 , such a filtration induces a family {H(X(t))} t∈R of homology groups and the inclusions X(t) → X(s) induce a family of homomorphisms H(X(t)) → H(X(s)), t ≤ s, which is known as the persistence module associated to the filtration. When the rank of all the homomorphisms H(X(t)) → H(X(s)), t < s, are finite the module is said to be q-tame (Chazal et al. (2012)) and it can be summarized as a set of real intervals (b i , d i ) representing homological features that appear in the filtration at t = b i and disappear at t = d i . Such a set of intervals can be represented as a multi-set of points in the real plane and is then called a persistence diagrams. Thanks to their stability properties (Cohen-Steiner et al. (2007); Chazal et al. (2012)), persistence diagrams provide relevant multi-scale topological information about the data.
In a more statistical framework, when several data sets are randomly generated or are coming from repeated experiments, one often has to deal with not only one persistence diagrams but with a whole distribution of persistence diagrams. Unfortunately, since the space of persistence diagrams is a general metric space, analyzing and quantifying the statistical properties of such a distribution turns out to be particularly difficult.
A few attempts have been made towards a statistical analysis of distributions of persistence diagrams. For example, the concentration and convergence properties of persistence diagrams obtained from point cloud randomly sampled on manifolds and from more general compact metric spaces are studied in Balakrishnan et al. (2013); Chazal et al. (2013b). Considering general distributions of persistence diagrams, Turner et al. (2012) have suggested using the Fréchet average of the diagrams D 1 , . . . , D n . Unfortunately, the Fréchet average is unstable and not even unique. A solution that uses a probabilistic approach to define a unique Fréchet average can be found in Munch et al. (2013), but its computation remains practically prohibitive.
In this paper, we also consider general distributions of persistence diagrams but we build on a completely different approach, proposed in Bubenik (2012), consisting of encoding persistence diagrams as a collection of real-valued one-Lipschitz functions that are called persistence landscapes; see Section 2. The advantage of landscapes -and, more generally, of any function-valued summaries of persistent homology -is that we can analyze them using existing techniques and theories from nonparametric statistics.
We have in mind two scenarios where multiple persistence diagrams arise: 1 We consider here homology with coefficient in a given field, so the homology groups are vector spaces.

Scenario 1:
We have a random sample of compact sets K 1 , . . . , K n drawn from a probability distribution on the space of compact sets. Each set K i gives rise to a persistence diagram which in turn yields a persistence landscape function λ i . An analogous sampling scenario is the one where we observe a sample of n random Morse functions f 1 , . . . , f n from a common probability distribution. Each such function f i induces a persistent diagram built from its sub-level set filtration, which can again be encoded by a landscape λ i . The goal is to use the observed landscapes λ 1 , . . . , λ n to infer the mean landscape µ = E(λ i ).

Scenario 2:
We have a very large dataset with N points. There is a diagram D and landscape λ corresponding to some filtration built on the data. When N is large, computing D is prohibitive. Instead, we draw n subsamples, each of size m. We compute a diagram and landscape for each subsample yielding landscapes λ 1 , . . . , λ n . (Assuming m is much smaller than N, these subsamples are essentially independent and identically distributed.) Then we are interested in estimating µ = E(λ i ), which can be regarded as an approximation of λ. Two questions arise: how far are the λ i 's from their mean µ and how far is µ from λ. We focus on the first question in this paper.
In both sampling scenarios, we study the statistical behavior as the number of persistence diagrams n grows. We will then analyze the stochastic limiting behavior of the average landscape, as well as the speed of convergence to such limit. Specifically, the contributions of this papers are as follows: 1. We show that the average persistence landscape converges weakly to a Gaussian process and we find the rate of convergence of that process. 2. We show that a statistical procedure known as the bootstrap leads to valid confidence bands for the average landscape. We provide an algorithm to compute confidence bands and illustrate it on a few real and simulated examples. 3. We define a new functional summary of persistent homology, which we call the silhouette.
As the proofs are rather technical, we defer the interested reader to the appendices.

Persistence Diagrams and Landscapes
Formally, a (finite) persistence diagram is a set of real intervals {(b i , d i )} i∈I where I is a finite set. We represent a persistence diagram as the finite multiset of points We denote by D T the space of all positive, finite, T-bounded persistence diagrams.
A persistence landscape, introduced in Bubenik (2012), is a set of continuous, piecewise linear functions λ : Z + × R → R which provides an encoding of a persistence diagram. To define the landscape, consider the set of functions created by tenting each persistence point p = (x, y) = b+d 2 , d−b 2 ∈ D to the base line x = 0 as with the following function: otherwise. (1) Notice that p is itself on the graph of Λ p (t). We obtain an arrangement of curves by overlaying the graphs of the functions {Λ p } p∈D ; see Figure 1.  (1), and the landscape λ(k, ·) is the k-th largest of the arrangement of the graphs of {Λ p }. In particular, the cyan curve is the landscape λ(1, ·).
The persistence landscape of D is just a summary of this arrangement. Formally, the persistence landscape of D is the collection of functions where kmax is the kth largest value in the set; in particular, 1max is the usual maximum function. We set λ D (k, t) = 0 if the set {Λ p (t), p ∈ D} contains less than k points. From the definition of persistence landscape, we immediately observe that λ D (k, ·) is one-Lipschitz, since Λ p is one-Lipschitz. We denote by L T the space of persistence landscapes corresponding to D T . For ease of exposition, in this paper we only focus on the case k = 1, and set λ(t) = λ D (1, t). However, the results we present hold for k > 1.

Weak Convergence of Landscapes
Let P be a probability distribution on L T , and let λ 1 , . . . , λ n ∼ P. We define the mean landscape as The mean landscape is an unknown function that we would like to estimate. We estimate µ with the sample average Note that since E(λ n (t)) = µ(t), we have that λ n is a point-wise unbiased estimator of the unknown function µ. Our goal is then quantify how close the resulting estimate is to the function µ. To do so, we first need to explore the statistical properties of λ n . Bubenik (2012) showed that λ n converges pointwise to µ and that the pointwise Central Limit Theorem holds. In this section we extend these results, proving the uniform convergence of the average landscape. In particular, we show that the process converges weakly (see below) to a Gaussian process on [0, T] and we establish the rate of convergence.
Let F = { f t } 0≤t≤T (4) Figure 2: We illustrate the empirical process G n ( f t ). Given a set of landscapes {λ i } 1≤i≤n , each real-value a corresponds to a function f a : . P n f a is then the average over all sampled landscapes. If µ is the true mean landscape, then P f a = µ(a) and G n ( f a ) is the normalized difference n(P n f a − P f a ).
Writing P( f ) = f dP and letting P n be the empirical measure that puts mass 1/n at each λ i , we can and will regard (3) as an empirical process indexed by f t ∈ F . Thus, for t ∈ [0, T], we will write We note that the function F(λ) = T/2 is a measurable envelope for F .
A Brownian bridge is a mean zero Gaussian process on the set of bounded functions from F to R such that the covariance between any pair f 1 , A sequence of random objects X n converges weakly to X , written X n X , if E * ( f (X n )) → E( f (X )) for every bounded continuous function f . (The symbol E * is an outer expectation, which is used for technical reasons; the reader can think of this as an expectation.) Thus, we arrive at the following theorem: Theorem 1 (Weak Convergence of Landscapes, Theorem 2.4 in Chazal et al. (2013a)). Let G be a Brownian bridge with covariance function κ(t, s) Next, we describe the rate of convergence of the maximum of the normalized empirical process G n to the maximum of the limiting distribution G. The maximum is relevant for statistical inference as we shall see in the next section.

Remarks:
The assumption in Theorem 2 that the standard deviation function σ is positive over a subinterval of [0, T] can be replaced with the weaker assumption of positivity of σ over a finite collection of sub-intervals without changing the result. We have stated the theorem in this simplified form for ease of readability. Furthermore, it may be possible to improve the term n −1/8 in the rate using what is known as a "Hungarian embedding" (see Chapter 19 of van der Vaart (2000)). We do not pursue this point further, however.

The Bootstrap for Landscapes
Recall that our goal is to use the observed landscapes (λ 1 , . . . , λ n ) to make inferences about Specifically, in this paper we will seek to construct an asymptotic confidence band for µ. A pair of functions n , u n : where r n = o(1). Confidence bands are valuable tools for statistical inference, as they allow to quantify and visualize the uncertainty about the mean persistence landscape function µ and to screen out topological noise.
Below we will describe an algorithm for constructing the funcions n and u n from the sample of landscapes λ n 1 := (λ 1 , . . . , λ n ), will prove that it yields an asymptotic (1 − α)-confidence band for the unknown mean landscape function µ and determine its rate r n . Our algorithm relies on the use of the bootstrap, a simulation-based statistical method for constructing confidence set under minimal assumptions on the data generating distribution P; see Efron (1979); Efron and Tibshirani (1993); van der Vaart (2000). There are several different versions of the bootstrap. This paper uses the multiplier bootstrap.
We may take B as large as we like so the Monte Carlo error arbitrarily small. Thus, when using bootstrap methods, one ignores the error in approximating Z(α) as defined in (9) with its simulation approximation as defined in (10). The multiplier bootstrap confidence band is {( n (t), u n (t)) : The steps of the algorithm are given in Algorithm 1.
INPUT: Landscapes λ 1 , . . . , λ n ; confidence level 1 − α; number of bootstrap samples B OUTPUT: confidence functions n , u n : The accuracy of the coverage of the confidence band and the width of the band are described in the next result, which follows from Theorem 2 and the analogous result for the multiplier bootstrap process, stated in Proposition 13 in Appendix B.
The confidence band above has a constant width; that is, the width is the same for all t. However, the empirical estimate λ(t) might be a more accurate estimator of µ(t) for some t than others. This suggests that we may construct a more refined confidence band whose width varies with t. Hence, we construct an adaptive confidence band that has variable width. Consider the standard deviation function σ, defined in (6), and its estimate and, for ξ 1 , . . . , ξ n ∼ N(0, 1), define its multiplier bootstrap version Just like in the construction of uniform bands, let Q(α) be such that Again, Q(α) can be determined by simulation to arbitrary precision. The adaptive confidence band is Theorem 4 (Adaptive Band). Suppose that σ(t) > c > 0 in an interval [t * , t * ] ⊂ [0, T], for some constant c. Then

The Weighted Silhouette
The kth persistence landscape λ(k, t) can be interpreted as a summary function of the persistence diagrams. A summary function is a functor that takes a persistence diagram and outputs a realvalued continuous function. If the diagram corresponds to the distance function to a random set, then we have a probability distribution on the space of summary functions induced by a probability distribution on the original sample space.
The persistence landscape is just one of many functions that could be used to summarize a persistence diagram. In this section, we introduce a new family of summary functions called weighted silhouettes.
Consider a persistence diagram with m off diagonal points. In this formulation, we take the weighted average of the triangle functions defined in (1): Consider two points of the persistence diagram, representing the pairs (b i , d i ) and (b j , d j ). In general, we would like to have w j ≥ w i whenever |d j − b j | ≥ |d i − b i |. In particular, let φ(t) have weights Definition 5 (Power-Weighted Silhouette). For every 0 < p ≤ ∞ we define the power-weighted silhouette The value p can be though of as a trade-off parameter between uniformly treating all pairs in the persistence diagram and considering only the most persistent pairs. Specifically, when p is small, φ (p) (t) is dominated by the effect of low persistence pairs. Conversely, when p is large, φ (p) (t) is dominated by the most persistent pair; see Figure 3. The power-weighted silhouette preserves the property of being one-Lipschitz. In fact, this is true for any choice of non-negative weights. Therefore all the result of Sections 3 and 4 hold for the weighted silhouette, by simply replacing λ with φ. In particular, consider φ 1 , . . . , φ n ∼ P φ . Applying theorems 1, 2, 3 and 4, we obtain: and some constant c.

Examples
In Topological Data Analysis, persistent homology is classically used to encode the evolution of the homology of filtered simplicial complexes built on top of data sampled from a metric space -see Chazal et al. (2014). For example, given a metric space (X, d X ) and a probability distribution P X supported on X, one can sample m points, K = {X 1 , . . . , X m }, i.i.d. from P X and consider the Vietoris-Rips filtration built on top of these points: σ = [X i 0 , . . . , X i k ] ∈ R(K, a) if and only if d X (X i j , X i l ) ≤ a for any j, l ∈ {0, . . . k}. The persistent homology of this filtration induces a persistent diagram D and a landscape λ. Sampling n such K, one obtains n persistence landscapes λ 1 , . . . , λ n . In this section, we adopt this setting to illustrate our results on two examples, one real and one simulated.  2 We randomly sample m = 400 epicenters, construct the Vietoris-Rips filtration (using the Euclidean distance), compute the persistence diagram (Betti 1) using Dionysus 3 and the corresponding landscape function. We repeat this procedure n = 30 times and compute the mean landscape λ n . Using the algorithm given in Algorithm 1, we obtain the uniform 95% confidence band of Theorem 3 and the adaptive 95% confidence band of Theorem 4. See Figure 4 (middle). Both the confidence bands have coverage around 95% for the mean landscape µ(t) that is attached to the distribution induced by the sampling scheme. Similarly, using the same n = 30 persistence diagrams we construct the corresponding weighted silhouettes using p = 0.01 and construct uniform and adaptive 95% confidence bands for the mean weighted silhouette E[φ (0.01) (t)]. See Figure 4 (right). Notice that, for most t ∈ [0, T], the adaptive confidence band is tighter than the fixed-width confidence band.

Toy Example: Rings
In this example, we embed the torus S 1 × S 1 in R 3 and we use the rejection sampling algorithm of Diaconis et al. (2012) (R = 5, r = 1.8) to sample 10,000 points uniformly from the torus. Then we link it with a circle of radius 5, from which we sample 1,800 points; see Figure 5 (top left). These N = 11, 800 points constitute the sample space. We randomly sample m = 600 of these points, construct the Vietoris-Rips filtration, compute the persistence diagram (Betti 1) and the corresponding first and third landscapes and the silhouettes for p = 0.1 and p = 4. We repeat this procedure n = 30 times to construct 95% adaptive confidence bands for the mean landscapes µ 1 (t), µ 3 (t) and the mean In the persistence diagram, notice that three persistence pairs are more persistent than the rest. These correspond to the two nontrivial cycles of the torus and the cycle corresponding to the circle. We notice that many of the points in the persistence diagram are hidden by the first landscape. However, as shown in the figure, the third landscape function and the silhouette with parameter p = 0.1 are able to detect the presence of these features.

Discussion
We have shown how the bootstrap can be used to give confidence bands for Bubeknik's persistence landscape and for the persistence silhouette defined in this paper. We are currently working on several extensions to our work including the following: allowing persistence diagrams with countably many points, allowing T to be unbounded, and extending our results to new functional summaries of persistence diagrams. In the case of subsampling (scenario 2 defined in the introduction), we have provided accurate inferences for the mean function µ. We are investigating methods to estimate the difference between µ (the mean landscape from subsampling) and λ (the landscape from the original large dataset). Coupled with our confidence bands for µ, this could provide an efficient approach to approximating the persistent homology in cases where exact computations are prohibitive.

A Results from Chernozhukov et al. (2013)
In this appendix, we summarize the results from Chernozhukov et al. (2013) that are used in this paper. Given a set of functions G and a probability measure Q, define the covering number N(G , L 2 (Q), ε) as the smallest number of balls of size ε needed to cover G , where the balls are defined with respect to the norm ||g|| 2 = g 2 (u)dQ(u). Let X 1 , . . . , X n be i.i.d. random variables taking values in a measurable space (S, S ). Let G be a class of functions defined on S and uniformly bounded by a constant b, such that the covering numbers of G satisfy for some a ≥ e and v ≥ 1 and where the supremum is taken over all probability measures Q on (S, S ). The set G is said to be of VC type, with constants a and v and envelope b. Let σ 2 be a constant such that sup g∈G E[g(X i ) 2 ] ≤ σ 2 ≤ b 2 and for some sufficiently large constant C 1 , denote K n := C 1 v(log n ∨ log(ab/σ)). Finally, let W n := G n G := sup g∈G |G n (g)| denote the supremum of the empirical process G n .
Theorem 11 (Gaussian anti-concentration, Lemma 6.1 in Chernozhukov et al. (2012)). Let (S, S , P) be a probability space, and let F ⊂ L 2 (P) be a P-pre-Gaussian class of functions. Denote by G a tight Gaussian random element in ∞ (F ) with mean zero and covariance function E[G( f )G(g)] = Cov P ( f , g) for all f , g ∈ F . Suppose that there exist constants σ, σ > 0 such that σ 2 ≤ Var P ( f ) ≤ σ 2 for all f ∈ F . Then for every ε > 0, where C σ is a constant depending only on σ and σ.
Theorem 12 (Talagrand's inequality, Theorem A.4 in Chernozhukov et al. (2013)). Let ξ 1 , . . . , ξ n be i.i.d. random variables taking values in a measurable space (S, S ). Suppose that G is a measurable class of functions on S uniformly bounded by a constant b such that there exist constants a ≥ e and v > 1 with sup Q N(G , L 2 (Q), bε) ≤ (a/ε) v for all 0 < ε < 1. Let σ 2 be a constant such that where A is an absolute constant.

B Technical Tools
In this section, we prove some results that will be used in the proofs of Appendix C. Some of our techniques are an adaptation of the strategy used in Chernozhukov et al. (2013) to construct adaptive confidence bands.
Consider the class of functions F = { f t } 0≤t≤T , defined in (4) and let λ n 1 = (λ 1 , . . . , λ n ) be an i.i.d. sample from a probability P on the measurable space (L T , S ) of persistence landscapes. We summarize the processes used in the analysis of persistence landscapes, given in Sections 3 and 4: • G( f t ) is a Brownian Bridge with covariance function For σ(t) > c > 0, we also defined and for completeness we introduce • H( f t ), the standardized Brownian Bridge with covariance function • The process which differs from H n ( f t ) in the use of the standard deviation σ(t) that replace its estimate σ n (t).
Using the strategy of Theorem 2 and applying the anti-concentration inequality of Theorem 11, it follows that for large n andλ n 1 := (λ 1 , . . . ,λ n ) ∈ S n , In the following lemma we consider the class G c = {g t : (4) and we bound the corresponding covering number, as in (20).

Lemma 14.
Consider the assumptions of Theorem 4 and consider the class of functions G c = {g t : g t = f t /σ(t), t * ≤ t ≤ t * }, where f t ∈ F . Note that T/(2c) is a measurable envelope for G c . Then sup Q N(G c , L 2 (Q), ε T/(2c) Q,2 ) ≤ (a/ε) v , 0 < ε < 1 for a = (T 2 + 2c 2 )/c 2 and v = 1, where the supremum is taken over all measures Q on L T . G c is of VC type, with constants a and v and envelope T/(2c).
for some absolute constants C 7 and C 8 .