Computing Multidimensional Persistence

The theory of multidimensional persistence captures the topology of a multifiltration -- a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational algebraic geometry and utilize algorithms from this area to solve it. While the resulting problem is Expspace-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent withing multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.


Introduction
In this paper, we give a polynomial time algorithm for computing the persistent homology of a multifiltration. The computed solution is compact and complete, but not an invariant. Theoretically, this is the best one may hope for because complete compact invariants do not exist for multidimensional persistence. We also discuss computing an incomplete invariant, the rank invariant, giving an algorithm for reading off this invariant from our complete solution. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.

Motivation
Intuitively, a multifiltration models a growing space that is parameterized along multiple dimensions. For example, the complex with coordinate (3,2) in Figure 1 is filtered along the horizontal and vertical dimensions, giving rise to a bifiltration. Multifiltrations arise naturally in topological analysis of scientific data. Often, scientific data is in the form of a finite set of noisy samples from some underlying topological space. Our goal is to robustly recover the lost connectivity of the underlying space. If the sampling is dense enough, we approximate the space as a union of balls by placing ǫ-balls around each point. As we increase ǫ, we obtain a growing family of spaces, a one-parameter multifiltration also called a filtration. This approximation is the central idea behind many methods for computing the topology of a point set, such asČech, Rips-Vietoris [12], or witness [7] complexes. Often, however, the input point set is filtered via multiple functions. For example, in analyzing the structure of natural images, we filter the data according to density [1]. We now have multiple dimensions along which our space is filtered; that is, we have a multifiltration.

Prior Work
For one-dimensional filtrations, the theory of persistent homology provides a complete invariant called a barcode, a multiset of intervals [22]. Each interval in the barcode corresponds to the lifetime of a single topological feature within the filtration. Since intrinsic features have long lives, while noise is short-lived, a quick examination of the intervals gives a robust estimation of the topology. The existence of a complete compact invariant, as well as efficient algorithms and fast implementations have led to successful application of persistence to a variety of disciplines, such  Figure 1: A bifiltration. The complex with labeled vertices is at coordinate (3,2). Simplices are highlighted and named at the critical coordinates that they appear. as shape description [5], denoising volumetric density data [13], detecting holes in sensor networks [8], analyzing neural activity in the visual cortex [18], and analyzing the structure of natural images [1], to name a few. For multifiltrations of dimension higher than one, the situation is much more complicated. The theory of multidimensional persistence shows that no complete discrete invariant exists, where discrete means that the structure of the target for the invariant does not depend on the structure of the underlying field of coefficients [2]. Instead, the authors propose an incomplete invariant, the rank invariant, which captures important persistent information. Unfortunately, this invariant is not compact, requiring large storage, so its direct computation using the one-dimensional algorithm is not feasible. A variant of the problem of multidimensional persistence appears in computer vision [10]. There is also a partial solution called vineyards [4]. A full solution, however, has not been attempted by any prior work.

Contributions
In this paper, we provide a complete solution to the problem of computing multidimensional persistence.
• We recast persistence as a problem within computational commutative algebra, allowing us to utilize powerful algorithms from this area.
• We exploit the structure provided by a multifiltration to greatly simplify the algorithms.
• We show that the resulting algorithms are polynomial time, unlike their original counterparts, which are EX-PSPACE-complete, requiring exponential space and time.
• We implement all algorithms and show that the multidimensional setting requires different data structures than the one-dimensional case for efficient computation. In particular, the change in approach allows for parallelization.
• We provide a suite of statistical tests with random multifiltrations and analyze the running time in practice.
As such, our work provides a full integrated solution, from a new theoretical understanding in computational commutative algebra, to the refinement of algorithms, full implementation, and experimentation. We begin with background on multidimensional persistence in Section 2. In Section 3, we formalize the input to our algorithms and justify it. In Section 4, we reinterpret the problem of computing multidimensional persistence within computational commutative algebra. Having recast the problem, we use algorithms from this area to solve the problem. This gives us a computationally intractable solution. In Section 5, we simplify the algorithms by using the structure within multifiltrations. This simplification allows us to derive a polynomial time algorithm from the original doubly-exponential algorithms. In Section 7, we describe our implementations of these algorithms and provide experiments to validate our work in practice.

Background
In this section, we provide the background on multidimensional persistence. We begin by formalizing multifiltrations. We then describe simplicial homology, persistent homology, and multidimensional persistence, as well as the definition of the rank invariant.
The key insight for the second definition is that the persistent homology of a multifiltration is, in fact, the homology of a single algebraic entity. This object encodes all the complexes via polynomials that track cells through the multifiltration. To define our algebraic structure, we need to first review graded modules over polynomials. A monomial in x 1 , . . . , x n is a product of the form . , x n and coefficients in field k is a finite linear combination of monomials, The set of all such polynomials is denoted ]. An n-graded ring is a ring R equipped with a decomposition of Abelian groups R ∼ = ⊕ v R v , v ∈ N n so that multiplication has the property R u · R v ⊆ R u+v . Elements in a single group R u are called homogeneous. The set of polynomials form the n-graded polynomial ring, denoted A n . This ring is graded by A n v = kx v , v ∈ N n . An n-graded module over an n-graded ring R is an Abelian group M equipped with a decomposition M ∼ = ⊕ v M v , v ∈ N n together with a R-module structure so that R u · M v ⊆ M u+v . An n-graded module is finitely generated if it admits a finite generating set. Also, recall the notion of a free module on an n-graded set and a basis for such a module [2].
Given a multifiltration {X u } u , the ith dimensional homology is the following n-graded module over A n where the k-module structure is the direct sum structure and we described in the previous section. This view of homology yields two important results. In one dimension, the persistent homology of a filtration is easily classified and parameterized by the barcode, and there is an efficient algorithm for its computation [22]. In higher dimensions, no similar classification exists [2]. Instead, we may utilize an incomplete invariant. For instance, the rank invariant defined above is provably equivalent to the barcode in one dimension, allowing us to compute incomplete information in higher dimensions.

One-Critical Multifiltrations
Our interest in persistent homology stems from the need for analyzing scientific data. In this section, we begin by formalizing such data. We then show that topological analysis of scientific data naturally generates multifiltrations. In particular, it generates multifiltrations with the following property.
Definition 1 (one-critical) A multifiltered complex K where each cell σ has a unique critical coordinate u σ is onecritical.
In the rest of this paper, we assume that our input multifiltrations are one-critical. General multifiltrations, however, may not have this property. Therefore, we end this section by describing a classic construction that eliminates multiple critical coordinates in such input.

Model for Scientific Data
We are often given scientific data in the form of a set of samples, possibly noisy, from some underlying geometric space. At each sample point, we may also have measurements from the ambient space. For example, a fundamental goal in graphics is to render objects under different lighting from different camera positions. One approach is to construct a digitized model using data from a range scanner, which employs multiple cameras to sense 3D positions on an object's surface, as well as estimated normals and texture information [19]. An alternate approach samples the four-dimensional light field of a surface directly and interpolates to render the object without explicit surface reconstruction [15]. Either approach gives us a set of noisy samples with measurements. Similarly, a node in a wireless sensor network has sensors on board that measure physical attributes of the local environment, such as pressure and temperature [21]. The GPS coordinates of the nodes give us a set of samples at which several functions are sampled. Formally, we assume we have a manifold X with d − 1 Morse functions defined on it [16]. In practice, X is often embedded within a high-dimensional Euclidean space R d , although this is not required. As such, we model the data using the following definition.

Definition 2 (multifiltered data)
We assume our input is multifiltered data, a finite set S of d-dimensional points along with n − 1 real-valued functions f j : S → R, 1 ≤ j ≤ n − 1, defined at the samples, for n > 1.

Construction
We begin by approximating the underlying space X with a combinatorial representation, a complex. There are a variety of techniques for building such complexes, all of which have a scale parameter ǫ, such as theČech and Rips-Vietoris [12] complex which employ a global parameter, or the witness complex [7] which simulates a local parameter. As we increase ǫ, a complex grows larger, and fixing a maximum scale ǫ max gives us a filtered complex K with n cells. Each cell σ ∈ K enters K at scale ǫ(σ). We formalize this type of complex next.

Definition 3 (scale-filtered complex)
A scale-filtered complex is the tuple (K, σ), where K is a finite complex and ǫ : K → R, such that for any σ ∈ K, σ ∈ K u only for u ≥ ǫ(σ).
We now assume we have a scale-filtered complex K, ǫ defined on our multifiltered point set. To incorporate the measurements at the points into our analysis, we first extend the sampled functions f j to the cells in the complex. For σ ∈ K and f j , let f j (σ) be the maximum value f j takes on σ's vertices; that is, f j (σ) = max v∈σ f j (v), where v is a vertex. This extension defines n − 1 functions on the complex, f j : K → R. We now combine all filtration functions into a single multivariate function F : K → R n , where We may now filter K via the excursion sets {K u } u of F : Note that simplex σ enters K u at u = F (σ) and will remain in the complex for all u ≥ F (σ). Equivalently, F (σ) is the unique minimal critical coordinate at which σ enters the filtered complex. That is, these multifiltrations are one-critical. Figure 1 is one-critical, as each simplex enters at a single critical coordinate. For instance, F (a) = (1, 1), F (cde) = (3, 1), and F (af ) = (1, 2).

Example 1 (bifiltration criticals) The bifiltration in
Since K is finite, we have a finite set of critical coordinates that we may project on each dimension j, getting a set finite set of critical values C j . We now restrict ourselves to the Cartesian product C 1 × . . . × C n of the critical values, parameterizing the resulting discrete grid using N in each dimension. It is clear that the required diagrams (1) commute, giving us a d-dimensional multifiltration We end by noting that one-critical multifiltrations may be represented compactly by the set of tuples This representation is the main input to our algorithms in Section 4.3.

Mapping Telescope
In general, multifiltrations are not one-critical since a cell may have enter at multiple incomparable critical coordinates, viewing ≤ as a partial order on N n . For example, in Figure 1, the vertex that enters at (1, 0) may also enter at (0, 1) as the two coordinates are incomparable. For such multifiltrations, we may utilize the mapping telescope, a standard algebraic construction to ensure that each cell has a unique critical coordinate [14]. Intuitively, this construction introduces additional shadow cells into the multifiltration without changing its topology. We will not detail this construction here as none of the multifiltrations we encounter in practice require the conversion. We should note, however, that the mapping telescope increases the size of the multifiltration, depending on the number of cells with multiple critical points. In the worst case, the increase is exponential, but this requires strange constructions that are not topologically interesting and do not arise in practice.

Using Computational Commutative Algebra
Having described our input in the last section, in this section we recast the problem of computing multidimensional persistence as a problem within computational commutative algebra. We then describe standard algorithms from this area that solve our problem. While this gives us a solution, it comes at a computational cost, as these algorithms are intractable. In the next section, we refine them to derive polynomial-time algorithms.

Multigraded Homology
We begin by extending simplicial homology to a multifiltered simplicial complex. We then convert the computation of the latter to standard questions in computational commutative algebra.

Definition 4 (chain module) Given a multifiltered simplicial complex
where the k-module structure is the direct sum structure and Note that we overload notation to reduce complexity by having C i = C i ({K u }u) when the multifiltration is clear from context. The module C i is n-graded as for any u ∈ N n , (C i ) u = C i (K u ). That is, the chain complex in grade u of the module is the chain complex of K u , the simplicial complex with coordinate u.

Example 2 (bifiltration module)
Consider the vertex d in the bifiltered complex in Figure 1. This vertex has critical coordinate (1, 0), so copies of this vertex exist in 9 complexes K u for u ≥ (1, 0). The inclusion maps relate these copies within the complexes. In turn, polynomials relate the chain groups in the different grades of the module. Let d be the copy of the vertex in the critical coordinate (1, 0). Then, within C i , we have d in grade (1, 0), (3,2) and so on, as required by the definition of an n-graded module. In other words, a simplex has different names in different grades.
The graded chain modules C i are finitely generated, so we may choose bases for them. Preliminary Manuscript -10/2/09 -Page 6

Definition 5 (standard basis)
The standard basis for the i th chain module C i is the set of i-simplices in critical grades. Figure 1, the standard basis corresponds to the highlighted and named simplices. For example, the standard basis for C 0 is

Example 3 (bifiltration bases) For our bifiltration in
Note that in doing so, we have made a choice of basis. Unlike for chain groups, this choice has an important consequence: Our resulting calculations will not be invariant but depend on the initial basis.
Recall that our multifiltrations are one-critical. The graded chain groups of one-critical multifiltrations are free: Since each cell enters only once, the resulting chain groups do not require any relations. Since our graded chain groups are free, the boundary operator is simply a homomorphism between free graded modules. Given standard bases, we may write the boundary operator ∂ i : C i → C i−1 explicitly as a matrix with polynomial entries. Figure 1, ∂ 1 has the matrix

Example 4 (boundary matrix) For the bifiltration in
where we assume we are computing over Z 2 .
As in standard homology, the boundary operator connects the graded chain modules into a chain complex C * (Equation 3) and the ith homology module is defined exactly as before (Equation 4):

Recasting the Problem
Our goal is to compute the homology modules. Following the definition, we have three tasks: 1. Compute the boundary module im ∂ i+1 .
2. Compute the cycle module ker ∂ i .
3. Compute the quotient H i .
We next translate these three tasks into problems in computational commutative algebra. Both the boundary and cycle modules turn out to be submodules of free and finitely generated modules that consist of vectors of polynomials. For the rest of this paper, we assume that we are computing homology over the field k. Recall from Section 2.4 that our module is defined over the n-graded polynomial ring For notational simplicity, we will use R = A n to denote this ring for the remainder of this section.
Let R m be the Cartesian product of m copies of R. In other words, R m consists of all column m-vectors of polynomials: To distinguish elements of R m from polynomials, we adopt the standard practice of placing them in bold format, so that f ∈ R m is a vector of polynomials, but f ∈ R is a polynomial. We use this practice exclusively for elements of R m and not for other vectors, such as elements of N n . We now recast the three problems above into the language of computational commutative algebra.
1. The boundary module is a submodule of the polynomial module. The matrix for ∂ i+1 has m i rows and m i+1 columns, where m j denotes the number of j-simplices in the complex.
The Submodule Membership Problem asks whether a polynomial vector f is in a submodule M ; That is, whether we may write f in terms of some basis F as above. A solution to this problem would complete our first task.
2. The cycle submodule is also a submodule of the polynomial module. The matrix for ∂ i has m i−1 rows and m i columns.
A set of generators for this submodule would complete our second task.
3. Our final task is simple, once we have completed the first two tasks. All we need to do is test whether the generators of the syzygy submodule, our cycles, are in the boundary submodule. As we shall see, the tools which allow us to complete the first two tasks also resolve this question.

Algorithms
In this section, we begin by reviewing concepts from commutative algebra that involve the polynomial module R m We then look at algorithms for solving the submodule membership problem and computing generators for the syzygy submodule. In our treatment, we follow Chapter 5 of Cox, Little, and O'Shea [6]. The standard basis for R m is {e 1 , . . . , e m }, where e i is the standard basis vector with constant polynomials 0 in all positions except 1 in position i. We use the "top down" order on the standard basis vectors, so that e i > e j whenever i < j. A monomial m in R m is an element of the form x u e i for some i and we say m contains e i .
For algorithms, we need to order monomials in both R and R m . For u, v ∈ N n , we say u > lex v if the vector difference u − v ∈ Z n , the leftmost nonzero entry is positive. The lexicographic order > lex is a total order on N n . For example, (1, 3, 0) > lex (1, 2, 1) since (1, 3, 0) − (1, 2, 1) = (0, 1, −1) and the leftmost nonzero entry is 1. Now, suppose x u and x v are monomials in R. We say x u > lex x v if u > lex v. This gives us a monomial order on R. We next extend > lex to a monomial order on R m using the "position-over-term" (POT) rule: x u e i > x v e j if i < j, or if i = j and x u > lex x v . Now, every element f ∈ R m may be written, in a unique way, as a k-linear combination of monomials where c i ∈ k, c i = 0 and the monomials m i are ordered according to the monomial order. We now define: Then, we may write f in terms of the standard basis (Equation 17): From the second line, the monomials corresponding to sum (17)  Finally, we extend division and least common multiple to monomials in R and R m . Given monomials Moreover, we define w ∈ N n by w i = max(u i , v i ) and define the monomial x w to be the least common multiple of x u and x v , denoted LCM(x u , x v ) = x w . Next, given monomials m = x u e i and n = x v e j in R m , we say n divides m iff i = j and x v divides x u , and define the quotient to be m/n = x u /x v = x u−v . In addition, we define Clearly, the LCM of two monomials is a monomial in R and R m , respectively.
Then, the LCM of their leading monomials is: We now turn our attention to the submodule membership problem. Formally, given a polynomial vector f and a set of t polynomial vectors F , we would like to know whether f ∈ F . We may divide f by F using the division algorithm DIVIDE in Figure 2. After division, we have f = ( t i=1 q i f i ) + r, so if the remainder r = 0, then f ∈ F . This condition, however, is not necessary for modules over multivariate polynomials (n > 1) as we may get a non-zero remainder even when f ∈ F .
Let M be an submodule and LT(M ) be the set of leading terms of elements of M . A Gröbner basis is a basis G ⊆ M such that LT(G) = LT(M ) . For Gröbner bases, whenever f ∈ F , we always get r = 0 after division of f by F , so we have solved the membership problem. The BUCHBERGER algorithm in Figure 3 computes a Gröbner basis G starting from any basis F . The syzygy polynomial vector or S-polynomial S(f , g) ∈ R m of f and g is The BUCHBERGER algorithm utilizes S-polynomials to eliminate the leading terms of polynomial vectors and complete the given basis to a Gröbner basis. A Gröbner basis generated by the algorithm is neither minimal nor unique. A reduced Gröbner basis is a Gröbner basis G such that for all g ∈ G, LC(g) = 1 and no monomial of g lies in LT(G − {g} . A reduced Gröbner basis is both minimal and unique. We may compute a reduced Gröbner basis by reducing each polynomial in G in turn, replacing g ∈ G with the remainder of DIVIDE(g, G − {g}). Since the algorithm is rather simple, we do not present pseudo-code for it. The DIVIDE, BUCHBERGER, and the reduction algorithms together solve the submodule membership problem and, in turn, our first task of computing im ∂ i+1 .
p ← p − LT(p) 10 return ((q 1 , . . . , q t ), r)  We next compute generators for the syzygy submodule to complete our second task. We begin by computing a Gröbner basis G = {g 1 , . . . , g s } for F , where the vectors are ordered with some monomial order. We then compute DIVIDE(S(g i , g j ), G) for each pair of Gröbner basis elements. Since G is a Gröbner basis, the remainder of this division is 0, giving us Let ǫ 1 , . . . , ǫ s be the standard basis vectors in R s and let For pairs (i, j) such that h ij = 0, we define s ij ∈ R s by with s ij = 0, otherwise. Schreyer's Theorem states that the set {s ij } ij form a Gröbner basis for Syz(g 1 , . . . , g s ) [6, Chapter 5, Theorem 3.3]. Clearly, we may compute this basis using DIVIDE. We use this basis to find generators for Syz(f 1 , . . . , f t ).
Let M F and M G be the m×t and m×s matrices in which the f i 's and g i 's are columns, respectively. As both bases generate the same module, there is a t × s matrix A and an s × t matrix B such that M G = M F A and M F = M G B.
To compute A, we simply initialize A to be the identity matrix and add a column to A for each division on line 4 of BUCHBERGERthat records the pair involved in the S-polynomial. The matrix B may be computed by using the division algorithm. To see how, notice that each column of M F is divisible by M G since M G is a Gröbner Basis for M F ; now there is a column in B for each column f i ∈ M F , which is obtained by division of f i by M G . Let s 1 , . . . , s t be the columns of the t × t matrix I t − AB. Then, giving us the syzygy generators [6, Chapter 5, Proposition 3.8]. We refer to the algorithm sketched above as Schreyer's algorithm. This algorithm completes our second task. The third task was to compute the quotient H i given im ∂ i+1 = G and ker ∂ i = Syz(f 1 , . . . , f t ). We simply need to find whether the columns of ker ∂ i can be represented as a combination of the columns of the im ∂ i+1 . Now, H i may be computed using the division algorithm. We divide every column in the ker ∂ i by the matrix im ∂ i+1 using the DIVIDE algorithm. If the remainder is non-zero, we add the remainder both to im ∂ i+1 and H i (so as to count only unique cycles). While the above algorithms solve the membership problem, they have not been used in practice due to their complexity. The submodule membership problem is a generalization of the Polynomial Ideal Membership Problem (PIMP) which is EXPSPACE-complete, requiring exponential space and time [17,20]. Indeed, the Buchberger algorithm, in its original form, is doubly-exponential and is therefore not practical.

Multigraded Algorithms
In this section, we show that multifiltrations provide additional structure that may be exploited to simplify the algorithms from the previous section. These simplifications convert these intractable algorithms into polynomial time algorithms.

Exploiting Homogeneity
The key property that we exploit for simplification is homogeneity. Any vector f endowed with a coordinate u f that may be written as above is homogeneous, e.g. the columns of M .
We will show that all boundary matrices ∂ i may be written as homogeneous matrices initially, and the algorithms for computing persistence only produce homogeneous matrices and vectors. That is, we maintain homogeneity as an invariant throughout the computation. We begin with our first task.

Lemma 1 For a one-critical multifiltration, the matrix of ∂ i : C i → C i−1 written in terms of the standard bases is homogeneous.
Proof: Recall that we may write the boundary operator ∂ i : C i → C i−1 explicitly as a m i−1 × m i matrix M in terms of the standard bases for C i and C i−1 , as shown in matrix (12) for ∂ 1 . From Definition 5, the standard basis for C i is the set of i-simplices in critical grades. In a one-critical multifiltration, each simplex σ has a unique critical coordinate u σ (Definition (1)). In turn, we may represent this coordinate by the monomial x uσ . For instance, simplex a in Figure 1 has critical grade (1, 1) and monomial x (1,1) = x 1 x 2 . We order these monomials using > lex and use this ordering to rewrite the matrix for ∂ i . The matrix entry M jk relates σ k , the kth basis element for C i toσ j , the jth basis element for C i−1 . Ifσ j is not a face of σ k , then M jk = 0. Otherwise,σ j is a face of σ k . Since a face must precede a co-face in a multifiltration, That is, the matrix is homogeneous.

Corollary 1 For a one-critical multifiltration, the boundary matrix ∂ i in terms of the standard bases has monomial entries.
Proof: The result is immediate from the proof of the previous lemma. The matrix entry is either 0, a monomial, or x u(σ k )−u(σj ) , a monomial.

Example 7
We show the homogeneous matrix for ∂ 1 below, where we augment the matrix with the associated monomials. Again, we assume we are computing over Z 2 . We next focus on our second task, showing that given a homogeneous matrix as input, the algorithms produce homogeneous vectors and matrices. Let F be an m × n homogeneous matrix. Let {e 1 , . . . , e m } and {ê 1 , . . . ,ê n } be the standard bases for R m and R n , respectively. A homogeneous matrix associates a coordinate and monomial to the row and column basis elements. For example, since x 1 is the monomial for row 2 of matrix (33), we have u e2 = (1, 0) and x ue 2 = x 1 . Each column f in F is homogeneous and may be written in terms of rows: where c i ∈ k and we allow c i = 0 when a row is not used. For instance, column g representing the edge ab in the bifiltration shown in Figure 1 may be written as: x ug Consider the BUCHBERGER algorithm in Figure 3. The algorithm repeatedly computes S-polynomials of homogeneous vectors on line 4.

Lemma 2 The S-polynomial S(f , g) of homogeneous vectors f and g is homogeneous.
Proof: A zero S-polynomial is trivially homogeneous. A non-zero S-polynomial S(f , g) implies that h in Equation (24) is non-zero. By the definition of LCM (Equation 20), this implies that the leading monomials of f and g contain the same basis element e j . We have: x ug x ue j e j (41) Let where c f = 0 is the field constant in the leading term of f . Similarly, we get Putting it together, we have where Having computed the S-polynomial, BUCHBERGER next divides it by the current homogeneous basis G on line 4 using a call to the DIVIDE algorithm in Figure 2. (f 1 , . . . , f t )) returns a homogeneous remainder vector r for homogeneous vectors f , f i ∈ R m . Proof: On line 1, r and p are initialized to be 0 and f , respectively, and are both trivially homogeneous. We will show that each iteration of the while loop starting on line 4 maintains the homogeneity of these two vectors. On line 5, since both f i and p are homogeneous, we have Since LT(f i ) divides LT(p), the terms must share basis element e k and we have where x up > lex x u f i so that the division makes sense. On line 7, p is assigned to where d ′ j = d j − d k · c ij /c ik and d ′ k = 0, so the subtraction eliminates the kth term. The final sum means that p is now a new homogeneous polynomial with the same coordinate u p as before. Similarly, LT(p) is added to r on line 8 and subtracted from p on line 9, and neither action changes the homogeneity of either vector. Both remain homogeneous with coordinate u p .
The lemmas combine to give us the desired result.

Theorem 1 (homogeneous Gröbner) The BUCHBERGER algorithm computes a homogeneous Gröbner basis for a homogeneous matrix.
Proof: Initially, the algorithm sets G to be the set of columns of the input matrix F , so the vectors in G are homogeneous by Lemma 1. On line 4, it computes the S-polynomial of homogeneous vectors f , g ∈ G. By Lemma 2, the S-polynomial is homogeneous. It then divides the S-polynomial by G. Since the input is homogeneous, DIVIDE produces a homogeneous remainder r by Lemma 3. Since only homogeneous vectors are added to G on line 6, G remains homogeneous. We may extend this result easily to the reduced Gröbner basis.
Using similar arguments, we may show the following result, whose proof we omit here.
Theorem 2 (homogenous syzygy) For a homogeneous matrix, all matrices encountered in the computation of the syzygy module are homogeneous.

Data Structures and Optimizations
We have shown that the structure inherent in a multifiltration allows us to compute using homogeneous vectors and matrices whose entries are monomials only. We next explore the consequences of this restriction on both the data structures and complexity of the algorithms.
By Definition (6), an m × n homogeneous matrix naturally associates monomials to the standard bases for R m and R n . Moreover, every non-zero entry of the matrix is a quotient of these monomials as the matrix is homogeneous. Therefore, we do not need to store the matrix entries, but simply the field elements of the matrix along with the monomials for the bases. We may modify two standard data structures to represent the matrix.
• linked list: Each column stores its monomial as well as a linked-list of its non-zero entries in sorted order. The non-zero entries are represented by the row index and the field element. The matrix is simply a list of these columns in sorted order. Figure 4 displays matrix (33) in this data structure.
• matrix: Each column stores its monomial as well as the column of field coefficients. If we are computing over a finite field, we may pack bits for space efficiency.
The linked-list representation is appropriate for sparse matrices as it is space-efficient at the price of linear access time. This is essentially the representation used for computing in the one-dimensional setting [22]. In contrast, the matrix representation is appropriate for dense matrices as it provides constant access time at the cost of storing all zero entries. The multidimensional setting provides us with denser matrices, as we shall see, so the matrix representation becomes a viable structure. In addition, the matrix representation is optimally suited to computing over the field Z 2 , the field often commonly employed in topological data analysis. The matrix entries each take one bit and the column entries may be packed into machine words. Moreover, the only operation required by the algorithms is symmetric difference which may be implemented as a binary XOR operation provided by the chip. This approach gives us bit-level parallelism for free: On a 64-bit machine, we perform symmetric difference 64 times faster than on the list. The combination of these techniques allow the matrix structure to perform better than the linked-list representation in practice.
We may also exploit homogeneity to speed up the computation of new vectors and their insertion into the basis. We demonstrate this briefly using the BUCHBERGER algorithm. We order the columns of input matrix G using the POT rule for vectors as introduced in Section 4. Suppose we have f , g ∈ G with f > g. If S(f , g) = 0, LT(f ) and  Figure 1, in column sorted order. Note that the columns in Equation 12 are not ordered while they are sorted correctly here. LT(g) contain the same basis, which the S-polynomial eliminates. So, we have S(f , g) < g < f . This implies that when dividing S(f , g) by the vectors in G, we need only consider vectors that are smaller than g. Since the vectors are in sorted order, we consider each in turn until we can no longer divide. By the POT rule, we may now insert the new remainder column here into the basis G. This gives us a constant time insertion operation for maintaining the ordering, as well as faster computation of the Gröbner basis.

Complexity
In this section, we give simple polynomial bounds on our multigraded algorithms. These bounds imply that we may compute multidimensional persistence in polynomial time. In practice, the number of unique monomials in the matrix is lower than the worst case. In computing persistence, for example, we may control the number of unique monomials by ignoring close pairs of gradings. The following lemma bounds the basis size and running time in this case.

Lemma 5 Let F be an m × n homogeneous matrix with h of unique monomials. The Gröbner basis G contains O(hn) vectors and may be computed in time O(n 3 h 2 ).
The proof is identical to the previous lemma.

Theorem 3 Multidimensional persistence may be computed in polynomial time.
Proof: Multidimensional persistence is represented by the Gröbner Bases and the syzygy moduli of all the homogeneous boundary matrices ∂ i for a given multifiltration. In the previous lemmas, we have shown that both the Gröbner Basis and the syzygy module can be computed in polynomial time. Therefore, one can compute multidimensional persistence in polynomial time.
In other words, our optimizations in this section turn the exponential-algorithms from the last section into polynomialtime algorithms.

Computing the Rank Invariant
Having described our algorithms, in this section we discuss the computation of the rank invariant. Recall that our solution is complete, but not an invariant. In contrast, the rank invariant (Equation 5) is incomplete, but is an invariant and may be used, for instance, as a descriptor in order to compare and match multifiltrations. We begin with a naive algorithm that computes the invariant for each pair independently. We then discuss alternate approaches using posets and vineyards. Direct computation, however, suffers from the storage requirement, which is intractable. Therefore, we end this section by giving an algorithm for reading off the rank invariant from the solution computed using our multigraded algorithms.

Direct Computation
We assume we are given a n-dimensional multifiltration of a cell complex K with m cells. Recall the rank invariant, Equation (5), from Section 2. We observe that any pair u ≤ v ∈ N n defines a one-dimensional filtration: We let t be a new parameter, map u to t = 0, v to t = 1, and obtain a two-level filtration. We then use the persistence algorithm to obtain barcodes [22]. The invariant ρ i (u, v) may be read off from the β i -barcode: It is the number of intervals that contain both 0 and 1. The persistence algorithm is Θ(m 3 ) in the worst-case, so we have a cubic time algorithm for computing the rank invariant for a single pair of coordinates.
To fully compute the rank invariant, we need to consider all distinct pairs of complexes in a multifiltration. It may seem, at first, that we need to only consider critical coordinates, such as (1, 1) and (2, 0) in the bifiltration in Figure 1. However, note that the complex at coordinate (2, 1) is also distinct even though no simplex is introduced at that coordinate. Inspired by this example, we may devise the following worst-case construction: We place m/n cells on each of the n axis to generate (m/n) n = Θ(m n ) distinct complexes. Simple calculation shows that there are Θ(m 2n ) comparable coordinates with distinct complexes. For each pair, we may compute the rank invariant using our method above for a total of O(m 2n+3 ) running time. To store the rank invariant, we also require Θ(m 2n ) space.

Alternate Approaches
Our naive algorithm above computes the invariant for each pair of coordinates independently. In practice, we may read off multiple ranks from the same barcode for faster calculation. Any monotonically increasing path from the origin to the coordinate of the full complex is a one-dimensional filtration, such as the following path in Figure 1.
Having computed persistence, we may read off the ranks for all six comparable pairs within this path. We may formalize this approach using language from the theory of partially ordered sets. The path described above is a maximal chain in the multifiltration poset: a maximal set of pairwise comparable complexes. We require a set of maximal chains such that each pair of comparable elements (here, complexes) are in at least one chain. Each maximal chain requires a single one-dimensional persistence computation. We now require an algorithm that computes the smallest set of such chains. Another approach is to use Vineyards as introduced in [4]. The basic idea is the following: given two functions, compute a homotopy parameterized by λ; for each λ compute the persistence barcodes and finally study the stack of these barcodes. A drawback of this approach is that a homotopy of functions might suffer from range compression. Consider a topological feature which exists for a long range in both the filtrations, but it is possible that the range for which the feature exists in the homotopy is rather small. Thus studying the vineyard, one would see this feature for a very short while, even though the feature existed for a long range in both the filtrations.

Multigraded Approach
Regardless of the approach, the direct computation is hampered by the exponential storage requirement, motivating our work in computing the rank invariant indirectly. Therefore, we first compute multidimensional persistence using our multigraded algorithms in Section 5. We then simply read off the rank invariant using the RANK algorithm, as shown in Figure 5. We describe the algorithm in the proof of the following theorem. Proof: The algorithm uses two simple helper procedures. The procedure PROMOTE takes a matrix M and coordinate u as input. It then finds the columns f ∈ M whose associated coordinate u f precedes u, and promotes them to coordinate u by a simple shift. The procedure QUOTIENT finds the quotient of the input matrices by division: If the remainder r is non-zero, it adds r to the quotient Q, also adding it to B so it only find unique cycles. Now assume the input are as in the statement of the theorem. By the definition of the rank invariant, we need to count homology cycles that exist at u and persist until v. The RANK algorithm implements this. We compute homology H u at u on the first three lines. On line 4, we promote these cycles to coordinate v. We then quotient with boundaries B v at v to find homology cycles H uv that exist at u and persist until v. The size of this set is the rank invariant by definition.
3. Set the column monomial to be LCM(m j ), where {m j } j are the monomials of rows with non-zero Each column in this matrix has k non-zero elements and is homogeneous by construction. We also generate random matrices but limit the number of unique monomials in the matrix to be O(h 2 ) for different values of h. The basic idea behind these tests is that the range of the filtrations in a simplicial complex can typically be divided into smaller discrete intervals. For generation, we replace the first step of the procedure above with the following two steps: 0. Randomly generate h unique monomials {l 1 , . . . , l h }.
1. Generate n monomials {m 1 , . . . , m n } corresponding to the monomials associated with the basis elements of the rows such that m i ∈ {l 1 , . . . , l h }.
After executing step 2 and 3 above, our resulting matrix has homogeneous columns with k non-zero elements and at most h 2 unique monomials.

Size & Timings
According to Lemma 4, the number of columns in the Gröbner basis for a random matrix may grow O(n 3 ) (we have n = m here.) Figure 6(a) shows that the growth of the Gröbner basis is less in practice (about linear for k = 2 and quadratic for k = 4) and increases as the matrix becomes denser. Similarly, the theoretical running time for this matrix is O(n 7 ). Figure 6(b) demonstrates that the actual running time matches this bound quite well (about O(n 6 ) in these tests.) The matrix method, however, is considerably more efficient, about 20 times faster for our largest test here.  We next limit the number of unique monomials in the input matrices. Figures 7 and 8 give the size and running time for matrices with at most h 2 monomials for h = 20 and h = 100, respectively. We see that the growth of the basis is about linear for different values of k and h, and the running time matches the theoretical O(n 3 ) bound in Lemma 5 quite well.

Rank Invariant
We end this paper by revisiting our motivating bifiltration from Figure 1 and computing its multidimensional persistence and rank invariants using our algorithms. Using the natural ordering on the simplices, one can write the boundary The Gröbner Basis (G 1 ) and the set of syzygies (Z 1 ) for ∂ 1 are: Note that each row in the syzygy matrix corresponds to an edge in the appropriate order. Finally, the Gröbner Basis for is G 2 = ∂ 2 (since the only possible S-polynomial is identically 0).  Using G 1 , Z 1 and G 2 , one can read off the rank invariants for various u and v using the RANK algorithm in Section 6.3. A few interesting rank invariants for this example are:

Conclusion
In this paper, we give a full treatment to the computation of multidimensional persistence, from theory to implementation and experiments. We develop polynomial time algorithms by recasting the problem into computational commutative algebra. Although the recast problem is EXPSPACE-complete, we exploit the multigraded setting to develop practical algorithms. The Gröbner bases we construct allow us to reconstruct the entire multidimensional persistence vector space, providing us a convenient way to compute the rank invariant. We implement all algorithms in the paper and show that the calculations are feasible due to the sparsity of the boundary matrices. For additional speedup, we plan to parallelize the computation by batching and threading the XOR operations. We also plan to apply our algorithms toward studying scientific data. For instance, for zero-dimensional homology, multidimensional persistence corresponds to clustering multiparameterized data, This gives us a fresh perspective, as well as a new arsenal of computational tools, to attack an old and significant problem in data analysis.