SMALE’S 17TH PROBLEM: ADVANCES AND OPEN DIRECTIONS

We give an overview of Smale’s 17th problem describing the context in which Smale proposed it, the ideas that led to its solution, and the extensions and subsequent progress after this solution.


Foreword
In June 1999 I received an invitation letter to give a short course on a summer school. The letter was signed by Vaughan Jones, then the chairman of the NZMRI (New Zealand Mathematical Research Institute). New Zealand being in the south, the summer school was going to be in January. I immediately accepted. I had never been in New Zealand, but I knew that it shared some similarities with my home country (Uruguay). My stay at Kaikoura was delightful. The gentle beauty 234 FELIPE CUCKER of the town, the vicinity of Marlborough's vineyards, the whales in the neighboring waters, the kindness of the local people in whose house I stayed, . . . And the school was no less fascinating. Not only because of the interest of its students and the quality of the short courses I took advantage to attend but also because of its setting: lectures and lunches were held at the local Marae. More than twenty years later I still look at it as a unique experience.
January 2000 was also the beginning of the 21st century. So, while we were enjoying the school at Kaikoura, the presses of the American Mathematical Society were at work to print a singular volume [4]. Commissioned by the International Mathematical Union, and to celebrate the year 2000 as the Year of Mathematics, the book acknowledges its inspiration on the list of problems proposed by Hilbert in 1900 for the mathematicians of the 20th century. Thus, along with articles describing the state of mathematics at the end of the century a few articles proposed, in the spirit of Hilbert, lists of problems for the mathematicians of the 21st century. Among the latter one was written by Vaughan [14]; it describes ten open problems related to his research interests. Another was writen by Steve Smale [27] 1 . Similar in character to Vaughan's it describes eighteen problems related to Smale's research interests 2 . The nature of the 17th had a large overlap with the subject of my course at Kaikoura. Our aim in this article is to describe -with a focus on ideas rather than on technical details-what this problem is, what the work of Smale leading to it was, how it was solved a few years ago, and what are the questions left open after this solution. It is a fitting update to the course I gave at Vaughan's invitation.

Smale 17th Problem
2.1. The background. Work on the solution of polynomial equations may be traced back to the Babylonians, 4000 years ago, who solved some quadratic equations arising from problems relating areas and sides of rectangles [13]. The general solution to cubic and quartic equations is generally attributed to Ferrari and Cardano in the 16th century. Solutions could be written in terms of square and cubic roots and involved complex numbers. It took almost three centuries to prove that a similar result cannot be achieved for the general quintic equation [1].
Nonetheless, the solutions exist. The Fundamental Theorem of Algebra states that a polynomial equation of degree d has exactly d zeros in C when counted with their multiplicity. An equally fundamental result in algebraic geometry, known as the Bézout Theorem, extends the FTA to (square) systems of polynomials. It needs a caveat though. The system {X 1 + X 2 = 0, X 1 + X 2 = 1} has, obviously, no zeros: the two lines are parallel. But if one "adds points at infinity" to the complex plane C 2 these two lines do meet at one such point. Recall, the complex projective space P n is the quotient C n+1 / ∼ where x ∼ y if x = λy for some λ ∈ C, λ = 0. Also, a polynomial f ∈ C[X 0 , X 1 , . . . , X n ] is homogeneous of degree d when all its monomials have degree d. In this case, if x ∈ C n+1 is such that f (x) = 0 then, for all λ ∈ C, f (λx) = λ d f (x) = 0. Hence we can talk about the zeros of f in P n and, by extension, to the zeros in P n of a system F = (f 1 , . . . , f n ) of homogeneous polynomials (possibly of different degrees). Given a point (x 0 , . . . , x n ) ∈ C n+1 \ {0} 235 we denote by [x 0 : · · · : x n ] its class in P n . In all what follows we will choose representatives (x 0 , . . . , x n ) of a point [x] ∈ P n satisfying that x = 1. This has no loss of generality and simplifies a number of expressions. Now let d := (d 1 , . . . , d n ) ∈ N n and H d be the linear space of the systems F = (f 1 , . . . , f n ) with f i ∈ C[X 0 , . . . , X n ] homogeneous of degree d i . Generically (and by this we mean outside a subset of smaller dimension in H d ) the system F has a finite number of zeros in P n . Bézout Theorem states that in this case, and when counted with multiplicity, this number is D := d 1 d 2 · · · d n . The question underlying Smale's 17th problem is the following: Can one efficiently compute (any) one zero of a given F ∈ H d ? (S17) 2.2. The Bézout series. During the early 1990s, in a series of papers known as "the Bézout series" [22,23,24,26,25], Mike Shub and Steve Smale laid down the foundations of an approach to answer this question. We won't attempt to summarize these papers but limit ourselves to describe some of the main ingredients in this approach.
2.2.1. Approximate zeros. We already mentioned that the solutions of a general quintic equation (in one variable) cannot be expressed by radicals. Obviously, neither can the zeros of a system F ∈ H d . The most common way out of this obstacle is to compute a (sufficiently good) approximation of such a zero. Arguably the most common definition of approximation is through a bound on the distance to a true zero. If d P denotes the Riemannian distance in P n (i.e., the angle), we say that z ∈ P n is an ε-approximate zero of F when there is a zero ζ ∈ P n of F such that d P (z, ζ) ≤ ε. A shortcoming of this notion is the dependence on ε: being an approximate zero in this sense is not an absolute notion. The definition adopted in the Bézout series, which goes back to [28], is different and relies on Newton's method. The latter can be extended to the projective space. Given z ∈ P n and F ∈ H d , the Newton iterate of F at z is Here T z is the tangent space to P n at z (i.e., the orthogonal complement z ⊥ to z in C n+1 ) and D z F | Tz is the restriction to this space of the derivative of F at z. A remarkable feature of Newton's method is its quadratic convergence. Let ζ be a non-singular (i.e., not multiple) zero of F and z be a point sufficiently close to ζ. Then, where N k F is the kth iterate of (1). Shub and Smale call a point z satisfying (2) an approximate zero of F and the point ζ its associated zero. Note, this is a qualitative notion; there is no dependence on a parameter measuring how well the point approximates the zero. It is also a strong property. Once a point ζ is an approximate zero of F , inequality (2) allows one to get an ε-approximation with log 2 (1 + log 2 ε) Newton steps.
Two questions are naturally posed. Firstly, what does 'sufficiently close' mean? Can one provide estimates for this notion? The answer goes back (again) to [28].

FELIPE CUCKER
For a point ζ ∈ P n define where D k ζ F is the kth derivative of F at ζ and the norm is the operator norm. The γ-Theorem of Smale states that if F (ζ) = 0 and d P (z, ζ)γ(F, ζ) ≤ 1 6 then z is an approximate zero of F with associated zero ζ (see, e.g., [17,Thm. 12]).
Secondly, in the absence of a true zero ζ at hand, can one certify that a point z is an approximate zero of F ? To answer this question let Tz F (z) be the length of the Newton step in (1) and Smale's α-Theorem states that there exists a constant α 0 such that if α(F, z) ≤ α 0 then z is an approximate zero of F . A proof with α 0 = 0.02 is in [10,Thm. 19.9].
The goal in (S17) is therefore not to compute a true zero but an approximate zero as above.

2.2.2.
Linear homotopies. The algorithmic scheme considered in the Bézout series was a linear homotopy. Its general idea (which goes back at least to Lahaye [15]) can be simply described. Assume you want to compute the zeros of the polynomial F = X 3 − X 2 − 5X + 4 and further assume that you aren't aware of the formula to do so. You may try to use the fact that you certainly know the zeros of G = X 3 −X.
To do so, consider the segment (in the space of polynomials of degree at most 3) We observe that its extremities are Q 0 = G and Q 1 = F . Hence, when t varies from 0 to 1, we may expect the zeros of Q t to vary from the zeros of G (1, 0 and −1) to those of F . Starting with one such zero ζ 0 of G (say ζ 0 = 1) this will induce a curve C = (Q t , ζ t ) t∈[0,1] with Q t (ζ t ) = 0 for all t ∈ [0, 1] and if we can "follow" this curve via a finite sequence {Q ti , z i } i=0,...,k such that 0 = t 0 < t 1 < · · · < t k = 1 and z i is a good approximation of ζ ti then we will end up with z k , a good approximation to the zero ζ 1 of Q 1 = F . Figure 1 attempts to convey this idea. A few comments are in order. The first is that the same idea applies for systems in H d ; we now have n homogeneous polynomials in n + 1 variables and D zeros in P n instead of 3 in C but the basic idea is the same. The second is that the "lifting" of the segment  Figure 1. The curve C , the initial pair (G, ζ0), the target pair (F, ζ1) and the points (Qt i , zi) that "follow" C .
The bottomline is, linear homotopy, in general, works. But for some unlucky choices of F and G it may go awfully wrong. Consequently, to give a formal meaning to the notion of "efficiently compute" we need a framework other than the worst-case scenario.

Probabilities and complexity.
To talk about computational efficiency supposes a cost measure at hand. As we are considering numerical algorithms, that is, algorithms whose basic operations are arithmetic operations and comparisons between (idealized) real numbers the simplest, and most natural, cost measure for a computation with input F is the number cost(F ) of such operations performed along the computation. The complexity of an algorithm is a function relating the cost of the algorithm's computations to the size of the input data.
In our case, the size N of a system F ∈ H d is the number of complex numbers we need to describe the system. That is, the number of coefficients in its polynomials. This gives us We have mentioned that we need a framework for complexity other than the worstcase scenario. The one proposed by Smale, which is widely used, is that of average complexity. To describe it we need some structure on H d . We have mentioned that H d is a linear space. We can turn it into an inner product space by endowing it with the Weyl inner product. This is a dot product in a scaled monomial basis which has the property of being unitarily invariant. That is, for all F, G ∈ H d and all unitary transformation u ∈ U (n + 1), we have F • u, G • u = F, G . See [10, §16.1] for details. In all what follows, for a system F ∈ H d , F will denote its Weyl norm F, F 1/2 .
The Weyl norm induces a unit sphere in H d , in what follows denoted by S(H d ), which we can endow with the uniform distribution. The latter, in turn, allows one to define the average cost of an algorithm (taking inputs in S(H d )) to be 2.4. The problem's statement. We can now formally state Smale's 17th problem. This statement relies on the usual understanding of "efficient" as solvable with polynomially bounded average cost. The question in (S17) thus becomes Is there an algorithm that, given an F ∈ S(H d ) randomly drawn from the uniform distribution, halts with probability one returning an approximate zero of F , and whose average cost is bounded by a polynomial in N ?

The Solution
To turn the linear homotopy scheme into a bona fide algorithm we need to specify how do we choose the initial pair (G, ζ 0 ) and how do we compute the sequences t 0 , t 1 , . . . , t k and z 0 , . . . , z k . Let us start with the latter. Before doing so, however, we observe that it makes sense to work with the angle d S (Q, H) between systems Q, H ∈ H d instead of the distance Q − H . This is so because there are no changes in the zeros of a system when the system is multiplied by a constant. Consequently, we will consider the segment [G, F ] parameterized by a fraction τ of the angle d S (G, F ) instead of a fraction t of the distance F − G as in (4). That is, we will consider 3.1. Condition numbers. Assume, for the time being, that we are given F, G and ζ 0 with G(ζ 0 ) = 0. A first remark is that we won't attempt to find a universal partition of [0, 1] that we can use for all triples (F, G, ζ 0 ) but rather that we will proceed dynamically: at the ith step, we have computed a pair (Q i , z i ) for which we know that z i is an approximate zero of Q i with associated zero ζ i (we should write Q τi and ζ τi but we simplify the notation for ease of reading). Next assume that Note that this is stronger, by a factor of 2, than the condition required in Smale's γ-Theorem (cf. §2.2.1). Given a ∆τ ≤ d S (Qi,F ) d S (G,F ) we let Q i+1 be the only system in The goal is to compute a bound B(Q i , z i ) such that If we manage to find B such that (9) holds, then z i is an approximate zero of Q i+1 with associated zero ζ i+1 (by Smale's γ-Theorem) and then, taking i.e., that (8) holds, and we can iterate the process. Note that inequality (8) is trivial for i = 0 as z 0 = ζ 0 and hence d S (z 0 , ζ 0 ) = 0. This allows us to begin the process. Figure 3 shows one step of the homotopy. Figure 3. Computation of next pair (Qt i+1 ,z i+1 ).
How do we obtain B satisfying (9)?
. We begin by noting that Inequality (8) already bounds the first term in the right-hand side. In addition, Lipschitz bounds for γ −1 allow one to show, for Q i+1 sufficiently close to Q i and ζ i+1 sufficiently close to ζ i , that for some c 1 ∈ (6, 12).
Let us assume for a while that this holds true and focus on the second term in the right-hand side of (10). To bound this term define, for F ∈ H d and z ∈ P n , the condition number of F at z to be This is a condition number in the classical sense: if F (ζ) = 0, then µ norm (F, ζ) bounds the first-order variation of ζ in terms of the variation of F 3 . Hence, we may expect that for some small constant c 2 . For a given c 3 , if we want the left-hand side above to be at most At this stage we use a fundamental result, known as the Higher Derivative Estimate, which states that, for all F ∈ H d and z ∈ P n , where D := max{d 1 , . . . , d n }, to replace (13) by The use of Lipschitz bounds, now for µ −1 norm , allows one to obtain for some small c 4 . We can conclude that taking we ensure that Finally, by chosing c 3 := 6c1 c1−6 we obtain from (10), (11), and (17) that This general argument comes from [21] where the value of the constants c i is not worked out 4 . Detailed proofs are in [8,9] and [10, §17.1]. In the latter it is shown that one can actually chose them so that both (11) and (15) hold and that with that choice (16) becomes This results in an easy-to-describe algorithm LinHom that takes as input a triple (F, G, ζ 0 ) with F, G ∈ H d and G(ζ 0 ) = 0.
3 Actually, the true condition number would be µ(F, z) (see [10,Corollary 16.14]) which is defined in the same manner but without the diagonal matrix diag( √ d i ). The fact that µnorm is unitarily invariant (whereas µ isn't), however, make µnorm(F, z) more convenient to use. 4 The author of [21] writes "In previous papers we have paid careful attention to the constants.
In this paper we are more cavalier." Algorithm LinHom input F, G ∈ H d and ζ 0 ∈ P n such that G(ζ 0 ) = 0 With some (but not much) additional work the arguments above show the following result (Proposition 17.3 in [10]).
Then, the execution of LinHom(F, G, ζ 0 ) stops after at most K steps with The returned point z is an approximate zero of F .

A probabilistic solution.
The first breakthrough towards a solution of Smale's 17th problem was produced by Carlos Beltrán and Luis Miguel Pardo [7]. It consisted of a randomized algorithm, in the sequel BP, to generate the initial pair (G, ζ 0 ). In the course of its execution (with input (n, d)) this algorithm draws real numbers from the standard normal distribution and, because of this, its output (G, ζ 0 ) is random as well. If we define the solution variety to be then the distribution ρ std induced on V S by the ouputs of BP can be easily described as follows: among the D zeros of G A surprising feature of BP is that it constructs a polynomial system along with one of its zeros without ever solving a (nonlinear) polynomial system. Roughly speaking, it first draws the "linear part" of the system, then computes a zero of this linear part, and finally adds a suitable complement of higher degree terms. For a description of BP, its properties and its cost, see [8,9] or [10, §17.6].
A zero-finding algorithm becomes clear, let's call it LV (from Las Vegas, as randomized algorithms as ours are called this way). On input F ∈ S(H d ), we first draw a pair (G, ζ 0 ) ∈ V S from ρ std via a call to BP. We then run the linear homotopy on the triple (F, G, ζ 0 ). This returns an approximate zero of F . To understand its cost we next make two fundamental observations.

FELIPE CUCKER
Firstly, if F, G are independently drawn from S(H d ) and τ ∈ [0, 1] is fixed, then the system Q τ ∈ [G, F ] given by Secondly, let Then [8,Thm. 23], These facts allow us to bound the average number of steps of LV. Indeed, In addition to this, the cost of each step, which is dominated by the computation of the Newton iteration, is O(N + n 3 ), which is O(N ) if we assume that d i ≥ 2 for i = 1, . . . , d. This yields a total average cost of O(nD 3/2 N 2 ) for LV (as the cost for the execution of BP is dominated by that of the linear homotopy). This O(nD 3/2 N 2 ) average cost was independently obtained in [8] and [9].
A few last remarks are necessary. The average complexity bound above considers two different sorts of average. On the one hand, the initial pair (G, ζ 0 ) is produced by a randomized algorithm and hence, the algorithm LV itself is randomized. On the other hand, the input system F is considered random to derive average complexity bounds.
Algorithm LV is of Las Vegas type: for a given input F ∈ S(H d ), if the algorithm halts, it returns an approximate zero of F (and it halts with probability 1 for all F outside a set of measure zero). Yet, its running time (its cost) is a random variable as it depends on (G, ζ 0 ). The argument above does not provide a bound for the randomized cost randcost(F ) of LV on input F , which is defined as E(G,ζ 0)∼ρstd K(F, G, ζ 0 ). But it does so for the average of these randomized costs over F ∈ S(H d ).
Because of the probabilistic nature of LV, the bound above was not a solution to Smale's 17th problem. The latter asked for a deterministic algorithm. More work would be needed to reach this solution.

3.3.
A near solution and other advances. The arguments above, which follow the exposition in [8], relie on (among other things) the fact that when F and G are uniformly drawn from S(H d ), so is, for any τ ∈ [0, 1], the system G τ . But as soon as these distributions are not identical, technical difficulties may arise. This is at the root of the different choice of setting taken in [9], where the same O(D 3/2 nN 2 ) bound for LV is proved as well. The setting here considers Gaussian distributions on H d .
The Weyl norm induces a standard Gaussian distribution N (0, I) on H d via the density which in turn induces an average cost of an algorithm (taking inputs in H d ) This quantity is not different to that in (7). Indeed, the standard Gaussian distribution on R 2N induces, via the bijection the uniform distribution on S(R 2N ) and a χ 2 -distribution with 2N degrees of freedom on (0, ∞). Moreover these two components are independent [10,Prop. 2.19].
Using the fact that the right-hand side in Proposition 3.1 is scale-invariant on F it immediately follows that The fact that it is scale-invariant on G allows one to use a version of BP that returns a pair in for every t ∈ [0, 1]. We cannot however directly use these facts in conjunction with Proposition 3.1 because there τ parameterizes a fraction of the angle d S (G, F ) and now t parameterizes a fraction of F − G . But it is easy to go from one to the other via the change of variables The returned point z is an approximate zero of F . 2

FELIPE CUCKER
We won't describe the derivation of the bound for Suffice it to mention that, although different in the details, it is similar to the one described in §3.2 and it yields the same O(D 3/2 nN ) bound.
One of the differences is the need to estimate EQ∼N(0,σ 2 I) (21)). It is here when one notes that if any of F or G is not centered (i.e., have a mean different from 0), then Q t is still Gaussian (even though not centered). In particular, this is true when either F or G are fixed. If F , for instance, is fixed, then Q t is distributed as N (F, (1 − t) 2 I). This fact was the trigger to extend (21) not only to the Gaussian setting but also to the non-centered case. The main technical result in [9] (see Theorem 3.6 there or [10,Thm. 18.4]) shows that for allQ ∈ H d and σ > 0 we have With this smoothed analysis at hand, and some additional work, two different complexity results followed by fixing, respectively, F and (G, ζ 0 ).
Reasonably enough, the randomized cost for F depends on how well-conditioned the zeros of F are.

3.3.2.
Deterministic algorithms. If we fix the initial pair (G, ζ 0 ) instead, and we take the average of the cost of the linear homotopy for an input F drawn from N (0, I) we obtain our second result, namely which gives an average total cost of O(D 3 nN 2 µ 2 max (G)). This bound deceptively suggests that the solution to Smale's 17th problem is near; it suffices to construct, for a given pair (n, d), a system G ∈ H d with µ max (G) = N O(1) . We say 'deceptively' because this innocent-looking task proved to be remarkably difficult. A first attempt to use this bound took as G the system given by  The quest for a construction of a good initial pair (G, ζ 0 ), raised already in the Bézout series, would not be closed until very recently [6]. But by then Smale's 17th problem had already been solved.
3.4. The solution. The final solution of Smale's 17th problem by Pierre Lairez [16] relies on a beautifully ingenious idea. Beltrán and Pardo had avoided constructing a pair (G, ζ 0 ) that would work for all inputs F ∈ S(H d ) via a randomized construction: the pair (G, ζ 0 ) depended on random reals drawn from a random number generator. Lairez's solution uses this construction with a twist: the random numbers are extracted from F itself.
To better understand this idea, let's consider a situation simplified to the extreme. Take a random number F ∈ [0, 1] from the uniform distribution (by glueing the extremes of this interval we can look at it as the circle S 1 , a simplified version of S(H d )). Fix a number ∈ N. We can associate to F the numbers The first, the truncation of F , approximates F by taking the first bits of its base-2 expansion. It satisfies F − F • ≤ 2 − . The second, the fractional part of F , is uniformly distributed in [0, 1] and independent of F • . Lairez devised a similar procedure now for systems F ∈ S(H d ) drawn from the uniform distribution. The resulting systems F • and R • are both in S(H d ) and satisfy that F • approximates F , and that R • is nearly uniform and nearly independent from F • . The fractional part R • is then used as random source to produce a pair (G, ζ 0 ) ∼ ρ std by a version of the BP algorithm which replaces the set R of numbers obtained from a random number generator by the coefficients of R • . Once this done, a linear homotopy is performed with initial pair (G, ζ 0 ) and target system F • . Had we have started with F ∼ S(H d ) and the set R (the random numbers in the call to BP) independent of F , the systems F and G would be uniform in S(H d ) and independent, and the execution LinHom(F, G, ζ 0 ) would end on an approximate zero of F . As it happens, we are starting with F • and R • nearly independent -but not exactly so-and F • close to F -but not exactly there-.
The fact we want to rely on is that the larger is the better are these approximations. And that we can quantify this dependence. Lairez used this fact to devise a simple test that ensures that the curve C • associated to the triple (F • , G, ζ 0 ) is close enough to the curve C associated to (F, G, ζ 0 ), that the approximate zero of F • obtained by following C • is also an approximate zero of F , and that the average complexity analysis can still be carried on. It suffices to ensure that, at every step of the linear homotopy, where is an easy-to-compute quantity satisfying d S (F, F • ) ≤ . Hence, (and a fortiori d S (F, F • )) must be smaller than (151 D 3/2 µ 2 norm (Q, z)) −1 all along (a neighborhood of) C • . As we don't know a priori a bound for the maximum of these µ 2 norm s, the final algorithm starts with an initial value for , computes the associated F • and R • , computes (Q, ζ 0 ) from R • , and calls for LinHom(F • , G, ζ 0 ). (25) is satisfied at all steps, then the returned point in P n is an approximate zero of F . If at some step (25) is violated then the algorithm replaces by 2 and starts again. The complexity analysis in [16] shows that the average total cost of this deterministic procedure is O(nD 3/2 N 2 ). This settles Smale's 17th problem.

Subsequent Progress
The positive answer in Lairez's paper did not bring to a stop the interest on Smale's 17th problem. On the contrary, a number of questions naturally arose.

Eigenpair computations.
One such question dealt with applying the ideas for the design and analysis of linear homotopies to specific systems of equations which do not entirely fit the framework above. A clear example is the computation of eigenpairs. Recall, an eigenpair of a matrix A ∈ C n×n is a pair (λ, v) ∈ C × P n−1 satisfying Av = λv. This equality amounts to the n equations A n v = λv n where A j denotes the jth row of A. These equations are linear in the variables v 1 , . . . , v n and quadratic in the set of all the variables. There are n equations with (generically) a finite number of of solutions in C × P n−1 but, in glaring contrast with the general framework, we expect system (26) to have only n solutions, not 2 n . A tailor-made approach is necessary. Algorithms computing eigenpairs have existed for long. But their analysis, notwithstanding their practical performance, has been wanting. This moved James Demmel [12, pg. 139] to write So the problem of devising an algorithm [for the eigenvalue problem] that is numerically stable and globally (and quickly!) convergent remains open. An answer to this problem was given in [3], where the general lines in the previous sections -devising an appropriate condition number and a step-length computation in terms of it, analysing the number of steps of the linear homotopy as in Proposition 3.1, performing a smoothed analysis of the condition number as in (24), and exhibiting a good initial triple (A 0 , λ 0 , v 0 ) (which in this context turns out to be easy!)-allowed one to show an O(n 7 ) bound for the average cost of computing eigenpairs. We note that this algorithm, even though its observed average complexity in simulations is O(n 3.66 ), is not efficient when compared with the eigensolvers available in packages such as Matlab or Julia. But it answers Demmel's question in the sense that it was the first eigensolver for which correctness, numerical stability, and a polynomially bounded average cost could be established.

Rigid homotopies.
Probably the most obvious question raised by the solution to Smale's 17th problem dealt with possible complexity improvements. Because just reading the system F takes time N , a lower bound of Ω(N ) is clear. The upper bound for the average cost of LV is O(nD 3/2 N 2 ). We observe that in this bound the crucial parameter is N . To see why, think on the case n = D. In this case, N ≥ n+D n = 2n n ∼ 4 n √ πn . That is, N is exponentially larger than n (and D). We therefore want to understand how small can we make the exponent of N . We already noted that the cost of each iteration of the homotopy is O(N ). This cannot be improved. The question then becomes, can one decrease the expected number of iterations (without paying for it in the cost of each iteration)?
A first answer towards this goal was given in [2] where a more elaborated computation of ∆τ , which draw from ideas in [3], allowed for a O(D 3/2 nN 1/2 ) bound for the expected number of iterations (and hence an O(D 3/2 nN 3/2 ) bound for the average total cost).
An optimal answer was provided by Lairez [17]. The involved ideas were new and touched both the algorithm and its analysis.
Let us begin with the algorithmic ideas. The innovation here is to perform a homotopy that keeps invariant the shape of the hypersurfaces f 1 = 0, . . . , f n = 0 (hence the name rigid) continuously deforming only their position within P n . To describe how this is done, denote by U (n + 1) the group of unitary matrices of dimension n + 1. This group has an obvious action on P n ; for any u ∈ U (n + 1) and z ∈ C n+1 \ {0} we define u([z]) := [u(z)]. It also acts on the space H d of homogeneous polynomials of degree d in X 0 , . . . , X n . If f ∈ H d and u ∈ U (n + 1) then uf := f • u −1 is easily seen to be in H d . Its zero set Z(uf ) is just a rotation of Z(f ); indeed, Z(uf ) = u(Z(f )). This action extends componentwise to systems. Let U := U (n + 1) n . If F ∈ H d and u = (u 1 , . . . , u n ) ∈ U then uf := (u 1 f 1 , . . . , u n f n ) ∈ H d .
Assume next that we are given a pair (v, ζ 0 ) ∈ U ×P n such that Q 0 (ζ 0 ) = 0 where Q 0 := vF . We can then start a homotopy with the pair (Q 0 , ζ 0 ). The breakthrough lies on the fact that the segment we will lift is not the segment [Q 0 , F ] ⊂ H d but the geodesic [v, I] ⊂ U. Here v = (v 1 , . . . , v n ) and I = (I, . . . , I). This lifting lives in the solution variety (compare with (19) and (23)) In practice, we don't need to use the geodesic [u, I]; we can use instead any path {u t } t∈[0,T ] with endpoints v and I and satisfying a few conditions (Lairez imposes being 1-Lipschitz and having length at most 4n). The corresponding systems in the homotopy are then   The general arrangement in Lairez's algorithm is otherwise the same as in LV. We first generate the random pair (u 0 , ζ 0 ) ∈ V U . To do so Lairez develops in this context a version of the BP algorithm which draws O(n 3 ) real numbers from the standard Gaussian distribution and performs O(n 4 + nD 4 ) arithmetic operations. The distribution ρ U in V U induced by this version of BP is similar to ρ std ; it just replaces G ∈ S(H d ) by u ∈ U in (20(i)), and G by uF in (20(ii)).
After which, the homotopy proceeds as in the linear case, with steps now in the The computation of the length of these steps is what brings us to the innovations in the analysis of the homotopy. Lairez observed that in the derivation of B leading to (18) two inequalities are coarse. Firstly, (12), which bounds the variation of the zero ζ as a worst-case variation. Secondly, the Higher Derivative Estimate (14) that replaces γ by µ norm . In both cases the use of µ norm (F, z) is computationally practical but leads to coarse bounds. One would like to use measures serving the same purposes, yielding finer bounds, and being also inexpensive to compute. To achieve this, Lairez splits condition at a point into two components. Given F ∈ H d and z ∈ P n , we let The incidence condition number of F at z is then given by Here † denotes Moore-Penrose inverse. When F (z) = 0 this quantity depends only on the angles made by the tangent spaces at z of the n hypersurfaces Z(f i ) := {z ∈ P n | f i (z) = 0}. Moreover, κ(uF, z) satisfies a version of (12) w.r.t. variations of u in U (see [17,Lem. 16]). That is, we have For the other ingredient, we define, for f ∈ H d and z ∈ P n , . In contrast with κ(F, z),γ Frob (F, z) does not depend on how the hypersurfaces Z(f i ) intersect at z but only on the geometry of each individual hypersurface at z.
The split Frobenius γ number captures both aspects. Moreover, it bounds γ(F, z) as we have thus providing us with a replacement for (14) which, we will see, turns out to be better for our purposes.
With (27) and (29) respectively replacing (12) and (14) we can reason as in §3.1 and use now Lipschitz bounds for κ and γ Frob to derive an expression for the step length of the form ∆t = c κ(u i F, z i )γ Frob (u i F, z i ) for some constant c. This, in turn, leads to a bound for the number of steps in the homotopy of the form (compare with Proposition 3.1) for some constant C. Furthermore, Lairez [17,Prop. 17] shows that, for a given F ∈ H d with square-free components, and that [17,Lem 38], for a Gaussian f ∈ H d , We can use this inequality to bound, for a Gaussian F ∈ H d , the value of EF ∼N (0,I) E(u,ζ)∼ρ Uγ 2 Frob (uF, ζ). Indeed, if (u, ζ) ∈ V U is ρ U -distributed then u 1 ζ, . . . , u n ζ are zeros of f 1 , . . . , f n , uniformly distributed in Z(f 1 ), . . . , Z(f n ), respectively, and independent [17,Thm. 8]. Also, if F is Gaussian, so are its components f i . It follows that At this stage, the reasoning continues as in §3.2. One firstly shows that when F is Gaussian (or, equivalently, when F ∼ S(H d )) and (u, ζ) ∼ ρ U , the pair (u t , ζ t ) in (30) is also ρ U -distributed in V U . Hence For the second equality we used that κ andγ Frob are independent. For the last inequality we used that F has square-free components almost surely (along with (31)) and Jensen's inequality (along with (32)). This is the bound we wished for. It shows that the average number of steps in the rigid homotopy is polynomial in n and D. It only remains to be seen that the cost of each such step is still linear in N . Lairez [17,Cor. 34] shows that this is the case: the cost of each step is O(n 2 D 2 N ). It may be worth to describe, nonetheless, how γ Frob (f i , z) is computed for a given f i ∈ H di and a point z ∈ C n+1 as this computation is not straightforward.
The crucial observation is the following. Let g(X) ∈ H di be the shifted polynomial f i (z + X). This is no longer a homogeneous polynomial. For k = 0, . . . , d i let g [k] denote the homogeneous component of degree k of g. Then D k z f = D k 0 g and [17, Lem. 30] where, we emphasize, the norm of the last term is the Weyl norm. This equality provides a way to compute the left-hand side: compute g, then its component g [k] , and finally the Weyl norm of the latter. The cost of this preocedure is dominated by the cost of the computation of g.
A naive approach to compute g leads to an Ω(N 2 i ) which is too large. But Lairez devised a procedure to compute g whose cost is O(nd 2 i N i ) [17,Prop. 32]. Since the computation of D z f i −1 has cost O(N i ) it follows that we can compute γ Frob (f i , z) with cost O(nd 2 i N i ). Furthermore, if u i ∈ U (n + 1) we have γ Frob (u i f i , z) = γ Frob (f i , u i z) so that we can compute γ Frob (u i f i , z) with the same cost. We conclude that, given F ∈ H d , u ∈ U, and z ∈ C n+1 , we can computeγ Frob (uF, z) with cost O(nD 2 N ).
The other ingredients in the computation of the step-length being simpler, one concludes the analysis with a total cost for this computation of O(n 2 D 2 N ). In conjunction with the bound for the average number of steps in (33), and the O(n 4 + nD 4 ) cost of computing the initial pair, we end up with a O(n 6 D 4 N ) for the average total cost of computing one zero of F . The exponent 1 on N is optimal.

Structured systems. Consider the polynomial
We can describe it via the list {(a 0 , (3, 2, 3)), (a 1 , (0, 5, 3)), (a 2 , (8, 0, 0)), (a 3 , (1, 2, 5))} which, even with the integers 3, 2, 3, . . . written in binary, is substantially shorter than the list of all the 8+3 3 = 165 coefficients of a generic polynomial in H 8 (in X 0 , X 1 , X 2 ). The list in (36) is a sparse encoding of the polynomial above; the list of all coefficients (in some preestablished order) is known as dense encoding. It is the one we have been considering in all the previous sections and has a size of Θ(dim C H d ). It is easy to see that the sparse encoding of a polynomial cannot be much bigger than its dense encoding but can be way smaller. A price for this succinctness is fragility: a number of operations with polynomials destroy sparseness. Think for instance in quotient and remainder applied to the polynomials X n 0 − X n 1 and X 0 − X 1 . They are both sparse (in the sense of having only two monomials) SMALE'S 17TH PROBLEM: ADVANCES AND OPEN DIRECTIONS 251 but their quotient is X n−1 0 + X n−2 0 X 1 + · · · + X n−1 1 which has n monomials. Only writing this quotient has a high cost in the sparse size of the data, in contrast with its cost as a function on the dense size, which is small.
Another way to describe a polynomial, which can be even more succint than a sparse encoding, is as a straight-line program (SLP for short). The sequence of operations (starting with the variables X 0 and X 1 ) 16 . . .
performs n + 1 operations and ends up in a polynomial f of degree 2 n for which both the dense and sparse encodings have size Ω(2 n ). We can encode f via the sequence of operations above and this encoding has size O(n). Not unexpectedly, the catalog of operations which are expensive in terms of the SLP encoding is bigger. The only operation that is clearly efficiently performed in terms of the SLP size, besides arithmetic operations with polynomials, is evaluation: given an SLP in n variables X 1 , . . . , X n encoding a polynomial f and a point z ∈ C n , the computation of f (z) ∈ C takes time linear in the SLP size. A fundamental result of Baur and Strassen [5] shows that we can compute all of f (z), ∂ X1 f (z), . . . , ∂ Xn f (z) with essentially the same cost. The question which is naturally posed is the following: What is the cost of computing a zero (S17-Sparse) of a system in terms of its sparse (or SLP) size?
Before attempting any answer to this question it is worth to look at it with more detail. Consider the sparse size, to fix ideas. A first remark is that the set of polynomials in H d having some coefficient equal to 0 has measure zero in H d . Consequently, the average cost over H d in terms of the sparse size of systems in H d will be the same as that in terms of its dense size. One may refine the question above and consider only polynomials with a given monomial structure. For instance, one of the polynomials may be imposed to have the form in (35). The average (regarding this polynomial) would then be taken, not over the whole of H 8 , but over the tuples (a 0 , a 1 , a 2 , a 3 ) of coefficients, say from a Gaussian distribution on C 4 . Other components of the input system F ∈ H d may also have a (possibly different) fixed monomial structure.
This determines a space of inputs which is a linear subspace L of H d over which a Gaussian measure is naturally defined (or a uniform measure on its associated unit sphere). And suggests that an approach as described in Sections 2 and 3 could be possible. Unfortunately though, such an approach is plagued with difficulties.
Firstly, one observes that it may happen that all systems in L are singular. Indeed, this is the case, for instance, if a component of F has the form because all the zeros of F on the hypersurface {X 0 = 0} are double. For systems in such an L any linear homotopy will loop forever and the average complexity of the algorithm will be infinity. Secondly, L is not unitary invariant. In general, given F ∈ L and u ∈ U, we don't have uF ∈ L. This deprives us from a very powerful technical tool.
Nonetheless research in homotopy methods for sparse systems has been carried out for decades. And even though there are no conclusive results, advances have been made. The state of the art in this quest can be found in the papers [18,19] and in the references therein.

4.4.
Low-complexity systems. The framework of rigid homotopies introduced by Lairez naturally considers (possibly small) subsets of H d . Indeed, for a given input F ∈ H d the space where the homotopy path leading to F lives in is not the whole of H d (as in Sections 2 and 3) but its subset UF := {uF | u ∈ U}. The initial system G is not arbitrary in H d but of the form vF for some arbitrary v ∈ U. In addition, if L(F ) denotes the evluation complexity of F (that is, the length of the shortest SLP computing F ) then L(uF ) ≤ L(F ) + (n + 1) 4 . In particular, if F has a low evaluation complexity then all systems u t F in the homotopy path have low evaluation complexity as well.
These considerations are the motivation behind [11], where the goal is to devise an efficient rigid homotopy for low-complexity systems. On a first approach, a system F ∈ H d is fixed and random inputs are considered with the form uF where u is drawn from the uniform distribution in U. This mimics the framework in §3.2 replacing random F, G ∈ S(H d ) by random u, v ∈ U.
At a first glance, it would seem that the algorithm in [17] can be applied without modifications. Unfortunately, a careful look at the details shows that there is a difficulty. For f ∈ H d , the computation of γ Frob (f, z) described at the end of §4.2 relies on computing Weyl norms g k for the homogeneous components g k of the polynomial g := f (z + X). If f is given by a short SLP then g is given by an equally short SLP. This is clear. Lemma 3.4 in [11] shows that we can actually do more: for any w ∈ C n+1 we can compute the values g 0 (w), . . . , g d (w) with cost O(dL(f ) + n + log d). So, the evaluation complexity of the g k s is also small. We are left with the problem of computing their Weyl norms (obviously, without expanding the g k s to obtain their coefficients; this would be too expensive). The way out lies on (yet another) useful property of the Weyl norm, namely, that for g ∈ H d , This allows for a randomized algorithm to approximate g : sample s points w 1 , . . . , w s from S(C n+1 ), compute the empirical average M := 1 s |g(w i )| 2 and return n+d d M . The cost of this algorithm is low (it relies on evaluating g) but one pays for it both in terms of precision -we only obtain an approximation of g -and of certainty -we only have a probabilistic guarantee of correctness. More precisely, the following is Theorem 3.3 in [11].
Theorem 4.1. There is a randomized algorithm which, given f ∈ C[X 0 , . . . , X n ] as a black-box function, an upper bound d on its degree, a point z ∈ C n+1 , and some ε > 0, returns a number G ≥ 0 satisfying The algorithms in §3.2, §3.3, and §4.2 are all of Las Vegas type. This means that they are randomized algorithms (they draw random numbers during their execution) whose output is guaranteed to be correct: only their running time is random (for each input data). The algorithm in Theorem 4.1 is also a randomized one but of Monte Carlo type. Its running time is deterministic but its output is correct only with a given probability.
One can replace in the rigid homotopy the computation of γ Frob (f, z) described at the end of §4.2 by the randomized algorithm in Theorem 4.1. In doing so, however, the rigid homotopy may be affected in two ways. If the γ Frob s computed along the homotopy are overestimated (the upper bound in Theorem 4.1 does not hold) then the corresponding step-lengths are too short and the complexity bounds no longer hold. If, instead, some γ Frob is underestimated (the lower bound in Theorem 4.1 does not hold) then the correctness of the rigid continuation is compromised. The algorithm devised in [11], called there BoostBlackBoxSolve, deals with both problems. The next result (Theorem 1.1 in [11]) states its properties regarding termination and correctness. . . , f n ) be a homogeneous polynomial system with degrees at most D in n + 1 variables having only regular zeros. On input F , given as a black-box evaluation program, and ε > 0, Algorithm BoostBlackBoxSolve terminates almost surely and computes a point z ∈ P n , which is an approximate zero of F with probability at least 1 − ε. 2 Because of its reliance on the Monte-Carlo computation of the γ Frob s, Algorithm BoostBlackBoxSolve is itself of Monte Carlo type: the output is only correct with given probability. Nonetheless, this probability is bounded below independently of F . The algorithm succeeds with probability at least 1−ε for any input F with regular zeros (of course, it may not terminate if this regularity hypothesis is not satisfied as the homotopy path may lead to a singular zero).
We are next interested on the average cost of BoostBlackBoxSolve over random data of the form uF for a fixed F and random u ∈ U. As F itself is fixed, we should expect this cost to depend with the geometry of F . We can express this dependence in terms of γ Frob . We define, for f ∈ H d , where the distribution on Z(f ) ⊂ P(C n+1 ) is the uniform, and, for F ∈ H d , This quantity is a measure of regularity (or conditioning) of the set of hypersurfaces {Z(f 1 ), . . . , Z(f n )}. If all these hypersurfaces have only algebraically regular points then Γ(F ) < ∞. But the converse is not true. A necessary and sufficient condition for the finiteness of Γ(F ) is yet to be found.

FELIPE CUCKER
The complexity of BoostBlackBoxSolve, is bounded in the next result [11,Thm. 1.2]. Theorem 4.3. Let F = (f 1 , . . . , f n ) be a homogeneous polynomial system with degrees at most D in n + 1 variables. Assume that, for all i ≤ n, f i is squarefree. Let u ∈ U be uniformly distributed. Then, on input uF , given as a black-box evaluation program, and ε ∈ [0, 1 4 ], Algorithm BoostBlackBoxSolve terminates after (n, D) O(1) · L(F ) · Γ(F ) log Γ(F ) + log log ε −1 operations on average. "On average" refers to expectation over both the random draws made by the algorithm and the random variable u, but F is fixed. 2 The hypothesis on F (the fact that all the Z(f i ) are regular) implies that each f i is square-free. Hence, by Theorem 4.2, ensures that the algorithm terminates almost surely. It further implies that the quantity Γ(F ) is finite and, therefore, so is the bound in Theorem 4.3. We note that this bound is polynomial in n and D, linear in L(F ), almost linear in Γ(F ), and linear in log log ε −1 . The latter means that we can take ε = 2 −2 100 without afterthoughts as to the effect on the running's cost. For all practical purposes the algorithm behaves as one with certified outputs.

Algebraic branching programs. Theorem 4.3 can be extended to subsets
M ⊂ H d other than UF (for a given F ∈ H d ). The crucial requirement on M is that it is unitarily invariant in the following sense: M is endowed with a probability distribution such that for all F ∈ M and all u ∈ U, uF ∈ M , and F and uF are identically distributed. Trivially, the set UF is unitarily invariant.
Final average complexity bounds will now depend not on Γ(F ) for a particular F but on the quantity Ef∈M Γ(F ) 2 . The second main result in [11] estimates this quantity for a family of well-known subsets M given by SLPs known as Algebraic Branching Programs (ABP) introduced by Nissan [20].
To understand ABPs let us consider a simple example. Consider the matrices   whose entries are linear forms in (X 0 , X 1 ). We can take the product as well as its trace f A (X) = 7X 2 0 + 47X 0 X 1 − 42X 2 1 . The coefficients of this polynomial in H 2 are simple functions of the entries in A 1 and A 2 . In this example, the former is an element in C 3 and the latter an element in C 24 . Clearly, the map (A 1 , A 2 ) → f A is surjective. But this needs not to be so.
Let r = (r 1 , . . . , r d ) ∈ N d , r 0 = r d , and M r (n + 1) be the space of d-tuples of matrices A = (A 1 (X), . . . , A d (X)) where A j (X) is a r j−1 × r j matrix whose entries are linear forms on (X 0 , . . . , X n ). This is a complex linear space of dimension (n + 1) d j=1 r j−1 r j . For A ∈ M r (n + 1) we consider the homogeneous polynomial of degree d f A (X) := tr(A 1 (X) · . . . · A d (X)).
For large enough r 1 , . . . , r d the map A → f A is a surjection M r (n + 1) → H d , but for smaller values dim C M r (n + 1) < dim C H d . Tuples of polynomials described by ABPs give us a natural example of an M ⊂ H d for which we can actually efficiently run the rigid homotopy.
To show this, we endow M r (n + 1) with the Hermitian norm Frob where e i is the ith unit vector in C n+1 . With this norm at hand we can define the standard Gaussian distribution on M r (n + 1), via the density π dim C Mr(n+1) e − A 2 . We say that an ABP is irreducible when r 1 , . . . , r d−1 ≥ 2. The second main result in [11], Theorem 1.5, is then the bound for irreducible random ABPs Because the distribution of a Gaussian ABP is unitarily invariant one can derive from Theorems 4.2 and 4.3 the following [11, Corollary 1.6]. It is remarkable that the values r j do not occur on the right-hand side of (38). They do occur, however in the bound in Corollary 4.4 as, it is easy to see using iterated matrix multiplication, L = O(nDr 3 ) where r = max r j .