Complexity and randomness in the Heisenberg groups (and beyond)

By studying the commuting graphs of conjugacy classes of the sequence of Heisenberg groups $H_{2n+1}(p)$ and their limit $H_\infty(p)$ we find pseudo-random behavior (and the random graph in the limiting case). This makes a nice case study for transfer of information between finite and infinite objects. Some of this behavior transfers to the problem of understanding what makes the character theory of the uni-upper-triangular group (mod p) "wild". Our investigations in this paper may be seen as a meditation on the question: is randomness simple or is it complicated?


In memoriam Vaughan Jones.
This paper is dedicated to Vaughan Jones. Both of the authors knew Vaughan, both from Berkeley and through being invited speakers to the seminars he organized in New Zealand, in Kaikoura and Hanmer Springs, and in Napier, respectively. He was a wonderful, lively human being, always trying to communicate and get you involved with whatever current project he had in focus, from wind surfing to the mathematics of tangles. He was happy to explain anything he knew about 'in English'.
He also liked to keep things lively. For example, during visits to Kaikoura, Diaconis and Vaughan's great friend Hugh Woodin fell in love with a New Zealand chocolate bar, 'Peanut Slabs'. After the neighborhood supply was exhausted, they made forays to nearby towns and were surprised to find "somebody came in and bought out our stock yesterday". Vaughan had spotted their weakness and cornered local supplies. He walked through the conference the next day preening (and munching on a peanut slab (sigh)). He shared later, but the story went on. Diaconis routinely gives five or so talks a year at Berkeley. For the NEXT SEVERAL YEARS, walking into the conference room, there was a peanut slab on the lecturer's table. Whether it was Vaughan or Hugh, it was a wonderful inducement.
The conjugacy classes of $H_{2n+1}(p)$ and $H_\infty(p)$ are the elements of the center and the nontrivial cosets of the center. Let $[x, y, *] = C_{xy}$ denote these non-central classes. Form a graph $\Gamma(H_{2n+1})$ (respectively $\Gamma(H_\infty)$) with vertices the elements of $H_{2n+1}(p)$ (respectively $H_\infty(p)$) and an edge between two elements if they commute; such a $\Gamma$ is often called a commuting graph. [For some purposes we may want to include a loop at each vertex, because an element commutes with itself.] Observe that if $[x, y, z] \in C_{xy}$ and $[a, b, c] \in C_{ab}$ commute then all elements of $C_{xy}$ commute with all elements of $C_{ab}$. Thus, in $\Gamma$, each conjugacy class $C_{xy}$ forms a complete graph, and the induced bipartite graph between any two distinct conjugacy classes $C_{xy}$ and $C_{ab}$ is either complete or empty. So we may also form the quotient graph $\overline{\Gamma}(H_{2n+1})$ (respectively $\overline{\Gamma}(H_\infty)$) with vertices the non-central conjugacy classes and an edge from $C_{xy}$ to $C_{ab}$ if their elements commute, that is, if $xb - ay = 0$. A subtlety to note in this definition is that the quotient group mod the center is elementary abelian.
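It will be useful to know that (Fact 1.1), for a finite group $G$ with $c(G)$ conjugacy classes, the number $e(G)$ of ordered pairs of commuting elements is $e(G) = |G|\,c(G)$. In the case of $H_{2n+1}(p)$, we have seen the conjugacy classes are determined by $2n$ entries; subtract the all-zero entry, and add the $p$ central elements, so $c(H_{2n+1}) = p^{2n} + p - 1$. So Fact 1.1 gives $e(H_{2n+1}) = |H_{2n+1}|\,c(H_{2n+1}) = p^{2n+1}(p^{2n} + p - 1)$. Dividing both sides by $p^{2(2n+1)}$ shows that if two elements of $H_{2n+1}(p)$ are chosen at random, the chance that they commute is $1/p + (p-1)/p^{2n+1}$, which is about $1/p + 1/p^{2n}$.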
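The count above can be checked by brute force for small cases. The following is a minimal sketch, assuming the group law $[x,y,z][a,b,c] = [x+a,\ y+b,\ z+c+x\cdot b]$ (the construction is not restated in this section, so this law is an assumption, chosen to agree with the commuting condition $xb - ay = 0$); the function names are illustrative only.

```python
from itertools import product

def heisenberg_elements(p, n=1):
    """All triples (x, y, z) with x, y in F_p^n and z in F_p."""
    vectors = list(product(range(p), repeat=n))
    return [(x, y, z) for x in vectors for y in vectors for z in range(p)]

def multiply(g, h, p):
    """Assumed group law: [x,y,z][a,b,c] = [x+a, y+b, z+c+x.b] (mod p)."""
    (x, y, z), (a, b, c) = g, h
    dot = sum(xi * bi for xi, bi in zip(x, b))
    return (tuple((xi + ai) % p for xi, ai in zip(x, a)),
            tuple((yi + bi) % p for yi, bi in zip(y, b)),
            (z + c + dot) % p)

def count_commuting_pairs(p, n=1):
    """Brute-force count of ordered pairs (g, h) with gh = hg."""
    elems = heisenberg_elements(p, n)
    return sum(1 for g in elems for h in elems
               if multiply(g, h, p) == multiply(h, g, p))

def predicted_commuting_pairs(p, n=1):
    """Closed form p^(2n+1) * (p^(2n) + p - 1) from the text."""
    return p ** (2 * n + 1) * (p ** (2 * n) + p - 1)
```

For instance, `count_commuting_pairs(3)` can be compared with `predicted_commuting_pairs(3)`; dividing by the square of the group order gives the commuting probability.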
Section two contains background material on extraspecial p-groups, quasirandomness, 'the' random graph, axioms for $H_\infty(p)$, symplectic spaces, and commuting graphs. In section three we give a 'bare hands' proof that the commuting graphs of the Heisenberg groups $H_{2n+1}(p)$ are quasi-random when $n$ is large for fixed $p$. Section four shows that $H_\infty$ contains the random graph as an induced subgraph in a strong sense (it is essentially generated by it) and so enjoys many parallel properties of randomness. Section five connects these random properties of the Heisenberg group to the group $UT(n, p)$ of uni-upper-triangular matrices with entries in $\mathbb{F}_p$. It begins with a separate literature review on the difficulties of describing the conjugacy classes or characters of $UT$. This may be read now for further motivation.

Background
This section gives background material on Heisenberg and extraspecial groups (2.1), the random graph (2.2), quasirandomness (2.3), axioms (2.4), symplectic spaces (2.5), and commuting graphs (2.6). Since this is an interdisciplinary effort we have tried to bring the various topics to life (and to articulate one new open problem).
Note: in our main cases below, p will be an odd prime.
2.1. Heisenberg and extra-special p-groups. The construction of $H_{2n+1}$ and $H_\infty$ given above makes sense for any ring. Over $\mathbb{R}$, the Heisenberg groups form a central part of analysis (if this seems like overselling, see [Howe]). Over $\mathbb{Z}$, they are a basic ingredient of theta functions in many variables; see [Mumford]. We will work over the finite field $\mathbb{F}_p$. Extraspecial p-groups were introduced by Philip Hall and had early success in the Hall-Higman paper on Burnside's problem [Hall-Higman].
They are a standard topic in group theory texts [Aschbacher] [Suzuki] [Huppert]. It is known that $N = 2n + 1$ is forced. There are two nonisomorphic such groups (fixing $n$, $p$): our Heisenberg groups $H_{2n+1}(p)$ (distinguished by having every element of order at most $p$) and the groups $M_{2n+1}(p)$ which have elements of order $p^2$.$^{1}$ For example $M_3(p)$ may be constructed as a semi-direct product, letting $C_p$ act on $C_{p^2}$ by $j \cdot k = (1 + jp)k \bmod p^2$. $M_{2n+1}(p)$ may be constructed as a central product of $H_{2n-1}$ and $M_3(p)$. So the present constructions for $H_{2n+1}$ may be easily mirrored for $M_{2n+1}(p)$. Indeed, $M_{2n+1}(p)$ and $H_{2n+1}$ have the same character table. N.B. while there is only one $H_{2n+1}$ and only one countable $H_\infty$ there are two distinct countable $M_\infty$'s [Hall].$^{2}$
The Heisenberg groups occur throughout mathematics and physics; there is even a literature on probability for these groups. To explain, work in $H_3(p)$. A minimal generating set is $S = \{[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]\}$. A Markov chain on $H_3(p)$ based on this is the random walk that, at each step, multiplies the current position by an element of $S$ chosen uniformly at random. The work cited below shows that this Markov chain has a uniform stationary distribution and order $p^2$ steps are necessary and sufficient for convergence to stationarity. The proofs use the eigenvalues and eigenfunctions of the associated Laplace operator $L(x, y) = I - K(x, y)$, where $K$ is the transition kernel of the walk. The spectrum of the Laplacian is a basic ingredient in understanding the geometry of the underlying space. See [Jorgenson and Lang], [Liu], and [Bump et al.], [Diaconis-Hough] which contain reviews.
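To get a feel for the walk on $H_3(p)$ generated by $S$, one can iterate the distribution exactly for a small prime and watch the total variation distance to uniform shrink. This is a minimal sketch under two labeled assumptions: the group law $[x,y,z][a,b,c] = [x+a,\ y+b,\ z+c+xb]$, and left multiplication by a uniformly chosen generator at each step. It illustrates convergence only and says nothing about the order $p^2$ rate.

```python
def multiply(g, h, p):
    """Assumed group law on H_3(p): [x,y,z][a,b,c] = [x+a, y+b, z+c+x*b]."""
    (x, y, z), (a, b, c) = g, h
    return ((x + a) % p, (y + b) % p, (z + c + x * b) % p)

def step(dist, p):
    """One step of the walk: multiply on the left by a uniform element of S."""
    gens = [(1, 0, 0), (p - 1, 0, 0), (0, 1, 0), (0, p - 1, 0)]  # p-1 plays the role of -1
    new = dict.fromkeys(dist, 0.0)
    for g, mass in dist.items():
        for s in gens:
            new[multiply(s, g, p)] += mass / 4
    return new

def tv_after(p, steps):
    """Total variation distance to uniform after `steps` steps from the identity."""
    elems = [(x, y, z) for x in range(p) for y in range(p) for z in range(p)]
    dist = {g: 0.0 for g in elems}
    dist[(0, 0, 0)] = 1.0
    for _ in range(steps):
        dist = step(dist, p)
    uniform = 1.0 / p ** 3
    return 0.5 * sum(abs(mass - uniform) for mass in dist.values())
```

Calling `tv_after(3, t)` for increasing `t` shows the distance decreasing toward zero, consistent with a uniform stationary distribution.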
In logic, the extraspecial p-groups have also been present for several decades; we mention a few central points. First, suppose $p \neq 2$. [Felgner] gave axioms $T_p$ for $H_\infty(p)$ and proved there is only one countable such group, up to isomorphism, see §2.4 below. It is easy to see that the infinite extraspecial p-groups are unstable in the sense of Shelah's classification theory [Shelah]: it suffices to find a formula $\varphi(x, y)$ and disjoint sets of distinct elements $\{a_i : i < \omega\}$, $\{b_j : j < \omega\}$ such that $\varphi(a_i, b_j)$ holds if and only if $i < j$. (Use the formula $[x, y] = 1$, conveniently our edge relation in $\Gamma$, and section three, or read the footnote on page 12.) Thus, as already observed by Felgner, Shelah's theory applies to show that there are $2^\lambda$ pairwise nonisomorphic extraspecial p-groups of exponent $p$ of each uncountable size $\lambda$; a more direct construction is in [Shelah-Steprans]. (Some parallel results on the case of exponent $p^2$ are in [Hall].) The extraspecial p-groups are a useful example of a (very) simple unstable theory (here "simple" can be read as both a model theoretic term and an English adjective), especially in their alternate guise as symplectic spaces over finite fields. See the work of [Cherlin-Hrushovski], and [Macpherson-Steinhorn]. In yet another direction, they were used by [Shelah-Steprans] to build so-called Ehrenfeucht-Faber groups, uncountable groups with every abelian subgroup of strictly smaller cardinality.
We conclude this discussion by stating what might be the first appearance of extraspecial p-groups.
2.2. The random graph. This remarkable object appears in graph theory and logic. There is a wonderful survey, with many readable, full proofs, in Peter Cameron's fine [Cameron]. As motivation, fix a large $n$, say 100. Pick two Erdős-Rényi $\tfrac12$-graphs randomly. What is the chance they are isomorphic? Intuitively, "small". How small? Well, the number of possible one to one maps from $[n]$ to $[n]$ is $n!$ and the number of graphs is $2^{\binom{n}{2}}$ so it is at most $n!/2^{\binom{n}{2}}$, small indeed. Now, let $n = \infty$. Pick two Erdős-Rényi $\tfrac12$-graphs at random, independently (flip independent fair coins for each pair of vertices for each graph). What is the chance they are isomorphic? The surprising answer: the chance is 1 (!). This accounts for the label "the random graph." There are many non-random constructions. Let $S$ be the set of all primes that are 1 mod 4. Put an edge from $p$ to $q$ if $\left(\frac{p}{q}\right) = 1$. [The Legendre symbol $\left(\frac{p}{q}\right) = 1$ if $p$ is a quadratic residue mod $q$; by reciprocity, this happens if and only if $\left(\frac{q}{p}\right) = 1$ so the graph is undirected.] The resulting graph is (isomorphic to) the random graph.$^{3}$ These graphs are robust. If $R$ is the random graph and $R'$ is obtained from $R$ by deleting any finite number of vertices and edges, then $R' \cong R$. If the vertex set is decomposed into finitely many subsets, the induced graph on at least one subset is isomorphic to $R$.
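Two of the statements above are easy to check numerically. The sketch below (illustrative helper names, standard library only) evaluates $\log_2$ of the bound $n!/2^{\binom{n}{2}}$, and checks on a finite range that the quadratic-residue relation on primes congruent to 1 mod 4 is symmetric, using Euler's criterion for the Legendre symbol.

```python
import math

def log2_isomorphism_bound(n):
    """log2 of n! / 2^C(n,2), the bound on the isomorphism probability."""
    return math.lgamma(n + 1) / math.log(2) - math.comb(n, 2)

def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q, via Euler's criterion."""
    r = pow(a, (q - 1) // 2, q)
    return 1 if r == 1 else (-1 if r == q - 1 else 0)

def primes_one_mod_four(limit):
    """Primes below `limit` that are congruent to 1 mod 4 (trial division)."""
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, math.isqrt(m) + 1))
    return [m for m in range(5, limit, 4) if is_prime(m)]

def residue_graph_is_symmetric(limit):
    """Check (p/q) == (q/p) for all distinct primes p, q = 1 mod 4 below limit."""
    ps = primes_one_mod_four(limit)
    return all(legendre(p, q) == legendre(q, p)
               for p in ps for q in ps if p != q)
```

For $n = 100$ the first function returns a value a little below $-4425$, so the bound is smaller than $2^{-4425}$.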
The random (Rado) graph has a universality property; it contains every finite or countable graph as an induced subgraph. In particular, it contains a copy of the infinite complete graph and an infinite graph with no edges. The Rado graph is highly symmetric: given any two finite isomorphic induced subgraphs there is an automorphism of $R$ extending this isomorphism.
$^{3}$ Added when updating the manuscript: In response to the present paper, to further investigate the discontinuity in passing from Erdős-Rényi graphs $G(n, \tfrac12)$ to the random graph $R$, Chatterjee and Diaconis asked in [Chatterjee and Diaconis]: Pick independent $G(n, \tfrac12)$ graphs $\Gamma_1$, $\Gamma_2$. What is the size of the largest induced isomorphic subgraph? They show it has order $4 \log_2 n$. In a similar vein, Alon [Alon] had shown that a random graph in $G(N, \tfrac12)$ is universal in containing all subgraphs of order $k$, where $k$ is of order $\log(N)$ (he has more precise results).
Logicians have long studied $R$; after all, a graph is just a symmetric binary relation. Say that a graph property $P$ holds in almost all finite graphs if the proportion of $N$-vertex graphs where $P$ holds tends to 1 (equivalently, if an Erdős-Rényi $\tfrac12$-graph has property $P$ with probability tending to 1). It is known that for any first order property in the language of graphs, $P$ holds in $R$ if and only if $P$ holds in almost all (finite) graphs. [This can be said as a zero-one law: let $P$ be a sentence in the first-order language of graphs. Then $P$ holds in almost all graphs or in almost no graphs.] We hasten to add that many interesting properties of graphs are not first order, connectedness and hamiltonicity, for instance; and much random graph theory is done with the chance of an edge tending to zero as $n$ tends towards infinity. For much more on logic and graph theory, see [Spencer].
Above, we explained how the study of a random walk and the associated Laplacian gives insight into the geometry of a space. There is a curious random walk on $R$ leading to open math problems and new understanding of the structure of $R$. To explain, it is easiest to use the following isomorphic description of $R$: Let $\mathbb{N} = \{0, 1, 2, \dots\}$. Make a graph on $\mathbb{N}$ via: for $i < j$, set $i \sim j$ if and only if the $i$-th bit of $j$ (in its binary expansion) is a 1. Thus, for instance, 0 is connected to all odd numbers, and 1 is connected to 0 and to all $j$ congruent to 2 or 3 mod 4. For the walk, fix a positive probability $Q$ on $\mathbb{N}$: $Q(j) > 0$, $\sum_{j \in \mathbb{N}} Q(j) = 1$. For $j \in \mathbb{N}$, let $N(j) = \{k : k \sim j\}$ be the neighborhood of $j$. For definiteness, say $Q(j) = 1/2^{j+1}$. (The same story works for any model of $R$ and the rate of convergence would be the same; of course this depends on the measure.) Define a Markov chain on $\mathbb{N}$ by:
$$K(i, j) = \begin{cases} Q(j)/Q(N(i)) & \text{if } j \sim i, \\ 0 & \text{otherwise,} \end{cases} \qquad Q(N(i)) = \sum_{k \in N(i)} Q(k).$$
Thus, from $i$, pick $j$ among the neighbors with probability $Q(j)$ (normalized to the neighbors). We were surprised to find this walk has a simple stationary distribution $\Pi$ (so $\sum_{i \in \mathbb{N}} \Pi(i)K(i, j) = \Pi(j)$). That is,
$$\Pi(i) = Q(i)\,Q(N(i))/Z$$
for $Z$ a normalizing constant. Indeed $\Pi(i)K(i, j) = Q(i)Q(j)/Z = \Pi(j)K(j, i)$ as is easy to check, so $\Pi$ is stationary. Standard theory says that, for any $i, j$, $K^\ell(i, j) \to \Pi(j)$ as $\ell \to \infty$, where $K^\ell(i, j)$ is the chance of going from $i$ to $j$ in $\ell$ steps. The question is to determine the rate of this convergence. Let $\|K_i^\ell - \Pi\|$ denote the total variation distance between $K^\ell(i, \cdot)$ and $\Pi$. We want to know how large to take $\ell$ so that $\|K_i^\ell - \Pi\| < \varepsilon$, and how it depends on the starting state $i$. Each $i$ is connected to half of the points in $\mathbb{N}$ and the diameter of $R$ is two. So, convergence to stationarity might be very rapid (a bounded number of steps suffice to make $\|K_i^\ell - \Pi\| < \varepsilon$ for all $i$). On the other hand, if $i$ begins with many zeroes, it can take a long time for the chain to get close to zero. We don't know and hope one of our readers will report.
The following computation shows that the starting state matters. It is easiest to use the Boolean model for $R$; the vertex set is $\{0, 1, 2, \dots\}$ and for $i < j$ there is an undirected edge from $i$ to $j$ if the $i$th bit of $j$ is a 1. Take the driving measure to be $Q(j) = 1/2^{j+1}$. (As explained below, the choice of $Q$ doesn't matter much; essentially the same argument works for other monotone decreasing $Q$.) Let $2^{(k)}$ denote the tower of 2's with the exponentiation iterated $k$ times. So $2^{(1)} = 2$, $2^{(2)} = 4$, $2^{(3)} = 16$ and we stipulate $2^{(0)} = 1$. Below, the easily proved inequality $2^{(k-1)} \le 2^{(k)} - 2^{(k-1)} \le 2^{(k)}$ will be used. The idea of the argument is simple. Start the Markov chain at $2^{(k)}$. The only smaller $j$ connected to $2^{(k)}$ is $2^{(k-1)}$. The larger $j$'s have super-exponentially small probability. Thus, in the first step, the walk goes $2^{(k)} \to 2^{(k-1)}$ with probability super-exponentially close to 1. Similarly, it goes $\to 2^{(k-2)} \to 2^{(k-3)} \to \cdots \to 2^{(k-\ell)}$ in the first $\ell$ steps. For any fixed $\ell$, the walk started at $2^{(k)}$ is most likely not at 0. But $\Pi(0) \ge \tfrac16$ so the walk cannot have converged after $\ell$ steps.
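The bit model and the tower numbers are easy to experiment with. The sketch below uses illustrative function names and exact rational arithmetic; the weight $Q(j) = 1/2^{j+1}$ is the one named in the text. It checks which smaller vertices a tower number is joined to, and approximates the $Q$-mass of the neighborhood of 0 by truncating the sum at a cutoff.

```python
from fractions import Fraction

def adjacent(i, j):
    """Bit model: for i < j, i ~ j iff the i-th binary digit of j is 1."""
    if i == j:
        return False
    lo, hi = min(i, j), max(i, j)
    return (hi >> lo) & 1 == 1

def tower(k):
    """Iterated exponential: tower(0) = 1 and tower(k) = 2 ** tower(k - 1)."""
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value

def smaller_neighbors(j):
    """All i < j that are adjacent to j."""
    return [i for i in range(j) if adjacent(i, j)]

def q_mass_of_neighbors(i, cutoff):
    """Partial sum of Q(j) = 1/2^(j+1) over neighbors j < cutoff of i."""
    return sum(Fraction(1, 2 ** (j + 1))
               for j in range(cutoff) if adjacent(i, j))
```

With these, `smaller_neighbors(tower(3))` returns `[4]`, so from 16 the only downward move is to 4, and the next neighbor above 16 is $2^{16}$, which carries mass at most $2^{-65536}$ under $Q$. The partial sums `q_mass_of_neighbors(0, cutoff)` increase to $1/3$; if the stationary weights are proportional to $Q(i)Q(N(i))$ (what detailed balance gives for a kernel of the form $Q(j)/Q(N(i))$), this yields $\Pi(0) \ge \tfrac12 \cdot \tfrac13 = \tfrac16$ because the normalizing constant is at most 1.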
Proposition 2.2. With notation as above, which completes the proof.
Remark. The arguments above are robust. They work if the starting state $2^{(k)}$ is replaced by any $j$ with first non-zero bit in position $2^{(k-1)}$. They work if the measure $Q$ is replaced by any monotone decreasing probability, e.g. $1/((j+1)(j+2))$.
2.3. Quasirandomness. We now turn to finite random graphs. Although, as we have just observed, there are many nonisomorphic Erdős-Rényi $\tfrac12$-graphs of a fixed finite size $N$, they generally share a series of rather special properties. For example, almost always (as $N$ grows), the edge distribution between pairs of reasonably sized subsets of vertices is regular in the sense of Szemerédi's regularity lemma; most vertices have degree about $N/2$; all labeled graphs on a fixed finite number of vertices occur asymptotically with the same frequency: say, (labeled) 4-cycles are not less frequent than cliques of size 4 or independent sets of size 4.
One of the remarkable discoveries of [Chung-Graham-Wilson], [Thomason] was that it is possible to identify a set of such (asymptotic) graph properties, a priori quite different from each other but shared by random graphs, which are in fact equivalent to each other in the sense that any graph satisfying one of these properties must necessarily satisfy all of them. Such graphs are called quasi-random. Quasi-random graphs form a broader class than random graphs, but nonetheless this framework gives a powerful and often easily checkable means of speaking approximately or asymptotically of random behavior.
There is by now an extensive literature on this subject. Some variously formulated lists of the equivalent properties can be found in [Chung-Graham-Wilson] §3 (note that for ease of exposition they restrict there to edge probability $\tfrac12$) or in [Krivelevich-Sudakov] Theorem 2.6 or [Gowers] Theorem 2.1. For a textbook development see [Lovasz].
2.4. First-order axioms. When considering the infinite Heisenberg groups, of which there are many of arbitrarily large infinite size, we switch from writing $H_\infty$ to $H_\omega$ to indicate the one which is countably infinite. Here we indicate why this is justified for countable, though as already noted above, not uncountable sizes, by way of reviewing axioms and notation.
Recall that for $p$ an odd prime, [Felgner] gave a set of first-order axioms $T_p$ in the language of groups (there is a binary function symbol $\times$ and a constant 1) which hold in $H_\omega(p)$ and which express: the axioms for group theory (associativity: for all $x, y, z$ we have $x \times (y \times z) = (x \times y) \times z$; identity: for all $x$, $x \times 1 = 1 \times x = x$; existence of inverse: for all $x$ there exists $y$ such that $x \times y = y \times x = 1$), that the center is a cyclic group of order $p$, that the derived group is contained in the center and is not trivial, and (infinitely many axioms expressing that) the factor group modulo the center is infinite. Note this avoids direct reference to the Frattini subgroup, the intersection of all maximal subgroups, which is not obviously first-order expressible since it quantifies over subgroups, rather than elements.
Logicians have various notions of equivalence that are subtly different. For a model $M$, write $M \models T$ to mean that the set of axioms $T$ all hold in $M$. ($T$ for theory, which is just a set of axioms, here always of first-order logic.) Write $M \equiv N$, pronounced elementarily equivalent, to mean that exactly the same axioms hold in $M$ and in $N$. Write $M \cong N$ to mean they are isomorphic. $M \cong N$ implies $M \equiv N$, but the reverse fails strongly by the upward and downward Löwenheim-Skolem theorems: if $M$ is an infinite model, say in a countable$^{4}$ language, then there is at least one $N \equiv M$ of every other infinite size. (See [Chang and Keisler] for standard model theory.) So to have a chance of determining the structure up to isomorphism we must specify not only the theory but the size, but often this is not enough: for example, there are many pairwise non-isomorphic algebraically closed fields of characteristic zero and countable size: one of transcendence degree $n$ for every $n \in \mathbb{N}$, and one of countable transcendence degree. (However, there is just one of any fixed uncountable size, because in that case the transcendence degree and thus the isomorphism type is determined.) When it is enough, that is, when there is, up to isomorphism, precisely one way to satisfy the axioms of $T$ on a set of size $\kappa$, $T$ is called $\kappa$-categorical.
Summarizing "there is exactly one countably infinite extraspecial p-group of exponent p" in our notation: $T_p$ is $\aleph_0$-categorical. (The background statement is that if $p$ is an odd prime, $H$ is a nonabelian finite or countable p-group of exponent $p$ such that $H' = Z(H)$ is cyclic and $H/Z(H)$ is elementary abelian, then whenever $D_1$, $D_2$ are finite subgroups of $H$ which are both extraspecial and both of the same size, every isomorphism from $D_1$ onto $D_2$ can be extended to an automorphism of $H$.) It follows that the theory $T_p$ is complete, meaning that for any other first-order sentence $\varphi$ in its language, either $\varphi$ or $\neg\varphi$ follows from $T_p$.
A note on first-order logic. Part of the art (in mathematics) is of course finding a balance between having enough expressive power to prove interesting theorems but not so much as to prevent abstraction. First order logic, in this sense, seems not unlike linear algebra: much interesting mathematics initially appears to escape it, but as, say representation theory shows for linear algebra (or, say, classification theory for first-order logic), already there is a remarkable explanatory power.
2.5. Symplectic spaces. There is a helpful correspondence between our extraspecial p-groups and non-degenerate symplectic spaces over $\mathbb{F}_p$, which we outline here with the referee's encouragement (and will refer to in section four). For more details see the excellent account in [Tomkinson], pages 49-53.
In the first direction, we are given an extraspecial p-group $H$ of exponent $p$ and we would like to find a nondegenerate symplectic space over $\mathbb{F}_p$. Recall from above that the commutator is a map from $H \times H \to Z(H)$ and only depends on the conjugacy classes, i.e., on the cosets of the center: the commutator of $[x, y, z]$ and $[a, b, c]$ depends only on $(x, y)$ and $(a, b)$. Recall also that the quotient group mod the center $G = H/Z(H)$ is elementary abelian so, written additively, can be thought of as a vector space over $\mathbb{F}_p$. [We write its elements as $(x, y)$, now using round parentheses to not conflict with the notation for commutator.] Let $g = [0, 0, 1] \in Z(H)$ (or any other generator of the center). Define $F : G \times G \to \mathbb{F}_p$ by $[h_1, h_2] = g^{F(u_1, u_2)}$, where $h_i$ is any element of the coset $u_i$. Then $F$ is (i) bilinear, (ii) alternating, meaning $F(u, u) = 0$ (each element commutes with all elements of its conjugacy class), and (iii) nondegenerate, meaning that for every nonzero $u_1$ there is $u_2$ with $F(u_1, u_2) \neq 0$. Compare our $\overline{\Gamma}(H)$, whose edges and non-edges reflect whether $F$ is zero.
In the second direction, we have a nondegenerate symplectic space $V$ over $\mathbb{F}_p$, given with a basis $\{v_i : i \in I\}$ and an alternating bilinear map $F$, and we would like to find an extraspecial p-group of exponent $p$. Informally, we try to reverse the earlier quotient by remembering a center. Let $H$ be the group formally given by one generator of order $p$ for each basis vector $v_i$, together with a central generator $g$ of order $p$, subject to the relations that the commutator of the generators for $v_i$ and $v_j$ equals $g^{F(v_i, v_j)}$. Most properties are easily checked; the proof that these relations don't collapse goes by finding an explicit construction of a nonabelian group which satisfies them (see e.g. [Tomkinson] p. 52, or [Shelah-Steprans]).
In this sense our Heisenberg groups and symplectic spaces (mod p) are equivalent.
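The first direction of this correspondence can be checked by hand in $H_3(p)$. The sketch below assumes the group law $[x,y,z][a,b,c] = [x+a,\ y+b,\ z+c+xb]$ (an assumption, chosen to match the commuting condition $xb - ay = 0$) and takes the commutator convention $h_1^{-1}h_2^{-1}h_1h_2$; with these choices the commutator is the central element with last entry $xb - ay$, and the resulting form is bilinear, alternating and nondegenerate.

```python
from itertools import product

def mul(g, h, p):
    """Assumed group law on H_3(p)."""
    (x, y, z), (a, b, c) = g, h
    return ((x + a) % p, (y + b) % p, (z + c + x * b) % p)

def inv(g, p):
    """Inverse for the law above: [x,y,z]^(-1) = [-x, -y, -z + x*y]."""
    x, y, z = g
    return ((-x) % p, (-y) % p, (-z + x * y) % p)

def commutator(g, h, p):
    """g^(-1) h^(-1) g h."""
    return mul(mul(inv(g, p), inv(h, p), p), mul(g, h, p), p)

def form(u, v, p):
    """Candidate symplectic form on the quotient: F((x,y),(a,b)) = x*b - a*y."""
    (x, y), (a, b) = u, v
    return (x * b - a * y) % p

def check_form(p):
    """Verify the commutator formula and the three properties of F."""
    plane = list(product(range(p), repeat=2))
    group = [(x, y, z) for (x, y) in plane for z in range(p)]
    commutator_ok = all(commutator(g, h, p) == (0, 0, form(g[:2], h[:2], p))
                        for g in group for h in group)
    add = lambda u, v: ((u[0] + v[0]) % p, (u[1] + v[1]) % p)
    bilinear = all(form(add(u, v), w, p) == (form(u, w, p) + form(v, w, p)) % p
                   and form(w, add(u, v), p) == (form(w, u, p) + form(w, v, p)) % p
                   for u in plane for v in plane for w in plane)
    alternating = all(form(u, u, p) == 0 for u in plane)
    nondegenerate = all(any(form(u, v, p) != 0 for v in plane)
                        for u in plane if u != (0, 0))
    return commutator_ok and bilinear and alternating and nondegenerate
```

`check_form(3)` and `check_form(5)` exercise every pair of group elements, so they are only practical for small primes.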
2.6. Commuting graphs. The commuting graph of a finite group, our $\Gamma$ of the introduction, also has an extensive literature. [Pyber] includes many early references; a very recent survey of many kinds of graphs on groups, including the commuting graph, is [Cameron2]. There is a direct connection to randomness in the question: Let $G$ be a finite group. Pick two elements in $G$ at random, what is the chance they commute? An early theorem in the subject shows that, for $G$ non-abelian, this is at most 5/8 (and this is sharp). For simple groups [Guralnick-Robinson] show that this chance is at most 1/2. For non-solvable groups the chance is at most 1/12. This last paper surveys a surprisingly deep literature. We continue this discussion with two comments. First, important early questions in this area are suggested by the title of [Erdős-Strauss]: "How abelian is a finite group?" and the above-mentioned companion paper [Pyber] of the same name. The maximal cliques of the commuting graph $\Gamma$ correspond to maximal abelian subgroups; however, the picture is a priori different from Ramsey's theorem: Pyber proves that every group of order $n$ contains an abelian subgroup of order at least $2^{\varepsilon\sqrt{\log n}}$ for some $\varepsilon > 0$ and that this result is essentially best possible. Second, we include the short:
Proof of Fact 1.1 (Erdős-Turán). The number of ordered pairs of commuting elements whose first element is $a$ is $|Z(a)|$, the size of the centralizer of $a$. This number is the same for any $a' \in C(a)$, the conjugacy class of $a$. So the number of ordered pairs of commuting elements with first element conjugate to $a$ is $|C(a)||Z(a)| = |G|$. Sum over the $c(G)$ conjugacy classes to get $|G|c(G)$ ordered pairs.
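The counting identity in this proof holds for every finite group, and is quick to confirm on small symmetric groups. The sketch below represents permutations as tuples; the helper names are illustrative.

```python
from itertools import permutations

def compose(s, t):
    """Composition of permutations given as tuples: (s o t)(i) = s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """Inverse permutation."""
    result = [0] * len(s)
    for i, image in enumerate(s):
        result[image] = i
    return tuple(result)

def commuting_pairs(group):
    """Number of ordered pairs (a, b) with ab = ba."""
    return sum(1 for a in group for b in group
               if compose(a, b) == compose(b, a))

def conjugacy_class_count(group):
    """Number of conjugacy classes, by sweeping out each class once."""
    seen, classes = set(), 0
    for a in group:
        if a in seen:
            continue
        classes += 1
        seen.update(compose(compose(g, a), inverse(g)) for g in group)
    return classes

def symmetric_group(n):
    """All permutations of {0, ..., n-1}."""
    return list(permutations(range(n)))
```

For $S_3$ the commuting probability comes out to $18/36 = 1/2$, below the $5/8$ bound for non-abelian groups.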

Quasirandomness
This section gives a standalone proof that the sequence of commuting graphs of the Heisenberg group is quasi-random. Some of the properties that follow have other explanations, see for instance section 4 below.
Recall the graph $G = \overline{\Gamma}(H_{2k+1})$ discussed above has vertices the non-central conjugacy classes $C_{x,y}$ with an edge from $C_{x,y}$ to $C_{a,b}$ if the elements of these two classes commute. Combining $x$ and $y$ into a vector of length $2k$ it is convenient to identify vertices with functions from $\{1, \dots, 2k\}$ to $p = \{0, \dots, p-1\}$, including the all-zero function so $n = |V(G)| = p^{2k}$. For any $v \in V(G)$ and $1 \le i \le 2k$, we may write $v(i)$ for the $i$-th entry of $v$. Let the support of $v$ be $\mathrm{sup}(v) = \{i : v(i) \neq 0\}$. We emphasize a notational point: in this section, $k$ is from the subscript to $H$ and $n$ is the size of the graph.
Claim 3.1. Given $k$ and $p$, we have that: (1) Every $v \in V(G)$ other than the constant-zero element has degree $d = p^{2k-1}$.
Proof. (1) For any $v \in G$ which is not constantly zero, we can count the elements $w$ which commute with $v$ by fixing some $i \in \mathrm{sup}(v) \neq \emptyset$ and choosing $w(j)$ freely for $j \neq i$. Thus $\deg(v) = p^{2k-1}$.
(2) Counting ordered pairs, compute the density of $G$ by $((n-1)p^{2k-1} + p^{2k})/n^2 = \frac{1}{p} + \frac{p-1}{p^{2k+1}} < \frac{1}{p} + \frac{1}{p^{2k}}$, remembering $n = p^{2k}$.
(3) Case (a) is clear. Otherwise, one of the following must be true:
(i) we can choose $1 \le i, j \le 2k$ such that $i \in \mathrm{sup}(v)$, $j \in \mathrm{sup}(w) \setminus \mathrm{sup}(v)$. [Looking for $x$ commuting with both, we may choose $x(\ell)$ for $\ell \neq i, j$ freely, but the value of $x(i)$ is forced by commuting with $v$ and subsequently the value of $x(j)$ is forced by commuting with $w$.]
(ii) the parallel to (i): we can choose $1 \le i, j \le 2k$ such that $i \in \mathrm{sup}(w)$, $j \in \mathrm{sup}(v) \setminus \mathrm{sup}(w)$, and the proof is the same.
(iii) not (i) or (ii), so $\mathrm{sup}(v) = \mathrm{sup}(w)$, but one is not a multiple of the other. In this case, to find $x$ commuting with both, we are solving two independent equations in $|\mathrm{sup}(v)| \ge 2$ unknowns, so we are free to choose $2k-2$ coordinates of $x$ as we like, giving $p^{2k-2}$ such $x$.
(iv) $v$ and $w$ are nonzero multiples of each other; in this case, $N(v) \cap N(w) = N(v) = N(w)$ and there are $p^{2k-1}$ such $x$. Note that there are $(n-1)(p-2)$ such pairs $(v, w)$ with $v \neq w$.
This completes the proof.
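The degree, density and common-neighborhood counts in this proof can be confirmed by enumeration for small parameters. The sketch below models the quotient graph as vectors $(x, y) \in \mathbb{F}_p^{2k}$, with an edge (loops included) when $x \cdot b - a \cdot y = 0$; splitting each vector into its first and last $k$ coordinates is a layout choice made here for illustration.

```python
from itertools import product

def symplectic(v, w, k, p):
    """x.b - a.y for v = (x, y), w = (a, b), each half of length k."""
    return sum(v[i] * w[k + i] - v[k + i] * w[i] for i in range(k)) % p

def quotient_graph(k, p):
    """Vertices F_p^(2k) and neighborhoods (each vertex is its own neighbor)."""
    vertices = list(product(range(p), repeat=2 * k))
    nbrs = {v: frozenset(w for w in vertices if symplectic(v, w, k, p) == 0)
            for v in vertices}
    return vertices, nbrs

def degree_profile(k, p):
    """Map degree -> number of vertices with that degree."""
    _, nbrs = quotient_graph(k, p)
    profile = {}
    for neighborhood in nbrs.values():
        profile[len(neighborhood)] = profile.get(len(neighborhood), 0) + 1
    return profile

def common_neighbor_profile(k, p):
    """Map |N(v) & N(w)| -> number of ordered pairs (v, w) with that value."""
    vertices, nbrs = quotient_graph(k, p)
    profile = {}
    for v in vertices:
        for w in vertices:
            size = len(nbrs[v] & nbrs[w])
            profile[size] = profile.get(size, 0) + 1
    return profile
```

For $k = 2$, $p = 3$ (so $n = 81$) all nonzero vertices have degree $27 = p^{2k-1}$, and the common-neighborhood sizes that occur are $p^{2k}$, $p^{2k-1}$ and $p^{2k-2}$.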
Our graph is thus close to being regular: all vertices but one have the same degree $p^{2k-1}$ (and the remaining one has degree $p^{2k}$). Recall that a graph is called strongly regular if there are $A$, $B$ such that if $v, w$ have an edge between them then $|N(v) \cap N(w)| = A$, and if $v, w$ do not have an edge between them then $|N(v) \cap N(w)| = B$. Our $G$ is in some sense also close to being strongly regular.

Theorem 3.2. For fixed $p$, the sequence of graphs $\overline{\Gamma}(H_{2k+1}(p))$, $k \to \infty$, is quasi-random.
Proof. It will suffice, see e.g. the formulation of [Krivelevich-Sudakov] Theorem 2.6 item P7, to verify that for our density $\delta = \frac{1}{p} + \varepsilon$, where $\varepsilon = (p-1)/p^{2k+1} < 1/n$,
$$\sum_{v, w \in V(G)} \Big|\, |N(v) \cap N(w)| - \delta^2 n \,\Big| = o(n^3). \qquad (1)$$
By Claim 3.1 this sum has several kinds of components. For each of the pairs $(v, w)$ with $|N(v) \cap N(w)| = p^{2k-2} = n/p^2$, the summand is $|n/p^2 - \delta^2 n| = (2\varepsilon/p + \varepsilon^2)n < \frac{2}{p} + \frac{1}{n}$. There are fewer than $n^2$ such pairs, so all together they contribute at most $n^2(\frac{1}{n} + \frac{2}{p}) = o(n^3)$. Otherwise, we have one $(v, w)$ in case 3.1(3)(a), and $(n-1) + (n-1) + (n-1)(p-2) \le p^2 n$ pairs in 3.1(3)(b). Each summand is at most $n$, so the left side of equation (1) for these pairs is bounded above by $(1 + p^2 n)\,n = o(n^3)$. Thus we have all the equivalent formulations of quasirandomness.
Let us verify that the analogous result transfers from the quotient $\overline{\Gamma}$ to $\Gamma$. Recall that in $\Gamma$, each point of $\overline{\Gamma}$ blows up to a clique on $p$ vertices (grouping together the $p$ central classes in our picture to form the blow-up of 0). Write $[v]$ for the set of vertices in the blow-up of $v \in \overline{\Gamma}$. Then if $a \in [v]$, $b \in [w]$ we have that $(a, b)$ is an edge in $\Gamma$ if and only if $(v, w)$ is an edge in $\overline{\Gamma}$. So the graph $\Gamma$ has $N = np = p^{2k+1}$ vertices and each vertex $x \in \Gamma$, $x \notin [0]$ has degree $p \cdot p^{2k-1} = p^{2k}$. Recalling 1.1, we may compute density via ordered pairs of edges in $\Gamma$ by $N(p^{2k} + p - 1)/N^2 = (p^{2k} + p - 1)/p^{2k+1} = \frac{1}{p} + \varepsilon$, where $\varepsilon = (p-1)/p^{2k+1} < p/N$ goes to 0 as $k \to \infty$. The exponents in Claim 3.1(3) go up by one in each case and so the parallel count in Theorem 3.2 goes through (with $N$ instead of $n$).
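The quantity controlled in this proof, the sum over pairs of $\big||N(v) \cap N(w)| - \delta^2 n\big|$ divided by $n^3$, can be computed directly for the first few graphs in the sequence. The sketch below is a brute-force illustration under the same modeling assumptions as the text (vertices $\mathbb{F}_p^{2k}$, loops allowed, edge when the form $x \cdot b - a \cdot y$ vanishes); it is only feasible for very small $k$ and is not a substitute for the argument.

```python
from itertools import product

def codegree_deviation_ratio(k, p):
    """Sum over ordered pairs of | |N(v) & N(w)| - delta^2 n |, divided by n^3."""
    n = p ** (2 * k)
    vertices = list(product(range(p), repeat=2 * k))

    def form(v, w):
        # x.b - a.y with v = (x, y) and w = (a, b)
        return sum(v[i] * w[k + i] - v[k + i] * w[i] for i in range(k)) % p

    nbrs = {v: frozenset(w for w in vertices if form(v, w) == 0)
            for v in vertices}
    delta = 1 / p + (p - 1) / p ** (2 * k + 1)   # edge density from the text
    target = delta * delta * n
    total = sum(abs(len(nbrs[v] & nbrs[w]) - target)
                for v in vertices for w in vertices)
    return total / n ** 3
```

With $p = 3$ the ratio drops from roughly $0.11$ at $k = 1$ to roughly $0.016$ at $k = 2$, in line with the $o(n^3)$ claim.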

Convention 3.5. For any group $G$ and $A \subseteq G$ a subset or subgroup, write $\Gamma(A)$ to mean the induced subgraph of $\Gamma(G)$ formed by restricting the vertex set to $A$. Likewise, for $A, B \subseteq G$, write $\Gamma(A \times B)$ for the corresponding bipartite graph, where bipartite in this case means that we ignore the edges on $A$ and on $B$, not that we require them not to exist.
Discussion 3.6. We may also conclude from this that the sequence of bipartite graphs $\Gamma(H_{2m+1} \times H_{2m+1})$ as $m \to \infty$ is quasirandom. Since we had allowed self-loops in $\Gamma$ and $\overline{\Gamma}$, the calculations are the same.

Rado-ness
This section works directly with infinite extraspecial p-groups $H_\omega(p)$, for $p$ an odd prime. The main result shows that the commuting graph $\overline{\Gamma}(H_\omega)$ contains an induced copy of the Rado graph in quite a strong way: the vertices of $R$ (together with the center $Z(H_\omega)$) generate $H_\omega$. We include different proofs of the various lemmas (one due to the referee) with different advantages. At the end, we discuss what these results say for finite Heisenberg groups.
Observe that $H_{2n+1}$ appears as a subgroup of $H_{2m+1}$ for $m \ge n$, and also as a subgroup of $H_\omega$ (consider those $[x, y, z]$ in the larger group whose $x$ and $y$ have nonzero entries only in the first $n$ places; it's sufficient to show this $H_{2n+1}$ appears, recalling the finite extraspecial p-groups of exponent $p$ are determined up to isomorphism by their size). So it follows from quasirandomness$^{5}$ that:$^{6}$
Corollary 4.1. For every finite $k$ there is $n_* = n_*(k)$ such that every graph on $k$ vertices appears as an induced subgraph of $\overline{\Gamma}(H_{2n+1})$ for every $n \ge n_*$.

Conclusion 4.2. In $\overline{\Gamma}(H_\omega)$, every finite graph appears as an induced subgraph. (It follows by choosing representatives from each of the conjugacy classes involved that the same is true in $\Gamma(H_\omega)$.)
To motivate what follows: with a little more work (explained below), we can find the full Rado graph $R$ inside $\overline{\Gamma}(H_\omega)$ as an induced subgraph, but we might wonder how integral this subgraph is to the structure of the whole graph. Since $\overline{\Gamma}$ is defined in terms of conjugacy classes, a notion of a group "built over a graph" is introduced as a surrogate for "the vertices of $R$ generate $H_\omega$."

Claim 4.4. Suppose $H$ is an infinite extraspecial p-group and $X \subseteq H$ is infinite and for every $x \in X$ there is $y \in X$ which does not commute with $x$. Let $G$ be the subgroup of $H$ generated by $X \cup Z(H)$. Then $G$ is an extraspecial p-group also. Note that $|G| = |X|$.

Remark 4.5. Why the hypothesis on X?
In the language of §2.5, if $V$ is an infinite nondegenerate symplectic space and we are looking for subspaces corresponding to subgroups which are also extraspecial, they should be nondegenerate. Note that if $X$ is an infinite set of elements of $V$ which is nondegenerate in the sense that for every $x_i \in X$ there is $x_j \in X$ which does not commute with it, then the subspace $W$ generated by $X$ is also an infinite nondegenerate symplectic space.
Proof of Claim 4.4. Let us check that it satisfies Felgner's axioms $T_p$ [Felgner, p. 423]. By assumption, $G$ satisfies the axioms for group theory. $C_p = Z(H) \subseteq G$. Since $G$ is a subgroup of an extraspecial group and is not abelian, it must be normal. So $Z(H) \subseteq Z(G) = \{a \in G : \text{the conjugacy class of } a \text{ has size 1 in } G\}$. By our assumption on $X$, any other element of $G$ whose conjugacy class has size 1 in $G$ must still have this property in $H$ and so be in the center of $H$. So indeed (ii) the center is a cyclic group of order $p$. Axiom (iii), which says that the derived group is contained in the center, corresponds to a universal statement so automatically passes to substructures.$^{7}$ Axiom (iv), which says the derived group is not trivial, is satisfied since $X$ is assumed to contain two elements which do not commute. Axiom (v) says every element has order $p$, which (is universal and) remains true. Axiom (vi) says the factor group modulo the center is infinite, which is true because $X$ is infinite and the center has size $p$. For the last line, note that every element of $G$ can be written as a finite string in elements of $X \cup Z(H)$ and the operations times and inverse, and if $\kappa$ is infinite, then $\kappa^{<\omega} = \kappa$, i.e. the set of finite sequences of elements of $\kappa$ has size $\kappa$.
We now look for infinite graphs $R$ in $\overline{\Gamma}(H)$ in Lemma 4.7. Note that in the case where $R$ and $H$ are both countable, which is really all we need for $R$, we may see this simply in several ways. (The reader who believes it may look ahead to 4.8-4.9.)
First proof: $\overline{\Gamma}(H_\omega)$ contains any countable graph as an induced subgraph. By Corollary 4.1 and the compactness theorem of first-order logic, there is a countable extraspecial p-group $H \equiv H_\omega$ such that $\overline{\Gamma}(H)$ contains a given countable graph $R$ as an induced subgraph. By $\aleph_0$-categoricity, $H \cong H_\omega$.
Thanks to the referee for suggesting the following proof. For the quoted result about symplectic spaces (whose proof uses countability) see for example [Tomkinson] Theorem 3.9.
The referee's proof. It is known that a countable space with a symplectic form $B$ has a basis $\{e_1, e_2, \dots, f_1, f_2, \dots\}$ such that $B(e_i, f_j) = \delta_{ij}$, and $B(e_i, e_j) = B(f_i, f_j) = 0$. Let $v_1, v_2, \dots$ be the vertices of the Rado graph, or another given countable graph. Map $v_1$ to $e_1$. Now assuming $v_1, \dots, v_n$ have been mapped correctly, ...
Discussion 4.6. There is also a nice connection between this proof and footnote 4 (for $n = \omega$), explained by the fact (see e.g. [Tomkinson] Corollary 3.10) that when $p > 2$, a countably infinite extraspecial p-group of exponent $p$ is the direct product with amalgamated center of groups each isomorphic to $H_3(p)$. In this sense, we can see $e_i$ as $[x, 0, 0]$ where $x$ has 1 in the $i$-th place and zeroes elsewhere, and $f_i$ as $[0, y, 0]$ where $y$ has 1 in the $i$-th place and zeroes elsewhere.
$^{7}$ The axiom as written says $\forall x \forall y \forall z\,(z x^{-1} y^{-1} x y = x^{-1} y^{-1} x y z)$, which a priori hides quantifiers in the expressions "$x^{-1}$" and "$y^{-1}$" since our language has just $\times$ and 1. However, for any elements $a, b, c, d$ in $G$ such that $ab = ba = 1$ and $cd = dc = 1$, we have that $\varphi[a, b, c, d] = \forall z\,(zbdac = bdacz)$ holds in $H$ thus in $G$; or just observe that $\forall x \forall y \forall v \forall w \forall z\,((xv = vx = 1 \wedge yw = wy = 1) \implies (zwvxy = wvxyz))$ is truly universal, and note $G$ contains all necessary inverses.
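For a finite graph, an embedding of the kind used in the referee's proof can be written down explicitly. The sketch below is one possible inductive rule, not necessarily the referee's: send the $n$-th vertex to $e_n$ plus the sum of those $f_i$, $i < n$, for which $v_i$ is not adjacent to $v_n$. Then $B(w_n, w_i) = -1$ exactly when $v_i$ and $v_n$ are non-adjacent, and 0 otherwise, so "commuting" matches "adjacent". Function names and the adjacency-matrix input format are assumptions of this sketch.

```python
from itertools import combinations, product

def form(u, v, m, p):
    """Symplectic form with B(e_i, f_j) = delta_ij; vectors are (e-part, f-part)."""
    return sum(u[i] * v[m + i] - u[m + i] * v[i] for i in range(m)) % p

def embed(adj):
    """Map vertex n to e_n + sum of f_i over earlier non-neighbors i of n."""
    m = len(adj)
    images = []
    for n in range(m):
        vec = [0] * (2 * m)
        vec[n] = 1                      # the e_n coordinate
        for i in range(n):
            if not adj[i][n]:
                vec[m + i] = 1          # add f_i for each earlier non-neighbor
        images.append(tuple(vec))
    return images

def is_induced_copy(adj, images, p=3):
    """True if distinct images commute (form = 0) exactly on the edges of adj."""
    m = len(adj)
    return len(set(images)) == m and all(
        (form(images[a], images[b], m, p) == 0) == bool(adj[a][b])
        for a, b in combinations(range(m), 2))

def all_graphs(m):
    """All symmetric 0/1 adjacency matrices on m labeled vertices."""
    pairs = list(combinations(range(m), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        adj = [[0] * m for _ in range(m)]
        for (a, b), bit in zip(pairs, bits):
            adj[a][b] = adj[b][a] = bit
        yield adj
```

Running `is_induced_copy(adj, embed(adj))` over `all_graphs(4)` checks all 64 labeled graphs on four vertices; the images are nonzero vectors in distinct cosets, so they correspond to distinct non-central conjugacy classes.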

Lemma 4.7. For any infinite graph R, there is an infinite extraspecial p-group H [i.e., a model of T p ] whoseΓ(H) contains R as an induced subgraph.
Proof. We follow the classical proof of the compactness theorem via ultraproducts. Enumerate the set of vertices of $R$ as $\{v_\alpha : \alpha < \kappa\}$ and enumerate the finite subsets of $\kappa$ as $\langle u_i : i \in I \rangle$, where $I = [\kappa]^{<\aleph_0}$. For each $\alpha < \kappa$ let $I_\alpha = \{i \in I : \alpha \in u_i\}$. Then the set $\{I_\alpha : \alpha < \kappa\}$ has the finite intersection property, so may be extended to an ultrafilter $D$ on $I$.
For each $i \in I$, let $R_i$ be the finite induced subgraph of $R$ with vertex set $\{v_\alpha : \alpha \in u_i\}$. Choose an isomorphic copy of this graph in $\tilde\Gamma(H_\omega)$ and choose a representative of each conjugacy class involved as a $\tilde\Gamma$-vertex; this amounts to choosing a set $A_i \subseteq H_\omega$ of elements in distinct conjugacy classes and a bijection $\pi_i : \{v_\alpha : \alpha \in u_i\} \to A_i$ so that for any $\alpha, \beta \in u_i$, there is an edge between $v_\alpha, v_\beta$ in $R$ if and only if $\pi_i(v_\alpha)$ and $\pi_i(v_\beta)$ commute [i.e. if and only if there is an edge between $\pi_i(v_\alpha)$ and $\pi_i(v_\beta)$ in $\Gamma$].
Work in the ultrapower $H = (H_\omega)^I / D$, which is also an extraspecial $p$-group by Łoś's theorem (which implies the theory is preserved under ultrapowers: see [Chang and Keisler] Corollary 4.10). For each $\alpha < \kappa$ and each $i \in I$, define $\bar v_\alpha(i)$ to be $\pi_i(v_\alpha)$ if $\alpha \in u_i$, and any element of $H_\omega$ otherwise. For each $\alpha < \kappa$, define $\bar v_\alpha = \langle \bar v_\alpha(i) : i \in I \rangle / D \in H$. Then the induced subgraph of $\Gamma(H)$ with vertex set $\{\bar v_\alpha : \alpha < \kappa\}$ is isomorphic to $R$ via the map $v_\alpha \mapsto \bar v_\alpha$, since for any two $\alpha, \beta < \kappa$ the set $\{i : \alpha, \beta \in u_i\} \in D$, so there is an edge between $\bar v_\alpha, \bar v_\beta$ in $\Gamma(H)$ if and only if there is an edge between $v_\alpha$ and $v_\beta$ in $R$. Moreover, in $H$, $\{\bar v_\alpha : \alpha < \kappa\}$ is a set of distinct elements, indeed a set of elements in distinct conjugacy classes, so there is a corresponding copy of $R$ in $\tilde\Gamma(H)$ replacing each $\bar v_\alpha$ by its conjugacy class.

Corollary 4.8. There is an induced copy of the Rado graph $R$ in $\tilde\Gamma(H_\omega)$.
Proof. Let H be the extraspecial p-group given by Lemma 4.7 in the case where R is the countable Rado graph R (which clearly satisfies the nondegeneracy hypothesis). Let G be the subgroup of H generated by Z(H) and the (conjugacy classes of) elements forming the vertices of the copy of R. Then G is countable and by Claim 4.4 it is an extraspecial p-group, so by ℵ 0 -categoricity it is isomorphic to H ω .
The proofs just given show:

Theorem 4.9. $H_\omega$ can be built over the Rado graph $R$.
In fact, the proof shows that H ω can be built over any countable graph G which is nondegenerate in the sense that no vertex has full degree in G.
Discussion 4.10. Just for fun we briefly sketch a complementary proof of 4.1 from the uncountable. [Shelah-Steprans] explicitly construct uncountable extraspecial $p$-groups (all of whose maximal abelian subgroups are small). By looking carefully at their construction it is possible to see directly how to construct arbitrarily long finite sequences of elements whose patterns of commuting and non-commuting can be freely chosen along the way. Since $T_p$ is complete and the statement that there exist $n$ elements no two of which are conjugate and which have a given pattern of commuting and non-commuting is first-order expressible (for any fixed finite $n$), it follows that $\tilde\Gamma(H_\omega)$ contains any finite graph as an induced subgraph. It remains to derive 4.1. Fix a finite graph $G$. Let $V$ be a set of elements of $H_\omega$ which form the vertices of an induced copy of $G$. Let $X$ be the smallest subset of $H_\omega$ which is a union of conjugacy classes and contains $V$. Choose $n$ minimal so that $H_{2n+1}$ contains $X$ (since $X$ is a finite set, it suffices to choose the minimal $m$ such that all the nonzero entries in $x, y$ of each $[x, y, z] \in X$ are among the first $m$ elements). Then for every $\ell \ge n$, $G$ appears as an induced subgraph of $\tilde\Gamma(H_{2\ell+1})$, and so (choosing representatives of the conjugacy classes) also of $\Gamma(H_{2\ell+1})$.
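The finite statement at the end of this discussion can also be checked on small cases. The following Python sketch is only an illustration, not the argument of the text. It assumes the commuting rule for non-central classes is $x \cdot b - a \cdot y = 0$ in $\mathbb{F}_p$ (as in the definition of the quotient graph), takes $n$ equal to the number of vertices, and sends vertex $i$ to the class $C_{xy}$ with $x = e_i$ and $y$ the sum of $e_j$ over the earlier non-neighbours $j$ of $i$. The function names `symplectic`, `embed_graph` and `commuting_edges` are hypothetical.

```python
from itertools import combinations


def symplectic(u, v, p):
    """Return x.b - a.y mod p for classes u = (x, y) and v = (a, b)."""
    (x, y), (a, b) = u, v
    xb = sum(xi * bi for xi, bi in zip(x, b))
    ay = sum(ai * yi for ai, yi in zip(a, y))
    return (xb - ay) % p


def embed_graph(k, edges):
    """Assign to each vertex 0..k-1 a pair (x, y) of vectors of length k.

    Vertex i gets x = e_i and y = sum of e_j over j < i not adjacent to i,
    so that two vertices pair to zero exactly when they are adjacent.
    """
    adjacent = {frozenset(e) for e in edges}
    classes = []
    for i in range(k):
        x = tuple(1 if t == i else 0 for t in range(k))
        y = tuple(
            1 if t < i and frozenset((t, i)) not in adjacent else 0
            for t in range(k)
        )
        classes.append((x, y))
    return classes


def commuting_edges(classes, p):
    """Pairs of distinct classes whose elements commute (form value zero)."""
    return {
        frozenset((i, j))
        for i, j in combinations(range(len(classes)), 2)
        if symplectic(classes[i], classes[j], p) == 0
    }


if __name__ == "__main__":
    five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    classes = embed_graph(5, five_cycle)
    print(commuting_edges(classes, 3) == {frozenset(e) for e in five_cycle})
```

The choice $n = k$ is wasteful and is made only to keep the check short; no attempt is made to find the least $n$ for a given graph.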
We may summarize sections three and four by saying that both the H 2n+1 (p)'s and H ω (p) may be reasonably (and in the finite case, quantitatively) understood as "random" objects.

Towards a picture of U n
In this section we revisit a certain notoriously complicated object, UT(n, p), the group of $n \times n$ uni-upper-triangular matrices with entries in $\mathbb{F}_p$ (which we will soon abbreviate "$U_n$"). This basic group arises as a Sylow $p$-subgroup of GL(n, p). It is a "universal $p$-group" in the sense that every finite $p$-group is a subgroup of some UT(n, p). The center consists of all matrices in UT(n, p) whose entries above the main diagonal are zero except possibly in the (1, n)-entry. The commutator subgroup equals the Frattini subgroup; both consist of the matrices in UT(n, p) which are zero on the diagonal just above the main diagonal. All of these facts are proved in [Suzuki].
There has been extensive study of the conjugacy classes and characters of UT(n, p). These have resisted explicit description and form a well known "wild" problem [Gudivok et al]. See [Pak-Soffer], [Isaacs] for reviews. This difficulty gave rise to Carlos André's 'super-character theory'. This lumps together certain conjugacy classes into superclasses and sums certain irreducible characters forming supercharacters. These are constant on superclasses and decompose the regular character. They have an elegant description involving set partitions. For surveys see [Aguiar et al.], [Diaconis-Isaacs]. We began this project hoping to understand what made this character theory so complicated. Asking whether or not it was reasonable to crudely equate 'complicated' with 'random' led us to the findings in sections 1-4 above. This section shows how the 'randomness' of the Heisenberg group manifests itself in UT(n, p).
Recall that, in our notation, the subscript in H 2n+1 counts the maximal possible number of nonzero nondiagonal entries, whereas the n + 2 in UT(n + 2, p) refers to the dimension of the matrix. In what follows we will often abbreviate UT(n, p) as "U n ". Summarizing, in our notation, elements of H 2n+1 and U n+2 are both uni-upper-triangular matrices of the same size.
First proof. Define a map from $U_{n+2}$ to $U_n$ by simply erasing the top row, bottom row, first column and last column of a matrix in $U_{n+2}$. By inspection, this is a homomorphism onto $U_n$ with kernel $H_{2n+1}(p)$.
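The "by inspection" step of the first proof can be confirmed by brute force for very small parameters. The sketch below is an illustration under stated assumptions: matrices are stored as tuples of tuples over $\mathbb{F}_p$, and the helper names `unitriangular`, `mul`, `core` and `check_core_map` are hypothetical. It checks that erasing the outer rows and columns respects multiplication, and reports the size of the kernel (expected $p^{2n+1}$) and of the image (expected $|U_n|$).

```python
from itertools import product


def unitriangular(n, p):
    """All n x n upper unitriangular matrices over F_p, as tuples of tuples."""
    positions = [(i, j) for i in range(n) for j in range(i + 1, n)]
    matrices = []
    for values in product(range(p), repeat=len(positions)):
        m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
        for (i, j), v in zip(positions, values):
            m[i][j] = v
        matrices.append(tuple(tuple(row) for row in m))
    return matrices


def mul(a, b, p):
    """Matrix product modulo p."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )


def core(x):
    """Erase the top row, bottom row, first column and last column."""
    return tuple(row[1:-1] for row in x[1:-1])


def check_core_map(n, p):
    """Return (is_homomorphism, kernel_size, image_size) for U_{n+2} -> U_n."""
    big = unitriangular(n + 2, p)
    is_hom = all(
        core(mul(x, y, p)) == mul(core(x), core(y), p) for x in big for y in big
    )
    identity = tuple(tuple(1 if i == j else 0 for j in range(n)) for i in range(n))
    kernel_size = sum(1 for x in big if core(x) == identity)
    image_size = len({core(x) for x in big})
    return is_hom, kernel_size, image_size


if __name__ == "__main__":
    print(check_core_map(2, 2))  # U_4(2) -> U_2(2)
    print(check_core_map(1, 3))  # U_3(3) -> U_1(3)
```

The case $n = 2$ is even and is used here only because it is the smallest case with a nontrivial quotient; the homomorphism property itself does not depend on the parity of $n$.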
Second proof, using additional information about superclasses. First, H 2n+1 is a subgroup of U n+2 . So it would suffice to show that H 2n+1 is a union of superclasses (because then H 2n+1 is a union of conjugacy classes). But if C is a superclass which contains some element of H 2n+1 , then all its elements must be elements of H 2n+1 , because of the form of the canonical representative.
Iterating this mapping, going from U n to U n−2 and so on gives a way of picturing U n+2 as a tower of Heisenberg groups, each normal in the next. Of course, p-groups have many such series; this one can be refined so that each subgroup has index p in the one above. These series can be hard to describe and work with. One of our discoveries is that these nested Heisenberg representations are quite understandable.
Nonetheless, the conjugacy classes of $H_{2n+1}$ in $U_{n+2}$ will often be bigger than they were in $H_{2n+1}$. To build an explicit example it is useful to use the fact that sometimes the neat superclasses described above are actually conjugacy classes in $U_n$. Carlos André [André] has a necessary and sufficient condition for this to happen (thanks to Nat Thiem for spelling this out for us). The following construction uses this. Suppose $1 < k, \ell < n + 2$. Let $C \subseteq H_{2n+1} \subseteq U_{n+2}$ be the set of all matrices $A$ such that (1) $a_{1,\ell} \neq 0$ and $a_{1,i} = 0$ for $1 < i < \ell$, and (2) $a_{k,n} \neq 0$ and $a_{j,n} = 0$ for $k < j < n$. Then $C$ is a conjugacy class of $U_{n+2}$ if $k < \ell$, and a union of more than one conjugacy class otherwise.
Observation 5.2. The image of U n+2 under the quotient by H 2n+1 is U n . Thus U n+2 is an extension of the normal subgroup H 2n+1 by the quotient U n . In fact, this extension is a semi-direct product. The matrices in U n+2 which are zero in the top row and last column form a subgroup isomorphic to U n , and the intersection with the subgroup which is non-zero only in the top row and last column (a copy of H 2n+1 ) is the identity, so the extension splits. This allows picturing U n inside U n+2 which will become important in Theorem 5.13 below.
In 5.3 the size of the subgroup is fairly small, and a priori it need not say much about the global structure of Γ(U n+2 ), but again, note U n is the quotient of U n+2 by precisely this subgroup.
The following introduces notation for this iterated decomposition that is used throughout the rest of this section. For simplicity in these iterated constructions, we will generally assume n is odd. Definition 5.4 is illustrated in Figures 2 and 3.
i.e., the unique $A \in U_n$ such that $a_{k,\ell} = x_{k-1,\ell-1}$ for $2 \le k < \ell \le n + 1$; Observe that $X \mapsto (u(X), h(X))$ defines a bijection from $U_{n+2}$ to $U_n \times H_{2n+1}$. Informally, $u(X)$ is the "core" of $X$, the matrix resulting by removing the outermost rows and columns, and $h(X)$ is its "shell." The first operation, $u$, can be iterated by defining $u^0(X) = X$ and $u^1(X) = u(X)$ and $u^2(X) = u(u(X)) \in U_{n-2}$, and so on down to $[1] \in U_1$.
In the next observation, and in what follows, we think of trees rooted at the bottom and growing upwards. We will give several observations and definitions before pausing to summarize the picture.
Observation 5.6. Under this partial order (U n+2 , ) is a tree with root [1] and with uniform branching at each level: for m odd and m ≤ n + 2, for each A ∈ U m ,

Observation 5.7. A necessary condition for X, Y to commute in U n+2 is that: (b) for any A, B ∈ U n , the bipartite commuting graph is a subgraph of Γ(H 2n+1 × H 2n+1 ) (a priori, possibly empty or not induced).
The next definition asks "how many descendants of B (in the sense of the partial order ) commute with X." Note that while X ∈ U n+2 , here B belongs to U n+2 from 5.5 above, so a priori may be from any U m with 1 ≤ m ≤ n + 2, m odd. Recall also that for simplicity n, hence n + 2, is odd.
Example 5.9. If X, B ∈ U n+2 , then deg(X, B) = 1 if and only if X and B commute, otherwise it is zero. If X ∈ U n+2 and B ∈ U n− k , then unless B commutes with u (X), deg(X, B) = 0. If B = [1] ∈ U 1 , then deg(X, B) is simply the degree of X in the graph Γ(U n+2 ) (or Γ(U n+2 × U n+2 )).
Discussion 5.10. We pause to reflect on the picture this sequence of definitions gives of the structure of $\Gamma(U_{n+2})$. (Since in $U$ the conjugacy classes are no longer of uniform size, it is at least initially more convenient to work with $\Gamma$ and not $\tilde\Gamma$.) The picture appears by induction. Since $U_3 = H_3$, $\Gamma(U_3)$ is described by the previous sections. Let $n$ be odd and greater than or equal to 3. Reversing the map from the proof of Lemma 5.1 amounts to blowing up each vertex $A$ of $\Gamma(U_n)$ to a set $\{X \in U_{n+2} : u(X) = A\}$ of size $p^{2n+1} = |H_{2n+1}|$. To form $\Gamma(U_{n+2})$ we need to see how to put edges among these vertices. If $A, B \in \Gamma(U_n)$ are not connected by an edge, then in $\Gamma(U_{n+2})$, the bipartite commuting graph between $\{X \in U_{n+2} : u(X) = A\}$ and $\{Y \in U_{n+2} : u(Y) = B\}$, call it $G(A, B)$ for this discussion, is empty. This is the content of saying: that $u(X), u(Y)$ commute is necessary for $X$ and $Y$ to commute. If $A, B$ are connected by an edge, then $G(A, B)$ is a subgraph of $\Gamma(H_{2n+1} \times H_{2n+1})$ under the identification $(X, Y) \mapsto (h(X), h(Y))$. "Subgraph" is the content of saying: that $h(X), h(Y)$ commute is necessary, but not always sufficient, for $X$ and $Y$ to commute. An example where $G(A, B)$ is exactly $\Gamma(H_{2n+1} \times H_{2n+1})$ is when $A = B = \mathrm{id}_{U_n}$; this is 5.3 above. A different situation occurs when $A = B$ has nonzero entries all along the diagonal just above the main diagonal and zeros otherwise. As can be calculated (e.g. using the equations below) the vertices of this $G(A, B)$ have degree $\le p^3$ (independent of $n$). Meanwhile, a cumulative effect of this inductive "sparsification" is reflected in the density of $\Gamma(U_n)$ decreasing rapidly as $n$ grows, see below. The more local irregularities in the rates of decrease suggested by this discussion seem worthy of study.
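The case where both cores are the identity, mentioned in Discussion 5.10, can be tested directly for small parameters. The sketch below is an illustration only. It assumes an element with identity core is determined by its top row $x$, its last column $y$ and its corner entry $z$, and that the Heisenberg equation for two such elements $[x, y, z]$ and $[a, b, c]$ reads $x \cdot b - a \cdot y = 0$ in $\mathbb{F}_p$. The helper names `shell_matrix`, `mul`, `heisenberg_form` and `fiber_over_identity_is_heisenberg` are hypothetical.

```python
from itertools import product


def shell_matrix(x, y, z):
    """The (n+2) x (n+2) matrix with identity core, top row x, last column y, corner z."""
    n = len(x)
    size = n + 2
    m = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for k in range(n):
        m[0][k + 1] = x[k]
        m[k + 1][size - 1] = y[k]
    m[0][size - 1] = z
    return m


def mul(a, b, p):
    """Matrix product modulo p."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) % p for j in range(size)]
        for i in range(size)
    ]


def heisenberg_form(x, y, a, b, p):
    """Return x.b - a.y modulo p."""
    xb = sum(xi * bi for xi, bi in zip(x, b))
    ay = sum(ai * yi for ai, yi in zip(a, y))
    return (xb - ay) % p


def fiber_over_identity_is_heisenberg(n, p):
    """Check: two matrices with identity core commute iff x.b - a.y = 0 mod p."""
    triples = [
        (v[:n], v[n:2 * n], v[2 * n]) for v in product(range(p), repeat=2 * n + 1)
    ]
    for x, y, z in triples:
        big_x = shell_matrix(x, y, z)
        for a, b, c in triples:
            big_y = shell_matrix(a, b, c)
            commute = mul(big_x, big_y, p) == mul(big_y, big_x, p)
            if commute != (heisenberg_form(x, y, a, b, p) == 0):
                return False
    return True


if __name__ == "__main__":
    print(fiber_over_identity_is_heisenberg(2, 2))
    print(fiber_over_identity_is_heisenberg(1, 3))
```

This only covers the fiber over the identity; for other commuting pairs of cores the additional top-row and right-column equations discussed in this section must also be taken into account.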
We now work out a simple sufficient condition for $G(A, B)$ to contain a full $\Gamma(H_{2m+1} \times H_{2m+1})$ as an induced subgraph for some $m \le n$ related naturally to $A, B$. Of course, the condition assumes $A, B$ commute. Similar results can be obtained considering $k$-partite graphs above $A_1, \ldots, A_k$ (provided they form a $k$-clique in $\Gamma(U_n)$). The reader may wish to look ahead to Discussion 5.11 or to the figures in the text.
Define $\sigma_A \subseteq \{3, \ldots, n + 1\}$ to be the set of columns with at least one nonzero $A$-entry above the diagonal, $\tau_A \subseteq \{2, \ldots, n\}$ the set of rows with at least one nonzero $A$-entry above the diagonal, and likewise for $\sigma_B$, $\tau_B$.

9 If $i \in \{3, \ldots, n+1\} \setminus \sigma_B$ then the left-hand side of $E_{1,i}$ is zero, and also $y_{i,n+2}$ has zero coefficients [so effectively does not appear] in (b). If $i \in \{3, \ldots, n+1\} \setminus \sigma_A$ then the right-hand side of $E_{1,i}$ is zero, and $x_{i,n+2}$ has zero coefficients in (b). Likewise, if $j \in \{2, \ldots, n\} \setminus \tau_B$ then the left-hand side of $E_{j,n+2}$ is zero and $y_{1,j}$ has zero coefficients in (a), and if $j \in \{2, \ldots, n\} \setminus \tau_A$ then the right-hand side of $E_{j,n+2}$ is zero and $x_{1,j}$ has zero coefficients in (a).
It follows that given tuples (of elements of $\mathbb{F}_p$) $\langle \bar x_{1,i} : i \in \tau_B \rangle$, $\langle \bar x_{j,n+2} : j \in \sigma_B \rangle$ and $\langle \bar y_{1,i} : i \in \tau_A \rangle$, $\langle \bar y_{j,n+2} : j \in \sigma_A \rangle$ which together solve (a) and (b), if we subsequently restrict to $\mathcal{X} = \{X : u(X) = A,\ x_{1,i} = \bar x_{1,i} \text{ for } i \in \tau_B,\ x_{j,n+2} = \bar x_{j,n+2} \text{ for } j \in \sigma_B\}$ and $\mathcal{Y} = \{Y : u(Y) = B,\ y_{1,i} = \bar y_{1,i} \text{ for } i \in \tau_A,\ y_{j,n+2} = \bar y_{j,n+2} \text{ for } j \in \sigma_A\}$ then the commuting between $\mathcal{X}$ and $\mathcal{Y}$ is controlled only by the Heisenberg equation, and so corresponds to an induced subgraph of $\Gamma(H_{2n+1} \times H_{2n+1})$.

Discussion 5.11. Here is an informal summary of where we are. In order for $X, Y \in U_{n+2}$ to commute, it is necessary and sufficient that the following conditions are satisfied. First, $u(X)$ must commute with $u(Y)$ in $U_n$. Second, the equations asserting that the elements of $XY$ and $YX$ along the top row and right column, i.e. in positions $(1, 2), \ldots, (1, n + 1), (2, n + 2), \ldots, (n + 1, n + 2)$, are equal, must hold. Third, the "Heisenberg equation," asserting that the top right corners of $XY$ and $YX$ are equal (i.e. that $h(X)$ commutes with $h(Y)$) must hold. Restating, whenever $u(X), u(Y)$ commute in $U_n$ and the second equations are satisfied, control of commuting falls entirely to the Heisenberg equation. As for where we are going: as we will see below, if $A, B \in U_n$ (or some tuple $A_1, \ldots, A_k \in U_n$) all commute and have "enough" zeros, then there exist reasonably large subsets $\mathcal{X} \subseteq \{X : u(X) = A\}$ and $\mathcal{Y} \subseteq \{Y : u(Y) = B\}$ between which commuting is entirely controlled by the Heisenberg equation. ("Reasonably large": because the non-Heisenberg equations can then be satisfied by fixing a reasonably small number of values in the top row and right column.) If moreover these reasonably large subsets are "symmetric" (see below) then under the maps $X \mapsto h(X)$, $Y \mapsto h(Y)$, our $\mathcal{X}, \mathcal{Y}$ both map to the same copy of some $H_{2m+1}$ inside $H_{2n+1}$, or the parallel for the $k$-partite case. A notion of symmetrization is introduced for this reason.
The next theorem summarizes an instance of this analysis, showing the recurrent appearance of Γ(H) (for some H) in Γ(U ). When t = 1, we find a copy of Γ(H) for some H; when t = 2, of Γ(H × H); we can just as easily state a more general version for arbitrary finite t, finding Γ(H × · · · × H). It is illustrated in Figure 5 below.
Theorem 5.13. Suppose $A_1, \ldots, A_t$ form a clique in $\Gamma(U_n)$. Then there are subsets $X^*_i \subseteq \{X : u(X) = A_i\}$ for $i = 1, \ldots, t$ such that the $t$-partite graph whose pieces are $X^*_i$ (in $\Gamma(U_{n+2})$) is naturally isomorphic to the $t$-partite commuting graph whose

Proof. The above analysis asked for tuples of elements of $\mathbb{F}_p$ with indices in a restricted set solving (a) and (b). Notice that in this case the trivial sequences (all zeros) are always a solution. Suppose, then, that we are given $A_1, \ldots, A_t \in U_n$ forming a clique in $\Gamma(U_n)$ and we are considering the $t$-partite commuting graph whose $i$-th part is $\{X : u(X) = A_i\}$. Letting $\sigma^* = \tau^* = \bigcup_i \sigma_{A_i} \cup \bigcup_i \tau_{A_i}$ and restricting $\{X : u(X) = A_i\}$ to $X^*_i = \{X : u(X) = A_i,\ x_{1,k} = 0 \text{ for } k \in \sigma^*,\ x_{j,n+2} = 0 \text{ for } j \in \tau^*\}$, the commuting on the $t$-partite graph $X^*_1, \ldots, X^*_t$ is entirely controlled by the Heisenberg equation; moreover, there is a particular copy of $H_{2m+1} \subseteq H_{2n+1}$ which is the common image of every $X^*_i$ under the map $X \mapsto h(X)$.
Discussion 5.14. First, regarding symmetrization: had we said in 5.13 only that the resulting graph was an induced subgraph of $\Gamma(H_{2n+1} \times \cdots \times H_{2n+1})$, a priori the vertex sets $V_i \subseteq H_{2n+1}$ need not be the same; for small sizes, this may affect statistics. This said, no real effort is being made here to optimize $m$. Second, some readers of Theorem 5.13 may see the possible power of $H$ to recur, especially when the $A_i$'s are similar or small, and some may see the (a priori) possible power of other $A_i$'s to keep $m$ low by collectively spreading out. Both forces are interesting in their own right (see 5.20 below). Further information on their relative strengths, perhaps on average, could certainly be interesting.

Figure 5. Theorem 5.13 for $A_1 = \cdots = A_t = A$ as in Figure 4. First, the left image marks values implicated by $\tau$ (on the top row) and $\sigma$ (right column). The middle image symmetrizes. Now set all marked values e.g. to zero, and on the right is the copy of $H_{2k+1}$ which remains. Here $k = 2$, $n = 5$.
Here is a simple example of a corollary of the previous proof. Notice, however, that the partition given depends on both $A$ and $B$; it is not claimed that a single partition of $\{X : u(X) = A\}$ works against $\{Y : u(Y) = B\}$ for any $B$.
Corollary 5.15. Let $A, B \in U_n$ and $m = n - |\sigma_A \cup \sigma_B \cup \tau_A \cup \tau_B|$. There are equipartitions of $\mathcal{X} = \{X : u(X) = A\}$ into $\{\mathcal{X}_i : i < p^{2(n-m)}\}$ and of $\mathcal{Y} = \{Y : u(Y) = B\}$ into $\{\mathcal{Y}_i : i < p^{2(n-m)}\}$ such that: for any two $i, j$, we have that $\Gamma(\mathcal{X}_i \times \mathcal{Y}_j)$ is either isomorphic to $\Gamma(H_{2m+1} \times H_{2m+1})$ or the empty graph.
Proof. Let $\sigma^* = \tau^* = \sigma_A \cup \sigma_B \cup \tau_A \cup \tau_B$. Partition the elements of $\{X : u(X) = A\}$ according to their values on $\{x_{1,i} : i \in \tau^*\} \cup \{x_{j,n+2} : j \in \sigma^*\}$, and partition $\{Y : u(Y) = B\}$ in the same way according to the corresponding entries $y_{1,i}$ and $y_{j,n+2}$. If $A, B$ do not commute, there is an empty graph in each case. Otherwise, fixing $i, j$ we have that the equations (a) and (b) [as above, or in 5.13] are uniformly satisfied or unsatisfied on all pairs from $\mathcal{X}_i$ and $\mathcal{Y}_j$. The second case gives the empty graph, and the first reduces to the Heisenberg equation on the values which remain free.
Similar arguments show other ways quasirandomness is pervasive in Γ(U ).
Discussion 5.16. In H the randomness is only one level deep. In U the appearance of quasirandomness is indestructible in the sense that it recurs after arbitrary quotients by H, until we reach [1] (or in the other direction, reappears under unlimited reverse quotienting). Is there a quantifiable drift of edges as n → ∞ towards having both "lower rank" endpoints, visible for instance in deg(X, [1]) in terms of σ, τ of u(X)?
Discussion 5.18. The successive peeling off of Heisenberg groups can be organized differently. By simply multiplying matrices it is easy to see that in $U(n + 2, p)$ the subgroup $H(2n + 1, j, p)$ which is non-zero only in the top $j$ rows and last $j$ columns is a normal subgroup, with complement $U(j)$ consisting of matrices in $U(n + 2, p)$ which are zero in the top $j$ rows and last $j$ columns. This complement is isomorphic to $U(n + 2 - 2j, p)$ (and the extension splits); for $j = 1$ this is the copy of $U_n$ from Observation 5.2. $H(2n + 1, j, p)$ is a subgroup of $H(2n + 1, j + 1, p)$ for $j$ from 1 to $n/2$, with the top being $U(n + 2, p)$. It is not always the case that there is a complement of a normal pattern subgroup: the center and commutator of $U(n + 2, p)$ are normal pattern subgroups without complements. For more on this see [Marberg].
Discussion 5.19. There has been recent work in developing various supercharacter theories for groups such as H(ω) and some U (ω). For an elegant exposition, see [Lochon]. It would be fascinating to see some applications of this theory along the lines of [Arias-Castro, Diaconis, Stanley]. See the work of [Bendikov, Saloff-Coste] for first steps.
We conclude with several comments about this very interesting graph.
Discussion 5.20. p. 22] gives the known bounds for $c(U_n(p))$ as: where the lower bound is attributed to Higman and the upper bound to Soffer. By Fact 1.1 the average degree of a vertex in $\Gamma(U_n)$ is subject to essentially the same bounds. It also follows that as $n \to \infty$ the density of $\Gamma(U_n)$ is "super-sparse", of order $p^{-cn^2}$. There is a lot of effort in the graph limit theory community to define a structure theory for sparse graphs. These examples show that truly super-sparse graphs can still have interesting structure.
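For very small $n$ and $p$ the link between the number of conjugacy classes and the number of commuting pairs can be computed exhaustively. The sketch below is illustrative only. It assumes that $c(G)$ denotes the number of conjugacy classes and that the relevant fact is the standard identity that the number of ordered commuting pairs in a finite group $G$ equals $|G| \cdot c(G)$, so that the density of the commuting graph with loops is $c(G)/|G|$. The helper names `unitriangular`, `mul` and `class_and_commuting_counts` are hypothetical, and the brute-force search is feasible only for tiny groups.

```python
from itertools import product


def unitriangular(n, p):
    """All n x n upper unitriangular matrices over F_p, as tuples of tuples."""
    positions = [(i, j) for i in range(n) for j in range(i + 1, n)]
    matrices = []
    for values in product(range(p), repeat=len(positions)):
        m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
        for (i, j), v in zip(positions, values):
            m[i][j] = v
        matrices.append(tuple(tuple(row) for row in m))
    return matrices


def mul(a, b, p):
    """Matrix product modulo p."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )


def class_and_commuting_counts(n, p):
    """Return (group order, number of conjugacy classes, ordered commuting pairs)."""
    group = unitriangular(n, p)
    identity = tuple(tuple(1 if i == j else 0 for j in range(n)) for i in range(n))
    # Inverses found by exhaustive search; adequate for these tiny groups.
    inverse = {g: next(h for h in group if mul(g, h, p) == identity) for g in group}
    seen, classes = set(), 0
    for x in group:
        if x not in seen:
            classes += 1
            seen.update(mul(mul(g, x, p), inverse[g], p) for g in group)
    pairs = sum(1 for x in group for y in group if mul(x, y, p) == mul(y, x, p))
    return len(group), classes, pairs


if __name__ == "__main__":
    for n, p in [(3, 2), (3, 3), (4, 2)]:
        order, classes, pairs = class_and_commuting_counts(n, p)
        # Density of the commuting graph with loops, as ordered pairs over |G|^2.
        print(n, p, order, classes, pairs, pairs / order ** 2)
```

Computing the classes and the commuting pairs independently, rather than deriving one from the other, lets the identity itself be observed on these examples.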