What is the project closure analysis?
Closure analysis determines statically which function de¯nitions reach which programpoints. This information is used for many di®erent purposes; e.g., type inference forobject-oriented programming languages [PS91], globalization analysis of functional pro-grams [Ses89], partial evaluation [Bon90], type recovery in Scheme [Shi90], and others.The reason why closure analysis is such a fundamental analysis in di®erent applications isthat it is (sort of) the universal data °ow analysis problem for monomorphic (data °oworiented) analyses for higher-order (functional) languages. As such it may be viewed asthe analogue of path analysis, which is universal for (continuous) ¯rst-order (classical)data °ow analysis problems [Tar81].General closure analysis is, unlike its ¯rst-order counterpart, expensive in the worstcase: the best known algorithms take time £(n3) in the worst case [Hen91a,PS92]. Manytimes it is su±cient, however, to get somewhat coarser information than the exact closureinformation. Coarser here means that the results may indicate that more function def-initions reach a point than a precise closure analysis would really yield; nonetheless theinformation should in practice be only marginally di®erent from the exact analysis.In this note we exhibit a simple, but very e±cient closure analysis based on the binding-time analysis of [Gom90] and the algorithm [Hen91b] for it. (In fact the algorithm hereis substantially simpler than the one in [Hen91b].) The computed abstract value °owinformation1 is coarser than normal closure analysis exactly in the following sense: exactclosure analysis keeps track of uni-directional °ow of (abstract) values from one programpoint to another whereas our simple closure analysis works with the assumption that °owsare reversible; that is, that any abstract value that °ows from point p to point p0 can also°ow (backwards!) from p0 to p.In the practice of partial evaluation this loss of information appears to be insubstantial.Since the simple closure analysis algorithm runs in almost-linear time [Hen91b] and is very¤DIKU Semantics Report D-1931We prefer to call the abstractions of values corresponding to expressions in a program abstract valuesrather than (abstract) closures, since these values represent values other than (run-time) closures, includingpairs, integers, etc.12 BASIC IDEA OF SIMPLE CLOSURE ANALYSIS2e±cient in practice [Hen91c] this appears to be an attractive alternative to computingcomplete closure information.2 Basic idea of simple closure analysisIn the binding-time analysis of [Gom89], on which [Hen91b] is based, we begin by asso-ciating a unique abstract value (also called a token, label or a type variable depending onthe intention of their use) with every (sub)expression in a program, and constraints areextracted that capture the °ow of actual values represented by these abstract values in theprogram. The °ow is, of course, a conservative approximation of the actual °ow of data.It not only makes the standard assumptions that all expressions inside a function de¯ni-tion are actually evaluated and that abstract values can °ow forwards and backwards, asmentioned above, but also that the °ows of di®erent arguments of a function merge insidethe function | the latter is why we refer to this analysis as monomorphic.In the second step, the critical one, these constraints are normalized by a (very small)set of rewriting rules. In the process di®erent abstract values may be identi¯ed, which istantamount to saying that one abstract value could \°ow" to the other, in any direction.At the end of this process a single abstract value represents a whole set of abstract valuesthat have been identi¯ed during normalization. Such a set may contain (abstract valueslabeling) closures as well as other program points, notably application points such as theactual parameters of function applications. This information can be interpreted by sayingthat the closures in the set may reach any of the application points in the same set. Theanalysis is conservative w.r.t. to exact closure analysis in the sense that no other closuresreach the application points, but that some of the reported closures may actually be shownnot to reach some application point in the same set by exact closure analysis. (Of course,\exact" closure analysis is itself a conservative approximation of the actual dynamic °owof run-time values, including concrete closures.)3 Simple closure analysis exempli¯edIn this section we exemplify the steps above by considering a simple example.Consider the following code (fragment), representing Turner's tautology checker andtwo calls to it:taut = fn f => fn n. if n = 0 then f else taut (f true) (n-1) and taut (f false) (n-1)g = fn x => fn y => (x and not y) or (not x or y)h = fn z => ztaut g 2 taut h 13.1 Constraint extractionIn the ¯rst step we associate a distinct abstract value with every occurrence of a subex-pression occurring in this code; for simplicity's sake all occurrences of a variable havethe same abstract value. We shall refer to the abstract value of an (occurrence of an)expression by the special variable ® indexed by the expression; e.g., the abstract valuesof the three function de¯nitions are ®taut; ®g and ®h. On top of these abstract valuesrepresenting (abstract) closures we have the abstract values ®¸f and ®¸n corresponding to3 SIMPLE CLOSURE ANALYSIS EXEMPLIFIED 3the partial applications of taut to one, resp. two arguments. Finally, we have ®¸y corre-sponding to the partial application of g to one argument. These are all the abstract valuesrepresenting (abstract) closures; of course, as mentioned before, every subexpression hasa corresponding (unique) abstract value.In the constraint extraction phase of the binding-time analysis algorithm the con-straints in Figure 1 are generated for the code above. Two kinds of constraints are gener-ated.² ® = ®0: Such an equation between abstract values expresses that one °ows to theother, and vice versa (Remember our basic assumption of bidirectionality of °ow!);e.g., ® could be the abstract value of an actual argument to a function, and ®0 theabstract value of the function's corresponding formal parameter.² ® ! ®0 · ®00 or integer · ®00, etc.: An inequation has an abstract value constructorapplied to n ¸ 0 abstract values on the left-hand side and an abstract value on theright; such an inequality expresses that the right-hand side abstract value \can be"the abstract value on the left (there may be more than one, but not with the sameconstructor; see below).3.2 Constraint normalizationIn the constraint normalization phase we combine de¯nitional information (constraints ofthe second form above) and form equivalence classes of abstract values (for the constraintsof the ¯rst form).For simple closure analysis we need only 2 of the 11 rewriting rules in [Hen91b]. Inparticular, there is no need for an occurs check rule since we don't have to interpret theresult as ¯nite type expressions; and there is no need for a special type Dynamic (or ¤)representing either possible type errors or run-time computable expressions or both, sincewe are interested neither in the former nor in the latter. In particular, the rewriting rulesmanipulating Dynamic can be omitted. We end up with the rewriting rules in Figure 2.As proved in [Hen91b] a rewriting system normalizing constraints with additionalrewriting rules can be implemented in time O(n®(n; n) where ®(n; n)