%include thesis.fmt

\hsdef{\begin{comment}
a,e,v_i,x_i,i,f
Ord,letrec,xs,bottom,bullet
\end{comment}}
\begin{comment}
\begin{code}
import Prelude hiding (repeat)
import Data.List hiding (repeat)

data Expr = EVar String
          | ECon String [Expr]
          | EFun String [Expr]
          | EApp Expr [Expr]
          | ELam String Expr
          | ELet String Expr Expr
          | ECase Expr [Alt]

data Alt = EAlt String [String] Expr

unions = foldr1 union

expensive = undefined
dummy = ()
\end{code}
\end{comment}

\chapter{Background}
\label{chp:background}

In this chapter we introduce the background material and general notations used throughout the rest of this thesis. We start by introducing a Core language in \S\ref{secB:core}, then discuss its sharing properties in \S\ref{secB:sharing} and how we generate Core in \S\ref{secB:generating_core}. We then cover the homeomorphic embedding relation in \S\ref{secB:homeomorphic}, particularly applied to the expression type of our Core language.

\section{Core Language}
\label{secB:core}

\begin{figure}
\renewcommand{\f}[1]{\text{\hspace{1cm}#1}}
\ind{Prog}\ind{Func}\ind{Expr}\ind{Alt}
\ignore\begin{code}
data Prog  =  [Func]              {-"\f{program}"-}

data Func  =  (f vs_ = x)         {-"\f{function}"-}

data Expr  =  v                   {-"\f{local variable}"-}
           |  c xs_               {-"\f{constructor application}"-}
           |  f xs_               {-"\f{function application}"-}
           |  x xs_               {-"\f{general application}"-}
           |  \v -> x             {-"\f{lambda abstraction}"-}
           |  let v = x in y      {-"\f{let binding, non-recursive}"-}
           |  case x of as_       {-"\f{case expression}"-}

data Alt   =  c vs_ -> x          {-"\f{case alternative}"-}
\end{code}

Where |v| ranges over variables, |c| ranges over constructors, |f| ranges over function names, |x| and |y| range over expressions and |a| ranges over case alternatives. \\
\caption{Syntax for the Core language.}
\label{figB:core}
\end{figure}

The syntax of our Core language is given in Figure \ref{figB:core}. To specify a list of items of unspecified length we write either \ignore|x_1,...,x_n| or |xs_|. Our Core language is higher order and lazy, but lacks much of the syntactic sugar found in Haskell. The language is based upon Yhc.Core, a semantics for which is given in \cite{me:yhc_core}.

A program is a list of functions, with a root function named |main|. A function definition gives a name, a list of arguments and a body expression. Variables and lambda abstractions are much as they would be in any Core language.  Pattern matching occurs only in case expressions; alternatives match only the top level constructor and are exhaustive, including an |error| alternative if necessary.

In later chapters it will be necessary to make a distinction between higher-order and first-order programs, so our Core language has some redundancy in its representation. Our Core language permits both lambda expressions, and allows top-level definitions to take arguments. There are three forms of application, all of which take two values: the first value may be either a constructor, a top-level named function, or any arbitrary expression; the second value is a list of arguments, which may be empty. These forms of application give rise to three equivalences:

\ignore\begin{code}
(c  xs_) ys_ == c  xs_ ys_
(f  xs_) ys_ == f  xs_ ys_
(x  xs_) ys_ == x  xs_ ys_
\end{code}

We allow a list of variables to appear in a lambda abstraction and a list of bindings to appear in a let. This syntactic sugar can be translated away using the following rules:

\ignore\begin{code}
\v vs_ -> x              => \v -> (\vs_ -> x)
let v = x ; binds_ in y  => let v = x in (let binds_ in y)
let v vs_ = x xs_ in y   => let v = x in (let vs_ = xs_ in y)
\end{code}

The arity of a top-level function is the number of arguments in its associated definition. In any application, if the function is given fewer arguments than its arity we refer to it as \textit{partially-applied}, matching the arity is \textit{fully-applied}, and more than the arity is \textit{over-applied}.

Some functions are used but lack corresponding definitions in the program. These are defined to be \textit{primitive}. They have some meaning to an underlying runtime system, but are not available for transformation. A primitive function may perform an action such as outputting a character to the screen, or may manipulate primitive numbers such as addition.

The largest difference between our Core language and GHC-Core \cite{ghc_core} is that our Core language is untyped. Core is generated from well-typed Haskell, and is guaranteed not to fail with a type error. All our algorithms could be implemented equally well in a typed core language, but we prefer to work in an untyped language for simplicity of implementation. For describing data types we use the same notation as Haskell 98. One of the most common data types is the list, which can be defined as:

\begin{code}
data List alpha = Nil | Cons alpha (List alpha)
\end{code}

A list is either an empty list, or a cons cell which contains an element of the list type and the tail of the list. For example the list of 1,2,3 would be written |(Cons 1 (Cons 2 (Cons 3 Nil)))|. We allow the syntactic sugar of representing |Cons| as a right-associative infix application of |(:)| and |Nil| as |[]| -- allowing us to write |(1:2:3:[])|. We also permit |[1,2,3]|.


\subsection{Operations on Core}

There are several operations that can be defined on our Core expressions type. We present some of those used in later chapters.

\subsubsection{General Operations}

\begin{figure}
\ind{CtorName}\ind{VarName}\ind{FuncName}\ind{body}\ind{args}\ind{rhs}\ind{arity}\ind{ctors}
\begin{code}
type CtorName  = String
type VarName   = String
type FuncName  = String

body   :: FuncName  -> Expr
args   :: FuncName  -> [VarName]
rhs    :: Alt       -> Expr
arity  :: String    -> Int
ctors  :: CtorName  -> [CtorName]
\end{code}
\caption{Operations on Core.}
\label{figB:core_operations}
\end{figure}

Figure \ref{figB:core_operations} gives the signatures for helper functions over the core data types. We use the functions |body f| and |args f| to denote the body and arguments of the function definition for |f|. We use the function |rhs| to extract the expression on the right of a case alternative. Every function and constructor has an arity, which can be obtained with the |arity| function. To determine alternative constructors the |ctors| function can be used; for example \h{stmt}|ctors "True" = ["False", "True"]| and \h{stmt}|ctors "[]" = ["[]",":"]|.


\subsubsection{Substitution}

We define \ignore|e[v / x]| to be the capture-free substitution of the variable |v| by the expression |x| within the expression |e|. We define \ignore|e[v_1,...,v_n / x_1,...,x_n]| to be the simultaneous substitution of each variable |v_i| for each expression |x_i| in |e|.

\begin{example}
\ignore\begin{code}
(v + 1)[v / 2]               => 2 + 1
(let v = 3 in v + 1)[v / 2]  => let v = 3 in v + 1
\end{code}
\end{example}

\subsubsection{Variable Classification}

\begin{figure}
\ind{freeVars}
\begin{code}
freeVars :: Expr -> [VarName]
freeVars (EVar v       ) = [v]
freeVars (ECon c xs_   ) = freeVars' xs_
freeVars (EFun f xs_   ) = freeVars' xs_
freeVars (EApp x xs_   ) = freeVars x `union` freeVars' xs_
freeVars (ELam v x     ) = freeVars x \\ [v]
freeVars (ELet v x y   ) = freeVars x `union` (freeVars y \\ [v])
freeVars (ECase x as_  ) = freeVars x `union` unions (map f as_)
    where f (EAlt c vs_ y) = freeVars y \\ vs_

freeVars' xs_ = unions (map freeVars xs_)
\end{code}
\caption{Free variables of an expression.}
\label{figB:free_variables}
\end{figure}

An occurrence of a variable |v| is \textit{bound} in |x| if it occurs on the right-hand side of a case alternative whose pattern includes |v|, as the argument of an enclosing lambda abstraction or as a binding in an enclosing let expression; all other variable occurences are \textit{free}. The set of free variables of an expression |e| is denoted by |freeVars e|, and can be computed using the function in Figure \ref{figB:free_variables}.

In order to avoid accidental variable name clashes while performing transformations, we demand that all variables within a program are unique. All transformations may assume and should preserve this invariant.


\subsection{Simplification Rules}
\label{secB:core_simplify}

\begin{figure}
\begin{simplify}
\simp{app-app}{
\ignore\begin{code}
(x xs_) ys_
    => x (xs_ ++ ys_)
\end{code}}

\simp{fun-app}{
\ignore\begin{code}
(f xs_) ys_
    => f (xs_ ++ ys_)
\end{code}}

\simp{con-app}{
\ignore\begin{code}
(c xs_) ys_
    => c (xs_ ++ ys_)
\end{code}}

\simp{case-con}{
\ignore\begin{code}
case c xs_ of {... ; c vs_ -> y ; ...}
    => let vs_ = xs_ in y
\end{code}}

\simp{lam-app}{
\ignore\begin{code}
(\v -> x) y
    => let v = y in x
\end{code}}

\simp{case-app}{
\ignore\begin{code}
(case x of {c_1 vs__1 -> y_1 ; ... ; c_n vs__n -> y_n}) z
    => case x of {c_1 vs__1 -> y_1 z ; ... ; c_n vs__n -> y_n z}
\end{code}}

\simp{let-app}{
\ignore\begin{code}
(let v = x in y) z
    => let v = x in y z
\end{code}}

\simp{let-case}{
\ignore\begin{code}
let v = x in (case y of {c_1 vs__1 -> y_1 ; ... ; c_n vs__n -> y_n})
    => case y of  {  c_1 vs__1  -> let v = x in y_1
                  ;  ...
                  ;  c_n vs__n  -> let v = x in y_n}
    where v {-" \hbox{is not used in } "-} y
\end{code}}

\simp{case-let}{
\ignore\begin{code}
case (let v = x in y) of as_
    => let v = x in (case y of as_)
\end{code}}

\simp{case-case}{
\ignore\begin{code}
case (case x of {c_1 vs__1 -> y_1 ; ... ; c_n vs__n -> y_n}) of as_
    => case x of  {  c_1 vs__1  -> case y_1 of as_
                  ;  ...
                  ;  c_n vs__n  -> case y_n of as_ }
\end{code}}

\simp{case-lam}{
\ignore\begin{code}
case x of {... ; c vs_ -> \v -> y ; ...}
    => \z -> case  x of
                   {... z ; c vs_ -> (\v -> y) z ; ... z}
\end{code}}

\simp{let}{
\ignore\begin{code}
let v = x in y
    => y[v/x]
    where v {-" \hbox{occurs once in } "-} y {-" \hbox{, see \S\ref{secB:sharing}} "-}
\end{code}}
\end{simplify}
\caption{Simplification rules.}
\label{figB:simplify}
\end{figure}

We present several simplification rules in Figure \ref{figB:simplify}, which can be applied to our Core language. These rules are standard and would be applied by any optimising compiler \cite{spj:transformation}. Some of the rules duplicate code, but none duplicate work. All the rules preserve both the semantics and the sharing behaviour of an expression. We believe the rules are confluent.

The (app-app), (fun-app) and (con-app) rules normalise applications. The (case-con) and (lam-app) rules simply follow the semantics, using let expressions to preserve the sharing. The (case-app), (let-case) and (case-case) rules move outer expressions over an inner case expression, duplicating the outer expression in each alternative. The (case-lam) rule promotes a lambda from inside a case alternative outwards. The (let-app) and (let-case) rules move an expression over an inner let expression. The (let) rule substitutes let expressions where the bound variable is used only once, and therefore no loss of sharing is possible.


\begin{comment}
\section{Semantics}
\label{secB:semantics}

The evaluation strategy of our Core language is lazy. We specify a reduction to weak-head normal form (WHNF) in Figure \ref{figB:whnf} and a reduction to normal form (NF) in \ref{figB:nf}. We have assumed that all applications are translated to general applications where the second value is a 1-element list. The \ignore|_F| function translates a function name to an associate expression incorporating the program arguments as variables in a lambda, and the body as the body of that lambda. The grammar for expressions in WHNF is:

\ignore\begin{code}
r  =  c \< xs_ \>  {-" \text{  constructor} "-}
   |  \v -> x      {-" \text{  lambda} "-}
   |  bottom       {-" \text{  bottom/undefined} "-}
\end{code}

The grammar for expression in NF is:

\ignore\begin{code}
n  =  c \< ns_ \>  {-" \text{  constructor} "-}
   |  \v -> x      {-" \text{  lambda} "-}
   |  bottom       {-" \text{  bottom/undefined} "-}
\end{code}

\newcommand{\sem}[1]
    {& \begin{array}{c}#1\smallskip\end{array}}
\newcommand{\semm}[2]
    {& \frac{\begin{array}{c}#1\end{array}}
            {\begin{array}{c}#2\smallskip\end{array}}}
\newcommand{\semmm}[3]
    {& \frac{\begin{array}{c}#1\\#2\end{array}}
            {\begin{array}{c}#3\smallskip\end{array}}}

\begin{figure}
\begin{eqnarray}
\sem
    {\ignore|c => c \< \>|}
\\ \semmm
    {\ignore|_F(f) = x|}
    {\ignore|x => r|}
    {\ignore|f => r|}
\\ \semm
    {\ignore|x => bottom|}
    {\ignore|x y => bottom|}
\\ \semm
    {\ignore|x => c \< xs_ \>|}
    {\ignore|x y => c \< xs_ y \>|}
\\ \semmm
    {\ignore|x => \v -> x'|}
    {\ignore|x'[v/y] => r|}
    {\ignore|x y => r|}
\\ \semm
    {\ignore|y[v/x] => r|}
    {\ignore|let v = x in y => r|}
\\ \semmm
    {\ignore|x => c \< xs_ \>|}
    {\ignore|y[vs_ / xs_] => r|}
    {\ignore|case x of {... ; c vs_ -> y ; ...} => r|}
\\ \semm
    {\ignore|x => bottom|}
    {\ignore|case x of alts_ => bottom|}
\end{eqnarray}
\caption{Reduction to weak-head normal form, |(=>)|.}
\label{figB:whnf}
\end{figure}

\begin{figure}
\begin{eqnarray}
\semm
    {\ignore|x => bottom|}
    {\ignore|x =>* bottom|}
\\ \semm
    {\ignore|x => \v -> y|}
    {\ignore|x =>* \v -> y|}
\\ \semmm
    {\ignore|x => c \< xs_ \>|}
    {\ignore|xs_ =>* xs_' |}
    {\ignore|x =>* c \< xs_' \>|}
\end{eqnarray}
\caption{Reduction to normal form, \ignore|(=>*)|.}
\label{figB:nf}
\end{figure}

Our semantics does not respect sharing, this topic is dealt with in \S\ref{secB:sharing}. The evaluation of a program corresponds to reducing |main| to NF.

We never descend below a bound variable without replacing that variable, therefore any remaining variable would have to be free in the definition of the program, which is not permitted. Consequently, there is no rule for a variable. We choose not evaluate inside a lambda, even on \ignore|=>*|, as this would generate a free variable. The \ignore|_F| mapping never crashes, as it is checked statically that all mentioned function names exist in the mapping.

The only way an undefined value may be initially generated is by a call to the error function in the program, i.e. \ignore|_F(error) = \v -> bottom|. If a computation forces a |bottom| value, in the scrutinee of a case or the first argument of an application, the |bottom| is propagated.

There is no rule for case of a lambda abstraction, as this situation is not permitted by the type system. Similarly, the type system plus the exhaustiveness of case branches means that if the scrutinee successfully evaluates, it will match exactly one alternative.
\end{comment}


\section{Sharing}
\label{secB:sharing}

This section informally discusses the relevant sharing properties of Haskell. In general, any optimisation must take account of sharing, but semantic analysis can sometimes ignore the effects of sharing. The sharing present in Haskell is not specified in the Haskell Report \cite{haskell}, but a possible interpretation is defined elsewhere \cite{bakewell:space_semantics}.

\subsection{Let bindings}

A let expression introduces \textit{sharing} of the computational result of expressions.

\begin{example}
\ignore\begin{code}
let x = f 1
in x + x
\end{code}

The evaluation of this expression results in:

\ignore\begin{code}
(x + x)[x / f 1]
(f 1 + f 1)
\end{code}

The expression |f 1| is reduced twice. However, a compiler would only evaluate |f 1| once. The first time the value of |x| is demanded, |f 1| evaluates to weak head normal form, and is bound to |x|. Any successive examinations of |x| return immediately, pointing at the same result.
\end{example}

\begin{figure}
\ind{occurs}\ind{linear}
\begin{code}
occurs :: VarName -> Expr -> Int
occurs v (EVar v'      ) = if v == v' then 1 else 0
occurs v (ECon c xs_   ) = occurss v xs_
occurs v (EFun f xs_   ) = occurss v xs_
occurs v (EApp x xs_   ) = occurss v (x:xs_)
occurs v (ELam v' x    ) = if v == v' then 0 else 2 * occurs v x
occurs v (ELet v' x y  ) = if v == v' then 0 else occurss v [x,y]
occurs v (ECase x as_  ) = occurs v x + maximum (map f as_)
    where f (EAlt c vs_ y) = if v `elem` vs_ then 0 else occurs v y

occurss v = sum . map (occurs v)

linear :: VarName -> Expr -> Bool
linear v x = occurs v x <= 1
\end{code}
\caption{Linear variables within an expression.}
\label{figB:linear}
\end{figure}

In general, the substitution of a bound variable by the associated expression may cause duplicate computation to be formed. However, in some circumstances, duplicate computation can be guaranteed not to occur. If a bound variable can be used at most once in an expression, it is said to be \textit{linear}, and substitution can be performed. A variable is linear if it is used at most once, i.e. occurs at most once down each possible flow of control according to the definition in Figure \ref{figB:linear}.

\subsection{Recursive let bindings}

In the Haskell language, let bindings can be \textit{recursive}. A recursive let binding is one where the local variable is in scope during the computation of its associated expression. The |repeat| function is often defined using a recursive let binding.

\begin{example}
\label{exB:repeat}
\ind{repeat}
\begin{code}
repeat x =  let xs = x : xs
            in xs
\end{code}

Here the variable |xs| is both defined and referenced in the binding. Given the application |repeat 1|, regardless of how much of the list is examined, the program will only ever create one single cons cell. This construct effectively ties a loop in the memory.
\end{example}

Our Core language does not allow recursive let bindings, for reasons of simplicity. If there is a recursive binding to a function, it will be removed by lambda lifting \cite{lambda_lift}. To remove all recursive let bindings, we can replace value bindings with lambda expressions applied to dummy arguments, then lambda lift.

\begin{examplerevisit}{\ref{exB:repeat}}
Applying this algorithm to our example from before, we first add a lambda expression and a dummy argument:

\begin{code}
repeat x =  let xs = \dummy ->  x : xs dummy
            in xs dummy
\end{code}

Then we lambda lift:

\begin{onepage}
\begin{code}
repeat x = f dummy x

f dummy x = x : f dummy x
\end{code}
\end{onepage}

Optionally, we can remove the inserted |dummy| argument:

\begin{code}
repeat x = f x

f x = x : f x
\end{code}
\end{examplerevisit}

In the |repeat| example we have lost sharing of the |(:)|-node. If a program consumes $n$ elements of the list generated by the new |repeat| function, the space complexity will be $O(n)$, compared to $O(1)$ for the recursive let definition. The time complexity remains unchanged at $O(n)$, but the constant factor will be higher. However, in other examples, the time complexity may be worse.

\begin{example}
Consider the following program, where |f| is an expensive computation:

\begin{code}
main x =  let y = f x : y
          in y
\end{code}

We insert dummy arguments around recursive lets:

\begin{code}
main x =  let y = \dummy -> f x : y dummy
          in y dummy
\end{code}

We have now changed the time complexity of the example. Originally |f| was performed once per call of |main|, in the revised code |f| will be performed once for each element of |main| demanded -- an unbounded number of times, changing the complexity.
\end{example}

In practice, only a small number of programs make use of values bound in recursive lets, and nearly all of them are instances of |repeat|. However, it is possible to construct examples where the removal of recursive lets makes the computation significantly more expensive.

\subsection{Constant Applicative Forms}

A Constant Applicative Form (CAF) is a top level definition of zero arity. In Haskell, CAFs are computed at most once per program run, and retained as long as references to them remain.

\begin{example}
\begin{code}
caf = expensive

main = caf + caf
\end{code}

A compiler will only compute |expensive| once.
\end{example}

If a function with positive arity is inlined, this will not dramatically change the runtime behaviour of a program. If a CAF is inlined, this may have adverse effects on the performance.


\section{Generating Core}
\label{secB:generating_core}

In order to generate our Core language from the full Haskell language, we use the Yhc compiler \cite{yhc}, a fork of nhc \cite{nhc}.

The internal Core language of Yhc is PosLambda -- a simple variant of lambda calculus without types, but with source position information. Yhc works by applying basic desugaring transformations, without optimisation. This simplicity ensures the generated PosLambda is close to the original Haskell in its structure. Each top-level function in a source file maps to a top-level function in the generated PosLambda, retaining the same name. However, PosLambda has constructs that have no direct representation in Haskell. For example, there is a FatBar construct \cite{spj:implementation}, used for compiling pattern matches which require fall through behaviour. We have therefore introduced a new Core language to Yhc, to which PosLambda can easily be translated \cite{me:yhc_core}.

The Yhc compiler can generate the Core for a single source file. Yhc can also link in all definitions from all necessary libraries, producing a single Core file representing a whole program. All function and constructor names are fully qualified, so the linking process simply involves merging the list of functions from each required Core file.

In the process of generating a Core file, Yhc performs several transformations. Haskell's type classes are removed using the dictionary transformation (see \S\ref{secB:dictionary_transformation}). All local functions are lambda lifted, leaving only top-level functions -- ensuring Yhc generated Core does \textit{not} contain any lambda expressions. All constructor applications and primitive applications are fully applied.


\subsection{The Dictionary Transformation}
\label{secB:dictionary_transformation}

Most transformations in Yhc operate within a single function definition. The only phases which require information about more than one function are type checking and the transformation used to implement type classes \citep{wadler:type_classes}. The dictionary transformation introduces tuples (or \textit{dictionaries}) of methods passed as additional arguments to class-polymorphic functions. Haskell also allows subclassing. For example, |Ord| requires |Eq| for the same type. In such cases the dictionary transformation generates a nested tuple: the |Eq| dictionary is a component of the |Ord| dictionary.

\begin{example}
\label{exB:dictionary}
\begin{code}
f :: Eq alpha => alpha -> alpha -> Bool
f x y = x == y || x /= y
\end{code}

\noindent is translated by Yhc into

\ignore\begin{code}
f :: (alpha -> alpha -> Bool, alpha -> alpha -> Bool) -> alpha -> alpha -> Bool
f dict x y = (||) (((==) dict) x y) (((/=) dict) x y)

(==) (a,b) = a
(/=) (a,b) = b
\end{code}

The |Eq| class is implemented as two selector functions, |(==)| and |(/=)|, acting on a method table. For different types of |alpha|, different method tables are provided.
\end{example}

The dictionary transformation is a global transformation. In Example \ref{exB:dictionary} the |Eq| context in |f| not only requires a dictionary to be accepted by |f|; it requires all the callers of |f| to pass a dictionary as first argument. There are alternative approaches to implementing type classes, such as \citet{jones:dictionary_free}, which does not create a tuple of higher order functions. We use the dictionary transformation for simplicity, as it is already implemented within Yhc.


\section{Homeomorphic Embedding}
\label{secB:homeomorphic}

The homeomorphic embedding relation \cite{leuschel:homeomorphic} has been used to guarantee termination of certain program transformations \cite{sorensen:supercompilation}. The relation $x \unlhd y$ indicates the expression $x$ is an embedding of $y$. We can define $\unlhd$ using the following rewrite rule:

\[
emb = \{f(x_1,\dots,x_n) \rightarrow x_i \, || \, 1 \leq i \leq n\}
\]

Now $x \unlhd y$ can be defined as $x \leftarrow^*_{emb} y$ \cite{term_rewriting}. The rule $emb$ takes an expression, and replaces it with one of its immediate subexpressions. If repeated non-deterministic application of this rule to any subexpression transforms $y$ to $x$, then $x \unlhd y$. The intuition is that by removing some parts of $y$ we obtain $x$, or that $x$ is somehow ``contained'' within $y$.

Some examples:

\begin{center}
\begin{tabular}{r@@{ $\unlhd$ }l@@{\hspace{15mm}}r@@{ $\ntrianglelefteq$ }l}
$a$ & $a$                       & $b(a)$ & $a$ \\
$a$ & $b(a)$                    & $a$ & $b(c)$ \\
$c(a)$ & $c(b(a))$              & $d(a,a)$ & $d(b(a),c)$ \\
$d(a,a)$ & $d(b(a),c(c(a)))$    & $b(a,a)$ & $b(a,a,a)$
\end{tabular}
\end{center}

\smallskip

Homeomorphic embedding $\unlhd$ is a well-quasi order, meaning that for every infinite sequence of expressions $e_1,e_2 \ldots$ over a finite alphabet, there exist indicies $i < j$ such that $e_i \unlhd e_j$. This result is known as Kruskal's Tree Theorem \cite{kruskal:tree}. We can use this result to ensure an algorithm over expressions performs a bounded number of iterations, by stopping at iteration $n$ once $\exists i \bullet 1 \leq i < n \wedge e_i \unlhd e_n$.

\subsection{Homeomorphic Embedding of Core Expressions}

\begin{figure}
\ind{Shell}\ind{dive}\ind{couple}
\begin{code}
data Shell alpha = Shell alpha [Shell alpha]

(<<|) :: Eq alpha => Shell alpha -> Shell alpha -> Bool
x <<| y = dive x y || couple x y

dive x (Shell _ ys) = any (x <<|) ys

couple (Shell x xs) (Shell y ys) =
    x == y && length xs == length ys && and (zipWith (<<|) xs ys)
\end{code}
\caption{Homeomorphic embedding relation.}
\label{figB:homeomorphic}
\end{figure}

Figure \ref{figB:homeomorphic} gives an implementation of homeomorphic embedding in Haskell, making use of the auxiliary functions |dive| and |couple| \cite{leuschel:homeomorphic}. The |dive| function checks if the first term is contained as a child of the second term, while the |couple| function checks if both terms have the same outer shell.

In order to perform homeomorphic embedding tests on expressions in our Core language, it is necessary to convert expressions to shells. To generate shells it is useful to have some sentinel value for expressions, we use the variable consisting of the empty string, which we represent as |bullet|. To convert an expression |x| to a |Shell|, we make the first field of |Shell| the expression |x| with all subexpressions replaced by |bullet|, and the second field a list of the shells of all the immediate subexpressions. Some examples:

\ignore\begin{code}
shell (EAny v)                 = Shell (EAny v) []
shell (EAny (map f xs))        = Shell (EAny (bullet bullet bullet)) [EAny map, EAny f, EAny xs]
shell (EAny (c xs))            = Shell (EAny (bullet bullet)) [EAny c, EAny xs]
shell (EAny (\v -> c xs))      = Shell (EAny (\v -> bullet)) [Shell (EAny (bullet bullet)) [EAny c, EAny xs]]
shell (EAny (let v = x in y))  = Shell (EAny (let v = bullet in bullet)) [EAny x, EAny y]
\end{code}

To ensure that the first field in a |Shell| is drawn from a finite alphabet, we can replace any locally bound variables with the empty string. For example, \ignore|shell (EAny v)| would become \ignore|Shell (EAny bullet) []|.

\subsection{Fast Homeomorphic Embedding}

To compute whether $x \unlhd y$, using the function in Figure \ref{figB:homeomorphic}, takes worse than polynomial time in the size of the expressions. Fortunately, there exists an algorithm \cite{stillman:computational_problems,stillman:homeomorphic} which takes $O(\text{size}(x) \cdot \text{size}(y) \cdot a)$, where $a$ is the maximum arity of any subexpression in $x$ or $y$.

The faster algorithm first constructs a $\text{size}(x) \times \text{size}(y)$ table, recording whether each pair of subexpressions within $x$ and $y$ satisfy the homeomorphic embedding. By computing the homeomorphic embedding in a bottom-up manner, making use of the table to cache pre-computed results, much duplicate computation can be eliminated. By first assigning each subexpression a uniquely identifying number, table access and modification are both $O(1)$ operations. The result is a polynomial algorithm.

We have implemented the polynomial algorithm in Haskell. Haskell is not well-suited to the use of mutable arrays, so we have instead used tree data structures to model the table. In practical experiments, the table-based algorithm seems to perform around three times faster than the function in Figure \ref{figB:homeomorphic}. Comparing the complexity classes, we may have expected a greater speed-up, but it appears that the worst-case behaviour of the simple algorithm occurs infrequently.
