%include paper.fmt
%format target = "\textsc{target}"
%format unit = "\textsc{unit}"
%format |- = "\mathbin{|\hspace{-0.65mm}\hbox{-\hspace{-0.4mm}-}}"
%format |+ = "\mathbin{|\hspace{-0.65mm}\hbox{\small{+}}}"
%format |* = "\mathbin{|\hspace{-0.65mm}*}"
%format ||* = "\mathbin{||\hspace{-0.65mm}*}"
%format ||+ = "\mathbin{||\hspace{-0.65mm}\hbox{\small{+}}}"
\hsdef{\begin{comment}
Char,Bool,Float,Data,Int
Traversable
UniplateOn,On,Bi
f,x,x',x_i,y,e,e',k,r
_D,_C,_T,_I
alpha,tau,beta
warnAssign
increase,incrOne,com_increase,syb_increase,syb_incrOne
Data.Generics.PlateDirect,Data.Generics.PlateTypeable,PlateData
\end{comment}}
\begin{comment}
\h{.*}\begin{code}
import Prelude hiding (mapM)
import GHC.Exts(build)
import Data.Maybe
import Data.Monoid
import Data.Typeable
import Data.Generics hiding (Unit)
import Control.Monad.State hiding (mapM)
import Data.Traversable(Traversable,mapM)
import Data.Foldable(Foldable)
instance Eq Expr
instance Eq (a -> b)
instance Eq (Str a)
instance Typeable Expr
instance Typeable Stmt
instance Traversable Str
instance Foldable Str
class Compos t where
compos :: (forall a. a -> m a) -> (forall a b. m (a -> b) -> m a -> m b) -> (forall a. t a -> m (t a)) -> t c -> m (t c)
composOp :: Compos t => (forall a. t a -> t a) -> t c -> t c
composOpM :: (Compos t, Monad m) => (forall a. t a -> m (t a)) -> t c -> m (t c)
composOpM_ :: (Compos t, Monad m) => (forall a. t a -> m ()) -> t c -> m ()
composOpMonoid :: (Compos t, Monoid m) => (forall a. t a -> m) -> t c -> m
composOpMPlus :: (Compos t, MonadPlus m) => (forall a. t a -> m b) -> t c -> m b
composOpFold :: Compos t => b -> (b -> b -> b) -> (forall a. t a -> b) -> t c -> b
\end{code}
\end{comment}
\chapter{Boilerplate Removal}
\label{chp:uniplate}
Generic traversals over recursive data structures are often referred to as \textit{boilerplate} code. This chapter describes the Uniplate library, which offers a way to abstract several common forms of boilerplate code. The Uniplate library only supports traversals having value-specific behaviour for one type, and does not operate on functional values contained within a data structure. \S\ref{secU:intro} gives an example problem, and our solution. \S\ref{secU:use_play} introduces the traversal combinators that we propose, along with short examples. \S\ref{secU:implement_play} discusses how these combinators are implemented in terms of a single primitive. \S\ref{secU:use_playex} extends this approach to multi-type traversals, and \S\ref{secU:implement_playex} covers the extended implementation. \S\ref{secU:performance} investigates some performance optimisations. \S\ref{secU:results} gives comparisons with other approaches, using examples such as the ``paradise'' benchmark. \S\ref{secU:related} presents related work.
\section{Introductory Example}
\label{secU:intro}
Take a simple example of a recursive data type:
\h{.!expr2}\begin{code}
data Expr = Add Expr Expr | Val Int
| Sub Expr Expr | Var String
| Mul Expr Expr | Neg Expr
| Div Expr Expr
\end{code}
The |Expr| type represents a small language for integer expressions, which permits free variables. Suppose we need to extract a list of all the variable occurrences in an expression:
\begin{onepage}
\begin{code}
variables :: Expr -> [String]
variables (Var x ) = [x]
variables (Val x ) = []
variables (Neg x ) = variables x
variables (Add x y ) = variables x ++ variables y
variables (Sub x y ) = variables x ++ variables y
variables (Mul x y ) = variables x ++ variables y
variables (Div x y ) = variables x ++ variables y
\end{code}
\end{onepage}
This definition has the following undesirable characteristics: (1) adding a new constructor would require an additional equation; (2) the code is repetitive: the last four right-hand sides are identical; (3) the code cannot be shared with other similar operations. This problem is referred to as the \textit{boilerplate} problem. Using the Uniplate library, the above example can be rewritten as:
\begin{code}
variables :: Expr -> [String]
variables x = [y | Var y <- universe x]
\end{code}
The type signature is optional, and would be inferred automatically if left absent. This example assumes a |Uniplate| instance for the |Expr| data type, given in \S\ref{secU:play_instances}. This example requires only Haskell 98. For more advanced examples we require multi-parameter type classes \cite{jones:mptc} -- but no functional dependencies, rank-2 types or generalised algebraic data types (GADTs).
The central idea is to exploit a common property of many traversals: they only require value-specific behaviour for a \textit{single uniform type}. Looking at the |variables| example, the only type of interest is |Expr|. In practical applications, this pattern is common\footnote{Most examples in boilerplate removal papers meet this restriction, even though the systems being discussed do not depend on it.}. By focusing only on uniform type traversals, we are able to exploit well-developed techniques in list processing.
\subsection{Contribution}
Ours is far from the first technique for `scrapping boilerplate'. The area has been researched extensively. But there are a number of distinctive features in our approach:
\begin{itemize}
\item We require \textit{no language extensions} for single-type traversals, and only multi-parameter type classes for multi-type traversals.
\item Our \textit{choice of operations} is new: we shun some traditionally provided operations, and provide some uncommon ones.
\item Our type classes can be defined independently \textit{or} on top of |Typeable| and |Data| \citep{lammel:syb}, making \textit{optional use of built-in compiler support}.
\item We make use of \textit{list-comprehensions} \citep{wadler:list_comprehensions} for succinct queries.
\item We \textit{compare the conciseness} of operations using our library, by counting lexemes, showing our approach leads to less boilerplate.
\item We \textit{compare the performance} of traversal mechanisms, something that has been neglected in previous work.
\end{itemize}
\section{Queries and Transformations}
\label{secU:use_play}
We define various traversals, using the |Expr| type defined in the introduction as an example throughout. We divide \textit{traversals} into two categories: queries and transformations. A \textit{query} is a function that takes a value, and extracts some information of a different type. A \textit{transformation} takes a value, and returns a modified version of the original value. All the traversals rely on the class |Uniplate|, an instance of which is assumed for |Expr|. The definition of this class and its instances are covered in \S\ref{secU:implement_play}.
For some of the definitions we will make use of the terminology |alpha|-parent. The |alpha|-parent of a value |y| is the innermost value of type |alpha|, other than |y| itself, within which |y| is contained.
\subsection{Children}
The first function in the Uniplate library serves as both a function, and a definition of terminology:
\begin{code}
children :: Uniplate alpha => alpha -> [alpha]
\end{code}
The function |children| takes a value |x|, and returns the substructures of |x| with type |alpha|, that are not contained by any value of type |alpha| apart from |x|. For example:
\begin{code}
children (Add (Neg (Var "x")) (Val 12)) = [Neg (Var "x"), Val 12]
\end{code}
Note that |Var "x"| is \textit{not} returned, as it is contained within |Neg (Var "x")|. The |children| function is occasionally useful, but is used more commonly as an auxiliary in the definition of other functions.
\subsection{Queries}
The Uniplate library provides the |universe| function to support queries.
\begin{code}
universe :: Uniplate alpha => alpha -> [alpha]
\end{code}
This function takes a data structure, and returns a list of \textit{all} structures of the same type found within it, including the root. For example:
\begin{code}
universe (Add (Neg (Var "x")) (Val 12)) =
[Add (Neg (Var "x")) (Val 12)
,Neg (Var "x")
,Var "x"
,Val 12]
\end{code}
One use of this mechanism for querying was given in the introduction. Using the |universe| function, queries can be expressed very concisely. Using a list-comprehension to process the results of |universe| is common.
\begin{example}
\label{exU:zerocount}
Consider the task of counting divisions by the literal 0.
\begin{code}
countDivZero :: Expr -> Int
countDivZero x = length [() | Div _ (Val 0) <- universe x]
\end{code}
Here we make essential use of a feature of list comprehensions: if a pattern does not match, then the item is skipped. In other syntactic constructs, failing to match a pattern results in a pattern-match error.
\end{example}
\subsection{Bottom-up Transformations}
Another common operation provided by many boilerplate removal systems \citep{lammel:syb,stratego,strafunski,ren:generic_recursion_toolbox} applies a given function to every subtree of the argument type. We define as standard a bottom-up transformation.
\begin{code}
transform :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
\end{code}
The result of |transform f x| is |f x'| where |x'| is obtained by replacing each |alpha|-child |x_i| in |x| by |transform f x_i|.
\begin{example}
\label{exU:simplify}
Suppose we wish to remove the |Sub| constructor assuming the equivalence: |x - y == x + (- y)|. To apply this equivalence as a rewriting rule, at all possible places in an expression, we define:
\begin{code}
simplify x = transform f x
where f (Sub x y) = Add x (Neg y)
f x = x
\end{code}
This code can be read: apply the subtraction rule where you can, and where you cannot, do nothing. Adding more rules is easy. Take for example: \ignore|x + y = 2 * x where x == y|. Now we can add this new rule into our existing transformation:
\begin{code}
simplify x = transform f x
where f (Sub x y) = Add x (Neg y)
f (Add x y) | x == y = Mul (Val 2) x
f x = x
\end{code}
Each equation corresponds to the natural Haskell translation of the rule. The |transform| function manages all the required boilerplate.
\end{example}
\subsection{Top-Down Transformation}
The Scrap Your Boilerplate approach \cite{lammel:syb} (known as SYB) provides a top-down transformation named |everywhere'|. We describe this traversal, and our reasons for \textit{not} providing it, even though it could easily be defined. We instead provide |descend|, based on the |composOp| operator \cite{bringert:compos}.
The \ignore|everywhere' f| transformation applies |f| to a value, then recursively applies the transformation on all the children of the freshly generated value. Typically, the intention in a transformation is to apply |f| to \textit{every node exactly once}. Unfortunately, \ignore|everywhere' f| does not necessarily have this effect.
\begin{example}
Consider the following transformation:
\begin{code}
doubleNeg (Neg (Neg x)) = x
doubleNeg x = x
\end{code}
The intention is clear: remove all instances of double negation. When applied in a bottom-up manner, the transformation has exactly this effect. But when applied top-down, some occurrences are missed. Consider the value |Neg (Neg (Neg (Neg (Val 1))))|; only the outermost double negation will be removed.
\end{example}
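The difference can be demonstrated with direct recursion. The following sketch is independent of the library: it uses hand-written traversals over a two-constructor fragment of |Expr|, with |topDown| written in the style of \ignore|everywhere'| and |bottomUp| in the style of |transform|.

```haskell
-- A minimal Expr fragment with hand-written traversals
-- (a sketch; these are not the library definitions).
data Expr = Val Int | Neg Expr deriving (Eq, Show)

doubleNeg :: Expr -> Expr
doubleNeg (Neg (Neg x)) = x
doubleNeg x             = x

-- Bottom-up: transform the children first, then the rebuilt node.
bottomUp :: (Expr -> Expr) -> Expr -> Expr
bottomUp f (Neg x) = f (Neg (bottomUp f x))
bottomUp f x       = f x

-- Top-down, in the style of everywhere': transform the node first,
-- then recurse into the children of the freshly generated result.
topDown :: (Expr -> Expr) -> Expr -> Expr
topDown f x = case f x of
  Neg y -> Neg (topDown f y)
  y     -> y

quad :: Expr
quad = Neg (Neg (Neg (Neg (Val 1))))
```

Here |bottomUp doubleNeg quad| yields |Val 1|, while |topDown doubleNeg quad| yields |Neg (Neg (Val 1))|: the inner double negation survives, because after the outer rewrite the two remaining |Neg| constructors are never examined together.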
\begin{example}
Consider the following transformation:
\begin{code}
reciprocal (Div n m) = Mul n (Div (Val 1) m)
reciprocal x = x
\end{code}
This transformation removes arbitrary division, converting it to divisions where the numerator is always 1. If applied once to each subtree, this computation would terminate successfully. Unfortunately, a top-down transformation treats the generated |Mul| as having been transformed, but cannot tell that the generated |Div| is the result of a transformation, not a fragment of the original input. This leads to non-termination.
\end{example}
As these examples show, when defining top-down transformations using |everywhere'| it is easy to slip up. The problem is that the program cannot tell the difference between freshly created constructors, and values that come originally from the input.
So we do support top-down transformations, but require the programmer to make the transformation more explicit. We introduce the |descend| function, inspired by the Compos paper \cite{bringert:compos}.
\begin{code}
descend :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
\end{code}
The result of |descend f x| is obtained by replacing each outermost |alpha|-child |x_i| in |x| by |f x_i|. Unlike |everywhere'|, there is \textit{no recursion} within |descend|.
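As an illustration, here is a hand-written function with the behaviour |descend| would have on a small fragment of |Expr| -- each immediate child is transformed exactly once, and nothing recurses further. The name |descendExpr| is ours, standing in for the class method.

```haskell
data Expr = Val Int | Neg Expr | Add Expr Expr deriving (Eq, Show)

-- A hand-written stand-in for descend on this fragment: apply f
-- to each immediate child, with no recursion beyond that layer.
descendExpr :: (Expr -> Expr) -> Expr -> Expr
descendExpr f (Neg x)   = Neg (f x)
descendExpr f (Add x y) = Add (f x) (f y)
descendExpr _ x         = x
```

For example, |descendExpr Neg (Add (Val 1) (Neg (Val 2)))| wraps each child in a single negation, giving |Add (Neg (Val 1)) (Neg (Neg (Val 2)))|; the nested |Val 2| is untouched.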
\begin{comment}
\h{.expr2}\begin{code}
data Expr = Let String Expr Expr
| Add Expr Expr | Val Int
| Sub Expr Expr | Var String
| Mul Expr Expr | Neg Expr
| Div Expr Expr
\end{code}
\end{comment}
\begin{example}
Consider the addition of a constructor \ignore|Let String Expr Expr|. Now let us define a function |subst| to replace free variables with given expressions. In order to determine which variables are free, we need to ``remember'' variables that are bound as we descend\footnote{For simplicity, we ignore issues of hygienic substitution that may arise if substituted expressions themselves contain free variables.}. We can define |subst| using a |descend| transformation:
\h{.expr2}\begin{code}
subst :: [(String,Expr)] -> Expr -> Expr
subst rep x =
case x of
Let name bind x -> Let name (subst rep bind)
(subst (filter ((/= name) . fst) rep) x)
Var x -> fromMaybe (Var x) (lookup x rep)
_ -> descend (subst rep) x
\end{code}
The |Var| alternative may return an |Expr| from |rep|, but no additional transformation is performed on this value, since all transformation is made explicit. In the |Let| alternative we explicitly continue the |subst| transformation.
\end{example}
\subsection{Transformations to a Normal Form}
In addition to top-down and bottom-up transformations, we also provide transformations to a normal form. The idea is that a rule is applied exhaustively until a normal form is achieved. Consider a rewrite transformation:
\begin{code}
rewrite :: Uniplate alpha => (alpha -> Maybe alpha) -> alpha -> alpha
\end{code}
A rewrite-rule argument |r| takes an expression |e| of type |alpha|, and returns either |Nothing| to indicate that the rule is not applicable, or |Just e'| indicating that |e| is rewritten by |r| to |e'|. The intuition for |rewrite r| is that it applies |r| exhaustively; a postcondition for |rewrite| is that there must be no places where |r| could be applied. That is, the following property must hold:
\begin{code}
propRewrite r x = all (isNothing . r) (universe (rewrite r x))
\end{code}
One possible definition of the |rewrite| function uses |transform|:
\begin{code}
rewrite :: Uniplate alpha => (alpha -> Maybe alpha) -> alpha -> alpha
rewrite f = transform g
where g x = maybe x (rewrite f) (f x)
\end{code}
This definition tries to apply the rule everywhere in a bottom-up manner. If at any point it makes a change, then the new value has the rewrite applied to it. The function only terminates when a normal form is reached.
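For instance, instantiating this definition over a small |Expr| fragment with hand-written traversals (a sketch: |transformE| and |rewriteE| stand in for the class methods), the double-negation rule from earlier is applied exhaustively:

```haskell
data Expr = Val Int | Neg Expr deriving (Eq, Show)

-- Hand-written bottom-up transform for this fragment
-- (standing in for the class-based transform).
transformE :: (Expr -> Expr) -> Expr -> Expr
transformE f (Neg x) = f (Neg (transformE f x))
transformE f x       = f x

-- The rewrite scheme from the text, over transformE.
rewriteE :: (Expr -> Maybe Expr) -> Expr -> Expr
rewriteE f = transformE g
  where g x = maybe x (rewriteE f) (f x)

-- Rule: remove a double negation, or report "not applicable".
negNeg :: Expr -> Maybe Expr
negNeg (Neg (Neg x)) = Just x
negNeg _             = Nothing
```

Applied to |Neg (Neg (Neg (Neg (Val 1))))|, |rewriteE negNeg| yields |Val 1|, and on |Neg (Neg (Neg (Val 2)))| it yields |Neg (Val 2)|: in both cases no |Neg (Neg x)| remains, so the postcondition |propRewrite| holds.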
The |rewrite| function has two potential problems. The first issue is that different application strategies may give different results. Consider the rule replacing |Neg (Neg x)| with |Val 1|, applied to the value |Neg (Neg (Neg (Val 1)))|. Depending on the application strategy, the result will be either |Neg (Val 1)| or |Val 1|. The second issue is that the implementation of |rewrite| given above may check unchanged sub-expressions repeatedly, causing a performance problem. Both these issues can be avoided by using an explicit transformation, and managing the rewriting manually.
\subsubsection{Bottom-Up Transformations to a Normal Form}
\label{secU:rewrite_bottom}
An alternative way of obtaining a transformation to a normal form is to use the |transform| function directly. What restrictions on |f| ensure that |transform f| is idempotent, and hence a normal form? It is sufficient that the constructors on the right-hand side of |f| do not overlap with the constructors on the left-hand side.
\begin{examplerevisit}{\ref{exU:simplify}}
Recall the |simplify| transformation:
\begin{code}
simplify = transform f
\end{code}
\begin{code}
f (Sub x y) = Add x (Neg y)
f (Add x y) | x == y = Mul (Val 2) x
f x = x
\end{code}
Here |Add| occurs on the right-hand side of the first line, and on the left-hand side of the second. From this we can construct a value where |transform f| is not idempotent:
\ignore\begin{code}
let x = Sub (Neg (Var "q")) (Var "q")
transform f x == Add (Neg (Var "q")) (Neg (Var "q"))
transform f (transform f x) == Mul (Val 2) (Neg (Var "q"))
\end{code}
To remedy this situation, whenever the right-hand side of a rule applies a constructor of type |Expr|, |f| can be reapplied:
\begin{code}
f (Sub x y) = f $ Add x (f $ Neg y)
f (Add x y) | x == y = f $ Mul (f $ Val 2) x
f x = x
\end{code}
Now we can guarantee that |transform f| is idempotent. In this particular example, we can inline the |f| applications attached to the constructors |Neg|, |Val| and |Mul| to give the more concise:
\begin{code}
f (Sub x y) = f $ Add x (Neg y)
f (Add x y) | x == y = Mul (Val 2) x
f x = x
\end{code}\codeexample
\end{examplerevisit}
\subsection{Action Transformations}
Rewrite transformations apply a set of rules \textit{repeatedly} until a normal form is found. One alternative is an action transformation, where each node is visited and transformed \textit{once}, and state is maintained and updated as the operation proceeds. The standard technique is to thread a monad through the operation, which we do using |transformM|, with a bottom-up transformation strategy.
\begin{example}
Suppose we wish to rename each variable occurrence to be unique:
\begin{code}
uniqueVars :: Expr -> Expr
uniqueVars x = evalState (transformM f x) vars
where
vars = ['x':show i | i <- [1..]]
f (Var i) = do y:ys <- get
put ys
return (Var y)
f x = return x
\end{code}
The function |transformM| is a monadic variant of |transform|. Here a \textit{state monad} is used to keep track of the list of names not yet used, with |evalState| computing the result of the monadic action, given an initial state |vars|.
\end{example}
\subsection{Paramorphisms}
A paramorphism is a fold in which the recursive step may refer to the recursive components of a value, not just the results of folding over them \cite{meertens:paramorphisms}. We define a similar recursion scheme in our library.
\begin{code}
para :: Uniplate alpha => (alpha -> [r] -> r) -> alpha -> r
\end{code}
The |para| function uses the functional argument to combine a value, and the results of |para| on its children, into a new result.
\begin{example}
Compiler writers might wish to compute the \textit{depth of expressions}:
\begin{code}
depth :: Expr -> Int
depth = para (\_ cs -> 1 + maximum (0:cs))
\end{code}\codeexample
\end{example}
\subsection{Holes and Contexts}
\label{secU:holes_contexts}
The final two operations in the library seem to be a novelty -- we have not seen them in any other generics library, even in those which attempt to include all variations \citep{ren:generic_recursion_toolbox}. These operations are similar to contextual pattern matching \citep{mohnen:context_patterns}, and have some connection to the zipper pattern \cite{huet:zipper}.
\begin{code}
holes, contexts :: Uniplate alpha => alpha -> [(alpha, alpha -> alpha)]
\end{code}
Given a value |y|, these functions both return lists of pairs |(x,f)| where |x| is a sub-expression of |y|, and |f| replaces the hole in |y| from which |x| was removed. In the case of |holes|, |x| will be a member of |children y|, and for |contexts|, |x| will be a member of |universe y|.
\begin{example}
Suppose that mutation testing requires all expressions obtained by incrementing or decrementing \textit{any single} literal in an original expression.
\begin{code}
mutants :: Expr -> [Expr]
mutants x = [c (Val j) | (Val i, c) <- contexts x, j <- [i-1, i+1]]
\end{code}
\end{example}
In general, these functions have the following properties:
\begin{code}
propChildren x = children x == map fst (holes x)
propId x = all (== x) [b a | (a,b) <- holes x]
\end{code}
\begin{code}
propUniverse x = universe x == map fst (contexts x)
propId x = all (== x) [b a | (a,b) <- contexts x]
\end{code}
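These properties can be checked against a hand-written version of |holes| for a small fragment of |Expr| (|holesE| is our own name for this sketch, not the class method):

```haskell
data Expr = Val Int | Add Expr Expr deriving (Eq, Show)

-- A hand-written holes for this fragment: pair each immediate
-- child with a function that puts a replacement in its place.
holesE :: Expr -> [(Expr, Expr -> Expr)]
holesE (Add x y) = [(x, \x' -> Add x' y), (y, \y' -> Add x y')]
holesE _         = []
```

On |Add (Val 1) (Val 2)| every pair |(a,b)| satisfies |b a == Add (Val 1) (Val 2)|, matching |propId|, while replacing the first hole gives |Add (Val 9) (Val 2)|, in the style of the |mutants| example.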
\subsection{Summary}
\begin{figure}
\h{.*}\begin{code}
module Data.Generics.Uniplate where
children :: Uniplate alpha => alpha -> [alpha]
contexts :: Uniplate alpha => alpha -> [(alpha, alpha -> alpha)]
descend :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
descendM :: (Uniplate alpha, Monad m) => (alpha -> m alpha) -> alpha -> m alpha
holes :: Uniplate alpha => alpha -> [(alpha, alpha -> alpha)]
para :: Uniplate alpha => (alpha -> [r] -> r) -> alpha -> r
rewrite :: Uniplate alpha => (alpha -> Maybe alpha) -> alpha -> alpha
rewriteM :: (Uniplate alpha, Monad m) => (alpha -> m (Maybe alpha)) -> alpha -> m alpha
transform :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
transformM :: (Uniplate alpha, Monad m) => (alpha -> m alpha) -> alpha -> m alpha
universe :: Uniplate alpha => alpha -> [alpha]
\end{code}
\caption{All |Uniplate| methods.}
\label{figU:play}
\end{figure}
We present signatures for all our methods in Figure \ref{figU:play}, including several monadic variants. In our experience, the most commonly used operations are |universe| and |transform|, followed by |transformM| and |descend|.
\section{Implementing the |Uniplate| class}
\label{secU:implement_play}
\begin{figure}
\ind{Str}\ind{Zero}\ind{One}\ind{Two}\ind{strList}\ind{listStr}
\h{.*}\begin{code}
data Str alpha = Zero | One alpha | Two (Str alpha) (Str alpha)
instance Functor Str where
fmap f (Zero ) = Zero
fmap f (One x ) = One (f x)
fmap f (Two x y ) = Two (fmap f x) (fmap f y)
strList :: Str alpha -> [alpha]
strList (Zero ) = []
strList (One x ) = [x]
strList (Two x y ) = strList x ++ strList y
listStr :: [alpha] -> Str alpha
listStr [] = Zero
listStr [x] = One x
listStr (x:xs) = Two (One x) (listStr xs)
\end{code}
\caption{|Str| data type.}
\label{figU:str}
\end{figure}
Requiring each instance of the |Uniplate| class to implement eleven separate methods would be an undue imposition. Instead, given a type specific instance for a \textit{single} auxiliary method with a pair as result, we can define \textit{all eleven} operations generically, at the class level. The auxiliary method is defined as:
\ignore\begin{code}
uniplate :: Uniplate alpha => alpha -> (Str alpha, Str alpha -> alpha)
uniplate x = (cs,context)
\end{code}
The original Uniplate paper \cite{me:uniplate} used lists of items, but we instead use the |Str| data type defined in Figure \ref{figU:str} to collect items. The use of |Str| simplifies the definition of instances and improves performance. The value |cs| contains the same elements as |children x|; the |context| is a function to generate a new value, with a different set of children. The caller of |context| must ensure that the value given to |context| has precisely the same structure of |Str| constructors as |cs|. The result pair splits the information in the value, but by combining the |context| with |cs| the original value can be recovered:
\begin{code}
propId x = x == context cs
where (cs,context) = uniplate x
\end{code}
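To make the pair concrete, here is a hand-written |uniplate| for a fragment of |Expr|, following the same scheme as the full instance given later in Figure \ref{figU:play_expr} (|uniplateExpr| is a sketch, not the class method):

```haskell
data Str a = Zero | One a | Two (Str a) (Str a)

data Expr = Val Int | Neg Expr | Add Expr Expr deriving (Eq, Show)

-- A hand-written uniplate for this fragment: the children as a
-- Str, paired with a context that rebuilds the node from a Str
-- of exactly the same shape.
uniplateExpr :: Expr -> (Str Expr, Str Expr -> Expr)
uniplateExpr (Neg a)   = (One a, \(One a') -> Neg a')
uniplateExpr (Add a b) = (Two (One a) (One b)
                         ,\(Two (One a') (One b')) -> Add a' b')
uniplateExpr x         = (Zero, \Zero -> x)
```

Calling |uniplateExpr (Add (Val 1) (Val 2))| returns the children |Two (One (Val 1)) (One (Val 2))| together with a |context|; applying |context| to the children unchanged recovers the original value, while supplying a different |Str| of the same shape gives, for instance, |Add (Val 3) (Val 4)|.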
\subsection{Operations in terms of |uniplate|}
\label{secU:using_replacechildren}
\begin{figure}
\ind{children}\ind{universe}\ind{descend}\ind{transform}\ind{rewrite}\ind{descendM}\ind{transformM}\ind{rewriteM}\ind{para}\ind{holes}\ind{contexts}
\begin{code}
children :: Uniplate alpha => alpha -> [alpha]
children = strList . fst . uniplate
universe :: Uniplate alpha => alpha -> [alpha]
universe x = x : concatMap universe (children x)
descend :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
descend f x = context $ fmap f children
where (children, context) = uniplate x
transform :: Uniplate alpha => (alpha -> alpha) -> alpha -> alpha
transform f = f . descend (transform f)
rewrite :: Uniplate alpha => (alpha -> Maybe alpha) -> alpha -> alpha
rewrite f = transform g
where g x = maybe x (rewrite f) (f x)
descendM :: (Monad m, Uniplate alpha) => (alpha -> m alpha) -> alpha -> m alpha
descendM f x = liftM context $ mapM f children
where (children, context) = uniplate x
transformM :: (Monad m, Uniplate alpha) => (alpha -> m alpha) -> alpha -> m alpha
transformM f x = f =<< descendM (transformM f) x
rewriteM :: (Monad m, Uniplate alpha) => (alpha -> m (Maybe alpha)) -> alpha -> m alpha
rewriteM f = transformM g
where g x = f x >>= maybe (return x) (rewriteM f)
para :: Uniplate alpha => (alpha -> [r] -> r) -> alpha -> r
para op x = op x $ map (para op) $ children x
holes :: Uniplate alpha => alpha -> [(alpha, alpha -> alpha)]
holes = f . uniplate
where f (Zero , g) = []
f (One x , g) = [(x, g . One)]
f (Two l r , g) = f (l, g . flip Two r) ++ f (r, g . Two l)
contexts :: Uniplate alpha => alpha -> [(alpha, alpha -> alpha)]
contexts x = (x,id) : [ (x_2, g_1 . g_2)
| (x_1, g_1) <- holes x, (x_2, g_2) <- contexts x_1]
\end{code}
\caption{Implementation of all |Uniplate| methods.}
\label{figU:implements}
\end{figure}
All eleven operations from \S\ref{secU:use_play} can be defined in terms of |uniplate|. We define all eleven operations in Figure \ref{figU:implements}. The common pattern is to call |uniplate|, then operate on the current children, often calling |context| to create a modified value. Some of these definitions can be made more efficient -- see \S\ref{secU:optimise_everything}.
\subsection{Writing |Uniplate| instances}
\label{secU:play_instances}
\begin{figure}
\ind{uniplate}\ind{Uniplate}
\h{.*}\begin{code}
class Uniplate alpha where
uniplate :: alpha -> (Str alpha, Str alpha -> alpha)
\end{code}
\h{.default .expr2}\begin{code}
instance Uniplate Expr where
uniplate (Neg a ) = (One a, \(One a') -> Neg a')
uniplate (Add a b ) = (Two (One a) (One b)
,\(Two (One a') (One b')) -> Add a' b' )
uniplate (Sub a b ) = (Two (One a) (One b)
,\(Two (One a') (One b')) -> Sub a' b' )
uniplate (Mul a b ) = (Two (One a) (One b)
,\(Two (One a') (One b')) -> Mul a' b' )
uniplate (Div a b ) = (Two (One a) (One b)
,\(Two (One a') (One b')) -> Div a' b' )
uniplate x = (Zero, \Zero -> x)
\end{code}
\caption{The |Uniplate| class and an instance for |Expr|.}
\label{figU:play_expr}
\end{figure}
We define a |Uniplate| instance for the |Expr| type in Figure \ref{figU:play_expr}.
The distinguishing feature of our library is that the children are defined in terms of their type. While this feature keeps the traversals simple, it does mean that rules for \textit{deriving} instance definitions are not purely syntactic, but depend on the types of the constructors. We now describe the derivation rules, followed by information on the Derive tool that performs this task automatically. (If we are willing to make use of Multi-Parameter Type Classes, simpler derivation rules can be used: see \S\ref{secU:implement_playex}.)
\subsection{Derivation Rules}
\begin{figure}
\ignore\begin{code}
_I $| d t_1...t_n |$ =
instance Uniplate (d t_1...t_n) where
uniplate = _N $| d |$ \? _T $| t_1 |$ \? ... \? _T $| t_n |$
_D $| data d v_1...v_n = a_1...a_m |$ =
_N $| d |$ \? v_1...v_n x = case x of _C $| a_1 |$ \? ... \? _C $| a_m |$
where x {-" \mbox{ is fresh} "-}
_C $| c t_1...t_n |$ =
c y_1...y_n -> (Zero `Two` a_1 `Two` ... `Two` a_n
,\(Zero `Two` z_1 `Two` ... `Two` z_n) -> c (b_1 z_1) ... (b_n z_n))
where y_1...y_n {-" \mbox{ and } "-} z_1...z_n {-" \mbox{ are fresh} "-}
(a_i,b_i) = _T $| t_i |$ \? y_i
_T $| TargetType |$ = \x -> (One x, \(One x') -> x')
_T $| PrimitiveType |$ = \x -> (Zero, \Zero -> x)
_T $| d t_1...t_n |$ = _N $| d |$ \? _T $| t_1 |$ \? ... \? _T $| t_n |$
_T $| v |$ = v
_N {-" \mbox{ is an injection to fresh variables} "-}
\end{code}
\caption{Derivation rules for Uniplate instances.}
\label{figU:derive}
\end{figure}
\begin{figure}
\ignore\begin{code}
_I $| Expr |$ =
instance Uniplate Expr where
uniplate = _N $| Expr |$
_N $| Expr |$ \? x = case x of
Val y_1 -> (Zero `Two` a_1, \(Zero `Two` z_1) -> Val (b_1 z_1))
where (a_1,b_1) = (\x -> (Zero, \Zero -> x)) y_1
Var y_1 -> (Zero `Two` a_1, \(Zero `Two` z_1) -> Var (b_1 z_1))
where (a_1,b_1) = _N $| List |$ \? (\x -> (Zero, \Zero -> x)) \? y_1
Neg y_1 -> (Zero `Two` a_1, \(Zero `Two` z_1) -> Neg (b_1 z_1))
where (a_1,b_1) = (\x -> (One x, \(One x') -> x')) y_1
Add y_1 y_2 -> (Zero `Two` a_1 `Two` a_2
,\(Zero `Two` z_1 `Two` z_2) -> Add (b_1 z_1) (b_2 z_2))
where (a_1,b_1) = (\x -> (One x, \(One x') -> x')) y_1
(a_2,b_2) = (\x -> (One x, \(One x') -> x')) y_2
-- other constructors following the same pattern as |Add| ...
_N $| List |$ \? v_1 x = case x of
[] -> (Zero, \(Zero) -> [])
(:) y_1 y_2 -> (Zero `Two` a_1 `Two` a_2
,\(Zero `Two` z_1 `Two` z_2) -> (:) (b_1 z_1) (b_2 z_2))
where (a_1,b_1) = v_1 y_1
(a_2,b_2) = _N $| List |$ \? v_1 y_2
\end{code}
\caption{The result of applying |_D| to |Expr|.}
\label{figU:d_expr}
\end{figure}
We model the derivation of an instance by describing a derivation from a data type to a set of declarations. The derivation rules are given in Figure \ref{figU:derive}. The |_I| rule takes a concrete type, with no type variables, and generates an instance for the Uniplate class. The |_D| rule takes a data type declaration, and defines a function over that data type. The |_C| rule defines a case alternative for each constructor. The |_T| rule defines type specific behaviour: a type is either the target type on which an instance is being defined, or a primitive such as |Char|, or an algebraic data type, or a free type variable.
The result of applying these functions to |Expr| is given in Figure \ref{figU:d_expr}. By applying simple transformation steps we can obtain the same instance as presented in Figure \ref{figU:play_expr}.
\subsection{Automated Derivation of |uniplate|}
\label{secU:derive}
Applying these derivation rules is a form of boilerplate coding! The DrIFT tool \citep{drift} derives instances automatically given rules depending only on the information contained in a type definition. However, DrIFT is unable to operate with certain Haskell extensions (e.g.\ {\TeX}-style literate Haskell), and requires a separate pre-processing stage.
In collaboration with Stefan O'Rear we have developed the Derive tool \citep{derive}, which is based on Template Haskell \citep{template_haskell}. The Derive tool generates Uniplate instances by applying the rules from Figure \ref{figU:derive}, along with some simplification steps, replacing values such as |Two x Zero| with |x|.
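The tool performs this simplification on the generated code, but its effect can be sketched as a rewrite on |Str| shapes (|strSimp| is a hypothetical name, not part of the Derive tool):

```haskell
data Str a = Zero | One a | Two (Str a) (Str a) deriving (Eq, Show)

-- A sketch of the simplification step: prune redundant Zero
-- branches, so Two x Zero and Two Zero x both collapse to x.
strSimp :: Str a -> Str a
strSimp (Two x y) = case (strSimp x, strSimp y) of
  (Zero, y') -> y'
  (x', Zero) -> x'
  (x', y')   -> Two x' y'
strSimp x = x
```

For example, the shape |Two (Two Zero (One x)) Zero| produced by the rules of Figure \ref{figU:derive} simplifies to |One x|, matching the instances written by hand in Figure \ref{figU:play_expr}.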
\renewcommand{\f}{\hspace{-1mm}}
\begin{example}
\begin{code}
data Term = Name String
| Apply Term Term
deriving ( {-" \text{\{\hspace{0mm}-! } \textsf{Uniplate} \text{ !-\}} "-} )
\end{code}
Running the Derive tool over this file, the generated code is:
\begin{code}
instance Uniplate Term where
uniplate (Apply x_1 x_2) = (Two (One x_1) (One x_2)
,\(Two (One x_1) (One x_2)) -> Apply x_1 x_2)
uniplate x = (Zero, \_ -> x)
\end{code}
\end{example}
\section{Multi-type Traversals}
\label{secU:use_playex}
We have introduced the |Uniplate| class and an instance of it for type |Expr|. Now let us imagine that |Expr| is merely the expression type in a language with statements:
\begin{onepage}
\h{.*}\begin{code}
data Stmt = Assign String Expr
| Sequence [Stmt]
| If Expr Stmt Stmt
| While Expr Stmt
\end{code}
\end{onepage}
We could define a |Uniplate| instance for |Stmt|, and so perform traversals upon statements too. However, we may run into limitations. Consider the task of finding all literals in a |Stmt| -- this requires boilerplate to find not just inner statements of type |Stmt|, but inner expressions of type |Expr|.
The |Uniplate| class takes a value of type |alpha|, and operates on its substructures of type |alpha|. What we now require is something that takes a value of type |beta|, but operates on the children of type |alpha| within it -- we call this class |Biplate|. Typically the type |beta| will be a container of |alpha|. We can extend our operations by specifying how to find the |alpha|'s within the |beta|'s, and then perform the standard |Uniplate| operations upon the |alpha| type. In the above example, \ignore|alpha = Expr|, and \ignore|beta = Stmt|.
We first introduce |UniplateOn|, which requires an explicit function to find the occurrences of type |alpha| within type |beta|. We then make use of multi-parameter type classes (MPTCs) to generalise this function into a type class, named |Biplate|.
\subsection{The |UniplateOn| Operations}
We define operations, including |universeOn| and |transformOn|, which take an extra argument relative to the standard |Uniplate| operators. We call this extra argument |biplate|: it is a function from the containing type (|beta|) to the contained type (|alpha|).
\ind{BiplateType}
\h{.*}\begin{code}
type BiplateType beta alpha = beta -> (Str alpha, Str alpha -> beta)
biplate :: BiplateType beta alpha
\end{code}
The intuition for |biplate| is that given a structure of type |beta|, the function should return the largest substructures in it of type |alpha|. Unlike |uniplate|, if \ignore|alpha == beta| the original value should be returned:
\ind{biplateSelf}
\begin{code}
biplateSelf :: BiplateType alpha alpha
biplateSelf x = (One x, \(One x') -> x')
\end{code}
Unlike |Uniplate|, |Biplate| does \textit{not} always descend. The idea is that |Biplate| is used once at the beginning of a traversal to find the values of type |alpha|, then |Uniplate| operates recursively descending through the data structure.
We can now define a selection of the |On| functions -- the remaining functions from Figure \ref{figU:implements} can be implemented in a similar way. Each takes a |biplate| function as an argument:
\ind{childrenOn}\ind{universeOn}\ind{descendOn}\ind{transformOn}
\begin{code}
childrenOn :: Uniplate alpha => BiplateType beta alpha -> beta -> [alpha]
childrenOn biplate x = concatMap children $ strList $ fst $ biplate x
universeOn :: Uniplate alpha => BiplateType beta alpha -> beta -> [alpha]
universeOn biplate x = concatMap universe $ strList $ fst $ biplate x
descendOn :: Uniplate alpha => BiplateType beta alpha -> (alpha -> alpha) -> beta -> beta
descendOn biplate f x = context $ fmap (descend f) cs
where (cs, context) = biplate x
transformOn :: Uniplate alpha => BiplateType beta alpha -> (alpha -> alpha) -> beta -> beta
transformOn biplate f x = context $ fmap (transform f) cs
where (cs, context) = biplate x
\end{code}
These operations are similar to the original functions. They unwrap |beta| values to find the |alpha| values within them, operate using the standard |Uniplate| operations for type |alpha|, then rewrap if necessary. If |alpha| is constant, there is another way to abstract away the |biplate| argument, as the following example shows.
\begin{example}
The Yhc.Core library \citep{me:yhc_core}, part of the York Haskell Compiler (Yhc), makes extensive use of |Uniplate|. In this library, the central types include:
\begin{comment}
\begin{code}
data CoreData = CoreData
instance Uniplate CoreExpr
\end{code}
\end{comment}
\begin{code}
data Core = Core String [String] [CoreData] [CoreFunc]
data CoreFunc = CoreFunc String String CoreExpr
data CoreExpr = CoreVar String
| CoreApp CoreExpr [CoreExpr]
| CoreCase CoreExpr [(CoreExpr, CoreExpr)]
| CoreLet [(String, CoreExpr)] CoreExpr
-- other constructors
\end{code}
Most traversals are performed on the |CoreExpr| type. However, it is often convenient to start from one of the other types. For example, \h{stmt}|coreSimplify :: CoreExpr -> CoreExpr| may be applied not just to an individual expression, but to all expressions in a function definition, or a complete program. If we are willing to freeze the type of the second argument to |biplate| as |CoreExpr| we can write a class \ignore|UniplateExpr beta| corresponding to \ignore|Biplate beta CoreExpr |:
\begin{code}
class UniplateExpr beta where
uniplateExpr :: BiplateType beta CoreExpr
\end{code}
We then need to write instances for the types we are interested in, mainly those which are likely to contain a |CoreExpr| within them. These instances must follow the same rules as for the |Biplate| class.
\begin{code}
instance UniplateExpr Core where
uniplateExpr (Core a b c d) = (cs, \cs -> Core a b c (gen cs))
where (cs,gen) = uniplateExpr d
instance UniplateExpr CoreFunc where
uniplateExpr (CoreFunc a b c) = (cs, \cs -> CoreFunc a b (gen cs))
where (cs,gen) = uniplateExpr c
instance UniplateExpr CoreExpr where
uniplateExpr = biplateSelf
instance UniplateExpr a => UniplateExpr [a] where
uniplateExpr [] = (Zero, \Zero -> [])
uniplateExpr (x:xs) = (Two a as, \(Two n ns) -> b n : bs ns)
where (a , b ) = uniplateExpr x
(as, bs) = uniplateExpr xs
\end{code}
We can then implement traversal functions specific to |CoreExpr| in terms of the |On| functions:
\begin{code}
childrenExpr x = childrenOn uniplateExpr x
universeExpr x = universeOn uniplateExpr x
descendExpr x = descendOn uniplateExpr x
transformExpr x = transformOn uniplateExpr x
-- Similarly for all 11 functions in Figure \ref{figU:play}
\end{code}\codeexample
\end{example}
\bigskip
This technique has been used in the Yhc compiler, which is written in Haskell 98 to allow for bootstrapping, so only the standard single-parameter type classes are available.
\subsection{The |Biplate| class}
If we are willing to make use of \textit{multi-parameter type classes} \citep{jones:mptc} we can define a class |Biplate| with |biplate| as its sole method. We do not require functional dependencies.
\ind{Biplate}\ind{biplate}
\h{.*}\begin{code}
class Uniplate alpha => Biplate beta alpha where
biplate :: BiplateType beta alpha
\end{code}
We can now implement |universeBi| and |transformBi| in terms of their |On| counterparts:
\ind{universeBi}\ind{transformBi}\ind{descendBi}
\begin{code}
universeBi :: Biplate beta alpha => beta -> [alpha]
universeBi = universeOn biplate
transformBi :: Biplate beta alpha => (alpha -> alpha) -> beta -> beta
transformBi = transformOn biplate
\end{code}
In general the move to |Biplate| requires few code changes, merely the use of the new set of |Bi| functions. To illustrate we give generalisations of two examples from previous sections, implemented using |Biplate|. We extend the |variables| and |simplify| functions to work on |Expr|, |Stmt| or many other types.
\begin{exampleany}{from \S1 (revisited)}
\begin{code}
variables :: Biplate beta Expr => beta -> [String]
variables x = [y | Var y <- universeBi x]
\end{code}
The equation requires only one change: the addition of the |Bi| suffix to |universe|. In the type signature we replace |Expr| with \ignore|Biplate beta Expr => beta|. Instead of requiring the input to be an |Expr|, we merely require that from the input we know how to reach an |Expr|.
\end{exampleany}
\begin{examplerevisit}{\ref{exU:simplify}}
\begin{code}
simplify :: Biplate beta Expr => beta -> beta
simplify x = transformBi f x
where f (Sub x y) = Add x (Neg y)
f x = x
\end{code}
In this redefinition we have again made a single change to the equation: the addition of |Bi| at the end of |transform|.
\end{examplerevisit}
\section{Implementing |Biplate|}
\label{secU:implement_playex}
The complicating feature of |biplate| is that when defining |Biplate| where \ignore|alpha == beta| the function does not descend to the children, but simply returns its argument -- requiring a check for type equality between values. This check can be captured either using the type system, or using the |Typeable| class \citep{lammel:syb}. We present three methods for defining a |Biplate| instance, offering a trade-off between performance, compatibility and volume of code.
\begin{enumerate}
\item Direct definition requires $O(n^2)$ instances, but offers the highest performance with the fewest extensions. See \S\ref{secU:implement_playdirect}.
\item The |Typeable| class can be used, requiring $O(n)$ instances and no further Haskell extensions, but giving worse performance. See \S\ref{secU:implement_playtypeable}.
\item The |Data| class can be used, providing fully automatic instances with GHC, but requiring the use of rank-2 types, and giving the worst performance. See \S\ref{secU:implement_playdata}.
\end{enumerate}
All three methods can be fully automated using the Derive tool, and all provide a simplified method for writing |Uniplate| instances. The |Biplate| class definition itself is independent of the method used to implement its instances. This abstraction allows the user to start with the simplest instance scheme available to them, then move to alternative schemes to gain increased performance or compatibility.
\subsection{Direct instances}
\label{secU:implement_playdirect}
\begin{figure}
\ind{Type}\ind{PlateDirect}\ind{plate}
\h{.direct}\begin{code}
module Data.Generics.PlateDirect where
type Type beta alpha = (Str alpha, Str alpha -> beta)
plate :: beta -> Type beta alpha
plate f = (Zero, \_ -> f)
(|*) :: Type (alpha -> beta) alpha -> alpha -> Type beta alpha
(|*) (xs,x') y = (Two xs (One y),\(Two xs (One y)) -> x' xs y)
(|+) :: Biplate tau alpha => Type (tau -> beta) alpha -> tau -> Type beta alpha
(|+) (xs,x') y = (Two xs ys, \(Two xs ys) -> x' xs (y' ys))
where (ys,y') = biplate y
(|-) :: Type (tau -> beta) alpha -> tau -> Type beta alpha
(|-) (xs,x') y = (xs,\xs -> x' xs y)
(||*) :: Type ([alpha] -> beta) alpha -> [alpha] -> Type beta alpha
(||*) (xs,x') y = (Two xs (listStr y), \(Two xs ys) -> x' xs (strList ys))
(||+) :: Biplate tau alpha => Type ([tau] -> beta) alpha -> [tau] -> Type beta alpha
(||+) (xs,x') y = (Two xs ys, \(Two xs ys) -> x' xs (y' ys))
where (ys,y') = plateListDiff y
plateListDiff [] = plate []
plateListDiff (x:xs) = plate (:) |+ x ||+ xs
\end{code}
\caption{Implementation of PlateDirect.}
\label{figU:platedirect}
\end{figure}
Writing direct instances requires the |Data.Generics.PlateDirect| module to be imported, whose implementation is given in Figure \ref{figU:platedirect}. This style requires a maximum of $n^2$ instance definitions, where $n$ is the number of concrete types which contain each other, but gives the highest performance and the most type safety. The instances still depend on the type of each field, but are easier to define than the |Uniplate| instance discussed in \S\ref{secU:play_instances}. Here is a possible instance for the |Expr| type:
\h{.direct}\begin{code}
instance Uniplate Expr where
uniplate (Neg a ) = plate Neg |* a
uniplate (Add a b ) = plate Add |* a |* b
uniplate (Sub a b ) = plate Sub |* a |* b
uniplate (Mul a b ) = plate Mul |* a |* b
uniplate (Div a b ) = plate Div |* a |* b
uniplate x = plate x
\end{code}
Five infix combinators (| ||*|, | ||+|, | ||-|, | ||||*| and | ||||+|) indicate the structure of the field to the right. The | ||*| combinator says that the value on the right is of the target type, | ||+| says that a value of the target type \textit{may occur} in the right operand, | ||-| says that values of the target type \textit{cannot occur} in the right operand. | ||||*| and | ||||+| are versions of | ||*| and | ||+| used when the value to the right is a \textit{list} either of the target type, or of a type that may contain target values. The law \h{.direct}|plate f ||- x == plate (f x)| justifies the definition presented above.
This style of definition naturally expands to the multi-type traversal. For example:
\begin{onepage}
\h{.direct}\begin{code}
instance Biplate Stmt Expr where
biplate (Assign a b ) = plate Assign |- a |* b
biplate (Sequence a ) = plate Sequence ||+ a
biplate (If a b c ) = plate If |* a |+ b |+ c
biplate (While a b ) = plate While |* a |+ b
\end{code}
\end{onepage}
The information provided by uses of | ||-| and | ||+| avoids redundant exploration down branches that do not have the target type. The use of | ||||*| and | ||||+| avoids the definition of additional instances. The combinators are implemented in Figure \ref{figU:platedirect}, and work by building up the instance from left to right, adding successive fields.
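This left-to-right accumulation can be seen concretely in a standalone sketch. It repeats |plate| and | ||*| from Figure \ref{figU:platedirect}, together with a minimal |Str| type and a cut-down expression type, purely so the sketch compiles on its own; |uniplateAdd| is a hypothetical helper name.

```haskell
-- Minimal Str and the plate/(|*) combinators, repeated from the
-- PlateDirect figure so this sketch is self-contained.
data Str a = Zero | One a | Two (Str a) (Str a) deriving (Eq, Show)

type Type b a = (Str a, Str a -> b)

plate :: b -> Type b a
plate f = (Zero, \_ -> f)

-- each use extends the child structure on the right, and composes
-- the rebuilding function with the new child
(|*) :: Type (a -> b) a -> a -> Type b a
(xs, gen) |* y = (Two xs (One y), \(Two xs' (One y')) -> gen xs' y')

-- cut-down expression type, for illustration only
data Expr = Val Int | Add Expr Expr deriving (Eq, Show)

-- the Add case of a uniplate definition, written with the combinators
uniplateAdd :: Expr -> Expr -> Type Expr Expr
uniplateAdd a b = plate Add |* a |* b
```

Evaluating the sketch shows the accumulation: the child structure for |Add a b| is |Two (Two Zero (One a)) (One b)|, and applying the paired generator to that structure rebuilds |Add a b|.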
Instances are given only for concrete types, containing no type variables. In the worst case, this approach requires a |Biplate| instance for each container/contained pair. In practice, few traversal pairs are actually needed. The restricted pairing of types in |Biplate| instances also gives increased type safety; instances such as \ignore|Biplate Expr Stmt| do not exist.
In our experience definitions using these combinators offer similar performance to hand-tuned instances; see \S\ref{secU:results_speed} for measurements.
\subsection{|Typeable| based instances}
\label{secU:implement_playtypeable}
\begin{figure}
\ind{PlateTypeable}\ind{plateAll}\ind{PlateAll}\ind{Type}\ind{plate}\ind{plateSome}
\h{.typeable .data}\begin{code}
module Data.Generics.PlateTypeable where
class PlateAll beta alpha where
plateAll :: beta -> Type beta alpha
type Type beta alpha = (Str alpha, Str alpha -> beta)
plate :: beta -> Type beta alpha
plate x = (Zero, \_ -> x)
(|+) :: (Typeable tau, Typeable alpha, PlateAll tau alpha)
=> Type (tau -> beta) alpha -> tau -> Type beta alpha
(|+) (xs,x') y = (Two xs ys,\(Two xs ys) -> x' xs (y' ys))
where (ys,y') = plateSome y
instance (Typeable alpha, Typeable beta, Uniplate alpha, PlateAll beta alpha) =>
Biplate beta alpha where
biplate = plateSome
plateSome :: (Typeable beta, Typeable alpha, PlateAll beta alpha) => beta -> Type beta alpha
plateSome x = res
where
res = case asTypeOf (cast x) (Just $ head $ strList $ fst res) of
Nothing -> plateAll x
Just y -> (One y, \(One y) -> fromJust $ cast y)
\end{code}
\caption{Implementation of PlateTypeable.}
\label{figU:platetypeable}
\end{figure}
Instead of writing $O(n^2)$ class instances to locate values of the target type, we can use the |Typeable| class to \textit{test at runtime} whether we have reached the target type. This strategy is implemented in the |Data.Generics.PlateTypeable| module, whose implementation is given in Figure \ref{figU:platetypeable}. The user is required to define an instance of the auxiliary class |PlateAll|, separating the fields of a constructor with | ||+|:
\begin{comment}
\h{.typeable}\begin{code}
instance PlateAll Char a
\end{code}
\end{comment}
\h{.typeable}\begin{code}
instance (Typeable alpha, Uniplate alpha) => PlateAll Expr alpha where
plateAll (Neg a ) = plate Neg |+ a
plateAll (Add a b ) = plate Add |+ a |+ b
plateAll (Sub a b ) = plate Sub |+ a |+ b
plateAll (Mul a b ) = plate Mul |+ a |+ b
plateAll (Div a b ) = plate Div |+ a |+ b
plateAll x = plate x
instance (Typeable alpha, Uniplate alpha) => PlateAll Stmt alpha where
plateAll (Assign a b ) = plate Assign |+ a |+ b
plateAll (Sequence a ) = plate Sequence |+ a
plateAll (If a b c ) = plate If |+ a |+ b |+ c
plateAll (While a b ) = plate While |+ a |+ b
\end{code}
To give an instance of \h{.typeable ctxt}|PlateAll beta alpha|, for a concrete type |beta|, we require the context \h{ctxt}|Typeable alpha| and \h{ctxt}|Uniplate alpha|. To give an instance where |beta| has type variables of kind |*|, we require the context \h{ctxt}|Typeable tau| and \h{.typeable ctxt}|PlateAll tau alpha| for all the type variables |tau| in |beta|. We do not permit |beta| to have any higher-kinded type variables. To give an example instance with type variables, the instance for lists is:
\h{.typeable}\begin{code}
instance (PlateAll tau alpha, Typeable tau, Typeable alpha, Uniplate alpha) =>
PlateAll [tau] alpha where
plateAll [] = plate []
plateAll (x:xs) = plate (:) |+ x |+ xs
\end{code}
From a |PlateAll| instance, a |Biplate| is inferred, using the code from Figure \ref{figU:platetypeable}. The function |plateAll| always descends at least one level, then looks for values of the target type, corresponding to the functionality of |uniplate|. The function |plateSome| looks for the first values of the target type, corresponding to |biplate|. Unfortunately, we cannot infer an instance for |Uniplate| automatically, and must explicitly declare:
\h{.typeable}\begin{code}
instance Uniplate Expr where
uniplate = plateAll
instance Uniplate Stmt where
uniplate = plateAll
\end{code}
The reader may wonder why we cannot define:
\ignore\begin{code}
instance PlateAll alpha alpha => Uniplate alpha where
uniplate = plateAll
\end{code}
Consider the |Expr| type. To infer \h{ctxt}|Uniplate Expr| we require an instance for \h{.typeable ctxt}|PlateAll Expr Expr|. But to infer this instance we require \h{ctxt}|Uniplate Expr| -- which we are in the process of inferring!\footnote{GHC has co-inductive or recursive dictionaries, but Hugs does not. To allow continuing compatibility with Hugs, and the use of fewer extensions, we require the user to write these explicitly for each type.}
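The runtime type test that |plateSome| relies on (Figure \ref{figU:platetypeable}) is the |cast| function from |Data.Typeable|. As a minimal standalone illustration, separate from the library code, |cast| succeeds exactly when the caller-chosen result type matches the argument's type:

```haskell
import Data.Typeable (cast)

-- The requested result type determines whether the cast succeeds.
sameType :: Maybe Char
sameType = cast 'x'       -- succeeds: Char requested, Char supplied

differentType :: Maybe Int
differentType = cast 'x'  -- fails: Int requested, Char supplied
```

In |plateSome| the same mechanism is used, with the requested type fixed by the knot-tied |res| binding, to decide between wrapping the value in |One| and descending with |plateAll|.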
\subsection{Using the |Data| class}
\label{secU:implement_playdata}
The existing |Data| and |Typeable| instances provided by the SYB approach can also be used to define |Uniplate| instances:
\ignore\begin{code}
import Data.Generics
import Data.Generics.PlateData
data Expr = ... \? \? deriving (Typeable, Data)
data Stmt = ... \? \? deriving (Typeable, Data)
\end{code}
The disadvantages of this approach are (1) \textit{lack of type safety} -- there are now |Biplate| instances for many pairs of types where one is not a container of the other; (2) \textit{compiler dependence} -- it will only work where |Data.Generics| is supported, namely GHC at the time of writing.\footnote{Hugs supports the required rank-2 types for |Data.Generics|, but the work to port the library has not been done yet.} The clear advantage is that there is almost no work required to create instances.
\begin{figure}
\ind{PlateData}
\ignore\begin{code}
module Data.Generics.PlateData where
import Data.Generics.PlateTypeable
instance (Typeable alpha, Data alpha, Typeable beta, Data beta) => PlateAll beta alpha where
plateAll x = (children, context)
where
children = listStr $ concat $ gmapQ (strList . fst . plateSome) x
context xs = evalState (gmapM f x) $ strList xs
f y = do let (cs,con) = plateSome y
(as,bs) <- liftM (splitAt $ length $ strList cs) get
put bs
return $ con $ listStr as
instance (Typeable alpha, Data alpha) => Uniplate alpha where
uniplate = plateAll
\end{code}
\caption{Implementation of PlateData.}
\label{figU:platedata}
\end{figure}
How do we implement the |Uniplate| class instances? The implementation is given in Figure \ref{figU:platedata}, making use of the |PlateAll| class from PlateTypeable in Figure \ref{figU:platetypeable}. Obtaining the children can be done with |gmapQ|; replacing the children is more complex, requiring a state monad to keep track of the items to insert.
The code in Figure \ref{figU:platedata} is not optimised for speed. Uses of |splitAt| and |length| require the list of children to be traversed multiple times. Transforming between \ignore|Str alpha| and |[alpha]| is also inefficient. We discuss improvements in \S\ref{secU:optimise_playdata}.
\section{Performance Improvements}
\label{secU:performance}
This section describes some of the performance improvements we have been able to make. First we focus on our optimisation of |universe|, using |foldr|/|build| fusion properties \citep{spj:rules}. Next we turn to our |Data| class based instances, improving them enough to outperform SYB itself.
\subsection{Optimising the |universe| function}
\label{secU:optimise_everything}
Our initial |universe| implementation was presented in \S\ref{secU:using_replacechildren} as:
\begin{code}
universe :: Uniplate on => on -> [on]
universe x = x : concatMap universe (children x)
\end{code}
A disadvantage is that |concatMap| produces and consumes a list at every level in the data structure. We can fix this by calling the |uniplate| method directly, and building the list with a tail:
\begin{onepage}
\begin{code}
universe :: Uniplate on => on -> [on]
universe x = f (One x) []
where f (Zero ) res = res
f (One x ) res = x : f (fst $ uniplate x) res
f (Two x y ) res = f x (f y res)
\end{code}
\end{onepage}
Now only one result list is constructed. We can do even better using GHC's list fusion \citep{spj:rules}. The user of |universe| is often a list comprehension, which is a \textit{good consumer}. We can make |f| a \textit{good producer}:
\begin{code}
universe :: Uniplate on => on -> [on]
universe x = build f
where f cons nil = g cons nil (One x) nil
g cons nil (Zero ) res = res
g cons nil (One x ) res = x `cons` g cons nil (fst $ uniplate x) res
g cons nil (Two x y ) res = g cons nil x (g cons nil y res)
\end{code}
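The shape of the fusion-friendly definition can be seen in a standalone sketch, using a plain child list in place of the |Str|-based |uniplate| pair; the |Expr| and |children| definitions below are cut-down stand-ins for illustration, not the library code:

```haskell
import GHC.Exts (build)

-- illustrative stand-ins for the real Uniplate machinery
data Expr = Val Int | Neg Expr | Add Expr Expr deriving (Eq, Show)

children :: Expr -> [Expr]
children (Neg a)   = [a]
children (Add a b) = [a, b]
children _         = []

-- universe as a good producer: the result list is built directly with
-- the consumer's cons/nil, so a consuming list comprehension can fuse
universe :: Expr -> [Expr]
universe root = build (\cons nil ->
  let go x rest = x `cons` foldr go rest (children x)
  in go root nil)
```

A list comprehension over |universe e| then consumes the result without an intermediate list being materialised, because GHC's |foldr|/|build| rule cancels the |build| against the comprehension's |foldr|.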
\subsection{Optimising |PlateData|}
\label{secU:optimise_playdata}
Surprisingly, it is possible to layer |Uniplate| over the |Data| instances of SYB, with better performance than SYB itself. The first optimisation is to generate the two members of the |uniplate| pair with only one pass over the data value. We cannot use SYB's |gmapM| or |gmapQ| -- we must instead use |gfoldl| directly. With this first improvement in place we perform much the same operations as SYB. But the overhead of structure creation in |uniplate| makes traversals about 10\% slower than SYB.
The next optimisation relies on the extra information present in the |Uniplate| operations -- namely the target type. A boilerplate operation walks over a data structure, looking for target values to process. In SYB, the target values may be of \textit{any} type. For Uniplate the target is a \textit{single uniform} type. If a value is reached which is not a container for the target type, no further exploration of that value's children is required. Computing which types are containers for the target type can be done relatively easily in the SYB framework \citep{lammel:syb2}:
\begin{onepage}
\ind{TypeBox}\ind{contains}
\begin{code}
data TypeBox = forall alpha . (Typeable alpha, Data alpha) => TypeBox alpha
contains :: TypeBox -> [TypeBox]
contains (TypeBox x) = if isAlgType dtyp then concatMap f ctrs else []
where
f c = gmapQ TypeBox (asTypeOf (fromConstr c) x)
ctrs = dataTypeConstrs dtyp
dtyp = dataTypeOf x
\end{code}
\end{onepage}
A |TypeBox| can be thought of as storing a \textit{type}, rather than a \textit{value}. The |TypeBox| constructor stores the |Data| and |Typeable| instance information for a particular type, along with the value |undefined| of that type. The \ignore|contains| function takes a |TypeBox| and returns a |TypeBox| representing the type of each field for each possible constructor. For example, the type \ignore|[Int]| has two constructors, |[]| which has no fields, and |(:)| which has fields of type |Int| and \ignore|[Int]|:
\ignore\begin{code}
contains (undefined :: [Int]) =
[TypeBox (undefined :: Int), TypeBox (undefined :: [Int])]
\end{code}
The |contains| function determines the types directly contained by a data type. By taking the transitive closure we can determine all the types reachable from a particular type. Hence all types can be divided into three sets:
\begin{enumerate}
\item The singleton set containing the type of the target.
\item The set of other types which \textit{may} contain the target type.
\item The set of other types which \textit{do not} contain the target type.
\end{enumerate}
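Computing the transitive closure needed for this partition is straightforward. The sketch below repeats |TypeBox| and |contains| from above so that it compiles on its own; |boxRep| and |reachable| are hypothetical helper names, not part of the library:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Generics
import Data.Typeable (TypeRep, typeOf)

-- repeated from above so the sketch is self-contained
data TypeBox = forall a . (Typeable a, Data a) => TypeBox a

boxRep :: TypeBox -> TypeRep
boxRep (TypeBox x) = typeOf x

contains :: TypeBox -> [TypeBox]
contains (TypeBox x) = if isAlgType dtyp then concatMap f ctrs else []
  where
    f c  = gmapQ TypeBox (asTypeOf (fromConstr c) x)
    ctrs = dataTypeConstrs dtyp
    dtyp = dataTypeOf x

-- all types reachable from a starting type: the transitive closure of
-- contains, using TypeRep equality to detect already-visited types
reachable :: TypeBox -> [TypeRep]
reachable start = go [] [start]
  where
    go seen [] = reverse seen
    go seen (b:bs)
      | boxRep b `elem` seen = go seen bs
      | otherwise            = go (boxRep b : seen) (contains b ++ bs)
```

For example, |reachable| applied to the type \ignore|[Int]| yields the representations of \ignore|[Int]| and |Int| only, so a traversal targeting any other type can prune such a value immediately.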
We compute these sets for each type only once, and the cost of computing them is small. When examining a value, if its type is a member of set 3 we can prune the search. This trick is surprisingly effective. Take for example an operation over |Bool| on the value |(True,"Haskell")|: the SYB approach examines all 16 subcomponents, whereas Uniplate touches only 3.
With all these optimisations we can usually perform both queries and transformations faster than SYB. In the benchmarks we run between 30\% and 225\% faster than SYB, with an average speedup of 145\%. Full details are presented in \S\ref{secU:results_speed}.
\section{Results and Evaluation}
\label{secU:results}
We evaluate our boilerplate reduction scheme in two ways: firstly by the \textit{conciseness of traversals} using it (i.e. the amount of boilerplate it removes), and secondly by its \textit{runtime performance}. We measure conciseness by counting lexemes -- although we concede that some aspects of concise expression may still be down to personal preference. We give a set of nine example programs, written using Uniplate, SYB and Compos operations. We then compare both the conciseness and the performance of these programs. Other aspects, such as the clarity of expression, are not so easily measured. Readers can make their own assessment based on the full sources we give.
\subsection{Boilerplate Reduction}
\label{secU:results_boilerplate}
As test operations we have taken the first three examples from this chapter, three from the Compos paper \citep{bringert:compos}, and the three given in the SYB paper \citep{lammel:syb} termed the ``Paradise Benchmark''. In all cases the Compos, SYB and Uniplate functions are given an appropriately prefixed name. In some cases, a helper function can be defined in the same way in both SYB and Uniplate; where this is possible we have done so. Type signatures are omitted where the compiler is capable of inferring them. For SYB and Compos we have used definitions from the original authors where available, otherwise we have followed the guidelines and style presented in the corresponding paper.
\subsubsection{Examples from this Chapter}
\begin{exampleany}{from \S\ref{secU:intro} (revisited)}
\ignore\begin{code}
uni_variables x = [y | Var y <- universe x]
syb_variables = everything (++) ([] `mkQ` f)
where f (Var y) = [y]
f _ = []
com_variables :: Expr a -> [String]
com_variables x = case x of
Var y -> [y]
_ -> composOpFold [] (++) com_variables x
\end{code}
Only Compos needs a type signature, due to the use of GADTs. List comprehensions allow for succinct queries in Uniplate.
\end{exampleany}
\begin{examplerevisit}{\ref{exU:zerocount}}
\ignore\begin{code}
uni_zeroCount x = length [() | Div _ (Val 0) <- universe x]
\end{code}
\begin{onepage}
\ignore\begin{code}
syb_zeroCount = everything (+) (0 `mkQ` f)
where f (Div _ (Val 0)) = 1
f _ = 0
com_zeroCount :: Expr a -> Int
com_zeroCount x = case x of
Div y (Val 0) -> 1 + com_zeroCount y
_ -> composOpFold 0 (+) com_zeroCount x
\end{code}
\end{onepage}
In the Uniplate solution the list of |()| is perhaps inelegant. However, Uniplate is the only scheme that is able to use the standard |length| function: the other two express the operation as a fold. Compos requires additional boilerplate to continue the operation on |Div y|.
\end{examplerevisit}
\begin{examplerevisit}{\ref{exU:simplify}}
\ignore\begin{code}
simp (Sub x y) = simp $ Add x (Neg y)
simp (Add x y) | x == y = Mul (Val 2) x
simp x = x
uni_simplify = transform simp
syb_simplify = everywhere (mkT simp)
com_simplify :: Expr a -> Expr a
com_simplify x = case x of
Sub a b -> com_simplify $ Add (com_simplify a) (Neg (com_simplify b))
Add a b -> case (com_simplify a, com_simplify b) of
(a',b') | a' == b' -> Mul (Val 2) a'
| otherwise -> Add a' b'
_ -> composOp com_simplify x
\end{code}
This is a modified version of |simplify| discussed in \S\ref{secU:rewrite_bottom}. The two rules are applied everywhere possible. Compos does not provide a bottom-up transformation, so needs extra boilerplate.
\end{examplerevisit}
\subsubsection{Multi-type examples from the Compos paper}
\begin{figure}
\ignore\begin{code}
data Stm = SDecl Typ Var | SAss Var Exp
| SBlock [Stm] | SReturn Exp
data Exp = EStm Stm | EAdd Exp Exp
| EVar Var | EInt Int
data Var = V String
data Typ = T_int | T_float
\end{code}
\caption{Data type from Compos.}
\label{figU:compos}
\end{figure}
The statement type manipulated by the Compos paper is given in Figure \ref{figU:compos}. The Compos paper translates this type into a GADT, while Uniplate and SYB both accept the definition as supplied.
As the |warnAssign| function from the Compos paper could be implemented much more neatly as a query, rather than a monadic fold, we choose to ignore it. We cover the remaining three functions.
\begin{examplename}{rename}
\ignore\begin{code}
ren (V x) = V ("_" ++ x)
uni_rename = transformBi ren
syb_rename = everywhere (mkT ren)
com_rename :: Tree c -> Tree c
com_rename t = case t of
V x -> V ("_" ++ x)
_ -> composOp com_rename t
\end{code}
The Uniplate definition is the shortest, as there is only one constructor in type |Var|. As Compos redefines all constructors in one GADT, it cannot benefit from this knowledge.
\end{examplename}
\begin{examplename}{symbols}
\ignore\begin{code}
uni_symbols x = [(v,t) | SDecl t v <- universeBi x]
\end{code}
\ignore\begin{code}
syb_symbols = everything (++) ([] `mkQ` f)
where f (SDecl t v) = [(v,t)]
f _ = []
\end{code}
\begin{onepage}
\ignore\begin{code}
com_symbols :: Tree c -> [(Tree Var, Tree Typ)]
com_symbols x = case x of
SDecl t v -> [(v,t)]
_ -> composOpMonoid com_symbols x
\end{code}
\end{onepage}
Whereas the Compos solution explicitly manages the traversal, the Uniplate solution is able to use the built-in |universeBi| function. The use of lists again benefits Uniplate over SYB.
\end{examplename}
\begin{examplename}{constFold}
\ignore\begin{code}
optimise (EAdd (EInt n) (EInt m)) = EInt (n+m)
optimise x = x
uni_constFold = transformBi optimise
syb_constFold = everywhere (mkT optimise)
com_constFold :: Tree c -> Tree c
com_constFold e = case e of
EAdd x y -> case (com_constFold x, com_constFold y) of
(EInt n, EInt m) -> EInt (n+m)
(x',y') -> EAdd x' y'
_ -> composOp com_constFold e
\end{code}
The constant-folding operation is a bottom-up transformation, requiring all subexpressions to have been transformed before an enclosing expression is examined. Compos only supports top-down transformations, requiring a small explicit traversal in the middle. Uniplate and SYB both support bottom-up transformations.
\end{examplename}
\subsubsection{The Paradise Benchmark from SYB}
\begin{figure}
\begin{code}
type Manager = Employee
type Name = String
type Address = String
data Company = C [Dept]
data Dept = D Name Manager [Unit]
data Unit = PU Employee | DU Dept
data Employee = E Person Salary
data Person = P Name Address
data Salary = S Integer
\end{code}
\caption{Paradise Benchmark data structure.}
\label{figU:paradise}
\end{figure}
The Paradise benchmark was introduced in the SYB paper \citep{lammel:syb}. The data type is shown in Figure \ref{figU:paradise}. The idea is that this data type represents an XML file, and a Haskell program is being written to perform various operations over it. The Compos paper includes an encoding into a GADT, with tag types for each of the different types.
We have made one alteration to the data type: |Salary| is no longer of type |Float| but of type |Integer|. In various experiments we found that the rounding errors for floating point numbers made different definitions return different results.\footnote{Storing your salary in a non-exact manner is probably not a great idea!} This change is of no consequence to the boilerplate code.
\begin{examplename}{increase}
The first function discussed in the SYB paper is |increase|. This function increases every item of type |Salary| by a given percentage. In order to fit with our modified |Salary| data type, we have chosen to increase all salaries by |k|.
\ignore\begin{code}
incS k (S s) = S (s + k)
uni_increase k = transformBi (incS k)
syb_increase k = everywhere (mkT (incS k))
com_increase :: Integer -> Tree c -> Tree c
com_increase k c = case c of
S s -> S (s + k)
_ -> composOp (com_increase k) c
\end{code}
In the Compos solution all constructors belong to the same GADT, so instead of just matching on |S|, all constructors must be examined.
\end{examplename}
\begin{examplename}{incrOne}
The |incrOne| function performs the same operation as |increase|, but only within a named department. The one subtlety is that if the named department has a sub-department with the same name, then the salaries of the sub-department should only be increased once. We are able to reuse the |increase| function from the previous section in all cases.
\ignore\begin{code}
uni_incrOne d k = descendBi f
where f x@(D n _ _) | n == d = uni_increase k x
| otherwise = descend f x
\end{code}
\ignore\begin{code}
syb_incrOne :: Data a => Name -> Integer -> a -> a
syb_incrOne d k x | isDept d x = syb_increase k x
| otherwise = gmapT (syb_incrOne d k) x
where isDept d = False `mkQ` isDeptD d
isDeptD d (D n _ _) = n == d
com_incrOne :: Name -> Integer -> Tree c -> Tree c
com_incrOne d k x = case x of
D n _ _ | n == d -> com_increase k x
_ -> composOp (com_incrOne d k) x
\end{code}
The SYB solution has grown substantially more complex, requiring two different utility functions. In addition |syb_incrOne| now \textit{requires} a type signature. Compos retains the same structure as before, requiring a case to distinguish between the types of constructor. For Uniplate we use |descend| rather than |transform|, to ensure no salaries are incremented twice.
\end{examplename}
\begin{examplename}{salaryBill}
The final function is one which sums all the salaries.
\ignore\begin{code}
uni_salaryBill x = sum [s | S s <- universeBi x]
syb_salaryBill = everything (+) (0 `mkQ` billS)
    where billS (S s) = s
com_salaryBill :: Tree c -> Integer
com_salaryBill x = case x of
    S s -> s
    _ -> composOpFold 0 (+) com_salaryBill x
\end{code}
Here the Uniplate solution wins by using a list comprehension to select the salary value out of each |Salary| constructor. It is also the only solution able to use the standard Haskell |sum| function, rather than encoding an explicit fold.
\end{examplename}
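For comparison, the hand-optimised baseline used later in \S\ref{secU:results_speed} amounts to one explicit clause per constructor on the path to each |Salary|. A self-contained sketch (the company type is recalled from the SYB paper, so its details are assumptions) shows how much traversal code the one-line Uniplate solution replaces:

```haskell
type Name    = String
type Address = String
type Manager = Employee

data Company  = C [Dept]
data Dept     = D Name Manager [SubUnit]
data SubUnit  = PU Employee | DU Dept
data Employee = E Person Salary
data Person   = P Name Address
data Salary   = S Integer

-- hand-written salaryBill: explicit recursion through every level
billCompany :: Company -> Integer
billCompany (C ds) = sum (map billDept ds)

billDept :: Dept -> Integer
billDept (D _ m us) = billEmployee m + sum (map billUnit us)

billUnit :: SubUnit -> Integer
billUnit (PU e) = billEmployee e
billUnit (DU d) = billDept d

billEmployee :: Employee -> Integer
billEmployee (E _ (S s)) = s
```

Every intermediate type needs its own function, and any extension of the data type forces a corresponding extension of the traversal.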
\subsubsection{Uniplate compared to SYB and Compos}
\begin{table}
\bigskip
\renewcommand{\f}[1]{\makebox[10mm][r]{#1}}
\newlength{\ff}
\setlength{\ff}{1mm}
\begin{tabular}{@@{}l@@{\hspace{-1.5mm}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r@@{\hspace{\ff}}r}
& \f{simp} & \f{var} & \f{zero} & \f{const} & \f{ren} & \f{syms} & \f{bill} & \f{incr} & \f{incr1} \\
\textbf{Lexemes} \\
Uniplate & 40 & 12 & 18 & 27 & 16 & 17 & 13 & 21 & 30 \\
SYB & 43 & 29 & 29 & 30 & 19 & 34 & 21 & 24 & 56 \\
Compos & 71 & 30 & 32 & 54 & 27 & 36 & 25 & 33 & 40 \\
\\
\textbf{Performance} \\
Uniplate Manual & 1.26 & 1.31 & 1.89 & 1.25 & 1.25 & 1.33 & 2.18 & 1.28 & 1.15 \\
Uniplate Direct & 1.30 & 1.37 & 2.17 & 1.34 & 1.36 & 1.28 & 2.89 & 1.40 & 1.24 \\
Compos & 1.50 & 1.17 & 1.65 & 1.50 & 1.38 & 1.46 & 3.70 & 1.65 & 1.60 \\
Uniplate Typeable & 1.50 & 1.72 & 2.86 & 2.09 & 2.00 & 3.10 & 9.49 & 1.74 & 1.81 \\
Uniplate Data & 2.35 & 3.76 & 7.52 & 2.31 & 2.50 & 4.10 & 16.72 & 2.08 & 2.03 \\
SYB & 3.24 & 7.28 & 16.33 & 3.69 & 3.33 & 9.75 & 54.70 & 4.09 & 3.70 \\
\\
\end{tabular}
\renewcommand{\f}[1]{\textbf{#1}}
\begin{tabular}{@@{}lrrr}
& \f{Query} & \f{Transform} & \f{All} \\
\textbf{Lexemes} \\
Uniplate & 60 & 134 & 194 \\
SYB & 113 & 172 & 285 \\
Compos & 123 & 225 & 348 \\
\\
\textbf{Performance} \\
Uniplate Manual & 1.68 & 1.24 & 1.43 \\
Uniplate Direct & 1.93 & 1.33 & 1.59 \\
Compos & 2.00 & 1.53 & 1.73 \\
Uniplate Typeable & 4.29 & 1.83 & 2.92 \\
Uniplate Data & 8.03 & 2.25 & 4.81 \\
SYB & 22.02 & 3.61 & 11.79 \\
\\
\end{tabular}
\textbf{Lexemes} are the number of lexemes for each of the solutions to the test problems using each of Uniplate, SYB and Compos. \textbf{Performance} is expressed as multiples of the run-time for a hand-optimised version not using any traversal library, with lower being better. \\
\caption{Table of lexeme counts and runtime performance.}
\label{tabU:uniplate_results}
\end{table}
In order to measure conciseness of expression, we have taken the code for all solutions and counted the number of lexemes -- using the |lex| function provided by Haskell. A table of results is given in Table \ref{tabU:uniplate_results}. The definitions of functions shared between SYB and Uniplate are included in both measurements. For the |incrOne| function we have not included the code for |increase| as well.
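The counting itself is straightforward. A minimal sketch using the Prelude |lex| function (this helper is our own, not code from the measurement harness):

```haskell
-- count Haskell lexemes in a string using the standard lex function
lexemes :: String -> Int
lexemes s = case lex s of
  [("", _)]   -> 0               -- only whitespace remains
  [(t, rest)] -> 1 + lexemes rest -- consumed one lexeme t
  _           -> 0               -- lex failed; stop counting
```

For instance, it counts 13 lexemes for the one-line |salaryBill| solution, in agreement with the Uniplate entry in Table \ref{tabU:uniplate_results}.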
The Compos approach requires much more residual boilerplate than Uniplate, particularly for queries, bottom-up transformations and in type signatures. The Compos approach also requires a GADT representation.
Compared with SYB, the Uniplate solutions are much more similar in structure. For queries, Uniplate is able to make use of list comprehensions, which produce shorter code and do not require encoding a manual fold over the items of interest. For transformations, both libraries typically use the same underlying operation, and the difference often comes down to the |mkT| wrappers required by SYB.
All the Uniplate functions could be implemented in the SYB framework, using the |Data| and |Typeable| classes instead of |Uniplate| and |Biplate|. If this were done, the SYB examples would be identical to the Uniplate examples, and consequently would have identical lexeme counts.
\subsection{Runtime Overhead}
\label{secU:results_speed}
This section compares the speed of solutions for the nine examples given in the previous section, along with hand-optimised versions, using no boilerplate removal library. We use four |Uniplate| instances, provided by:
\begin{description}
\item[Manual:] These are |Uniplate| and |Biplate| instances written by hand.
\item[Direct:] These instances use the direct combinators from \S\ref{secU:implement_playdirect}.
\item[Typeable:] These instances use the |Typeable| combinators from \S\ref{secU:implement_playtypeable}.
\item[Data:] These instances use the SYB |Data| instances directly, as described in \S\ref{secU:implement_playdata}.
\end{description}
For all data types we generate 100 values at random using QuickCheck \citep{quickcheck}. In order to ensure a fair comparison, we define one data type which is the same as the original, and one which is a GADT encoding. All operations take these original data types, transform them into the appropriate structure, apply the operation and then unwrap them. We measure all results as multiples of the time taken for a hand-optimised version. We compiled all programs with GHC 6.8.2 and -O2 on Windows XP.
The results are presented in Table \ref{tabU:uniplate_results}. Using Manual or Direct instances, Uniplate is slightly faster than Compos -- but about 50\% slower than hand-optimised versions. Using the Data instances provided by SYB, we are able to substantially outperform SYB itself! See \S\ref{secU:performance} for details of some of the optimisations used.
\section{Related Work}
\label{secU:related}
The Uniplate library is intended to be a way to remove the boilerplate of traversals from Haskell programs. It is far from the first library to attempt boilerplate removal.
\subsection{The SYB library}
The SYB library \citep{lammel:syb} is perhaps the most popular boilerplate removal system in Haskell. One of the reasons for its success is tight integration with the GHC compiler, lowering the barrier to use. We have compared directly against traversals written in SYB in \S\ref{secU:results_boilerplate}, and have also covered how to implement Uniplate in terms of SYB in \S\ref{secU:implement_playdata}. In our experience most operations are shorter and simpler than the equivalents in SYB, and we are able to operate without the extension of rank-2 types. Most of these benefits stem directly from our definition of children as being the children of the same uniform type, contrasting with the SYB approach of all direct children.
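As an illustration of how close the two libraries are, a |universeBi|-style query can be sketched directly in terms of SYB's |listify| combinator. The |Expr| type below is hypothetical, and this definition is a sketch of the idea, not the implementation from \S\ref{secU:implement_playdata}:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics

-- a Uniplate-style universeBi in terms of SYB: collect every
-- transitively-contained value of the result type
sybUniverseBi :: (Data a, Data b) => a -> [b]
sybUniverseBi = listify (const True)

-- a hypothetical expression type for illustration
data Expr = Val Int | Add Expr Expr deriving (Data, Typeable)

-- collect all Int literals in an expression
lits :: Expr -> [Int]
lits = sybUniverseBi
```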
The SYB library is, however, more powerful than Uniplate. If you wish to visit values of different type in a single traversal, Uniplate is unsuitable. The |Data| and |Typeable| classes have also been pushed further in successive papers \citep{lammel:syb2,lammel:syb3}, allowing operations such as runtime reflection on values, which Uniplate cannot provide.
\subsection{The Compos library}
The Compos library \citep{bringert:compos} is another approach to the removal of boilerplate, requiring GADTs \citep{spj:gadt} along with rank-2 types. The Compos library requires an existing data type to be rewritten as a GADT. The conversion from standard Haskell data structures to GADTs currently presents several problems: they are GHC specific, deriving is not supported on GADTs, and GADTs require explicit type signatures. The Compos approach is also harder to write instances for, having no simple instance generation framework, and no automatic derivation tool (although one could be written). The inner |composOp| operator is very powerful, and indeed we have chosen to replicate it in our library as |descend|. But the Compos library is unable to replicate either |universe| or |transform| from our library.
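To make the cost of the GADT encoding concrete, here is a minimal self-contained sketch in the Compos style. The two-sort syntax tree and its constructor names are invented for illustration; only the shape of |composOp| follows the Compos interface discussed above:

```haskell
{-# LANGUAGE GADTs, RankNTypes, EmptyDataDecls #-}

-- phantom index types for the two syntactic sorts
data ExprK
data StmtK

-- the whole syntax family as a single GADT, as Compos requires
data T a where
  Lit    :: Int -> T ExprK
  Add    :: T ExprK -> T ExprK -> T ExprK
  Assign :: String -> T ExprK -> T StmtK

-- apply f to each immediate child, whatever its sort
composOp :: (forall a. T a -> T a) -> T c -> T c
composOp f t = case t of
  Lit n      -> Lit n
  Add x y    -> Add (f x) (f y)
  Assign v e -> Assign v (f e)

-- increase every literal, in the same style as com_increase
incLits :: T c -> T c
incLits t = case t of
  Lit n -> Lit (n + 1)
  _     -> composOp incLits t

eval :: T ExprK -> Int
eval (Lit n)   = n
eval (Add x y) = eval x + eval y
```

Note that |incLits| requires an explicit type signature and a case expression over the GADT, and that the whole language must live in one indexed family before |composOp| can be defined.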
\subsection{The Stratego tool}
The Stratego tool \citep{stratego} provides support for generic operations, focusing on both the operations and the strategies for applying them. This approach is performed in an \textit{untyped} language, although a typed representation can be modelled \citep{lammel:typed_generic_strategies}. Rather than being a Haskell library, Stratego implements a domain specific language that can be integrated with Haskell.
\subsection{The Strafunski library}
The Strafunski library \citep{strafunski, lammel:polymorphic_symphony} has two aspects: generic transformations and queries for trees of any type; and features to integrate components into a larger programming system. Generic operations are performed using strategy combinators which can define special case behaviour for particular types, along with a default to perform in other situations. The Strafunski library is integrated with Haskell, primarily providing support for generic programming in application areas that involve traversals over large abstract syntax trees.
\subsection{The Applicative library}
The Applicative library \citep{mcbride:applicative} works by threading an Applicative operation through a data structure, in a similar way to threading a Monad through the structure. There is additionally a notion of |Traversable| functor, which can be used to provide generic programming. While the Applicative library can be used for generic programming, this task was not its original purpose, and the authors note they have ``barely begun to explore'' its power as a generic toolkit.
\subsection{Generic Programming}
There are a number of other libraries which deal with generic programming, aimed more at writing \textit{type generic} (or \textit{polytypic}) functions, but which can be used for boilerplate removal. The \textit{Haskell generics suite}\footnote{\url{http://darcs.haskell.org/generics/}} showcases several approaches \citep{weirich:replib,hinze:generics_masses,hinze:generic_haskell}.