How unique is Unique?

Update: This post does not attempt to demonstrate any bug in Data.Unique. It just demonstrates that Data.Unique is not suitable for certain tasks. In particular, one should be careful about making any assumptions about the expression unsafePerformIO newUnique.

Data.Unique provides generation of unique symbols. However, it is not completely clear what the scope of this uniqueness is. As far as I can tell, it is safe to assume that the symbols are unique within the same “IO block”. So, normally the scope is a single run of a single program. (I suspect it is not even possible to compare these symbols from different runs of a program, so maybe it doesn’t make sense to talk about a larger scope.)

Problem

However, unsafePerformIO changes things. A program containing unsafePerformIO might have several “IO blocks”, and unfortunately, uniqueness is not always preserved between these blocks.

The following modules demonstrate the problem. Module M1 defines a Ref type, which is is simply a value tagged with a unique number. It also creates a reference a.


module M1 where

import Data.Unique
import System.IO.Unsafe

data Ref a = Ref Unique a

instance Eq (Ref a)
where
Ref i1 _ == Ref i2 _ = i1==i2

ref :: a -> Ref a
ref a = unsafePerformIO $ do
u <- newUnique
return (Ref u a)

a = ref "a"

Module M2 imports M1 and creates another reference b:


module M2 where

import M1

b = ref "b"

Now, comparing a and b should return False, since they are two distinct references. Let’s see if this is the case:

*M2> a==b
False
*M2> :! touch M2.hs
*M2> :r
[2 of 2] Compiling M2               ( M2.hs, interpreted )
Ok, modules loaded: M2, M1.
*M2> a==b
True

Ooops, it seems that reloading M2 caused the references to become equal. A probable explanation is that the global counter inside Data.Unique gets reset when reloading, but the reference a survives because M1 is not reloaded.

This problem was originally discovered when using Feldspar in GHCi. Version 0.3 had an implementation of observable sharing based on Data.Unique (the Ref module). This meant that reloading in GHCi sometimes had the effect of introducing erroneous sharing, leading to generation of bogus C programs.

The problem has been tested on GHC 6.12.3 and 7.0.2 using base-4.2.0.2 and base-4.3.1.0 respectively.

Solution

Here is a version of Data.Unique that creates symbols that are a bit more “unique”:


module MoreUnique where

import Data.Typeable
import Data.Time
import GHC.Conc
import System.IO.Unsafe (unsafePerformIO)

data Unique = Unique !Integer UTCTime
deriving (Eq,Ord)

uniqSource :: TVar Unique
uniqSource = unsafePerformIO $ do
utc <- getCurrentTime
newTVarIO (Unique 0 utc)
{-# NOINLINE uniqSource #-}

newUnique :: IO Unique
newUnique = atomically $ do
Unique val utc <- readTVar uniqSource
let next = Unique (val+1) utc
writeTVar uniqSource $! next
return next

This module tags each unique number with the creation time of the global counter. It should be extremely unlikely that two counters are initialized at the same time stamp, so these symbols can be said to be unique across reloads in GHCi. This is demonstrated by the files M1_more.hs and M2_more.hs.

The files related to this experiment are:

You can also download the whole thing by doing

darcs get http://community.haskell.org/~emax/darcs/MoreUnique