# A question

I’ve recently heard the following suprisingly general question:

The premise is that we have three services which return Future[Option[Int]]:

trait FooService {
def find(id:Int): Future[Option[Foo]]
}

trait BarService {
def find(id:Int): Future[Option[Bar]]
}

trait QuuxService {
def find(id:Int): Future[Option[Quux]]
}

And we need to query across all three services, i.e. in a non blocking/no option service we would have:

val foo = fooService.find(1)
val bar = barService.find(foo.barId)
val quux = quuxService.find(bar.quuxId)

Now, monads don’t compose, and because we have multiple Future[Option[T]] here, the quickest route to resolving an Option from a Future (without short-circuiting the failure case) is to use a partial function or fold, and then just break things down into very small methods so it doesn’t get too nesty.

I’ve heard that there’s a third option which is to use a monad transformer or a Reader monad.

Then he ran into implementation-specific terminology for that particular method. So, this is my (pointed) answer to the specific sub-question of how to deal with this problem ‘using a monad transformer’. I’ll try to make it prerequisite-free, using just Wikipedia-accessible mathematical definitions and Scala.

# Basics

So, just to check our bases, the idea of defining a Monad transformer for Future and Option would require them to indeed be monads. They’re functors, and they have a constructor, and a flatMap. Do they verify the monadic laws ?

For Option (where, in scala, the apply Option(x) resolves to Some(x) on non-derelict inputs):

• left identity: ∀ (f: A, g: A → Option[B]) Option (f) flatMap g = g f
• right identity: ∀ (f: Option[A]), f flatMap ( λ(x:A). Option(x) ) = f
• associativity: ∀ (f: Option[A], g: A → Option[B], h: B → Option[C]) f flatMap g flatMap h = f flatMap ( λ(x:A). g x flatMap h)

It’s a good exercise to check, e.g. for Option, that these equalities hold whatever the output of the functions whose type matches the pattern (X → Option[Y]) is.

For Future, the monadic laws look like this:

• left identity: ∀ (f: A, g: A → Future[B]) future { f } flatMap g = g f
• right identity: ∀ (f: Future[A]), f flatMap ( λ(x:A). future { x } ) = f
• associativity: ∀ (f: Future[A], g: A → Future[B], h: B -> Future[C]) f flatMap g flatMap h = f flatMap ( λ(x:A). g x flatMap h)

By the way, the left identity law for Futures doesn’t work if g throws an exception before returning its Future:

scala> val g = { (x: Int) => if (x == 0) throw new IllegalArgumentException("Zero !") else Future { 3 / x } }
g: Int => scala.concurrent.Future[Int] = <function1>

scala> g(0)
java.lang.IllegalArgumentException: Zero !
at $anonfun$1.apply(<console>:11)
at $anonfun$1.apply(<console>:11)
... 32 elided

scala> Future { 0 } flatMap g
res11: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@68f3b76b In sum, it tells us something we knew : Futures put some exception-raising behavior of their arguments in the future. We’ll have to be careful about which results depend on that left identity in the following. # Composition On to the statement that ‘monads don’t compose’. This is true, but it means that there is no general function that takes two monads as input, and systematically returns a monad for the composition of the two underlying functors, for any two input monads. At best, what one can always hope to return is a functor. That being said, it’s a starkly different statement from dealing with two particular monads. For two very specific, fixed monads, there may well be a way to compose them into a monad, and that is not a contradiction with the prior statement. And we’re in the case of two very specific monads: Option and Future. In fact, there’s even a construction called a monad transformer, slightly more demanding than a monad, that can be injected into a monad to yield another (transformed) monad (1). So, a monad transformer requires an interface with: • A type constructor T which takes a type constructor and returns another. Technically, it’s said to be of kind (★ → ★) → ★ → ★. As can be expected, the intent is to make it take the “transformee" monad as argument. • Monad operations (a constructor and a flatMap), for every output T(M) (the transformed monad), provided the input M (the transformee) is a monad. Notice how making sure the transformed monad is here a requirement. • Another operation, lift : ∀A: ★, ∀M: ★ → ★. M[A] → T[M[A]] (this reads, ∀ A a type and M a type constructor) satisfying the following laws: • ∀ M: ★ → ★, lift ∘ M = T • lift (M flatMap k) = (lift M) flatMap (lift ∘ k) This becomes more clear, as with every abstract structure, after building a couple of concrete cases. Intuitively, the transformed monad has the operation of the transformer injected in its inputs. So, given that we’re interested in services that return a Future[Option[A]] (as opposed to an Option[Future[A]]), we’re interested in defining the Option Monad Transformer (as opposed to the Future Monad Transformer). Thankfully, not only does this Option Monad Transformer exists, but it’s easy to define. trait ApplicativeFunctor[A, F[A]] { def apply[X](a:X): F[X] def map[B](f: A => B): F[B] } trait Monad[A, M[A]] extends ApplicativeFunctor[A, M] { def flatMMap[B](f: A => M[B]): M[B] } class OptionT[A, M[A]] (m:ApplicativeFunctor[Option[A], M]) extends ApplicativeFunctor[A, ({type T[X]=M[Option[X]]})#T] { override def apply[X](a:X): M[Option[X]] = m(Option(a)) override def map[B](f: A => B): M[Option[B]] = m map {(x: Option[A]) => x map f} } class OptionTM[A, M[A]](m:Monad[Option[A], M]) extends OptionT[A, M](m) with Monad[A, ({type T[X]=M[Option[X]]})#T] { override def flatMMap [B] (f:A => M[Option[B]]) : M[Option[B]] = m flatMMap { case Some(z) => f(z) case None => m(None) } } The name flatMMap instead of flatMap is here to avoid conflicts. The ({type T[X]=M[Option[X]]})#T is unsightly, and pollutes the scope of the program, but how to iron that particular wrinkle is beyond the scope of this post. Let’s give a moment of pause to the three laws: • left identity: ∀(f:A, g: A => Future[Option[B]]) Future { Option(f) } flatMap g = g ( f ) Unfolding this one reveals that for this equality to hold, we’d need the left equality for the Future monad to hold. Not that it’s the first time we’ve seen • right identity: ∀ (f: Future[Option[A]]), f flatMap ( λ(x:A). Future{ Option(x) } ) = f • associativity: ∀ (f: Option[A], g: A → Option[B], h: B → Option[C]) f flatMap g flatMap h = f flatMap ( λ(x:A). g x flatMap h) All that’s left is to test it. The inevitable part is reminding Scala there’s a Monad instance for Future, even in the context where Option[A] is the parameter, but it’s more about mapping our terminology with existing Scala methods. Then there is the related implicit declaration to palliate some insufficiencies in implicit search not jumping order in type parameters when a suitable application of a lower-kind constructor is found. abstract class MyTest{ import scala.concurrent.Future import scala.concurrent.ExecutionContext.Implicits._ // The unchanged future monad, but applied to Option[A] class FutureOM[A](fut: Future[Option[A]]) extends Monad[Option[A], Future] { def apply[X](a:X): Future[X] = Future { a } def map[B](f: Option[A] => B): Future[B] = fut map f def flatMMap[B](f: Option[A] => Future[B]): Future[B] = fut flatMap f } implicit def optionMF[A](f:Future[Option[A]]) = new OptionTM(new FutureOM(f)) trait FooService { def find(id: Int): Future[Option[Int]] } trait BarService { def find(id: Int): Future[Option[Int]] } trait QuuxService { def find(id:Int): Future[Option[Int]] } val fooService: FooService val barService: BarService val quuxService: QuuxService def finalRes(): Future[Option[Int]] = fooService.find(1) flatMMap (barService.find _) flatMMap (quuxService.find _) } The whole source is on github 1. In the following, I’ll use liberally of the Scala apply syntactic sugar of calling the constructor in the same way as the resulting type. Since I’m typing every variable, it should be easy to see what I’m talking about. In case of doubt, I apply the constructor with parentheses, and the type constructor with brackets. the For instance, Option is a type constructor: λ(A:★).Option[A]: ★ → ★, which returns a term of an Option type: ∀A: ★, (λ(x:A). Option(x)): A → Option[A]. So is List λ(A:★).List[A]: ★ → ★, which returns a term of a List type: ∀A: ★, (λ(x:A). List(x)): A → List[A]. Filed under scala monad option monad_transformer # Arithmétique Européenne Aux Européenes de 2014, si mes chiffres sont bons, le FN est crédité de 25,7% des voix, derrière l’UMP (20,3%) et le PS (14,7%). La participation à l’échelle métropolitaine est estimée à 42,5%. Supposons maintenant (c’est un pur exercice de pensée, à titre de curiosité arithmétique) que tous les électeurs du FN soient allés aux urnes. Parmi les gens qui n’ont pas participé à ce scrutin, il ne resterait donc plus que des gens qui votent pour un autre parti (peu importe lequel). Supposons également que les gens qui ont exprimé leur vote ce dimanche soient un échantillon fidèle de la population, c’est à dire que si la participation augmentait, l’UMP ou le PS feraient peu ou prou le même score, en pourcentage. Pour que l’UMP passe devant le FN, il suffirait donc d’une participation autour de 54%, et pour le PS, autour de 74%. En 1994, la participation de la France aux élections Européenes était de 52,7%. Les taux de participation au Danemark, en Irlande et en Grece sont historiquement au voisinage de ces 54%. Filed under elections politique # A very Data Sciencey summer Everybody knows about the coming Data Science Specialization offered by John Hopkins and Coursera this Summer1. Some people have noticed there was an unusual data science bootcamp running in New York this summer. But I haven’t find much trace of people noticing there were in fact many of those, along the same concept, all around the world ! Now, summer schools related to data science are an old thing in the academic world. But in this case, industrial partners are taking on some of the hands-on teaching. The idea being that students get practical experience, while companies get potential hires. Depending on the exact case, this access to graduates may be more or less contractual, and as such somewhat constraining. I’m surprised, though, by the worldwide, simultaneous enthusiasm this concept of a summer school teaching data science has garnered. Apparently, the news of data science being dead have been greatly exaggerated. Have a look at this list: The Zipfian academy runs for 3 months in San Francisco starting May 12th, and costs 16K USD (or 12K USD under a hiring agreement). The Science to Data Science in London runs for 5 weeks starting August 4th, and costs 360GBP. It recruits analytical PhDs only. The Data Science Retreat runs for 3 months in Berlin, and costs 7K EUR (or 3K EUR under a hiring agreement). It has one batch starting May 1st, another starting August 1st. The Data incubator runs for 6 weeks in NYC starting July 9th, and it only accepts PhDs. It seems to be free. The Insight Data Science Fellows program runs for 6 weeks, starting on July 28th in NYC, and September 12 in the Silicon Valley. It seems to be free. Microsoft research seems to also run a more academic, if well-funded similar program in NYC, running for 8 weeks starting June 16. 1. But did you notice you could still take it for free, foregoing the nice certificate, by registering for each class individually ? Filed under data science math workshop # Why, perhaps, we may need certified software. (…) (…) Oh, wait, did I just think of certified PHP ? I need a drink. And Ur/Web Filed under security functional_programming web php crypto # Look, Ma, No Queues ! A very classic, if a bit overrated exercise, is to print a tree in width-first order. Also called level-order, it means you print your binary tree row-by-row, with the nodes of depth d, leftmost first, before the nodes of depth d+1. This is sort of a fizzbuzz for recurison and basic data structure usage, because while there are many ways to cheat in this problem, the standard and expected solution involves a queue. In Scala: abstract class Tree case class Node(value: Int, left: Tree, right:Tree) extends Tree case class Leaf(value:Int) extends Tree def printLevelsWithQueue(t:Tree): List[Int] = { val q = Queue[Tree](t) def print_aux(q: Queue[Tree]): List[Int] = if (q.isEmpty) Nil else { val (tree, remq) = q.dequeue tree match{ case Leaf(v) => v :: print_aux(remq) case Node(v, l, r) => v :: print_aux(remq :+ l :+ r) } } print_aux(q) } Now, the interesting thing is that this needs not be : you can write this function without a single queue, by simply writing an auxiliary function that returns the lines the printout of the tree should be a part of, each of them a List[Int]. So, the end result is a list of lines, a List[List[Int]]. def printLevelsWithNoQueueAndMultipleRecursion(t:Tree): List[Int] = { def print_aux_rec(t:Tree): List[List[Int]] = t match { case Leaf(v) => List(List(v)) case Node(v, l, r) => { val leftLines = print_aux_rec(l) val rightLines = print_aux_rec(r) List(v) :: ((leftLines,rightLines).zipped map {(l1,l2) => l1:::l2}) } } print_aux_rec(t).flatten } Sadly, this function is horrendously costly, with multiple recursive calls, and a number of list appends that doubles for each level of the tree: one for the second level from the root, two for the third level, etc. Now, dealing with the append problem would involve passing list buffers around for the lines, instead of lists — which isn’t particularly interesting in and of itself. But I thought it would be nice to get rid of the multiple recursion. That’s a simple continuation-passing style translation: def printLevelsWithNoQueuesAndCPS(t:Tree): List[Int] = { def print_aux_rec_cps(t:Tree, andThen: List[List[Int]] => List[List[Int]]): List[List[Int]] = t match { case Leaf(v) => andThen(List(List(v))) case Node(v, l, r) => { def newThen = (leftLines:List[List[Int]]) => print_aux_rec_cps(r, (rightLines: List[List[Int]]) => andThen (List(v) :: ((leftLines, rightLines).zipped map (_ ::: _)))) print_aux_rec_cps(l,newThen) } } print_aux_rec_cps(t,{(x) => x}).flatten } Now, the problem with writing that in Scala is that the tail call optimization done in the compiler is relatively simple, and doesn’t deal with the imbricated recursive call here, even though this is a clear tail recursive function. And incidentally, that’s not so much of a problem : somebody writing a CPS-translated, list buffer-passing function in anything but a playful context is clearly out of his mind. But if you want to play with that function, seeing if it makes the stack explode, you may want to generate big trees. How would we do that ? Well, one of the ways to do that is to help yourself with one of the ways to cheat with the initial question of printing a tree in level order: the cheat is to represent the tree as a table, where the$k$-th node of the$d$-th level has index$2^{d-1}+k-1$. Using this correspondence, we can write a recursive function to generate that big tree of depth$n$: def bigExample(n: Int, k: Int):Tree = if (n == 0) Leaf(k) else { import scala.math._ val left = bigExample(n-1, k+1) val right = bigExample(n-1, k + pow(2,n).toInt) Node(k, left, right) } Oh, but wait ! This is again multiply recursive and hard on the stack ! CPS translation to the rescue: def bigExample_cps(n:Int, k:Int, andThen:Tree => Tree): Tree = if (n == 0) andThen(Leaf(k)) else { import scala.math._ def nextThen = (l:Tree) => bigExample_cps(n-1, k+pow(2,n).toInt, (r:Tree) => andThen(Node(k,l,r))) bigExample_cps(n-1,k+1,nextThen) } Sadly, we are faced again with the limitations of the Scala TCO, which indicates we might want to generate that tree using some other way to put the data on the stack. It would suffice to generate the tree bottom-up, instead of top-down, writing something that would look like, for example, the combination function of a certain Huffman coding exercise (that’s a personal message to followers of Martin Odersky’s course on functional programming). Or, we could try with a language that has a better optimization. I have a full gist of Ocaml code for you to play with. There you’ll also find all the code in this post. Filed under cps tail_recursion ocaml scala recursion # How a RAM disk saved my life ## I don’t want to duel that much In the last few weeks, I have been risking my life in the corridors at work. I have fit coworkers with a good sense of balance. Here is a picture of the horrible situations I’ve found myself into : (props to xkcd. It’s the greatest webcomic in the universe. read it regularly) Some of the code I’m dealing with, though, is not bound by the CPU speed or the number of cores I can throw at it. In particular, the ant distpack-opt task of the Scala compiler, compiling and packaging the documentation, is very much limited by storage transfer speeds. I have state-of-the-art SSDs on some of my machines, but yet found myself frustrated and waiting in front of the machine way too often. To be fair, as far as I’m concerned, once is too often. I’m not so good when standing on a rolling chair. ## The RAM disk solution and its imperfections The obvious solution, then was to use a RAM disk : insane transfer speeds would definitely help me, right ? Except RAM disks have two well-known problems: • a RAM disk is, fundamentally, a piece of storage — a partition — carved out of your existing RAM. Assuming you have enough of it, it’s very fun until you have no more power. Every time you shut down your machine, you need to transfer the content of your RAM to your hard drive. Every time there is a sudden power outage, you basically lose the whole partition. • moreover, a RAM partition is inherently limited in size. More interestingly, that’s a hard limit. What happens when you overflow the size of that partition is — at least on Linux — a nightmare: because the partition is treated like a fixed storage device, not a piece of memory (that could overflow into swap), it basically blows up in your face when full. ## The solutions on a modern Arch Thankfully, the tools to manage those two problems have come to maturity. At the time of this writing, my machines are running a Linux 3.8rc7 kernel on ArchLinux transitioned to systemd, and running RAM disks in that kind of environment is a breeze. • Ulatencyd (ArchLinux doc) is a daemon that uses cgroups, a recent-ish feature of the Linux kernel that lets you constrain the amount of resources allocated to a process. It watches over your processes, and dynamically gives a fair amount of ressources to them. Every second, it looks for memory pressure, and either relieves it if possible, or kills the guilty party if necessary. We have all had to face the swap of death, that frightening moment when a memory-leaking process overflows into the disk at a scary rate, slowing down — nay, taking down — your entire system. Here, Ulatencyd helps you separate processes on your system into cgroups, that are resized dynamically, and handle memory situations in a simple & graceful by default but amazingly configurable fashion. The Archlinux install is amazingly simple. Install the AUR package, and activate it the systemd way (systemctl enable ulatencyd.service) after (optionally) editing the configuration (/etc/ulatencyd/ulatencyd.conf) : the default maximal amount of physical memory given to a process is 70%, and since I have 16GB of RAM on all my machines (with a RAM disk moving between 3 and 5 GB), this limit is pretty safe. An added benefit ? In the middle of a long compilation, my machine is entirely responsive. I may be waiting for the compiler, but I can still surf the Internet in the meantime. • Anything-sync-daemon (ArchLinux doc) is a small daemon — in fact, a small bash script — that is aimed at hiding the management of the day-to-day life of a RAM disk from the user. It uses rsync to synchronize the contents of your RAM disk to disk priodically, alleviating the risks coming from the need for your RAM to be powered at all times. It creates the partitions in your RAM and fills them up at boot time, and shelves them at shutdown. Again, the ArchLinux install is insanely simple. There is an AUR package, the activation is trivial with systemd (systemctl enable asd.service) and the configuration is also easy. I simply edited /etc/asd.conf and put the following directories under RAMDISK : WHATTOSYNC=('/home/huitseeker/Scala' '/home/huitseeker/.eclipse' '/home/huitseeker/.m2' '/usr/share/eclipse' '/home/huitseeker/workspace' '/home/huitseeker/junit-workspace' '/usr/lib/jvm' '/home/huitseeker/runtime-EclipseApplicationwithEquinoxWeaving' '/home/huitseeker/.sbt' '/home/huitseeker/.ivy2') This is more than you need to simply compile Scala. As you probably noticed, this means : ## The Results For the distpack-opt target of the Scala compiler, I have tried this method with : • a slow, 2010-top-of-the-line Lynnfield core iMac (configuration) with a very well-used, slow hard drive. • a reasonable X220 Lenovo Thinkpad (config) with an i5-2540M and an SSD. Both configurations run on full-disk-encrypted storage using LUKS (but AVX-accelerated) (which explains the bad baseline you’re going to see for that disk-bound compilation). • The slow iMac went from 124 minutes for compilation to 27 minutes. • The Lenovo went from 24 minutes to 8 minutes 30 seconds Yay ! no more risking my life fencing in corridors any more ! # ensime.org is down But there’s a mirror on ibiblio.org ! Other wise said, if you encounter the following error: [warn] :::::::::::::::::::::::::::::::::::::::::::::: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :::::::::::::::::::::::::::::::::::::::::::::: [warn] :: org.ensime#ensime-sbt-cmd;latest.milestone: not found [warn] :::::::::::::::::::::::::::::::::::::::::::::: and don’t have the time to fix it the correct way, just add the following resolver to your plugins/build.sbt: resolvers += "iBlio Maven Repository" at "http://mirrors.ibiblio.org/maven2/" # Every single freshman CS question in ˜80 lines of Scala This post is about writing a single program that returns the stream representing any recursive sequence of order$k$, which$i$th term a[i] = f(a[i-1], a[i-2], ... a[i-k]), for all$k$, in Scala. It explains things very slowly: If the meaning of the previous sentence is already 100% clear to you, you can easily skip the next 2 paragraphs. All the code in this post can be found as a gist here. # Recursive sequences and their order Most first-year computer science questions aim at teaching students recursion. Hence, they involve a flurry of questions about defining recursive sequence of varied orders. What’s a recursive sequence ? It’s a sequence whose value at any rank$n$is defined by a function of the values at rank$(n-1) .. (n-k)$, and the initial values at ranks$1 … k$.$k$is the order. For example, the$n$th term$S_n$of a Syracuse sequence - a recursive sequence of order 1 - is given by: •$S_n = S_{n-1}/2$if n even •$S_n = 3 S_{n-1} + 1$if n odd Of course, you have to choose an initial term to define the sequence properly, but a famous conjecture is precisely that for any initial term, a Syracuse sequence - i.e. a member of the class of sequences that verify this recurrence relation - converges towards the cycle (1,4,2). Another example is the Fibonacci sequence, a recursive sequence of order 2: •$F_n = F_{n-1}+F_{n-2}$if n ≥ 2 •$F_1 = 1$•$F_0 = 0$Naturally, basic examples about recursion start with sequences of order one, and then move on to sequences of order two, usually with Fibonacci. The student is then asked to notice that the naive way of defining the Fibonacci sequence leads to a exponential number of calls, which allows the teacher to introduce notions such as e.g. memoization, or dynamic programming. # Generic recursive sequences, as streams A short while later, the student realizes that for a fixed$k$, there is a generic way of computing the$n$th term of a recursive sequence of order$k$, in linear time. Here it is for$k = 2$: def recTwo[A](f : A ⇒ A ⇒ A) (x:A) (y:A) (n:Int) : A = n match { case 0 ⇒ x case 1 ⇒ y case _ ⇒ recTwo (f) (y) (f (x) (y)) (n-1) } We abstract over the function and initial elements here: all you need to do to define the Fibonacci sequence is invoke recTwo ((x:Int) ⇒ (y:Int) ⇒ x + y)(0)(1). The crucial thing that makes the computation of the$n$th term linear is that you are shifting the whole sequence by one rank at each call, or, equivalently, saying that the nth term of the sequence with initial terms$(x,y)$is the$(n-1)$th term of the sequence with the same recurrence relation$f$and initial terms$(y, f(x,y))$. In effect, it’s a witty memoization trick that reuses the initial terms arguments passed to recTwo. So far, so good. Now, it turns out that it’s not too difficult to reuse the same reasoning to upgrade this function to one returning the Stream of the elements of the recursive sequence. It even makes the whole business simpler, since we don’t have to deal with$n$: import Stream._ def recStreamTwo[A](f : A ⇒ A ⇒ A) (x:A) (y:A):Stream[A] = x #:: recStreamTwo (f) (y) (f(x)(y)) Then to get the first 10 terms of the Fibonacci sequence: recStreamTwo ((x:Int) ⇒ (y:Int) ⇒ x + y)(0)(1) take 10 print Easy, right ? In fact, you can even apply the pattern to a sequence of any order, and define: def recStreamK [A](f : A ⇒ A ⇒ ... A) (x1:A) ... (xk:A):Stream[A] = x1 #:: recStreamK (f) (x2) (x3) ... (xk) (f(x1,x2,..., xk)) Good. Now, it turns out that this pattern solves most of the first-year CS questions about defining recursive sequences of a given order. But we still have to write a slightly different function, with a slightly different number of parameters, every time the order$k$changes. # Generalizing over the order ## The problem My question is : can we generalize over the order ? That is, can we write a single function that, for all$k$, allows us to generate the stream of elements$U_n$, given$a_1, … a_k$and a function$f$such that$U_n = f(U_{n-1}, U_{n-2}, … U_{n-k})$? The instinctive answer is no, at least within the bounds of the type system, since the type of the argument f in recStreamK is fixed by the order of the recursive sequence, which is also the arity of the corresponding recurrence relation defined by f. The common perception would have it that to generalize over the recursive order we need: • a macro language • initial arguments passed as a list, or a stream • type casts, or an equivalent way to bypass the type system • dependent types All those solutions are interesting. In fact, some were proposed in a recent Stackoverflow.com question. I learnt a lot from that question, and upvoted accordingly. But in Scala, we don’t necessarily need to resort to that. It turns out that with Scala’s overlapping implicits, we can define a single constructor that will automatically generate the appropriate stream for any order. In a type-safe manner. Here’s how. ## A small proviso More exactly, “here’s how” with a temporary proviso on tuples (that we will remove at the end): in Scala, the usual fixed-sized tuple of width k is not an instance of a recursively-generated type built from nested pairs, it is a bespoke constant type Tuplek (for a fixed k between 2 and 22), with little structural relation to Tuplek-1 or Tuplek+1. In particular, there is no recursive relation that allows me to consider something structurally akin to a function of type Tuplek ⇒ Tuplek+1 for any given k. As we will see, having some sort of recursive structure recognizable within some fixed-width tuple type is crucial to what we are trying to do, so that we will temporarily forgo Scala’s native Tuples, and work with our very own tuples defined in the following manner: • the singleton of type T is simply T. • for any tuple type U of a given width k, its successor, the tuple of width k+1, is the type Pair [T,U](1) Forget everything you know about Scala’s Tuple5 for a minute: when we think (1,2,3,4,5), what we are really going to write is (1,(2,(3,(4,5)))). The same goes for Tuple2 ...Tuple22. In a later, separate step, we will come back to how to apply this to regular arguments. So, let’s recapitulate: we want a generic function that given two arguments: • a function taking any k-tuple argument f : Pair[A,Pair[A, ... Pair[A,A]...]] ⇒ A, • and a corresponding k-tuple of initial elements (a1,(a2,...(ak-1,ak)...)), returns the Stream[A] consisting of the recursive sequence of order k defined by the recurrence relation given by f and the initial elements a1, ... ak. We want that single program to work for any k, but we want to be type-safe : if we receive a function taking k arguments as above, but only k-1 or k+1 initial elements, things should “break” at type-checking. ## Overlapping implicits As per Fighting Bit Rot With Types: The foundations of Scala’s implicits are quite simple. A method’s last argument list may be marked as implicit. If such an implicit argument list is omitted at a call site, the compiler will, for each missing implicit argument, search for the implicit value with the most speciﬁc type that conforms to the type of that argument. For a value to be eligible, it must have been marked with the implicit keyword, and it must be in the implicit scope at the call site. As it turns out, those implicit arguments form implicitly-passed dictionaries, and can carry transitive constraints: a method can define an implicit (with the implicit prefix keyword), and itself require implicit arguments - ensuring the propagation of constraints. These two mechanisms: • automatic instantiation • constraint propagation make Scala’s implicits a flavor of type classes. The feature we are going to be mostly interested in is the oft-overlooked overlap resolution between implicits: when two implicits are in scope, Scala follows the overloading resolution rules (Scala ref §6.26.3), and considers the lowest subclass in the subtyping lattice. Which means that if we want to direct implicit instance resolution, we can simply define the alternatives to be tried first in subtypes, and the fallback cases in supertypes. There are examples in several papers. ## Recursive Implicits Let’s consider the stream equation at rank$k$as we have outlined it in the pattern above: def recStreamK [A](f : A ⇒ A ⇒ ... A) (x1:A) ... (xk:A):Stream[A] = x1 #:: recStreamK (f) (x2) (x3) ... (xk) (f(x1)(x2) ... (xk)) If you consider$f(a1)$as a higher-order function of arity k-1, the last argument of the recursive call can be made to depend only on the k-1 arguments a2, ..., ak. This is important, because it will let us define this as the return of some recursive function at order$(k-1)$. Thus we want to define an intermediary function: prod (f) (a1, ..., ak) = (a1, ..., ak, f(a1, ..., ak)) This lets us define the stream rstream f taking a tuple of width k recursively as a function of some prod (f(a1)) taking a tuple of rank k-1: rstream (f) (a1, ..., ak) = a1 #:: rstream f (a2, ..., ak, f(a1, ..., ak)) = a1 #:: rstream f (prod (f(a1)) (a2, ... ak)) The next stage involves defining an implicit trait that defines methods rstream and prod for every tuple, by recognizing the Pair-directed structure of the recursive function’s argument. Since we have a recursive, structural type of tuple, this just involves a relatively simple manipulation: trait RecStream [S,Base] { def prod : (S ⇒ Base) ⇒ S ⇒ Pair[Base, S] def rstream : (S ⇒ Base) ⇒ S ⇒ Stream[Base] } class RecStreamDefault { implicit def ZeroRS[S] = new RecStream[S,S] { def prod = xs ⇒ ss ⇒ (ss, xs (ss)) def rstream = xs ⇒ ss ⇒ ss #:: rstream (xs) (xs(ss)) } } object RecStream extends RecStreamDefault { def apply[S,B](f:S ⇒ B) (implicit rs:RecStream[S,B]):S ⇒ Stream[B] = rs.rstream(f) implicit def SuccRS[R,B] (implicit rs:RecStream[R,B]) = new RecStream[Pair[B,R],B] { def prod = f ⇒ a ⇒ (a._1, rs.prod (y ⇒ f (a._1,y)) (a._2)) def rstream = f ⇒ a ⇒ a._1 #:: rstream (f) (rs.prod (y ⇒ f (a._1,y)) (a._2)) } } This is the core of the trick: we already have something that works for any k, and returns the appropriate recursive stream. Here are examples for orders 1 and 2: def increaser : Int ⇒ Stream[Int] = RecStream((x:Int) ⇒ x + 1) def fibonacci : Pair[Int,Int] ⇒ Stream [Int] = RecStream(x ⇒ x._1 + x._2) fibonacci(0,1).take(10).print increaser(0).take(10).print We are type-safe: if we attempt to write fibonacci(0).take(10), we get: recstream.scala:104: error: type mismatch; found : Int(0) required: (Int, Int) fibonacci(0).take(10) ^ one error found The only problem is the fact that if we really want to extend this to some k ≥ 3, then we can’t use a function such as (x:Int) ⇒ (y:Int) ⇒ (z:Int) ⇒ x + y + z. We have to shape it according to our bespoke nested-pairs tuple type, and write the painful x ⇒ x._1 + x._2._1 + x._2._2. Can’t we solve that problem too, and use regular curried functions to solve our problem ? ## Currying and Uncurrying Currying is a fancy name for transforming a function that takes a pair as an argument, of type$A × B → C$into a higher-order function that takes a single argument, and returns the function that takes the second argument before returning a result computed with both. That latter function has of course the type$A → (B → C)$. Since arrows are right-associative, the parentheses in the latter expression are optional. Naturally, uncurrying is the reverse transformation, from the higher-order function to the function taking a pair. To do this with first-order types, you just have to write the following couple of functions: def curry [A,B,C](f:Pair[A,B] ⇒ C) = (x:A) ⇒ (y:B) ⇒ f (x,y) def uncurry [A,B,C](f: A ⇒ B ⇒ C) = (x:Pair[A,B]) ⇒ f (x._1) (x._2) Now, this is all well and good if you are dealing with ground types, but what if you are dealing, as we are, with nested pairs ? How do we apply the transformation recursively to our tuples ? We can resort in fact to the same trick of using prioritized implicits! In fact, leaving to Scala to thread the construction of implicit curryfication functions through the unification process involved in typeinference. Here is the code for recursive curryfication using prioritized overlapping implicits: trait Curried[S,R] { type Fun def curry : (S ⇒ R) ⇒ Fun } class CurriedDefault { implicit def ZeroC[S,R] = new Curried[S,R]{ type Fun = S ⇒ R def curry = f ⇒ f } } object Curried extends CurriedDefault{ def apply[S,R] (f:S ⇒ R) (implicit c:Curried[S,R]) : c.Fun = c.curry(f) implicit def SuccC[T,S,R] (implicit c:Curried[S,R]) = new Curried[Pair[T,S],R] { type Fun = T ⇒ c.Fun def curry = f ⇒ x ⇒ c.curry(y ⇒ f(x,y)) } } I leave the code for recursive uncurryfication to the reader as an exercise. In fact, after looking for it by herself if it suits her, said reader can find the solution as a gist here (along with all the code presented in this post). Once this is defined, the definition of our generic recursive arbitrary-order streams is a breeze: def sumthree (x:Int)(y:Int)(z:Int) = x+y+z def sumthreestream = Curried(RecStream(Uncurried(sumthree _))) The client function call is (using java-style syntax to emphasize that I am passing arguments one by one) : sumthreestream(0)(0)(1).take(10).print Not convinced about the type yet ? Compiling this with -Xprint:typer yields: def sumthreestream: Int => Int => Int => Stream[Int] Isn’t Scala wonderful ? # Further work I suspect several improvements are possible using prioritized implicits are still possible, including: • avoiding our ugly kludge using nested pair types entirely • streamlining the use of curryfication and uncurryfication using views Don’t hesitate to suggest improvements in comments ! 1. note we implicitly forget about heterogeneity, since we won’t need it for our particular application # Timeline & stories with Facebook I’ve recently had a look at Facebook’s new feature, Timeline, and published mine in the process. With it, you can organize your news and shares along a temporal line design, that has already been called bloglike in some places. I think it’s a fantastic feature. The tagline of it is: Tell your life story with a new kind of profile Great. But there’s more stories to tell, and Facebook is getting oh so close to letting us tell them. I wish it would inch a little further. But before I get into what’s missing, let’s look at why this matters. Malcom Gladwell has looked in depth at how trends go viral in a famous 2000 book called The Tipping Point. In there, he explains the profile of who exactly helps people adopt those trends. Indeed, those connectors follow some sort of Pareto Principle where 20 percent of people make up 80 percent of the information exchange work necessary for people to adopt a trend and make it go viral. This means that giving those few key people the means to express themselves matters, even if they only makeup for a fraction of the general population. Basically, a trend extends its outreach through the actions of connectors, who embody a hub of contacts reaching across different social groups, salesmen, with the charisma to make us believe their story, and mavens, who are information specialists with a genuine desire to help others. Contrarily to Gladwell, I’m not sure people fit one or the other profile in all aspects fo their lives. But I recognize those roles around me, and I feel relatively familiar with the maven. Mavens have stories to tell. Detailed, involved, engaging, fascinating stories. But since they’re connectors, they don’t necessarily like dry, formal lessons you could find in a book. They have human stories, that they personally want to take a bunch of their peers through. No wonder why I identify. For instance, among other things, I’m a fan of a genre of music commonly called by such labels as post-rock, especially a branch of it bearing the even scarier moniker of math-rock. I try to know everything I can about it. What most people do know about it, though, is that at the local hipster coffee shop, there’s a dude with abstract art on his T-shirt and a suede jacket on that thinks and says anything else is crap. And it pains me personnally that it’s what most people will keep knowing of it. Math-rock seems so hermetic because, like so much music at the fringes right about now, it is an exploratory front of a long history of mixing influences and perspectives. The way I’d go telling people about math rock it is essentially a connective experience : I’d start writing a couple of weekly paragraphs about how the musical story went, tracing links between things seemingly very distinct. I’d relate how progressive rock started laying some tracks in the 60s, tell you where Don Cab’s crazy time signatures fit into this, and how the indie scene reappropriated some of this technique with an emotional influence that permeates through some of the more airy music you have probably enjoyed in the score of a couple of Michael Mann’s latest movies. I would add links to a couple of tracks you could grab on iTunes at the end of each post and listen to on the car ride home. And by the way, a bunch of us are going to that concert I think you’d like next week, would you like to join ? Here’s the Facebook page for the event. Does that approach means I’m a maven ? No idea. Is that sort of behavior maven-like ? You betcha. This kind of story is not reserved to a few select experts — otherwise I would be hard-pressed to explain why I would be entitled to raise my voice on the subject. We all have friends who introduced us, more or less informally, but always personally, to their interests and passions, musical or otherwise. And it works, more or less. I mean, if the genre is far from your thing, putting it in context alone is not going to make it so. But my experience in ventures of that sort (k-lophonies) shows people enjoy the story, in the sense that some context makes the music “familiar” enough for them to enjoy. I myself have been on the receiving end of some similar get-togethers about classical music that I’ve greatly benefited from. And hopefully, next time people meet the obnoxious hipster type at the coffee shop, they’ll be able to steer him or her towards a less annoying approach to talking about music. Now, Facebook has events, notes, and links. One thing I think is missing from the user experience — and I think one of the reasons people don’t blog on Facebook — is the option to unwind a string between those nuggets of context. Something that would let a friend who skipped the first few installments of my informal introduction to math rock catch up. Something that would, more importantly, form a connective experience through a story over the course of a few weeks. And Facebook’s Timeline is getting very close to this: I do have a life story, but I have other stories as well. Facebook could help me tell them, and in so doing, connect better with people. # Certifying RSA correct for breakfast Reasoning on the correctness of the RSA cryptosystem is much easier with a little bit of algebra: this is a perfect example of a case where the “practical” applications of proof assistants can be helped with libraries dealing with abstract algebra, including SSReflect. The proof we are going to build in the present post is based on a stock SSReflect version 1.3pl1, and is available on github. It can be compared with the RSA contribution to Coq, made on the basis of the standard library (note this is not a code golf, but rather a tentative to demonstrate that a big mathematical library helps). Let us look at the encryption and decryption primitives of RSA: given a message,$M$, represented as a number, the cypher text$C$is given by$C \equiv M^e \pmod n$. Likewise the decryption is defined as$M = C^d \pmod n$. The public key is the pair$(e,n)$and the private key is the pair$(d,n)$, with the message$M$chunked and padded appropriately so that$M \leq n$. In SSReflect: Definition encrypt (w : nat) := (w ^ e) %% n. Definition decrypt (w : nat) := (w ^ d) %% n. The correctness property verifies that the cypher text allows us to recover the original message, i.e.: $$C^d \equiv M^{e d} \equiv M \pmod n$$ For that, it suffices to choose$e,d$so that : $$ed \equiv 1 \pmod{\Phi(n)}$$ where$\Phi$designates Euler’s totient function :$\Phi(n)$is the number of integers$1 \leq m < n$such that$m$and$n$are coprime. This is an easy process: for security reasons,$n$is chosen to be the product of two randomly chosen large prime numbers$p$,$q$. Since for all$n_1,n_2$coprime, $$\Phi(n_1n_2) = \Phi(n_1)\Phi(n_2)\label{eq:phi_mul}$$ the totient of$n$is the easily computed to be$\Phi(pq)=(p − 1)(q − 1) = n - (p + q) + 1$. Naturally, having that last equation as a library lemma (phi_coprime) helps tremendously. Hence, to pick$e,d$verifying the equation above, all that remains to do is to pick a random$e$coprime to$\Phi(n)$such that we can use Euclid’s algorithm to compute$d$, its multiplicative inverse modulo$\Phi(n)$. Variable p q : nat. Variable prime_p : prime p. Variable prime_q : prime q. Variable neq_pq: p != q. Local Notation n := (p * q). Hypothesis big_p: 2 < p. Hypothesis big_q: 2 < q. Lemma pq_coprime: coprime p q. Proof. by rewrite prime_coprime // dvdn_prime2 //. Qed. Lemma phi_prime : forall x, prime x -> phi x = x.-1. Proof. move=> x; move/phi_pfactor; move/(_ _ (ltn0Sn 0)). by rewrite expn1 expn0 muln1. Qed. Lemma pq_phi: phi(n) = p.-1 * q.-1. Proof. rewrite phi_coprime; last by rewrite pq_coprime. by rewrite !phi_prime //. Qed. (* We pick an exponent e coprime to phi(n) *) Variable e : nat. Hypothesis e_phin_coprime: coprime (phi n) e. Definition d := (fst (egcdn e (phi n))). (* We check that ed is 1 modulo (phi n) *) Lemma ed_1_mod_phin: e * d = 1 %[mod (phi n)]. Proof. by rewrite -(chinese_modl e_phin_coprime 1 0) /chinese addn0 mul1n. Qed. In this introduction of basic lemmas, we noticed that the value of$\Phi$for a prime wasn’t included in the library. However, the Search tool of SSReflect allowed us to quickly find a more general lemma on the value of$\Phi$based on the prime decomposition of a number (phi_pfactor above). And finally, the value of$ed \pmod{\Phi(n)}$leads to correctness thanks to Euler’s theorem: For all$n$and$a$coprime to$n$,$a^{\Phi(n)} \equiv 1 \pmod n$Hence,$\exists k$such that$ed = k \Phi(n)+1$, so that $$M^{ed} = M^{k \Phi(n)+1} = M^{k \Phi(n)} M \equiv M \pmod n$$ is a consequence of Euler’s theorem if$M$and$n$are coprime. The bane of working with a proof assistant is that we can’t ignore an often-overlooked detail: if$M$and$n$are not coprime (and if$M \neq n$, otherwise this is trivially true), we can assume, without loss of generality that$p | M$and$q$and$M$are coprime. Indeed: Lemma notcoprime: forall x, 0 < x < n -> ~~ (coprime x n) -> ((p %| x) && coprime x q) || ((q %| x) && coprime x p). Proof. move=> x; case/andP=>[x_gt0 x_lt_n]; rewrite coprime_mulr; move/nandP. move/orP; rewrite coprime_sym [coprime _ q]coprime_sym 2?prime_coprime //. case Hdvdn: (p %| x) =>/=; rewrite ?negbK ?andbF ?andbT // orbF=>_. have xdpp_x: (x %/p) * p = x by move: (Hdvdn); rewrite dvdn_eq; move/eqP. rewrite -xdpp_x; apply/negP=> H; move:H; rewrite (euclid _ _ prime_q). rewrite [ _ %| p]dvdn_prime2 // eq_sym; move/negbTE: neq_pq=>->. rewrite orbF=>H; move: (dvdn_mul (dvdnn p) H). rewrite [_ * (_ %/ _)]mulnC xdpp_x; move/(dvdn_leq x_gt0). by rewrite leqNgt x_lt_n. Qed. In that case$M^{k \Phi(n)} M \equiv M \pmod q$is a consequence of Euler’s theorem, and$M^{k \Phi(n)} M \equiv M \pmod p$is trivially true, which allows us to conclude using a Chinese lemma. Most presentations of RSA cop out of treating this case, but it makes the encoding no less correct. We can now prove the theorem: Theorem rsa_correct1 : forall w : nat, w <= n -> (decrypt (encrypt w)) = w %[mod n]. Proof. move=> w w_leq_n; rewrite modn_mod modn_exp -expn_mulr. This simplifies the goal to w ^ (e * d) = w %[mod n]. Then we are going to want to treat the case where w = 0 first. Since this is a decidable property, we proceed by making a case on 0 < w. We simplify the left part: case w_gt0: (0 < w); last first. move/negbT: w_gt0; rewrite lt0n; move/negPn; move/eqP=>->. We are brought to 0 ^ (e * d) = 0 %[mod n]. The left part forces us to check that 0 < e*d: have ed_gt0: 0 < e * d. have:= (divn_eq (e*d) (phi n)); rewrite ed_1_mod_phin => ->. We are left with 0 < (e * d) %/ phi n * phi n + 1 %% phi n. We would like to simplify$1 \pmod{\Phi(n)}$, which requires$1 < \Phi(n)$: have phi_gt1 : 1 < (phi n); first rewrite pq_phi -(muln1 1). by apply: ltn_mul; rewrite -ltnS prednK ?big_p ?big_q ?prime_gt0. We can then conclude on both 0 < e * d and the original case where w = 0: by rewrite [_ * phi n]mulnC (modn_small phi_gt1) addnS lt0n. by rewrite exp0n ?ltn_addl // ?modn_small. Back to$w > 0$! Simplifying our goal a little further: have:= (divn_eq (e*d) (phi n)); rewrite ed_1_mod_phin modn_small // =>->. We get : w ^ ((e * d) %/ phi n * phi n + 1) = w %[mod n], and we are ready to conclude for the case where message and modulus are coprime ! This is the core of the proof : a simple use of Euler’s theorem and a few helper lemma on moduli. case cp_w_n : (coprime w n). rewrite expn_add [_ * phi n]mulnC expn_mulr -modn_mul2m; move: cp_w_n. by move/Euler=>E; rewrite -modn_exp E modn_exp exp1n modn_mul2m mul1n. Now, in the case where modulus and message are not coprime, we are going to want to use the notcoprime lemma above, for which we also require them not to be equal, so we quickly eliminate that case. case w_eq_n: (w == n). by move/eqP: w_eq_n =>->; rewrite -modn_exp modnn exp0n ?mod0n ?addn1. Finally, we apply the chinese lemma, to separate our requirements on$w \pmod{\Phi(n)}$into requirements on$w \pmod{\Phi(p)}$and$w \pmod{\Phi(q)}$: move: w_leq_n; rewrite leq_eqVlt {}w_eq_n orFb; move=> w_lt_n. apply/eqP; rewrite chinese_remainder; last first. by rewrite prime_coprime // dvdn_prime2. The goal at this stage is (w ^ ((e * d) %/ phi n * phi n + 1) == w %[mod p]) && (w ^ ((e * d) %/ phi n * phi n + 1) == w %[mod q]) We want to rewrite this a little further to make the prime$w$is not divisible by more prominent. rewrite mulnC {1 3}pq_phi {2}[_ * q.-1]mulnC -2!mulnA 2!expn_add expn_mulr. rewrite [w ^ ( q.-1 * _ )]expn_mulr expn1 -modn_mul2m -(modn_mul2m _ _ q). rewrite -modn_exp -(modn_exp _ q). This gives us the goal: ((w ^ p.-1 %% p) ^ (q.-1 * ((e * d) %/ phi n)) %% p * (w %% p) == w %[mod p]) && ((w ^ q.-1 %% q) ^ (p.-1 * ((e * d) %/ phi n)) %% q * (w %% q) == w %[mod q]) We can now use notcoprime above, apply Euler’s theorem in each case, and conclude ! move/andP: (conj (idP w_gt0) w_lt_n). move/notcoprime; move/(_ (negbT cp_w_n)); case/orP; move/andP=> [Hdvdn Hncp]. move: Hdvdn; rewrite /dvdn; move/eqP=>->; rewrite muln0 mod0n /=. move/Euler:Hncp; rewrite (phi_prime prime_q)=>->. by rewrite modn_exp exp1n modn_mul2m mul1n. move: Hdvdn; rewrite /dvdn; move/eqP=>->; rewrite muln0 mod0n /= andbT. move/Euler:Hncp; rewrite (phi_prime prime_p)=>->. by rewrite modn_exp exp1n modn_mul2m mul1n. Qed. This concludes our certification of RSA’s correctness, in itself 30 lines of proof script. Note we have never used any automated tactic database such as auto, which confers our proof script a particularly deterministic nature (but it’s not a requirement, of course). Here’s a run of coqwc, the coq line-count utility, on our final file: spec proof comments 28 53 17 RSA.v The final file is 126 lines. # How do you make a recursive merge sort more efficient in Python ? This is re-worked from an excerpt of an answer of mine (1) to an SO question that made me discover some interesting things about Python : turns out it’s not easy to make recursion faster by doing tail call elimination, even by hand. (2) We mean here to look at methods of tail call elimination and their effect on performance, demonstrated within the limits of the language, and with the example of merge sort. Tail call elimination aims at turning tail recursive calls into iterative loops, providing the benefit of not having to manage a recursive call stack, and thus, hopefully, of making things a bit faster. It is also used to deal with functions who are mutually recursive, or, more generally, whose number of recursive calls make the call stack grow very fast. Python is an interesting example to show tail call elimination within the language itself because it does not have tail call optimization built-in. Some languages (generally functional) have built-in tail call elimination, meaning every tail call you make gets compiled to an iterative loop. If you want an iterative merge sort in one of those languages, all you need is to re-write the classic version to make your recursive calls tail recursive, using a CPS translation, as we will show below. Of course, if what you want is a fast sort in Python, the native sort, or numpy’s alternative, or even an in-place quicksort are better alternatives. Even more impressively, faster implementations of a sort which closely (3) mimics the recursive pattern of the merge sort have been proposed on reddit in response to this post. And finally, if you wish to break the limits of what the language intends “natively”, it should be interesting to turn to one of the famous tricks for achieving TCO in Python. Our goal here is to apply tail call elimination within language boundaries, and maybe see if we obtain a TCO’ed function faster that the function it was derived from. The exhaustive benchmark with timeit of all implementations discussed in this answer (including bubblesort) is posted as a gist here. Just to give a starting point, I find the following results for the average duration of a single run : • Python’s native (Tim)sort : 0.0144600081444 • Bubblesort : 26.9620819092 Those are seconds (but since only the relative speed matters, this isn’t very important). As can be expected, Python is much faster. And to think that Timsort is supposed to be a (C) merge sort ! • For starters, we can write a relatively efficient, non-tail-recursive merge sort in the following way: def mergeSort(array): n = len(array) if n <= 1: return array left = array[:n/2] right = array[n/2:] return merge(mergeSort(left),mergeSort(right)) def merge(array1,array2): merged_array=[] while array1 or array2: if not array1: merged_array.append(array2.pop()) elif (not array2) or array1[-1] > array2[-1]: merged_array.append(array1.pop()) else: merged_array.append(array2.pop()) merged_array.reverse() return merged_array It runs, as expected, quite faster than the Bubblesort. Average run time : 0.126505711079 Now, our example may not be the best test of the difficulty of managing a huge call stack depth. The number of recursive calls R(n) on a list of size n is (2n-2) : for$k \geq 2$,$R(k) = 2+2* R(k/2)$, and$R(1)=0$. But in depth, the stack blowup of the recursive merge sort algorithm is in fact modest : as soon as you get out of the leftmost path in the array-slicing recursion pattern, the algorithm starts returning (& removing frames). So for 10K-sized lists, you get a function stack that is at any given time at most of size$log_2(10 000) = 14$… pretty modest. But we hope it might still give some glimpse of a hint as to what our TCO methods entail. • The call to merge is not tail-recursive; it calls both (mergeSort left) and (mergeSort right) recursively while there is remaining work in the call (merge). But we can make the merge tail-recursive by using CPS. Note that as-is, this would only be useful as-is if we were using a language that does tail call optimization, without it , it’s a bad idea. Indeed, this is only workable in Python if we do further work on that intermediary function, which, by itself, runs out of stack space for even modest lists: def cps_merge_sort(array): return cpsmergeSort(array,lambda x:x) def cpsmergeSort(array,continuation): n = len(array) if n <= 1: return continuation(array) left = array[:n/2] right = array[n/2:] return cpsmergeSort (left, lambda leftR: cpsmergeSort(right, lambda rightR: continuation(fastmerge(leftR,rightR)))) Once this is done, we can do TCO by hand to defer the call stack management done by recursion to the while loop of a normal function (trampolining, explained e.g. here or there, trick originally due to Guy Steele). Trampolining and CPS work great together. We write a thunking function, that “records” and delays application: it takes a function and its arguments, and returns a function that returns (that original function applied to those arguments). thunk = lambda name, *args: lambda: name(*args) We then write a trampoline that manages calls to thunks: it applies a thunk until the thunk returns a result (as opposed to another thunk) def trampoline(bouncer): while callable(bouncer): bouncer = bouncer() return bouncer Then all that’s left is to “freeze” (thunk) all your recursive calls from the original CPS function, to let the trampoline unwrap them in proper sequence. The function now returns a thunk, without recursion (and discarding its own frame), at every call: def tco_cpsmergeSort(array,continuation): n = len(array) if n <= 1: return continuation(array) left = array[:n/2] right = array[n/2:] return thunk (tco_cpsmergeSort, left, lambda leftR: thunk (tco_cpsmergeSort, right, lambda rightR: (continuation(fastmerge(leftR,rightR))))) mycpomergesort = lambda l: trampoline(tco_cpsmergeSort(l,lambda x:x)) This gives us a recursive merge sort in which the call stack size is at most one at any given time : by returning the thunk, our originally recursive procedure gives the “address” of the next function to call. But this does not go that fast (recursive mergesort:0.126505711079, this trampolined version : 0.170638551712). Indeed, each reursive calls need to create a closure (the thunk) and the trampoline loop has to test for closure (callable) continuously, so the overall naive trampolining is pretty expensive - unless you reach the end of the call stack, in which case they may prove of some use. • We can do a slightly more involved stack-based TCO elimination in the guise of this SO answer (rehashed here). The idea is to consider the recursion as a storage of work to be done and work accomplished until now, and to include pointers to differentiate between the two. Then, reproducing a recursive pattern just consists in managing the order in which work units (here, slices of our array) are treated. This gives the iterative function: def leftcomb(l): maxn,leftcomb = len(l),[] n = maxn/2 while maxn > 1: leftcomb.append((l[n:maxn],False)) maxn,n = n,n/2 return l[:maxn],leftcomb def tcomergesort(l): l,stack = leftcomb(l) while stack: # l sorted, stack contains tagged slices i,ordered = stack.pop() if ordered: l = fastmerge(l,i) else: stack.append((l,True)) # store return call rsub,ssub = leftcomb(i) stack.extend(ssub) #recurse l = rsub return l This goes a tad faster than the inefficient trampolined version above (trampolined mergesort: 0.170638551712, this stack-based version:0.144994809628). This is not surprising. The interesting fact is that this final iterative version goes slower that the basic, non-tail recursive run-of-the mill merge sort you would have written as CS 101 homework. Now this is cool : CPython’s function stack manipulations are faster than my explicit array manipulations. This is more a statement on the size of the recursive call stack (mentioned above) and the efficiency of CPython than on the quality of my CS homework (indeed, an even faster recursive version has been suggested on reddit). But still. It’s surprising that it would take an iterative function that does not do the basic, systematic recursion-to-stack transformation above to beat the run-of-the-mill recursive version. The final result summary ? on my machine (Ubuntu natty’s stock Python 2.7.1+), the average run timings (out of of 100 runs -except for Bubblesort-, list of size 10000, containing random integers of size 0-10000000) are: • Python’s native (Tim)sort : 0.0144600081444 • Bubblesort : 26.9620819092 • non-tail-recursive Mergesort : 0.126505711079 • trampolined CPS non-recursive mergesort : 0.170638551712 • stack-based non-recursive mergesort : 0.144994809628 1. Initially fraught with errors, and therefore duly ill-scored. 2. This has also benefited a lot of a discussion with Carey Underwood below and in a reddit thread on this post. Many thanks ! # Lightweight profiling for your Coq hacks Let’s suppose we want to hack a few functions in the Coq compiler. The classic situation is that we have the base implementation of our function as it stands, and an optimized version for which we want to check it really improves things. We all know Donald Knuth’s line on “premature optimization” … Crucially, we want execution time information, not just call count information: there is no reason our “optimized” subroutine, deep in the entrails of the compiler, to be called a different number of times. The easiest way to profile Coq is to compile it as native code, passing the -profile option at the execution of the configure script: huitseeker@fuwaad:/opt/coq/trunk$ ./configure --help| grep -A1 -e '-profile'
-profile
Add profiling information in the Coq executables

This will in fact modify the native compilation using ocamlopt, so that it generates profiling information by adding calls to the _mcount hook, at each ML definition. The details are in the OCaml manual

Once you have called the configure script with the appropriate options, and recompiled Coq, you can execute your toplevel as usual:

coqtop.opt  -q  -dont-load-proofs -I .    -compile mytheoryfile

This will generate a gmon.out file that you can then parse with gprof.

gprof /opt/coq/trunk/bin/coqtop.opt gmon.out > ~/gmonoutput.log

However, the problem then becomes that on a number of proof files, Coq spends most of its time within reduction primitives … as can be expected, I guess. This often dwarfs the mesurements for our specific function in reports from huge reduction loops that don’t concern it. Thankfully, a more lightweight option exists in the source of the compiler, hidden in library/profile.ml. It’s documented in comments in library/profile.mli:

To trace a function f you first need to get a key for it by using :

let fkey = declare_profile "f";;

(the string is used to print the profile infomation). Warning: this function does a side effect. Choose the ident you want instead “fkey”. Then if the function has ONE argument add the following just after the definition of f or just after the declare_profile if this one follows the definition of f.

let f = profile1 fkey f;;

If f has two arguments do the same with profile2, idem with 3, … For more arguments than provided in this module, make a new copy of function profile and adapt for the needed arity.

This is really the 5-minute way to profile your Coq hack (modulo recompilation time, of course) : there is an example of profiling of a 4-argument function in the comments of kernel/reductionops.ml, and Coq even has the niceness of already calling Profile.print_profile() for you at the end of the toplevel execution. You get a result such as :

Function name              Own time Total time  Own alloc Tot. alloc     Calls
f                            0.28      0.47        116        116      5      4
h                            0.19      0.19          0          0      4      0
g                            0.00      0.00          0          0      0      0
others                       0.00      0.47        392        508             9
Est. overhead/total          0.00      0.47       2752       3260
• Ocamlviz is another option, but we don’t care much about real-time profiling information. Running the profiler as server corresponds to another usecase : optimizing a proof script rather than the compiler itself. Besides, it requires linking with the libocamlviz library, which is somehow difficult to integrate to the monolithic (and somewhat labyrinthic) build process of Coq (especially when aiming to analyze just a few functions).

• You may also want to have a look at this issue of the ocaml newsletter, it details a few of the alternatives for profiling ocaml code, including oprofile, which I haven’t tested.

In my first year I was taught about the slide rule. They said, “The slide rule is important. Without it you can do nothing. The slide rule is the modern weapon of efficiency. With the slide rule you can get from here to the stars. Buy it, use it – your slide rule!” Within one year it was, “Burn the slide rule. The calculator can add up with none of this fucking sliding the shit around and working out where that bit in the middle goes. Smash it over your head.”
Eddie Izzard, Biography, Chapter 5