Sunday, September 25, 2011

kinna' stream attitude calling by-name my Clojure and my Scala

Hello, some very simple stuff - cross my heart - this week.

I recently decided to connect to the Project Euler site in order to practice on a regular base mathematics. The website proposes in order to begin some simple problems based on logic and solvable - most of them - by computer implementation.
I found the site problems fitting nicely in the context of execution of regular code kata. The idea being to (re)use some basic concepts of my favorite languages (java included, don't smile). These problems matter to me because some of them involve the use and calculus of prime numbers. Believe me or not I actually use prime numbers almost every day.
In addition to the well known usage of prime numbers in computer science, from a personal point of view, prime numbers are very suitable in writing tests when you find yourself challenging the binding of long properties using range of parameters run by JUnit Theories, but they also can provide unique output results when provided as input to algorithms processes.
Test takes between a fifth to a third of my coding time whether doing TDD or laying test harness around c... sorry legacy code. In order to generate prime numbers, you got to know how to identify them, but also how to generate them.

Why is it so ? Because as with natural integers, working with prime numbers means working with infinite suite, Yeah cool :)
 This is where working with a recursive suite is not really possible because of the nested aspect of methods calls. A function in charge of defining one prime number, then the other and so on would run ad infinitum.
What we would like would be running a little our prime number generator, then use its result in some processing , than run it again and so on. This approach breaks a little bit our habits of thinking in terms of function invoking other functions (or routines invoking subroutines). The astute functional programmer can see clearly now where to this talk is going.
A solution is in delaying the calls. This approach has been nicely exposed in the sicp book but also in the incredible Peter Henderson's book: Functional programming: Application and Implementation. You can also watch H. Abelson lecture 6A here.

Let's take a naive approach. Just imagine we want to generate on will the natural numbers starting from 1. Invoking the recursive:

(defn integers-from [n]
  (cons n [integers-from (+ n 1)]))

using

(first (integers-from 1))

would blow our stack because the function integers-from would call herself , evaluating first the (+ n 1) expression. Calling the function evaluating first (+ n is) is named a call by-value
What if we could delay this evaluation, promising to the function to evaluate herself if necessary (on demand). The implementation would look like

(defn integers-from [n]
  (cons n [(promised (integers-from (+ n 1)))]))

There is one simple way to delay this evaluation using a simple macro:

(defmacro promised [expression]
   `(fn [] ~expression))

The macro evaluation will lead to the creation of a closure holding the  integers-from expression. The example is contrived and only serves our purpose. When needed we could then force the delayed sequence to execute the following evaluation:

(defn fulfilled [promise]
    (promise))

A function getting the ten first natural integers would look like

(defn first-elements [k stream]
  (if (= 0 k)
    []
    (cons (first stream) (first-elements (dec k) (fulfilled (second stream))))))

The example works fine, try it.

algorithms.tools=> (first-elements 10 (integers-from 1))
(1 2 3 4 5 6 7 8 9 10)

but we would have to redefine all the nice higher order functions in Clojure in order to serve our purpose, explicitly forcing the fulfillment of promises .
The generated sequences are named Streams.

Well, Clojure embeds nicely this lazy evaluation mechanics (in clojure.core ) using the force and delay macros, but even more nicely providing the lazy-seq macro, in extenso producing implicitly lazy sequences. Using lazy sequences you don't even deal with force and delay. Quoting Stuart Halloway and Aaron Breda, you pay only for what you need (Programming Clojure 2nd edition is out and covers Clojure 1.3, just released by the way). 

The idea of delegating the evaluation of an expression passed to a function is named the call by-name in opposition to the call by-value.

But wait a minute. This means, that the expression should be evaluated each time it is explicitly used. Nope pals ! Creators of Clojure have inserted a memoization mechanic allowing the lazy sequence to cache the result on its very first invocation. So now we have call by-need invocation, the expression being evaluated once.
But beware. Although lazy sequences can be very valuable to delay the evaluation of big data structures, or very long I/O operations, there can be a memory saturation effect, specifically if you keep a reference to the head of the sequence.A kind of dog tail effect.

Using lazy sequences our number generator becomes:

(defn numbers-from [n]
  (lazy-seq (cons n (numbers-from (+ 1 n)))))

and you can try taking the first ten numbers

algorithms.tools=> (take 10 (numbers-from 1))
(1 2 3 4 5 6 7 8 9 10)

using the nice embedded higher order functions. What about my prime number list so I can solve my first problems ? We only need to create a stream of prime numbers:

(defn prime-stream
  ([] (lazy-seq (cons 2 (lazy-seq (cons 3 (prime-stream 5))))))
  ([from-prime]
    (lazy-seq
      (cons
        from-prime
        (prime-stream (next-prime from-prime))))))

I played with the function variable arity in Clojure and made the stream construction the result of a function instead of a var so no reference to the head is kept, in consequence no memory leak due to caching. Each time you need to manipulate it, you create it.
For reason of symmetry I kept the lazy-seq chain of calls in the empty signature declaration in order to create the beginning of the stream.
Making of (prime-stream 5) a lazy sequences is making the promise of calculating the upcoming prime numbers.
Good...almost.

Now I need to process the primes. My implementation is inspired from well known heuristics taking into consideration known facts like:
  • primes are not even
  • primes are certainly not dividable by five
  • every prime greater than 3 can be written in the form 6k+/-1
  • ...
These known heuristics save the day avoiding a brute force approach consisting in trying to divide each number N by numbers from 1 to N-1, which can become very expensive in terms of CPU consumption and very long versus the time of completion. The algorithm can be refined and its time of execution shortened. A first draft looks like :

(defn prime-form? [number]
  "All primes greater than 3 can be written in the form 6k+/-1."
  (let [sqrt (Math/floor (Math/sqrt number))]
    (loop [f 5 dividable false]
      (if (or (> f sqrt) dividable)
        (not dividable)
        (recur
          (+ f 6)
          (or
            (= 0 (mod number f))
            (= 0 (mod number (+ 2 f)))))))))

(defn prime? [number]
  "1 is not a prime.
  All primes except 2 are odd."
  (cond
    (even? number) false
    (= 0 (mod number 3)) false
    (prime-form? number) true
    :else false
    ))

(defn next-prime [from-prime]
  (loop [current (+ 2 from-prime)]
    (if (prime? current)
      current
      (recur (+ 2 current)))))

where
  • next-prime recursively explore the following numbers matching if they are prime (step 2 is because if we start from a prime, we can go from one odd number to the next odd number)
  • prime? takes in charge the match, first checking modularity versus 2 and 3, then delegating the congruence estimate to prime-from?
  • prime-from? challenges the congruence of the number versus 6k +/-1
Some checks in the prime? function are redundant with the next-prime function implementation, but my prime? function is a shared tool function.
Make a run with something like

(defn prime-at [position]
  (time
      (nth (prime-stream) position)))

and you will get :

algorithms.euler=> (prime-at 10001)
"Elapsed time: 201.724241 msecs"
104759

Congratulation you have solved one of the Project Euler problems. The following problems are harder, believe me :). So take a shot if you have time.

Coming to Scala, streams are implemented as... Stream objects and are nicely exposed by Joshua Suereth in his book. The nice thing with streams in Scala is that methods can be defined or redefined with almost whatever symbols you may like and the cons method is specifically pleasant:

  sealed trait PrimeSuite {
    @annotation.tailrec
    final def nextPrime(value: Long): Long = {
      if (isPrime(value + 2)) (value + 2)
      else nextPrime(value + 2)
    }
  }

  def suite: Stream[Long] = new PrimeSuite {
    def f(value: Long): Stream[Long] = value#::f(nextPrime(value))
    def create: Stream[Long] = cons(2, cons(3, f(5)))
  }.create

The sealed trait exists because of experiments I am achieving. The tail recursive optimization can be marked with the @tail.recur annotation. So far, so good, the other methods are carbon copy of the Clojure code:

 def isPrimeForm(value: Long): Boolean = {
    val square = floor(sqrt(value));
    def findPrime(from: Long = 5): Boolean ={
      from match {
        case current if current > square => true
        case current if ((value % current) == 0) => false
        case current if ((value % (current + 2)) == 0) => false
        case _  => findPrime(from + 6)
      }
    }
    findPrime()
  }

  def isPrime(value: Long): Boolean = {
    value match {
      case 2 => true
      case 3 => true
      case even if ((even & 1) == 0) => false
      case odd if ((odd % 3) == 0) => false
      case _ => isPrimeForm(value)
    }
  }


although this time I preferred using the idiomatic match/case pattern of Scala. The Stream is guaranteed is to cache the values using some internal mechanics. And that is nice because having a look at the #:: method I discovered this:


class ConsWrapper[A](tl: => Stream[A]) {
    def #::(hd: A): Stream[A] = cons(hd, tl)
    def #:::(prefix: Stream[A]): Stream[A] = prefix append tl
  }

This little exploration reveals the Scala form expected to reference a call by-name parameter:

=> Stream[A]

only invoked when explicitly used. That is slightly different from expecting

() => Stream[A]

where the expression is supposed to have been evaluated before the method call. (Nice and clean demo in Martin Odersky's book) Playing a little bit with the call by-name in Scala lead me to confirm that called by-name parameters should always be free from side effect.

Why is it so ? Take the following ugly example:

  def f(assert: => Boolean) {
    println(assert)
    Thread.sleep(5000)
    println(assert)
  }

  def firstShot() {
    var item = 3
    val runner = new Runnable {
      def run() {
        f({
          print("asserting...");
          5 > item
        })
      }
    }

    val thread = new Thread(runner)
    thread.start()
    thread.join()
  }

Basically I fire a runnable in charge of calling by name the method f, passing through a closure, closing onto the variable item (never use mutable values, never). The standard output provides me with:

asserting...true
asserting...true

so in essence the method is invoked twice. Consider now:

 def f(assert: => Boolean) {
    println(assert)
    Thread.sleep(5000)
    println(assert)
  }

 def secondShot() {
    var item = 3
    val runner = new Runnable {
      def run() {
        f({
          print("asserting...");
          5 > item
        })
      }
    }

    val thread = new Thread(runner)
    thread.start()
    Thread.sleep(1000)
    item = 7
    thread.join()
  }

leading to a show stopping side effect:

asserting...true
asserting...false

I volunteerly changed the mutable value and came to an unstable state. And yes indeed the call by-name parameter has been invoked twice, meanwhile the mutable value was altered. We can simulate a call by-need effect using a lazy value:

 def thirdShot() {
    var item = 3

    def f(assert: => Boolean) {
      lazy val asserted = assert
      println(asserted)
      Thread.sleep(5000)
      println(asserted)
    }

    val runner = new Runnable {
      def run() {
        f({
          print("asserting...");
          5 > item
        })
      }
    }

    val thread = new Thread(runner)
    thread.start()
    Thread.sleep(1000)
    item = 7
    thread.join()
  }

which output looks like:

asserting...true
true

Call by-need effect is nice by the way and spare resources so you should always use it and never (never) use mutable variable nor fields.

 Well, I want to understand finger trees so I leave you

Be seeing you :) !!!

4 comments:

Vlad Patryshev said...

Not all primes == 5(mod 6), you know

Globulon said...

Ouups, thank you for your remark. I was quite light on that one. I rewrote the small part which summarizes harshly the document referenced by the link. Thanks again and do not hesitate to come back to me if something is incorrect.
I started reading your blog and really enjoyed it :)

Markc said...

Interesting exploration. I know it's not the main point of this post, but a cleaner way to implement a lazy numbers-from in Clojure is:

(def numbers-from (partial iterate inc))

Marc-Daniel Ortega said...

Thank you, I greatly appreciate your feedback :)

Post a Comment