Monday, August 15, 2011

Sipping some Monte-Carlo in Scala and Clojure

Howdy buddies, a new one from the SICP. The lecture's nice and suits well the learning curve of all kinds of functional programming languages. Naturally that does concern Clojure, as a Lisp-1 natural language, and Scala. I recently watched the lecture concerning the (very) few benefits of assignments. There, Gerald Jay Sussman presented a simple example of a Monte Carlo simulation dedicated to the processing of the number Pi.

First, thanks a lot to Debasish Gosh who kindly accepted to answer my questions and inspect my code too. May he be assured of all my gratitude. Exchanging ideas with a so skilled person like him was quite an experience and enlightened these times of trouble for me. I hope I have reported all the modifications he suggested, may he forgive me if I did not. I may not have found yet some Master Craftsman in Europe in order to teach me about actors and Functional programming (40 year old may be too...old), but the kind help of this person was a real support.

Back to Pi. Some approximated value of pi can be quickly computed if based on the Cesaro demonstration that the square of Pi is inversely proportional to probability that two integers chosen at random will have no factors in common. More clearly:

P(gcd(N1,N2) == 1) = 6 /  π *  π

Because if so, their gcd is 1. Quoting the Wikipedia article "Monte Carlo methods (or Monte Carlo experiments) are a class of computational algorithms that rely on repeated random sampling to compute their results". So running a Monte Carlo experiment in order to process the Pi number can be resumed in running a gcd calculus on random natural integer values till our result converges to some expected value.
Therefore, random number generation takes place here, if we consider running a gcd function on a series of nearly random integer. In order not to make things too complex, let's assume that the action of generating numbers can be described as the process of creating a string of numbers extrapolated from a suite taking its roots from a seed value. I picture that as a mathematical suite :

Vn = f(Vn-1) so

Vn = f(f(f(.................f(Vo)

where Vo = some_seed

where at each step one need to cache the new generated value in order for the following one to be processed and so on. (Ok my mathematical symbolic vocabulary is limited but I do not want to copy n'paste Wikipedia)
This is a typical problem solved with the help of assignable variables. Of course some solution does exist which does not depends on some external variable but the result is much more cluttered (have a look there).

Having installed Leinengen (version 1.6.1 is very nice with repl at its top), I now have all the tools I need to challenge my code with tests. In the Clojure package cesaro.test I created a suite of small tests in a core_spec.clj file. Starting content is :

(ns cesaro.test.core-spec
  (:use cesaro.core)
  (:use clojure.test))

Basic. As in the cesaro.test package, in the core_spec.clj file content, my namespace is cesaro.test.core-spec. I claim there my intent to use the tools of the clojure.test namespace in order to challenge the functions inplemented in the cesaro.core namespace (probably the content of a core.clj file in a cesaro package... got it? :))

What I need is a working gcd function first. So are the very dumb tests used to create it:

(deftest gcd-with-matching-known-numbers-should-return-value
	(is (= (gcd 1989 867) 51)))

(deftest gcd-with-matching-some-other-numbers-should-return-value
	(is (= (gcd 36 27) 9)))

(deftest gcd-with-prime-numbers-should-return-one
	(is (= (gcd 23 11) 1)))

(deftest gcd-with-unordered-numbers-should-return-gcd
	(is (= (gcd 11 23) 1)))

The last test appeared later as I used some Euclide method so to process the gcd and had to face slight problems due to my expectation of ordered parameters :) (told you I was dumb...but still learning).
This leads me to:

(defn euclide-gcd [numberOne numberTwo]
	(cond 
		(= numberTwo numberOne) numberOne
		(= numberTwo 1) 1
		:else (let [dif (- numberOne numberTwo)]
			(if (< dif numberTwo)
				(recur numberTwo dif)
				(recur dif numberTwo)))))

(defn gcd [x y]
	(if (< x y) (euclide-gcd y x) (euclide-gcd x y)))

The euclide-gcd function expects ordered parameters. I do not like the conception and what I hate most are the cluttered cond/if branches. This is something I intend to avoid in the future. I hate branches as they do remind me of goto. Clojure deserves better than that.
Then I need some random generator. A mechanism that would allow me to generate a seed and the derived random numbers. The tests look like:

(deftest seeds-with-two-invocations-should-differ
	(is (not= (seed) (seed))))

(deftest random-with-two-invocations-should-differ
	(is (not= (random) (random))))

The implementation I provided :

(defn seed [] (rand-int Integer/MAX_VALUE))

(defn marsaglia-first[value]
	(+ (* 36969 (bit-and value 65535)) (bit-shift-right value 16)))

(defn marsaglia-second[value]
	(+ (* 18000 (bit-and value 65535)) (bit-shift-right value 16)))

(defn marsaglia-sum [x y] (+ (bit-shift-left x 16) y))

(def random 
	(let [x (ref (seed)) y (ref (seed))]
		(defn update []
			(dosync
				(alter x marsaglia-first)
				(alter y marsaglia-second)
				(marsaglia-sum @x @y)))))

And yes, I adopted a Marsaglia algorithm (see wikipedia previous reference). Although quite idiomatic, I find the result not as elegant as the Scheme solution where the set! form allows for the modification of a declared variable in a let form expression:

(define rand
  (let ((x random-init))
    (lambda ()
      (set! x (rand-update x))
      x)))

The Scheme code shape appears to be more concise. This concision is not due to the fact that I chose a Marsaglia algorithm involving two parameters instead of one, but it finds its origin in the fact that - by nature -, variables set by a let form in Clojure are immutable. The only solution I found was to use the idiomatic form of the references in Clojure, in order to both alter the references wrapped values. Any suggestion will be welcomed.

Frustrated by the two previous pieces of code, I decided to force myself to write a Monte carlo small running function, without no branching at all. A dumb test is to provide my upcoming function with always/never failing tests in order to check its limits:

(deftest montecarlo-with-always-successfull-simulation-should-return-1
	(defn always-successfull [] true)
	(is (= 1 (montecarlo 10 always-successfull))))

(deftest montecarlo-with-never-successfull-simulation-should-return-0
	(defn never-successfull [] false)
	(is (= 0 (montecarlo 10 never-successfull))))


The montecarlo function accepts two input parameters, the first being the number of attempts and the second the simulation to be applied. The returned result is the ratio of passed simulation versus the whole number of simulations.

No branches so? I would lie saying that that one was a piece of cake for a beginner in functional programming as I am. I had to switch into Scala, than come back to Clojure with this following precise implementation:

(defn with-result-updates []
	(let [increments {true [1 0] false [0 1]}]
		(fn [status] (increments status))))

(defn run-montecarlo [updates in-simulation]
	(fn [[passed missed] number-of-essays]
		(vec (map + (updates (in-simulation)) [passed missed]))))

(defn montecarlo [essays in-simulation]
	(let [runs (run-montecarlo (with-result-updates) in-simulation) from-start [0 0]]
		(/ (first (reduce runs from-start (range essays))) essays)))

Of course the entry point is the montecarlo function. As I know the number of essays to run, all I need is to iterate and not recur over a range of essays:

(range essays)

Meanwhile, at each step, I can run a simulation (the purpose of the run-montecarlo function) that will update a vector of statistics, provided as [0 0] at the very beginning:

from-start [0 0]

The first element being the number of passed essays and the second the number of failed ones. The reduce form (the equivalent of the Scala foldLeft) aims to produce a vector of statistics. With reduce, one can produce whatever he wants, even another list, from the driving list. The purpose of the with-result-updates function is to create a lambda function instance (so a procedure) capable of producing a vector of the deltas to be applied whether a simulation has succeeded or not:
  • [1 0] on success
  • [0 1] on failure
One instance of the procedure is created once, so is the hashmap embedding the possible results:

{true [1 0] false [0 1]}

We close onto one immutable single instance of the hasmap, embedded into the frame (context of execution) of the created procedure. The principle of closing applies too in the montecarlo function where we both close onto the previous described procedure instance and the simulation to be applied:

(let [runs (run-montecarlo (with-result-updates) in-simulation) from-start [0 0]]

The final test to run to check on pi can be :

(deftest square-pi-with-enough-essays-must-be-close-to-9-9
	(let [result (square-pi)]
		(println "square-pi" result)
		(is (< (- result 9.9) 0.025))))

where we assert on the vicinity of the values of square pi and 9.9. Come then a natural implementation:

(defn cesaro-test []
	(let [value  (gcd (random) (random))]
		(= 1 value)))

(defn square-pi []
	(/ 6 (montecarlo 16192 cesaro-test)))

Works nice. What about Scala ? Scala helped me to find the no-branch version of the montecarlo method. For the trained Java eye, Scala is a pleasant bridge to take in order to embrace good functional programming habits to be adopted in any other functional programming language. Don't misunderstand me. Clojure and Scheme overwhelm me with thrilling sensations each time I do practice them. They also show me I was rambling in the dark before.
So, going back again to Scala (did I say I bought the Scala T-shirts and Teddy bear? ), I first had to write Random number generator tests. This was an opportunity to try Specs2:

import org.specs2._
import RNG._

final class RNGTest extends Specification{ def is =

  "RNG specification"               ^
                                    p^
  "Seed Generator should"           ^
  "Generate two different seeds"    ! e1^
  "Generate positive numbers"       ! e2^
                                    p^
  "Number Generator should"         ^
  "Generate two different numbers"  ! e3


  def e1 = seed should (not (beEqualTo(seed)))

  def e2 = {
    val rand = RNG()
    Range(0, 100).map((value: Int) => rand() should (beGreaterThan(0)))
  }

  def e3 = {
    val rand = RNG()
    rand() should not (beEqualTo (rand()))
  }

}

leading me to :

import util.Random
import  Long._
import scala.math._
class RNG(var seedOne: Int, var seedTwo: Int ) {

  def apply(): Int = {
    seedOne = 36969 * (seedOne  & 65535) + (seedOne  >> 16)
    seedTwo = 18000 * (seedTwo & 65535) + (seedTwo >> 16);
    abs((seedOne << 16) + seedTwo)
  }
}

object RNG {
  Random.setSeed(MaxValue)


  def seed : Int = {
    abs(Random.nextInt())
  }

  def apply(): RNG = {
    new RNG(seed, seed)
  }
}

where the seed is provided by the native number generator. I also need a GCD so let's test it

import org.specs2._
import com.promindis.montecarlo.MathModule._

final class GCDTest extends Specification {
  def is =
    "GCD calculus specification"                          ^
                                                          p^
      "GCD with macthing numbers should"                  ^
      "Find a first known result"                         ! e1 ^
      "Find a second known result"                        ! e2 ^
      "Find known result with reversed paramteres"        ! e3 ^
                                                          p^
      "GCD with non macthing numbers should"              ^
      "assert on known mismacth"                          ! e4 ^
      "assert on known mismacth with invert parameters"   ! e5


  def e1 = gcdOf(1989, 867) should be equalTo (51)

  def e2 = gcdOf(36, 27) should be equalTo (9)

  def e3 = gcdOf(27, 36) should be equalTo (9)

  def e4 = gcdOf(23, 11) should be equalTo (1)

  def e5 = gcdOf(11, 23) should be equalTo (1)
}

and then the solution:

object MathModule {

 //.........

  def sorted(firstNumber: Int, secondNumber: Int): (Int, Int) = {
    if (firstNumber < secondNumber) (secondNumber, firstNumber) else (firstNumber, secondNumber)
  }

  def gcdOf(firstNumber: Int, secondNumber: Int): Int = {
    def gcdOf(pair: (Int, Int)): Int = {
      pair match {
        case (y, x) if (x == y) => x
        case (y, x) if x == 1 => 1
        case (y, x) if x < y => gcdOf(sorted(y - x, x))
        case (y, x) if x < y => gcdOf(sorted(x - y, y))
      }
    }

    gcdOf(sorted(firstNumber, secondNumber))
  }
}


One can admire the beauty of the self expressive pattern matching in Scala. Finally I need a Monte Carlo simulator. Writing the tests again with the beautiful specs2:

import org.specs2._

final class MontecarloTest extends  Specification{ def is =

  "Montecarlo simulation dshould"                     ^
                                                      p^
  "have 100% success with always successful scenario" !e1 ^
  "have 0% success with always failing scenario"      !e2

  def e1 = {
    def alwaysSuccessful(): Boolean = true
    val stats = Montecarlo.simulation(alwaysSuccessful, 100)
    stats._1 should (beEqualTo(100))
    stats._2 should (beEqualTo(stats._1))
  }

  def e2 = {
    def neverSuccessful(): Boolean = false
    val stats = Montecarlo.simulation(neverSuccessful, 100)

    stats._3 should (beEqualTo(stats._1))
  }

}

helped me to get to :

object Montecarlo {
  type test = () => Boolean

  val update: Map[Boolean, List[Int]] =
    Map[Boolean, List[Int]](true -> List(1,0), false -> (List(0, 1)))

  def simulation(onRunning: test, essays: Int): (Int, Int, Int) = {
    val results: List[Int] = Range(0, essays).foldLeft(List(0, 0)) {
      (stats: List[Int], index: Int) =>
        (stats, update(onRunning())).zipped.map[Int, List[Int]](_ + _)
    }
    (essays, results(0), results(1))
  }
}

Here the result is provided as 3-Tuple, returning the number of essays, the number of passed essays , then the number of failed essays. I suffered only on the map method application after zipping the resulting lists. The compiler seemed in need of some help with explicit typing in order to infer the returned type. The astute reader will note that we used the same trick as in Clojure, storing the increments values definitions for the two success/failure scenarii into a Map. I have all the tools I need to run a Cesaro test:

import org.specs2.mutable.Specification

final class PiResolutionTest extends Specification{

  "Resolution of Pi" should {
    "be close to 9.9 " in {
      val epsilon = (9.9 - MathModule.estimatPiSquare())
       epsilon should(beLessThan(0.05))
    }
  }

}

and challenge it:

object MathModule {

  def forCesaro(random: RNG): () => Boolean = {
    () =>  gcdOf(random(), random())  == 1
  }


  def estimatPiSquare(): Double = {
    val stats = Montecarlo.simulation(forCesaro(RNG()), 10000)
    println(stats)
    val ratio: Double = int2double(stats._2) / stats._1.asInstanceOf[Double]
    println(ratio)
    println(6.0 / ratio)
    (6.0 / ratio)
  }
//..............

}

What was learnt ?

Well, living without branches is not easy, but worthwhile because it helps in learning how to use and reuse functional programming bricks. But I also learnt I have to digg into the RNG code so to find a smarter way to generate my numbers in Clojure. Maybe a stream oriented approach would be better...
Nice, got to finish chapter 11 of the Joy of Clojure and read one chapter more in Gul Agha's Actors book.


Be seeing you ! :)


2 comments:

Vlad Patryshev said...

When you talk about two integers chosen at random, I believe you are talking about two integers between 0 and N, uniformly distributed; and then getting N growing to infinity. Right?

Or else there's a confusion there.

Globulon said...

You are right ! :) Thank you for noticing that.

Post a Comment