Wednesday, January 25, 2012

A Star Trek firing Disruptors from Scala

Hello again.

 Today I want to share some of my ramblings while exploring the Disruptor framework through a Scala eye. It is never too late. This little inspection was encouraged by David Greco, and Martin Krasser kindly suggested that I also have a look at a Scala port implemented by Jamie Allen. 

Before dissecting this last reference I decided to read the classic material on the subject, from the seminal technical article up to Trisha Gee's blog, but also to read again the very interesting articles written by Martin Thompson here. In order to get a better understanding I also suggest this reading and that one. Naturally, on top of that pile of documentation, do not forget Martin Fowler's must-read production over here.

Everything has been said about the Disruptor and I really encourage you to read about the subject before reading the coming article, if interested. My summary in a few lines is going to be too harsh to be considered a serious presentation, and I hopefully expect that some of the experts will one day read the article and correct my misconceptions. 

Roughly cloning Martin Fowler's article introduction, let us say that, like all financial trading platforms, the LMAX platform relies on business logic core components that must interact with the outside world as fast as possible, supporting a throughput of dozens of millions of messages per second. 

The chosen approach of the implementation finds its name in the disruptive way of thinking that one must adopt in order to understand the internals of the Disruptor. Working with disruptors involves first thinking "asynchronous", then accepting that simple and elegant solutions to exchange messages can be better than messaging based on queues. The previously cited articles explain in a detailed manner the why and the how of the astounding level of performance of Disruptors compared to standard queue systems. 

The core business logic is surrounded by Disruptors. At the heart of the Disruptor, we find a ring buffer as the cornerstone of the architecture, fed by one or more producers, and feeding one or more readers. And guess what, the one producer/many readers configuration is the optimum combination.
Readers and producers run in their own threads of execution. The communication between readers/producers and the ring buffer is achieved through barriers created by the ring buffer, although on the producer side the barrier is merged with the ring buffer itself. 
The fixed-size ring buffer creates all its indexed slots during instantiation, the memory being allocated once and for all, avoiding annoying GC peaks.
A producer claims the next available slot indexes, gets the matching slots, updates the slots' content, and commits them.
A reader waits through its sequence barrier for the available slots. 
On the producer side, events are created by an event factory. 
On the consumer side, event handlers process the dispatched events.
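
Expressed with the RingBuffer calls used later in this post (the very ones the Shooter actor relies on), the claim/update/publish cycle of a producer boils down to this minimal sketch:

// claim the next available slot index
val sequence = ringBuffer.next()
// fetch the pre-allocated event hosted in that slot
val event = ringBuffer.get(sequence)
// update the slot content in place
event.setValue(1234L)
// commit: make the slot visible to the consumers waiting behind their barrier
ringBuffer.publish(sequence)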

 The framework provides a lot of support to ease the creation of typical consumer configurations, like the pipeline configuration or the diamond configuration. Notice that the performance tests in the Disruptor source code implement these configurations. 

Before diving into the Scala port, I wanted to reproduce some configurations in Scala, the code exploration allowing me to get a better understanding of how things work. This is me: I need a client API point of view before I can understand it all.

I decided not to adopt the Disruptor wizard approach and to dive down to the layer just under the wizard, creating by myself the barriers, handlers and so on. BatchEventProcessor remains the only generic class definition I kept in my explorations. Roughly said, a BatchEventProcessor instance, executed in its own thread, takes charge of accessing the sequence barrier, getting the events, and passing them to an EventHandler instance. So a BatchEventProcessor basically impersonates a consumer.
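
In other words, and assuming an executor and a handler already at hand, wiring one consumer boils down to a few lines (the unicast scenario below shows the complete version):

// the consumer waits on a barrier created by the ring buffer
val barrier = ringBuffer.newBarrier()
// the processor polls the barrier and feeds the handler; it implements Runnable
val processor = new BatchEventProcessor[ValueEvent](ringBuffer, barrier, handler)
// so a consumer can run in its own thread of execution
executor.execute(processor)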

The (little) code project I produced is located here (yes, a Github project finally :)). What I want to run are little scenarii reproducing some of the configurations. Compared to the original tests I oversimplified the design. The (very basic) event management is hosted in the body of the EventModule object declaration:


package com.promindis.disruptor.adapters
import com.lmax.disruptor._
import java.util.concurrent.CountDownLatch
import util.PaddedLong

object EventModule {
  class ValueEvent() {
    val value = new PaddedLong()

    def setValue(newValue: Long) {
      value.set(newValue)
    }

    def getValue = value.get()
  }

  object ValueEvent {
    def apply() = new ValueEvent()
  }

  object ValueEventFactory extends EventFactory[ValueEvent]{
    def newInstance(): ValueEvent = ValueEvent()
  }

  case class Handler(name: String, expectedShoot: Long = 0, latch: Option[CountDownLatch] = None) extends EventHandler[ValueEvent] {
    var counter = 0L

    def onEvent(event: ValueEvent, sequence: Long, endOfBatch: Boolean) {
      counter += 1L
      for (l <- latch if (counter == expectedShoot) ) {
        l.countDown()
      }
    }

    override def toString = "[" + name   + ": counter => " + counter  + "]"
  }

  def fillEvent(event: ValueEvent): ValueEvent = {
    event.setValue(1234);
    event
  }

}

Our events wrap a long value using the LMAX PaddedLong class definition. Handlers will count the number of received instances. I used the counter for personal manual testing, checking that all the events were passed to the handlers. I introduced an optional countdown latch, very useful to track the moment when the last consumer in my configurations has achieved its duty. What we basically need is to measure execution times. So we need something like this:

package com.promindis.disruptor.adapters

import System._

object TimeMeasurement {

  implicit def toFrequency(value: Long) = new {
    def throughput(during: Long): Long = {
      (value * 1000L) / during
    }
  }

  def sampling(block: => Unit) = {
    val start = currentTimeMillis
    block
    val end = currentTimeMillis
    new {
      def provided[T](f: Long => T) = {
        f(end - start)
      }
    }
  }
}

The usage being:

sampling {
  doYourStuff()
} provided { elapsedTime =>
  doSomethingWith(elapsedTime)
}

The BatchEventProcessor instances must each be run in their own thread of execution. The implicit throughput method allows applying a small average calculation to any Long in the scope of the definition.
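
For instance, with the implicit conversion in scope, measuring 25 million iterations over 5000 milliseconds would give:

import com.promindis.disruptor.adapters.TimeMeasurement._

// (25000000 * 1000) / 5000 = 5000000 messages per second
val opsPerSecond = 25000000L.throughput(5000L)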

Since the processors run in their own threads, it would be nice if we could run our tests without having to take into account all that blurry management of:
  • creating a thread executor service
  • running the batch processors
  • halting the processors
  • halting the executor
I propose something like:

package com.promindis.disruptor.adapters

import com.lmax.disruptor.BatchEventProcessor
import java.util.concurrent.Executors._

object ProcessorLifeCycle {

  def executing[T](processors: BatchEventProcessor[T]*)(block: => Unit) {
    val executor = newFixedThreadPool(processors.length)
    processors.foreach {executor.execute(_)}
    try {
      block
    } finally {
      processors.foreach{_.halt()}
      executor.shutdown()
    }
  }

}

a typical usage being:

executing(processors: _*) {
   scenario
}

So far so good. I have all the tools I need to run a scenario. The * character at the end of the type declaration of the first parameter expresses a variable number of processor arguments. Why not create a trait to gather all the basics:

package com.promindis.disruptor.configurations
import com.lmax.disruptor.BatchEventProcessor
import com.promindis.disruptor.adapters.TimeMeasurement._
import com.promindis.disruptor.adapters.ProcessorLifeCycle._


final case class Configuration(
  ringBufferSize: Int = 1024 * 1024,
  iterations: Long = 1000L * 1000L * 25L,
  runs: Int  = 5
)

trait Scenario {

  final def playWith[T](processors: Seq[BatchEventProcessor[T]])(bench: => Unit)(implicit config: Configuration) = {
    sampling {
      executing(processors:_*) {
        bench
      }
    } provided {
      config.iterations.throughput(_)
    }
  }

  def challenge(implicit configuration: Configuration): Long

def run(implicit config: Configuration): Seq[Long] = {
    for (_ <- 1 to config.runs) yield challenge(config)
  }

  def main(args: Array[String]) {
    run(Configuration())
      .foreach{value => println("Nb Op/s: " + value)}
  }
}

A Scenario allows playing a (micro!) bench execution with a set of processors. A Configuration case class instance groups everything needed to play a fixed number of iterations multiple times. Notice that we apply our implicit throughput method to the result of the time measurement, so as to return the measured number of messages per second after a play.
The run method is the entry point of the little bench execution, returning a list of throughput values, their number also being set by the input configuration instance.

No better start than implementing the simplest possible example: a producer fills a ring buffer while a consumer handles the events. This code sample will provide us with an overview of the main classes involved in the process:

package com.promindis.disruptor.configurations.unicast

import com.promindis.disruptor.adapters.RingBufferFactory._
import com.lmax.disruptor.BatchEventProcessor
import com.promindis.disruptor.adapters.EventModule.{Handler, ValueEvent, ValueEventFactory}
import java.util.concurrent.CountDownLatch
import com.promindis.disruptor.adapters.{EventModule, Shooter}
import com.promindis.disruptor.configurations.{Configuration, Scenario}


object UnicastWithShooter extends Scenario{

  override def challenge(implicit config: Configuration) = {
    val rb = ringBuffer(ValueEventFactory,size = config.ringBufferSize);

    val barrier =  rb.newBarrier()
    val countDownLatch = new CountDownLatch(1)
    val handler = Handler("P1", latch = Some(countDownLatch), expectedShoot = config.iterations)
    val processor = new BatchEventProcessor[ValueEvent](rb, barrier, handler);
    rb.setGatingSequences(processor.getSequence)

    val shooter = Shooter(config.iterations, rb, EventModule.fillEvent)

    playWith(List(processor)){
      shooter ! 'fire
      countDownLatch.await()
    }
  }
}

From the creation of the barrier to the setting of the ring buffer gating sequence, the trained eye can recognize a typical Disruptor code pattern. But what is a Shooter? I defined a class able to feed the ring buffer from its own thread of execution, taking a minimum number of parameters, namely the number of iterations, a ring buffer and a way to set up a created event (the fillEvent method). 

Here comes the Scala Actor "native" implementation:

package com.promindis.disruptor.adapters

import com.lmax.disruptor.RingBuffer
import actors.Actor

class Shooter[T](numberOfShoot: Long, val ringBuffer: RingBuffer[T], val eventStrategy: T => T) extends Actor {
  self =>

  implicit def adapted[T](ringBuffer: RingBuffer[T]) = new {

    def shoot(update: T => T) {
      val (sequence, event) = nextEventStructure(ringBuffer)
      update(event)
      ringBuffer.publish(sequence);
    }

    def nextEventStructure[T](rb: RingBuffer[T]): (Long, T) = {
      val sequence = rb.next();
      (sequence, rb.get(sequence));
    }
  }

  def act() {
    react {
      case 'fire =>
        for (i <- 1L to numberOfShoot) {
          ringBuffer.shoot(eventStrategy)
        }
        self.exit()
    }
  }
}

object Shooter {
  def apply[T](numberOfShoot: Long, ringBuffer: RingBuffer[T], fillEvent: T => T) =
    new Shooter(numberOfShoot, ringBuffer, fillEvent).start()
}

If you did not read Philipp Haller and Frank Sommers' book on actors, Actors in Scala... you should :). In order to shorten the react method implementation, I used an implicit method declaration allowing an "ad hoc" extension of the ring buffer API, so that the buffer can shoot an event update strategy by itself. 
After all, who better than a ring buffer knows how to get a sequence and publish it?
Although not comfortable with implicits, I do not find it disturbing to use them very close to their context of application, specifically when they help solve a kind of expression problem.
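
For the record, here is the idiom in its bare form, stripped of any Disruptor concern: an implicit conversion to an anonymous class carrying the extra method, resolved by the compiler at the call site (the names below are mine, purely illustrative):

object AdHocExtension {
  // the compiler inserts enriched(...) whenever times is called on a Long
  implicit def enriched(count: Long) = new {
    def times(block: Long => Unit) {
      (1L to count).foreach { block(_) }
    }
  }

  def main(args: Array[String]) {
    3L.times { i => println("shoot " + i) }  // prints shoot 1, shoot 2, shoot 3
  }
}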

When it comes to reusing the same patterns of code I can become very lazy (like a lot of people). Having a little time at home, I decided I could simplify a little the creation of other configurations. What if I could create a pipeline configuration like this:

val chain = for {
  _ <- pipe[ValueEvent](Handler("C1"), rb)
  _ <- pipe[ValueEvent](Handler("C2"), rb)
  _ <- pipe[ValueEvent](Handler("C3", latch = Some(countDownLatch), expectedShoot = config.iterations), rb)
} yield ()

in order to pipe three consumers, just getting back the BatchEventProcessors paired with the handlers for testing purposes. 
This is when I switched to panic mode. Of course we are building a stateful configuration for the ring buffer, so we have to meet again with the state monad we studied last time. Implementing it was hard the first time, not being sure I had understood everything about it. I still have difficulties with the state monad, so here comes an opportunity to learn how to use the pattern... again.
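
As a reminder, the StateMonad trait (implemented in the state monad post further down this page, and living here in com.promindis.disruptor.support) has the following shape: applying it to a state produces a pair of a result and a new state, while map and flatMap chain the transformations:

trait StateMonad[+T, S] {
  // run the stateful computation: produce a result and the next state
  def apply(state: S): (T, S)
  def map[U](f: T => U): StateMonad[U, S]
  def flatMap[U](f: T => StateMonad[U, S]): StateMonad[U, S]
}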

Everything I need can be gathered in the pipe method, located in the Builder object:

package com.promindis.disruptor.adapters
import com.promindis.disruptor.support.StateMonad
import com.lmax.disruptor.{SequenceBarrier, BatchEventProcessor, RingBuffer, EventHandler}


object Builder {
  type Context[X] = (BatchEventProcessor[X], EventHandler[X])

  def pipe[T](handler: EventHandler[T], rb: RingBuffer[T]) = new StateMonad[SequenceBarrier, List[Context[T]] ] {
    def apply(list: List[Context[T]]) = {
      list match {
        case (p::ps)=>
          val newBarrier = rb.newBarrier(p._1.getSequence)
          val newProcessor = new BatchEventProcessor[T](rb, newBarrier, handler)
          (newBarrier, (newProcessor, handler)::p::ps)
        case _ =>
          val newBarrier = rb.newBarrier()
          val newProcessor = new BatchEventProcessor[T](rb, newBarrier, handler)
          (newBarrier, (newProcessor, handler) :: Nil)
      }
    }
  }
}

Aliasing a pair (BatchEventProcessor, EventHandler) as a Context for a given event type, I use a list of these contexts as the state threaded through the monad.
We are going to pile up each step, because we need all the created processors at the end of the configuration execution, and maybe we need the handlers too, so as to check their internal counters. The value returned by the application of a state monad instance will be a sequence barrier. Returning a sequence barrier can be useful when attaching multiple consumers to the same barrier.

Ideally we would need only the processors, to submit them to some executor service. Piping the consumers consists in creating the barrier of one consumer from the sequence of the previous one, so that the new consumer handles an event only after the previous one did. I encourage you once more to study the articles I linked to a little before digging into the tool methods used for configuration in the Builder object. As a result, the implementation of a pipeline becomes:

package com.promindis.disruptor.configurations.pipeline

import com.promindis.disruptor.adapters.RingBufferFactory._
import java.util.concurrent.CountDownLatch

import com.promindis.disruptor.adapters.EventModule.{Handler, ValueEvent, ValueEventFactory}
import com.promindis.disruptor.configurations.{Configuration, Scenario}
import com.promindis.disruptor.adapters.{Builder, Shooter, EventModule}
import Builder._


object Pipe_Line extends Scenario{

  def challenge(implicit config: Configuration) = {
    val rb = ringBuffer(ValueEventFactory,size = config.ringBufferSize);
    val countDownLatch = new CountDownLatch(1)

    val chain = for {
      _ <- pipe[ValueEvent](Handler("C1"), rb)
      _ <- pipe[ValueEvent](Handler("C2"), rb)
      _ <- pipe[ValueEvent](Handler("C3", latch = Some(countDownLatch), expectedShoot = config.iterations), rb)
    } yield ()

    val consumers = chain(List())._2
    val processors = consumers.unzip._1
    rb.setGatingSequences(processors.head.getSequence)
    val shooter = Shooter(config.iterations, rb, EventModule.fillEvent)

    playWith(processors){
      shooter ! 'fire
      countDownLatch.await()
    }

  }
}

which is far shorter than the first implementation, believe me :). 

Returning a list of tuples (BatchEventProcessor, EventHandler) requires a little trickery with the unzip method, but that clutter can be easily removed, because except for testing we do not need to get back the handlers. 
The Github repository provides two more methods used to build diamonds and, I am certain, other possible variations on configurations. 

The declarative part of the configuration of a diamond looks like:

val diamond = for {
  barrier <- fork(Handler("C1"), rb, rb.newBarrier())
  _ <- fork(Handler("C2"), rb, barrier)
  _ <- join(Handler("C3", latch = Some(countDownLatch), expectedShoot = config.iterations), rb)
} yield ()

also far shorter in length than the very first implementation.

Thank you for following me up to this point. 

Running my tests provides throughput values of the same order as running the tests located in the LMAX source code on my machine. 
Having reduced the number of iterations to 25 million and generously extended the ring buffer size, depending on the tested configuration I get from 3.5 million to 5 million messages per second. 
My old machine not being sized for this kind of stress (only two cores), the result values are smaller than the benchmark values presented by the LMAX team. 
But my old laptop still runs the Disruptor tests five to ten times faster than the message queue tests. I find the result quite interesting, being sure that no desktop application running in parallel on this laptop can handle 5 million messages per second, or even one million. 

Somehow, adopting a disruptive mind could be a smart move, and not only in large-scale applications. 
That's all folks. 

 Be seeing you ! :):):)

Wednesday, January 18, 2012

A naive Adler32 example in Clojure and Scala

Houdy,

Between administrative intricacies this week, among other things, I took the time to reproduce, both in Clojure and Scala, a small exercise found in Real World Haskell (RWH). 
This blog entry will be very short, as I simply provide in each language a way to implement the algorithm. 

The algorithm is the Adler32 checksum algorithm as presented in RWH. (You will be able to see the link once the protest on the Wikipedia site ends.) Trying to decode the three code samples while the Wikipedia link is blacked out in protest can also be seen as an interesting exercise :). 

The Adler32 algorithm was invented by Mark Adler in 1995 and is used in the zlib compression library. I see these katas as an interesting means of learning new things on a daily basis (isn't it our job to learn and understand rather than blindly use external frameworks?). 
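
In a nutshell, for a message of n bytes d_1 .. d_n, the checksum maintains two running sums modulo 65521 (the largest prime below 2^16):

a = (1 + d_1 + d_2 + ... + d_n) mod 65521
b = (a_1 + a_2 + ... + a_n) mod 65521, where a_i is the value of a after processing byte i
adler32 = (b << 16) | a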
For copyright purposes I provide here my version of the algorithm and not the one in the book, as I tried to produce my own Haskell version:

import Data.Char (ord)
import Data.Bits (shiftL, (.&.), (.|.))

base = 65521

cumulate :: (Int, Int) -> Char -> (Int, Int)
cumulate (a, b) x = let a' = (a + (ord x .&. 0xff)) `mod` base
                        b' = (a' + b) `mod` base
                    in (a', b')

adler32 :: [Char] -> Int
adler32 xs = let (a, b) = foldl cumulate (1, 0) xs
             in (b `shiftL` 16) .|. a

The authors use this algorithm on purpose, in order to present an application of the higher order fold function. Let's give it a try:

ghci>adler32 "Thumper is a cute rabbit"
1839204552
ghci>

That gives us meat for our tests in Scala and Clojure (I have not yet learned about Haskell's QuickCheck). Logically, in Clojure our test should look like:

(ns algorithms.test.adler32-spec
  (:use algorithms.adler32)
  (:use clojure.test))

(deftest checksum-with-favourit-sentence-should-produce-result
  (is (= 1839204552 (checksum "Thumper is a cute rabbit"))))

which runs green for the following implementation:

(ns algorithms.adler32)

(def base 65521)

(defn cumulate [[a b] x]
  (let [a-prim (rem (+ a (bit-and x 255)) base)]
    [a-prim (rem (+ b a-prim) base)]))

(derive clojure.lang.LazySeq ::collection)

(defmulti checksum class)
(defmethod checksum String [data]
  (checksum  (lazy-seq (.getBytes data))))
(defmethod checksum ::collection [data]
  (let [[a b] (reduce cumulate [1 0] data)]
    (bit-or (bit-shift-left b 16) a)))


where I naively used a derive call in order to dispatch my multimethod using the class function as a dispatcher. My dispatching mechanism now resolves clojure.lang.LazySeq instances as children of ::collection:

algorithms.adler32=> (parents clojure.lang.LazySeq)
#{java.util.List clojure.lang.Obj clojure.lang.ISeq clojure.lang.IPending clojure.lang.Sequential :algorithms.adler32/collection}
algorithms.adler32=>

Test ok.

 Following the same reasoning in Scala, the test will be:


package com.promindis.algorithms.cheksum

import org.specs2.Specification


class Adler32Specification extends Specification { def is =
  "Adler32Specification"   ^
                          p^
    "checksum for input"   ^
    "Should restore the expected checksum value" !e1


  def e1 =
    new DefaultAdler32()
      .checksumText("Thumper is a cute rabbit".toCharArray)
      .should(beEqualTo(1839204552))
}

leading to

package com.promindis.algorithms.cheksum

trait Adler32 {
  val base = 65521

  def rebased(value: Int) = value % base

  def cumulated(acc: (Int, Int), item : Byte): (Int, Int) = {
    val a = rebased(acc._1 + (item & 0xff ))
    (a, (a + acc._2) % base)
  }

  def checksum(data: Traversable[Byte]): Int

  def checksumText(data: Traversable[Char]): Int
}


final class DefaultAdler32 extends Adler32 {

  override def checksum(data: Traversable[Byte]): Int = {
    val result = data.foldLeft((1, 0)) {cumulated(_, _)}
    (result._2 << 16) | result._1
  }

  def checksumText(data: Traversable[Char]) = {
    checksum(data.toSeq.map(_.toByte))
  }
}
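
As an additional cross-check, not part of the original kata: the JDK ships its own implementation in java.util.zip.Adler32, which agrees with our favourite sentence:

import java.util.zip.{Adler32 => JdkAdler32}

object CrossCheck {
  def main(args: Array[String]) {
    val jdk = new JdkAdler32()
    jdk.update("Thumper is a cute rabbit".getBytes)
    println(jdk.getValue)  // prints 1839204552
  }
}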

Tests green :)
 That's all folks (I promised it would not be long). And don't take for granted what comes from closed boxes!

Be seeing you !!! :)

Monday, January 16, 2012

Changing my state of mind with a Monad in Scala

Houdy.

While looking for more real-world examples in order to complete a previous blog entry, I found myself struggling with the State Monad while trying to solve what I supposed to be a typical State Monad problem.
My ramblings on this specific problem did not succeed, while in the meantime I successfully reproduced the canonical sample of stack manipulation extracted from Learn You a Haskell (LYAH). Although I am not satisfied with the result, I would like to expose this "kata", and will ask for your feedback in order to reach the expected goal.
I specially thank Nilanjan Raychaudhuri - the author of Scala in Action - for his precious help. Reading chapter 10 of his book confirmed I was working in the right direction. 

Reproducing the LYAH example remains a fruitful exercise in the sense that it constrains you to use Scala idioms (typing, self type annotations etc.) and forces you to think about some of the inner mechanics of for comprehensions.

In order to expose the interest of reproducing state management in functional programming languages, Miran Lipovaca presents a three coins problem simulating the extraction of results from tossing a coin, and a stack manipulation problem. From the point of view of imperative languages, the random generator internals or the stack internals would be easily modifiable, mutable objects allowing us to generate new numbers or alter the stack state. 
In a pure functional language, we manipulate immutable data. We have to create a new value object each time the equivalent of a state change occurs. But what if we could separate the flow of data from the side-effecting manipulation of the change of state?
And that is specifically our purpose: embedding a change of state in a dedicated instance. The secret lies in the abstract representation of this change of state as a function:

def apply(state: S): (T, S)

where S references the type of the state to be changed, and T is the type of the result of the stateful computation. The whole class hosts the apply function (so is applicable by itself in Scala), and impersonates the context that contains the state management. You apply the context in order to get your result value:

contextInstance(previousState) = (result, newState)

For the same price, you get the altered state. In the case of a stack, the state is the stack content. The provided manipulation contexts will be class instances implementing context templates for stack manipulations like pop and push. We will represent a stack state as a List of items of type A:

List[A]

Consequently, if we choose to name our state context StateMonad, the pop and push operations can be gathered in a Stack scope definition like the following:

object Stack {
  def push[A](x: A) = new StateMonad[Unit, List[A]] {
    def apply(state: List[A]) = ((), x :: state)
  }

  def pop[A] = new StateMonad[Option[A], List[A]] {
    def apply(state: List[A]) =
      state match {
        case x :: xs => (Some(x), xs)
        case _ => (None, state)
      }
  }
}

taking a leap of faith regarding an existing definition of the StateMonad trait. In the meantime we have acknowledged that our state monad trait definition will be parameterized as:

trait StateMonad[+T, S]

While pushing data on top of a stack, I expect no result, so I return a () (aka Unit) instance:

scala> import Stack._
import Stack._

scala> push(5)(List())
res0: (Unit, List[Int]) = ((),List(5))

while the result of a pop context execution may contain an optional item of type A, depending on the previous stack state (no elements at all, or at least one element):

scala> import Stack._
import Stack._

scala> pop(List(1))
res1: (Option[Int], List[Int]) = (Some(1),List())

scala> pop(List())
res2: (Option[Nothing], List[Nothing]) = (None,List())

I believe the pattern matching in the pop method body to be self-explanatory. Chaining the state modifications, then, can be achieved using the definitions of both map and flatMap. The map method helps in transforming the result embedded in the context, producing a new state monad that takes the expected transformation into account:

def map[U](f: T => U) = new StateMonad[U, S] 

while the flatMap method helps in simplifying the chaining of stateful computations:

def flatMap[U](f: T => StateMonad[U, S]): StateMonad[U, S]

How so? Simply as we did last time:

scala> import com.promindis.user._
import com.promindis.user._

scala> import Stack._
import Stack._

scala> val result = push(3).flatMap{ _ =>
     |       push(5).flatMap{_ =>
     |         push(7).flatMap{_ =>
     |           push(9).flatMap{_ =>
     |             pop.map{_ => ()}
     |           }
     |         }
     |       }
     |     }
result: java.lang.Object with com.promindis.state.StateMonad[Unit,List[Int]] = com.promindis.state.StateMonad$$anon$1@124e407

scala> result(List())
res2: (Unit, List[Int]) = ((),List(7, 5, 3))

scala>

The benefit of map and flatMap becomes obvious when using more idiomatic Scala expressions like for comprehensions, which get interpreted as the above lines of code:

scala> import com.promindis.user._
import com.promindis.user._

scala> import Stack._
import Stack._

scala> val result = for {
     |       _ <- push(3)
     |       _ <- push(5)
     |       _ <- push(7)
     |       _ <- push(9)
     |       _ <- pop
     |     } yield ()
result: java.lang.Object with com.promindis.state.StateMonad[Unit,List[Int]] = com.promindis.state.StateMonad$$anon$1@7a6088

scala> result(List(1))
res3: (Unit, List[Int]) = ((),List(7, 5, 3, 1))

scala>

The full implementation of the StateMonad trait then becomes:

package com.promindis.state

trait StateMonad[+T, S]  {
  owner =>
  def apply(state: S): (T, S)

  def flatMap[U](f: T => StateMonad[U,S]) = new StateMonad[U, S] {
    override def apply(state: S) = {
      val (a, y) =  owner(state)
      f(a)(y)
    }
  }

  def map[U](f: T => U) = new StateMonad[U, S] {
    def apply(state: S) = {
      val (a, y) =  owner(state)
      (f(a), y)
    }
  }
}

object StateMonad {
  def apply[T, S](value: T) = new StateMonad[T, S] {
    def apply(state: S) = (value, state)
  }
}
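
Notice the apply method of the companion object: it acts as the monad's unit (Haskell's return), lifting a plain value into a context that leaves the state untouched. From the definition we expect:

scala> import com.promindis.state._
import com.promindis.state._

scala> StateMonad[Int, List[Int]](42)(List(1, 2))
res0: (Int, List[Int]) = (42,List(1, 2))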

The map function produces a new container instance in charge of applying the original container's state change, then transforming its result. The typed self annotation owner helps in referencing the original container from the apply method body of the new anonymous StateMonad instance:

owner =>

How do we extract the result from the previous container? Again, by applying the previous container itself:

val (a, y) =  owner(state)

The result of the new anonymous StateMonad container will be

(f(a), y)

The body of the apply method of the StateMonad instance resulting from the flatMap application will lead to
  • the application of the previous container (so as to extract the previous result and state),
  • then the application of the transformation function f to the result,
  • and finally the application of the new StateMonad instance f(a) to the intermediate state y.
We have chained the previous container state change to the state change expected after the f function application, and all this chaining is itself transparently hosted by a containing monad. The complete stack example can be reproduced:

package com.promindis.user

import com.promindis.state._

object Stack {
  def push[A](x: A) = new StateMonad[Unit, List[A]] {
    def apply(state: List[A]) = ((), x :: state)
  }

  def pop[A] = new StateMonad[Option[A], List[A]] {
    def apply(state: List[A]) =
      state match {
        case x :: xs => (Some(x), xs)
        case _ => (None, state)
      }
  }
}

object UseState {
  import Stack._
  def main(args: Array[String]) {
    val result = for {
      _ <- push(3)
      _ <- push(5)
      _ <- push(7)
      _ <- push(9)
      _ <- pop
    } yield ()

    println(result(List(1))._2)

    val otherResult = push(3).flatMap{ _ =>
      push(5).flatMap{_ =>
        push(7).flatMap{_ =>
          push(9).flatMap{_ =>
            pop.map{_ => ()}
          }
        }
      }
    }

    println(otherResult(List(1))._2)
  }
}


The example works fine, as in LYAH, but I am not satisfied with the result for two reasons.
  • The first is that I was not able (still) to link this implementation to my previous Monad definition here.
  • The second is that I have to reproduce some real-life examples like Nilanjan Raychaudhuri's in order to stress these definitions.
All feedback and suggestions will be welcome as usual.

Until then, I have to do a little Haskell, study the Disruptor some more and practice some katas. 

 Be seeing you !!! :)

Wednesday, January 11, 2012

Where my Clojure plays with protocols, multimethods and the SICP

Hello again.

I wanted to share today the implementation of one of my explorations of the SICP book, and get your feedback on the solution I came up with. 
While reading the book, I take the time to challenge both the exercises and code samples in Scheme - although I do regret not knowing it very well - and in Clojure. Somehow, trying to reproduce each line of code in both languages can explain why it takes so much time :). 

Part two of the book is dedicated to the exploration of data abstraction and, more specifically, section 2.4 leads us slowly to a data directed programming example. Starting from the canonical example of operations on complex numbers, Abelson and Sussman create a layered complex-arithmetic system, starting at the bottom from primitive list structures implementing complex numbers, up to program functions that use selector and constructor methods to manipulate complex numbers. 
The top higher order operations are not supposed to be aware of the lower details of the implementation, as we should be able to use as many representations of complex numbers as we would like to in the bottom layer. 

Complex numbers can be represented by pairs describing rectangular or polar coordinates. Each of these representations suits specific operations better. Adding two complex numbers is easier using a rectangular representation, while a polar representation is more adapted to multiplication, for example. At any step we should be able to create new complex number instances from polar or rectangular data, which we can express as:

(make-from-real-imag 1 1)
(make-from-mag-ang 1 1)
  • In the first line we create a complex number from rectangular coordinates. The idioms real and imag respectively stand for the real (x) and imaginary (y) components
  • In line two we create a complex number from polar coordinates. mag and ang respectively stand for the magnitude (r) and angle (theta) components
At any step we should be able to select (x,y) components in rectangular representation and (r, theta) components in polar representation. What we wish to express is something like the following:

(real-part z1)
(imag-part z1)
(magnitude z1)
(angle z1)

All these constructors n' selectors should be totally unaware of the underlying representations.

But in addition, to conform to the SICP state of mind, the previous make-from-real-imag and make-from-mag-ang constructors should be adapted in order to choose the most suitable implementation. There, serious things begin. 

In order to both explore the SICP book and play with Clojure idioms, I voluntarily chose to use protocols. In a certain way, like Abelson and Sussman (see section 2.4.1), I do not want my higher order operations to depend on specific implementations, because there can be many of them, but I want my implementations to be contracted by my higher order module operations. 
What I am expecting looks like a basic Inversion Of Control (IOC is a much higher concept than dependency injection, so check that you do not confuse the two ideas). Quoting Michael Fogus and Chris Houser's Joy of Clojure: "protocols are sets of function signatures, each with at least one parameter, that are given a collective name." 
To dig a little deeper into protocols, check also Stuart Halloway and Aaron Bedra's book Programming Clojure and Stuart Halloway's presentation over here.

To be perfectly honest I am not at ease with protocols, as they remind me too much of OOP in a Functional Programming context. That is debatable, I think. Both sets of constructor and selector functions are perfect candidates for protocols.

(ns sicp.complex)

(defprotocol ComplexSelector
  (magnitude [this])
  (angle [this])
  (real-part [this])
  (imag-part [this]))

(defprotocol ComplexConstructor
  (make-from-real-imag-constructor [self x y])
  (make-from-mag-ang-constructor [self r t]))

Then we can provide a full implementation for rectangular coordinates:

(ns sicp.complex-rectangular
  (:use sicp.core)
  (:use sicp.complex))

(defrecord Rectangular [x y]
  ComplexSelector
  (real-part [z] (:x z))
  (imag-part [z] (:y z))
  (magnitude [z] (Math/sqrt (+ (square (real-part z)) (square (imag-part z)))))
  (angle [z] (Math/atan2 (imag-part z) (real-part z))))

(defn rectangular-complex[]
  (reify ComplexConstructor
    (make-from-real-imag-constructor [self x y] (->Rectangular x y))

    (make-from-mag-ang-constructor [self r t]
      (->Rectangular (* r (Math/cos t)) (* r (Math/sin t))))))

challenged against the following tests:

(ns sicp.test.complex-rectangular-spec
  (:use [sicp complex complex-rectangular])
  (:use clojure.test))

(deftest complex-rectangular-should-bind-magnitude
  (is (= 5.0 (magnitude (->Rectangular 3 4)))))

(deftest complex-rectangular-should-bind-angle
  (is (= (/ Math/PI 4) (angle (->Rectangular 1 1)))))

(deftest complex-rectangular-should-bind-real-part
  (is (= 1 (real-part (->Rectangular 1 2)))))

(deftest complex-rectangular-should-bind-imag-part
  (is (= 2 (imag-part (->Rectangular 1 2)))))

and for polar coordinates:

(ns sicp.complex-polar
  (:use sicp.core)
  (:use sicp.complex))

(defrecord Polar [radius theta]
  ComplexSelector
  (magnitude [z] (:radius z))
  (angle [z] (:theta z))
  (real-part [z]
    (* (magnitude z) (Math/cos (angle z))))
  (imag-part [z]
    (* (magnitude z) (Math/sin (angle z)))))

(defn polar-complex []
  (reify ComplexConstructor
    (make-from-real-imag-constructor [self x y]
      (->Polar (Math/sqrt (+ (square x) (square y))) (Math/atan2 y x)))

    (make-from-mag-ang-constructor [self r t] (->Polar r t))))


challenged against the following tests:

(ns sicp.test.complex-polar-spec
  (:use [sicp complex complex-polar])
  (:use clojure.test))

(deftest complex-polar-should-bind-magnitude
  (is (= 1 (magnitude (->Polar 1 2)))))

(deftest complex-polar-should-bind-angle
  (is (= 2 (angle (->Polar 1 2)))))

(deftest complex-polar-should-bind-real-part
  (is (= 1 (real-part (->Polar 1 0))))
  (is (= 0 (imag-part (->Polar 1 0)))))

(deftest complex-polar-should-bind-components-at-pi
  (is (< 0.0001 (Math/abs  (- 1.0 (imag-part (->Polar 1 Math/PI))))))
  (is (< 0.0001 (Math/abs  (real-part (->Polar 1 Math/PI))))))

As an example, I used on each implementation the reify macro so as to create anonymous instances of the ComplexConstructor protocol. You get access to them by closure, not by declaration. 
All this is nice and good and the tests keep green. The next logical step leads us to the definition of the higher order functions for dividing, adding, multiplying etc. 
There a small difficulty arises. If I really want to build expressions like:

(defn add-complex [z1 z2]
  (make-from-real-imag (+ (real-part z1) (real-part z2))
    (+ (imag-part z1) (imag-part z2))))

(defn mul-complex [z1 z2]
  (make-from-mag-ang (* (magnitude z1) (magnitude z2))
    (+ (angle z1) (angle z2))))


that would make use of the make-from-real-imag and make-from-mag-ang functions, I would like to be able to choose a suitable data representation for each.

Wait a minute: I wanted to invert the control, so naturally at the time of defining both functions, in the same module as the protocols, I have no idea which representation could be used. In order to get out of this trap, enter multimethods. The trick I used is not - I think - so different from Abelson and Sussman's data-directed programming.
Their proposal aims to update or get data in a two-dimensional table, one axis being the "type" of the representation and the other axis being a symbol identifying the expected operation. 
So each table cell hosts a reference to the function matching both a type and its operation. Implementing the protocols provides me with dispatch on type. All I need is to identify my functions. Each function can be a multimethod declaration in the same module as the complex number protocol definitions:

(defmulti make-from-real-imag (fn [x y] :real-imag-constructor))
(defmulti make-from-mag-ang (fn [r theta] :mag-ang-constructor))

(defn add-complex [z1 z2]
  (make-from-real-imag (+ (real-part z1) (real-part z2))
    (+ (imag-part z1) (imag-part z2))))

(defn mul-complex [z1 z2]
  (make-from-mag-ang (* (magnitude z1) (magnitude z2))
    (+ (angle z1) (angle z2))))


I provided an elementary dispatching function using a returned keyword as the selecting value. 

It is up to the implementing module, which gathers both the complex module sicp.complex and its implementations, to attach the corresponding defmethod definitions. 
Here is the full complex module:

(ns sicp.complex)

(defprotocol ComplexSelector
  (magnitude [this])
  (angle [this])
  (real-part [this])
  (imag-part [this]))

(defprotocol ComplexConstructor
  (make-from-real-imag-constructor [self x y])
  (make-from-mag-ang-constructor [self r t]))

(defmulti make-from-real-imag (fn [x y] :real-imag-constructor))
(defmulti make-from-mag-ang (fn [r theta] :mag-ang-constructor))

(defn add-complex [z1 z2]
  (make-from-real-imag (+ (real-part z1) (real-part z2))
    (+ (imag-part z1) (imag-part z2))))

(defn sub-complex [z1 z2]
  (make-from-real-imag (- (real-part z1) (real-part z2))
    (- (imag-part z1) (imag-part z2))))

(defn mul-complex [z1 z2]
  (make-from-mag-ang (* (magnitude z1) (magnitude z2))
    (+ (angle z1) (angle z2))))

(defn div-complex [z1 z2]
  (make-from-mag-ang (/ (magnitude z1) (magnitude z2))
    (- (angle z1) (angle z2))))

The best place to simulate a container hosting the complex module and the matching representations being a test module, I implemented the following:

(ns sicp.test.complex-spec
  (:use [sicp core complex complex-rectangular complex-polar])
  (:use clojure.test))


(def rectangular-datum (rectangular-complex))
(def polar-datum (polar-complex))

(defmethod make-from-real-imag :real-imag-constructor [x y]
  (make-from-real-imag-constructor rectangular-datum x y))

(defmethod make-from-mag-ang :mag-ang-constructor [r theta]
  (make-from-mag-ang-constructor polar-datum r theta))


(deftest make-from-real-imag-with-dispatch-should-create-rectangular-coordinate
  (is (= (->Rectangular 1 1) (make-from-real-imag 1 1))))

(deftest make-from-mag-ang-with-dispatch-should-create-polar-coordinate
  (is (= (->Polar 1 1) (make-from-mag-ang 1 1))))

(deftest add-complex-should-produce-rectangular-as-sum-of-real-imag
  (is (=  (->Rectangular 3 3)
        (add-complex
          (make-from-real-imag 1 1)
          (make-from-real-imag 2 2)))))

(deftest sub-complex-should-produce-rectangular-as-difference-of-real-imag
  (is (=  (->Rectangular 2 2)
        (sub-complex
          (make-from-real-imag 3 3)
          (make-from-real-imag 1 1)))))

(deftest mul-complex-should-produce-polar-as-product-of-mag-ang
  (is (=  (->Polar 6 0.9)
        (mul-complex
          (make-from-mag-ang 2 0.5)
          (make-from-mag-ang 3 0.4)))))

(deftest div-complex-should-produce-polar-as-quotient-of-mag-ang
  (is (=  (->Polar 3 0.4)
        (div-complex
          (make-from-mag-ang 6 0.9)
          (make-from-mag-ang 2 0.5)))))


where rectangular-datum and polar-datum reference instances of the ComplexConstructor protocol, each one being respectively invoked through the defmethods make-from-real-imag and make-from-mag-ang.
The dispatch functions used in the sicp.complex module being very simple, our binding remains very simple too. We could experiment with, and make evolve, the dispatching function to extend the power and flexibility of the binding in the sicp.test.complex-spec module.

Tests green. 

So we have been able to define protocols, then we used elements from those protocols in higher order functions and multimethods, imposing our control on the upcoming implementations, and finally we have been able to bind our higher order functions to the expected implementations. Not that bad for a single morning. Do not hesitate to come back with comments. 

 Be seeing you !!! :)

Monday, January 9, 2012

New Happy Scala year sipping a lite Monad

Hello again and a very happy new year .

I would like to thank last year's readers, as I did not expect so many people to visit the blog. 
I hope the readings provided you with helpful information, and thanks again to the people who kindly commented, providing precious feedback each time. 
In addition to health and good fortune, I wish you to explore all the possible domains you would like to explore and learn as many things as possible (even Spring if you truthfully want to explore it :)). 

Among all resolutions, I promised myself to find a day job allowing for more regular practice of Scala and/or Clojure, in order to forge the necessary base of a practice sustaining my night ramblings in functional programming, type theory and category theory.
The coming year would be a nice year if these ramblings also drove me to explore a NoSQL database and to read both Thompson's Type Theory and Pierce's TAPL. So what are your programming wishes? :) 

For this first blog entry of the year I would like to share my modest ramblings on "contextual" programming in Scala, experiments started as a continuation of my reading of Learn You a Haskell. This experience naturally led me to Martin Odersky's articles Generics of a Higher Kind and Scalable Component Abstractions, then to Joshua Suereth's Scala in Depth, specifically chapter 11 (Joshua, if you ever read this modest blog entry, thank you for providing us with such an enlightened book). 

My first purpose is to mimic, in many different upcoming proofs of concept, the Haskell way of separating the pure from the impure, aka separating the referential transparency of functions from side effects. Yes, we are talking about Monads.
It took me time to start dealing with Monads because of all the negative fuss around the Category Theory world. Like many of you, my mathematical background is far behind me. I belong to those who think it will be a necessary move to awaken this part of my knowledge, in order to leverage it up to an understanding of Category Theory from the mathematical point of view, while in the meantime trying things from the ground up, that is to say from nearly real-world examples. 

I am also convinced that part of the defiance toward the monadic world comes from the name of the concept itself. Trying to work on a canonical example, I realized that after all we are dealing with Design Patterns just named after a mathematical domain. Quoting the Gang of Four, Monads should only be "recurring solutions to common problems in software design".
Naturally, people will be less frightened by the name "Decorator Pattern" than by the expression "Monad Pattern". "Decorator" remains a familiar term, a kind of everyday-life expression, while the word Monad looks like some magic mysterious formula.

Let's (nearly) forget about the name and focus on today's canonical example. Acting with Abelson and Sussman's wishful thinking, we will not use tests today, just exploration, based on correct compilation of our code until we run a sample. The canonical example is a simple Logger.
Many may have written about that in Scala, and I guess the impatient ones should jump to Tony Morris' blog here to get a full brilliant implementation.
For the others, let's work on the current exploration, and why not express criticism in the comments so as to improve the implementation. The following example is an exploration and surely can be improved. So do comment and give feedback if you feel the urge to. We may travel to this new land together :).

We would like to execute a pure function while logging information about the different steps of the process. Logging can become a very serious source of side effects, as you might want to log to external systems, flush information into files etc. For the sake of simplicity just imagine we append the logs. For example, multiplying 3 and 5 while logging could produce the following result:

[15, Got number 3 Got number 5 ]

where we clearly have traced the fact that 3 was created first and then 5. Providing 3 and 5 to the multiplication operator will always produce the same result, but here we have been able to trace the input processes at the origin of the values. 
Our approach relies on the use of a context holding both the processed value and the cumulative logs. One must be able to extract the embedded value at will while continuing to append the log information. This is when higher order functions, handling functions as first class citizens, save us once again. 
How? Let's imagine we can handle such a container of type M, holding both a value of type T and a log, a kind of:

Container[T, M[_]]

A first step in the pattern implementation implies that we are able to split the responsibilities between creating container instances and applying the pure function alone to the embedded value. The following map method on our container can grant this separation:

 def map[U](f: T => U): M[U];

nearly the same map as in the List class definition, for example, or in every Traversable implementation in Scala. Having internally applied the pure function f, the container must be able to produce a new container embedding a new value of type U. At this step the container should implement a set of behaviors in accordance with what is called a Functor. As we are growing our knowledge step by step, assume we conform to the expected behavior and try to define the following:

package com.promindis.patterns

trait Functor[T, M[_]] {
  def map[U](f: T => U): M[U];
}

Implicitly, I suppose that my container of type M will also implement the Functor trait. Joshua Suereth provides more elegant ways to separate the container implementation from the pattern trait definitions themselves. For now, we just try to progress in our little exploration. 
That's nice, but our multiplication hosts two operands and we are just mapping one embedded value to another! As a tribute to Learn You a Haskell (LYAH), and naming our logging context StringWriter, imagine having at hand the following value factory:

def logNumber(x: Int) = StringWriter(x, "Got number " + x + " ")

We could execute our multiplication :

logNumber(5).map(y => x * y)

Where the hell would the x come from? Could we simply imagine producing this:

logNumber(3).map(x => logNumber(5).map(y => x * y))

Well... almost, because the type of the produced result would be StringWriter[StringWriter[Int]] and not StringWriter[Int]. The required flattening operation is then provided by the Monad pattern per se (hurray!) as a flatten operation:

def flatten[T](m: M[M[T]]): M[T]

Just a minute! Is this it? No; as you can imagine, Monads must obey laws, like their cousins the Functors, but for now I just propose you use the following method

def flatMap[U](f: T => M[U]): M[U]

simplifying our previous expression:

logNumber(3).flatMap(x => logNumber(5).map(y => x * y))

That could be chained to infinity (and beyond...):

logNumber(3).flatMap(x => logNumber(5).flatMap( y => logNumber(7).map(z => x * y * z)))

and so on and so on, producing at each flatMap invocation a new StringWriter[Int] typed container instance. 
The beauty, from my point of view, comes from the separation between the extraction/insertion of the value into the working context and the function application (z => x * y * z), cleanly isolated. The complete definition I provided was:

package com.promindis.patterns


trait MonadHelper[M[_]] {
  def flatten[T](m: M[M[T]]): M[T]
}

trait Monad[T, M[_]] extends Functor[T, M]{
  def flatMap [U](f: T => M[U])(implicit helper: MonadHelper[M]): M[U] = helper.flatten(map(f))
}


I voluntarily split the flatten function into a second trait, as this generic operation is bound only to the type of the container. I expect the compiler to check at compile time that implicit helper objects are available in scope. (I don't specifically like implicits, but they can become very handy if not overused.) As I expect my StringWriter to become reusable some day, I reused one of my existing traits, the Monoid, to define my writer trait as in LYAH. Monoids not being the purpose of today's blog entry, just keep in mind the LYAH definition, remembering that these are simple structures allowing us to isolate, in a handy way, the manipulation of elements of the same type through a simple function operation:

package com.promindis.patterns


trait Monoid[T] {
  def add(x: T, y: T): T
  def unit: T
  def concat(xs: Traversable[T]): T =
    if (xs.isEmpty) unit else xs.foldLeft(unit){add(_, _)}
}

object Monoid {
  implicit object StringMonoid extends Monoid[String] {
    override def add(x: String, y: String) = x + y
    override def unit = ""
  }

//  implicit object IntegerMonoid extends Monoid[Int] {
//    override def add(x: Int, y: Int) = x + y
//    override def unit = 0
//  }
//
//  implicit def listMonoid[T]: Monoid[List[T]] = new Monoid[List[T]] {
//    override def add(x: List[T], y: List[T]) = x ++ y
//    override def unit = Nil
//  }
//
//  def concatenate[T](xs: Traversable[T])(implicit monoid: Monoid[T]) = monoid.concat(xs)
}

As you have guessed, the StringMonoid definition offers us all the operations required to achieve String concatenation. 
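
For instance, appending two log fragments through the monoid should give (expected REPL output, given the definitions above):

scala> import com.promindis.patterns.Monoid._
import com.promindis.patterns.Monoid._

scala> StringMonoid.concat(List("Got number 3 ", "Got number 5 "))
res0: String = Got number 3 Got number 5 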
Then comes the Writer trait definition:

package com.promindis.Logger

import com.promindis.patterns.Monoid


trait Writer[T, M] {
  val value: T
  val log: M
  def context: (T, M) = (value, log)
  def combined(otherLog: M )(implicit monoid: Monoid[M]) = monoid.add(log, otherLog)
}

and its implementation as both a Functor and a Monad

package com.promindis.Logger

import com.promindis.patterns.{MonadHelper, Monoid, Monad}


class StringWriter[T](val value: T, val log: String)  
extends Writer[T, String] with Monad[T, StringWriter]{

  def this(value: T)(implicit monoid: Monoid[String]) = this(value, monoid.unit)

  override def map[U](f: (T) => U) =  StringWriter(f(context._1), context._2)

  override def toString = "[" + value + ", " + log + "]"
}

object StringWriter {
  def apply[T](value: T) = new StringWriter[T](value)

  def apply[T](value: T, log: String) = new StringWriter[T](value, log)

  implicit object writerMonadHelper extends MonadHelper[StringWriter] {
    def flatten[T](m: StringWriter[StringWriter[T]]) = {
      val pair = m.context
      val innerW = pair._1
      val innerPair = innerW.context
      StringWriter(innerPair._1, m.combined(innerPair._2))
    }
  }

}
  • The StringWriter class implicitly uses the String concatenating monoid.
  • The StringWriter class is itself both a Monad and a Functor.
  • The specific Monad helper is defined in the scope of the companion object, as an implicit internal object writerMonadHelper, for the compiler to implicitly bind it to the flatMap method of the Monad trait.
This is when we can play, and where the magic of the Scala compiler plays a nice part. 
We can of course reproduce the LYAH example using chained invocations of the flatMap/map methods, but as we implement both flatMap and map, the Scala environment permits us to use for comprehensions, in extenso:

package com.promindis.user

import com.promindis.Logger.StringWriter

object UseWriter {

  import StringWriter._

  def logNumber(x: Int) = StringWriter(x, "Got number " + x + " ")

  def main(arguments: Array[String]) {
    val value = for {
      a <- logNumber(3)
      b <- logNumber(5)
    } yield a * b

    println(value)
    println(
      logNumber(3).flatMap(x =>
        logNumber(5).map(y =>
          x * y)))

    val anotherValue = for {
      a <- logNumber(3)
      b <- logNumber(5)
      c <- logNumber(7)
    } yield a * b * c

    println(anotherValue)
    println(
      logNumber(3).flatMap(x =>
        logNumber(5).flatMap(y =>
          logNumber(7).map(z =>
            x * y * z))))
  }
}

That shall produce:

[info] Running com.promindis.user.UseWriter
[15, Got number 3 Got number 5 ]
[15, Got number 3 Got number 5 ]
[105, Got number 3 Got number 5 Got number 7 ]
[105, Got number 3 Got number 5 Got number 7 ]

Not fond of the imperative style of for comprehensions, I concede that their use does clarify the code. Unlike Tony Morris I have not provided more detailed usage examples, but I promise to work on them as soon as I have finished writing this post. 

Thank you for following me through this first exploration, and do not hesitate to leave helpful comments. 

Be seeing you !!! :)