Sunday, October 9, 2011

One pmap for the road before partying with an agent

Hi again. And thank you to the positive feedbacks of people having read the previous writings.

During a recent discussion with a bunch of interesting guys, one of them asked me why I was always expecting so much time between two consecutive writings.While explaining to him how hard it was to find both subject that might interest people and time to write about it, he then suggested that I should from time to time choose into one of my small Clojure/Scala/Java experiments and just reproduce this one it is.
It might not be perfect but should bring some interesting feedbacks. In addition to his comment, I noted, as from a personal point of view, that I liked too reading from other people short program stories because these small programs always present new keywords, other visions of problem analysis and so on...

As my current discovery of Akka fault tolerance takes longer than expected, I randomly picked up one of my last projecteuler implementation. I found the problem interesting as it provided some opportunity to experiment, on a (very) modest scale, on elements of parallelism in Clojure.

The problem was to find the greatest product of four adjacent numbers in any direction (up, down, left, right, or diagonally) in a 20x20 grid. I immediatly started writing something based on the feeling I had, how suitable the problem was for a functional programming kata.
The matrix can be divided in sub matrices in charge each to identify their maximum product, then maximum maximorum of all matrices is the looked up result. A maximum search is idempotent, the product processing in matrices can be easilly implemented using higher order functions etc..
Having loaded the most recent version of Clojure (1.3) , the Leiningen project was configured with the following content:

(defproject sorting "1.0.0-SNAPSHOT"
  :description "Algorithms"
  :dependencies [[org.clojure/clojure "1.3.0"]
  				[org.clojure/clojure-contrib "1.2.0"]])

where the contrib dependency has been introduced in order to parse the file hosting the matrix definition, making use of the io API. This light use is encompassed into the following file header:

(ns algorithms.matrix
	(:use [clojure.contrib.io :only [read-lines]])
	(:use [clojure.string :only [split trim-newline trim]]))

I selectively chose to read the lines and trim them if necessary. Data copied and paste from the net can sometime be strange. My first kata step was to load a file from the filesystem, a task - who believes it - I never achieved in Clojure before !!.
The test written to challenge that huge effort remained very simple as I tested that my loaded file was a real square matrix:

(def from-file "/dev/projects/clojure/leinengen/algorithms/matrix")

(deftest loaded-matrix-from-file-should-have-right-size
  (is (= 20 (count  (matrix-definition from-file))))
  (is (= 20 (count  (first (matrix-definition from-file))))))

My test looks like an acceptance test as I did not get into detailed info. I took the opportunity to apply all the necessary conversions from character to numbers inClojure. This drove me to the solution:

(defn string-matrix [lines]
    (map 
        (fn[line] (split (trim (trim-newline line)) #" ")) 
	       lines))

(defn to-int [line]
    (vec (map #(Integer/parseInt %) line)))

(defn converted [s-matrix]
    (vec (map #(to-int %) s-matrix)))

(defn matrix-definition [in-file]
  (converted (string-matrix (read-lines in-file))))

The matrix-definition function is the entry point, accepting a file path definition as a string expression. The read-lines function from the Clojure contrib library delivers a lazy sequence of strings that the string-matrix function converts into a sequence of character sequences. The converted function then to-int function helps tranforming the matrix of Strings into a matrix of integers.
No big deal.
 Nice, I have this big matrix now. In order to proceed, a next interesting step would be to able to grab square four matrices:

(def matrix-origin [  [ 8  2 22 97]
                      [49 49 99 40]
                      [81 49 31 73]
                      [52 70 95 23] ])

(def matrix-bottom-right [  [40 62 76 36]
                            [74 04 36 16]
                            [23 57 05 54]
                            [89 19 67 48]])

(deftest four-matrix-at-origin-should-match-file-content
  (is 
     (= matrix-origin 
        (square-at 0 0 4 (matrix-definition from-file )))))

(deftest four-matrix-at-bottom-left-should-match-file-content
  (is 
     (= matrix-bottom-right 
       (square-at 16 16 4 (matrix-definition from-file )))))

The purpose  of the function square-at will be to bring back this small sub matrix from our (not so) big definition. Here the for comprehensions come to the rescue

(defn square-at [x y size matrix]
    (vec (for [posy (range y (+ y size))]
      (vec (for [posx (range x (+ x size))]
        (get-in matrix [posy posx]))))))

Basically, quoting Stuart Halloway, a list comprehension creates a list based on an existing list. Working, with matrix, I naturally transform the result in a vector of vectors, for being indexed list, mandatory in our case. In possession of a matrix, we can work on the extraction of maximum products among rows, columns, and diagonals:

(deftest max-product-in-matrix-with-origin-should-match
  (is (= 24468444 
     (max-product-in (square-at 0 0 4 (matrix-definition from-file ))))))

Operating using wishful thinking, I expect to work from a main method:

(defn max-product-in [matrix]
  (apply max 
      (concat 
        (line-products-in matrix)
          (col-products-in matrix)
            (diag-products-in matrix))))

while extracting the maxima obtained from helping functions.
So far so good, I comment my current macroscopic test and define finer grained test so to challenge the helping functions input/output function:

(deftest line-products-in-origin-matrix-should-match
  (is (= [34144 9507960 8981847 7953400] 
     (line-products-in (square-at 0 0 4 (matrix-definition from-file ))))))

(deftest col-products-in-origin-matrix-should-match
  (is (= [1651104 336140 6414210 6514520] 
     (col-products-in (square-at 0 0 4 (matrix-definition from-file ))))))

(deftest diag-products-in-origin-matrix-should-match
  (is (= [279496 24468444] 
     (diag-products-in (square-at 0 0 4 (matrix-definition from-file ))))))

The implementing helping functions also host comprehensions combined with reduce operations, a whole assembly of higher order functions, as predicted:

(defn line-products-in [matrix]
  (vec (map #(reduce * %) matrix)))

(defn col-products-in [matrix]
    (vec (for [x (range 0 (count matrix))]
      (reduce * (map #(get % x) matrix)))))

(defn diag-products-in [matrix]
  (let [size (count matrix)]
    [(reduce * (for [x (range 0 size)]
      (get-in matrix  [x x])))
     (reduce * (for [x (range 0 size)]
      (get-in matrix  [ x (- (dec size) x)])))]))

Good, let's uncomment the previous test: max-product-in. Green(...figure of speech: in the REPL, you execute the (run-tests) methods and read the report). Good. My attempts to get the maiximum maximorum can be resumed in :

(deftest find-maximum-from-file
  (is 
     (= 70600674 
       (find-maximum (matrix-definition from-file) 20))))

And my first experience leads me to a simple solution:

(defn find-maximum [matrix in-range]
  (let [size (- in-range 3 ) 
        coordinates (for [x (range 0 size) y (range 0 size)] [x y])]
    (apply max 
      (map 
        #(max-product-in (square-at (first %) (second %) 4 matrix)) 
          coordinates))))

where in-range defines the width of my square matrix , as I want to be able to extend its size. Note I could have changed too the number of operands in the multiplication.
Works fine!

A first try to make a little bit parallel my algorithm would be the use of pmap. The magic in there, lays onto the change from map to pmap in the algorithm:

(defn par-find-maximum [matrix in-range]
  (let [
        size (- in-range 3 ) 
        coordinates (for [x (range 0 size) y (range 0 size)] [x y])]
    (apply max 
      (pmap 
        #(max-product-in (square-at (first %) (second %) 4 matrix)) 
          coordinates))))

Improvements on performance are not obvious, and more visible when the size of the matrix is dramatically extended by unfolding it for example, duplicating itself. Working on bigger multiplications, so bigger matrices would also make a difference.
As literally stated in pmap comment, pmap appears to be useful for computationally intensive functions where the time of the applied functions dominates the coordination overhead. At the scale of my small computer (Dual core) processing small operations, the improvements are not to be obvious.

Just for fun, because it does not provide any improvement in this situation, let's try to with agents. Ideally we could save the main thread the work of processing the maximum, deferring the operation to an abstraction dealing with the operation asynchronously.
Clojure agents can take in charge this kind of operation. An agent binds a value like a reference in Clojure, so it can be dereferenced at will. As with actors in Scala(and naturally Akka), an agent works on a queue of messages, each message corresponding to new incoming action aiming to change the bound value.
Here stops the similarities between actors and agents. The agent only changes its internal bound value versus asynchronous queued incoming actions. So it remains value centric, on the contrary of actors who can swap behaviors, transfer continuations etc...
Michael Fogus and Chris Houser take time to explain the differences in the Joy of Clojure in far more technical descritpion than mine :) As far as our little problem is concerned, I'd like to apply a little producer/consumer mechanic where my main thread of execution would push the values to an agent switching its bound value to a new value when the incoming value is greater then the one already bound. Here is the solution:

(defn other-par-find-maximum [matrix in-range]
  (let [
        size (- in-range 3 ) 
        sorter (agent 1 :validator number?)]
    (dorun
      (for [x (range 0 size) y (range 0 size)]
        (send sorter max (max-product-in (square-at x y 4 matrix)))))
    (await sorter)
    @sorter))

The agent is impersonated as a sorter, initialized with the value 1 and providing a validator method, in charge of checking that the input is effectively a number:

sorter (agent 1 :validator number?)

For each processed maximum in the matrix we send its result value to the agent:

(send sorter max (max-product-in (square-at x y 4 matrix)))

in respect to the send method syntax. The send method takes as input the agent reference in addition to a new value and a function to apply to both the bound and new value. The resulting bound value will be the output of the applied function.
 Good, we are done, tests green !!

 So we have seen how to apply a low cost change to our processing, migrating from map to pmap, and so preserving our algorithm. We have also used a Clojure agent in a producer/consumer pattern.

A short one, awaiting for incoming deeper subjects. Being on holiday soon, I will try to push more of these little experiments and take time to develop the biggest subjects.
Before I forget, the book of the week, by G.J.Michaelson, An Introduction to Functional Programming Through Lambda Calculus, to (re)discover...the postscript edition being available on line here.

 Got to switch to Scala, be seeing you !!! :):):)

2 comments:

Adrian Mouat said...

Be careful using contrib version 1.2 with clojure version 1.3; parts of it are incompatible.

If you want to use Clojure 1.3, you should be using the new modular contrib stuff (http://dev.clojure.org/display/doc/Clojure+Contrib)

Coincidentally, Greg Michaelson was actually one of my lecturers during my undergrad!

Globulon said...

Thank you very much for your feedback. I am going to check the new modular Clojure contribution

Post a Comment