Saturday, March 31, 2012

Where we explore the basics of Iteratees - Part I


It's been a long time since we last exchanged on interesting stuff in Scala or Clojure. I propose today that we keep working on Scala examples. I have not forgotten about Clojure or Lisp. I am just trying to find a very gifted Lisper with three or four decades of practice and the patience to teach me the Lisp Way and functional programming from the bottom up. If you know someone, don't hesitate to put us in touch. Now, I need guidance more than ever.

I recently moved to a new position, surrounded by guys more gifted in Scala than I am, and I have to follow... from far behind, as best I can. That's life: from the comfort zone of a JEE expert to the tough land of a beginner. By the way, there will be no going back.
Our functional ramblings led us to the adoption of the Play framework. Pleasant and smart, the Play framework has been lifted up (may I say :)) as of version 2.0 and exhibits a lot of features linked to hot functional programming topics, like Iteratees.

When experts present Iteratees, they generally target complete IO examples. You can find high-level examples here and here. Having been exploring Scala for only a few months (spread over barely three massively interrupted years), I cannot go that far in one run. I wanted at least to explore the basics of the pattern. That is today's purpose.

Choosing as a starting point the canonical article from The Monad.Reader Issue 16 named "Teaching an Old Fold New Tricks", I propose you join me on a first-step journey into Iteratees. In his nice presentation, John W. Lato reminds us of the classical way of handling IO using cursors and manipulating chunks of data, leading to a very imperative style of programming.
In addition, the use of cursors favours everything except the composition of functions. At some point, while working with IO, we are forced to dig deeper into the API, handling lower-level methods and pulling data from our streams. Enumeration-based IO seems to provide us with a way to embrace a more composable, higher-order approach. The idea seems simple, as it is inspired by the behaviour of the foldLeft function, as defined in every functional programming language implementation. A foldLeft function typically takes three arguments, as suggested by an imaginary foldLeft function derived from the GenTraversableOnce interface in Scala:

  • a starting value for an accumulator (z)
  • a list of elements used for the accumulation (l)
  • a function combining the accumulated value (or accumulator) with each of the elements (f)
The foldLeft function pushes the sequence elements into the accumulated value using the combining function. Still following Lato's reasoning, we can consider that we enumerate over a stream of input values using an accumulator in order to maintain a kind of computational state. Let's consider processing streams of data the same way.
The accumulator we call an Iteratee, and we give it the role of a stream processor. As we enumerate our stream content, we pass the Iteratee an element to be consumed, then get rid of that element, pass the following one, and so on. Whatever happens, once an element has been pushed to the Iteratee there is no coming back, whether the Iteratee effectively processes it or not, period. Sticking to the article so as to learn, and reproducing the StreamG abstract data type, we provide the following representation of a stream:
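Such a stream could be sketched along these lines (my own minimal rendering of the article's StreamG type; the post's project code may differ in naming):

```scala
// A stream delivers its elements one by one, no chunking involved.
sealed trait StreamG[+E]
// No element available right now, but more may come.
case object Empty extends StreamG[Nothing]
// A single element of the stream.
final case class El[E](element: E) extends StreamG[E]
// The stream is closed: no element will ever come.
case object EOF extends StreamG[Nothing]
```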

As we will push the elements one by one, there is no need for the standard approach representing streams as flows of chunks of data. In our implementation, we use exactly the same stream markers as in The Monad.Reader issue. Empty notifies an Iteratee that the stream does not present an element right now but is expected to provide upcoming ones, and EOF tells an Iteratee that the stream is closed. As nicely explained by Rúnar in his post - obviously derived from the same article - this abstract data type implementation is arbitrary and very IO-oriented, as we could define different stream markers. A computation (aka an Iteratee) can be in one of two states (I use the word state on purpose):
  • Done, offering a result as-is
  • or expecting an upcoming continuation
that leads to the following implementation:
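A possible rendering of these two states, with the stream type repeated so the sketch stands alone (exact names in the post's project may differ):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

// An Iteratee is either Done, carrying its result and the unconsumed part
// of the stream, or a continuation waiting for the next stream input.
sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]
```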

Regarding the Cont case class, the apply method provides a simple way to extract the result of the continuation from the incoming stream element. As with the foldLeft method, we want to be able to enumerate over our values and feed the Iteratee so it can proceed with the computation:
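An enumerate function over an in-memory sequence could look like this (a self-contained sketch, with the stream and Iteratee types it needs repeated; details may differ from the post's project):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

// Feed the sequence elements one by one to the Iteratee, stopping as soon
// as the input is exhausted or the Iteratee declares itself Done.
@annotation.tailrec
def enumerate[E, A](input: Seq[E])(iter: Iteratee[E, A]): Iteratee[E, A] =
  (input, iter) match {
    case (_, done @ Done(_, _)) => done
    case (Seq(), notDone)       => notDone
    case (x +: xs, Cont(k))     => enumerate(xs)(k(El(x)))
  }
```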

We recognize here a classic tail-recursive process, stopping when the input sequence el is empty or the input Iteratee iter is done. Otherwise, the enumeration goes on and on... But an enumeration yields an Iteratee possibly holding our expected result. We must then apply some kind of run operation in order to retrieve it. We face two situations:
  • we are done and we then get the result
  • we are not done yet, and we request the result by providing an EOF marker. At that stage we may be provided with a result or no result at all, hence an optional result
Implemented in Scala:
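A sketch of such a run operation (types repeated for self-containment):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

// Extract a result from an Iteratee, sending EOF first if it is still
// waiting for input; a continuation that survives EOF yields no result.
def run[E, A](iter: Iteratee[E, A]): Option[A] = iter match {
  case Done(result, _) => Some(result)
  case Cont(k) =>
    k(EOF) match {
      case Done(result, _) => Some(result)
      case Cont(_)         => None
    }
}
```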

For this first approach we will stick to simple stuff, no fluff. Let us try to play with basic Iteratees. The following extract of code provides canonical examples:
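These three could be sketched as follows (a self-contained rendering; the post's actual signatures may differ slightly):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

// Produce the first element of the stream, if any.
def head[E]: Iteratee[E, Option[E]] = Cont {
  case El(e) => Done(Some(e), Empty)
  case Empty => head
  case EOF   => Done(None, EOF)
}

// Discard n elements; note it can declare itself Done on its own.
def drop[E](n: Int): Iteratee[E, Unit] =
  if (n <= 0) Done((), Empty)
  else Cont {
    case El(_) => drop(n - 1)
    case Empty => drop(n)
    case EOF   => Done((), EOF)
  }

// Count the elements; it is Done only when it sees EOF.
def length[E]: Iteratee[E, Int] = {
  def step(acc: Int): Iteratee[E, Int] = Cont {
    case El(_) => step(acc + 1)
    case Empty => step(acc)
    case EOF   => Done(acc, EOF)
  }
  step(0)
}
```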

respectively: getting the head of a stream, dropping a given number of elements, and calculating the length of a stream. Although we kept the implementation of these first functions simple, one should notice a very interesting facet of this approach. Running the sample:

provides the following result

[info] Running com.promindis.user.UseIteratees

Having scrutinized the case matching in the function bodies, one might notice that
  • an Iteratee can return a Done result only when receiving an EOF marker (the length function)
  • an Iteratee can return a Done result by itself (the drop function)
So control over the processing can come from the input stream of data, from the Iteratee... or both. The consequence is that one can pair a "never-ending" Iteratee with multiple sources, or combine multiple Iteratees processing the same source. When done, an Iteratee can let a "successor" Iteratee take over the processing. Here comes the magic word of combination, which sounds like the word Monad. From the project derived from our previous posts we can derive the monadic behaviour of the Iteratees from the Applicative trait, getting the Functor, Monad and Applicative Functor extensions for free. As a reminder, both the Functor and Monad definitions were given by:
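In spirit, something like this (the project's actual traits are richer; this is a minimal reminder, with names of my choosing):

```scala
trait Functor[M[_]] {
  def map[A, B](ma: M[A])(f: A => B): M[B]
}

trait Monad[M[_]] extends Functor[M] {
  def pure[A](a: A): M[A]
  def flatten[A](mma: M[M[A]]): M[A]
  // flatMap comes for free once map and flatten are provided
  def flatMap[A, B](ma: M[A])(f: A => M[B]): M[B] = flatten(map(ma)(f))
}
```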

Basically, we must implement the map and flatten functions and provide a kestrel combinator to automatically pair an Iteratee with a matching Applicative Functor instance:
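The heart of it, map and flatten for Iteratees, could be sketched as follows (self-contained, leaving out the applicative plumbing of the post's project):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

def map[E, A, B](it: Iteratee[E, A])(f: A => B): Iteratee[E, B] = it match {
  case Done(result, s) => Done(f(result), s)
  case Cont(k)         => Cont(s => map(k(s))(f)) // propagate the effect as another continuation
}

def flatten[E, A](it: Iteratee[E, Iteratee[E, A]]): Iteratee[E, A] = it match {
  case Done(Done(result, _), s) => Done(result, s) // inner computation done: pass the state as-is
  case Done(Cont(k), s)         => k(s)            // apply the inner continuation to the stream element
  case Cont(k)                  => Cont(s => flatten(k(s)))
}

def flatMap[E, A, B](it: Iteratee[E, A])(f: A => Iteratee[E, B]): Iteratee[E, B] =
  flatten(map(it)(f))
```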

The iterateesToApplicative function acts as a factory providing an Applicative Functor for Iteratees handling StreamG elements of type E. The type-lambda projection is achieved so as to allow for the extraction of the result of an Iteratee computation.
The iterateeOnSteroid function represents the kestrel combinator implementation, implicitly converting an Iteratee instance into a monadic structure.
The map function implementation remains trivial, as usual. We apply the mapping function to the result of the Iteratee. Done, may I say :). Notice that, in the case of a continuation, we propagate the map effect as another continuation.
Regarding the implementation of the flatten method, we literally translated our previous expectations. When the flatten function is provided with a Done(Done(_, _), s) data structure, the computation is also in the Done state for the resulting Iteratee, and we pass the state as-is. A Done(Cont(f), s) data structure expresses that our resulting Iteratee is of course also in a Done state, but we can apply the result of the previous continuation to the stream element. Obviously, dealing with a continuation leads us to recreate a new continuation to be flattened again, until a Done Iteratee is met.

I agree this part is not easy, and I have to come back to it on a regular basis. At this point, the last typical example of combination to present to the reader is the famous alternate example. Imagine we were provided with a list of numbers and wanted to pick one element out of every two. Basically, we would compose the head and drop Iteratee functions presented earlier:
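Composed with flatMap, dropping one element and then taking the head might look like this (a complete sketch bundling the stream type, the Iteratee type and the combinators it needs):

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

def map[E, A, B](it: Iteratee[E, A])(f: A => B): Iteratee[E, B] = it match {
  case Done(result, s) => Done(f(result), s)
  case Cont(k)         => Cont(s => map(k(s))(f))
}

def flatten[E, A](it: Iteratee[E, Iteratee[E, A]]): Iteratee[E, A] = it match {
  case Done(Done(result, _), s) => Done(result, s)
  case Done(Cont(k), s)         => k(s)
  case Cont(k)                  => Cont(s => flatten(k(s)))
}

def flatMap[E, A, B](it: Iteratee[E, A])(f: A => Iteratee[E, B]): Iteratee[E, B] =
  flatten(map(it)(f))

def head[E]: Iteratee[E, Option[E]] = Cont {
  case El(e) => Done(Some(e), Empty)
  case Empty => head
  case EOF   => Done(None, EOF)
}

def drop[E](n: Int): Iteratee[E, Unit] =
  if (n <= 0) Done((), Empty)
  else Cont {
    case El(_) => drop(n - 1)
    case Empty => drop(n)
    case EOF   => Done((), EOF)
  }

def enumerate[E, A](input: Seq[E])(iter: Iteratee[E, A]): Iteratee[E, A] =
  (input, iter) match {
    case (_, done @ Done(_, _)) => done
    case (Seq(), notDone)       => notDone
    case (x +: xs, Cont(k))     => enumerate(xs)(k(El(x)))
  }

def run[E, A](iter: Iteratee[E, A]): Option[A] = iter match {
  case Done(result, _) => Some(result)
  case Cont(k) => k(EOF) match {
    case Done(result, _) => Some(result)
    case Cont(_)         => None
  }
}

// drop one element, then take the head of what remains
def dropThenHead[E]: Iteratee[E, Option[E]] =
  flatMap(drop[E](1))(_ => head[E])
```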

and replicate the behaviour a certain number of times. Bang, we are doomed.
We never thought so far of implementing a replicate function. We face a situation where applying implicit conversions will save the day once more. We first define the concept of a Replicable object, for both simple object instances and object instances embedded in a context:

with the help of Monoid definitions for both simple objects and single-parameter constructor types:
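The Monoid part could be sketched as follows (a simple illustration; the post's project definitions may be more general):

```scala
trait Monoid[A] {
  def zero: A
  def add(x: A, y: A): A
}

// Monoid for a plain type
val intAdditionMonoid: Monoid[Int] = new Monoid[Int] {
  val zero = 0
  def add(x: Int, y: Int): Int = x + y
}

// Monoid for a single-parameter type constructor, here List[A]
def listMonoid[A]: Monoid[List[A]] = new Monoid[List[A]] {
  val zero: List[A] = Nil
  def add(x: List[A], y: List[A]): List[A] = x ++ y
}
```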

We can then provide implicit conversions to replicables for both simple objects and objects wrapped in contexts:

and some function allowing us to convert a traversable structure of replicated monadic values into a monadic traversable structure of values (aka apply a sequence, as we did last time):

There we are, ready to provide the famous alternate example:
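Putting all the pieces together, here is a self-contained sketch of the alternate example; my sequence here is a plain foldRight over flatMap, standing in for the post's replicate-and-sequence machinery:

```scala
sealed trait StreamG[+E]
case object Empty extends StreamG[Nothing]
final case class El[E](element: E) extends StreamG[E]
case object EOF extends StreamG[Nothing]

sealed trait Iteratee[E, A]
final case class Done[E, A](result: A, remaining: StreamG[E]) extends Iteratee[E, A]
final case class Cont[E, A](apply: StreamG[E] => Iteratee[E, A]) extends Iteratee[E, A]

def map[E, A, B](it: Iteratee[E, A])(f: A => B): Iteratee[E, B] = it match {
  case Done(result, s) => Done(f(result), s)
  case Cont(k)         => Cont(s => map(k(s))(f))
}

def flatten[E, A](it: Iteratee[E, Iteratee[E, A]]): Iteratee[E, A] = it match {
  case Done(Done(result, _), s) => Done(result, s)
  case Done(Cont(k), s)         => k(s)
  case Cont(k)                  => Cont(s => flatten(k(s)))
}

def flatMap[E, A, B](it: Iteratee[E, A])(f: A => Iteratee[E, B]): Iteratee[E, B] =
  flatten(map(it)(f))

def head[E]: Iteratee[E, Option[E]] = Cont {
  case El(e) => Done(Some(e), Empty)
  case Empty => head
  case EOF   => Done(None, EOF)
}

def drop[E](n: Int): Iteratee[E, Unit] =
  if (n <= 0) Done((), Empty)
  else Cont {
    case El(_) => drop(n - 1)
    case Empty => drop(n)
    case EOF   => Done((), EOF)
  }

def enumerate[E, A](input: Seq[E])(iter: Iteratee[E, A]): Iteratee[E, A] =
  (input, iter) match {
    case (_, done @ Done(_, _)) => done
    case (Seq(), notDone)       => notDone
    case (x +: xs, Cont(k))     => enumerate(xs)(k(El(x)))
  }

def run[E, A](iter: Iteratee[E, A]): Option[A] = iter match {
  case Done(result, _) => Some(result)
  case Cont(k) => k(EOF) match {
    case Done(result, _) => Some(result)
    case Cont(_)         => None
  }
}

// chain a list of Iteratees, collecting their results in order
def sequence[E, A](its: List[Iteratee[E, A]]): Iteratee[E, List[A]] =
  its.foldRight(Done(Nil, Empty): Iteratee[E, List[A]]) { (it, acc) =>
    flatMap(it)(a => map(acc)(as => a :: as))
  }

// five times: drop one element, keep the next one
val alternate = sequence(List.fill(5)(flatMap(drop[Int](1))(_ => head[Int])))
val result = run(enumerate((1 to 10).toList)(alternate)).map(_.flatten)
// result: Some(List(2, 4, 6, 8, 10))
```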

providing:

[info] Running com.promindis.user.UseIteratees
List(2, 4, 6, 8, 10)

Hurray, we have reached the end of the first part of the seminal article. The next leap into the world of Iteratees should be longer. So far we have learned what Iteratees are. We discovered that they bring us a new vision of the way we deal with streams of data. We brought to them some elements of composition. This implementation allowed us to play with our previous experimentations on traversable structures and helped us leverage our skills in the reuse of monadic patterns.
In essence, all together, we progressed.

Before getting to IO applications, I will soon make a detour through parser combinators. Some common aspects of Iteratees, parser combinators and the State Monad should help in understanding and applying all these concepts. Until next time: learn, hack, progress, and never be satisfied with what you know.
Don't let anybody prevent you from leveraging and growing your knowledge. Improve, get better !!!
The project can be found at the same location as usual.

Be seeing you !!! :)

Sunday, March 4, 2012

Where we traverse, accumulate and collect in Scala


In a previous post targeting Applicative ramblings, we studied the ability to transform a list of monadic values into a monadic list of values. In situations where one wishes to convert a list of futures into a future of a list, or a list of optional values into an optional list of values during a validation process, this ability can become very handy and self-explanatory.
Recently, while watching the wonderful presentation by Rúnar Bjarnason here on functional programming in Scala with Scalaz, I noticed a more generic definition of the sequence function we defined earlier. Starting from the usual code project here, from:

we could abstract the concept into the following trait:
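A sketch of such a trait, assuming a minimal Applicative abstraction alongside it (names are illustrative; the post's project differs in detail):

```scala
trait Applicative[M[_]] {
  def pure[A](a: A): M[A]
  def ap[A, B](mf: M[A => B])(ma: M[A]): M[B]
  def map[A, B](ma: M[A])(f: A => B): M[B] = ap(pure(f))(ma)
}

// A structure C is traversable when we can push an applicative computation
// through it; sequence is then a special case of traverse.
trait Traverse[C[_]] {
  def traverse[M[_], A, B](ca: C[A])(f: A => M[B])(implicit m: Applicative[M]): M[C[B]]
  def sequence[M[_], A](cma: C[M[A]])(implicit m: Applicative[M]): M[C[A]] =
    traverse(cma)(identity[M[A]])
}
```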

where the type constructor C plays the part of a "traversable" structure. This more complete definition also provides a sequence method, but this time taking benefit from a more generic traverse method. Quoting the seminal McBride and Paterson article on applicative programming with effects, let's say the Traverse trait captures data structures, built with the C type constructor, through which we can apply applicative computations.
In essence, the traverse method allows us to traverse a C[T] structure and produce an M[C[U]] structure with the help of an f: (T) ⇒ M[U] function. Viewing Applicative Functors as a way to lift functions into the world of monadic values, one can sense that using the traverse function we are going to distribute monadic applications all over the data structure.
Doing so, we will be able to convert a structure of possible values into a possible structure of values, and the same with a structure of Futures, Lists and so on: whatever structure I can define as traversable using an ad-hoc definition.
I provided two definitions of traversables: one to reproduce the list example, and a second to play with a tree structure. Concerning the list definition:

As lists belong to a more generic category of traversable structures per se, I freely defined a ListLike trait gathering the basic functions required during the construction of a list. Then I explicitly self-referenced this first trait in a second trait, TraverseListLike, reusing all the basic methods of the first trait. We witness here the real power of the applicative style, allowing us to lift all the tool methods into the world of monadic values while keeping the construction of the new monadic structure specifically clear.
The expression

  cons[U] _ :@: f(value) :*: traverse(rest(source))(f)

nicely reproduces the construction pattern of a list-like structure:

(cons value (rest))  

in Lisp-ish form. The same procedure can be applied to a structure I wanted to play with, trees:
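A possible rendering of the tree and its traversal (a self-contained sketch with minimal Applicative/Traverse stand-ins for the post's project types; naming is my own):

```scala
trait Applicative[M[_]] {
  def pure[A](a: A): M[A]
  def ap[A, B](mf: M[A => B])(ma: M[A]): M[B]
  def map[A, B](ma: M[A])(f: A => B): M[B] = ap(pure(f))(ma)
}

trait Traverse[C[_]] {
  def traverse[M[_], A, B](ca: C[A])(f: A => M[B])(implicit m: Applicative[M]): M[C[B]]
}

// A binary tree as an algebraic data type
sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def ap[A, B](mf: Option[A => B])(ma: Option[A]): Option[B] =
    for (f <- mf; a <- ma) yield f(a)
}

object TraverseTree extends Traverse[Tree] {
  def traverse[M[_], A, B](t: Tree[A])(f: A => M[B])(implicit m: Applicative[M]): M[Tree[B]] =
    t match {
      case Leaf => m.pure(Leaf: Tree[B])
      case Node(v, l, r) =>
        // (cons value left-node right-node), lifted into M
        val cons = (b: B) => (lb: Tree[B]) => (rb: Tree[B]) => (Node(b, lb, rb): Tree[B])
        m.ap(m.ap(m.map(f(v))(cons))(traverse(l)(f)))(traverse(r)(f))
    }
}
```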

Both the definition of a Tree (an algebraic data type) and of the Traverse object speak for themselves. Here, we can still perceive, through the applicative-style form

cons[U] _ :@: f(v) :*: traverse(left)(f) :*: traverse(right)(f)  

the pattern of construction of nodes in a tree 

(cons value left-node right-node)  

Exercising our new toys provides the following:

At this step we have implemented the interface kindly exposed by Rúnar. The traverse function reveals itself to be very powerful. One example of application provided in the McBride and Paterson article proposes to accumulate information while traversing the structure. The accumulated value can evolve depending on each structure item. An elegant way to do it consists in accumulating values using the add method of a Monoid given by the type of the values:

Accumulating a value means changing a state while we traverse the tree, the changing state type being the type of the accumulated value. De facto, we think of using the State Monad, which we extended to the applicative definition:

I propose the following implementation for an accumulate function:
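Assembled end to end, an accumulate function along those lines could look like this (a self-contained sketch: minimal Applicative, Traverse, Monoid and State stand-ins for the post's project types):

```scala
trait Applicative[M[_]] {
  def pure[A](a: A): M[A]
  def ap[A, B](mf: M[A => B])(ma: M[A]): M[B]
  def map[A, B](ma: M[A])(f: A => B): M[B] = ap(pure(f))(ma)
}

trait Traverse[C[_]] {
  def traverse[M[_], A, B](ca: C[A])(f: A => M[B])(implicit m: Applicative[M]): M[C[B]]
}

trait Monoid[A] {
  def zero: A
  def add(x: A, y: A): A
}

sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

object TraverseTree extends Traverse[Tree] {
  def traverse[M[_], A, B](t: Tree[A])(f: A => M[B])(implicit m: Applicative[M]): M[Tree[B]] =
    t match {
      case Leaf => m.pure(Leaf: Tree[B])
      case Node(v, l, r) =>
        val cons = (b: B) => (lb: Tree[B]) => (rb: Tree[B]) => (Node(b, lb, rb): Tree[B])
        m.ap(m.ap(m.map(f(v))(cons))(traverse(l)(f)))(traverse(r)(f))
    }
}

final case class State[S, A](run: S => (A, S))

def stateApplicative[S]: Applicative[({ type λ[X] = State[S, X] })#λ] =
  new Applicative[({ type λ[X] = State[S, X] })#λ] {
    def pure[A](a: A): State[S, A] = State(s => (a, s))
    def ap[A, B](mf: State[S, A => B])(ma: State[S, A]): State[S, B] =
      State { s =>
        val (f, s1) = mf.run(s)
        val (a, s2) = ma.run(s1)
        (f(a), s2)
      }
  }

// Each tree item becomes a state transition adding f(item) to the accumulator
def toAccumulator[A, U](a: A, f: A => U, m: Monoid[U]): State[U, Unit] =
  State(acc => ((), m.add(acc, f(a))))

def accumulate[A, U](t: Tree[A])(f: A => U)(m: Monoid[U]): U = {
  type Projection[X] = State[U, X]
  val chained: State[U, Tree[Unit]] =
    TraverseTree.traverse[Projection, A, Unit](t)(a => toAccumulator(a, f, m))(stateApplicative[U])
  chained.run(m.zero)._2
}
```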

The function distributed all over the structure,  

(x: A) ⇒ toAccumulator(x, f, monoid) 

impersonates the f: (T) ⇒ M[U] function, one of the traverse function's parameters.
The toAccumulator function provides a State instance depending both on an input value - the current item in the tree - and on the Monoid used for the cumulative effect. The Monoid combines the result of applying f to the current item with any provided external state.
Traversing the structure means chaining all the accumulated state changes, each change depending on a local item value.

The f: (T) ⇒ M[U] function expects a single-parameter type constructor M[_]. So, instead of defining a type lambda once again, I defined a local Projection[X] type synonym allowing me to dictate to the traverse function the [Projection, A, A] parameters, where Projection matches a single-parameter type constructor. In exchange, the Scala compiler, not fooled by my trick, demands an explicit specification of the returned type: State[T[A], U]. I'd really love Scala type gurus to provide me with an explanation of this behaviour, and another about the fact that the compiler needs a local definition of stateApplicative[U]() although the stateApplicative function is already defined implicitly.

Suppose I want to check some property on any of the elements in a Tree. The input function f: (T) ⇒ U in the accumulate function becomes a predicate producing a boolean value that we accumulate using an OR operation. Wishing to verify whether there are integers greater than 8 in a tree of integers, we provide:

providing the Any definition as a logical-OR-centric Monoid. Both the traversableTree applicative functor and the Any Monoid are explicitly provided, as the Any Monoid definition is not implicit. We could also accumulate all the values greater than 8:

Here the toListMonoid definition is implicit, so it is omitted. Building on top of the accumulate function, we can reduce (aka crush) all the values in a tree using a Monoid given by the type of the elements in the tree:

In one of his wonderful posts, Debasish Ghosh presents Scala variations on The Essence of the Iterator Pattern article by Jeremy Gibbons and Bruno C. d. S. Oliveira.
Among all the variations, one function called collect specifically modifies the elements of a data structure purely and independently of the accumulation process. The example provided converts a list of integers into a new list while counting the number of elements independently. Here is the implementation of his example:
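A possible rendering of collect over the tree structure, using the same minimal stand-ins as for accumulate (my naming; the post's version works on lists as well):

```scala
trait Applicative[M[_]] {
  def pure[A](a: A): M[A]
  def ap[A, B](mf: M[A => B])(ma: M[A]): M[B]
  def map[A, B](ma: M[A])(f: A => B): M[B] = ap(pure(f))(ma)
}

sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

object TraverseTree {
  def traverse[M[_], A, B](t: Tree[A])(f: A => M[B])(implicit m: Applicative[M]): M[Tree[B]] =
    t match {
      case Leaf => m.pure(Leaf: Tree[B])
      case Node(v, l, r) =>
        val cons = (b: B) => (lb: Tree[B]) => (rb: Tree[B]) => (Node(b, lb, rb): Tree[B])
        m.ap(m.ap(m.map(f(v))(cons))(traverse(l)(f)))(traverse(r)(f))
    }
}

final case class State[S, A](run: S => (A, S))

def stateApplicative[S]: Applicative[({ type λ[X] = State[S, X] })#λ] =
  new Applicative[({ type λ[X] = State[S, X] })#λ] {
    def pure[A](a: A): State[S, A] = State(s => (a, s))
    def ap[A, B](mf: State[S, A => B])(ma: State[S, A]): State[S, B] =
      State { s =>
        val (f, s1) = mf.run(s)
        val (a, s2) = ma.run(s1)
        (f(a), s2)
      }
  }

// f rebuilds each element purely; g advances the state independently
def collect[A, B, S](t: Tree[A])(f: A => B)(g: S => S)(init: S): (Tree[B], S) = {
  type Projection[X] = State[S, X]
  val st: State[S, Tree[B]] =
    TraverseTree.traverse[Projection, A, B](t)(a => State((s: S) => (f(a), g(s))))(stateApplicative[S])
  st.run(init)
}
```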

The implementation, very similar to the accumulate function definition, splits into the state definition (f(x), g(s)) both the effect of transforming the tree - applying f(x) to the current item - and that of accumulating a state - g(s).

Today we started taking benefit from the applicative style, leveraging our ability to traverse data structures, accumulating effects, rebuilding our tree, or doing both in just one shot (code project here).
I have to go and work on CQRS now. The night's going to be short.

Be seeing you !!! :)