Saturday, March 31, 2012

Where we explore the basics of Iteratees - Part I


It's been a long time since we have exchanged on interesting stuff in Scala nor Clojure. I propose you today to keep on going working on Scala examples. I did not forget about Clojure nor Lisp. I just try to find a very gifted Lisper with  3 or 4 decades of practice and the patience to teach me from the botton the Lisp Way and functional programming. If you know someone, don't hesitate to make us get in touch. Now, I need guidance more than ever.

I changed recently for a new position, surrounded by more gifted guys than I am in Scala and I have to follow... far behind as much as I can. That's life, from the comfort zone of a JEE expert to the tough land of a beginner. By the way, there will  be no coming back.
Our functional ramblings lead us to the adoption of the Play framework. Pleasant, smart, the Play framework has been lifted up (may I say :)) as of version 2.0 and exhibits a lot of features linked to hot functional programming topics like Iteratees. 

When experts present Iteratees they generally target complete IO examples. You can find high level examples here and here. Having been exploring Scala for a few months only (barely during three massively interrupted years) I cannot go so far in one run. I wanted at least to explore the basics of the pattern. This is today purpose. 

Choosing as a starting point the canonical article from the Monad reader Issue #16 named "Teaching an Old Fold New Tricks", I propose you to join me and start a first step journey into Iteratees. In his nice presentation, John W. Lato reminds us about the classical way of handling IO using cursors and manipulating chunks of data, leading to a very imperative style of programming. 
In addition ,the use of cursors favours everything except the composition of functions. At some point while working with IO we are forced to dig deeper into the API, forcing ourselves in handling lower order methods and pulling data from our streams of data. Enumeration based IO seems to provide us with a way to embrace a more composable, higher-order approach. The idea seems simple as inspired by the behaviour of the foldleft function, as defined in every functional programming language implementation. A foldleft function typically takes three arguments, as suggested by this imaginary foldleft function derived from the GenTraversableOnce interface in Scala:

  • a starting value for an accumulator (z)
  • a list of element used for the accumulation (l)
  • a function combining the accumulated value (or accumulator) with each of the element (f)
The foldleft function pushes the sequence elements into  the accumulated value using the combining function. Still following Lato's reasoning, we can consider that we enumerate over a stream of input values using an accumulator in order to maintain a kind of computational state. Let's consider processing streams of data the same way.
The accumulator, we call it Iteratee and provide it with the role of a stream processor. As enumerating our stream content, we would pass the Iteratee an element to be consumed and then get rid of that element, passing the following element and so on. Whatever happens , having pushed the element to the Iteratee, there is no coming back whether the Iteratee effectively processes the element or not, period. Sticking to the article so to learn , and reproducing the streamG abstract data type we provide the following representation of a stream:

As we will push the elements one by one, no need to use the standard approach representing streams as flow of trunk of data. In our implementation, we implement exactly the same stream markers as in the Monad reader issue. EMPTY notifies Iteratees that a stream does not present an element but is expected to provide upcoming elements, and EOF means to an Iteratee that the stream is closed. As nicely explained by Runar in his post - obviously derived from the same article - this abstract data type implementation is arbitrary and very IO oriented as we could define different stream markers. An computation (aka an Iteratee) can represent two states (I use the word state on purpose):
  • Done and as-is will offer a result
  • or expecting an upcoming continuation
that leads to the following implementation:

Regarding the Cont case class, the apply method provides a simple way to extract the result of the continuation from the incoming stream element. As with the foldleft method we want to be able to enumerate over our values and feed the Iteratee for them to proceed to the computation:

We recognize here a classic tail recursive process , stopping when the input sequence el is empty or the input Iteratee iter is done. Otherwise, the enumeration goes on and on... But an enumeration provides an Iteratee possibly holding our expected result. We must then apply some kind of run operation in order to retrieve the result. We face two situations:
  • we are done and we then get the result
  • we are not done yet and we expect the result providing an EOF marker. At that stage we can be provided with a result or no result at all, so an Optional result
Implemented in scala:

For this first approach we will stick to simple stuff , no fluff. Let us try to play with basic Iteratees. The following extract of code provides canonical examples:

respectively, getting the head of stream, dropping a given amount of elements and calculating the length of a stream. Although we kept simple the implementation of these first functions, one should notice a very interesting facet in this approach. Running the sample:

provides the following result

[info] Running com.promindis.user.UseIteratees

Having scrutinized the case matching in the function body one might notice that
  • An iteratee can return Done result only when receiving an EOF marker (length function)
  • An iteratee can return a Done result by itself (drop method)
So control on processing can both  come from the input stream of data or from the Iteratee ... or both. Consequence is one can pair the usage of a "never-ending" Iteratee with multiple sources or combine multiple Iteratees processing the same source. When done an Iteratee can let a "successor"  Iteratee takes the processing in charge. Here comes the magic word of combination which sounds like the word Monad. From the project derived from our previous posts we can derive the monadic behaviour of the Iteratees from the Applicative trait, getting for free the Functor , Monad and Applicative Functor extensions. As a reminder both the Functor and Monad definitions where given by:

Basically we must implement the map and flatten functions and provide a kestrel combinator to automatically pair an Iteratee with a matching applicative functor instance:

The function iterateesToApplicative acts as a factory function providing an Applicative Functor for Iteratees handling StreamG of type E elements. The typed lambda projection is achieved as to allow for the extraction of the result of an Iteratee computation.
The iterateeOnSteroid function represents the kestrel combinator implementation implicitly converting an Iteratee instance to a monadic structure.
The map function implementation remains trivial as usual. We apply the mapping function to the result of the Iteratee. Done, may I say :). Notice we propagate the map effect, in the case of continuation as an another continuation.
 Regarding the implementation of the flatten method we literally translated our previous expectations. When the flatten function is provided with a Done(Done(_, _), s) data structure, the computation is also in state Done for the resulting Iteratee and we pass the state as-is. A Done(Cont(f), s) data structures expresses that our resulting Iteratee is of course also in a Done state, but we can apply the result of the previous continuation to the stream element. Obviously dealing with continuation, leads us to recreate a new continuation to be flatten again until a Done Iteratee is met.

I agree this part is not easy and I have to come back to it on a regular base. At this step the last typical example of combination to provide to the reader is the famous alternate example. Imagine we were provided with a list of numbers and would want to pick elements every two elements. Basically we would compose the head and drop Iteratees functions presented earlier: 

and replicate the behaviour a certain amount of time. Bang we are doomed.
We never though so far to implement a replicate function. We face a situation where applying the implicit conversions will save the day once more. We define first the concept of a Replicable object for both simple object instances and object instances embedded in context:

with the help of Monoid definitions for both simple object and single parameter constructor type :

We can then provide an implicit conversion to replicates for both simple object and objects wrapped in contexts:

and some function allowing us to convert a traversable structure of replicated monadic values into an monadic traversable structure of values (aka apply a sequence as we did last time):

There we are, ready to provide the famous alternate example:

providing :

[info] Running com.promindis.user.UseIteratees
List(2, 4, 6, 8, 10)

Hurray,  we reached the first part of the seminal article. The next leap into the world of Iteratees should be longer. So far we learned what Iteratees are. We discovered they brought us a new vision on the way of dealing with streams of data. We brought to them some element of composition. This implementation allowed us to play with our previous experimentations on traversable structures and helped us in leveraging our skills in the reuse of monadic patterns.
In essence, all together we progressed.

 Before getting to IO application I will diverge soon to parser combinators. Some common aspects between Iteratees, parser combinators and the state Monad should help in understanding and applying all these concepts. Until next time, learn, hack, progress and never be satisfied of what you know.
Don't let anybody prevent you from leveraging and growing your knowledge. Improve, get better !!!
The project can be found at the same location as usual.

Be seeing you !!! :)


Post a Comment