Posted by Hiren Dutta on 3:49 PM


Three days ago, at the Google I/O 2013 cloud keynote, Google announced that it is retiring its original MapReduce processing engine (first published in 2004). While others are still adopting the MapReduce paradigm in their products and services, such as Hadoop-based platforms or parallel processing implementations built on processor cores, Google has already moved on to a next-generation approach to parallel processing called "FlumeJava". Google published a paper on FlumeJava in 2010, written by Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw and Nathan Weizenbaum. At I/O '13, a developer advocate for Cloud Compute Engine announced that they have successfully replaced the original MapReduce engine and that FlumeJava is already in production.
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, many real-world computations require a pipeline of MapReduces, and programming and managing such pipelines can be difficult. FlumeJava is a Java library that makes it easy to develop, test, and run efficient data-parallel pipelines.

1. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies.
2. To let parallel operations run efficiently, FlumeJava defers their evaluation and instead internally constructs an execution plan in the form of a dataflow graph.
3. When the final results of the parallel operations are needed, FlumeJava first optimizes the execution plan and then runs the optimized operations on the appropriate underlying primitives (for example, MapReduces).
4. The combination of high-level abstractions for parallel computation, deferred evaluation and optimization, and efficient parallel primitives yields an easy-to-use system that approaches the efficiency of hand-optimized pipelines.
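
To make points 2 and 3 a bit more concrete, here is a minimal sketch of deferred evaluation in plain Java. It is not the FlumeJava API: the class LazyCollection, its methods map, run and deferredSteps, and the passesExecuted counter are all hypothetical names made up for this illustration. Each map call only records a step in a plan, and the recorded steps are fused so that run makes a single pass over the data instead of one pass per step. The real library builds and optimizes a full dataflow graph, which this toy does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is NOT a FlumeJava class.
final class LazyCollection<S, T> {
    // Counts how many passes over the source data were actually executed.
    static int passesExecuted = 0;

    private final List<S> source;      // the input data
    private final Function<S, T> plan; // all deferred steps, fused into one function
    private final int deferredSteps;   // number of steps recorded so far

    private LazyCollection(List<S> source, Function<S, T> plan, int deferredSteps) {
        this.source = source;
        this.plan = plan;
        this.deferredSteps = deferredSteps;
    }

    // Wraps the input; nothing is computed here.
    static <S> LazyCollection<S, S> of(List<S> source) {
        return new LazyCollection<>(source, Function.identity(), 0);
    }

    // Records an element-wise step without running it (deferred evaluation).
    // Composing it onto the existing plan fuses consecutive steps.
    <R> LazyCollection<S, R> map(Function<T, R> fn) {
        return new LazyCollection<>(source, plan.andThen(fn), deferredSteps + 1);
    }

    int deferredSteps() {
        return deferredSteps;
    }

    // Executes the fused plan in a single pass over the source.
    List<T> run() {
        passesExecuted++;
        List<T> out = new ArrayList<>();
        for (S item : source) {
            out.add(plan.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        LazyCollection<String, Integer> lengths =
            LazyCollection.of(Arrays.asList(" map ", "reduce", " flume"))
                .map(String::trim)    // recorded, not executed
                .map(String::length); // recorded, not executed

        System.out.println("steps recorded: " + lengths.deferredSteps());
        System.out.println("passes before run: " + passesExecuted);
        System.out.println("result: " + lengths.run());
        System.out.println("passes after run: " + passesExecuted);
    }
}
```

The design choice worth noticing is that map returns a new description of work rather than a result, which is what gives an optimizer the chance to look at the whole pipeline before anything runs.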

Please note that FlumeJava is not just a companion tool to MapReduce; it was developed to optimize pipelines of several MapReduce jobs. The paper reports that graph-based optimization of the execution plan yields pipelines that approach the efficiency of hand-optimized chains of MapReduce jobs.
I would call it yet another tool out of Google's basket, one that should open a new era of parallel processing techniques and bring some fresh air to developer communities in the near future... :)