Three days ago, at the Google I/O 2013 Cloud keynote, Google announced that it is
retiring its original MapReduce processing engine (published in 2004). While others
are still adopting the MapReduce paradigm in their products and services, such as
Hadoop-based platforms or processor-core-based parallel processing implementations,
Google has already moved on to a next-generation parallel processing solution called
"FlumeJava". Google published a paper on FlumeJava in 2010, written by
Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams,
Robert R. Henry, Robert Bradshaw and Nathan Weizenbaum. At I/O '13, a
developer advocate for Cloud Compute Engine announced that Google has successfully
replaced the original MapReduce engine and that FlumeJava is already in production.
MapReduce and similar systems significantly ease the task of writing data-parallel
code. However, many real-world computations require a pipeline of MapReduces, and
programming and managing such pipelines can be difficult. FlumeJava is a Java library
that makes it easy to develop, test, and run efficient data-parallel pipelines.
Its main ideas are:
1. Parallel collections and their operations present a simple, high-level, uniform
abstraction over different data representations and execution strategies.
2. To let parallel operations run efficiently, FlumeJava defers their evaluation and
internally constructs an execution plan in the form of a dataflow graph.
3. When the final result of the parallel operations is needed, FlumeJava first
optimizes the execution plan and then runs the optimized operations on the
underlying primitives (for example, MapReduces).
4. The combination of high-level abstractions for parallel computation, deferred
evaluation and optimization, and efficient parallel primitives yields an easy-to-use
system that approaches the efficiency of hand-optimized pipelines.
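
To make the idea of deferred evaluation more concrete, here is a minimal toy sketch in plain Java. It is not the FlumeJava API: the class names DeferredSketch and LazyCollection and the methods of, plan and run are made up for illustration, and only the name parallelDo is borrowed from the paper's element-wise primitive. Each call merely records a step in a plan; the functions are applied only when run() is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy illustration of deferred evaluation. Hypothetical names, NOT the FlumeJava API. */
public class DeferredSketch {

    /** Hypothetical stand-in for a parallel collection: a node in an execution plan. */
    abstract static class LazyCollection<T> {

        /** Human-readable view of the plan that leads to this collection. */
        abstract String plan();

        /** Evaluates the plan; no user function runs before this is called. */
        abstract List<T> run();

        /** Records an element-wise operation and returns a new plan node. */
        <R> LazyCollection<R> parallelDo(String name, Function<? super T, ? extends R> fn) {
            final LazyCollection<T> parent = this;
            return new LazyCollection<R>() {
                @Override
                String plan() {
                    return parent.plan() + " -> " + name;
                }

                @Override
                List<R> run() {
                    List<R> out = new ArrayList<>();
                    for (T item : parent.run()) {
                        out.add(fn.apply(item));
                    }
                    return out;
                }
            };
        }

        /** Wraps in-memory data as the source node of a plan. */
        static <T> LazyCollection<T> of(List<T> data) {
            return new LazyCollection<T>() {
                @Override
                String plan() {
                    return "source";
                }

                @Override
                List<T> run() {
                    return new ArrayList<>(data);
                }
            };
        }
    }

    public static void main(String[] args) {
        LazyCollection<String> words = LazyCollection.of(Arrays.asList("flume", "java"));

        // Only the plan is built here; nothing has been computed yet.
        LazyCollection<Integer> lengths = words
                .parallelDo("upper", String::toUpperCase)
                .parallelDo("length", String::length);

        System.out.println(lengths.plan()); // source -> upper -> length
        System.out.println(lengths.run());  // [5, 4]
    }
}
```

A real system would hand the recorded plan to an optimizer before executing it on a cluster; this sketch simply walks the chain in a single process.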
Please note that FlumeJava is not just an add-on to MapReduce; it was developed to
run pipelines of several MapReduce jobs in an optimized way. According to the
FlumeJava paper, optimizing the dataflow graph automatically gets close to the
running time of hand-optimized chains of MapReduce jobs, without the manual tuning
effort.
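
One optimization of this kind is fusing consecutive element-wise steps so that the data is traversed once instead of once per step. The following is a small sketch of that idea in plain Java, under the assumption that every stage is a simple function from T to T; FusionSketch and its methods are hypothetical names for illustration, not FlumeJava's optimizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy illustration of stage fusion. Hypothetical names, NOT FlumeJava's optimizer. */
public class FusionSketch {

    /** Composes a chain of element-wise stages into a single function. */
    static <T> Function<T, T> fuse(List<Function<T, T>> stages) {
        Function<T, T> fused = Function.identity();
        for (Function<T, T> stage : stages) {
            fused = fused.andThen(stage);
        }
        return fused;
    }

    /** Unoptimized plan: one full pass and one intermediate list per stage. */
    static <T> List<T> runStageByStage(List<T> input, List<Function<T, T>> stages) {
        List<T> current = input;
        for (Function<T, T> stage : stages) {
            List<T> next = new ArrayList<>();
            for (T item : current) {
                next.add(stage.apply(item));
            }
            current = next;
        }
        return current;
    }

    /** Optimized plan: all stages fused, so the input is traversed only once. */
    static <T> List<T> runFused(List<T> input, List<Function<T, T>> stages) {
        Function<T, T> fused = fuse(stages);
        List<T> out = new ArrayList<>();
        for (T item : input) {
            out.add(fused.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Function<Integer, Integer>> stages =
                Arrays.asList(x -> x + 1, x -> x * 2, x -> x - 3);
        List<Integer> input = Arrays.asList(1, 2, 3);

        System.out.println(runStageByStage(input, stages)); // [1, 3, 5]
        System.out.println(runFused(input, stages));        // [1, 3, 5]
    }
}
```

Both variants produce the same result; the fused one avoids the intermediate lists, which in a distributed setting would correspond to avoiding intermediate jobs and the data written between them.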
To my mind, this is yet another tool out of Google's basket, one that will open a
new era of parallel processing techniques and bring fresh air to the developer
community in the near future. :)