HADOOP’S STOCKS IN TRADE
Apache Hadoop, open-source
software, has proved to be the data
prospector with the most market
traction in the last five years. Originally created by current Cloudera
architect and Apache Software Foundation
Chairman Doug Cutting while he
worked at Yahoo, Hadoop got its
name from a stuffed elephant (an
appropriate image for so-called big
data) belonging to Cutting’s son.
Hadoop processes large caches of
data by breaking them into smaller,
more accessible batches and distributing them to multiple servers
to analyze. (Agility is a vital attribute: It’s like cutting your food into
smaller pieces for easier consumption.) Hadoop then processes queries
and delivers the requested results
in far less time than old-school
analytics software—most often minutes instead of hours or days.
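The split-distribute-combine idea described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function names (split_into_batches, count_words, merge_counts, word_count) are made up for this example, a word count stands in for a real query, and a pool of threads stands in for the separate servers of an actual cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(records, batch_size):
    """Break a large list of records into smaller batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def count_words(batch):
    """Map step: each worker counts the words in its own batch."""
    counts = Counter()
    for line in batch:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Reduce step: combine the per-batch results into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, batch_size=2, workers=4):
    """Split the records, process batches in parallel, then merge."""
    batches = split_into_batches(records, batch_size)
    # Hypothetical stand-in: threads play the role of cluster servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, batches))
    return merge_counts(partials)
```

Calling word_count on a list of text lines returns a Counter of word totals. In a real deployment the batches would live on different machines and the framework would handle scheduling and failures, which this sketch does not attempt.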
“The analysts at Gartner and IDC
have described big data as being
about the volume, velocity and
variety of data, and those are the
things that draw people to Hadoop
as a system,” said Cloudera Product
Manager Charles Zedlewski.
After Cutting and his internal Yahoo team came up with the
Hadoop code, it was tested and used
extensively within the Yahoo IT system for several years. The company
subsequently released the code to
the open-source community, which
enabled a whole new IT sector: the
productization of Hadoop.
Why give away the code? Because
when Cutting and Yahoo developed,
tested and ran the base code in-house, they learned how complicated it is to use. They immediately
saw that the money-earning future
of the software would come from
surrounding services: an intuitive
user interface, customized deployments and additional features.
In March 2009, startup Cloudera
was the first independent company
to take the open-source code and
productize the Hadoop analytics
engine with its CDH (Cloudera’s
Distribution, including Apache
Hadoop) and Cloudera Enterprise.
An impressive group of investors
and advisors teamed up to launch the
company, including VMware founder
and former CEO Diane Greene,
Flickr co-founder Caterina Fake, former MySQL CEO Marten Mickos,
LinkedIn President Jeff Weiner and
Facebook CFO Gideon Yu.
Since Cloudera’s debut, a handful
of top-tier companies and startups
have crafted their own versions of
Hadoop based on the freely available open-source architecture.
This is truly a new-generation
enterprise IT competition. It’s similar to a relay race in that all the contestants have the same type of baton
(Hadoop code) and have to compete
based strictly on their own speed,
agility and creativity. Currently, the
race is on among a new set of competitors attempting to market big
data analytics to the most enterprises
in the most effective way.
BIG BET AT BIG BLUE
IBM, the first large systems
maker to use the engine, provides its
Hadoop-based InfoSphere BigInsights
in basic and enterprise editions. But
the company has even bigger plans.
Speaking to a Computer History Museum audience Aug. 4 in
Mountain View, Calif., CEO Sam
Palmisano said Big Blue is putting
a heavy R&D emphasis on new-generation data analytics, describing it as one of the company’s “big
bets”—a project that requires at
least a $100 million investment.
At the same event, IBM Fellow
and Computer Science Research
Director Laura Haas said that IBM
Labs is far beyond the big data
research mode and is into “exadata”
analytics research. “We’re working
on some very, very interesting things
in this area,” Haas told eWEEK.
While Haas wasn’t at liberty to discuss details of the plans, Palmisano
revealed this in his Aug. 4 presentation: “In about a year from now, you’ll
be starting to see the fruits of our ‘big
bet’ on big data. The work we’ve been
doing for the last several years with
Watson [the IBM computer that won
Jeopardy! matches against two human
champions] will move into products
that will be used for a great many
purposes, including health care,
science and financial applications.
“Our engineers say they’re not
far away from building a supercomputer about the size of a human
brain that can fit into a shoebox.”
Now that’s squeezing big data into
a small package.
OTHER HADOOP DISTRIBUTIONS
Newcomer MapR Technologies
released a distributed file system
and MapReduce engine, the MapR
Distribution for Apache Hadoop. It
also partnered with storage and security giant EMC to provide another