Sunday, February 22, 2009

Apache Hadoop

Apache Hadoop is an open source software framework, written in Java and developed by the Apache Software Foundation, for reliable, scalable, distributed computing. Its components and sub-projects are Hadoop Core (distributed file system and computing framework), HBase (scalable, distributed database), Pig (high-level framework for parallel computation), ZooKeeper (reliable coordination system) and Hive (data warehouse).

The current design supports clusters of up to 10,000 nodes and petabytes of data. Many large organizations use Hadoop to run large distributed computations.
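
Hadoop's computing framework follows the MapReduce model, so a rough feel for what such a distributed computation looks like may help. The sketch below is plain Python running on one machine, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the shuffle step that Hadoop performs across the cluster is imitated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts collected for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

On a real cluster the map and reduce steps would run in parallel on many nodes, with the input lines read from the distributed file system rather than from a Python list.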
