Formal board meeting minutes from 2010 through present. Please Note: The board typically approves minutes from one meeting during the next board meeting, so minutes will be published roughly one month later than the scheduled date. Other corporate records are published, as is an alternate categorized view of all board meeting minutes.

Hadoop

18 Jan 2017 [Christopher Douglas / Chris]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The community is working through criteria for its 3.x and 2.x series,
particularly w.r.t. compatibility (e.g., [1]). Work on 3.0.0-alpha2 [2,3]
and 2.8 [4] will likely produce RCs soon.

[1] https://issues.apache.org/jira/browse/HDFS-11096
[2] https://s.apache.org/zBhP
[3] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
[4] https://s.apache.org/smEX

RELEASES

Last release: 2.6.5 2016-10-07

COMMUNITY

(+ PMC Carlo Curino 2016-11-03)
(+ PMC Li Lu 2017-01-10)
(+ PMC Ming Ma 2016-11-03)
(+ PMC Rohith Sharma K S 2016-11-17)
(+ PMC Varun Vasudev 2016-10-20)
(+ PMC Zhe Zhang 2016-11-03)
(+ committer Bibin Chundatt 2016-12-12)
(+ committer Konstantinos Karanasos 2017-01-12)
(+ committer Rakesh Radhakrishnan 2016-12-30)
(+ committer Sidharta Seethana 2016-12-15)
(+ committer Sunil Govind 2016-10-27)
(+ committer Yiqun Lin 2017-01-14)
(+ branch-HDFS-9806 Thomas Demoor 2016-10-24)
(+ branch-YARN-5734 Jonathan Hung 2016-12-13)
(+ branch-YARN-5734 Min Shen 2016-12-13)
(+ branch-YARN-5734 Ye Zhou 2016-12-13)
auth: 161 committers and 75 PMC members

SECURITY

CVE-2016-3086: Apache Hadoop YARN NodeManager vulnerability
CVE-2016-5001: Apache Hadoop Information Disclosure

19 Oct 2016 [Christopher Douglas / Brett]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

Apache Hadoop cut its first release from trunk since 2011. Releases in the 2.x
series will continue to follow stricter compatibility guidelines on a branch.
The 3.0.0-alpha1 release has kicked off several API cleanups, reasoning about
dependencies, and other pains endured to avoid downstream breakage.

RELEASES

2.6.5 was released 2016-10-07
2.7.3 was released 2016-08-24
3.0.0-alpha1 was released 2016-09-02

COMMUNITY

(+ committer Anu Engineer 2016-07-27)
(+ committer Mingliang Liu 2016-08-14)
(+ committer Wei-Chiu Chuang 2016-07-20)
(+ committer Xiao Chen 2016-07-20)
(+ committer Larry McCay 2016-07-20)
(+ branch-HADOOP-13345 Rajesh Balamohan 2016-08-06)
(+ branch-HADOOP-10285 Rakesh Radhakrishnan 2016-09-26)
(+ branch-HADOOP-12756 Mingfei Shi 2016-08-22)
(+ branch-HADOOP-13345 Aaron Fabbri 2016-08-14)
(+ branch-HDFS-9806 Ewan Higgs 2016-09-21)
(+ branch-HDFS-9806 Virajith Jalaparti 2016-09-21)
(+ branch-HDFS-9806 Pieter Reuse 2016-09-21)
(+ branch-YARN-4752 Daniel Templeton 2016-08-14)
(+ branch-YARN-5079 Billie Rinaldi 2016-08-12)
(+ branch-YARN-5079 Gour K Saha 2016-08-12)
auth: 154 committers and 69 PMC members

TRADEMARKS

The project updated its logo to include "Apache" [1]. We have an outstanding
request to trademarks@ to register our yellow elephant logo with the USPTO.

[1] https://issues.apache.org/jira/browse/HADOOP-13184

There was a discussion concerning who is responsible for trademark enforcement, and a general consensus that it isn't currently working. Shane and Chris Douglas to continue this discussion offline.

@shane report back on resolving the hadoop trademark enforcement issues.

20 Jul 2016 [Christopher Douglas / Chris]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN timeline server v2 (YARN-2928) merged. Container queuing and
resource-aware scheduling made progress.

HDFS intra-datanode rebalancing (HDFS-1312) merged. Object storage (Ozone)
and the native client made progress. Design of an async FileSystem API and
an implementation in HDFS started in JIRA.

HADOOP integrations with cloud storage were particularly active this
quarter. The S3A client received significant updates from a diverse set of
community members. Clients for the Aliyun Object Store Service (OSS) and
Microsoft Azure Data Lake Store (ADLS) also posted proposals and prototypes.

MAPREDUCE was updated to work with the next generation of the YARN timeline
service.

The Yetus project has greatly improved CI and regression testing,
particularly across branches. Given the Hadoop project's intent to cut
releases from trunk again, Yetus's support for feature branches is
particularly helpful.

RELEASES

Releases have been blocked on HADOOP-12893, bringing the NOTICE and LICENSE
files up to date. It was recently resolved.

COMMUNITY
(+ PMC Xiaoyu Yao 2016-06-14)
(+ PMC Lei Xu 2016-05-15)
(+ PMC Arun Suresh 2016-06-23)
(+ committer Brahma Reddy Battula 2016-06-11)
(+ committer Ray Chiang 2016-06-17)
(+ committer Subru Krishnan 2016-06-14)
(+ committer Varun Saxena 2016-06-22)
(+ branch-YARN-3368 Sreenath Somarajapuram 2016-06-21)
(+ branch-YARN-3368 Sunil Govind 2016-06-03)
auth: 142 committers (including branch), 69 PMC members

20 Apr 2016 [Christopher Douglas / Marvin]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN development yielded improvements in preemption, hardening of the timeline
server, and a new web UI. Designs for resource-aware scheduling and
long-running services are being shaped in JIRA.

HDFS erasure coding, native client, object store (Ozone), and intra-datanode
rebalancing are significant areas under development.

MapReduce continues to receive bug fixes, but even maintenance has slowed.

Trademark enforcement continues to be a challenge. While most vendors have
engaged quickly and positively, slow (non-)compliance falls off the radar. We
are working with trademarks@ to amortize the costs of engagement with
templates and will record these incidents in the BRAND JIRA, as appropriate,
to track follow-up.

RELEASES
- 2.6.4 was released on Feb 10 2016
- 2.7.2 was released on Jan 26 2016

COMMUNITY
(+ PMC Yongjun Zhang 2016-02-18)
(+ PMC Sangjin Lee 2016-04-12)
(+ committer Masatake Iwasaki 2016-01-20)
(+ committer Eric Payne 2016-02-08)
(+ committer Li Lu 2016-02-21)
(+ committer Naganarasimha Garla 2016-03-29)
(+ committer Kai Zheng 2016-04-07)
(+ committer Larry McCay 2016-04-08)
(+ branch-HDFS-1312 Anu Engineer 2016-03-03)
(+ branch-HDFS-8707 Bob Hansen 2016-01-13)
(+ branch-YARN-1011 Iñigo Goiri 2016-01-29)
auth: 138 committers (including branch), 66 PMC members

20 Jan 2016 [Christopher Douglas / Brett]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, resource-aware scheduling (YARN-1011) continues to make progress on
a branch, particularly for oversubscription and distributed scheduling.
Other areas of active development include node labels, reservations, docker
support, and the timeline server.

In HDFS, a long-awaited native client (HDFS-8707) has made progress in a
branch. Other areas include the WebHDFS protocol, truncate, erasure coding,
and intra-datanode rebalancing. Discussion on the dev list suggests that
support for erasure coding will likely be pushed from the next release, to
2.9 or 3.0.

Bug fixes and stability improvements continue to be filed and fixed in
MapReduce.

The community prepares maintenance releases (2.6.4 and 2.7.2) concurrently
with a release of the head of branch-2 as 2.8.0.

RELEASES
- 2.6.3 was released on Wed Dec 16 2015

COMMUNITY
(+ PMC Yi Liu 2015-11-09)
(+ PMC Tsuyoshi Ozawa 2015-12-09)
(+ PMC Wangda Tan 2015-12-09)
(+ PMC Akira Ajisaka 2015-12-16)
(+ PMC Robert Kanter 2016-01-12)
(+ branch-YARN-2928 Varun Saxena 2015-12-04)
(+ branch-YARN-2928 Naganarasimha 2015-12-04)
(+ branch-HDFS-8707 Stephen Walkauskas 2016-01-07)

auth: 134 committers (including branch), 64 PMC members

18 Nov 2015 [Christopher Douglas / Brett]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, support for resizeable containers (YARN-1197) merged to trunk and
branch-2, where its development continues. Application priorities
(YARN-1963) and the v2 timeline server (YARN-2928) continue to make
progress. Failover, HA, and rolling upgrade support were polished. Some
issues related to resource-aware scheduling advanced, tentatively. Support
for Docker containers (YARN-3611) improved, trending toward support for
multiple runtimes (YARN-3853).

In HDFS, support for erasure coding (HDFS-7285) merged to trunk. Many
improvements focused on improving interactions between features (storage
policies, upgrade domains, erasure coding, etc.). Separation between the
namespace and block management, years in development, has received renewed
attention (e.g., HDFS-8966). HDFS also separated its client(s) into a
separate package. Another native client implementation (HDFS-8707) has made
steady progress.

In MapReduce, bug fixes, stability improvements, and documentation comprised
most of the activity. It remains in maintenance mode.

In Common, Hadoop dev support scripts were rewritten and split into Yetus, a
new TLP. The s3a and wasb filesystem bindings also received many bug fixes
and improvements. Portability of native code improved.

The community continues to stabilize the 2.6.x and 2.7.x branches (currently
voting on 2.7.2), and has discussed a 2.8.0 release. It also opened a
discussion of patch workflows, as alternatives to JIRA/patch files/RTC.
While the Github integration is currently enabled, project members are
working with other communities and infra on alternatives (e.g., Gerrit).

RELEASES
 - hadoop-2.6.1 @ 2015-09-23
 - hadoop-2.6.2 @ 2015-10-28

COMMUNITY

(+ PMC Devaraj K 2015-07-20)
(+ PMC Yi Liu 2015-11-09)
(+ committer Zhihai Xu 2015-07-27)
(+ committer Anubhav Dhoot 2015-09-22)
(+ committer Sangjin Lee 2015-09-30)
(+ committer Zhe Zhang 2015-10-16)
(+ committer Walter Su 2015-10-27)
Branch: Timeline service
(+ branch-YARN-2928 Vrushali Channapattan 2015-09-14)
(+ branch-YARN-2928 Li Lu 2015-09-29)
Branch: IPv6 support
(+ branch-HADOOP-11890 Elliott Clark 2015-09-03)
(+ branch-HADOOP-11890 Nate Edel 2015-09-04)
Branch: C++ HDFS client
(+ branch-HDFS-8707 James Clampffer 2015-07-29)

auth: 131 committers (including branch), 60 PMC members

21 Oct 2015 [Christopher Douglas / Sam]

No report was submitted.

15 Jul 2015 [Chris Douglas / Sam]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, work generalizing node labels, improving Docker support, and
implementing v2 of the timeline server made progress. Support for resizable
containers also appears on-track. A proposal for federated YARN clusters has
design docs and some preliminary code in a branch.

In HDFS, the erasure coding work continues to make progress in a branch. The
object store also has a design doc for discussion and a preliminary set of
patches has been committed to a branch. A native client and prototype HTTP/2
protocol have also made progress in branches.

In Common, work refining the test-patch scripts has expanded its scope to
become a separable component that could support other projects in the
ecosystem. Work revising the native build for Solaris also expanded to remake
much of the native build infrastructure. The S3 filesystem shim was also
updated extensively.

RELEASES
- hadoop-2.7.0 @ 2015-04-22
- hadoop-2.7.1 @ 2015-07-08

COMMUNITY

(+ PMC Vinayakumar B 2015-07-07)
(+ PMC Junping Du 2015-07-07)
(+ PMC Xuan Gong 2015-07-07)
(+ PMC Haohui Mai 2015-02-20)
(+ committer Lei Xu 2015-06-14)
(+ committer Ming Ma 2015-06-18)
(+ committer Xiaoyu Yao 2015-04-16)
(+ committer Varun Vasudev 2015-05-28)
(+ committer Rohith Sharma K S 2015-06-17)
Branch: Split test-patch off into its own TLP (HADOOP-12111)
(+ branch-HADOOP-12111 Andrew Kyle Purtell 2015-06-27)
(+ branch-HADOOP-12111 Nick Dimiduk 2015-06-27)
(+ branch-HADOOP-12111 Andrew Bayer 2015-06-27)
(+ branch-HADOOP-12111 Sean Busbey 2015-06-27)
Branch: YARN Federation (YARN-2915)
(+ branch-YARN-2915 Subru Krishnan 2015-07-06)
(+ branch-YARN-2915 Kishore Chaliparambil 2015-07-06)
Branch: Distributed scheduling (YARN-2877)
(+ branch-YARN-2877 Sriram Rao 2015-05-21)
(+ branch-YARN-2877 Konstantinos Karanasos 2015-05-21)
Branch: Object store (HDFS-7240)
(+ branch-HDFS-7240 Anu Engineer 2015-07-10)
Branch: Data Transfer Protocol via HTTP/2 (HDFS-7966)
(+ branch-HDFS-7966 Duo Zhang 2015-07-02)

auth: 124 committers (including branch), 58 PMC members

22 Apr 2015 [Chris Douglas / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, the next iteration of the TimelineServer, work on network shaping,
per-queue policies, and collecting node metrics for scheduling have made
progress. Work on erasure coding in HDFS continues. A design document for
the object store (HDFS-7240) also appeared. Activity is low in MapReduce,
mostly bug fixes and repairs for unstable tests. Overhaul of shell scripts
continues in Common, in addition to changes supporting pluggable
authentication and authorization.

RELEASES

None

COMMUNITY

(+ PMC Haohui Mai 2015-02)
(+ committer Arun Suresh 2015-03)
(+ committer Xiaoyu Yao 2015-03)

auth: 110 committers (including branch), 55 PMC members

18 Feb 2015 [Chris Douglas / Jim]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.6 release added a large set of features and made many improvements,
including transparent encryption, heterogeneous/tiered storage, support for
Docker containers, reservation-based scheduling, node labels, S3a support,
key management server (KMS), service registry, and rolling upgrades in YARN.

Ongoing development in YARN includes a new round of improvements to the
timeline server (YARN-2928), nodemanager decommission and work-preserving
restart (YARN-914, YARN-1336, YARN-556), improved locking in the RM
(YARN-3091), shared cache (YARN-1492), and disk as a resource (YARN-2139).

Ongoing development in HDFS includes erasure coding (HDFS-7285), support for
truncate (HDFS-3107), namenode synchronization (HDFS-7396), and a native
client (HDFS-6994).

MapReduce received a healthy set of bug fixes and stability improvements.

RELEASES
- hadoop-2.6.0 @ 2014-11-19
- hadoop-2.5.2 @ 2014-11-20

COMMUNITY
(+ PMC Zhijie Shen @ 2014-11)
(+ PMC Jian He @ 2014-11)
(+ committer Yi Liu @ 2014-11)
(+ committer Carlo Curino @ 2014-11)
(+ committer Gera Shegalov @ 2014-12)
(+ committer Robert Kanter @ 2014-12)
(+ committer Tsuyoshi Ozawa @ 2014-12)
(+ committer Akira Ajisaka @ 2015-01)
(+ committer Wangda Tan @ 2015-01)
(+ branch-HDFS-7285 Zhe Zhang @ 2014-11)
(+ branch-HDFS-7285 Kai Zhang @ 2014-11)
(+ branch-HDFS-7285 Bo Li @ 2014-11)
(+ branch-YARN-2139 Wei Yan @ 2014-12)


auth: 108 committers (including branch), 54 PMC members

21 Jan 2015 [Chris Douglas / Chris]

No report was submitted.

15 Oct 2014 [Chris Douglas / Rich]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.6 release has branched for release. In addition to bug fixes, it adds
several new features and refines existing work in the 2.x release series.

Among the notable work in YARN: improvements to its fault tolerance and
support for rolling upgrades (YARN-556, YARN-1336), timeline/history server
(YARN-1530), log handling (YARN-2443), admission control/planning
(YARN-1051), support for long-running services (YARN-913), node labels
(YARN-796), and large container allocation (YARN-1769).

Among the notable work in HDFS: tiered storage in archival (HDFS-6584),
in-memory replicas (HDFS-6581), inotify support (HDFS-6634), extended
attributes (HDFS-2006), encryption (HDFS-6134, HADOOP-10150), and a native
client implementation.

In MapReduce, a native collector (MAPREDUCE-2841) offers improved
performance to many deployments.

Work started earlier in the 2.x branch, particularly related to security,
encryption, and high availability, continues apace.

RELEASES
- hadoop-2.5.0 @ 2014-08-12
- hadoop-2.5.1 @ 2014-09-11

COMMUNITY
(+ PMC Karthik Kambatla @ 2014-09-18)
(+ committer Benoy Antony @ 2014-08-07)
(+ committer Akira Ajisaka @ 2014-08-21)
(+ branch-MAPREDUCE-2841 Binglin Zhang @ 2014-07-14)
(+ branch-MAPREDUCE-2841 Sean Zhong @ 2014-07-14)
(+ branch-MAPREDUCE-2841 Manu Zhang @ 2014-08-21)
auth: 101 committers (including branch), 52 PMC members

16 Jul 2014 [Chris Douglas / Rich]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN development of a generic TimelineServer, resource tracking for disk and
network resources, caching of common dependencies, support for container
preemption, and other features continues.

The HDFS extended attributes feature branch merged to trunk (2014-06-11).
Thorough specification of FileSystem semantics (HADOOP-9361) also
successfully merged. Native checksumming, hedged reads, features built over
HA interfaces, NFS, and ACLs are also actively developed in trunk and
release branches.

Across all projects, work adding encryption and security features continues
in a development branch.

The project changed its bylaws to allow 5 days for release votes, instead of
the 7 allocated for other decisions.

RELEASES
- hadoop-0.23.11 @ 2014-06-27
- hadoop-2.4.1 @ 2014-06-29

COMMUNITY
(+ PMC Andrew Wang @ 2014-06-01)
(+ PMC Arpit Agarwal @ 2014-06-01)
(+ PMC Brandon Li @ 2014-06-01)
(+ PMC Chris Nauroth @ 2014-06-01)
(+ PMC Colin McCabe @ 2014-06-01)
(+ PMC Jing Zhao @ 2014-06-01)
(+ PMC Sandy Ryza @ 2014-06-01)
(+ branch-HADOOP-10388 Abraham Elmahrek @ 2014-05-01)
(+ branch-HADOOP-10388 Yongjun Zhang @ 2014-05-01)
(+ branch-HDFS-2006 Charles Lamb @ 2014-05-12)
(+ branch-HDFS-2006 Yi Liu @ 2014-05-12)
(+ branch-fs-encryption Charles Lamb @ 2014-05-14)
(+ branch-fs-encryption Yi Liu @ 2014-05-14)
(+ branch-YARN-1051 Carlo Curino @ 2014-06-15)
(+ branch-YARN-1051 Subramaniam Venkatraman Krishnan @ 2014-06-15)
auth: 94 committers (including branch), 51 PMC members

16 Apr 2014 [Chris Douglas / Chris]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The YARN execution platform continues to evolve by generalizing from the
specific requirements of the MapReduce framework. As one prominent example,
a development branch implementing a more general application history server
(YARN-321) merged to trunk and the 2.x release series. The operability and
robustness of the platform are also improved by recent attention to failover
and recovery in the ResourceManager and NodeManager components (e.g.,
YARN-1336, YARN-1815).

The HDFS subproject also merged two significant development branches to
trunk: rolling upgrades (HDFS-5535) and ACLs (HDFS-4685). Improvements in
the Common RPC layer, short-circuit reads, and 'hedged' reads (HDFS-5776)
evolve Hadoop storage toward more heterogeneous workloads and architectures.

RELEASES
- hadoop-2.3.0 @ 2014-02-20
- hadoop-2.4.0 @ 2014-04-07

COMMUNITY
(+ committer Haohui Mai @ 2014-02-11)
(+ committer Vinayakumar B @ 2014-03-04)
(+ committer Xuan Gong @ 2014-03-13)
(+ branch-HADOOP-10388 Binglin Chang @ 2014-03-13)
(+ branch-HADOOP-10388 Wenwu Peng @ 2014-04-07)
auth: 88 committers (including branch), 44 PMC members
The last addition to the PMC was Bikas Saha 2013-10

15 Jan 2014 [Chris Douglas / Shane]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The Hadoop project reached a significant milestone, releasing Hadoop 2.2.0 as
the first GA artifact in that series.

Two development branches have merged to trunk: In-memory caching of HDFS
blocks (HDFS-4949) (29-Oct-2013) and the first phase in presenting
heterogeneous storage to applications (HDFS-2832) (13-Dec-2013). Development
of these features continues in trunk.

YARN continues to refine its resource model. Salient issues include modifying
containers (YARN-1197), delegating cluster resources (YARN-1488), and
improving its model for services (YARN-896). Work on improving high
availability in the ResourceManager (YARN-149), particularly YARN-1029, has
made very promising progress.

RELEASES
- hadoop-2.2.0 @ 2013-10-15
- hadoop-0.23.10 @ 2013-12-02

COMMUNITY
(+ committer Roman Shaposhnik @ 2013-10-25)
(+ committer Jun Ping Du @ 2013-12-04)
(+ committer Jian He @ 2013-12-04)
(+ committer Mayank Bansal @ 2013-12-04)
(+ committer Karthik Kambatla @ 2013-12-04)
(+ committer Ravi Prakash @ 2013-12-04)
(+ committer Omkar Joshi @ 2013-12-04)
(+ committer Zhijie Shen @ 2013-12-04)
(+ branch-YARN-1492 Chris Trezzo 2013-12-18)
(+ branch-YARN-1492 Sangjin Lee 2013-12-18)
(+ branch-HDFS-4685 Haohui Mai 2013-12-29)
auth: 84 committers (including branch), 44 PMC members

16 Oct 2013 [Chris Douglas / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.x series reached beta status in August, targeting a GA release before the
end of the year. Work hardening the release continues apace.

The project updated its bylaws to allow for "branch committers" in support of
feature development by newer contributors. The first set are collaborating on a
security initiative that has been discussed on the dev list, in meetups, and in
related JIRAs since last spring. No work on the branch has started, despite
exchanges on the lists on possible, seminal issues to tackle.

RELEASES
- hadoop-1.2.1 @ 2013-08-05
- hadoop-2.0.6-alpha @ 2013-08-22
- hadoop-2.1.0-beta @ 2013-08-25
- hadoop-2.1.1-beta @ 2013-09-30

COMMUNITY
(+ PMC Bikas Saha @ 2013-10-07)
(+ committer Arpit Agarwal @ 2013-08-08)
(+ committer Sanford Ryza @ 2013-07-25)
(+ committer Andrew Wang @ 2013-07-25)
(+ committer Devaraj K @ 2013-07-23)
auth: 74 committers, 44 PMC members

17 Jul 2013 [Chris Douglas / Shane]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

There is little to report, as we submitted an off-cycle report in June.
Security discussions on the dev list converge slowly, but consensus is
developing around implementation tasks, if not the precise shape of that work.
Preparation for the 2.1-beta release continues. Contributors continue to
stabilize APIs, iron out incompatibilities with the 1.x codebase, and integrate
with related projects.

When the Hadoop project spun off subprojects a few years ago, the projects
adjusted their committer roles. We'd been ambivalent about finishing that, but
finally did, removing about 11 accounts (none had participated since then).

RELEASES
- hadoop-0.23.9 @ 2013-07-09

COMMUNITY
auth: 68 committers, 43 PMC members

19 Jun 2013 [Chris Douglas / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

This is an off-cycle report, as the last few weeks were eventful. The Hadoop
project made five releases from three active development branches, elected 7
members to the PMC, and added five committers. The project amended its
bylaws to eliminate votes on "release plans".

RELEASES
 - hadoop-0.23.7 @ 2013-04-18
 - hadoop-2.0.4 @ 2013-04-23
 - hadoop-1.2 @ 2013-05-13
 - hadoop-0.23.8 @ 2013-06-05
 - hadoop-2.0.5 @ 2013-06-09

COMMUNITY
 (+ PMC Jonathan Eagles 2013-05-29)
 (+ PMC Kihwal Lee 2013-05-29)
 (+ PMC Steve Loughran 2013-05-29)
 (+ PMC Luke Lu 2013-05-29)
 (+ PMC Uma Maheswar Rao G 2013-05-29)
 (+ PMC Hitesh Shah 2013-05-29)
 (+ PMC Daryn Sharp 2013-05-29)
 (+ committer Brandon Li 2013-05-21)
 (+ committer Colin McCabe 2013-05-21)
 (+ committer Jing Zhao 2013-05-22)
 (+ committer Ivan Mitic 2013-05-23)
 (+ committer Chris Nauroth 2013-05-23)
 auth: 79 committers, 43 PMC members

The bylaws contained an obscure clause that required release managers to
call a vote on a "release plan". Given that a majority vote of the PMC
establishes a new release, the meaning of this rarely-observed ritual is
ambiguous: there was a vote, but nothing in it was binding. After several
weeks of heated exchanges that accomplished nothing, the PMC voted to remove
the clause from the bylaws entirely. Now, any committer who wants to roll a
release notifies the dev list to explain its motivation and get preliminary
feedback, but there is no vote.

The completely avoidable confusion these threads created has mostly
resolved.

17 Apr 2013 [Chris Douglas / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

Two significant development branches merged to trunk:
- Support for Windows
 ( http://s.apache.org/e7c )
- (HDFS) Fast-path for local reads on Linux (merge vote closing presently)
 ( http://s.apache.org/gM )
 ( http://s.apache.org/7y1 )

Developers have run Hadoop on Windows by emulating its *NIX dependencies,
but the former branch effects a cleaner integration. The latter branch
removed a performance hack for trusted services, replacing it with a more
secure and general implementation for all HDFS clients. Developers on
Windows requested that the workaround remain intact while comparable
functionality is implemented on that platform.

The two merge votes were nearly concurrent, so the development community
discussed the tradeoffs in supporting the new platform, particularly given
the present example of its impact. The informal consensus laid the burden of
support, testing, and monitoring on the subset of developers working on
Windows. Concretely, this extracted commitments to set up and maintain CI
infrastructure while relieving others of requirements to fix breakage on a
platform they may not run. As applied to the HDFS branch being merged, the
implementor(s) of the feature restored the workaround. The dev community
converged on these banal agreements fairly quickly.

Increased collaboration with the Apache Bigtop project in the release
process has improved early detection of downstream integration issues. The
upcoming release of 2.0.4-alpha (currently being voted on) has benefitted
significantly.

Hadoop continues to be an umbrella hosting effectively independent projects
(HDFS, MapReduce, YARN). The PMC has not discussed its disposition to
partition them recently. While one of the prenominate merges is an example
of cross-project work, such patches remain rare.

No issues require board attention at this time.

RELEASES
- hadoop-1.1.2 @ 2013-03-06

COMMUNITY
(+ PMC Jason Lowe 2013-02-28)
auth: 74 committers, 36 PMC members
mailing lists @ 2013-04-01
  1805 general
  3995 user

COMMON
Common is the shared libraries for HDFS and MapReduce.
mailing lists @ 2013-04-01
   390 common-commits
  1789 common-dev
   378 common-issues

HDFS
HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.
mailing lists @ 2013-04-01
   201 hdfs-commits
   862 hdfs-dev
   258 hdfs-issues

MAPREDUCE
MapReduce is an implementation of the map/reduce programming paradigm.
mailing lists @ 2013-04-01
   198 mapreduce-commits
   904 mapreduce-dev
   256 mapreduce-issues

YARN
YARN is a distributed computation framework for easily writing
distributed applications.
mailing lists @ 2013-04-01
    57 yarn-commits
   221 yarn-dev
    81 yarn-issues

20 Feb 2013

Change the Apache Hadoop Chair

 WHEREAS, the Board of Directors heretofore appointed Arun
 Murthy to the office of Vice President, Apache Hadoop, and

 WHEREAS, the Board of Directors is in receipt of the resignation
 of Arun Murthy from the office of Vice President, Apache
 Hadoop, and

 WHEREAS, the Project Management Committee of the Apache Hadoop
 project has chosen by vote to recommend Chris Douglas as the
 Successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Arun Murthy is
 relieved and discharged from the duties and responsibilities of
 the office of Vice President, Apache Hadoop, and

 BE IT FURTHER RESOLVED, that Chris Douglas be and hereby is
 appointed to the office of Vice President, Apache Hadoop, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification, or
 until a successor is appointed.

 Special Order 7A, Change the Apache Hadoop Chair, was approved
 by Unanimous Vote of the directors present.

20 Feb 2013 [Arun Murthy / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added 3 new committers - Kihwal Lee, Arpit Gupta, Bikas Saha
* We've added 1 new PMC member: Harsh J
* We've elected a new PMC Chair, Chris Douglas. (Also added to the board
 agenda.)

On the project side, we have made 4 releases:
- hadoop-0.23.5 was released on 28th November, 2012
- hadoop-1.1.1 was released on 1st December, 2012
- hadoop-0.23.6 was released on 6th February, 2013
- hadoop-2.0.3-alpha was released on 13th February, 2013

PMC Chair Vote - We had a fairly contentious discussion after the vote for
the PMC Chair resulted in a tie after STV. The discussions included
*analysis* of voting patterns w.r.t. employers, accusations and
counter-accusations about reasons for those patterns (such as marketing),
and a proposal to *rotate PMC chair organization* as one of the remedies.
The discussion eventually veered in a direction where one PMC member
perceived it as a 'threat to remove all PMC members of an organization',
which was rapidly defused by a clarification from the other PMC member. In
the end, one of the 2 candidates tied after the vote withdrew to allow for
an amicable solution, also citing concerns about the nature of some of the
discussions.

Clearly, the lesson the Hadoop PMC has learnt is that, in future, voting
should be done via the ASF Voting Tool.

As the outgoing Chair, my personal recommendation is that splitting
the Hadoop project into separate TLPs (HDFS, YARN, MapReduce) will not
only break up the 'umbrella' Hadoop project to better reflect the fact
that the communities are significantly disparate, but will also, more
importantly, help avoid excessive fascination with the Hadoop
brand. We've discussed this in the past (see October 2012 Board
Report) - some people agree about this, others don't. We'll continue
to talk.

Overall, aside from these skirmishes, the community continues to
function in a healthy manner as evinced by the fact that we continue
to make a significant number of software releases, grow the community
by adding new users/contributors/committers/PMC-members and generally
make great forward progress. Hence, I feel there isn't any reason for
the Board to take any action.

Community:
* 51 committers
* 3932 user@
* 1783 subscribers on general@

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 1751 subscribers on common-dev

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

Community:
* 829 subscribers on hdfs-dev

YARN

YARN is a distributed computation framework for easily writing distributed
applications.

Community:
* 185  subscribers to yarn-dev

MAPREDUCE

MapReduce is an implementation of the map/reduce programming paradigm.

Community:
* 867  subscribers to mapreduce-dev

16 Jan 2013 [Arun Murthy / Roy]

No report was submitted.

AI: Roy to pursue a report for Hadoop

21 Nov 2012 [Arun Murthy / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added one new committer - Jason Lowe
* We've added 3 new PMC members: Siddharth Seth, Robert Evans, Thomas Graves

On the project side, we have made 6 releases:
- hadoop-2.0.1-alpha was released on 26th July, 2012
- hadoop-0.23.3 was released on 17th September, 2012
- hadoop-2.0.2-alpha was released on 9th October, 2012
- hadoop-1.0.4 was released on 11th October, 2012
- hadoop-1.1.0 was released on 14th October, 2012
- hadoop-0.23.4 was released on 15th October, 2012

Developer community is working well together, even though there was a fresh
(but minor) outbreak of vendor wars with some participation by members of
the PMC. No action from the Board is necessary now.

We've added a new Hadoop YARN sub-project.

We had a fairly contentious public discussion on splitting Apache Hadoop
into separate projects since there are at least 3 very distinct developer
communities in Apache Hadoop now: HDFS, YARN & MapReduce. For now the
community has voted to merge separate committer lists, but there seems to be
some emerging, albeit very early/tenuous consensus that after hadoop-2 is
declared 'stable' we should split the project into separate projects (HDFS,
YARN, MapReduce). This will better reflect the reality that they have distinct
communities. No action from the Board is necessary now.

Community:
* 48 committers
* 3817 user@
* 1624 subscribers on general@

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 1681 subscribers on common-dev

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

Community:
* 735 subscribers on hdfs-dev

YARN

YARN is a distributed computation framework for easily writing distributed
applications.

Community:
* 86  subscribers to yarn-dev

MAPREDUCE

MapReduce is an implementation of the map/reduce programming paradigm.

Community:
* 766  subscribers to mapreduce-dev

17 Oct 2012 [Arun Murthy / Doug]

No report was submitted.

AI: Greg to pursue a report for Hadoop

25 Jul 2012 [Arun Murthy / Rich]

Apache Hadoop status report for July 2012

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added two new committers - Daryn Sharp, Jonathan Eagles
* We've added one new PMC member: Alejandro Abdelnur

On the project side, we have made 1 bug-fix release in the stable line and 1
major new release:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

- Work on further Hadoop 2.0.1-alpha (a security bug-fix release) is done,
and is currently under vote.
- Work on hadoop-1.1.0 is nearly done.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

Community:
* 48 committers
* 1613 subscribers on common-dev
*  3151 subscribers on common-user
* 1533 subscribers on general

New committers:
* 2 new committers have been added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

New committers:
* 1 new committer has been added to this project.

Community:
* 43 committers
* 668 subscribers on hdfs-dev
* 1205 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

New committers:
* 1 new committer has been added to this project.

Community:
* 46 committers
* 689  subscribers to mapreduce-dev
* 1354 subscribers to mapreduce-user

(Hadoop)

18 Apr 2012 [Arun Murthy / Roy]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

On the people side, we have had new people join our ranks:
* We've added new committers - Thomas Graves, Robert Evans, Hitesh Shah & Uma M
* We've added new PMC members: Aaron Myers, Matt Foley

On the project side, we have made 3 releases:
- hadoop-1.0.1 was released on 22nd Feb, 2012
- hadoop-0.23.1 was released on 28th Feb, 2012
- hadoop-1.0.2 was released on 4th April, 2012

- Work on further Hadoop 0.23.2 release is nearly done, and is
 scheduled for a release in the next few days.

- Developer community is working well together.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 46 committers
* 1520 subscribers on common-dev
* 2952 subscribers on common-user
* 1503 subscribers on general

New committers:
* 4 new committers (Thomas Graves, Robert Evans, Hitesh Shah & Uma M) have been
 added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

New committers:
* 1 new committer (Uma M) has been added to this project.

Community:
* 42 committers
* 607 subscribers on hdfs-dev
* 1092 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

New committers:
* 3 new committers (Thomas Graves, Robert Evans & Hitesh Shah) have been
 added to this project.

Community:
* 45 committers
* 637  subscribers to mapreduce-dev
* 1250 subscribers to mapreduce-user

24 Jan 2012 [Arun Murthy / Bertrand]

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

On the people side, we have a new person joining our ranks.
  * We've added one new committer - Siddharth Seth.

On the project side, we have made some very exciting progress. We have
had a total of 3 releases:
 - hadoop-0.23.0 released from trunk, first one off trunk in nearly 2 years.
 - hadoop-0.22.0 released, branched in early 2011.
 - hadoop-1.0.0 released off the branch-0.20.2xx baseline (now branch-1)
 - https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces21

 - Work on further Hadoop 0.23.1 release is continuing, and is
   scheduled for release at the end of the month

 - Developer community is working well together. The public dialogue
   among vendors who employ many in the developer community seems to
   have died down since the last board report. No action from the
   board is required at this stage.

 - Some vendors are continuing to use the lists to promote their own
   products. A few PMC members have responded to discourage this
   practice, but not directly as the PMC. No action from the board is
   required at this stage.

COMMON

 Common is the shared libraries for HDFS and MapReduce.

 Releases:
 * 0.23.0 was released on 11th Nov, 2011.
 * 0.22.0 was released on 10th Dec, 2011.
 * 1.0.0 was released on 29th Dec, 2011.

 Community:
 * 42 committers
 * 1433 subscribers on common-dev
 * 2761 subscribers on common-user
 * 1468 subscribers on general

 New committers:
 * 1 new committer has been added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

 Community:
 * 41 committers
 * 567 subscribers on hdfs-dev
 * 985 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 1 new committer has been added to this project.

 Community:
 * 42 committers
 * 587 subscribers to mapreduce-dev
 * 1118 subscribers to mapreduce-user

26 Oct 2011

Change the Apache Hadoop Chair

   WHEREAS, the Board of Directors heretofore appointed Ian
   Holsman to the office of Vice President, Apache Hadoop, and

   WHEREAS, the Board of Directors is in receipt of the resignation
   of Ian Holsman from the office of Vice President, Apache
   Hadoop, and

   WHEREAS, the Project Management Committee of the Apache Hadoop
   project has chosen by vote to recommend Arun Murthy as the
   Successor to the post;

   NOW, THEREFORE, BE IT RESOLVED, that Ian Holsman is
   relieved and discharged from the duties and responsibilities of
   the office of Vice President, Apache Hadoop, and

   BE IT FURTHER RESOLVED, that Arun Murthy be and hereby is
   appointed to the office of Vice President, Apache Hadoop, to
   serve in accordance with and subject to the direction of the
   Board of Directors and the Bylaws of the Foundation until
   death, resignation, retirement, removal or disqualification, or
   until a successor is appointed.

 Resolution 7B was approved by unanimous roll call vote,
 with Doug Cutting abstaining.

26 Oct 2011 [Ian Holsman / Larry]

 Hadoop status report for October 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 On the people side, we have had a couple of new people join our ranks.
  * Giri Kesavan and Jitendra Pandey have accepted a role in the PMC
  * 4 people have accepted committership: Alejandro Abdelnur, Harsh J
    Chouraria, Eric Yang and Ramya Sunil
  * A new PMC Chair (Arun Murthy) is being recommended to the board for
    their approval.

 On the project side, we have made some exciting progress.
 - 0.20.205's vote has closed successfully, and will be released shortly.
   This release integrates two major features (security & append), of which
   the append feature was the topic of much internal debate, so this is an
   excellent outcome for the health of Hadoop, and allows other projects like
   HBase to use an 'official' release.

 - Work on Hadoop 0.23 release is continuing, and is scheduled for release
   at the end of the month

 - Konstantin Shvachko is now leading the 0.22 Release process

 - Mavenization of our codebase is complete

 - Developer community is working well together

 - Vendors are continuing to use the lists to promote their own products.
   We are formulating appropriate responses to discourage this practice.
   No action from the board is required at this stage.


COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * 0.20.204.0 (beta) was released on 5 September.

 Community:
 * 2598 subscribers on common-dev
 * 1341 subscribers on common-user
 * 1392 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 4 new committers have been added to this project.

 Community:
 * 41 committers
 * 499 subscribers on hdfs-dev
 * 864 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 4 new committers have been added to this project.

 Community:
 * 41 committers
 * 528  subscribers to mapreduce-dev
 * 1016 subscribers to mapreduce-user

17 Aug 2011 [Ian Holsman / Greg]

 Hadoop status report for August 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.


 * Hadoop Summit - 1600 people attended
 * HortonWorks launch
 * The 0.20.203.0 release and the divisive vote.
 * 0.20.204.0 has an rc1 being voted on.
 * Hadoop naming debate
 * Lack of progress on contacting the potential trademark infringers
 * 0.22 stalling
 * More weight gathering behind 0.23
 * Growing ecosystem as more incubator projects are in the Hadoop ecosystem
 * Commercial forks of Hadoop (eg. MapR) and how to respond to them on the
   lists and attending developer meetups
 * A number of developers active on the HA Jira (HDFS-1623) asked for an
   in-person high-bandwidth meeting to get clarification on the design
   document posted on the Jira; this wasn't publicized on-list
 * Fixed our site to claim trademark for Hadoop and the other Apache
   projects.
 * Trademarks is proceeding with registering the Hadoop trademark.
 * Yahoo removed the references to the Yahoo Distribution of Hadoop.

  In regards to the releases, we have 3 releases going on: the 0.20.X release
  stream, which has some minor features and mainly bug fixes, and the 0.22 and
  0.23 releases, which represent some major changes. 0.22 & 0.23 differ in
  featureset, and 0.23 is a superset of 0.22.

COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 Community:
 * 1294 subscribers on common-dev
 * 2487 subscribers on common-user
 * 1375 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 1 new committer has been added to this project.

 Community:
 * 38 committers
 * 465 subscribers on hdfs-dev
 * 788 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 3 new committers have been added to this project.

 Community:
 * 40 committers
 * 485 subscribers to mapreduce-dev
 * 938 subscribers to mapreduce-user

Larry asked and Owen answered what the "Hadoop naming debate" is. It is a reference to whether to accept http://wiki.apache.org/hadoop/Defining%20Hadoop which seeks to limit the name "Hadoop" to mean releases from Apache and push all other derived products to be "powered by Hadoop." There was general support except from the companies that use the Hadoop name for derivative products. There was a request to suspend the vote for more discussion, but once the vote stopped the discussion stopped.

20 Jul 2011 [Ian Holsman / Sam]

Report missing; will report next month.

20 Apr 2011 [Ian Holsman / Noirin]

 Hadoop status report for April 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 On the people side, we have had a couple of new people join our ranks.
  * Todd Lipcon has accepted a role in the PMC
  * Koji Noguchi has accepted a role as a committer in both HDFS & MR.
  * Matthew Foley has accepted a role as a committer in HDFS.
  * We have another invitation outstanding, and hope he will take
    up the committer role shortly.

 On the branding side, we and the trademark group have been actively
 engaging companies to make proper use and attribution of our Apache Hadoop
 Trademark. These discussions are ongoing, and generally positive.

 On the product release side, Nigel is continuing to progress with the 0.22
 release. We have 18 outstanding blockers.  HADOOP-7106, which re-organizes
 some SVN structure, should be committed by the end of next week.
 MAPREDUCE-2178 is the biggest outstanding blocker that many others depend
 on.  There is still no clear plan for getting it fixed.

 Arun has taken over the 0.20.200 release (formerly known as 0.20.3).
 He pushed a giant patch to the branch-0.20-security branch. Then, based on
 the feedback from the community, Owen took over and committed individual
 patches for the same codebase to the branch. Currently we have a couple of
 unit tests failing; after fixing them we should be good to make an
 official release after getting necessary approvals from the PMC.

 Discussions around rationalizing the codebase have started, with mrunit
 being moved to the incubator, and further discussions about either
 maintaining the contrib modules or moving them to apache-extras/incubator.

 The biggest news is saved for last. Yahoo! has announced that they will
 stop maintaining their own internal codebase, and switch to actively
 developing on the apache one. This is a great step forward, and they have
 also started having more discussions about architecture (MR-279) on the
 list. We look forward to more in-depth discussions happening in the
 public forums.


COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 Community:
 * 1194 subscribers on common-dev
 * 2293 subscribers on common-user
 * 1328 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 2 new committers have been added to this project.

 Community:
 * 35 committers
 * 375 subscribers on hdfs-dev
 * 631 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 2 new committers have been added to this project.

 Community:
 * 37 committers
 * 400 subscribers to mapreduce-dev
 * 764 subscribers to mapreduce-user

19 Jan 2011 [Ian Holsman / Sam]

 Hadoop status report for January 2011

     Hadoop is a set of related tools and frameworks for creating and
     managing distributed applications running on clusters of commodity
     computers.

     Nigel has volunteered to RM the 0.22 release, and it is making progress.
     The previous RM stepped down due to not having enough time, since the 6685
     patch was not going to make this release. Work on 6685 has not really
     progressed.

     Owen has volunteered to RM the 0.20.3 release, and there are discussions
     about integrating the 'security' patch-set that Yahoo! is developing, which
     Arun has volunteered to RM. Both of these are separate branches.

     We have invited 11 new committers into the project this month; all have
     accepted and are in the process of getting their accounts set up. We also
     had 2-3 people who the PMC felt were not ready for committership yet. There
     is still a lot of discussion about the criteria for what makes a committer,
     but I think we are in a better place than before.

     We are working with the brand management team about Yahoo!'s and
     Cloudera's use of Hadoop's name. Both of these are showing good progress
     thanks to the brand management team's hard work.

     We are still having lots of discussions about future work on the 0.20
     branch; this includes the security patch-set, adding append, and the
     0.20.3 release. The security patch-set has its own issues, because it
     requires some work if it is to be contributed as separate patches, and
     because of how the work will be applied to the upcoming 0.22 release
     (see http://s.apache.org/NfJ & http://s.apache.org/uf for the discussions
     around the append & security branches). There have been a couple of
     misunderstandings around the security releases.

     We have also started discussions about why we have so many mailing lists,
     what they are used for, and the possibility of combining some of them (and
     2 code bases). We have updated the website to provide better documentation.
     The codebase discussion is more about moving directories around, rather than
     combining them into a single one.

 COMMON

     Common is the shared libraries for HDFS and MapReduce.


     Releases:
     * None this period.

     Community:
     * 1123 subscribers on common-dev
     * 2140 subscribers on common-user
     * 1335 subscribers on general

 HDFS

     HDFS is a distributed file system that supports reliable replicated
     storage across the cluster using a single name space.

     New committers:
     * 5 new committers have been added to this project.

     Community:
     * 33 committers
     * 323 subscribers on hdfs-dev
     * 525 subscribers on hdfs-user

 MAPREDUCE

     MapReduce is a distributed computation framework for easily writing
     applications that process large volumes of data.

     Releases:
     * None this period.

     New committers:
     * 8 new committers have been added to this project.

     Community:
     * 35 committers
     * 342 subscribers to mapreduce-dev
     * 647 subscribers to mapreduce-user

The report indicates that changes have been made that satisfy the board. The project is back on a quarterly reporting schedule.

15 Dec 2010 [Ian Holsman / Noirin]

 Hadoop status report for December 2010

     Hadoop is a set of related tools and frameworks for creating and
     managing distributed applications running on clusters of commodity
     computers.

     One contentious issue was raised (HADOOP-6685), on which discussion
     has continued about which technical direction is better
     moving forward. There is currently a veto on the patch. This patch is
     not critical to the health of the project.

     6 new PMC members have been added, and votes for several new committers
     have started.
     We would like to welcome the following people to the Hadoop PMC:
       * Eli Collins
       * Jakob Homan
       * Amareshwari Ramadasu
       * Suresh Srinivas
       * Sharad Agarwal
       * Vinod Kumar Vavilapalli

     We have invited a new committer, but so far he has not responded.

     We are working with the brand management team about Yahoo!'s and
     Cloudera's use of Hadoop's name.

     The 0.22 release scheduled for November is still in progress.


 COMMON

     Common is the shared libraries for HDFS and MapReduce.


     Releases:
     * None this period.

     New Committers:
     * None this period.

     Community:
     * 1089 subscribers on common-dev
     * 2106 subscribers on common-user
     * 1294 subscribers on general

 HDFS

     HDFS is a distributed file system that supports reliable replicated
     storage across the cluster using a single name space.

     Releases:
     * None this period.

     New committers:
     * None this period.

     Community:
     * 28 committers
     * 299 subscribers on hdfs-dev
     * 498 subscribers on hdfs-user

 MAPREDUCE

     MapReduce is a distributed computation framework for easily writing
     applications that process large volumes of data.

     Releases:
     * None this period.

     New committers:
     * None this period

     Community:
     * 27 committers
     * 317 subscribers to mapreduce-dev
     * 612 subscribers to mapreduce-user

 ZOOKEEPER

     The ZooKeeper project is now a separate project, and will be
     removed from further notices going forward

17 Nov 2010 [Ian Holsman / Bertrand]

 Hadoop status report for October 2010 to November 2010

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 Discussions have started on the issues that the board identified; we
 seem to have a general agreement on some issues, but we need an official
 consensus on the proposals, and have them discussed openly in the public
 mailing lists.

 Specifically:

 * Everyone is in general agreement that we need to release more often.
   The question revolves around how we test them to ensure they keep to the
   quality that Hadoop releases are known for.

 * The discussion of having 'mentors' to help guide new committers was
   started.

 * The Cloudera branding issue was forwarded to the trademarks group, where
   Shane & Karen are deciding how best to pursue the issue of their
   certification courses and branding on their website.

 * Bylaws have been discussed on general@

 * Owen will be the release manager for the 0.22 release, scheduled for later
   this month.

 * The ZooKeeper project has voted to become a separate TLP. This has been
   raised for the board's consideration.

 * People have started using reviews.apache.org to discuss patches


 COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 New Committers:
 * None this period.

 Community:
 * 1073 subscribers on common-dev
 * 2068 subscribers on common-user

 HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 Releases:
 * None this period.

 New committers:
 * None this period.

 Community:
 * 26 committers
 * 286 subscribers on hdfs-dev
 * 463 subscribers on hdfs-user

 MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 *  Scott Chen was voted in as a committer in August 2010.


 Community:
 * 26 committers
 * 303 subscribers to mapreduce-dev
 * 568 subscribers to mapreduce-user

 ZOOKEEPER

 ZooKeeper is a reliable coordination service for distributed
 applications.

 Releases:
 * None this period.

 Two releases are in progress, near term a 3.3.2 fix release (1 blocker
 pending), and longer term 3.4.0 feature release.


 New committers:
 none

 Community:

 * 6 active committers, 2 PMC members
 * 176 subscribers on zookeeper-dev
 * 356 subscribers on zookeeper-user

 The ZooKeeper project has petitioned the board to become a TLP.

20 Oct 2010

Change the Apache Hadoop Project Chair

   WHEREAS, the Board of Directors heretofore appointed Owen
   O'Malley to the office of Vice President, Apache Hadoop, and

   WHEREAS, with the desire of the Board of Directors to rotate
   the position of Vice President, Apache Hadoop, the Project
   Management Committee of the Apache Hadoop Project has chosen to
   recommend Ian Holsman as the successor to the post;

   NOW, THEREFORE, BE IT RESOLVED, that Owen O'Malley is relieved
   and discharged from the duties and responsibilities of the
   office of Vice President, Apache Hadoop, and

   BE IT FURTHER RESOLVED, that Ian Holsman be and hereby is appointed
   to the office of Vice President, Apache Hadoop, to serve in
   accordance with and subject to the direction of the Board of
   Directors and the Bylaws of the Foundation until death,
   resignation, retirement, removal or disqualification, or until
   a successor is appointed.

 Approved by unanimous roll call vote with Doug abstaining.

20 Oct 2010 [Owen O'Malley / Geir]

Hadoop status report for July 2010 to October 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The 2nd annual Hadoop World was held on 12 October in NYC. It had 900
attendees. The program is available here:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/agenda/.

The divestiture of sub-projects has continued. We have promoted Hive
and Pig to be top-level Apache projects and Chukwa to the
Incubator. This has had the positive effect that the majority of the
current PMC is involved in the core projects (Common, HDFS and
MapReduce).

The Hadoop PMC removed one member, who has completely dropped out of contact:
* Jim Kellerman
and as part of moving subprojects out, the following PMC members resigned:
* Alan Gates
* Ashish Thusoo
* Daniel Dai
* Namit Jain
* Olga Natkovich
* Pradeep Kamath

The tension between Cloudera and Yahoo has dramatically increased this
quarter and is past the breaking point. This was exacerbated by the
board's sudden insistence that the Hadoop project pick a new PMC chair
without discussing the issues with anyone other than the Cloudera
employee sitting on the board. Over the last 2.5 years, I've done my
best to do what was right for the Hadoop project and it is too bad the
community has degenerated to the current state. I sincerely want to
get the problems resolved so that we can get back to developing
software and enjoying a community that can work together.

Critical issues for the Hadoop PMC to address:
  * Change is difficult and this will involve change.
  * We need to enact bylaws so that there is a clear understanding of
    the rules.
  * The PMC needs to define and document the goals and processes that the
    project will follow going forward.
     * Expectations about committers reviewing each other's patches
     * Expectations about becoming a committer and PMC member.
     * Policies about expecting PMC members and committers to stay
       involved. People without skin in the game who vote without
       working on the project are just signing up other people for
       work.
  * Poisonous people within the project need to be managed.
  * Cloudera's abuse of the Hadoop trademark in their product names needs to be
     halted.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

Committers:
 * We redefined the Common committers to be the union of all HDFS and MapReduce
   committers.

Community:
 * 1062 subscribers on common-dev
 * 2067 subscribers on common-user

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

New committers:
 None

Community:
* 26 committers
* 280 subscribers on hdfs-dev
* 454 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

New committers:
 * Scott Chen

Community:
* 27 committers
* 300 subscribers to mapreduce-dev
* 553 subscribers to mapreduce-user

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* No releases this quarter

Two releases are in progress, near term a 3.3.2 fix release (1 blocker
pending), and longer term 3.4.0 feature release.


New committers:
none

Community:

* 5 active committers, 2 PMC members

* 176 subscribers on zookeeper-dev (up from 160 3 months ago)
* 347 subscribers on zookeeper-user (up from 307 in the same timeframe)

Three GSOC students completed their projects successfully. This
resulted in significant new functionality being added to the project,
and some renewed interest from a contributor standpoint. Two of the
three students have indicated that they are interested in continuing to
work in the community.

The discussion to move ZooKeeper to TLP status has been reopened and
is in progress at the time of this writing.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Good report; looks like progress is being made here.

21 Jul 2010 [Owen O'Malley / Henri]

Hadoop status report for April 2010 to July 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The 3rd annual Hadoop Summit was held on 29 June in Santa Clara. It
sold out at 1,000 attendees 10 days before the conference. The program
is available here:
http://developer.yahoo.com/events/hadoopsummit2010/agenda.html. The
slides and videos of the presentations are available online.

The second Hadoop World was announced in NYC on 12 October. The call
for presentations is open until 2 August.

There are a large number of local Hadoop User Groups around the
world. The Bay Area HUG meets monthly and has an audience of roughly
300 people.

To increase communication and reduce tensions, the SF Bay Area core
contributors (Common, HDFS, and MapReduce) have been having monthly
meetings that rotate between venues (Cloudera, Facebook, and
Yahoo!). We've discussed wide-ranging topics from process issues to
new technical ideas. All of the notes and slides are distributed on
the lists to engage developers who can't attend.

The Hadoop PMC added the following members:
* Sanjay Radia (Yahoo)
* Hemanth Yamijala (indep)

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

As part of the continuing Hadoop divestiture of sub-projects, Chukwa's
developers were encouraged to move to Apache Incubator. Although
Chukwa has already completed many of the Incubator graduation
requirements (diversity of committers, code clearance, releases), they
have not voted in new contributors or PMC members. Also, none of the
Chukwa committers have been on any Apache PMCs and need more guidance
than jumping into a TLP would have provided. Some of the work has been
done (accepted by Incubator, moved subversion, added to Incubator
wiki), but more is left to do (web site, mailing lists). They are
scheduled to report next month as a Podling.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.21.0 release candidates for Common, HDFS, and MapReduce have been
   rolled, but there are still some blockers. The hope is to get the
   blockers fixed and a release out next month.

New committers:
 * Amareshwari Sriramadasu (Yahoo)

Community:
 * 1013 subscribers on common-dev
 * 1965 subscribers on common-user

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* The 0.21 release is still solidifying.
* A new branch called branch-0.20-append was created to support the append
  feature to HDFS files. HBase needs this feature to run without data loss in a
  production environment.

New committers:
* Eli Collins (Cloudera)

Community:
* 26 committers
* 198 code contributors
* 247 subscribers on hdfs-dev
* 390 subscribers on hdfs-user

* Design proposal to support distributed HDFS NameNode.

HIVE

Hive is a data warehouse written on top of Hadoop.  It provides SQL to
query and manage data stored in Hadoop in tables and partitions and
provides a metastore to hold metadata about the data stored in
Hadoop.

Releases:
0.6.0 has been branched and we are preparing to release it.

New committers:
* John Sichi (Facebook)

Community:
* 164 contributors (commented, filed bugs or contributed to Hive). This was
  115 at the last report time.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
* The 0.21 release is still solidifying.

New committers:
* Amareshwari Sriramadasu (Yahoo)

Community:
* 268 subscribers to mapreduce-dev
* 465 subscribers to mapreduce-user

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * Pig 0.7.0 released on 5/13/2010

Community:
 * 12 committers and 5 emeriti (4 retired in the last month)
 * 191 developers (compared to 181 in the last report)
 * 452 users (compared to 402 in the last report)


ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* release 3.3.1 on 17/May/10

New committers:
none

Community:

* 5 active committers, 2 PMC members

* 160 subscribers on zookeeper-dev (up from 147 3 months ago)
* 307 subscribers on zookeeper-user (compared to 269 in the same timeframe)

Three student proposals to work on ZooKeeper projects were accepted for
GSOC.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Noirin reminded the PMC to let ConCom know when there are events going on in their community, even if the PMC is not the one organizing them.

21 Apr 2010 [Owen O'Malley / Geir]

Hadoop status report for January 2010 to April 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

In response to the board's request that we evaluate the Hadoop
sub-projects with respect to ensuring adequate supervision, here is
the breakdown by sub-project:
* Avro and HBase have decided to each become a TLP.
* Pig and ZooKeeper have discussed the issue and would prefer to
remain a sub-project for now. The three primary concerns are the work
of splitting themselves out, a lack of organizational diversity in the
committers, and loss of visibility if the Hadoop TLP site doesn't link
to them. The last concern can be addressed by ensuring that the TLP
*does* continue to link to their project pages. These projects are
adequately monitored and have good representation on the PMC, but
the PMC is still discussing what their recommendation to the board is.
* Hive hasn't discussed the issue, which needs to be addressed. I
expect that it is in the same group as Pig and ZooKeeper.
* Chukwa still struggles to broaden its community from the original
developers and to reach consensus on its goals. It has three committers,
but no representation on the PMC, which makes it difficult to make
releases and ensure adequate supervision. The PMC has not yet discussed
what to do with Chukwa.
* Common, HDFS, and MapReduce are still very tightly bound. Many
patches cross 2 or 3 of the 3 sub-projects and each of the trunks only
builds against the other projects' trunks. They are branched and
released in unison. They will likely remain together for a long time.

We started the process of discussing the bylaws that Hadoop should
adopt, but we need to drive this through to completion. I would
suggest that in the future, projects which are becoming TLP establish
bylaws as part of being created. Without explicit bylaws, there are
many votes for which it isn't clear what the required level of
consensus is.

The Hadoop PMC added the following members:
* Namit Jain

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

A resolution to promote Avro to a top-level project is currently before
the board.  If the board passes this resolution, this will be Avro's
last report as a Hadoop subproject.

Avro made three releases this quarter, 1.3.0, 1.3.1 and 1.3.2.  We
expect to make a 1.4.0 release in the next quarter.

Development has been active in all versions of Avro: C, C++, Java,
Python, and Ruby.

Three new, legally-independent committers were added this quarter:
* Jeff Hodges
* Scott Carey
* Bruce Mitchener

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Release:
* Testing Chukwa 0.4.0 RC1 to RC3

Current state of community
* 4 active contributors
* 15 subscribers on chukwa-dev
* 17 subscribers on chukwa-user

The upcoming 0.4 release will include a new real-time Hadoop Activity
monitor for small to mid-scale Chukwa deployments and a JMSAdaptor for
pulling data from JMX.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.20.2 (including HDFS and MapReduce) was released on 2/16/2010
with 29 patches
 * We plan to rebase the 0.21 branch to trunk this month

New committers:
 * none

Community:
 * 963 subscribers on common-dev
 * 1924 subscribers on common-user

Development has continued to be active, including the addition of
Kerberos-based and token-based authentication to the RPC.

The previous 0.21 branch failed to be released and we expect to rebase
the branch to the current trunk in the next few weeks. The challenge
is learning how to adapt our project and processes to the growing
importance of Hadoop. We are moving toward a release manager-based
approach similar to the HTTPD one, in the hopes that will lead to
stable releases without stagnating on the 0.20 branch forever. We are
also requiring more thought out, documented, and tested changes.
Changes that are backwards incompatible or potentially destabilizing
must go through a lot of scrutiny. This is all part of the process of
moving from a research prototype to a critical piece of infrastructure
in our respective organizations.

HBASE

HBase is a distributed column-oriented database built on top of Hadoop
Common and HDFS.

A resolution to promote HBase to a top-level project is currently
before the board.  If the board passes this resolution, this will be
HBase's last report as a Hadoop subproject.

Releases:
* 0.20.3 on 2010/01/25 -- 74 fixes.
* There is currently a release candidate out for 0.20.4

New Committers:
* None

Community

* HBase User Group 9 met at Mozilla, 03/10/2010
* HBase User Group 10 and Hackathon happening 04/19/2010

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* We plan to rebase the 0.21 branch to trunk this month

New committers:
* None

Community:
* 11 new code contributors
* 25 committers
* 194 code contributors
* 205 subscribers to hdfs-dev
* 317 subscribers to hdfs-user

Work is in progress to incorporate security features into HDFS.

HIVE

Hive is a data warehouse written on top of Hadoop.  It provides SQL to query
and manage data stored in Hadoop in tables and partitions and provides a
metastore to hold metadata about the data stored in Hadoop.

Releases:
release 0.5.0 on 2010/02/23. This release has 106 bug fixes, 39 new features
and 26 improvements.

New committers:
* John Sichi

Community:

* Hive User Group meetup at Facebook, 03/18/2010 attended by over 70 people.
* A total of 138 people have commented, filed bugs or contributed on the
  Hive JIRA so far. This number was at 115 at the time of the last report.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
* We plan to rebase the 0.21 branch to trunk this month

New committers:
* None

Community:
* 178 subscribers to mapreduce-dev
* 280 subscribers to mapreduce-user

Features:

Security features are being implemented that include both the
Kerberos-based and token-based authentication and authorization so
that users can define who is allowed to do what on their job.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * Pig 0.6.0 released on 3/1/2010

New committers:
 * Dmitriy Ryaboy
 * Thejas Nair

Community:
 * 15 committers
 * 181 developers (compared to 171 in the last report)
 * 402 users (compared to 225 in the last report)

We've put out 4 GSOC ideas and received 2 student proposals.

The Pig community reviewed the board's request to promote some of the
subprojects to TLP. The community consensus is to stay a Hadoop
subproject for the time being. Detailed discussion can be found at
http://www.mail-archive.com/pig-dev@hadoop.apache.org/msg08589.html.


ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* release 3.3.0 on 25/March/10

New committers:
none

Community:

* 5 active committers, 2 PMC members

* 147 subscribers on zookeeper-dev (up from 114 3 months ago)
* 269 subscribers on zookeeper-user (up from 225 in the same timeframe)

We've put out a number of GSOC ideas and seen 6 student proposals.
Mentors are reviewing and we hope to gain a number of projects for
GSOC 2010.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Work is underway to certify and support running the ZooKeeper service
in production under Windows servers.

The ZooKeeper community reviewed the board's request to examine
subprojects with an eye to graduation to TLP status. Please find the
results of the ZooKeeper as TLP discussion here: http://bit.ly/c4fuZT
There was consensus amongst the development team that we will stay as
a subproject of Hadoop for the time being. Full details of the
discussion can be found in the thread provided.

Wide concern that there is a disconnect between how Hadoop is run and the expectation from the board on how Apache projects are run; Jim to join the mailing list.

20 Jan 2010 [Owen O'Malley / Justin]

Hadoop status report for October 2009 to January 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

Hadoop World China was held on 2009/11/15 and was well attended. It had
representation from Cloudera, Facebook, Google, and Yahoo. There was
also a smaller Hadoop Conference in Japan on 2009/11/13.

The Hadoop PMC added the following members:
  * Daniel Dai
  * Pradeep Kamath
  * Zheng Shao
  * Tsz Wo (Nicholas) Sze

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Development has been brisk this quarter.  We're anticipating a 1.3
release in late January.

New committers:
  * Philip Zeyliger
  * Jeff Hammerbacher

Community:
  * 6 active committers
  * 94 subscribers on avro-dev
  * 114 subscribers on avro-user

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Release:
  * release 0.3.0 on 2009/11/09 with 40 issues
  * branch 0.4 planned for 2010/02

Current state of community
  * 4 active contributors
  * 15 subscribers on chukwa-dev
  * 17 subscribers on chukwa-user

The upcoming 0.4 release will include a new real-time Hadoop Activity
monitor for small to mid-scale Chukwa deployments.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
  * branch 0.21 was made on 2009/09/18

New committers:
  * Boris Shkolnik
  * Jakob Homan

Community:
  * 904 subscribers on common-dev
  * 1838 subscribers on common-user

The development has continued to be active, but work on the blockers
on the upcoming 0.21 release has been moving very slowly. Even after
splitting HDFS and MapReduce out of Common a large number of patches
cross the sub-project boundaries.

HBASE

HBase is a distributed column-oriented database built on top of
Hadoop Common and HDFS.

Releases:
  * 0.20.2 on 2009/11/19 -- 40 fixes.
  * There is currently a release candidate out for 0.20.3

New committers:
  * Lars George.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* No new releases this quarter. A considerable effort is being made to
  make the earlier release 0.21 stable.

New committers:
  * Boris Shkolnik
  * Jakob Homan

Community:
  * 25 committers
  * 183 code contributors
  * 245 subscribers on hdfs-user
  * 163 subscribers on hdfs-dev

Features:
  The HDFS Append feature is now part of the latest HDFS 0.21 release. A
  design for implementing security in HDFS has been published in the
  Jira forum and is gathering feedback from developers.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  release 0.4.1 on 2009/12/17 with 7 issues

New committers:
  none this quarter

For the upcoming 0.5 release, there are 153 resolved issues and 3 open ones.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
  * branch 0.21 was made on 2009/09/18

New committers:
  none this quarter

Community:
  * 178 subscribers to mapreduce-dev
  * 280 subscribers to mapreduce-user

Features:
  MapReduce 0.21 continues to stabilize relatively slowly. Security
  and changes to support Avro types through the shuffle continue to go in.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
  * release 0.5.0 on 29/Oct/09 with 48 issues
  * release 0.6.0 branched and no blockers; to be released shortly

New Committers:
  * Ashutosh Chauhan
  * Dmitriy Ryaboy
  * Richard Ding
  * Jeff Zhang

Community:
 * 354 subscribers to pig-user
 * 171 subscribers to pig-dev

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

On 12/4/09 we gained a new committer, Henry Robinson of Cloudera!

Releases:
  * release 3.2.1 on 9/Sept/09
  * release 3.1.2 on 14/Dec/09
  * release 3.2.2 on 14/Dec/09

Community:
  * 114 subscribers on zookeeper-dev (up from 99 3 months ago)
  * 225 subscribers on zookeeper-user (up from 175 in the same timeframe)

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Justin suggested that the board ask that the Hadoop project answer the same questions regarding spinning off subprojects that were asked of Lucene in the previous month. Doug indicated that this was in progress.

21 Oct 2009 [Owen O'Malley / Jim]

Hadoop status report for July to October 2009

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

Hadoop World NYC on 2009/10/02 was well received by roughly 500
attendees. It was organized by Cloudera and sponsored by Yahoo,
Facebook, Amazon WebServices, IBM, Rackspace, Softlayer, eHarmony,
SuperMicro, Intel, Impetus, Booz Allen Hamilton, and Vertica. The
format was similar to Hadoop Summit with a general session with six 20
minute talks in the morning and three tracks each with ten 30 minute
talks in the afternoon. Hadoop World China will be held next month.

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Releases:
  * release 1.2.0 on 2009-10-14
  * release 1.1.0 on 2009-09-08
  * release 1.0.0 on 2009-07-09

New committers:
  Matt Massie
  Thiruvalluvan M. G.

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

New committers:
  Jerome Boulon

The SALSA and Mochi suite of Hadoop log analysis and visualization
tools, built at Carnegie Mellon, have been progressively phased in and
integrated with the Chukwa log collection and processing
infrastructure. The basic analysis and visualization components are
available, and further work is being done to improve the
user-friendliness of operating these added tools, and to improve the
automated manageability for analysis and visualization. This can also
serve as a roadmap for other analysis tools to be integrated with
Chukwa.

Development has been proceeding steadily. Chukwa is substantially more
reliable, flexible and robust than it was a year ago, or even four
months ago.  The system is in production use at UC Berkeley, and a
number of user suggestions have been incorporated. We intend to
release 0.3 in the coming weeks.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
  * release 0.19.2 on 2009/06/30 with 40 issues
  * release 0.20.1 on 2009/09/01 with 87 issues
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA
  Suresh Srinivas

Community:
  * 784 subscribers on common-dev
  * 1738 subscribers on common-user

The upcoming 0.21 release will include the new FileContext API, which
will replace the FileSystem API, and the visibility and audience
annotations that let us mark the intended public-ness of various
classes.

HBASE

HBase is a distributed column-oriented database built on top of Hadoop
Common and HDFS.

HBase had a User Group meeting on August 7th and a Hackathon over the
weekend of August 7-9.  Both events were open to the public and hosted
by StumbleUpon.

Releases:
 * release 0.20.0 on 09/September/2009 - 465 issues addressed by this release
 * release 0.20.1 on 10/12/2009 - 60 issues addressed by this release

Current state of community
  * 23 active contributors
  * 459 subscribers to hbase-user mailing list
  * 185 subscribers to hbase-dev mailing list

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
  * release 0.19.2 on 2009/06/30 as part of common 0.19.2
  * release 0.20.1 on 2009/09/01 as part of common 0.20.1
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA
  Suresh Srinivas

Community:
  * 112 subscribers to hdfs-dev
  * 154 subscribers to hdfs-user

HDFS 0.21 was feature frozen and branched. The biggest feature is
the much-requested ability to append and sync to written files.  There
are 8 remaining blocker issues that need to be resolved.

A developer meet focused entirely on HDFS testing was held at the
Yahoo Sunnyvale campus. It was well attended, with about 15
contributors from Yahoo, Cloudera, Facebook, etc.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  release 0.4.0 on 2009/10/14 with 209 issues

New committers:
  * Edward Capriolo
  * He Yongqiang

Hive 0.4.0 had 46 new features, 115 bug fixes, 6 optimizations, 35
improvements and 2 incompatible changes.

At present there are 617 open issues with none of them as a blocker
for 0.5.0. A total of 619 issues have been resolved so far.

Community:

We continue to see new contributors in the project. Since
the last report the number of contributors in the project has grown
from 21 to 48. Of these, 35 contributors are external to
Facebook. A total of 94 people have commented, filed bugs or
contributed on the Hive JIRA so far. This number was at 49 at the time
of the last report.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
  * release 0.19.2 on 2009/06/30 as part of common 0.19.2
  * release 0.20.1 on 2009/09/01 as part of common 0.20.1
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA

Community:
  * 121 subscribers to mapreduce-dev
  * 172 subscribers to mapreduce-user

MapReduce 0.21 will have substantially improved Capacity and FairShare
schedulers that let administrators share clusters more
effectively.  It will also add the ability to run tasks as the
submitting user and a standardized job history format written in
Avro's JSON format.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * release 0.4.0 on 29/Sep/09 with 48 issues

Community:
 * 155 subscribers on pig-dev
 * 269 subscribers on pig-user (I could not update this number because
   my request failed with mailbox full error)

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
  * release 3.2.1 on 2009/09/09

Community:
  * 99 subscribers on zookeeper-dev
  * 175 subscribers on zookeeper-user

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Hadoop is a trademark of the ASF, and Hadoop World is a conference and therefore needs to be approved by ConCom. Originally, there was some confusion, but ConCom approves of this usage, so there is no issue.

ConCom will work to clarify policies, such as whether such conferences (in the future, not retroactively) need to be named Apache Hadoop World or the like.

15 Jul 2009 [Owen O'Malley / Jim]

Hadoop status report for April to July 2009.

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The Hadoop Summit '09 was held in Santa Clara on June 10 and was
attended by more than 750 people. Registration for the event was
$100. The morning was a general track and the afternoon had 3 tracks:
developers, administration, and applications. Cloudera and Yahoo also
offered two free Hadoop training sessions (basics and advanced) the
following day that were filled very quickly.

Two books were published about Hadoop:
  * Hadoop: The Definitive Guide by Tom White
    http://www.hadoopbook.com/
  * Pro Hadoop by Jason Venner
    http://developers.apress.com/book/view/9781430219422

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Releases:
  * coming soon release 1.0.0 with 52 jiras addressed from 12
    contributors

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Releases:
  * release 0.1.2 on 14/May/2009 with 132 issues
  * currently voting on release 0.2.0 with 56 issues

COMMON (was previously Core)

Common is the shared libraries for HDFS and map/reduce.

This quarter we split the Core subproject into Common, HDFS, and
Map/Reduce. The old branches and releases are in Common, but for 0.21
the three subprojects will release independently.

Releases:
  * release 0.20.0 on 22/Apr/09 with 114 issues
  * currently voting on 0.19.2 with 42 issues

Community:
  * 784 subscribers on common-dev
  * 1703 subscribers on common-user

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Common and HDFS.

Releases:
  * release 0.19.2 on 09/May/09 - 17 issues addressed by this release
  * release 0.19.3 on 27/May/09 - 15 issues addressed by this release
  * release 0.20.0 (alpha) on 17/Jun/09
  * coming soon release 0.20.9 with 338 out of 354 issues addressed

New Committers:
  * Andrew Purtell (previously missed from the board report)
  * Nitay Joffe
  * Ryan Rawson
  * Jonathan Gray

Current state of community
  * 23 active contributors (159 contributors since project inception)
  * 459 subscribers to hbase-user mailing list
  * 185 subscribers to hbase-dev mailing list

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

A developer meet for Hadoop was held at the Yahoo Sunnyvale campus to
discuss requirements for HDFS Appends. It was well represented by
about 15 contributors from Yahoo, Microsoft, Facebook, etc. Another
developer meet was held at the Cloudera campus in Burlingame. This
meet discussed, among others, a few short-term HDFS issues that need
attention.

Community:
  * 50 subscribers on hdfs-dev
  * 51 subscribers on hdfs-user

HIVE

Hive is a data warehouse written on top of Hadoop Core. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  * release 0.3.0 on 29/Apr/09 with 52 issues
  * coming soon release 0.4.0 in the next month with 130 issues
 At present there are 248 open issues filed against Hive.

Committers:
  * Yongqiang He

Community:
  * 30 contributors (up from 21 in the last report)
  * 67 people have commented on Hive Jiras

MAP/REDUCE

Map/reduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Community:
  * 51 subscribers to mapreduce-dev
  * 56 subscribers to mapreduce-user

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
  * release 0.3.0 on 25/Jun/09 with 33 issues

Community:
  * 144 subscribers on pig-dev
  * 269 subscribers on pig-user

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
  * release 3.2.0 on 8/Jul/09.
    A number of major new features are included, in particular:
    extending the client libraries to include common ZK use cases
    (recipes), namespace support, added python binding support, REST
    based API to the server, Perl binding support, numerous
    optimizations and bug fixes (122 JIRAs in this release).

Community:
  * 83 subscribers on zookeeper-dev
  * 141 subscribers on zookeeper-user

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Regarding Hadoop Common: Greg wondered if Common could move over to commons.apache.org

Doug suggested that that was premature, and that much of the code may not be useful to non-Hadoop applications.

Brett agreed that that would not make sense.

Regarding developer meet up: Justin: only committers were invited, if somebody else had shown up, would they have been allowed?

Doug: invitations were sent directly to committers.

Jim: would others have been allowed?

Doug: others did hear about it and did attend. I would appreciate clear guidelines.

Roy: committers only is normal for a dev meeting

Jim: I remember an issue with a "closed" meeting with Geronimo, and will dig up those minutes. You want to avoid any impression that it is by invitation only.

Roy: there is no problem with contributors only, the problem is if you only invite a subset of the contributors

Justin: the issue is that contributors may be a superset of committers

Doug: I would be happy with a rule that it should be discussed on the dev list, and be invitation only, with all committers being included

Jim: that makes sense

Roy: suggests updating /dev with this information

Jim: volunteers?

Roy: will do

15 Apr 2009 [Owen O'Malley / Bertrand]

Hadoop is a set of tools for creating and managing distributed
applications, especially those with large data sets.

Hadoop was the focus of a nice article in the New York Times
(http://tinyurl.com/coafzr) on 17 March 2009. Unfortunately, the
article failed to mention that Hadoop is an Apache project.

The PMC added 8 new members: Raghu Angadi, Devaraj Das, Chris Douglas,
Alan Gates, Mahadev Konar, Hairong Kuang, Konstantin Shvachko, and
Ashish Thusoo.

We've also voted to create two new subprojects: Chukwa and Avro.
Chukwa is a distributed log aggregation and cluster monitoring system
that was originally in Core's contrib directory. The initial
committers for Chukwa are Ariel Rabkin and Eric Yang. Avro is a
serialization and RPC library with a focus on supporting versioned
persistent data and supporting scripting languages. The initial
committers for Avro are Doug Cutting and Sharad Agarwal.

Hadoop was well represented at ApacheCon EU, with a track of talks
about Core, HBase, and Pig.

A Hadoop Summit is being organized for June 10th in Santa Clara.

CORE, HDFS, and MAP/REDUCE

Core is the fundamental set of utilities, including RPC,
serialization, and compression that the rest of Hadoop depends
on. HDFS provides a distributed file system. Map/Reduce provides a
framework for distributed applications that process large data sets.

Amazon has started explicitly marketing and supporting Hadoop as a
service on EC2 at a much lower cost than a standard EC2 virtual
machine.

We are still in the process of factoring Map/Reduce and HDFS out of
Core. The code is separated and all that is left to be split are the
unit test cases and their dependencies.

Releases:
0.20.0 is nearing release, with 280 jiras addressed.
0.18.3 was released on 27 Jan 2009 with 51 jiras addressed.

The current plan is to try and release Core, HDFS, and Map/Reduce 1.0
this year.

Community:

Core has added Sharad Agarwal, Giri Kesavan, Ariel Rabkin, Sanjay
Radia, and Eric Yang as committers. The community is active and
growing.

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Core.

Releases:
0.18.1 was released on 27 October 2008. 14 issues were addressed.
0.19.0 was released on 21 January 2009. 184 issues were addressed.
0.19.1 was released on 19 March 2009. 43 issues were addressed.

Work is underway on release 0.20.0 with 97 of 174 issues resolved. It
is expected that many of the open issues will be pushed to a
subsequent release.

Meet-ups:

January 14, 2009; March 3, 2009 - HBase User Group meetings in San Francisco
January 30, 2009 - HBase Hackathon in Los Angeles

Community:

There are no new committers since the last report. There are about 7
active contributors (of which 3 are committers).

There are also a number of people who come by to "kick the tires" but
then leave because of possible data loss due to a lack of a patch for
HADOOP-4379.

HIVE

Hive is a data warehouse written on top of Hadoop Core. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:

Our 0.2.0 branch, which was to be released in Feb 2009, was not put up
for a vote and was not released, because there were some significant
fixes that the community felt should be checked in first. As this
branch was not fully soak tested on Facebook production load, we
decided to target the 0.3.0 branch for release.

0.3.0 was branched in Mar, 2009. All the blockers in that branch have
been fixed. We are going to put a release candidate from that branch
up for vote by Apr 15, 2009.

At present there are 177 open issues, none of them a blocker for
0.3.0. 111 issues have been resolved since the last report in
January.

Community:

Hive continues to see growth in the number and diversity of
contributors. Since the last report the number of contributors to the
project has grown from 16 to 21. We added Prasad Chakka, Raghu
Murthy, Johan Oskarsson, and Joydeep Sen Sarma as committers.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

A vote was called on Pig 0.2.0 on 3/27/09. This release is a major
redesign of the system, including the addition of a type system,
significant (2-10x) performance improvements, the addition of LIMIT
and ORDER BY DESC, and Grunt shell improvements.

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed applications.

Releases:

3.1.0 was released on 2009/02/13 with 70 jiras fixed.
3.1.1 was released on 2009/03/27 with 11 jiras fixed.

Our next release, 3.2.0, is slated for 5/26/2009. A number of major
new features will be included, in particular: client library
extensions covering common ZK use cases (recipes), a REST-based API
for the server, Perl binding support, and numerous optimizations.

Community:

Feedback and community involvement have been slowly increasing. We
frequently meet with users during Hadoop meetups and hold site visits.

There was a discussion around umbrella projects. There is some general concern about splitting up tightly coupled projects, and a fear of losing cross-pollination. J. Aaron will post to board@/members@.

Bertrand takes the action item to communicate the board view on umbrella projects; recommend thinking about spinning off self-contained projects to TLP.

21 Jan 2009 [Owen O'Malley / Bertrand]

Hadoop status report for September 2008 to January 2009.

Hadoop is a set of tools for creating and managing distributed
applications.

There were various Hadoop user meetings:
 * Beijing
 * Berlin
 * Los Angeles (HBase)
 * New Orleans (as part of ApacheCon US)
 * New York
 * San Diego
 * San Francisco (HBase)
 * Santa Clara

CORE, HDFS, and MAP/REDUCE

Core is the fundamental set of utilities, including RPC,
serialization, and compression that the rest of Hadoop depends
on. HDFS provides a distributed file system. Map/Reduce provides a
framework for distributed applications that process large data sets.

The pace of development in Core is very rapid and the community is
active. Some of the Chinese developers have translated the
documentation for Core into Chinese and submitted it as a
patch.

The work to factor out Hive is complete. The factoring for HDFS and
Map/Reduce is close to complete, and they should become separate
subprojects in the next 3 months.

Discussions, plans, and work have continued toward a 1.0 release of
Core, HDFS, and Map/Reduce. The hope is to achieve the desired levels of
compatibility and stability and release 1.0 this year.

Releases:
0.20.0 is feature-frozen but unreleased, with 184 jiras fixed.
0.19.0 was released on 2008/11/18 with 360 jiras fixed.
0.18.2 was released on 2008/11/3 with 25 jiras fixed.

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Core.

Releases:
0.18.1 was released on 2008/10/27 with 14 jiras fixed.

10 of 11 issues have been addressed for 0.18.2, but it is unclear
if 0.18.2 will be released given that 0.19.0 will be released soon.

At this point, 176 of 176 issues have been addressed for hbase-0.19.0.
Testing is in progress at this moment. If no new blocker issues are
identified, a release candidate will be published in the next few days.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Hive was split out of Core on 11/12/2008. Most of the work to migrate
it from Hadoop contrib to a Hadoop subproject has been completed.
Enabling Hudson builds for Hive is still pending. Continuous builds
on committed changes using CABIE are already enabled.

Releases:

We are planning to make our first release, which is named 0.2.0,
sometime in Feb 2009. At present we have 130 outstanding issues,
with 23 of those identified as blockers for a release. 103 issues have
been resolved so far since Hive was open sourced.

Hive has added Ashish Thusoo and Namit Jain as committers. The number
of contributors to the project has grown from 7 to 16 since Hive
became a Hadoop subproject. 6 of these are contributors external to
Facebook.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled with
infrastructure for evaluating these programs.

Pig graduated from the Apache Incubator and became a Hadoop subproject
on 10/17/08.

Releases:
0.1.1 was released on 2008/12/8 with 2 jiras.

Release 0.1.1 was primarily focused on integrating with Hadoop 0.18.

Pig welcomed Pradeep Kamath and Santhosh Srinivasan as new committers.

ZOOKEEPER

Zookeeper is a reliable coordination service for distributed applications.

Releases:

3.0.1 was released on 2008/11/24 with 16 jiras fixed.
3.0.0 was released on 2008/10/27 with 108 jiras fixed.

Our next release, 3.1.0, is slated for 1/19/2009. A number of major
new features will be included, in particular improved management
(JMX) support and quota (i.e., filesystem quota) support.

Feedback and community involvement have been slowly increasing.

15 Oct 2008 [Owen O'Malley / Henning]

Hadoop is a set of tools for creating and managing distributed applications.

The PMC felt that the Core project had grown difficult to manage as a
single subproject. With the core-dev email list topping 3600
messages last month, it is difficult to keep on top of the entire
project. We therefore have voted to split Core into 4 pieces: Core,
which is the common infrastructure; HDFS, which is the distributed
file system; Map/Reduce, which is the distributed computation
framework; and Hive, which is a higher-level query processor built on
Map/Reduce. After release 0.19 has stabilized we will work on
splitting up the code bases.

Additionally, we have started a vote on whether to accept Pig as a
subproject when it graduates from the incubator.

We have added Arun Murthy to the PMC.

Hadoop will be well represented at ApacheCon US next month. There will
be 3 Hadoop talks in the main series and an assortment of related
talks at the Hadoop Camp. There will be presentations about the Core,
Hive, Pig, and Zookeeper subprojects at Hadoop Camp.

CORE

Core is a framework for building distributed applications, which
includes a distributed file system and map/reduce.

Releases:
0.19.0 is feature-frozen, but unreleased, with 270 jiras.
0.18.2 is unreleased, with 3 jiras.
0.18.1 was released on 2008/09/17 with 6 jiras.
0.18.0 was released on 2008/08/19 with 266 jiras.
0.17.3 is unreleased, with 4 jiras.
0.17.2 was released on 2008/08/11, with 12 jiras.

Development activity keeps increasing, with 0.19.0 having the
largest number of patches in a release. Release 0.19 includes Hive as
a contrib module. We have started discussions on the email lists about
when we should release 1.0 and what level of forwards and backwards
compatibility we should guarantee.

HBASE

HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases:
0.2.0 was released on 2008/08/08. 293 issues were addressed by this release.
0.2.1 was released on 2008/09/13. 44 issues were addressed by this release.
0.18.0 was released on 2008/09/21. 58 issues were addressed by this release.

The hbase-0.2.x releases run on hadoop-0.17.x. With hbase-0.18.0,
releases have been renumbered to reflect the version of hadoop that
the hbase release runs on.

Work has started on release 0.18.1. 5 of 9 issues have been addressed.
Work has started on release 0.19.0. 25 of 58 issues have been addressed.

On Wednesday, October 8, Microsoft agreed to let two of their
engineers, who are committers and PMC members, resume their
contributions to HBase. The contributions had been blocked when
Microsoft acquired Powerset last quarter.

Andrew Purtell was added as an HBase committer.

ZOOKEEPER

Zookeeper is a service for coordinating processes of distributed applications.

Migration from SourceForge to Apache of source, documentation, wiki,
issue tracking, mailing lists, etc. is complete. We are planning to
make our first Apache release, which is named 3.0, on Oct 22nd with
over 85 issues addressed.

We discussed growth of the project... no significant concerns.

Jim to check on Zookeeper's filling out of the Incubator's IP clearance.

16 Jul 2008

Hadoop is a set of tools for distributed applications.

The PMC voted to add a new Hadoop subproject, named Zookeeper, which
is a distributed coordination service. Zookeeper was developed by
Yahoo and was granted to Apache. Zookeeper should form a great basis
for Map/Reduce and HDFS high availability. The original committers are
Patrick Hunt, Flavio Junqueira, Mahadev Konar, Andrew Kornev, and Ben
Reed.

There are now monthly Hadoop user get-togethers in northern California
(http://upcoming.yahoo.com/event/869166) and there is one scheduled
for August in London (http://upcoming.yahoo.com/event/506444).

CORE

Core is a framework for building distributed applications, which
includes a distributed file system and map/reduce.

Releases:
0.18.0 is feature frozen but unreleased, currently with 254 jiras.
0.17.2 is unreleased, currently with 4 jiras.
0.17.1 was released 23 June 2008 fixing 10 jiras.
0.17.0 was released 18 May 2008 fixing 200 jiras.
0.16.4 was released 5 May 2008 fixing 4 jiras.
0.16.3 was released 16 April 2008 fixing 7 jiras.

Core won the annual terabyte sort benchmark (http://tinyurl.com/4o8bns),
which is the first time that either a Java or an open source program
won the competition. Core has added 4 committers: Johan Oskarsson,
Lohit Vijaya Renu, Zheng Shao, and Tsz Wo Sze. We've had very active
development and an active user base.

HBASE

HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases:
0.1.1 was released on 27 March 2008. 12 issues were addressed by this release.
0.1.2 was released on 13 May 2008. 27 issues were addressed by this release.
0.1.3 was released on 27 June 2008. 16 issues were addressed by this release.

The hbase-0.1.x releases run on hadoop-0.16.x.

Work continues on release 0.2.0 which will run on hadoop-0.17.x.
231 of 239 issues have been resolved. We are targeting the end of July
for a release candidate.

On Tuesday, July 1, Microsoft and Powerset signed a deal for Microsoft to
acquire Powerset. Two of the HBase committers (who are also members of the
Hadoop PMC) are employed by Powerset and may not be able to continue work
on HBase after the deal closes. They and their manager are working with
Microsoft to determine what will happen, but may not know for several weeks
yet.

ZOOKEEPER

Zookeeper is a service for coordinating processes of distributed applications.

Migration from SourceForge to Apache is in progress. Yahoo's code
grant was filed with the ASF, the SourceForge SVN snapshot has been
loaded into ASF SVN and Hudson is now running daily builds on the
codebase. SourceForge tracker has been fully migrated to Jira and the
developers are now using ASF Jira and mailing lists. Migration of
documentation and website is in progress and expected to be completed
in the next couple of weeks. A new release of ZooKeeper is being
worked on in parallel with the move; completing it will be a major
focus subsequent to the ASF migration.

Ben Reed (Yahoo) and Ted Dunning (Veoh) presented ZooKeeper at the
latest Hadoop social - reaction was extremely positive. Many attendees
were already using ZK, and almost all were at least familiar with the
project.

16 Apr 2008 [Owen O'Malley / J Aaron]

TLP

The Hadoop Summit (http://upcoming.yahoo.com/event/436226/) occurred
on March 25 and had more than 300 people attending. It was well
received by the community.

CORE

Development has been active this month with 0.16.2 being released on 2
April 2008. We will likely release 0.16.3 with 7 jiras this week.
Release 0.17, which has 160 jiras, has been branched and will be
released when it is stabilized. Hadoop Core was well represented at
ApacheCon EU with a BOF and 3 talks by Owen O'Malley, Tom White, and
Allen Wittenauer.

HBASE

The first version of HBase as a subproject, version 0.1.0, was
released on March 28th. We are now working on patches for version
0.1.1, which will be released after hadoop-0.16.2. 6 of 8 identified
issues have been resolved.

With the focus on releasing 0.1.0, progress slowed a bit for release
0.2.0. Since last month, an additional 20 issues have been resolved
and an additional 29 have been identified for a total of 74 out of 102
issues resolved.

19 Mar 2008 [Owen O'Malley / Greg]

TLP

We have filed the appropriate paperwork for using cryptography within
Hadoop. The first use will be HADOOP-2239, which will likely be
committed this week.

Yahoo and the Computing Community Consortium are sponsoring a Hadoop
Summit (http://upcoming.yahoo.com/event/436226/) on March 25 to bring
together users and developers. 215 people have signed up to attend.

CORE

We added two committers this month: Mukund Madhugiri for QA and
release engineering, and Hemanth Yamjiala for contrib.

Development has been active this month, and we released 0.16.1,
which fixed 40 jiras. Release 0.17 is scheduled to feature
freeze in the first week of April and currently includes 70 committed
jiras.

HBASE

Development has been focused on making our first subproject release,
0.1.0. The 0.1.0 release is feature frozen and runs against Hadoop
Core 0.16.x. 20 of the 25 identified blocker issues have been
resolved.

The priorities for the 0.2 release are robustness and scalability. The
proposal is on the HBase Wiki at:
http://wiki.apache.org/hadoop/Hbase/Plan-0.2. HBase 0.2 is based on
Hadoop Core trunk and is making progress as well with 54 of 73 issues
resolved.

An HBase contributor, Dennis Kubes, bought the domain hbase.org for
the project, which points to hadoop.apache.org/hbase.

A second HBase Users Group meeting was held at Powerset on March 4,
with approximately 30 people attending. The meeting was informal,
mostly getting the user community to discuss problems they had
encountered using HBase and to gather issues blocking the 0.1.0
release.

Greg to work with Owen to arrange for the transfer of the hbase.org domain to the ASF.

20 Feb 2008 [Owen O'Malley / Henning]

TLP

The top-level project completed the split of Hadoop out of Lucene and
into a TLP. The subproject that was Hadoop is now called Hadoop
Core. We have also moved HBase into a subproject from being in Hadoop
Core's contrib directory. Although Core and HBase have many ties, the
contributor lists and code bases are largely disjoint between them and
the split will reduce the heavy traffic on both development lists.

CORE

Hadoop Core has released 0.16.0, 0.15.3, and 0.15.2. As we move toward
more stability, we've moved our feature freezes to every 3 months
(beginning of Jan, Apr, July, and Oct). Development has been very
active, including adding user permissions to HDFS. (Fixed Jira counts:
23 unreleased, 180 for 0.16.0, 4 for 0.15.3, and 15 for 0.15.2)

HBASE

HBase, which is a distributed storage system for structured data, has
become a subproject of Hadoop. We have added Bryan Duxbury as a
committer. Development has been very active. (Fixed Jira counts: 7
unreleased, 142 for 0.16.0)

Approved by General Consent.

16 Jan 2008

Establish the Apache Hadoop Project

 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to a distributed computing
 platform, including a distributed filesystem and an
 implementation of the map/reduce distributed computing
 metaphor, for distribution at no charge to the public.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the "Apache Hadoop Project",
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further

 RESOLVED, that the Apache Hadoop Project be and hereby is
 responsible for the creation and maintenance of software
 related to a distributed computing platform, including a
 distributed filesystem and an implementation of the map/reduce
 distributed computing metaphor; and be it further

 RESOLVED, that the office of "Vice President, Apache Hadoop" be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache Hadoop Project, and to have primary responsibility
 for management of the projects within the scope of
 responsibility of the Apache Hadoop Project; and be it further

 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache Hadoop Project:

   * Andrzej Bialecki             <ab@apache.org>
   * Doug Cutting                 <cutting@apache.org>
   * Nigel Daley                  <ndaley@apache.org>
   * Jim Kellerman                <jimk@apache.org>
   * Owen O'Malley                <omalley@apache.org>
   * Enis Soztutar                <enis@apache.org>
   * Michael Stack                <stack@apache.org>
   * Christophe Taton             <taton@apache.org>
   * Thomas E. White              <tomwhite@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Owen O'Malley
 be appointed to the office of Vice President, Apache Hadoop, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed; and be it further

 RESOLVED, that the Apache Hadoop Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Lucene Hadoop sub-project; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache
 Lucene Hadoop sub-project encumbered upon the
 Apache Lucene Project are hereafter discharged.

 Special order 7C, Establish the Apache Hadoop Project,
 was approved by Unanimous Vote.