This was extracted (@ 2024-04-17 22:10) from a list of minutes which have been approved by the Board.
Please note: the Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

WARNING: these pages may omit some original contents of the minutes.
This is due to changes in the layout of the source minutes over the years. Fixes are being worked on.

Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

Hadoop

17 Apr 2024 [Xiaoqiao He / Craig]

Report was filed, but display is awaiting the approval of the Board minutes.

17 Jan 2024 [Xiaoqiao He / Bertrand]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 245 committers and 125 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Shilun Fan was added to the PMC on 2023-10-31
- No new committers. Last addition was Simbarashe Dzinamarira on 2023-09-27.

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.
We are preparing the 3.4.0 release; the work is ongoing but has been
delayed longer than expected.
The announcement of 3.3.7-aws[1] did not come from the Hadoop PMC,
and it is not an official release.
[1] https://lists.apache.org/thread/0ptzdzs30vddotqg1gnxmr38c03r5xl9

## Community Health:
- Mailing list activity:
common-dev@hadoop.apache.org had a 0% increase in
 traffic in the past quarter (472 emails compared to 471)
common-issues@hadoop.apache.org had a 3% decrease in
 traffic in the past quarter (5436 emails compared to 5560)
hdfs-dev@hadoop.apache.org had a 6% decrease in
 traffic in the past quarter (458 emails compared to 487)
hdfs-issues@hadoop.apache.org had a 17% increase in
 traffic in the past quarter (2410 emails compared to 2044)
mapreduce-dev@hadoop.apache.org had a 0% increase in
 traffic in the past quarter (276 emails compared to 276)
mapreduce-issues@hadoop.apache.org had a 10% decrease in
 traffic in the past quarter (194 emails compared to 214)
user@hadoop.apache.org had a 20% decrease in traffic
 in the past quarter (32 emails compared to 40)
yarn-dev@hadoop.apache.org had a 0% increase in
 traffic in the past quarter (360 emails compared to 358)
yarn-issues@hadoop.apache.org had a 0% increase in
 traffic in the past quarter (1173 emails compared to 1163)

- Commit activity:
410 commits in the past quarter (4% increase)
72 code contributors in the past quarter (18% increase)

- JIRA activity:
183 issues opened in JIRA, past quarter (-47% change)
129 issues closed in JIRA, past quarter (-38% change)

- GitHub PR activity:
261 PRs opened on GitHub, past quarter (-22% change)
229 PRs closed on GitHub, past quarter (-9% change)

JIRA and GitHub traffic both decreased in the past quarter.
However, project development overall looks healthy, with
more contributors and more commits checked in, and the next
release is in progress.

18 Oct 2023 [Xiaoqiao He / Rich]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 245 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Mukund Thakur on 2023-01-21.
- Ahmar Suhail was added as committer on 2023-08-25
- Simbarashe Dzinamarira was added as committer on 2023-09-27
- Shuyan Zhang was added as committer on 2023-09-27

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.

We are preparing 3.4.0, which will be released before the end of 2023.

## Community Health:
- Mailing list activity:
common-dev@hadoop.apache.org had a 9% decrease in
 traffic in the past quarter (489 emails compared to 532)
common-issues@hadoop.apache.org had a 0% increase in
 traffic in the past quarter (5668 emails compared to 5612)
hdfs-dev@hadoop.apache.org had a 6% increase in
 traffic in the past quarter (500 emails compared to 469)
hdfs-issues@hadoop.apache.org had a 9% increase in
 traffic in the past quarter (2065 emails compared to 1891)
mapreduce-dev@hadoop.apache.org had a 4% decrease in
 traffic in the past quarter (283 emails compared to 294)
mapreduce-issues@hadoop.apache.org had an 81% increase in
 traffic in the past quarter (187 emails compared to 103)
user@hadoop.apache.org had a 39% decrease in traffic
 in the past quarter (21 emails compared to 34)
yarn-dev@hadoop.apache.org had a 6% decrease in
 traffic in the past quarter (366 emails compared to 389)
yarn-issues@hadoop.apache.org had a 2% increase in
 traffic in the past quarter (1133 emails compared to 1105)

- JIRA activity:
330 issues opened in JIRA, past quarter (20% increase)
191 issues closed in JIRA, past quarter (-16% change)

- Commit activity:
359 commits in the past quarter (-32% change)
59 code contributors in the past quarter (-23% change)

- GitHub PR activity:
318 PRs opened on GitHub, past quarter (12% increase)
232 PRs closed on GitHub, past quarter (-12% change)

Judging from JIRA and GitHub PR activity, review bandwidth is insufficient:
there are not enough active reviewers. We are working to improve this by
identifying potential committers and adding new ones.

16 Aug 2023

Change the Apache Hadoop Project Chair

 WHEREAS, the Board of Directors heretofore appointed Wei-Chiu Chuang
 (weichiu) to the office of Vice President, Apache Hadoop, and

 WHEREAS, the Board of Directors is in receipt of the resignation of
 Wei-Chiu Chuang from the office of Vice President, Apache Hadoop, and

 WHEREAS, the Project Management Committee of the Apache Hadoop project
 has chosen by vote to recommend Xiaoqiao He (hexiaoqiao) as the
 successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Wei-Chiu Chuang is relieved and
 discharged from the duties and responsibilities of the office of Vice
 President, Apache Hadoop, and

 BE IT FURTHER RESOLVED, that Xiaoqiao He be and hereby is appointed to
 the office of Vice President, Apache Hadoop, to serve in accordance
 with and subject to the direction of the Board of Directors and the
 Bylaws of the Foundation until death, resignation, retirement, removal
 or disqualification, or until a successor is appointed.

 Special Order 7B, Change the Apache Hadoop Project Chair, was
 approved by Unanimous Vote of the directors present.

19 Jul 2023 [Wei-Chiu Chuang / Justin]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 242 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Mukund Thakur on 2023-01-21.
- No new committers. Last addition was Shilun Fan on 2022-11-22.

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.

## Community Health:
- Mailing list activity:
common-dev@hadoop.apache.org had a 2% increase in traffic in the past quarter
 (529 emails compared to 518)
common-issues@hadoop.apache.org had a 7% increase in traffic in the past quarter
 (5700 emails compared to 5309)
dev@hadoop.apache.org had a 500% increase in traffic in the past quarter
 (6 emails compared to 1)
hdfs-dev@hadoop.apache.org had a 11% increase in traffic in the past quarter
 (494 emails compared to 444)
hdfs-issues@hadoop.apache.org had a 27% increase in traffic in the past quarter
 (2040 emails compared to 1603)
mapreduce-dev@hadoop.apache.org had a 0% increase in traffic in the past quarter
 (297 emails compared to 297)
mapreduce-issues@hadoop.apache.org had a 24% increase in traffic in the past quarter
 (116 emails compared to 93)
user@hadoop.apache.org had a 10% decrease in traffic in the past quarter
 (28 emails compared to 31)
yarn-dev@hadoop.apache.org had a 9% increase in traffic in the past quarter
 (409 emails compared to 372)
yarn-issues@hadoop.apache.org had a 19% increase in traffic in the past quarter
 (1189 emails compared to 992)

- JIRA activity:
277 issues opened in JIRA, past quarter (8% increase)
238 issues closed in JIRA, past quarter (29% increase)

- Commit activity:
532 commits in the past quarter (53% increase)
75 code contributors in the past quarter (13% increase)

- GitHub PR activity:
288 PRs opened on GitHub, past quarter (11% increase)
271 PRs closed on GitHub, past quarter (34% increase)

In recent board report feedback, some board members expressed concern that 'a
100% decrease in dev-list activity does sound quite serious'. We looked into
it and found that it affects only dev@hadoop.apache.org. Each Hadoop
sub-module has its own dev list, and only a few people use this general list,
so traffic on dev@hadoop.apache.org swings widely in percentage terms. We are
discussing whether to remove this list. The sub-module dev lists remain
active.
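
The swing on a low-volume list can be illustrated with a short calculation.
The following is a minimal sketch, not the tool that generates these
statistics: `percent_change` is a hypothetical helper, and the email counts
are the ones quoted in this report and in the earlier board feedback.

```python
def percent_change(current: int, previous: int) -> float:
    """Relative change from the previous count to the current one, in percent."""
    if previous == 0:
        # Undefined when there is no baseline to compare against.
        raise ValueError("percent change is undefined when previous is 0")
    return (current - previous) / previous * 100.0


# (current quarter, previous quarter) email counts quoted in the reports.
dev_up = percent_change(6, 1)            # dev@, this quarter
dev_down = percent_change(0, 6)          # dev@, the quarter that drew feedback
common_dev = percent_change(529, 518)    # common-dev@, this quarter

print(f"dev@ up: {dev_up:.0f}%, dev@ down: {dev_down:.0f}%, "
      f"common-dev@: {common_dev:.0f}%")
```

A difference of five or six emails moves dev@ by hundreds of percent, while a
difference of eleven emails on common-dev@ barely registers.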

We also added hadoop-api-shim as a sub-project in a new repo under hadoop:
https://github.com/apache/hadoop-api-shim

19 Apr 2023 [Wei-Chiu Chuang / Bertrand]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 242 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Mukund Thakur was added to the PMC on 2023-01-20
- No new committers. Last addition was Shilun Fan on 2022-11-21.

## Project Activity:
3.3.5 was released on 2023-03-23, which added a new Vectored IO API support
(HADOOP-18103).

We are continuing to maintain the 3.3.x release line. Meanwhile, Steve initiated a
thread to discuss dropping JDK8 support moving forward.

New feature development started this quarter:
* YARN-11411 [Umbrella] Build Concurrent Yarn Scheduler
* HADOOP-18671 Add recoverLease(), setSafeMode(), isFileClosed() APIs to
 FileSystem

## Community Health:

It appears the development activities are gradually trending down, though this
is to be expected as the project matures. I see that Steve responded to most
of the vulnerability reports, which is great. However, the community (me
included) collectively should be more vigilant about vulnerability reports.

* dev@hadoop.apache.org had a 0% decrease in traffic in the past quarter (1
 emails compared to 1)
* general@hadoop.apache.org had a 29% decrease in traffic in the past quarter
 (5 emails compared to 7)
* mapreduce-issues@hadoop.apache.org had a 67% decrease in traffic in the past
 quarter (97 emails compared to 288)
* user@hadoop.apache.org had a 62% decrease in traffic in the past quarter (31
 emails compared to 80)
* yarn-issues@hadoop.apache.org had a 31% decrease in traffic in the past
 quarter (1102 emails compared to 1584)
* 237 issues opened in JIRA, past quarter (-5% change)
* 175 issues closed in JIRA, past quarter (-14% change)
* 342 commits in the past quarter (4% increase)
* 66 code contributors in the past quarter (-9% change)
* 244 PRs opened on GitHub, past quarter (-9% change)
* 193 PRs closed on GitHub, past quarter (-23% change)

18 Jan 2023 [Wei-Chiu Chuang / Sander]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 242 committers and 123 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Stephen O'Donnell on 2022-07-25.
- Shilun Fan was added as committer on 2022-11-21

## Project Activity:
No new release reached GA this quarter. However, 3.3.5 is pending release.

## Community Health:
Overall community activities seem to be getting lower, partly due to the
holiday season.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0
 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past
 quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26
 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter
 (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter
 (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past
 quarter (751 emails compared to 1227)
* 236 issues opened in JIRA, past quarter (-50% change)
* 191 issues closed in JIRA, past quarter (-41% change)
* 310 commits in the past quarter (-21% change)
* 73 code contributors in the past quarter (32% increase)
* 254 PRs opened on GitHub, past quarter (-46% change)
* 238 PRs closed on GitHub, past quarter (-36% change)

19 Oct 2022 [Wei-Chiu Chuang / Rich]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 241 committers and 123 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Stephen O'Donnell was added to the PMC on 2022-07-25
- Mehakmeet Singh was added as committer on 2022-07-29
- Zander Xu was added as committer on 2022-09-28

## Project Activity:
3.3.4 was released on 2022-08-08.
3.2.4 was released on 2022-07-22.

3.3.5 release work is under way.

Announced CVE:

* CVE-2022-25168 Command injection in
 org.apache.hadoop.fs.FileUtil.unTarUsingTar
* CVE-2021-25642 Apache Hadoop YARN remote code execution in
 ZKConfigurationStore of capacity scheduler

## Community Health:

It looks like JIRA and GitHub traffic are both decreasing. However, the
project development overall looks healthy with a number of releases published
or in progress. Additionally ApacheCon NA took place in October and a number
of talks were related to Hadoop.

A Hadoop Meetup took place in Shanghai on Sep 24, with many talks and a large
crowd.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0
 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past
 quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26
 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter
 (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter
 (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past
 quarter (751 emails compared to 1227)
* 429 issues opened in JIRA, past quarter (10% increase)
* 299 issues closed in JIRA, past quarter (2% increase)
* 353 commits in the past quarter (-28% change)
* 55 code contributors in the past quarter (-37% change)
* 423 PRs opened on GitHub, past quarter (4% increase)
* 336 PRs closed on GitHub, past quarter (1% increase)

20 Jul 2022 [Wei-Chiu Chuang / Sam]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (14 years ago)
There are currently 239 committers and 122 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sun Chao on 2022-03-07.
- Tao Li was added as committer on 2022-04-22

## Project Activity:
2.10.2 was released on 2022-05-31.
3.3.3 was released on 2022-05-17.

Steve Loughran is planning to release 3.3.4, and Masatake is preparing for
3.2.4.

Three CVEs were published:

* CVE-2022-26612 Arbitrary file write during untar on Windows
* CVE-2021-37404 Heap buffer overflow in libhdfs native library
* CVE-2021-33036 Apache Hadoop Privilege escalation vulnerability

## Community Health:

Overall community health is good. JIRA and GitHub activities are trending up
while mailing lists are trending down. I see a number of new contributors
joined. One contributor, Tao Li, was invited to become a committer and there
are more contributors being discussed in the private mailing list. The
community is prioritizing security fixes & publishing security vulnerability
announcements, thanks to Masatake, Akira and others.

* general@hadoop.apache.org had a 5% increase in traffic in the past quarter
 (18 emails compared to 17)
* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0
 emails compared to 6)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26
 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter
 (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter
 (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past
 quarter (751 emails compared to 1227)
* common-dev@hadoop.apache.org had a 22% decrease in traffic in the past
 quarter (475 emails compared to 608)
* common-issues@hadoop.apache.org had a 21% decrease in traffic in the past
 quarter (6753 emails compared to 8517)
* hdfs-dev@hadoop.apache.org had a 11% decrease in traffic in the past quarter
 (507 emails compared to 568)
* hdfs-issues@hadoop.apache.org had a 7% decrease in traffic in the past
 quarter (2819 emails compared to 3004)
* mapreduce-dev@hadoop.apache.org had a 22% decrease in traffic in the past
 quarter (231 emails compared to 294)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past
 quarter (87 emails compared to 208)

* 356 issues opened in JIRA, past quarter (21% increase)
* 267 issues closed in JIRA, past quarter (20% increase)
* 437 commits in the past quarter (-5% change)
* 88 code contributors in the past quarter (4% increase)
* 374 PRs opened on GitHub, past quarter (25% increase)
* 299 PRs closed on GitHub, past quarter (13% increase)

20 Apr 2022 [Wei-Chiu Chuang / Rich]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 238 committers and 122 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Sun Chao was added to the PMC on 2022-03-08
- Benjamin Teke was added as committer on 2022-03-24
- András Győri was added as committer on 2022-02-15

## Project Activity:
3.3.2 was released on 2022-03-02.
3.2.3 was released on 2022-03-28.

Masatake Iwasaki volunteered to be RM for 2.10.2. Steve Loughran volunteered
to be RM for 3.3.3.

* Patch attachment via JIRA is now disabled. All contributions should be made
 via GitHub PR. (HADOOP-17798)

## Community Health:

There appears to be a downward trend in the amount of contribution. However,
judging from the number of contributors, participation remains stable.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0
 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past
 quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26
 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter
 (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter
 (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past
 quarter (751 emails compared to 1227)
* 273 issues opened in JIRA, past quarter (-27% change)
* 201 issues closed in JIRA, past quarter (-37% change)
* 379 commits in the past quarter (-33% change)
* 99 code contributors in the past quarter (12% increase)
* 260 PRs opened on GitHub, past quarter (-27% change)
* 232 PRs closed on GitHub, past quarter (-26% change)

19 Jan 2022 [Wei-Chiu Chuang / Sander]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

* hadoop-thirdparty is a set of internal artifacts used by
the project to mitigate the impact of our dependency choices on the wider
ecosystem.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 236 committers and 121 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaoqiao He on 2021-05-05.
- Gautham Banasandra was added as committer on 2021-11-04

## Project Activity:
We've not had a new PMC member added for a while. Given the amount of traffic
in the community I am pretty sure there are a number of good candidates out
there that we should nominate. I sent an email to initiate the discussion.

Release 3.3.2: RC0 was cut and dropped due to a number of issues. RC1 is being
prepared. Release 3.2.3 is stalled.

Notable feature development:
* HADOOP-17124 Support LZO using aircompressor
* HADOOP-18055 Async Profiler endpoint for Hadoop daemons
* HADOOP-17979 Interface EtagSource to allow FileStatus subclasses to provide
 etags
* YARN-11025 Implement distributed decommissioning

## Community Health:
The overall mailing list traffic, JIRA and GitHub activities were down, which
is expected given the holiday season.


* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0
 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past
 quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26
 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter
 (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter
 (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past
 quarter (751 emails compared to 1227)
* 339 issues opened in JIRA, past quarter (-24% change)
* 289 issues closed in JIRA, past quarter (-11% change)
* 521 commits in the past quarter (-13% change)
* 85 code contributors in the past quarter (-13% change)
* 322 PRs opened on GitHub, past quarter (-9% change)
* 281 PRs closed on GitHub, past quarter (-9% change)

Statistics of the ASF slack channels:
#hdfs: 151 users, up from 138.
#hadoop: 160 users, up from 148.
#yarn: 56 users, up from 52.

20 Oct 2021 [Wei-Chiu Chuang / Sheng]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

* hadoop-thirdparty is a set of internal artifacts used by
the project to mitigate the impact of our dependency choices on the wider
ecosystem.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 235 committers and 121 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaoqiao He on 2021-05-05.
- Ahmed Hussein was added as committer on 2021-09-24

## Project Activity:
No new releases this quarter, but the community, led by Brahma Reddy Battula,
is gearing up for a 3.2.3 release. Similarly, Chao Sun is leading the 3.3.2
release work.

Notable feature development:
* YARN-10496 ([Umbrella] Support Flexible Auto Queue Creation in Capacity
 Scheduler) was completed in this quarter.
* YARN-8849 (DynoYARN: A simulation and testing infrastructure for YARN
 clusters). This feature was proposed in the Hadoop jira and completed in
 LinkedIn's github repo, so not a Hadoop feature yet.
* YARN-9698 ([Umbrella] Tools to help migration from Fair Scheduler to
 Capacity Scheduler) was completed in this quarter. Follow up work is under
 the umbrella YARN-10843.

* MAPREDUCE-7341 (Add a task-manifest output committer for Azure and GCS) is
 ongoing.

## Community Health:

A number of community members spoke at the ApacheCon Asia held in August:
* Bigtop 3.0: Rerising community driven Hadoop distribution by Kengo Seki,
 Masatake Iwasaki.
* Technical tips for secure Apache Hadoop cluster by Akira Ajisaka, Kei KORI.
* Data Lake accelerator on Hadoop-COS in Tencent Cloud by Li Cheng.

A number of community members spoke at the ApacheCon@Home held in September:
* YARN Resource Management and Dynamic Max by Fang Liu, Fengguang Tian,
 Prashant Golash, Hanxiong Zhang, Shuyi Zhang
* Uber HDFS Unit Storage Cost 10x Deduction by Jeffrey Zhong, Jing Zhao, Leon
 Gao
* Scaling the Namenode - Lessons learnt by Dinesh Chitlangia
* How Uber achieved millions of savings by managing disk IO across HDFS
 cluster by Leon Gao, Ekanth Sethuramalingam
* Containing an Elephant: How we moved Hadoop/HBase into Kubernetes and Public
 Cloud by Dhiraj Hegde

I have been tracking the following metrics over the past 5 quarters and they
have been steadily trending up. This is the first quarter we had more than a
hundred code contributors! The number of commits is dwindling because we
maintain only three branches now.

* 406 issues opened in JIRA, past quarter (-16% change)
* 287 issues closed in JIRA, past quarter (-25% change)
* 551 commits in the past quarter (-13% change)
* 101 code contributors in the past quarter (16% increase)
* 323 PRs opened on GitHub, past quarter (-3% change)
* 273 PRs closed on GitHub, past quarter (-9% change)

Statistics of the ASF slack channels:
#hdfs: 138 users, up from 132.
#hadoop: 148 users, up from 142.
#yarn: 52 users, up from 49.

21 Jul 2021 [Wei-Chiu Chuang / Sander]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

* hadoop-thirdparty is a set of internal artifacts used by
the project to mitigate the impact of our dependency choices on the wider
ecosystem.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (13 years ago)
There are currently 234 committers and 121 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Xiaoqiao He was added to the PMC on 2021-05-05
- Fengnan Li was added as committer on 2021-06-23
- Gergely Pollák was added as committer on 2021-05-26
- Qi Zhu was added as committer on 2021-05-14

## Project Activity:
We had one release, Hadoop 3.3.1, which was released on 2021-06-13. In
preparation of the release, we also made two releases of hadoop-thirdparty.
- hadoop-thirdparty-1.1.1 was released on 2021-06-01.
- hadoop-thirdparty-1.1.0 was released on 2021-05-18.

In parallel, we declared the EOL of the 3.1 release line. Currently, we
maintain only three release lines: 3.3, 3.2 and 2.10.

Notable feature development:

Completed in the quarter:
- HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features.
- HDFS-15759 EC: Verify EC reconstruction correctness on DataNode
- HDFS-13916 Distcp SnapshotDiff to support WebHDFS
- HDFS-15790 Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
- Gautham added CI for several OSes: CentOS 7, CentOS 8 and Debian 10.

Ongoing development:
- MAPREDUCE-7341 Add a task-manifest output committer for Azure and GCS
- HDFS-15982 Deleted data using HTTP API should be saved to the trash
- HDFS-14703 NameNode Fine-Grained Locking via Metadata Partitioning
- HADOOP-11890 Uber-JIRA: Hadoop should support IPv6

## Community Health:
Looking at the JIRA and GitHub statistics, the project is a little quiet. We
closed 255 PRs against 279 opened in the quarter (255/279=91%), so a number of
PRs remain outstanding, but we managed to close slightly more PRs than before.
Overall, the activity has been in the same ballpark since Ozone went TLP.
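
The 91% figure is the ratio of PRs closed to PRs opened in the quarter. As a
minimal sketch of that arithmetic (the helper name `closure_ratio` is
hypothetical, and the counts are the ones reported for this quarter):

```python
def closure_ratio(closed: int, opened: int) -> float:
    """PRs closed as a percentage of PRs opened in the same period."""
    if opened == 0:
        raise ValueError("closure ratio is undefined when no PRs were opened")
    return closed / opened * 100.0


# Counts reported for this quarter.
opened, closed = 279, 255

ratio = closure_ratio(closed, opened)
net_backlog_growth = opened - closed  # PRs opened but not offset by a close

print(f"closed/opened: {ratio:.0f}%, net backlog growth: {net_backlog_growth}")
```

Note that PRs closed in a quarter are not necessarily the ones opened in that
quarter, so the ratio describes throughput rather than the fate of specific
PRs.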

- 413 issues opened in JIRA, past quarter (-29% change)
- 342 issues closed in JIRA, past quarter (-27% change)
- 571 commits in the past quarter (-29% change)
- 89 code contributors in the past quarter (-10% change)
- 279 PRs opened on GitHub, past quarter (-9% change)
- 255 PRs closed on GitHub, past quarter (2% increase)

Statistics of the ASF slack channels: I'm seeing more users and more
activities in the slack channels, which is a good sign.
#hdfs: 132 users
#hadoop: 142 users
#yarn: 49 users

Notable mailing list statistics:
- common-dev@hadoop.apache.org had a 30% increase in traffic in the past
 quarter (807 emails compared to 617)
- dev@hadoop.apache.org had a 266% increase in traffic in the past quarter (44
 emails compared to 12)
- general@hadoop.apache.org had a 46% decrease in traffic in the past quarter
 (6 emails compared to 11)
- mapreduce-dev@hadoop.apache.org had a 43% increase in traffic in the past
 quarter (405 emails compared to 282)
- user@hadoop.apache.org had a 31% increase in traffic in the past quarter (38
 emails compared to 29)
- yarn-issues@hadoop.apache.org had a 53% decrease in traffic in the past
 quarter (1497 emails compared to 3144)

21 Apr 2021 [Wei-Chiu Chuang / Craig]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
The security vulnerability handling is becoming a hot potato. There is
increasing attention to vulnerabilities as well as to updating vulnerable
third-party dependencies. I started a thread to discuss ways to expedite
resolution.

GitHub raised 28 alerts as of today, most of them for JavaScript packages used
by the YARN UI. But we lack volunteers working to update these packages.

The AWS EMR team is interested in knowing more about, and collaborating with
the Apache Hadoop project on, the vulnerabilities announced. Obviously, not
having a committer in the project prevents them from knowing about or
participating in addressing these vulnerabilities. Meanwhile, AWS EMR is one
of the largest commercial providers of Hadoop, and it would be irresponsible
toward our users if EMR can't take the appropriate actions. Can/should we find
a way to include EMR (as well as other cloud providers) in the discussion of
vulnerabilities?

## Membership Data:
Apache Hadoop was founded 2008-01-16 (13 years ago)
There are currently 230 committers and 120 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Szilard Nemeth was added to the PMC on 2021-04-02
- Jinglun was added as committer on 2021-03-27
- Mukund Thakur was added as committer on 2021-02-04

## Project Activity:

Hadoop 3.2.2 was officially released this quarter (2021-01-09).
Hadoop 3.3.1 release is being discussed/planned.

Feature development:
(completed)
- HDFS-15714 HDFS Provided Storage Read/Write Mount Support On-the-fly - work
  started this quarter and resolved in early April. Release: 3.4.0
- HADOOP-16830 Add Public IOStatistics API - completed this Jan. Release:
  3.3.1
- HADOOP-16492 Support HuaweiCloud Object Storage as a Hadoop Backend File
  System - this work started Aug'19 and finally completed this Jan. Release:
  3.4.0
- HADOOP-16524 Automatic keystore reloading for HttpServer2. Release: 3.4.0
  and 3.3.1.

(ongoing)
- HDFS-15714 HDFS Provided Storage Read/Write Mount Support On-the-fly
- HDFS-15747 RBF: Rename across sub-namespaces. -- this one is near
  completion.
- YARN-10370 [Umbrella] Reduce the feature gap between FS Placement Rules and
  CS Queue Mapping rules -- this one is done, with remaining work moving to
  "Part II"
- YARN-10534 Enable runC container transformations.
- HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features
- MAPREDUCE-6749 MR AM should reuse containers for Map/Reduce Tasks -- we will
  be creating a branch for this development.
- HADOOP-17474 Optimise abfs incremental listings -- this work started this
  quarter.
- YARN-10496 [Umbrella] Support Flexible Auto Queue Creation in Capacity
  Scheduler -- this work started last quarter.

## Community Health:

Activity is picking up again after the holiday season.
359 issues opened in JIRA, past quarter (4% increase)
318 issues closed in JIRA, past quarter (21% increase)

778 commits in the past quarter (24% increase)
99 code contributors in the past quarter (26% increase)

266 PRs opened on GitHub, past quarter (13% increase)
214 PRs closed on GitHub, past quarter (5% increase)

20 Jan 2021 [Wei-Chiu Chuang / Roy]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
The Ozone subproject completed its split from the Hadoop project. The
transition went well.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (13 years ago)
There are currently 228 committers and 119 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Eric Badger was added to the PMC on 2020-11-17
- Takanobu Asanuma was added to the PMC on 2020-11-08
- No new committers. Last addition was Lisheng Sun on 2020-10-01.

## Project Activity:
No new release was announced this quarter. However, several RCs for 3.2.2
were cut and voted on during the quarter. 3.2.2 later passed its vote in
January 2021.

One CVE was announced:

CVE-2018-11764 Apache Hadoop Privilege escalation in web endpoint

Web endpoint authentication check is broken. Authenticated users may
impersonate any user even if no proxy user is configured.

Versions affected: 3.0.0-alpha4, 3.0.0-beta1, 3.0.0
Fixed versions: 3.0.1
Impact: privilege escalation
Reporter: Daryn Sharp
Reported Date: 2018/03/17
Issue Announced: 2020/10/21

Hadoop Common: The object store work is keeping people busy.
* There's now a storage connector for HuaweiCloud Object Storage, so Hadoop
 and other applications using the FileSystem APIs can work with data stored
 in Huawei's cloud.
* A new IOStatistics API has gone in to allow applications to query input
 classes (filesystems, streams, iterators) for IO performance details. This
 should allow tests and applications to identify performance issues during
 profiling and, hopefully, in production.
* AWS S3 is now consistent. This enables the maintainers of the S3A connector
 to remove all the S3Guard code, which relied on DynamoDB for a consistent
 view of the data. They are looking forward to this.

* The Guava update is becoming a major source of friction for downstream
 applications adopting new Hadoop releases. The community is working to shade
 Guava as the solution. (HADOOP-16924)
* The native compression libraries for Snappy and LZ4 are now shipped with the
 Hadoop binary, no longer requiring manual installation of the native
 libraries on the host machines, making them easier to use. (HADOOP-17125 and
 HADOOP-17292)

HDFS:
* A new encryption codec "SM4/CTR/NoPadding" was added (HDFS-15098).
* HDFS Router Based Federation received a number of new improvements,
 including a balancer (HDFS-15294) and isolation (HDFS-14090). Rename support
 is being worked on, starting this quarter. (HDFS-15747)
* The new View FS implementation is near completion. (HDFS-15289)
* The community is working to add dynamic mount support for both read and
 write for HDFS Provided Storage. (HDFS-15714)
* Dynamic disk-level tiering (HDFS-15547) continued from last quarter.

YARN:
* The consolidation of FairScheduler and CapacityScheduler started in Q3 and
 is near completion. (YARN-10370)
* Capacity scheduler is being enhanced to support auto queue creation.
 (YARN-10496)

## Community Health:
Overall, the community participation appears relatively healthy despite
Ozone's recent move to TLP. We had a steady supply of new contributors and new
features this quarter.

Erasure Coding appears to have gained traction over the last two quarters.
Numerous EC bug fixes and improvements were raised this quarter. It looks like
Hadoop 3 is getting adopted.

Code development and mailing list traffic were both down significantly quarter
over quarter, possibly due to the holiday season. Traffic on the ozone-dev and
ozone-issues mailing lists was down because of the Ozone TLP.

- dev@hadoop.apache.org had a 75% decrease in traffic in the past quarter (10
 emails compared to 39)
- general@hadoop.apache.org had a 67% decrease in traffic in the past quarter
 (15 emails compared to 45)
- mapreduce-issues@hadoop.apache.org had a 39% increase in traffic in the past
 quarter (237 emails compared to 170)
- ozone-dev@hadoop.apache.org had a 93% decrease in traffic in the past
 quarter (13 emails compared to 174)
- ozone-issues@hadoop.apache.org had an 80% decrease in traffic in the past
 quarter (1180 emails compared to 5804)
- user@hadoop.apache.org had a 30% decrease in traffic in the past quarter (56
 emails compared to 80)
- user-zh@hadoop.apache.org had a 45% decrease in traffic in the past quarter
 (5 emails compared to 9)

322 issues opened in JIRA, past quarter (31% decrease)
242 issues closed in JIRA, past quarter (34% decrease)

591 commits in the past quarter (3% decrease)
82 code contributors in the past quarter (16% decrease)

214 PRs opened on GitHub, past quarter (13% decrease)
185 PRs closed on GitHub, past quarter (20% decrease)

In addition to mailing lists, JIRA and GitHub PRs, we are seeing more traffic
in the official ASF Slack hdfs (113 users), hadoop (119 users) and yarn (39
users) channels over the last quarter. They are being used to announce online
community meetup events and to troubleshoot issues.

21 Oct 2020

Change the Apache Hadoop Project Chair

 WHEREAS, the Board of Directors heretofore appointed Vinod Kumar
 Vavilapalli (vinodkv) to the office of Vice President, Apache Hadoop,
 and

 WHEREAS, the Board of Directors is in receipt of the resignation of
 Vinod Kumar Vavilapalli from the office of Vice President, Apache
 Hadoop, and

 WHEREAS, the Project Management Committee of the Apache Hadoop project
 has chosen by vote to recommend Wei-Chiu Chuang (weichiu) as the
 successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Vinod Kumar Vavilapalli is
 relieved and discharged from the duties and responsibilities of the
 office of Vice President, Apache Hadoop, and

 BE IT FURTHER RESOLVED, that Wei-Chiu Chuang be and hereby is appointed
 to the office of Vice President, Apache Hadoop, to serve in accordance
 with and subject to the direction of the Board of Directors and the
 Bylaws of the Foundation until death, resignation, retirement, removal
 or disqualification, or until a successor is appointed.

 Special Order 7F, Change the Apache Hadoop Project Chair, was
 approved by Unanimous Vote of the directors present.

21 Oct 2020 [Vinod Kumar Vavilapalli / Bertrand]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
- The Hadoop community passed a proposal to spin off the Ozone project (a
 Hadoop subproject) to a Top Level Project.
- Vinod stepped down as Chair. Wei-Chiu Chuang was elected as the new Chair.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (13 years ago)
There are currently 228 committers and 117 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- Ayush Saxena was added to the PMC on 2020-07-21
- Adam Antal was added as committer on 2020-07-21
- Andras Bokor was added as committer on 2020-09-23
- Hui Fei was added as committer on 2020-09-25
- Jim Brennan was added as committer on 2020-08-04
- Peter Bacsko was added as committer on 2020-07-27
- István Fajth was added as committer on 2020-09-03 (Ozone branch committer)
- Prashant Pogde was added as committer on 2020-09-03 (Ozone branch committer)
- Lisheng Sun was added as committer on 2020-10-01

## Project Activity:
Recent releases:
2.9 release line was declared EOL on 2020-09-07.
3.3.0 was released on 2020-07-14.
3.1.4 was released on 2020-08-03.
2.10.1 was released on 2020-09-21.
3.2.2 is being prepared by Xiaoqiao.

A major milestone was achieved when the Ozone project announced the 1.0.0
release on 2020-09-02.

During the recent ApacheCon@Home, 7 (and probably some more) Hadoop
talks were given by the community members.

## Community Health:

The community is healthy. To highlight, 6 committers and 1 PMC member were
added to the Hadoop Core project, and two branch committers were added to the
Ozone project. Release activities have gone up dramatically with four
(including Ozone) releases announced and one being prepared.

15 Jul 2020 [Vinod Kumar Vavilapalli / Roy]

## Description:
The mission of Hadoop is the creation and maintenance of software related to
Distributed computing platform

## Issues:
As Ozone gears up for its GA release, Marton started a thread to discuss the
plan to make Ozone a TLP. There is a general consensus within the community to
move Ozone out of Hadoop. The proposal is still being discussed; no actual
steps have been taken yet. Thread: "[DISCUSS] making Ozone a separate Apache
project", https://s.apache.org/wpc3m

## Membership Data:
Apache Hadoop was founded 2008-01-15 (12 years ago)
There are currently 219 committers and 116 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Masatake Iwasaki was added to the PMC on 2020-04-16
- Lokesh Jain was added to the PMC on 2020-06-15
- David Mollitor was added as committer on 2020-04-11
- Xiaoqiao He was added as committer on 2020-06-11
- Li Cheng was added as committer on 2020-05-04
- Nilotpal Nandi was added as committer on 2020-04-10
- Siddharth Wagle was added as committer on 2020-06-11
- Vivek Ratnavel Subramanian was added as committer on 2020-04-11
- Yisheng Lien was added as committer on 2020-04-20

Adam Antal and Peter Bacsko were both voted in as committers, and both
accepted the invite at the end of the quarter. Their karma is yet to be added.

## Project Activity:

Diversity and inclusion have recently received attention. A discussion thread
is under way on the private mailing list about actions to make the Hadoop
project more inclusive, including removing offending branch names, source
code, etc.

Sammi Chen is the RM for the Ozone 0.6 release. Brahma Reddy Battula is
continuing on the Hadoop 3.3.0 release and preparing the initial release
candidate. Since the Submarine project has become its own TLP, the Submarine
code has been removed from the Hadoop 3.3.0 release. Gabor started releasing
Hadoop 3.1.4.

## Community Health:
The weekly Ozone dev community sync is going strong. Recently, a separate,
Asia-Pacific time zone friendly sync for the Ozone community was started.

The new user-zh@ mailing list was not well utilized this quarter.
We should promote it to make the project more inclusive.

Community Diversity:
Of the new committers added to the project, 4 out of 7 are affiliated with
Cloudera. 4 out of 7 are located in Asia.

15 Apr 2020 [Vinod Kumar Vavilapalli / Patricia]

## Description:
The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

## Issues:
There are no problematic issues requiring board attention at the moment.

## General
- A new mailing list user-zh@hadoop.apache.org was created towards the end of
 Feb 2020 to foster questions about Apache Hadoop in Chinese for individuals
 who feel more comfortable communicating in Chinese.

## Project Activity:
### Releases
- Apache Hadoop Ozone 0.5.0, the first beta release of Ozone, was announced on
 March 24 2020
- hadoop-thirdparty-1.0.0 was released on 2020-03-18. hadoop-thirdparty is a
 set of internal artifacts used by the project to mitigate the impact of our
 dependency version updates on the wider ecosystem.

### Other release related news
- Apache Hadoop 3.3.0 release originally planned for mid March 2020 is running
 late
- Apache Hadoop 2.8.x release line is marked as end-of-life

## Membership Data:
Apache Hadoop was founded 2008-01-16 (12 years ago)
There are currently 215 committers and 114 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

### PMC changes, past quarter:
- Currently 114 PMC members
- New PMC members since last report: 1

- Zhankun Tang was added to the PMC on 2020-03-24

### Committer base changes, past quarter:
- Currently 215 committers
- New committers since last report: 6

- Nilotpal Nandi was added as committer on 2020-04-11
- David Mollitor was added as committer on 2020-04-11
- Vivek Ratnavel Subramanian was added as committer on 2020-04-11
- Wilfred Spiegelenburg was added as committer on 2020-03-24
- Siyao Meng was added as committer on 2020-03-24
- Aravindan Vijayan was added as committer on 2020-02-03

## Community Health:
### JIRA Activity
Slightly down from last quarter
- 1074 JIRA tickets created since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2020-01-15
 ]
- 841 JIRA tickets resolved since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >=
 2020-01-15 ]

### Mailing list subscriptions & activity:
Mailing list activity on existing JIRA related lists (issues, commits)
continues to go down across the board - presumably due to lower release
activities. The dev lists are a mixed bag with common-dev seeing more activity.
The new list user-zh obviously has net positive activity.

15 Jan 2020 [Vinod Kumar Vavilapalli / Danny]

## Description:
The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

## Issues:
There are no problematic issues requiring board attention at the moment.

## General
- Submarine as a TLP was approved by the board at the previous board meeting.
 Development and releases of the Submarine module inside Apache Hadoop have
 since moved over to the new TLP project.
- Apache Hadoop 3.3.0 release is being planned for mid March 2020

## Project Activity:
### Releases
Apache Hadoop 2.10.0 was released on 2019-10-29
Apache Hadoop 3.1.3 was released on 2019-10-21
Ozone 0.4.1-alpha was released on 2019-10-13

## Membership Data
Apache Hadoop was founded 2008-01-16 (12 years ago)
There are currently 209 committers and 113 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

### PMC changes, past quarter:
- Currently 113 PMC members.
- New PMC members since last report: 5

- Chen Liang was added to the PMC on 2019-12-16
- Giovanni Matteo Fumarola was added to the PMC on 2019-12-24
- Nanda kumar was added to the PMC on 2019-10-17
- Shashikant Banerjee was added to the PMC on 2019-12-16
- Surendra Singh Lilhore was added to the PMC on 2020-01-06

### Committer base changes, past quarter:
- Currently 209 committers.
- New committers since last report: 4 (1 new branch committer)

- Attila Doroszlai was added as committer on 2019-12-17
- Prabhu Joseph was added as committer on 2019-10-23
- Stephen O'Donnell was added as a branch committer on 2019-11-08
 (HDDS-1880-Decom branch)
- Chao Sun (previously a branch committer on HDFS-12943, Standby reads) was
 added as a committer on 2019-12-24

## Community Health:
### JIRA Activity
Slightly down from last quarter
- 1249 JIRA tickets created since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-10-14 ]
- 977 JIRA tickets resolved since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >=
 2019-10-14 ]

### Mailing list subscriptions & activity:
Mailing list activity is down across the board on previously existing lists.
The Submarine sub-module spinning out to a TLP should be a contributor. Also,
the new lists created for the Ozone sub-module should contribute to the
reduced activity on the issue lists.

16 Oct 2019 [Vinod Kumar Vavilapalli / Dave]

## Description
The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

## Issues
There are no problematic issues requiring board attention ATM.

The community voted to spin off the Submarine module to a separate top-level
Apache project and is pursuing the board's approval.

## General
# A significant Hadoop Community Meetup @ Beijing happened in Aug 2019.
  Coverage here:
  https://blogs.apache.org/hadoop/entry/hadoop-community-meetup-beijing-aug
# Branch EOL discussion finally happened and was resolved. Release lines 2.6,
  2.7, 3.0 are marked EOL
# Ozone moved to a separate source tree in addition to stand alone releases.
  Initial thoughts were exchanged on whether it should also go the Submarine
  way and become a TLP
# 2.10 release process is underway
# CVE Announcements: CVE-2018-11768 was announced on Oct 4 2019: HDFS FSImage
  Corruption

# Comment on previous report
  > da: I don't understand "Branch Committer" are these people Committers or
  not? AFAIK we don't recognise any other role.
  vinodkv: They enjoy all the rights of a committer but their voting-in is
  expedited on a specific speculative branch. Please see the corresponding
  section in hadoop bylaws here: http://hadoop.apache.org/bylaws.html. Happy to
  add more pointers if need be.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (12 years ago)
There are currently 206 committers and 108 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

### PMC changes, past quarter:
- Currently 108 PMC members.
- New PMC members since last report: 4

- Bharat Viswanadham was added to the PMC on 2019-09-26
- Marton Elek was added to the PMC on 2019-07-29
- Hanisha Koneru was added to the PMC on 2019-09-26
- Jonathan Hung was added to the PMC on 2019-10-04

### Committer base changes, past quarter:
- Currently 206 committers.
- New committers since last report: 3

- Dinesh Chitlangia was added as committer on 2019-10-05
- Liu Xun was added as committer on 2019-10-05
- Zac Zhou was added as committer on 2019-10-09

## Project Activity:
### Releases
 - Apache Hadoop 3.2.1 was released on 2019-09-22.
 - Apache Hadoop 3.1.3 release is getting wrapped up after the vote passed
 - Apache Hadoop Ozone 0.4.1 Alpha is being put to a vote

## Community Health:

### JIRA Activity
Significantly up compared to last quarter
- 1341 JIRA tickets created   since the last board report [ project in (YARN,
  SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-07-15
  ]
- 1136 JIRA tickets resolved since the last board report [ project in (YARN,
  SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >=
  2019-07-15 ]

### Github Activity
Significantly up, as more and more of the community is moving patch reviews
from JIRA over to Github
 - 569 PRs opened on GitHub, past quarter (60% increase)
 - 606 PRs closed on GitHub, past quarter (108% increase)

### Mailing list subscriptions & activity:
Mailing list traffic is significantly back up, the last quarter being down
slightly was likely a one-off.

17 Jul 2019 [Vinod Kumar Vavilapalli / Joan]

The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

GENERAL
- Release 3.0.4 / branch-3.0 EOL discussion happened in May 2019 - not fully
 concluded
- A full day Hadoop Community Meetup happened at Cloudera Palo Alto on June
 26: https://www.meetup.com/Hadoop-Contributors/events/262055924/

RELEASES
- Apache Hadoop Ozone 0.4.0-alpha was released on May 7 2019
- Apache Hadoop Submarine 0.2.0 was released on Jul 2 2019

COMMUNITY

## PMC changes:

- New PMC Members since last report: 3
- Currently 104 PMC members.

- Mukul Kumar Singh was added to the PMC on Mon May 13 2019
- Billie Rinaldi was added to the PMC on Tue May 14 2019
- Aaron Fabbri was added to the PMC on Tue Jun 18 2019

## Committer base changes:

- New committers since last report: 7
- Currently 203 committers.

- Thomas Marquardt was added as a committer on Wed June 19 2019 (was
 previously a branch committer for ABFS connector work HADOOP-15407 since Jun
 2018).
- Gabor Bota was added as a committer on Tue Jun 25 2019
- Daniel Zhou was added as a committer on Wed Jun 26 2019
- Szilard Nemeth was added as a committer on Sat Jun 29 2019
- Abhishek Modi was added as a committer on Sat Jul 06 2019
- Tao Yang was added as a committer on Tue Jul 09 2019
- Ayush Saxena was added as a committer on Tue July 11 2019 (was previously a
 branch committer to RBF HDFS-13891 branch since Mar 2019).

## JIRA Activity
- 999 JIRA tickets created   since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-04-15
 ]
- 720 JIRA tickets resolved since the last board report [ project in (YARN,
 SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >=
 2019-04-15 ]

## Mailing list subscriptions & activity:
Slightly down (on both subscriber count as well as emails sent)

17 Apr 2019 [Vinod Kumar Vavilapalli / Rich]

The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

GENERAL
- The community voted by Feb 10 2019 and created a new submodule named
 "hadoop-submarine" for enabling deep learning training & serving jobs on
  Hadoop. It follows an independent release cycle - a process already
  established for Ozone.
- Branch-2.7 EOL is being discussed
- CVE announcements: CVE-2018-1296, CVE-2018-11767

RELEASES

- Apache Hadoop 3.1.2 was released on Mon Feb 04 2019
- Apache Hadoop 3.2.0 was released on Tue Jan 15 2019
- Apache Hadoop Ozone 0.4.0 is being voted on

COMMUNITY

## PMC changes:

- No new PMC additions in the last three months
- Currently 101 PMC members.

## Committer base changes:

- Currently 198 committers.
- New committers since last report: 5

- Chandni Singh was added as a committer on Wed Mar 20 2019
- Ayush Saxena was added as a *branch committer* for HDFS-13891 branch on Wed
  Mar 13 2019
- Zhankun Tang was added as a committer on Tue Mar 12 2019
- Eric Badger was added as a committer on Tue Mar 05 2019
- Lokesh Jain was added as a committer on Thu Feb 21 2019

## JIRA Activity
(Previous reports were based on the reporter tool and were buggy. Now using
 the right keys - YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE)
- 534 JIRA tickets created since the last board meeting
- 878 JIRA tickets resolved since the last board meeting

## Mailing list subscriptions & activity:
Steady

16 Jan 2019 [Vinod Kumar Vavilapalli / Ted]

The Apache™ Hadoop® project develops open-source software for reliable,
scalable, distributed computing.

GENERAL

- Cloudera and Hortonworks, which employ committers and PMC members of this
 project, have merged. This merger is going to reduce community diversity
 w.r.t. contributors/reviewers/committers/PMC members.
- The latest two Hadoop releases are done/being done by new release-managers.
 This is helping spread around release responsibilities further.

- Apache Hadoop git repository moved from git-wip-us server to
 gitbox.apache.org per the larger ASF INFRA changes
- The community is making progress on Java 11. With faster Java updates from
 Oracle, the community is looking at how to better track this moving target
 of Java support.

RELEASES

- Apache Hadoop 2.9.2 was released on Sun Nov 18 2018
- Apache Hadoop Ozone 0.3.0-alpha was released on Thu Nov 22 2018

- Apache Hadoop 3.2.0 release is being voted on; some security issues held up
 the release.

COMMUNITY

## PMC changes:

- Currently 101 PMC members.
- New PMC members since last report: 3
 - Haibo Chen was added to the PMC on Mon Nov 19 2018
 - Iñigo Goiri was added to the PMC on Thu Dec 13 2018
 - Yiqun Lin was added to the PMC on Mon Nov 19 2018

## Committer base changes:

- Currently 193 committers.
- New committers since last report: 3
 - Shashikant Banerjee was added as a committer on Thu Oct 11 2018
 - Botong Huang was added as a committer (previously a branch committer) on
   Thu Oct 16 2018
 - Suma Shivaprasad was added as a committer on Mon Nov 19 2018

## JIRA Activity
- 1543 JIRA tickets created in the last 3 months
- 1209 JIRA tickets closed/resolved in the last 3 months

## Mailing list subscriptions & activity:
Steady

17 Oct 2018 [Vinod Kumar Vavilapalli / Roman]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

GENERAL
- The community moved to a newly built website in the first week of September.
- Community events of note
 -- Hadoop Contributors meetup happened on Tue Sep 25 2018, hosted by Oath
    in Sunnyvale, CA with dial-in for remote attendees.
 -- Bay Area Hadoop User Group meetup happened on Wed Aug 29 2018, hosted by
    Hortonworks in Santa Clara

RELEASES

- Apache Hadoop 2.8.5 was released on Sat Sep 15 2018
- Apache Hadoop 3.1.1 was released on Tue Aug 08 2018
- Apache Hadoop 3.2.0 release is being worked on, closer to an RC.
- Apache Hadoop Ozone 0.2.1-alpha was released on Mon Oct 01 2018
 -- Ozone is a newer module in the project that is getting its own
    independently versioned release artifacts.

COMMUNITY

## PMC changes:

- Currently 98 PMC members.
- New PMC members: 4
 - Bibin Chundatt was added to the PMC on Mon Aug 13 2018
 - Vrushali Channapattan was added to the PMC on Sun Jul 29 2018
 - Weiwei Yang was added to the PMC on Mon Aug 13 2018
 - Yufei Gu was added to the PMC on Mon Jul 30 2018

## Committer base changes:

- Currently 191 committers.
- New committers:
 - Ajay Kumar was added as a committer on Thu Sep 13 2018
 - Marton Elek was added as a committer on Mon Jul 30 2018
 - Takanobu Asanuma was added as a committer on Tue Jul 24 2018

- Branch committers
 - Kasper Janssens was added as a branch committer (HDFS-12090) on Mon Jul
   23 2018 - vote was in Jan 2018.

## JIRA Activity
- 1846 JIRA tickets created in the last 3 months
- 1557 JIRA tickets closed/resolved in the last 3 months

## Mailing list subscriptions & activity:
Steady

18 Jul 2018 [Vinod Kumar Vavilapalli / Ted]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

UPDATES

Maintaining 5+ active release branches is proving to be a pain in general,
more so with handling of security vulnerabilities.

RELEASES

- 2.7.6 was released on Sun Apr 15 2018
- 3.0.2 was released on Sun Apr 22 2018
- 2.9.1 was released on Wed May 09 2018
- 2.8.4 was released on Mon May 14 2018
- 3.0.3 was released on Wed May 30 2018

COMMUNITY

## PMC changes:

- Currently 94 PMC members.
- New PMC members:
  - Sammi Chen was added to the PMC on Thu Jun 07 2018
  - Sean Mackrory was added to the PMC on Wed Jun 13 2018

## Committer base changes:

- Currently 187 committers.
- New committers:
  - Jonathan Hung was added as a committer on May 5 2018 [ Doesn't show up on
    https://reporter.apache.org ]
  - Shane Kumpf was added as a committer on Mon May 14 2018
  - Nanda Kumar was added as a committer on Wed June 20 2018 [ Doesn't show
    up on https://reporter.apache.org ]
  - Ewan Higgs was added as a committer on Wed June 19 2018 [ Doesn't show up
    on https://reporter.apache.org ]
  - Giovanni Matteo Fumarola was added as a committer on Fri June 22 2018 [
    Doesn't show up on https://reporter.apache.org ]

- New branch committers:
  - Duo Zhang was added as a branch committer for work on Non-blocking HDFS
    Access for H3 (HDFS-13572) on Wed Jun 06 2018 [ Doesn't show up on
    https://reporter.apache.org ]
  - Esfandiar Manii was added as a branch committer for ABFS connector work
    (HADOOP-15407) on Fri Jun 08 2018
  - Thomas Marquardt was added as a branch committer for ABFS connector work
    (HADOOP-15407) on Fri Jun 08 2018
  - Botong Huang was added as a branch committer on Hadoop + Windows Server
    work (HADOOP-15461) on Mon Jun 25 2018. Already a branch committer on
    another branch YARN-7402.

## Mailing list activity:
Steady

SECURITY

Announced CVEs
- CVE-2016-6811 on April 30 2018: Apache Hadoop Privilege escalation
 vulnerability (The issue was fixed a long time ago, but the CVE announcement
 slipped through the cracks)

16 May 2018

Change the Apache Hadoop Project Chair

 WHEREAS, the Board of Directors heretofore appointed Chris Douglas
 (cdouglas) to the office of Vice President, Apache Hadoop, and

 WHEREAS, the Board of Directors is in receipt of the resignation of
 Chris Douglas from the office of Vice President, Apache Hadoop, and

 WHEREAS, the Project Management Committee of the Apache Hadoop project
 has chosen by vote to recommend Vinod Kumar Vavilapalli (vinodkv) as the
 successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Chris Douglas is relieved and
 discharged from the duties and responsibilities of the office of Vice
 President, Apache Hadoop, and

 BE IT FURTHER RESOLVED, that Vinod Kumar Vavilapalli be and hereby is
 appointed to the office of Vice President, Apache Hadoop, to serve in
 accordance with and subject to the direction of the Board of Directors
 and the Bylaws of the Foundation until death, resignation, retirement,
 removal or disqualification, or until a successor is appointed.

 Special Order 7C, Change the Apache Hadoop Project Chair, was
 approved by Unanimous Vote of the directors present.

18 Apr 2018 [Christopher Douglas / Shane]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

Hadoop maintains five to seven active release branches. We hope this will be
the peak. That said, the 3.x releases are a healthy balance of stabilization
in 3.0.1 and new features merged and released in 3.1.0.

RELEASES

3.0.1 was released 2018-03-22
3.1.0 was released 2018-04-05

COMMUNITY

(+ PMC Rakesh Radhakrishnan 2018-01-23)
(+ committer Hanisha Koneru 2018-01-10)
(+ committer Mukul Kumar Singh 2018-02-09)
(+ committer Rushabh Shah 2018-04-06)
(+ committer Bharat Viswanadham 2018-04-06)
(+ branch-HDFS-12090 Bert Verslyppe 2018-03-14)
(+ branch-HDFS-12090 Ewan Higgs 2018-01-19)
(+ branch-HDFS-12943 Chao Sun 2018-01-02)
(+ branch-HDFS-12943 Erik Krogen 2018-01-10)
(+ branch-YARN-7402 Botong Huang 2018-01-31)
(+ branch-YARN-7402 Giovanni Matteo Fumarola 2018-01-31)
auth: 183 committers (including branch) and 92 PMC members

17 Jan 2018 [Christopher Douglas / Ted]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

Hadoop 3.0.0 is GA. This not only unlocks new features and development, it also
puts the project on stable footing to continue a steady cadence with less
backporting. We're not there yet, as we currently have four active release
branches to support existing users, but it is a significant milestone.

Several feature branches have merged, or are preparing to merge, for a 3.1.0
release in February.

RELEASES

3.0.0-beta1 was released 2017-10-02
2.8.2 was released 2017-10-23
2.9.0 was released 2017-11-16
2.8.3 was released 2017-12-12
3.0.0 was released 2017-12-12
2.7.5 was released 2017-12-13

COMMUNITY

(+ PMC Brahma Reddy Battula 2017-12-14)
(+ PMC Konstantinos Karanasos 2017-11-26)
(+ committer Billie Rinaldi 2017-10-26)
(+ committer Miklos Szegedi 2017-12-27)
(+ committer Sammi Chen 2017-10-15)
(+ committer Virajith Jalaparti 2017-12-29)
(+ committer Inigo Goiri 2017-10-19)
auth: 176 committers (including branch) and 91 PMC members

18 Oct 2017 [Christopher Douglas / Jim]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The 3.x series entered beta as the community merged the final set of
features and is focused on stabilization. The 2.x series received a bugfix
release in 2.7.4 as the community prepares for the 2.8.2 and 2.9 releases.
Details of the release roadmap [1] and blockers/status for individual
releases [2,3] are tracked in the wiki.

[1] https://cwiki.apache.org/confluence/display/HADOOP/Roadmap
[2] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.9+Release
[3] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

RELEASES

2.7.4 was released 2017-08-03
3.0.0-beta1 was released 2017-10-02

COMMUNITY

(+ PMC Anu Engineer 2017-08-16)
(+ PMC Daniel Templeton 2017-08-24)
(+ PMC Eric Payne 2017-07-24)
(+ PMC John Zhuge 2017-08-29)
(+ PMC Kai Zheng 2017-08-07)
(+ PMC Mingliang Liu 2017-09-21)
(+ PMC Naganarasimha 2017-08-10)
(+ PMC Ray Chiang 2017-08-24)
(+ PMC Sunil G 2017-07-28)
(+ PMC Varun Saxena 2017-07-21)
(+ PMC Wei-Chiu Chuang 2017-08-29)
(+ PMC Xiao Chen 2017-08-29)
(+ committer Aaron Fabbri 2017-09-06)
(+ committer Chen Liang 2017-09-05)
(+ committer Sean Busbey 2017-09-14)
(+ committer Surendra Singh Lilhore 2017-09-18)
(+ committer Wei Yan 2017-08-23)
(+ committer Weiwei Yang 2017-09-25)
(+ branch-HDFS-7240 Mukul Kumar Singh 2017-09-21)
(+ branch-HDFS-7240 Nanda kumar 2017-09-20)
(+ branch-HDFS-7240 Yuanbo Liu 2017-09-20)
(+ branch-YARN-1011 Miklos Szegedi 2017-09-29)
auth: 175 committers (including branch) and 89 PMC members

19 Jul 2017 [Christopher Douglas / Ted]

Apache Hadoop is a set of related tools and frameworks for creating
and managing distributed applications running on clusters of commodity
computers.

The community is completing the 3.x-alpha series of releases from
trunk, moving to a stabilizing, -beta series. The 2.7.4 release series
will receive a bugfix release, likely in the next few weeks. The 2.8
(and 2.9) release branches are also likely to be released this year
while 3.x enters GA.

RELEASES

3.0.0-alpha3 was released 2017-05-25
3.0.0-alpha4 was released 2017-07-06

COMMUNITY
(+ PMC Subru Krishnan 2017-07-04)
(+ committer Chris Trezzo 2017-04-24)
(+ committer Vrushali Channapattan 2017-04-24)
(+ committer Yufei Gu 2017-05-19)
(+ committer Nathan Roberts 2017-05-22)
(+ committer James Clampffer 2017-05-31)
(+ committer Sean Mackrory 2017-06-16)
(+ committer Manoj Govindassamy 2017-07-03)
auth: 169 committers (including branch) and 77 PMC members.

SECURITY

CVE-2017-7669: Apache Hadoop privilege escalation

19 Apr 2017 [Christopher Douglas / Shane]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The project cut a release from the long-lived 2.8 branch, iterating on its
stable series of releases. Concurrently, it is stabilizing and adding new
features to a 3.x series, anticipating one more alpha release before
stabilizing in beta. Activity in both HDFS and YARN proceeds both in feature
branches (e.g., object storage, "native task" support) and in a steady
stream of fixes and smaller features in mainline branches.

RELEASES

2.8.0 was released 2017-03-22
3.0.0-alpha2 was released 2017-01-24

COMMUNITY

(+ PMC Ravi Prakash 2017-02-07)
(+ committer John Zhuge 2017-02-24)
(+ committer Yiqun Lin 2017-01-14)
(+ committer Haibo Chen 2017-04-13)
(+ branch-HADOOP-13335 Sean Mackrory 2017-02-13)
(+ branch-HDFS-7240 Chen Liang 2017-03-31)
(+ branch-HDFS-7240 Weiwei Yang 2017-04-13)
auth: 164 committers (including branch) and 76 PMC members

18 Jan 2017 [Christopher Douglas / Chris]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The community is working through criteria for its 3.x and 2.x series,
particularly w.r.t. compatibility (e.g., [1]). Progress on 3.0.0-alpha2 [2,3]
and 2.8 [4] will likely produce RCs soon.

[1] https://issues.apache.org/jira/browse/HDFS-11096
[2] https://s.apache.org/zBhP
[3] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
[4] https://s.apache.org/smEX

RELEASES

Last release: 2.6.5 2016-10-07

COMMUNITY

(+ PMC Carlo Curino 2016-11-03)
(+ PMC Li Lu 2017-01-10)
(+ PMC Ming Ma 2016-11-03)
(+ PMC Rohith Sharma K S 2016-11-17)
(+ PMC Varun Vasudev 2016-10-20)
(+ PMC Zhe Zhang 2016-11-03)
(+ committer Bibin Chundatt 2016-12-12)
(+ committer Konstantinos Karanasos 2017-01-12)
(+ committer Rakesh Radhakrishnan 2016-12-30)
(+ committer Sidharta Seethana 2016-12-15)
(+ committer Sunil Govind 2016-10-27)
(+ committer Yiqun Lin 2017-01-14)
(+ branch-HDFS-9806 Thomas Demoor 2016-10-24)
(+ branch-YARN-5734 Jonathan Hung 2016-12-13)
(+ branch-YARN-5734 Min Shen 2016-12-13)
(+ branch-YARN-5734 Ye Zhou 2016-12-13)
auth: 161 committers and 75 PMC members

SECURITY

CVE-2016-3086: Apache Hadoop YARN NodeManager vulnerability
CVE-2016-5001: Apache Hadoop Information Disclosure

19 Oct 2016 [Christopher Douglas / Brett]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

Apache Hadoop cut its first release from trunk since 2011. Releases in the 2.x
series will continue to follow stricter compatibility guidelines on a branch.
The 3.0.0-alpha1 release has kicked off several API cleanups, reasoning about
dependencies, and other pains endured to avoid downstream breakage.

RELEASES

2.6.5 was released 2016-10-07
2.7.3 was released 2016-08-24
3.0.0-alpha1 was released 2016-09-02

COMMUNITY

(+ committer Anu Engineer 2016-07-27)
(+ committer Mingliang Liu 2016-08-14)
(+ committer Wei-Chiu Chuang 2016-07-20)
(+ committer Xiao Chen 2016-07-20)
(+ committer Larry McCay 2016-07-20)
(+ branch-HADOOP-13345 Rajesh Balamohan 2016-08-06)
(+ branch-HADOOP-10285 Rakesh Radhakrishnan 2016-09-26)
(+ branch-HADOOP-12756 Mingfei Shi 2016-08-22)
(+ branch-HADOOP-13345 Aaron Fabbri 2016-08-14)
(+ branch-HDFS-9806 Ewan Higgs 2016-09-21)
(+ branch-HDFS-9806 Virajith Jalaparti 2016-09-21)
(+ branch-HDFS-9806 Pieter Reuse 2016-09-21)
(+ branch-YARN-4752 Daniel Templeton 2016-08-14)
(+ branch-YARN-5079 Billie Rinaldi 2016-08-12)
(+ branch-YARN-5079 Gour K Saha 2016-08-12)
auth: 154 committers and 69 PMC members

TRADEMARKS

The project updated its logo to include "Apache" [1]. We have an outstanding
request to trademarks@ to register our yellow elephant logo with the USPTO.

[1] https://issues.apache.org/jira/browse/HADOOP-13184

There was a discussion concerning who is responsible for trademark enforcement, and a general consensus that it isn't currently working. Shane and Chris Douglas to continue this discussion offline.

@shane report back on resolving the hadoop trademark enforcement issues.

20 Jul 2016 [Christopher Douglas / Chris]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN timeline server v2 (YARN-2928) merged. Container queuing and
resource-aware scheduling made progress.

HDFS intra-datanode rebalancing (HDFS-1312) merged. Object storage (Ozone)
and the native client made progress. Design of an async FileSystem API and
an implementation in HDFS started in JIRA.

HADOOP integrations with cloud storage were particularly active this
quarter. The S3A client received significant updates from a diverse set of
community members. Clients for the Aliyun Object Store Service (OSS) and
Microsoft Azure Data Lake Store (ADLS) also posted proposals and prototypes.

MAPREDUCE was updated to work with the next generation of the YARN timeline
service.

The Yetus project has greatly improved CI and regression testing,
particularly across branches. Given the Hadoop project's intent to cut
releases from trunk again, Yetus's support for feature branches is
particularly helpful.

RELEASES

Releases have been blocked on HADOOP-12893, bringing the NOTICE and LICENSE
files up to date. It was recently resolved.

COMMUNITY
(+ PMC Xiaoyu Yao 2016-06-14)
(+ PMC Lei Xu 2016-05-15)
(+ PMC Arun Suresh 2016-06-23)
(+ committer Brahma Reddy Battula 2016-06-11)
(+ committer Ray Chiang 2016-06-17)
(+ committer Subru Krishnan 2016-06-14)
(+ committer Varun Saxena 2016-06-22)
(+ branch-YARN-3368 Sreenath Somarajapuram 2016-06-21)
(+ branch-YARN-3368 Sunil Govind 2016-06-03)
auth: 142 committers (including branch), 69 PMC members

20 Apr 2016 [Christopher Douglas / Marvin]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN development yielded improvements in preemption, hardening of the timeline
server, and a new web UI. Designs for resource-aware scheduling and
long-running services are being shaped in JIRA.

HDFS erasure coding, native client, object store (Ozone), and intra-datanode
rebalancing are significant areas under development.

MapReduce continues to receive bug fixes, but even maintenance has slowed.

Trademark enforcement continues to be a challenge. While most vendors have
engaged quickly and positively, slow (non-)compliance falls off the radar. We
are working with trademarks@ to amortize the costs of engagement with
templates and will track these incidents in the BRAND JIRA as appropriate, to
track followup.

RELEASES
- 2.6.4 was released on Feb 10 2016
- 2.7.2 was released on Jan 26 2016

COMMUNITY
(+ PMC Yongjun Zhang 2016-02-18)
(+ PMC Sangjin Lee 2016-04-12)
(+ committer Masatake Iwasaki 2016-01-20)
(+ committer Eric Payne 2016-02-08)
(+ committer Li Lu 2016-02-21)
(+ committer Naganarasimha Garla 2016-03-29)
(+ committer Kai Zheng 2016-04-07)
(+ committer Larry McCay 2016-04-08)
(+ branch-HDFS-1312 Anu Engineer 2016-03-03)
(+ branch-HDFS-8707 Bob Hansen 2016-01-13)
(+ branch-YARN-1011 Iñigo Goiri 2016-01-29)
auth: 138 committers (including branch), 66 PMC members

20 Jan 2016 [Christopher Douglas / Brett]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, resource-aware scheduling (YARN-1011) continues to make progress on
a branch, particularly for oversubscription and distributed scheduling.
Other areas of active development include node labels, reservations, docker
support, and the timeline server.

In HDFS, a long-awaited native client (HDFS-8707) has made progress in a
branch. Other areas include the WebHDFS protocol, truncate, erasure coding,
and intra-datanode rebalancing. Discussion on the dev list suggests that
support for erasure coding will likely be pushed from the next release, to
2.9 or 3.0.

Bug fixes and stability improvements continue to be filed and fixed in
MapReduce.

The community prepares maintenance releases (2.6.4 and 2.7.2) concurrently
with a release of the head of branch-2 as 2.8.0.

RELEASES
- 2.6.3 was released on Wed Dec 16 2015

COMMUNITY
(+ PMC Yi Liu 2015-11-09)
(+ PMC Tsuyoshi Ozawa 2015-12-09)
(+ PMC Wangda Tan 2015-12-09)
(+ PMC Akira Ajisaka 2015-12-16)
(+ PMC Robert Kanter 2016-01-12)
(+ branch-YARN-2928 Varun Saxena 2015-12-04)
(+ branch-YARN-2928 Naganarasimha 2015-12-04)
(+ branch-HDFS-8707 Stephen Walkauskas 2016-01-07)

auth: 134 committers (including branch), 64 PMC members

18 Nov 2015 [Christopher Douglas / Brett]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, support for resizeable containers (YARN-1197) merged to trunk and
branch-2, where its development continues. Application priorities
(YARN-1963) and the v2 timeline server (YARN-2928) continue to make
progress. Failover, HA, and rolling upgrade support were polished. Some
issues related to resource-aware scheduling advanced, tentatively. Support
for Docker containers (YARN-3611) improved, trending toward support for
multiple runtimes (YARN-3853).

In HDFS, support for erasure coding (HDFS-7285) merged to trunk. Many
improvements focused on improving interactions between features (storage
policies, upgrade domains, erasure coding, etc.). Separation between the
namespace and block management, years in development, has received renewed
attention (e.g., HDFS-8966). HDFS also separated its client(s) into a
separate package. Another native client implementation (HDFS-8707) has made
steady progress.

In MapReduce, bug fixes, stability improvements, and documentation comprised
most of the activity. It remains in maintenance mode.

In Common, Hadoop dev support scripts were rewritten and split into Yetus, a
new TLP. The s3a and wasb filesystem bindings also received many bug fixes
and improvements. Portability of native code improved.

The community continues to stabilize the 2.6.x and 2.7.x branches (currently
voting on 2.7.2), and has discussed a 2.8.0 release. It also opened a
discussion of patch workflows, as alternatives to JIRA/patch files/RTC.
While the Github integration is currently enabled, project members are
working with other communities and infra on alternatives (e.g., Gerrit).

RELEASES
 - hadoop-2.6.1 @ 2015-09-23
 - hadoop-2.6.2 @ 2015-10-28

COMMUNITY

(+ PMC Devaraj K 2015-07-20)
(+ PMC Yi Liu 2015-11-09)
(+ committer Zhihai Xu 2015-07-27)
(+ committer Anubhav Dhoot 2015-09-22)
(+ committer Sangjin Lee 2015-09-30)
(+ committer Zhe Zhang 2015-10-16)
(+ committer Walter Su 2015-10-27)
Branch: Timeline service
(+ branch-YARN-2928 Vrushali Channapattan 2015-09-14)
(+ branch-YARN-2928 Li Lu 2015-09-29)
Branch: IPv6 support
(+ branch-HADOOP-11890 Elliott Clark 2015-09-03)
(+ branch-HADOOP-11890 Nate Edel 2015-09-04)
Branch: C++ HDFS client
(+ branch-HDFS-8707 James Clampffer 2015-07-29)

auth: 131 committers (including branch), 60 PMC members

21 Oct 2015 [Christopher Douglas / Sam]

No report was submitted.

15 Jul 2015 [Chris Douglas / Sam]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, work generalizing node labels, improving Docker support, and
implementing v2 of the timeline server made progress. Support for resizable
containers also appears on-track. A proposal for federated YARN clusters has
design docs and some preliminary code in a branch.

In HDFS, the erasure coding work continues to make progress in a branch. The
object store also has a design doc for discussion and a preliminary set of
patches has been committed to a branch. A native client and prototype HTTP/2
protocol have also made progress in branches.

In Common, work refining the test-patch scripts has expanded its scope to
become a separable component that could support other projects in the
ecosystem. Work revising the native build for Solaris also expanded to remake
much of the native build infrastructure. The S3 filesystem shim was also
updated extensively.

RELEASES
- hadoop-2.7.0 @ 2015-04-22
- hadoop-2.7.1 @ 2015-07-08

COMMUNITY

(+ PMC Vinayakumar B 2015-07-07)
(+ PMC Junping Du 2015-07-07)
(+ PMC Xuan Gong 2015-07-07)
(+ PMC Haohui Mai 2015-02-20)
(+ committer Lei Xu 2015-06-14)
(+ committer Ming Ma 2015-06-18)
(+ committer Xiaoyu Yao 2015-04-16)
(+ committer Varun Vasudev 2015-05-28)
(+ committer Rohith Sharma K S 2015-06-17)
Branch: Split test-patch off into its own TLP (HADOOP-12111)
(+ branch-HADOOP-12111 Andrew Kyle Purtell 2015-06-27)
(+ branch-HADOOP-12111 Nick Dimiduk 2015-06-27)
(+ branch-HADOOP-12111 Andrew Bayer 2015-06-27)
(+ branch-HADOOP-12111 Sean Busbey 2015-06-27)
Branch: YARN Federation (YARN-2915)
(+ branch-YARN-2915 Subru Krishnan 2015-07-06)
(+ branch-YARN-2915 Kishore Chaliparambil 2015-07-06)
Branch: Distributed scheduling (YARN-2877)
(+ branch-YARN-2877 Sriram Rao 2015-05-21)
(+ branch-YARN-2877 Konstantinos Karanasos 2015-05-21)
Branch: Object store (HDFS-7240)
(+ branch-HDFS-7240 Anu Engineer 2015-07-10)
Branch: Data Transfer Protocol via HTTP/2 (HDFS-7966)
(+ branch-HDFS-7966 Duo Zhang 2015-07-02)

auth: 124 committers (including branch), 58 PMC members

22 Apr 2015 [Chris Douglas / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

In YARN, the next iteration of the TimelineServer, work on network shaping,
per-queue policies, and collecting node metrics for scheduling have made
progress. Work on erasure coding in HDFS continues. A design document for
the object store (HDFS-7240) also appeared. Activity is low in MapReduce,
mostly bug fixes and repairs for unstable tests. Overhaul of shell scripts
continues in Common, in addition to changes supporting pluggable
authentication and authorization.

RELEASES

None

COMMUNITY

(+ PMC Haohui Mai 2015-02)
(+ committer Arun Suresh 2015-03)
(+ committer Xiaoyu Yao 2015-03)

auth: 110 committers (including branch), 55 PMC members

18 Feb 2015 [Chris Douglas / Jim]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.6 release added a large set of features and made many improvements,
including transparent encryption, heterogeneous/tiered storage, support for
Docker containers, reservation-based scheduling, node labels, S3a support,
key management server (KMS), service registry, and rolling upgrades in YARN.

Ongoing development in YARN includes a new round of improvements to the
timeline server (YARN-2928), nodemanager decommission and work-preserving
restart (YARN-914, YARN-1336, YARN-556), improved locking in the RM
(YARN-3091), shared cache (YARN-1492), and disk as a resource (YARN-2139).

Ongoing development in HDFS includes erasure coding (HDFS-7285), support for
truncate (HDFS-3107), namenode synchronization (HDFS-7396), and a native
client (HDFS-6994).

MapReduce received a healthy set of bug fixes and stability improvements.

RELEASES
- hadoop-2.6.0 @ 2014-11-19
- hadoop-2.5.2 @ 2014-11-20

COMMUNITY
(+ PMC Zhijie Shen @ 2014-11)
(+ PMC Jian He @ 2014-11)
(+ committer Yi Liu  @ 2014-11)
(+ committer Carlo Curino @ 2014-11)
(+ committer Gera Shegalov @ 2014-12)
(+ committer Robert Kanter @ 2014-12)
(+ committer Tsuyoshi Ozawa @ 2014-12)
(+ committer Akira Ajisaka @ 2015-01)
(+ committer Wangda Tan @ 2015-01)
(+ branch-HDFS-7285 Zhe Zhang @ 2014-11)
(+ branch-HDFS-7285 Kai Zhang @ 2014-11)
(+ branch-HDFS-7285 Bo Li @ 2014-11)
(+ branch-YARN-2139 Wei Yan @ 2014-12)


auth: 108 committers (including branch), 54 PMC members

21 Jan 2015 [Chris Douglas / Chris]

No report was submitted.

15 Oct 2014 [Chris Douglas / Rich]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.6 release has branched for release. In addition to bug fixes, it adds
several new features and refines existing work in the 2.x release series.

Among the notable work in YARN: improvements to its fault tolerance and
support for rolling upgrades (YARN-556, YARN-1336), timeline/history server
(YARN-1530), log handling (YARN-2443), admission control/planning
(YARN-1051), support for long-running services (YARN-913), node labels
(YARN-796), and large container allocation (YARN-1769).

Among the notable work in HDFS: tiered storage in archival (HDFS-6584),
in-memory replicas (HDFS-6581), inotify support (HDFS-6634), extended
attributes (HDFS-2006), encryption (HDFS-6134, HADOOP-10150), and a native
client implementation.

In MapReduce, a native collector (MAPREDUCE-2841) offers improved
performance to many deployments.

Work started earlier in the 2.x branch, particularly related to security,
encryption, and high availability, continues apace.

RELEASES
- hadoop-2.5.0 @ 2014-08-12
- hadoop-2.5.1 @ 2014-09-11

COMMUNITY
(+ PMC Karthik Kambatla @ 2014-09-18)
(+ committer Benoy Antony @ 2014-08-07)
(+ committer Akira Ajisaka @ 2014-08-21)
(+ branch-MAPREDUCE-2841 Binglin Zhang @ 2014-07-14)
(+ branch-MAPREDUCE-2841 Sean Zhong @ 2014-07-14)
(+ branch-MAPREDUCE-2841 Manu Zhang @ 2014-08-21)
auth: 101 committers (including branch), 52 PMC members

16 Jul 2014 [Chris Douglas / Rich]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

YARN development of a generic TimelineServer, resource tracking for disk and
network resources, caching of common dependencies, support for container
preemption, and other features continues.

The HDFS extended attributes feature branch merged to trunk (2014-06-11).
Thorough specification of FileSystem semantics (HADOOP-9361) also
successfully merged. Native checksumming, hedged reads, features built over
HA interfaces, NFS, and ACLs are also actively developed in trunk and
release branches.

Across all projects, work adding encryption and security features continues
in a development branch.

The project changed its bylaws to allow 5 days for release votes, instead of
the 7 allocated for other decisions.

RELEASES
- hadoop-0.23.11 @ 2014-06-27
- hadoop-2.4.1 @ 2014-06-29

COMMUNITY
(+ PMC Andrew Wang @ 2014-06-01)
(+ PMC Arpit Agarwal @ 2014-06-01)
(+ PMC Brandon Li @ 2014-06-01)
(+ PMC Chris Nauroth @ 2014-06-01)
(+ PMC Colin McCabe @ 2014-06-01)
(+ PMC Jing Zhao @ 2014-06-01)
(+ PMC Sandy Ryza @ 2014-06-01)
(+ branch-HADOOP-10388 Abraham Elmahrek @ 2014-05-01)
(+ branch-HADOOP-10388 Yongjun Zhang @ 2014-05-01)
(+ branch-HDFS-2006 Charles Lamb @ 2014-05-12)
(+ branch-HDFS-2006 Yi Liu @ 2014-05-12)
(+ branch-fs-encryption Charles Lamb @ 2014-05-14)
(+ branch-fs-encryption Yi Liu @ 2014-05-14)
(+ branch-YARN-1051 Carlo Curino @ 2014-06-15)
(+ branch-YARN-1051 Subramaniam Venkatraman Krishnan @ 2014-06-15)
auth: 94 committers (including branch), 51 PMC members

16 Apr 2014 [Chris Douglas / Chris]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The YARN execution platform continues to evolve by generalizing from the
specific requirements of the MapReduce framework. As one prominent example,
a development branch implementing a more general application history server
(YARN-321) merged to trunk and the 2.x release series. The operability and
robustness of the platform are also improved by recent attention to failover
and recovery in the ResourceManager and NodeManager components (e.g.,
YARN-1336, YARN-1815).

The HDFS subproject also merged two significant development branches to
trunk: rolling upgrades (HDFS-5535) and ACLs (HDFS-4685). Improvements in
the Common RPC layer, short-circuit reads, and 'hedged' reads (HDFS-5776)
evolve Hadoop storage toward more heterogeneous workloads and architectures.

RELEASES
- hadoop-2.3.0 @ 2014-02-20
- hadoop-2.4.0 @ 2014-04-07

COMMUNITY
(+ committer Haohui Mai @ 2014-02-11)
(+ committer Vinayakumar B @ 2014-03-04)
(+ committer Xuan Gong @ 2014-03-13)
(+ branch-HADOOP-10388 Binglin Chang @ 2014-03-13)
(+ branch-HADOOP-10388 Wenwu Peng @ 2014-04-07)
auth: 88 committers (including branch), 44 PMC members
The last addition to the PMC was Bikas Saha 2013-10

15 Jan 2014 [Chris Douglas / Shane]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The Hadoop project reached a significant milestone, releasing Hadoop 2.2.0 as
the first GA artifact in that series.

Two development branches have merged to trunk: In-memory caching of HDFS
blocks (HDFS-4949) (29-Oct-2013) and the first phase in presenting
heterogeneous storage to applications (HDFS-2832) (13-Dec-2013). Development
of these features continues in trunk.

YARN continues to refine its resource model. Salient issues include modifying
containers (YARN-1197), delegating cluster resources (YARN-1488), and
improving its model for services (YARN-896). Work on improving high
availability in the ResourceManager (YARN-149), particularly YARN-1029, has
made very promising progress.

RELEASES
- hadoop-2.2.0 @ 2013-10-15
- hadoop-0.23.10 @ 2013-12-02

COMMUNITY
(+ committer Roman Shaposhnik @ 2013-10-25)
(+ committer Jun Ping Du @ 2013-12-04)
(+ committer Jian He @ 2013-12-04)
(+ committer Mayank Bansal @ 2013-12-04)
(+ committer Karthik Kambatla @ 2013-12-04)
(+ committer Ravi Prakash @ 2013-12-04)
(+ committer Omkar Joshi @ 2013-12-04)
(+ committer Zhijie Shen @ 2013-12-04)
(+ branch-YARN-1492 Chris Trezzo 2013-12-18)
(+ branch-YARN-1492 Sangjin Lee 2013-12-18)
(+ branch-HDFS-4685 Haohui Mai 2013-12-29)
auth: 84 committers (including branch), 44 PMC members

16 Oct 2013 [Chris Douglas / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

The 2.x series reached beta status in August, targeting a GA release before the
end of the year. Work hardening the release continues apace.

The project updated its bylaws to allow for "branch committers" in support of
feature development by newer contributors. The first set are collaborating on a
security initiative that has been discussed on the dev list, in meetups, and in
related JIRAs since last spring. No work on the branch has started, despite
exchanges on the lists on possible, seminal issues to tackle.

RELEASES
- hadoop-1.2.1 @ 2013-08-05
- hadoop-2.0.6-alpha @ 2013-08-22
- hadoop-2.1.0-beta @ 2013-08-25
- hadoop-2.1.1-beta @ 2013-09-30

COMMUNITY
(+ PMC Bikas Saha @ 2013-10-07)
(+ committer Arpit Agarwal @ 2013-08-08)
(+ committer Sanford Ryza @ 2013-07-25)
(+ committer Andrew Wang @ 2013-07-25)
(+ committer Devaraj K @ 2013-07-23)
auth: 74 committers, 44 PMC members

17 Jul 2013 [Chris Douglas / Shane]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

There is little to report, as we submitted an off-cycle report in June.
Security discussions on the dev list converge slowly, but consensus is
developing around implementation tasks, if not the precise shape of that work.
Preparation for the 2.1-beta release continues. Contributors continue to
stabilize APIs, iron out incompatibilities with the 1.x codebase, and integrate
with related projects.

When the Hadoop project spun off subprojects a few years ago, the projects
adjusted their committer roles. We'd been ambivalent about finishing that, but
finally did, removing about 11 accounts (none had participated since then).

RELEASES
- hadoop-0.23.9 @ 2013-07-09

COMMUNITY
auth: 68 committers, 43 PMC members

19 Jun 2013 [Chris Douglas / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

This is an off-cycle report, as the last few weeks were eventful. The Hadoop
project made five releases from three active development branches, elected 7
members to the PMC, and added five committers. The project amended its
bylaws to eliminate votes on "release plans".

RELEASES
 - hadoop-0.23.7 @ 2013-04-18
 - hadoop-2.0.4 @ 2013-04-23
 - hadoop-1.2 @ 2013-05-13
 - hadoop-0.23.8 @ 2013-06-05
 - hadoop-2.0.5 @ 2013-06-09

COMMUNITY
 (+ PMC Jonathan Eagles 2013-05-29)
 (+ PMC Kihwal Lee 2013-05-29)
 (+ PMC Steve Loughran 2013-05-29)
 (+ PMC Luke Lu 2013-05-29)
 (+ PMC Uma Maheswar Rao G 2013-05-29)
 (+ PMC Hitesh Shah 2013-05-29)
 (+ PMC Daryn Sharp 2013-05-29)
 (+ committer Brandon Li 2013-05-21)
 (+ committer Colin McCabe 2013-05-21)
 (+ committer Jing Zhao 2013-05-22)
 (+ committer Ivan Mitic 2013-05-23)
 (+ committer Chris Nauroth 2013-05-23)
 auth: 79 committers, 43 PMC members

The bylaws contained an obscure clause that required release managers to
call a vote on a "release plan". Given that a majority vote of the PMC
establishes a new release, the meaning of this rarely-observed ritual is
ambiguous: there was a vote, but nothing in it was binding. After several
weeks of heated exchanges that accomplished nothing, the PMC voted to remove
the clause from the bylaws entirely. Now, any committer who wants to roll a
release notifies the dev list to explain its motivation and get preliminary
feedback, but there is no vote.

The completely avoidable confusion these threads created has mostly
resolved.

17 Apr 2013 [Chris Douglas / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

Two significant development branches merged to trunk:
- Support for Windows
 ( http://s.apache.org/e7c )
- (HDFS) Fast-path for local reads on Linux (merge vote closing presently)
 ( http://s.apache.org/gM )
 ( http://s.apache.org/7y1 )

Developers have run Hadoop on Windows by emulating its *NIX dependencies,
but the former branch effects a cleaner integration. The latter branch
removed a performance hack for trusted services, replacing it with a more
secure and general implementation for all HDFS clients. Developers on
Windows requested that the workaround remain intact while comparable
functionality is implemented on that platform.

The two merge votes were nearly concurrent, so the development community
discussed the tradeoffs in supporting the new platform, particularly given
the present example of its impact. The informal consensus laid the burden of
support, testing, and monitoring on the subset of developers working on
Windows. Concretely, this extracted commitments to set up and maintain CI
infrastructure while relieving others of requirements to fix breakage on a
platform they may not run. As applied to the HDFS branch being merged, the
implementor(s) of the feature restored the workaround. The dev community
converged on these banal agreements fairly quickly.

Increased collaboration with the Apache Bigtop project in the release
process has improved early detection of downstream integration issues. The
upcoming release of 2.0.4-alpha (currently being voted on) has benefitted
significantly.

Hadoop continues to be an umbrella hosting effectively independent projects
(HDFS, MapReduce, YARN). The PMC has not discussed its disposition to
partition them recently. While one of the prenominate merges is an example
of cross-project work, such patches remain rare.

No issues require board attention at this time.

RELEASES
- hadoop-1.1.2 @ 2013-03-06

COMMUNITY
(+ PMC Jason Lowe 2013-02-28)
auth: 74 committers, 36 PMC members
mailing lists @ 2013-04-01
  1805 general
  3995 user

COMMON
Common is the shared libraries for HDFS and MapReduce.
mailing lists @ 2013-04-01
   390 common-commits
  1789 common-dev
   378 common-issues

HDFS
HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.
mailing lists @ 2013-04-01
   201 hdfs-commits
   862 hdfs-dev
   258 hdfs-issues

MAPREDUCE
MapReduce is an implementation of the map/reduce programming paradigm.
mailing lists @ 2013-04-01
   198 mapreduce-commits
   904 mapreduce-dev
   256 mapreduce-issues

YARN
YARN is a distributed computation framework for easily writing
distributed applications.
mailing lists @ 2013-04-01
    57 yarn-commits
   221 yarn-dev
    81 yarn-issues

20 Feb 2013

Change the Apache Hadoop Chair

 WHEREAS, the Board of Directors heretofore appointed Arun
 Murthy to the office of Vice President, Apache Hadoop, and

 WHEREAS, the Board of Directors is in receipt of the resignation
 of Arun Murthy from the office of Vice President, Apache
 Hadoop, and

 WHEREAS, the Project Management Committee of the Apache Hadoop
 project has chosen by vote to recommend Chris Douglas as the
 Successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Arun Murthy is
 relieved and discharged from the duties and responsibilities of
 the office of Vice President, Apache Hadoop, and

 BE IT FURTHER RESOLVED, that Chris Douglas be and hereby is
 appointed to the office of Vice President, Apache Hadoop, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification, or
 until a successor is appointed.

 Special Order 7A, Change the Apache Hadoop Chair, was approved
 by Unanimous Vote of the directors present.

20 Feb 2013 [Arun Murthy / Greg]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added 3 new committers - Kihwal Lee, Arpit Gupta, Bikas Saha
* We've added 1 new PMC member: Harsh J
* We've elected a new PMC Chair, Chris Douglas. (Also added to the board
 agenda.)

On the project side, we have made 4 releases:
- hadoop-0.23.5 was released on 28th November, 2012
- hadoop-1.1.1 was released on 1st December, 2012
- hadoop-0.23.6 was released on 6th February, 2013
- hadoop-2.0.3-alpha was released on 13th February, 2013

PMC Chair Vote - We had a fairly contentious discussion after the vote for
the PMC Chair resulted in a tie after STV. The discussions included
*analysis* of voting patterns w.r.t. employers, accusations and
counter-accusations about reasons for those patterns such as marketing
etc., and a proposal to *rotate PMC chair organization* as one of the
remedies, which eventually veered into a direction where one PMC
member perceived it as a 'threat to remove all PMC members of an
organization'; this was rapidly defused by a clarification from the
other PMC member. In the end, one of the 2 candidates tied after the
vote withdrew to allow for an amicable solution and also cited
concerns about the nature of some of the discussions.

Clearly, the lesson the Hadoop PMC has learnt is that, in future, voting
should be done via the ASF Voting Tool.

As the outgoing Chair, my personal recommendation is that splitting
the Hadoop project into separate TLPs (HDFS, YARN, MapReduce) will not
only break up the 'umbrella' Hadoop project to better reflect the fact
that the communities are significantly disparate, but will also, more
importantly, help avoid excessive fascination with the Hadoop
brand. We've discussed this in the past (see October 2012 Board
Report) - some people agree about this, others don't. We'll continue
to talk.

Overall, aside from these skirmishes, the community continues to
function in a healthy manner as evinced by the fact that we continue
to make a significant number of software releases, grow the community
by adding new users/contributors/committers/PMC-members and generally
make great forward progress. Hence, I feel there isn't any reason for
the Board to take any action.

Community:
* 51 committers
* 3932 user@
* 1783 subscribers on general@

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 1751 subscribers on common-dev

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

Community:
* 829 subscribers on hdfs-dev

YARN

YARN is a distributed computation framework for easily writing distributed
applications.

Community:
* 185  subscribers to yarn-dev

MAPREDUCE

MapReduce is an implementation of the map/reduce programming paradigm.

Community:
* 867  subscribers to mapreduce-dev

16 Jan 2013 [Arun Murthy / Roy]

No report was submitted.

AI: Roy to pursue a report for Hadoop

21 Nov 2012 [Arun Murthy / Bertrand]

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added one new committer - Jason Lowe
* We've added 3 new PMC members: Siddharth Seth, Robert Evans, Thomas Graves

On the project side, we have made 6 releases:
- hadoop-2.0.1-alpha was released on 26th July, 2012
- hadoop-0.23.3 was released on 17th September, 2012
- hadoop-2.0.2-alpha was released on 9th October, 2012
- hadoop-1.0.4 was released on 11th October, 2012
- hadoop-1.1.0 was released on 14th October, 2012
- hadoop-0.23.4 was released on 15th October, 2012

Developer community is working well together, even though there was a fresh
(but minor) outbreak of vendor wars with some participation by members of
the PMC. No action from the Board is necessary now.

We've added a new Hadoop YARN sub-project.

We had a fairly contentious public discussion on splitting Apache Hadoop
into separate projects since there are at least 3 very distinct developer
communities in Apache Hadoop now: HDFS, YARN & MapReduce. For now the
community has voted to merge separate committer lists, but there seems to be
some emerging, albeit very early/tenuous consensus that after hadoop-2 is
declared 'stable' we should split the project into separate projects (HDFS,
YARN, MapReduce). This will better reflect the reality that they have distinct
communities. No action from the Board is necessary now.

Community:
* 48 committers
* 3817 user@
* 1624 subscribers on general@

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 1681 subscribers on common-dev

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

Community:
* 735 subscribers on hdfs-dev

YARN

YARN is a distributed computation framework for easily writing distributed
applications.

Community:
* 86  subscribers to yarn-dev

MAPREDUCE

MapReduce is an implementation of the map/reduce programming paradigm.

Community:
* 766  subscribers to mapreduce-dev

17 Oct 2012 [Arun Murthy / Doug]

No report was submitted.

AI: Greg to pursue a report for Hadoop

25 Jul 2012 [Arun Murthy / Rich]

Apache Hadoop status report for July 2012

Hadoop is a set of related tools and frameworks for creating and managing
distributed applications running on clusters of commodity computers.

On the people side, we have new people joining our ranks.
* We've added two new committers - Daryn Sharp, Jonathan Eagles
* We've added one new PMC member: Alejandro Abdelnur

On the project side, we have made 1 bug-fix release in the stable line and 1
major new release:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

- Work on further Hadoop 2.0.1-alpha (a security bug-fix release) is done,
and is currently under vote.
- Work on hadoop-1.1.0 is nearly done.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

Community:
* 48 committers
* 1613 subscribers on common-dev
* 3151 subscribers on common-user
* 1533 subscribers on general

New committers:
* 2 new committers have been added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

New committers:
* 1 new committer has been added to this project.

Community:
* 43 committers
* 668 subscribers on hdfs-dev
* 1205 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
- hadoop-1.0.3 was released on 16th May, 2012
- hadoop-2.0.0-alpha was released on 23rd May, 2012

New committers:
* 1 new committer has been added to this project.

Community:
* 46 committers
* 689  subscribers to mapreduce-dev
* 1354 subscribers to mapreduce-user

(Hadoop)

18 Apr 2012 [Arun Murthy / Roy]

Apache Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity computers.

On the people side, we have had new people join our ranks:
* We've added new committers - Thomas Graves, Robert Evans, Hitesh Shah & Uma M
* We've added new PMC members: Aaron Myers, Matt Foley

On the project side, we have made 3 releases:
- hadoop-1.0.1 was released on 22nd Feb, 2012
- hadoop-0.23.1 was released on 28th Feb, 2012
- hadoop-1.0.2 was released on 4th April, 2012

- Work on further Hadoop 0.23.2 release is nearly done, and is
 scheduled for a release in the next few days.

- Developer community is working well together.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Community:
* 46 committers
* 1520 subscribers on common-dev
* 2952 subscribers on common-user
* 1503 subscribers on general

New committers:
* 4 new committers (Thomas Graves, Robert Evans, Hitesh Shah & Uma M) have been
 added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated storage
across the cluster using a single name space.

New committers:
* 1 new committer (Uma M) has been added to this project.

Community:
* 42 committers
* 607 subscribers on hdfs-dev
* 1092 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

New committers:
* 3 new committers (Thomas Graves, Robert Evans & Hitesh Shah) have been
 added to this project.

Community:
* 45 committers
* 637  subscribers to mapreduce-dev
* 1250 subscribers to mapreduce-user

24 Jan 2012 [Arun Murthy / Bertrand]

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

On the people side, we have had a new person join our ranks.
  * We've added one new committer - Siddharth Seth.

On the project side, we have made some very exciting progress. We have
had a total of 3 releases:
 - hadoop-0.23.0 released from trunk, first one off trunk in nearly 2 years.
 - hadoop-0.22.0 released, branched in early 2011.
 - hadoop-1.0.0 released off the branch-0.20.2xx baseline (now branch-1)
 - https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces21

 - Work on further Hadoop 0.23.1 release is continuing, and is
   scheduled for release at the end of the month

 - Developer community is working well together. The public dialogue
   among vendors who employ many in the developer community seems to
   have died down since the last board report. No action from the
   board is required at this stage.

 - Some vendors are continuing to use the lists to promote their own
   products. A few PMC members have responded to discourage this
   practice, but not directly as the PMC. No action from the board is
   required at this stage.

COMMON

 Common is the shared libraries for HDFS and MapReduce.

 Releases:
 * 0.23.0 was released on 11th Nov, 2011.
 * 0.22.0 was released on 10th Dec, 2011.
 * 1.0.0 was released on 29th Dec, 2011.

 Community:
 * 42 committers
 * 1433 subscribers on common-dev
 * 2761 subscribers on common-user
 * 1468 subscribers on general

 New committers:
 * 1 new committer has been added to this project.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

 Community:
 * 41 committers
 * 567 subscribers on hdfs-dev
 * 985 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 1 new committer has been added to this project.

 Community:
 * 42 committers
 * 587 subscribers to mapreduce-dev
 * 1118 subscribers to mapreduce-user

26 Oct 2011

Change the Apache Hadoop Chair

   WHEREAS, the Board of Directors heretofore appointed Ian
   Holsman to the office of Vice President, Apache Hadoop, and

   WHEREAS, the Board of Directors is in receipt of the resignation
   of Ian Holsman from the office of Vice President, Apache
   Hadoop, and

   WHEREAS, the Project Management Committee of the Apache Hadoop
   project has chosen by vote to recommend Arun Murthy as the
   Successor to the post;

   NOW, THEREFORE, BE IT RESOLVED, that Ian Holsman is
   relieved and discharged from the duties and responsibilities of
   the office of Vice President, Apache Hadoop, and

   BE IT FURTHER RESOLVED, that Arun Murthy be and hereby is
   appointed to the office of Vice President, Apache Hadoop, to
   serve in accordance with and subject to the direction of the
   Board of Directors and the Bylaws of the Foundation until
   death, resignation, retirement, removal or disqualification, or
   until a successor is appointed.

 Resolution 7B was approved by unanimous roll call vote,
 with Doug Cutting abstaining.

26 Oct 2011 [Ian Holsman / Larry]

 Hadoop status report for October 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 On the people side, we have had a couple of new people join our ranks.
  * Giri Kesavan and Jitendra Pandey have accepted a role in the PMC
  * 4 people have accepted committership: Alejandro Abdelnur, Harsh J
    Chouraria, Eric Yang and Ramya Sunil
  * A new PMC Chair (Arun Murthy) is being recommended to the board for
    their approval.

 On the project side, we have made some exciting progress.
 - The vote on 0.20.205 has closed successfully, and it will be released
   shortly. This release integrates two major features (security & append),
   of which the append feature was the topic of much internal debate, so this
   is an excellent outcome for the health of Hadoop, and allows other projects
   like HBase to use an 'official' release.

 - Work on Hadoop 0.23 release is continuing, and is scheduled for release
   at the end of the month

 - Konstantin Shvachko is now leading the 0.22 Release process

 - Mavenization of our codebase is complete

 - Developer community is working well together

 - Vendors are continuing to use the lists to promote their own products.
   We are formulating appropriate responses to discourage this practice.
   No action from the board is required at this stage.


COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * 0.20.204.0 (beta) was released on 5 September.

 Community:
 * 2598 subscribers on common-dev
 * 1341 subscribers on common-user
 * 1392 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 4 new committers have been added to this project.

 Community:
 * 41 committers
 * 499 subscribers on hdfs-dev
 * 864 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 4 new committers have been added to this project.

 Community:
 * 41 committers
 * 528  subscribers to mapreduce-dev
 * 1016 subscribers to mapreduce-user

17 Aug 2011 [Ian Holsman / Greg]

 Hadoop status report for August 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.


 * Hadoop Summit - 1600 people attended
 * HortonWorks launch
 * The 0.20.203.0 release and the divisive vote.
 * 0.20.204.0 is having a rc1 voted on.
 * Hadoop naming debate
 * Lack of progress on contacting the potential trademark infringers
 * 0.22 stalling
 * More weight gathering behind 0.23
 * Growing ecosystem as more incubator projects are in the Hadoop ecosystem
 * Commercial forks of Hadoop (eg. MapR) and how to respond to them on the
   lists and attending developer meetups
 * A number of developers active on the HA Jira (HDFS-1623) asked for an
   in-person high bandwidth meeting to get clarification on the design
   document posted on the Jira; this wasn't publicized on-list
 * Fixed our site to claim trademark for Hadoop and the other Apache
   projects.
 * Trademarks is proceeding with registering the Hadoop trademark.
 * Yahoo removed the references to the Yahoo Distribution of Hadoop.

  In regards to the releases, we have 3 releases going on: the 0.20.X release
  stream, which has some minor features and mainly bug fixes, and the 0.22 and
  0.23 releases, which represent some major changes. 0.22 & 0.23 differ in
  featureset and 0.23 is a superset of 0.22.

COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 Community:
 * 1294 subscribers on common-dev
 * 2487 subscribers on common-user
 * 1375 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 1 new committer has been added to this project.

 Community:
 * 38 committers
 * 465 subscribers on hdfs-dev
 * 788 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 3 new committers have been added to this project.

 Community:
 * 40 committers
 * 485 subscribers to mapreduce-dev
 * 938 subscribers to mapreduce-user

Larry asked, and Owen answered, what the "Hadoop naming debate" is. It is a reference to whether to accept http://wiki.apache.org/hadoop/Defining%20Hadoop which seeks to limit the name "Hadoop" to mean releases from Apache and to push all other derived products to be "powered by Hadoop." There was general support except from the companies that use the Hadoop name for derivative products. There was a request to suspend the vote for more discussion, but once the vote stopped the discussion stopped.

20 Jul 2011 [Ian Holsman / Sam]

Report missing; will report next month.

20 Apr 2011 [Ian Holsman / Noirin]

 Hadoop status report for April 2011

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 On the people side, we have had a couple of new people join our ranks.
  * Todd Lipcon has accepted a role in the PMC
  * Koji Noguchi has accepted a role as a committer in both HDFS & MR.
  * Matthew Foley has accepted a role as a committer in HDFS.
  * We have another invitation outstanding, and hope he will take
    up the committer role shortly.

 On the branding side, we and the trademark group have been actively
 engaging companies to make proper use and attribution of our Apache Hadoop
 Trademark. These discussions are ongoing, and generally positive.

 On the product release side, Nigel is continuing to progress with the 0.22
 release. We have 18 outstanding blockers.  HADOOP-7106, which re-organizes
 some SVN structure, should be committed by the end of next week.
 MAPREDUCE-2178 is the biggest outstanding blocker that many others depend
 on.  There is still no clear plan for getting it fixed.

 Arun has taken over the 0.20.200 release (formerly known as 0.20.3).
 He pushed a giant patch to the branch-0.20-security branch. Then, based on
 the feedback from the community, Owen took over and committed individual
 patches for the same codebase to the branch. Currently we have a couple of
 unit tests failing; after fixing them we should be good to make an
 official release after getting necessary approvals from the PMC.

 Discussions around rationalizing the codebase have started, with mrunit
 being moved to the incubator, and further discussions about either
 maintaining the contrib modules or moving them to apache-extras/incubator.

 The biggest news is saved for last. Yahoo! has announced that they will
 stop maintaining their own internal codebase, and switch to actively
 developing on the apache one. This is a great step forward, and they have
 also started having more discussions about architecture (MR-279) on the
 list. We look forward to more in-depth discussions happening in the
 public forums.


COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 Community:
 * 1194 subscribers on common-dev
 * 2293 subscribers on common-user
 * 1328 subscribers on general

HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 New committers:
 * 2 new committers have been added to this project.

 Community:
 * 35 committers
 * 375 subscribers on hdfs-dev
 * 631 subscribers on hdfs-user

MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 * 2 new committers have been added to this project.

 Community:
 * 37 committers
 * 400 subscribers to mapreduce-dev
 * 764 subscribers to mapreduce-user

19 Jan 2011 [Ian Holsman / Sam]

 Hadoop status report for January 2011

     Hadoop is a set of related tools and frameworks for creating and
     managing distributed applications running on clusters of commodity
     computers.

     Nigel has volunteered to RM the 0.22 release, and it is making progress. The
     previous RM stepped down due to not having enough time since the 6685 patch
     was not going to make this release. Work on 6685 has not really progressed.

     Owen has volunteered to RM the 0.20.3 release, and there are discussions
     about integrating the 'security' patch-set that Yahoo! is developing, which
     Arun has volunteered to RM. Both of these are separate branches.

     We have invited 11 new committers into the project this month; all have
     accepted and are in the process of getting their accounts set up. We also had
     2-3 people who the PMC felt were not ready for committership yet. There is
     still a lot of discussion about the criteria for what makes a committer,
     but I think we are in a better place than before.

     We are working with the brand management team about Yahoo!'s and
     Cloudera's use of Hadoop's name. Both of these are showing good progress
     thanks to the brand management team's hard work.

     We are still having lots of discussions about future work on the 0.20 branch;
     this includes the security patch-set, adding append, and the 0.20.3 release.
     The security patch-set has its own issues, due to it requiring some work
     if it will be contributed as separate patches, and also how the work will
     be applied to the upcoming 0.22 release (see http://s.apache.org/NfJ &
     http://s.apache.org/uf for the discussions around the append & security
     branches). There have been a couple of misunderstandings around the security
     releases.

     We have also started discussions about why we have so many mailing lists,
     what they are used for, and the possibility of combining some of them (and
     2 code bases). We have updated the website to provide better documentation.
     The codebase discussion is more about moving directories around, rather than
     combining them into a single one.

 COMMON

     Common is the shared libraries for HDFS and MapReduce.


     Releases:
     * None this period.

     Community:
     * 1123 subscribers on common-dev
     * 2140 subscribers on common-user
     * 1335 subscribers on general

 HDFS

     HDFS is a distributed file system that supports reliable replicated
     storage across the cluster using a single name space.

     New committers:
     * 5 new committers have been added to this project.

     Community:
     * 33 committers
     * 323 subscribers on hdfs-dev
     * 525 subscribers on hdfs-user

 MAPREDUCE

     MapReduce is a distributed computation framework for easily writing
     applications that process large volumes of data.

     Releases:
     * None this period.

     New committers:
     * 8 new committers have been added to this project.

     Community:
     * 35 committers
     * 342 subscribers to mapreduce-dev
     * 647 subscribers to mapreduce-user

The report indicates that changes have been made that satisfy the board. The project is back on a quarterly reporting schedule.

15 Dec 2010 [Ian Holsman / Noirin]

 Hadoop status report for December 2010

     Hadoop is a set of related tools and frameworks for creating and
     managing distributed applications running on clusters of commodity
     computers.

     One contentious issue was raised (HADOOP-6685), and discussion
     continues about which technical direction is better moving
     forward. There is currently a veto on the patch. This patch is
     not critical to the health of the project.

     6 new PMC members have been added, and votes for several new committers
     have started.
     We would like to welcome the following people to the Hadoop PMC:
       * Eli Collins
       * Jakob Homan
       * Amareshwari Ramadasu
       * Suresh Srinivas
       * Sharad Agarwal
       * Vinod Kumar Vavilapalli

     We have invited a new committer, but so far he has not responded.

     We are working with the brand management team about Yahoo!'s and
     Cloudera's use of Hadoop's name.

     The 0.22 release scheduled for November is still in progress.


 COMMON

     Common is the shared libraries for HDFS and MapReduce.


     Releases:
     * None this period.

     New Committers:
     * None this period.

     Community:
     * 1089 subscribers on common-dev
     * 2106 subscribers on common-user
     * 1294 subscribers on general

 HDFS

     HDFS is a distributed file system that supports reliable replicated
     storage across the cluster using a single name space.

     Releases:
     * None this period.

     New committers:
     * None this period.

     Community:
     * 28 committers
     * 299 subscribers on hdfs-dev
     * 498 subscribers on hdfs-user

 MAPREDUCE

     MapReduce is a distributed computation framework for easily writing
     applications that process large volumes of data.

     Releases:
     * None this period.

     New committers:
     * None this period

     Community:
     * 27 committers
     * 317 subscribers to mapreduce-dev
     * 612 subscribers to mapreduce-user

 ZOOKEEPER

     The ZooKeeper project is now a separate project, and will be
     removed from further notices going forward

17 Nov 2010 [Ian Holsman / Bertrand]

 Hadoop status report for October 2010 to November 2010

 Hadoop is a set of related tools and frameworks for creating and
 managing distributed applications running on clusters of commodity
 computers.

 Discussions have started on the issues that the board identified; we
 seem to have a general agreement on some issues, but we need an official
 consensus on the proposals, and need to have them discussed openly on the
 public mailing lists.

 Specifically:

 * Everyone is in general agreement that we need to release more often.
   The question revolves around how we test them to ensure they keep to the
   quality that Hadoop releases are known for.

 * The discussion of having 'mentors' to help guide new committers was
   started.

 * The Cloudera branding issue was forwarded to the trademarks group, where
   Shane & Karen are deciding how best to pursue the issue of their
   certification courses and branding on their website.

 * Bylaws have been discussed on general@

 * Owen will be the release manager for the 0.22 release, scheduled for later
   this month.

 * The ZooKeeper project has voted to become a separate TLP. This has been
   raised for the board's consideration.

 * People have started using reviews.apache.org to discuss patches


 COMMON

 Common is the shared libraries for HDFS and MapReduce.


 Releases:
 * None this period.

 New Committers:
 * None this period.

 Community:
 * 1073 subscribers on common-dev
 * 2068 subscribers on common-user

 HDFS

 HDFS is a distributed file system that supports reliable replicated
 storage across the cluster using a single name space.

 Releases:
 * None this period.

 New committers:
 * None this period.

 Community:
 * 26 committers
 * 286 subscribers on hdfs-dev
 * 463 subscribers on hdfs-user

 MAPREDUCE

 MapReduce is a distributed computation framework for easily writing
 applications that process large volumes of data.

 Releases:
 * None this period.

 New committers:
 *  Scott Chen was voted in as a committer in August 2010.


 Community:
 * 26 committers
 * 303 subscribers to mapreduce-dev
 * 568 subscribers to mapreduce-user

 ZOOKEEPER

 ZooKeeper is a reliable coordination service for distributed
 applications.

 Releases:
 * None this period.

 Two releases are in progress, near term a 3.3.2 fix release (1 blocker
 pending), and longer term 3.4.0 feature release.


 New committers:
 none

 Community:

 * 6 active committers, 2 PMC members
 * 176 subscribers on zookeeper-dev
 * 356 subscribers on zookeeper-user

 The ZooKeeper project has petitioned the board to become a TLP.

20 Oct 2010

Change the Apache Hadoop Project Chair

   WHEREAS, the Board of Directors heretofore appointed Owen
   O'Malley to the office of Vice President, Apache Hadoop, and

   WHEREAS, with the desire of the Board of Directors to rotate
   the position of Vice President, Apache Hadoop, the Project
   Management Committee of the Apache Hadoop Project has chosen to
   recommend Ian Holsman as the successor to the post;

   NOW, THEREFORE, BE IT RESOLVED, that Owen O'Malley is relieved
   and discharged from the duties and responsibilities of the
   office of Vice President, Apache Hadoop, and

   BE IT FURTHER RESOLVED, that Ian Holsman be and hereby is appointed
   to the office of Vice President, Apache Hadoop, to serve in
   accordance with and subject to the direction of the Board of
   Directors and the Bylaws of the Foundation until death,
   resignation, retirement, removal or disqualification, or until
   a successor is appointed.

 Approved by unanimous roll call vote with Doug abstaining.

20 Oct 2010 [Owen O'Malley / Geir]

Hadoop status report for July 2010 to October 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The 2nd annual Hadoop World was held on 12 October in NYC. It had 900
attendees.  before the conference. The program is available here:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/agenda/.

The divestiture of sub-projects has continued. We have promoted Hive
and Pig to be top level Apache projects and Chukwa to the
Incubator. This has had the positive effect that the majority of the
current PMC is involved in the core projects (Common, HDFS and
MapReduce).

The Hadoop PMC removed one member, who has completely dropped out of contact:
* Jim Kellerman
and as part of moving subprojects out, the following PMC members resigned:
* Alan Gates
* Ashish Thusoo
* Daniel Dai
* Namit Jain
* Olga Natkovich
* Pradeep Kamath

The tension between Cloudera and Yahoo has dramatically increased this
quarter and is past the breaking point. This was exacerbated by the
board's sudden insistence that the Hadoop project pick a new PMC chair
without discussing the issues with anyone other than the Cloudera
employee sitting on the board. Over the last 2.5 years, I've done my
best to do what was right for the Hadoop project and it is too bad the
community has degenerated to the current state. I sincerely want to
get the problems resolved so that we can get back to developing
software and enjoying a community that can work together.

Critical issues for the Hadoop PMC to address:
  * Change is difficult and this will involve change.
  * We need to enact bylaws so that there is a clear understanding of
    the rules.
  * The PMC needs to define and document the goals and processes that the
    project will follow going forward.
     * Expectations about committers reviewing each other's patches
     * Expectations about becoming a committer and PMC member.
     * Policies about expecting PMC members and committers to stay
       involved. People without skin in the game who vote without
       working on the project are just signing up other people for
       work.
  * Poisonous people within the project need to be managed.
  * Cloudera's abuse of the Hadoop trademark in their product names needs to be
     halted.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

Committers:
 * We redefined the Common committers to be the union of all HDFS and MapReduce
   committers.

Community:
 * 1062 subscribers on common-dev
 * 2067 subscribers on common-user

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

New committers:
 None

Community:
* 26 committers
* 280 subscribers on hdfs-dev
* 454 subscribers on hdfs-user

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
 * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved.

New committers:
 * Scott Chen

Community:
* 27 committers
* 300 subscribers to mapreduce-dev
* 553 subscribers to mapreduce-user

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* No releases this quarter

Two releases are in progress, near term a 3.3.2 fix release (1 blocker
pending), and longer term 3.4.0 feature release.


New committers:
none

Community:

* 5 active committers, 2 PMC members

* 176 subscribers on zookeeper-dev (up from 160 3 months ago)
* 347 subscribers on zookeeper-user (up from 307 in the same timeframe)

Three GSOC students completed their projects successfully. This
resulted in significant new functionality being added to the project,
and some renewed interest from a contributor standpoint. Two of the
three students have indicated that they are interested in continuing
to work in the community.

The discussion to move ZooKeeper to TLP status has been reopened and
is in progress at the time of this writing.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Good report; looks like progress is being made here.

21 Jul 2010 [Owen O'Malley / Henri]

Hadoop status report for April 2010 to July 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The 3rd annual Hadoop Summit was held on 29 June in Santa Clara. It
sold out at 1,000 attendees 10 days before the conference. The program
is available here:
http://developer.yahoo.com/events/hadoopsummit2010/agenda.html. The
slides and videos of the presentations are available online.

The second Hadoop World was announced in NYC on 12 October. The call
for presentations is open until 2 August.

There are a large number of local Hadoop User Groups around the
world. The Bay Area HUG meets monthly and has an audience of roughly
300 people.

To increase communication and reduce tensions, the SF Bay Area core
contributors (Common, HDFS, and MapReduce) have been having monthly
meetings that rotate between venues (Cloudera, Facebook, and
Yahoo!). We've discussed wide-ranging topics from process issues to
new technical ideas. All of the notes and slides are distributed on
the lists to engage developers who can't attend.

The Hadoop PMC added the following members:
* Sanjay Radia (Yahoo)
* Hemanth Yamijala (indep)

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

As part of the continuing Hadoop divestiture of sub-projects, Chukwa's
developers were encouraged to move to Apache Incubator. Although
Chukwa has already completed many of the Incubator graduation
requirements (diversity of committers, code clearance, releases), they
have not voted in new contributors or PMC members. Also, none of the
Chukwa committers have been on any Apache PMCs, and they need more guidance
than jumping into a TLP would have provided. Some of the work has been
done (accepted by Incubator, moved subversion, added to Incubator
wiki), but more is left to do (web site, mailing lists). They are
scheduled to report next month as a Podling.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.21.0 release candidates for Common, HDFS, and MapReduce have been
   rolled, but there are still some blockers. The hope is to get the
   blockers fixed and a release out next month.

New committers:
 * Amareshwari Sriramadasu (Yahoo)

Community:
 * 1013 subscribers on common-dev
 * 1965 subscribers on common-user

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* The 0.21 release is still solidifying.
* A new branch called branch-0.20-append was created to support the append
  feature to HDFS files. HBase needs this feature to run without data loss in a
  production environment.

New committers:
* Eli Collins (Cloudera)

Community:
* 26 committers
* 198 code contributors
* 247 subscribers on hdfs-dev
* 390 subscribers on hdfs-user

* Design proposal to support distributed HDFS NameNode.

HIVE

Hive is a data warehouse written on top of Hadoop.  It provides SQL to
query and manage data stored in Hadoop in tables and partitions and
provides a metastore to hold metadata about the data stored in
Hadoop.

Releases:
0.6.0 branched and we are priming up to release it.

New committers:
* John Sichi (Facebook)

Community:
* 164 contributors (commented, filed bugs or contributed to Hive). This was
  115 at the last report time.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
* The 0.21 release is still solidifying.

New committers:
* Amareshwari Sriramadasu (Yahoo)

Community:
* 268 subscribers to mapreduce-dev
* 465 subscribers to mapreduce-user

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * Pig 0.7.0 released on 5/13/2010

Community:
 * 12 committers and 5 emeriti (4 retired in the last month)
 * 191 developers (compared to 181 in the last report)
 * 452 users (compared to 402 in the last report)


ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* release 3.3.1 on 17/May/10

New committers:
none

Community:

* 5 active committers, 2 PMC members

* 160 subscribers on zookeeper-dev (up from 147 3 months ago)
* 307 subscribers on zookeeper-user (compared to 269 in the same timeframe)

Three student proposals to work on ZooKeeper projects were accepted for
GSOC.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Noirin reminded the PMC to let ConCom know when there are events going on in their community, even if the PMC is not the one organizing them.

21 Apr 2010 [Owen O'Malley / Geir]

Hadoop status report for January 2010 to April 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

In response to the board's request that we evaluate the Hadoop
sub-projects with respect to ensuring adequate supervision, here is
the breakdown by sub-project:
* Avro and HBase have decided to each become a TLP.
* Pig and ZooKeeper have discussed the issue and would prefer to
remain a sub-project for now. The three primary concerns are the work
of splitting themselves out, a lack of organizational diversity in the
committers, and loss of visibility if the Hadoop TLP site doesn't link
to them. The last concern can be addressed by ensuring that the TLP
*does* continue to link to their project pages. These projects are
adequately monitored and have good representation on the PMC, but
the PMC is still discussing what their recommendation to the board is.
* Hive hasn't discussed the issue, which needs to be addressed. I
expect that it is in the same group as Pig and ZooKeeper.
* Chukwa still struggles to broaden its community from the original
developers and to reach consensus on its goals. It has three committers,
but no representation on the PMC, which makes it difficult to make
releases and ensure adequate supervision. The PMC has not yet discussed
what to do with Chukwa.
* Common, HDFS, and MapReduce are still very tightly bound. Many
patches cross 2 or 3 of the 3 sub-projects and each of the trunks only
builds against the other projects' trunks. They are branched and
released in unison. They will likely remain together for a long time.

We started the process of discussing the bylaws that Hadoop should
adopt, but we need to drive this through to completion. I would
suggest that in the future, projects which are becoming TLP establish
bylaws as part of being created. Without explicit bylaws, there are
many votes for which it isn't clear what the required level of
consensus is.

The Hadoop PMC added the following members:
* Namit Jain

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

A resolution to promote Avro to a top-level project is currently before
the board.  If the board passes this resolution, this will be Avro's
last report as a Hadoop subproject.

Avro made three releases this quarter, 1.3.0, 1.3.1 and 1.3.2.  We
expect to make a 1.4.0 release in the next quarter.

Development has been active in all versions of Avro: C, C++, Java,
Python, and Ruby.

Three new, legally-independent committers were added this quarter:
* Jeff Hodges
* Scott Carey
* Bruce Mitchener

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Release:
* Testing Chukwa 0.4.0 RC1 to RC3

Current state of community
* 4 active contributors
* 15 subscribers on chukwa-dev
* 17 subscribers on chukwa-user

The upcoming 0.4 release will include a new real-time Hadoop Activity monitor
for small to mid-scale Chukwa deployments and a JMSAdaptor for pulling data
from JMX.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
 * 0.20.2 (including HDFS and MapReduce) was released on 2/16/2010
with 29 patches
 * We plan to rebase the 0.21 branch to trunk this month

New committers:
 * none

Community:
 * 963 subscribers on common-dev
 * 1924 subscribers on common-user

The development has continued to be active, including Kerberos-based
and token-based authentication to the RPC.

The previous 0.21 branch failed to be released and we expect to rebase
the branch to the current trunk in the next few weeks. The challenge
is learning how to adapt our project and processes to the growing
importance of Hadoop. We are moving toward a release manager-based
approach similar to the HTTPD one, in the hopes that will lead to
stable releases without stagnating on the 0.20 branch forever. We are
also requiring more thought out, documented, and tested changes.
Changes that are backwards incompatible or potentially destabilizing
must go through a lot of scrutiny. This is all part of the process of
moving from a research prototype to a critical piece of infrastructure
in our respective organizations.

HBASE

HBase is a distributed column-oriented database built on top of Hadoop
Common and HDFS.

A resolution to promote HBase to a top-level project is currently
before the board.  If the board passes this resolution, this will be
HBase's last report as a Hadoop subproject.

Releases:
* 0.20.3 on 2010/01/25 -- 74 fixes.
* There is currently a release candidate out for 0.20.4

New Committers:
* None

Community

* HBase User Group 9 met at Mozilla, 03/10/2010
* HBase User Group 10 and Hackathon happening 04/19/2010

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* We plan to rebase the 0.21 branch to trunk this month

New committers:
* None

Community:
* 11 new code contributors
* 25 committers
* 194 code contributors
* 205 subscribers to hdfs-dev
* 317 subscribers to hdfs-user

Work is in progress to incorporate security features into HDFS.

HIVE

Hive is a data warehouse written on top of Hadoop.  It provides SQL to query
and manage data stored in Hadoop in tables and partitions and provides a
metastore to hold metadata about the data stored in Hadoop.

Releases:
release 0.5.0 on 2010/02/23. This release has 106 bug fixes, 39 new features
and 26 improvements.

New committers:
* John Sichi

Community:

* Hive User Group meetup at Facebook, 03/18/2010 attended by over 70 people.
* A total of 138 people have commented, filed bugs or contributed on the
  Hive JIRA so far. This number was at 115 at the time of the last report.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
* We plan to rebase the 0.21 branch to trunk this month

New committers:
* None

Community:
* 178 subscribers to mapreduce-dev
* 280 subscribers to mapreduce-user

Features:

Security features are being implemented that include both the
Kerberos-based and token-based authentication and authorization so
that users can define who is allowed to do what on their job.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * Pig 0.6.0 released on 3/1/2010

New committers:
 * Dmitriy Ryaboy
 * Thejas Nair

Community:
 * 15 committers
 * 181 developers (compared to 171 in the last report)
 * 402 users (compared to 225 in the last report)

We've put out 4 GSOC ideas and received 2 student proposals.

Pig community reviewed the board's request to promote some of the
subprojects to TLP. Pig community consensus is to stay as a Hadoop
subproject for the time being. Detailed discussion can be found at
http://www.mail-archive.com/pig-dev@hadoop.apache.org/msg08589.html.


ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
* release 3.3.0 on 25/March/10

New committers:
none

Community:

* 5 active committers, 2 PMC members

* 147 subscribers on zookeeper-dev (up from 114 3 months ago)
* 269 subscribers on zookeeper-user (up from 225 in the same timeframe)

We've put out a number of GSOC ideas and seen 6 student proposals.
Mentors are reviewing and we hope to gain a number of projects for
GSOC 2010.

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Work is underway to certify and support running the ZooKeeper service
in production under Windows servers.

The ZooKeeper community reviewed the board's request to examine
subprojects with an eye to graduation to TLP status. Please find the
results of the ZooKeeper as TLP discussion here: http://bit.ly/c4fuZT
There was consensus amongst the development team that we will stay as
a subproject of Hadoop for the time being. Full details of the
discussion can be found in the thread provided.

Wide concern that there is a disconnect between how Hadoop is run and the expectation from the board on how Apache projects are run; Jim to join the mailing list.

20 Jan 2010 [Owen O'Malley / Justin]

Hadoop status report for October 2009 to January 2010

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

Hadoop World China was held on 2009/11/15 and was well attended. It had
representation from Cloudera, Facebook, Google, and Yahoo. There was
also a smaller Hadoop Conference in Japan on 2009/11/13.

The Hadoop PMC added the following members:
  * Daniel Dai
  * Pradeep Kamath
  * Zheng Shao
  * Tsz Wo (Nicholas) Sze

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Development has been brisk this quarter.  We're anticipating a 1.3
release in late January.

New committers:
  * Philip Zeyliger
  * Jeff Hammerbacher

Community:
  * 6 active committers
  * 94 subscribers on avro-dev
  * 114 subscribers on avro-user

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Release:
  * release 0.3.0 on 2009/11/09 with 40 issues
  * branch 0.4 planned for 2010/02

Current state of community
  * 4 active contributors
  * 15 subscribers on chukwa-dev
  * 17 subscribers on chukwa-user

The upcoming 0.4 release will include a new real-time Hadoop Activity monitor
for small to mid-scale Chukwa deployments.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
  * branch 0.21 was made on 2009/09/18

New committers:
  * Boris Shkolnik
  * Jakob Homan

Community:
  * 904 subscribers on common-dev
  * 1838 subscribers on common-user

The development has continued to be active, but work on the blockers
on the upcoming 0.21 release has been moving very slowly. Even after
splitting HDFS and MapReduce out of Common a large number of patches
cross the sub-project boundaries.

HBASE

HBase is a distributed column-oriented database built on top of
Hadoop Common and HDFS.

Releases:
  * 0.20.2 on 2009/11/19 -- 40 fixes.
  * There is currently a release candidate out for 0.20.3

New committers:
  * Lars George.

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
* No new releases this quarter. A considerable effort is being made to
  make the earlier release 0.21 stable.

New committers:
  * Boris Shkolnik
  * Jakob Homan

Community:
  * 25 committers
  * 183 code contributors
  * 245 subscribers on hdfs-user
  * 163 subscribers on hdfs-dev

Features:
  The HDFS Append feature is now part of the latest HDFS 0.21 release. A
  design for implementing security in HDFS has been published in the
  Jira forum and is gathering feedback from developers.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  release 0.4.1 on 2009/12/17 with 7 issues

New committers:
  none this quarter

For the upcoming 0.5 release, there are 153 resolved issues and 3 open ones.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
  * branch 0.21 was made on 2009/09/18

New committers:
  none this quarter

Community:
  * 178 subscribers to mapreduce-dev
  * 280 subscribers to mapreduce-user

Features:
  MapReduce 0.21 continues to stabilize relatively slowly. Security
  and changes to support Avro types through the shuffle continue to go in.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
  * release 0.5.0 on 29/Oct/09 with 48 issues
  * release 0.6.0 branched and no blockers; to be released shortly

New Committers:
  * Ashutosh Chauhan
  * Dmitriy Ryaboy
  * Richard Ding
  * Jeff Zhang

Community:
 * 354 subscribers to pig-user
 * 171 subscribers to pig-dev

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

On 12/4/09 we gained a new committer, Henry Robinson of Cloudera!

Releases:
  * release 3.2.1 on 9/Sept/09
  * release 3.1.2 on 14/Dec/09
  * release 3.2.2 on 14/Dec/09

Community:
  * 114 subscribers on zookeeper-dev (up from 99 3 months ago)
  * 225 subscribers on zookeeper-user (up from 175 in the same timeframe)

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Justin suggested that the board ask that the Hadoop project answer the same questions regarding spinning off subprojects that were asked of Lucene in the previous month. Doug indicated that this was in progress.

21 Oct 2009 [Owen O'Malley / Jim]

Hadoop status report for July to October 2009

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

Hadoop World NYC on 2009/10/02 was well received by roughly 500
attendees. It was organized by Cloudera and sponsored by Yahoo,
Facebook, Amazon WebServices, IBM, Rackspace, Softlayer, eHarmony,
SuperMicro, Intel, Impetus, Booz Allen Hamilton, and Vertica. The
format was similar to Hadoop Summit with a general session with six 20
minute talks in the morning and three tracks each with ten 30 minute
talks in the afternoon. Hadoop World China will be held next month.

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Releases:
  * release 1.2.0 on 2009-10-14
  * release 1.1.0 on 2009-09-08
  * release 1.0.0 on 2009-07-09

New committers:
  Matt Massie
  Thiruvalluvan M. G.

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

New committers:
  Jerome Boulon

The SALSA and Mochi suite of Hadoop log analysis and visualization
tools, built at Carnegie Mellon, have been progressively phased in and
integrated with the Chukwa log collection and processing
infrastructure. The basic analysis and visualization components are
available, and further work is being done to improve the
user-friendliness of operating these added tools, and to improve the
automated manageability for analysis and visualization. This can also
serve as a roadmap for other analysis tools to be integrated with
Chukwa.

Development has been proceeding steadily. Chukwa is substantially more
reliable, flexible and robust than it was a year ago, or even four
months ago.  The system is in production use at UC Berkeley, and a
number of user suggestions have been incorporated. We intend to
release 0.3 in the coming weeks.

COMMON

Common is the shared libraries for HDFS and MapReduce.

Releases:
  * release 0.19.2 on 2009/06/30 with 40 issues
  * release 0.20.1 on 2009/09/01 with 87 issues
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA
  Suresh Srinivas

Community:
  * 784 subscribers on common-dev
  * 1738 subscribers on common-user

The upcoming 0.21 release will include the new FileContext API, which
will replace the FileSystem API, and the visibility and audience
annotations that let us mark the intended public-ness of various
classes.

HBASE

HBase is a distributed column-oriented database built on top of Hadoop
Common and HDFS.

HBase had a User Group meeting on August 7th and a Hackathon over the
weekend of August 7-9.  Both events were open to the public and hosted
by StumbleUpon.

Releases:
 * release 0.20.0 on 09/September/2009 - 465 issues addressed by this release
 * release 0.20.1 on 10/12/2009 - 60 issues addressed by this release

Current state of community
  * 23 active contributors
  * 459 subscribers to hbase-user mailing list
  * 185 subscribers to hbase-dev mailing list

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

Releases:
  * release 0.19.2 on 2009/06/30 as part of common 0.19.2
  * release 0.20.1 on 2009/09/01 as part of common 0.20.1
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA
  Suresh Srinivas

Community:
  * 112 subscribers to hdfs-dev
  * 154 subscribers to hdfs-user

HDFS 0.21 was feature frozen and branched. The biggest feature is
the much-requested ability to append and sync to written files.  There
are 8 remaining blocker issues that need to be resolved.

A developer meet focused entirely on HDFS testing was held at the
Yahoo Sunnyvale campus. It was well represented by
about 15 contributors from Yahoo, Cloudera, Facebook, etc.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  release 0.4.0 on 2009/10/14 with 209 issues

New committers:
  * Edward Capriolo
  * He Yongqiang

Hive 0.4.0 had 46 new features, 115 bug fixes, 6 optimizations, 35
improvements and 2 incompatible changes.

At present there are 617 open issues, none of them a blocker
for 0.5.0. A total of 619 issues have been resolved so far.

Community:

We continue to see new contributors in the project. Since
the last report the number of contributors in the project has grown
from 21 to 48. Of these, 35 contributors are external to
Facebook. A total of 94 people have commented, filed bugs or
contributed on the Hive JIRA so far. This number was at 49 at the time
of the last report.

MAPREDUCE

MapReduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Releases:
  * release 0.19.2 on 2009/06/30 as part of common 0.19.2
  * release 0.20.1 on 2009/09/01 as part of common 0.20.1
  * branch 0.21 was made on 2009/09/18

New committers:
  Konstantin Boudnik for QA

Community:
  * 121 subscribers to mapreduce-dev
  * 172 subscribers to mapreduce-user

MapReduce 0.21 will have substantially improved Capacity and FairShare
schedulers that let administrators share clusters more
effectively. It will also add the ability to run tasks as the
submitting user and a standardized job history format written in
Avro's JSON format.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
 * release 0.4.0 on 29/Sep/09 with 48 issues

Community:
 * 155 subscribers on pig-dev
 * 269 subscribers on pig-user (I could not update this number because
   my request failed with mailbox full error)

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
  * release 3.2.1 on 2009/09/09

Community:
  * 99 subscribers on zookeeper-dev
  * 175 subscribers on zookeeper-user

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Hadoop is a trademark of the ASF, and Hadoop World is a conference and therefore needs to be approved by ConCom. Originally, there was some confusion, but as ConCom approves of this usage, there is no issue.

ConCom will work to clarify policies, such as whether such conferences (in the future, not retroactively) need to be named Apache Hadoop World or the like.

15 Jul 2009 [Owen O'Malley / Jim]

Hadoop status report for April to July 2009.

Hadoop is a set of related tools and frameworks for creating and
managing distributed applications running on clusters of commodity
computers.

The Hadoop Summit '09 was held in Santa Clara on June 10 and was
attended by more than 750 people. Registration for the event was
$100. The morning was a general track and the afternoon had 3 tracks:
developers, administration, and applications. Cloudera and Yahoo also
offered two free Hadoop training sessions (basics and advanced) the
following day that were filled very quickly.

Two books were published about Hadoop:
  * Hadoop: The Definitive Guide by Tom White
    http://www.hadoopbook.com/
  * Pro Hadoop by Jason Venner
    http://developers.apress.com/book/view/9781430219422

AVRO

Avro is an inter-language serialization and RPC library that supports
versioning of schemas and protocols for both compiled and interpreted
languages.

Releases:
  * coming soon release 1.0.0 with 52 jiras addressed from 12
    contributors

CHUKWA

Chukwa is a distributed log collection framework that aggregates logs
from across a cluster into a reasonable number of HDFS files.

Releases:
  * release 0.1.2 on 14/May/2009 with 132 issues
  * currently voting on release 0.2.0 with 56 issues

COMMON (was previously Core)

Common is the shared libraries for HDFS and map/reduce.

This quarter we split the Core subproject into Common, HDFS, and
Map/Reduce. The old branches and releases are in Common, but for 0.21
the three subprojects will release independently.

Releases:
  * release 0.20.0 on 22/Apr/09 with 114 issues
  * currently voting on 0.19.2 with 42 issues

Community:
  * 784 subscribers on common-dev
  * 1703 subscribers on common-user

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Common and HDFS.

Releases:
  * release 0.19.2 on 09/May/09 - 17 issues addressed by this release
  * release 0.19.3 on 27/May/09 - 15 issues addressed by this release
  * release 0.20.0 (alpha) on 17/Jun/09
  * coming soon release 0.20.9 with 338 out of 354 issues addressed

New Committers:
  * Andrew Purtell (previously missed from the board report)
  * Nitay Joffe
  * Ryan Rawson
  * Jonathan Gray

Current state of community
  * 23 active contributors (159 contributors since project inception)
  * 459 subscribers to hbase-user mailing list
  * 185 subscribers to hbase-dev mailing list

HDFS

HDFS is a distributed file system that supports reliable replicated
storage across the cluster using a single name space.

A developer meet for Hadoop was held at the Yahoo Sunnyvale campus to
discuss requirements for HDFS Appends. It was well represented by
about 15 contributors from Yahoo, Microsoft, Facebook, etc. Another
developer meet was held at the Cloudera campus in Burlingame. This
meet discussed, among others, a few short-term HDFS issues that need
attention.

Community:
  * 50 subscribers on hdfs-dev
  * 51 subscribers on hdfs-user

HIVE

Hive is a data warehouse written on top of Hadoop Core. It provides SQL
to query and manage data stored in Hadoop in tables and partitions.

Releases:
  * release 0.3.0 on 29/Apr/09 with 52 issues
  * coming soon release 0.4.0 in the next month with 130 issues
 At present there are 248 open issues filed against Hive.

Committers:
  * Yongqiang He

Community:
  * 30 contributors (up from 21 in the last report)
  * 67 people have commented on Hive Jiras

MAP/REDUCE

Map/reduce is a distributed computation framework for easily writing
applications that process large volumes of data.

Community:
  * 51 subscribers to mapreduce-dev
  * 56 subscribers to mapreduce-user

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

Releases:
  * release 0.3.0 on 25/Jun/09 with 33 issues

Community:
  * 144 subscribers on pig-dev
  * 269 subscribers on pig-user

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed
applications.

Releases:
  * release 3.2.0 on 8/Jul/09.
    A number of major new features are included, in particular:
    client libraries extended to cover common ZK use cases (recipes),
    namespace support, Python binding support, a REST-based API to the
    server, Perl binding support, and numerous optimizations and bug
    fixes (122 JIRAs in this release).

Community:
  * 83 subscribers on zookeeper-dev
  * 141 subscribers on zookeeper-user

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

Regarding Hadoop Common: Greg wondered if Common could move over to commons.apache.org

Doug suggested that that was premature, and that much of the code may not be useful to non-Hadoop applications.

Brett agreed that that would not make sense.

Regarding the developer meetup: Justin: only committers were invited; if somebody else had shown up, would they have been allowed?

Doug: invitations were sent directly to committers.

Jim: would others have been allowed?

Doug: others did hear about it and did attend. I would appreciate clear guidelines.

Roy: committers only is normal for a dev meeting

Jim: I remember an issue with a "closed" meeting with Geronimo, and will dig up those minutes. You want to avoid any impression that it is by invitation only.

Roy: there is no problem with contributors only, the problem is if you only invite a subset of the contributors

Justin: the issue is that contributors may be a superset of committers

Doug: I would be happy with a rule that it should be discussed on the dev list, and be invitation only, with all committers being included

Jim: that makes sense

Roy: suggests updating /dev with this information

Jim: volunteers?

Roy: will do

15 Apr 2009 [Owen O'Malley / Bertrand]

Hadoop is a set of tools for creating and managing distributed
applications, especially those with large data sets.

Hadoop was the focus of a nice article in the New York Times
(http://tinyurl.com/coafzr) on 17 March 2009. Unfortunately, the
article failed to mention that Hadoop is an Apache project.

The PMC added 8 new members: Raghu Angadi, Devaraj Das, Chris Douglas,
Alan Gates, Mahadev Konar, Hairong Kuang, Konstantin Shvachko, and
Ashish Thusoo.

We've also voted to create two new subprojects: Chukwa and Avro.
Chukwa is a distributed log aggregation and cluster monitoring system
that was originally in Core's contrib directory. The initial
committers for Chukwa are Ariel Rabkin and Eric Yang. Avro is a
serialization and RPC library with a focus on supporting versioned
persistent data and supporting scripting languages. The initial
committers for Avro are Doug Cutting and Sharad Agarwal.

Hadoop was well represented at ApacheCon EU, with a track of talks
about Core, HBase, and Pig.

A Hadoop Summit is being organized for June 10th in Santa Clara.

CORE, HDFS, and MAP/REDUCE

Core is the fundamental set of utilities, including RPC,
serialization, and compression that the rest of Hadoop depends
on. HDFS provides a distributed file system. Map/Reduce provides a
framework for distributed applications that process large data sets.

Amazon has started explicitly marketing and supporting Hadoop as a
service on EC2 at a much lower cost than a standard EC2 virtual
machine.

We are still in the process of factoring Map/Reduce and HDFS out of
Core. The code is separated and all that is left to be split are the
unit test cases and their dependencies.

Releases:
0.20.0 is nearing release, with 280 jiras addressed.
0.18.3 was released on 27 Jan 2009 with 51 jiras addressed.

The current plan is to try and release Core, HDFS, and Map/Reduce 1.0
this year.

Community:

Core has added Sharad Agarwal, Giri Kesavan, Ariel Rabkin, Sanjay
Radia, and Eric Yang as committers. The community is active and
growing.

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Core.

Releases:
0.18.1 was released on 27 October 2008. 14 issues were addressed.
0.19.0 was released on 21 January 2009. 184 issues were addressed.
0.19.1 was released on 19 March 2009. 43 issues were addressed.

Work is underway on release 0.20.0 with 97 of 174 issues resolved. It
is expected that many of the open issues will be pushed to a
subsequent release.

Meet-ups:

January 14, 2009; March 3, 2009 - HBase User Group meetings in San Francisco
January 30, 2009 - HBase Hackathon in Los Angeles

Community:

There are no new committers since the last report. There are about 7
active contributors (of which 3 are committers).

There are also a number of people who come by to "kick the tires" but
then leave because of possible data loss due to a lack of a patch for
HADOOP-4379.

HIVE

Hive is a data warehouse written on top of Hadoop Core. It provides a SQL
interface to query and manage data stored in Hadoop in tables and partitions.

Releases:

Our 0.2.0 branch that was to be released in Feb, 2009 was not released
and was not put for a vote as there were some significant fixes which
the community felt should be checked in before it could be put to
vote. As this branch was not fully soak tested on Facebook production
load, we decided to target the 0.3.0 branch for release.

0.3.0 was branched in Mar, 2009. All the blockers in that branch have
been fixed. We are going to put a release candidate from that branch
up for vote by Apr 15, 2009.

At present there are 177 open issues with none of them as a blocker
for 0.3.0. 111 issues have been resolved since the last report in
January.

Community:

Hive continues to see growth in the number and diversity of
contributors. Since the last report the number of contributors in the
project has grown from 16 to 21. We added Prasad Chakka, Raghu
Murthy, Johan Oskarsson, and Joydeep Sen Sarma as committers.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.

A vote was called on Pig 0.2.0 on 3/27/09. This release is a major
redesign of the system, including the addition of a type system,
significant (2-10x) performance improvements, the addition of Limit and
ORDER BY desc, and Grunt shell improvements.

ZOOKEEPER

ZooKeeper is a reliable coordination service for distributed applications.

Releases:

3.1.0 was released on 2009/02/13 with 70 jiras fixed.
3.1.1 was released on 2009/03/27 with 11 jiras fixed.

Our next release, 3.2.0, is slated for 5/26/2009. A number of major
new features will be included, in particular: extending the client
libraries to include common ZK use cases (recipes), adding a REST-based
API to the server, Perl binding support, and numerous optimizations.

Community:

Feedback and community involvement have been slowly increasing; we
frequently meet with users during Hadoop meetups and hold site visits.

There was a discussion around umbrella projects. There's some general concern about splitting up tightly coupled projects. Fear of losing cross pollination. J. Aaron will post to board@/members@.

Bertrand takes the action item to communicate the board view on umbrella projects; recommend thinking about spinning off self-contained projects to TLP.

21 Jan 2009 [Owen O'Malley / Bertrand]

Hadoop status report for September 2008 to January 2009.

Hadoop is a set of tools for creating and managing distributed
applications.

There were various Hadoop user meetings:
 * Beijing
 * Berlin
 * Los Angeles (HBase)
 * New Orleans (as part of ApacheCon US)
 * New York
 * San Diego
 * San Francisco (HBase)
 * Santa Clara

CORE, HDFS, and MAP/REDUCE

Core is the fundamental set of utilities, including RPC,
serialization, and compression that the rest of Hadoop depends
on. HDFS provides a distributed file system. Map/Reduce provides a
framework for distributed applications that process large data sets.

The pace of development in Core is very rapid and the community is
active. Some of the Chinese developers have translated the
documentation for Core into Chinese and submitted them as a
patch.

The work to factor out Hive is complete. The factoring for HDFS and
Map/Reduce is close to complete, and they should become separate
subprojects in the next 3 months.

Discussions, plans, and work have continued toward a 1.0 release of
Core, HDFS, and Map/Reduce. The hope is to achieve the desired levels of
compatibility and stability and release 1.0 this year.

Releases:
0.20.0 is feature-frozen, but unreleased with 184 jiras fixed.
0.19.0 was released on 2008/11/18 with 360 jiras fixed.
0.18.2 was released on 2008/11/3 with 25 jiras fixed.

HBASE

HBase is a distributed column-oriented database, built on top of
Hadoop Core.

Releases:
0.18.1 was released on 2008/10/27 with 14 jiras fixed.

10 of 11 issues have been addressed for 0.18.2, but it is unclear
if 0.18.2 will be released given that 0.19.0 will be released soon.

At this point, 176 of 176 issues have been addressed for hbase-0.19.0.
Testing is in progress at this moment. If no new blocker issues are
identified, a release candidate will be published in the next few days.

HIVE

Hive is a data warehouse written on top of Hadoop. It provides a SQL
interface to query and manage data stored in Hadoop in tables and partitions.

Hive was split out of Core on 11/12/2008. Most of the migration
related work from hadoop contrib to hadoop subproject has been
completed. Enabling Hudson builds for Hive is still
pending. Continuous builds on committed changes using CABIE are
already enabled.

Releases:

We are planning to make our first release, which is named 0.2.0,
sometime in Feb 2009. At present we have 130 outstanding issues
with 23 of those identified as blockers for a release. 103 issues have so
far been resolved since Hive was open sourced.

Hive has added Ashish Thusoo and Namit Jain as committers. The number
of contributors to the project has grown from 7 to 16 since Hive
became a hadoop subproject. 6 of these are contributors external to
facebook.

PIG

Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled with
infrastructure for evaluating these programs.

Pig graduated from the Apache Incubator and became a Hadoop subproject
on 10/17/08.

Releases:
0.1.1 was released on 2008/12/8 with 2 jiras.

Release 0.1.1 was primarily focused on integrating with Hadoop 0.18.

Pig welcomed Pradeep Kamath and Santhosh Srinivasan as new committers.

ZOOKEEPER

Zookeeper is a reliable coordination service for distributed applications.

Releases:

3.0.1 was released on 2008/11/24 with 16 jiras fixed.
3.0.0 was released on 2008/10/27 with 108 jiras fixed.

Our next release, 3.1.0, is slated for 1/19/2009. A number of major
new features will be included, in particular improved management
(JMX) support and quota (i.e., filesystem quota) support.

Feedback and community involvement has been slowly increasing.

15 Oct 2008 [Owen O'Malley / Henning]

Hadoop is a set of tools for creating and managing distributed applications.

The PMC felt that the Core project had grown difficult to manage as a
single subproject. With the core-dev email list topping over 3600
messages last month it is difficult to keep on top of the entire
project. We therefore have voted to split Core into 4 pieces: Core,
which is the common infrastructure; HDFS, which is the distributed
file system; Map/Reduce, which is the distributed computation
framework; and Hive, which is a higher-level query processor built on
Map/Reduce. After release 0.19 has stabilized we will work on
splitting up the code bases.

Additionally, we have started a vote whether to accept Pig as a
subproject when it graduates from the incubator.

We have added Arun Murthy to the PMC.

Hadoop will be well represented at ApacheCon US next month. There will
be 3 Hadoop talks in the main series and an assortment of related
talks at the Hadoop Camp. There will be presentations about the Core,
Hive, Pig, and Zookeeper subprojects at Hadoop Camp.

CORE

Core is a framework for building distributed applications, which
includes a distributed file system and map/reduce.

Releases:
0.19.0 is feature-frozen, but unreleased, with 270 jiras.
0.18.2 is unreleased, with 3 jiras.
0.18.1 was released on 2008/09/17 with 6 jiras.
0.18.0 was released on 2008/08/19 with 266 jiras.
0.17.3 is unreleased, with 4 jiras.
0.17.2 was released on 2008/08/11, with 12 jiras.

The development has been ever increasing with 0.19.0 having the
largest number of patches in a release. Release 0.19 includes Hive as a
contrib module. We have started discussions on the email lists about
when we should release 1.0 and what level of forwards and backwards
compatibility we should guarantee.

HBASE

HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases:
0.2.0 was released on 2008/08/08. 293 issues were addressed by this release.
0.2.1 was released on 2008/09/13. 44 issues were addressed by this release.
0.18.0 was released on 2008/09/21. 58 issues were addressed by this release.

The hbase-0.2.x releases run on hadoop-0.17.x. With hbase-0.18.0,
releases have been renumbered to reflect the version of hadoop that
the hbase release runs on.

Work has started on release 0.18.1. 5 of 9 issues have been addressed.
Work has started on release 0.19.0. 25 of 58 issues have been addressed.

On Wednesday, October 8, Microsoft agreed to let two of their
engineers, who are committers and PMC members, resume their
contributions to HBase. The contributions had been blocked when
Microsoft acquired Powerset last quarter.

Andrew Purtell was added as an HBase committer.

ZOOKEEPER

Zookeeper is a service for coordinating processes of distributed applications.

Migration from SourceForge to Apache of source, documentation, wiki,
issue tracking, mailing lists, etc. is complete. We are planning to
make our first Apache release, which is named 3.0, on Oct 22nd with
over 85 issues addressed.

We discussed growth of the project... no significant concerns.

Jim to check on Zookeeper's filling out of the Incubator's IP clearance.

16 Jul 2008 [Owen O'Malley / J Aaron]

Hadoop is a set of tools for distributed applications.

The PMC voted to add a new Hadoop subproject, named Zookeeper, which
is a distributed coordination service. Zookeeper was developed by
Yahoo and was granted to Apache. Zookeeper should form a great basis
for Map/Reduce and HDFS high availability. The original committers are
Patrick Hunt, Flavio Junqueira, Mahadev Konar, Andrew Kornev, and Ben
Reed.

There are now monthly Hadoop user get togethers in northern California
(http://upcoming.yahoo.com/event/869166) and there is one scheduled
for August in London (http://upcoming.yahoo.com/event/506444).

CORE

Core is a framework for building distributed applications, which
includes a distributed file system and map/reduce.

Releases:
0.18.0 is feature frozen but unreleased, currently with 254 jiras.
0.17.2 is unreleased, currently with 4 jiras
0.17.1 was released 23 June 2008 fixing 10 jiras
0.17.0 was released 18 May 2008 fixing 200 jiras.
0.16.4 was released 5 May 2008 fixing 4 jiras.
0.16.3 was released 16 April 2008 fixing 7 jiras.

Core won the annual terabyte sort benchmark http://tinyurl.com/4o8bns,
which is the first time that either a Java or an open source program
won the competition. Core has added 4 committers, Johan Oskarsson,
Lohit Vijaya Renu, Zheng Shao, and Tsz Wo Sze. We've had very active
development and an active user base.

HBASE

HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases:
0.1.1 was released on 27 March 2008. 12 issues were addressed by this release.
0.1.2 was released on 13 May 2008. 27 issues were addressed by this release.
0.1.3 was released on 27 June 2008. 16 issues were addressed by this release.

The hbase-0.1.x releases run on hadoop-0.16.x.

Work continues on release 0.2.0 which will run on hadoop-0.17.x.
231 of 239 issues have been resolved. We are targeting the end of July
for a release candidate.

On Tuesday, July 1, Microsoft and Powerset signed a deal for Microsoft to
acquire Powerset. Two of the HBase committers (who are also members of the
Hadoop PMC) are employed by Powerset and may not be able to continue work
on HBase after the deal closes. They and their manager are working with
Microsoft to determine what will happen, but may not know for several weeks
yet.

ZOOKEEPER

Zookeeper is a service for coordinating processes of distributed applications.

Migration from SourceForge to Apache is in progress. Yahoo's code
grant was filed with the ASF, the SourceForge SVN snapshot has been
loaded into ASF SVN and Hudson is now running daily builds on the
codebase. SourceForge tracker has been fully migrated to Jira and the
developers are now using ASF Jira and mailing lists. Migration of
documentation and website is in progress and expected to be completed
in the next couple of weeks. A new release of ZooKeeper is being
worked on in parallel with the move; completing this will be a major
focus subsequent to the ASF migration.

Ben Reed (Yahoo) and Ted Dunning (Veoh) presented ZooKeeper at the
latest Hadoop social - reaction was extremely positive. Many attendees
were already using ZK, and almost all were at least familiar with the
project.

It was noted that Zookeeper lacks an ip-clearance. Owen agreed to follow up.

Owen reported that Amazon has agreed to donate a few hundred dollars of computer time on EC2 to individual Hadoop developers for testing and benchmarking.

16 Apr 2008 [Owen O'Malley / J Aaron]

TLP

The Hadoop Summit (http://upcoming.yahoo.com/event/436226/) occurred
on March 25 and had more than 300 people attending. It was well
received by the community.

CORE

Development has been active this month with 0.16.2 being released on 2
April 2008. We will likely release 0.16.3 with 7 jiras this week.
Release 0.17, which has 160 jiras, has been branched and will be
released when it is stabilized. Hadoop Core was well represented at
ApacheCon EU with a BOF and 3 talks by Owen O'Malley, Tom White, and
Allen Wittenauer.

HBASE

The first version of HBase as a subproject, version 0.1.0, was
released on March 28th. We are now working on patches for version
0.1.1, which will be released after hadoop-0.16.2. 6 of 8 identified
issues have been resolved.

With the focus on releasing 0.1.0, progress slowed a bit for release
0.2.0. Since last month, an additional 20 issues have been resolved
and an additional 29 have been identified for a total of 74 out of 102
issues resolved.

19 Mar 2008 [Owen O'Malley / Greg]

TLP

We have filed the appropriate paperwork for using cryptography within
Hadoop. The first use will be HADOOP-2239, which will likely be
committed this week.

Yahoo and the Computing Community Consortium are sponsoring a Hadoop
Summit (http://upcoming.yahoo.com/event/436226/) on March 25 to bring
together users and developers. 215 people have signed up to attend.

CORE

We added two committers this month: Mukund Madhugiri for QA and
release engineering, and Hemanth Yamjiala for contrib.

Development has been active this month and we have released 0.16.1
this month, which fixed 40 jiras. Release 0.17 is scheduled to feature
freeze in the first week of April and currently includes 70 committed
jiras.

HBASE

Development has been focused on making our first subproject release,
0.1.0. The 0.1.0 release is feature frozen and runs against Hadoop
Core 0.16.x. 20 of the 25 identified blocker issues have been
resolved.

The priorities for the 0.2 release are robustness and scalability. The
proposal is on the HBase Wiki at:
http://wiki.apache.org/hadoop/Hbase/Plan-0.2. HBase 0.2 is based on
Hadoop Core trunk and is making progress as well with 54 of 73 issues
resolved.

An hbase contributor, Dennis Kubes, bought the domain hbase.org for
the project, which points to hadoop.apache.org/hbase.

A second HBase Users Group meeting was held at Powerset on March 4,
with approximately 30 people attending. The meeting was informal,
mostly getting the user community to discuss problems they had
encountered using HBase and to gather issues blocking the 0.1.0
release.

Greg to work with Owen to arrange for the transfer of the hbase.org domain to the ASF.

20 Feb 2008 [Owen O'Malley / Henning]

TLP

The top-level project completed the split of Hadoop out of Lucene and
into a TLP. The subproject that was Hadoop is now called Hadoop
Core. We have also moved HBase into a sub-project from being in Hadoop
Core's contrib directory. Although Core and HBase have many ties, the
contributor list and code base are largely disjoint between them and
the split will reduce the heavy traffic on both development lists.

CORE

Hadoop Core has released 0.16.0, 0.15.3, and 0.15.2. As we move toward
more stability, we've moved our feature freezes to every 3 months
(beginning of Jan, Apr, July, and Oct). Development has been very
active, including adding user permissions to HDFS. (Fixed Jira counts:
23 unreleased, 180 for 0.16.0, 4 for 0.15.3, and 15 for 0.15.2)

HBASE

HBase, which is a distributed storage system for structured data, has
become a subproject of Hadoop. We have added Bryan Duxbury as a
committer. Development has been very active (Fixed Jira counts: 7
unreleased, 142 for 0.16.0)

Approved by General Consent.

16 Jan 2008

Establish the Apache Hadoop Project

 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to a distributed computing
 platform, including a distributed filesystem and an
 implementation of the map/reduce distributed computing
 metaphor, for distribution at no charge to the public.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the "Apache Hadoop Project",
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further

 RESOLVED, that the Apache Hadoop Project be and hereby is
 responsible for the creation and maintenance of software
 related to a distributed computing platform, including a
 distributed filesystem and an implementation of the map/reduce
 distributed computing metaphor; and be it further

 RESOLVED, that the office of "Vice President, Apache Hadoop" be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache Hadoop Project, and to have primary responsibility
 for management of the projects within the scope of
 responsibility of the Apache Hadoop Project; and be it further

 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache Hadoop Project:

   * Andrzej Bialecki             <ab@apache.org>
   * Doug Cutting                 <cutting@apache.org>
   * Nigel Daley                  <ndaley@apache.org>
   * Jim Kellerman                <jimk@apache.org>
   * Owen O'Malley                <omalley@apache.org>
   * Enis Soztutar                <enis@apache.org>
   * Michael Stack                <stack@apache.org>
   * Christophe Taton             <taton@apache.org>
   * Thomas E. White              <tomwhite@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Owen O'Malley
 be appointed to the office of Vice President, Apache Hadoop, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed; and be it further

 RESOLVED, that the Apache Hadoop Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Lucene Hadoop sub-project; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache
 Lucene Hadoop sub-project encumbered upon the
 Apache Lucene Project are hereafter discharged.

 Special order 7C, Establish the Apache Hadoop Project,
 was approved by Unanimous Vote.