This was extracted (@ 2024-11-20 22:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary. The exact schedule is available to ASF Members and Officers: search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (17 years ago)
There are currently 247 committers and 125 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Shilun Fan on 2023-10-31.
- Jian Zhang was added as a branch committer on 2024-09-07.

## Project Activity:
3.4.0 was released on 2024-03-17.
hadoop-thirdparty-1.2.0 was released on 2024-02-07.
3.3.6 was released on 2023-06-25.

## Community Health:
- The new HDFS feature "Asynchronous Router RPC"[1][2], which will improve the performance of Router RPC dispatch, is in progress. Most of the basic work is nearly ready; we still need to extend some interfaces to be asynchronous. Four contributors are currently involved.
- The new HDFS feature "NameNode Fine-Grained locking Based on Directory Tree"[3][4] improves the performance of the NameNode. Phase one is nearly ready, but its contributors have been inactive for over 3 months. The feature is now suspended because no other active contributors are involved.
- The key change for YARN is the support for cgroup v2[5]. Significant progress has been made on this feature, but it will still take some time to complete. There are currently 4 active contributors.
- We are also working on enabling Hadoop to support JDK17. A major blocking JIRA is HADOOP-15984[6], which involves updating Jersey from 1.19 to 2.x. The development work on this is currently in progress.
- Shilun helped to resolve two long-standing Hadoop CVE issues.
- Mukund and Steve are preparing release-3.4.1.
- The Project shows healthy engagement based on Mailing list/Jira/Github traffic.
[1] https://issues.apache.org/jira/browse/HDFS-17531
[2] https://lists.apache.org/thread/k930cmlvo1z2zox9qhfx9797gk561nnc
[3] https://issues.apache.org/jira/browse/HDFS-17366
[4] https://lists.apache.org/thread/6wlwx4jbpsfn4xs3617ltgqqxs69prlt
[5] https://issues.apache.org/jira/browse/YARN-11669
[6] https://issues.apache.org/jira/browse/HADOOP-15984
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 246 committers and 125 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Shilun Fan on 2023-10-31.
- Haiyang Hu was added as committer on 2024-04-22

## Project Activity:
3.4.0 was released on 2024-03-17.
hadoop-thirdparty-1.2.0 was released on 2024-02-07.
3.3.6 was released on 2023-06-25.

## Community Health:
- We've made one major release: Hadoop-3.4.0 on Mar 17, 2024.
- We've started working on the "NameNode Fine-Grained locking Based on Directory Tree" feature to improve the performance of the NameNode, which is a bottleneck of HDFS, especially on high-load HDFS clusters[1][2]. Phase one of the FGL (Fine-Grained Locking) feature is nearly ready, and we are discussing when and how to merge it to trunk: either vote to commit once all phases are ready, or merge the phases separately.
- We've discussed improving HDFS Router RPC dispatch performance through "Asynchronous Router RPC"[3][4]. Development is in progress, and we are trying to involve more developers and reviewers to push this feature forward as expected.
- We've organized two online sharing sessions, with 28 and 55 participants respectively; for more details see [5].
- The Project shows healthy engagement based on Mailing list/Jira/Github traffic.
- Some responses to board comments: Parts of the last board report were generated by the reporter tool because, during that period, we were focused on the Hadoop 3.4.0 release, a major release that includes 2888 bug fixes, improvements and enhancements; there was no other information to report.
[1] https://issues.apache.org/jira/browse/HDFS-17366
[2] https://lists.apache.org/thread/6wlwx4jbpsfn4xs3617ltgqqxs69prlt
[3] https://issues.apache.org/jira/browse/HDFS-17531
[4] https://lists.apache.org/thread/k930cmlvo1z2zox9qhfx9797gk561nnc
[5] https://s.apache.org/fa31q
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 245 committers and 125 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Shilun Fan on 2023-10-31.
- No new committers. Last addition was Simbarashe Dzinamarira on 2023-09-27.

## Project Activity:
3.4.0 was released on 2024-03-17.
hadoop-thirdparty-1.2.0 was released on 2024-02-07.
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.

## Community Health:
- Mailing list activity:
  common-dev@hadoop.apache.org had a 23% increase in traffic in the past quarter (557 emails compared to 450)
  common-issues@hadoop.apache.org had a 71% increase in traffic in the past quarter (8139 emails compared to 4748)
  hdfs-dev@hadoop.apache.org had a 38% increase in traffic in the past quarter (588 emails compared to 424)
  hdfs-issues@hadoop.apache.org had a 126% increase in traffic in the past quarter (3887 emails compared to 1717)
  mapreduce-dev@hadoop.apache.org had a 27% increase in traffic in the past quarter (338 emails compared to 265)
  mapreduce-issues@hadoop.apache.org had a 96% increase in traffic in the past quarter (240 emails compared to 122)
  yarn-dev@hadoop.apache.org had a 10% increase in traffic in the past quarter (387 emails compared to 350)
  yarn-issues@hadoop.apache.org had a 80% increase in traffic in the past quarter (1679 emails compared to 931)
- JIRA activity:
  283 issues opened in JIRA
  210 issues closed in JIRA
- Commit activity:
  486 commits in the past quarter (19% increase)
  59 code contributors in the past quarter (-16% change)
- GitHub PR activity:
  313 PRs opened on GitHub, past quarter (20% increase)
  266 PRs closed on GitHub, past quarter (16% increase)
@Justin: follow up on details to include in board reports
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 245 committers and 125 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Shilun Fan was added to the PMC on 2023-10-31
- No new committers. Last addition was Simbarashe Dzinamarira on 2023-09-27.

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.

We are preparing the 3.4.0 release; the work is ongoing and has been delayed longer than expected.

The announcement of 3.3.7-aws[1] is not from the Hadoop PMC, and it is not an official release.

[1] https://lists.apache.org/thread/0ptzdzs30vddotqg1gnxmr38c03r5xl9

## Community Health:
- Mailing list activity:
  common-dev@hadoop.apache.org had a 0% increase in traffic in the past quarter (472 emails compared to 471)
  common-issues@hadoop.apache.org had a 3% decrease in traffic in the past quarter (5436 emails compared to 5560)
  hdfs-dev@hadoop.apache.org had a 6% decrease in traffic in the past quarter (458 emails compared to 487)
  hdfs-issues@hadoop.apache.org had a 17% increase in traffic in the past quarter (2410 emails compared to 2044)
  mapreduce-dev@hadoop.apache.org had a 0% increase in traffic in the past quarter (276 emails compared to 276)
  mapreduce-issues@hadoop.apache.org had a 10% decrease in traffic in the past quarter (194 emails compared to 214)
  user@hadoop.apache.org had a 20% decrease in traffic in the past quarter (32 emails compared to 40)
  yarn-dev@hadoop.apache.org had a 0% increase in traffic in the past quarter (360 emails compared to 358)
  yarn-issues@hadoop.apache.org had a 0% increase in traffic in the past quarter (1173 emails compared to 1163)
- Commit activity:
  410 commits in the past quarter (4% increase)
  72 code contributors in the past quarter (18% increase)
- JIRA activity:
  183 issues opened in JIRA, past quarter (-47% change)
  129 issues closed in JIRA, past quarter (-38% change)
- GitHub PR activity:
  261 PRs opened on GitHub, past quarter (-22% change)
  229 PRs closed on GitHub, past quarter (-9% change)

It looks like JIRA and GitHub traffic both decreased in the past quarter. However, project development overall looks healthy, with more contributors and more commits checked in, and the next release is in progress.
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 245 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Mukund Thakur on 2023-01-21.
- Ahmar Suhail was added as committer on 2023-08-25
- Simbarashe Dzinamarira was added as committer on 2023-09-27
- Shuyan Zhang was added as committer on 2023-09-27

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.

We are preparing 3.4.0, which will be released before the end of 2023.

## Community Health:
- Mailing list activity:
  common-dev@hadoop.apache.org had a 9% decrease in traffic in the past quarter (489 emails compared to 532)
  common-issues@hadoop.apache.org had a 0% increase in traffic in the past quarter (5668 emails compared to 5612)
  hdfs-dev@hadoop.apache.org had a 6% increase in traffic in the past quarter (500 emails compared to 469)
  hdfs-issues@hadoop.apache.org had a 9% increase in traffic in the past quarter (2065 emails compared to 1891)
  mapreduce-dev@hadoop.apache.org had a 4% decrease in traffic in the past quarter (283 emails compared to 294)
  mapreduce-issues@hadoop.apache.org had a 81% increase in traffic in the past quarter (187 emails compared to 103)
  user@hadoop.apache.org had a 39% decrease in traffic in the past quarter (21 emails compared to 34)
  yarn-dev@hadoop.apache.org had a 6% decrease in traffic in the past quarter (366 emails compared to 389)
  yarn-issues@hadoop.apache.org had a 2% increase in traffic in the past quarter (1133 emails compared to 1105)
- JIRA activity:
  330 issues opened in JIRA, past quarter (20% increase)
  191 issues closed in JIRA, past quarter (-16% change)
- Commit activity:
  359 commits in the past quarter (-32% change)
  59 code contributors in the past quarter (-23% change)
- GitHub PR activity:
  318 PRs opened on GitHub, past quarter (12% increase)
  232 PRs closed on GitHub, past quarter (-12% change)

Judging from JIRA and GitHub PR activity, review bandwidth and the number of active reviewers are not sufficient. We are trying to improve this by identifying potential committers and adding some new committers.
WHEREAS, the Board of Directors heretofore appointed Wei-Chiu Chuang (weichiu) to the office of Vice President, Apache Hadoop, and

WHEREAS, the Board of Directors is in receipt of the resignation of Wei-Chiu Chuang from the office of Vice President, Apache Hadoop, and

WHEREAS, the Project Management Committee of the Apache Hadoop project has chosen by vote to recommend Xiaoqiao He (hexiaoqiao) as the successor to the post;

NOW, THEREFORE, BE IT RESOLVED, that Wei-Chiu Chuang is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and

BE IT FURTHER RESOLVED, that Xiaoqiao He be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed.

Special Order 7B, Change the Apache Hadoop Project Chair, was approved by Unanimous Vote of the directors present.
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hadoop was founded 2008-01-16 (16 years ago)
There are currently 242 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Mukund Thakur on 2023-01-21.
- No new committers. Last addition was Shilun Fan on 2022-11-22.

## Project Activity:
Recent releases:
3.3.6 was released on 2023-06-25.
3.3.5 was released on 2023-03-23.
3.3.4 was released on 2022-08-08.

## Community Health:
- Mailing list activity:
  common-dev@hadoop.apache.org had a 2% increase in traffic in the past quarter (529 emails compared to 518)
  common-issues@hadoop.apache.org had a 7% increase in traffic in the past quarter (5700 emails compared to 5309)
  dev@hadoop.apache.org had a 500% increase in traffic in the past quarter (6 emails compared to 1)
  hdfs-dev@hadoop.apache.org had a 11% increase in traffic in the past quarter (494 emails compared to 444)
  hdfs-issues@hadoop.apache.org had a 27% increase in traffic in the past quarter (2040 emails compared to 1603)
  mapreduce-dev@hadoop.apache.org had a 0% increase in traffic in the past quarter (297 emails compared to 297)
  mapreduce-issues@hadoop.apache.org had a 24% increase in traffic in the past quarter (116 emails compared to 93)
  user@hadoop.apache.org had a 10% decrease in traffic in the past quarter (28 emails compared to 31)
  yarn-dev@hadoop.apache.org had a 9% increase in traffic in the past quarter (409 emails compared to 372)
  yarn-issues@hadoop.apache.org had a 19% increase in traffic in the past quarter (1189 emails compared to 992)
- JIRA activity:
  277 issues opened in JIRA, past quarter (8% increase)
  238 issues closed in JIRA, past quarter (29% increase)
- Commit activity:
  532 commits in the past quarter (53% increase)
  75 code contributors in the past quarter (13% increase)
- GitHub PR activity:
  288 PRs opened on GitHub, past quarter (11% increase)
  271 PRs closed on GitHub, past quarter (34% increase)

In the recent board report feedback, some board members were concerned that 'a 100% decrease in dev-list activity does sound quite serious'. We checked, and this only happens for dev@hadoop.apache.org. The reason is that each Hadoop sub-module has its own separate dev list and only a few people use dev@hadoop.apache.org, so its traffic shows large percentage swings. We are discussing whether to remove this list; the dev lists of the sub-modules remain active.

We also added hadoop-api-shim as a sub-project in a new repo under hadoop: https://github.com/apache/hadoop-api-shim
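The large swings on dev@hadoop.apache.org follow directly from percent-change arithmetic on very small counts. The sketch below is only an illustration using figures quoted in these reports; the function name `percent_change` is hypothetical, and the formula is an assumption about how such figures are usually computed, not a description of the ASF reporter tool.

```python
from typing import Optional


def percent_change(current: int, previous: int) -> Optional[float]:
    """Quarter-over-quarter change in percent (assumed formula).

    Returns None when there is no baseline, because a percentage
    relative to zero emails is not meaningful.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


# Low-traffic list: a handful of emails produces a dramatic percentage.
print(percent_change(6, 1))   # dev@: 6 emails compared to 1
print(percent_change(0, 6))   # dev@: 0 emails compared to 6

# High-traffic list: a larger absolute change is a small percentage.
print(round(percent_change(529, 518)))  # common-dev@: 529 compared to 518
```

A difference of five or six emails moves dev@ by hundreds of percent, while an eleven-email difference on common-dev@ rounds to about 2%, which is why the raw counts are more informative than the percentages for low-volume lists.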
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 242 committers and 124 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Mukund Thakur was added to the PMC on 2023-01-20
- No new committers. Last addition was Shilun Fan on 2022-11-21.

## Project Activity:
3.3.5 was released on 2023-03-23, which added support for a new Vectored IO API (HADOOP-18103). We are continuing to maintain the 3.3.x release line. Meanwhile, Steve initiated a thread to discuss dropping JDK8 support moving forward.

New feature development started this quarter:
* YARN-11411 [Umbrella] Build Concurrent Yarn Scheduler
* HADOOP-18671 Add recoverLease(), setSafeMode(), isFileClosed() APIs to FileSystem

## Community Health:
It appears the development activities are gradually trending down, though this is to be expected as the project matures. I see that Steve responded to most of the vulnerability reports, which is great. However, the community (me included) collectively should be more vigilant about vulnerability reports.

* dev@hadoop.apache.org had a 0% decrease in traffic in the past quarter (1 emails compared to 1)
* general@hadoop.apache.org had a 29% decrease in traffic in the past quarter (5 emails compared to 7)
* mapreduce-issues@hadoop.apache.org had a 67% decrease in traffic in the past quarter (97 emails compared to 288)
* user@hadoop.apache.org had a 62% decrease in traffic in the past quarter (31 emails compared to 80)
* yarn-issues@hadoop.apache.org had a 31% decrease in traffic in the past quarter (1102 emails compared to 1584)
* 237 issues opened in JIRA, past quarter (-5% change)
* 175 issues closed in JIRA, past quarter (-14% change)
* 342 commits in the past quarter (4% increase)
* 66 code contributors in the past quarter (-9% change)
* 244 PRs opened on GitHub, past quarter (-9% change)
* 193 PRs closed on GitHub, past quarter (-23% change)
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 242 committers and 123 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Stephen O'Donnell on 2022-07-25.
- Shilun Fan was added as committer on 2022-11-21

## Project Activity:
No new release was GA in this quarter. However, 3.3.5 is pending release.

## Community Health:
Overall community activities seem to be getting lower, partly due to the holiday season.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past quarter (751 emails compared to 1227)
* 236 issues opened in JIRA, past quarter (-50% change)
* 191 issues closed in JIRA, past quarter (-41% change)
* 310 commits in the past quarter (-21% change)
* 73 code contributors in the past quarter (32% increase)
* 254 PRs opened on GitHub, past quarter (-46% change)
* 238 PRs closed on GitHub, past quarter (-36% change)
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (15 years ago)
There are currently 241 committers and 123 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Stephen O'Donnell was added to the PMC on 2022-07-25
- Mehakmeet Singh was added as committer on 2022-07-29
- Zander Xu was added as committer on 2022-09-28

## Project Activity:
3.3.4 was released on 2022-08-08.
3.2.4 was released on 2022-07-22.
3.3.5 release work is under way.

Announced CVE:
* CVE-2022-25168 Command injection in org.apache.hadoop.fs.FileUtil.unTarUsingTar
* CVE-2021-25642 Apache Hadoop YARN remote code execution in ZKConfigurationStore of capacity scheduler

## Community Health:
It looks like JIRA and github traffic are both decreasing. However, the project development overall looks healthy with a number of releases published or in progress.

Additionally ApacheCon NA took place in October and a number of talks were related to Hadoop. Hadoop Meetup took place in Shanghai on Sep 24. Lots of talks and large crowd.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past quarter (751 emails compared to 1227)
* 429 issues opened in JIRA, past quarter (10% increase)
* 299 issues closed in JIRA, past quarter (2% increase)
* 353 commits in the past quarter (-28% change)
* 55 code contributors in the past quarter (-37% change)
* 423 PRs opened on GitHub, past quarter (4% increase)
* 336 PRs closed on GitHub, past quarter (1% increase)
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-15 (14 years ago)
There are currently 239 committers and 122 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sun Chao on 2022-03-07.
- Tao Li was added as committer on 2022-04-22

## Project Activity:
2.10.2 was released on 2022-05-31.
3.3.3 was released on 2022-05-17.
Steve Loughran is planning to release 3.3.4, and Masatake is preparing for 3.2.4.

Three CVEs were published:
* CVE-2022-26612 Arbitrary file write during untar on Windows
* CVE-2021-37404 Heap buffer overflow in libhdfs native library
* CVE-2021-33036 Apache Hadoop Privilege escalation vulnerability

## Community Health:
Overall community health is good. JIRA and GitHub activities are trending up while mailing lists are trending down. I see a number of new contributors joined. One contributor, Tao Li, was invited to become a committer and there are more contributors being discussed in the private mailing list.

The community is prioritizing security fixes & publishing security vulnerability announcements, thanks to Masatake, Akira and others.

* general@hadoop.apache.org had a 5% increase in traffic in the past quarter (18 emails compared to 17)
* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0 emails compared to 6)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past quarter (751 emails compared to 1227)
* common-dev@hadoop.apache.org had a 22% decrease in traffic in the past quarter (475 emails compared to 608)
* common-issues@hadoop.apache.org had a 21% decrease in traffic in the past quarter (6753 emails compared to 8517)
* hdfs-dev@hadoop.apache.org had a 11% decrease in traffic in the past quarter (507 emails compared to 568)
* hdfs-issues@hadoop.apache.org had a 7% decrease in traffic in the past quarter (2819 emails compared to 3004)
* mapreduce-dev@hadoop.apache.org had a 22% decrease in traffic in the past quarter (231 emails compared to 294)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past quarter (87 emails compared to 208)
* 356 issues opened in JIRA, past quarter (21% increase)
* 267 issues closed in JIRA, past quarter (20% increase)
* 437 commits in the past quarter (-5% change)
* 88 code contributors in the past quarter (4% increase)
* 374 PRs opened on GitHub, past quarter (25% increase)
* 299 PRs closed on GitHub, past quarter (13% increase)
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 238 committers and 122 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Sun Chao was added to the PMC on 2022-03-08
- Benjamin Teke was added as committer on 2022-03-24
- András Győri was added as committer on 2022-02-15

## Project Activity:
3.3.2 was released on 2022-03-02.
3.2.3 was released on 2022-03-28.
Masatake Iwasaki volunteered to be RM for 2.10.2.
Steve Loughran volunteered to be RM for 3.3.3.

* Patch attachment via JIRA is now disabled. All contributions should be made via GitHub PR. (HADOOP-17798)

## Community Health:
There appears to be a downward trend in the amount of contribution. However, the number of contributors remains stable.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past quarter (751 emails compared to 1227)
* 273 issues opened in JIRA, past quarter (-27% change)
* 201 issues closed in JIRA, past quarter (-37% change)
* 379 commits in the past quarter (-33% change)
* 99 code contributors in the past quarter (12% increase)
* 260 PRs opened on GitHub, past quarter (-27% change)
* 232 PRs closed on GitHub, past quarter (-26% change)
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

* hadoop-thirdparty is a set of internal artifacts used by the project to mitigate the impact of our dependency choices on the wider ecosystem.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 236 committers and 121 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaoqiao He on 2021-05-05.
- Gautham Banasandra was added as committer on 2021-11-04

## Project Activity:
We've not had a new PMC member added for a while. Given the amount of traffic in the community I am pretty sure there are a number of good candidates out there that we should nominate. I sent an email to initiate the discussion.

Release 3.3.2: RC0 was cut and dropped due to a number of issues. RC1 is being prepared.
Release 3.2.3 is stalled.

Notable feature development:
* HADOOP-17124 Support LZO using aircompressor
* HADOOP-18055 Async Profiler endpoint for Hadoop daemons
* HADOOP-17979 Interface EtagSource to allow FileStatus subclasses to provide etags
* YARN-11025 Implement distributed decommissioning

## Community Health:
The overall mailing list traffic, Jira and Github activities were down, which is expected given the holiday season.

* dev@hadoop.apache.org had a 100% decrease in traffic in the past quarter (0 emails compared to 6)
* mapreduce-issues@hadoop.apache.org had a 59% decrease in traffic in the past quarter (87 emails compared to 208)
* user@hadoop.apache.org had a 50% decrease in traffic in the past quarter (26 emails compared to 51)
* user-zh@hadoop.apache.org had a 100% increase in traffic in the past quarter (6 emails compared to 3)
* yarn-dev@hadoop.apache.org had a 26% decrease in traffic in the past quarter (349 emails compared to 466)
* yarn-issues@hadoop.apache.org had a 39% decrease in traffic in the past quarter (751 emails compared to 1227)
* 339 issues opened in JIRA, past quarter (-24% change)
* 289 issues closed in JIRA, past quarter (-11% change)
* 521 commits in the past quarter (-13% change)
* 85 code contributors in the past quarter (-13% change)
* 322 PRs opened on GitHub, past quarter (-9% change)
* 281 PRs closed on GitHub, past quarter (-9% change)

Statistics of the ASF slack channels:
  #hdfs: 151 users, up from 138.
  #hadoop: 160 users, up from 148.
  #yarn: 56 users, up from 52.
## Description:
The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform

* hadoop-thirdparty is a set of internal artifacts used by the project to mitigate the impact of our dependency choices on the wider ecosystem.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hadoop was founded 2008-01-16 (14 years ago)
There are currently 235 committers and 121 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Xiaoqiao He on 2021-05-05.
- Ahmed Hussein was added as committer on 2021-09-24

## Project Activity:
No new releases this quarter, but the community, led by Brahma Reddy Battula, is gearing up for a 3.2.3 release. Similarly, Chao Sun is leading the 3.3.2 release work.

Notable feature development:
* YARN-10496 ([Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler) was completed in this quarter.
* YARN-8849 (DynoYARN: A simulation and testing infrastructure for YARN clusters). This feature was proposed in the Hadoop jira and completed in LinkedIn's github repo, so not a Hadoop feature yet.
* YARN-9698 ([Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler) was completed in this quarter. Follow up work is under the umbrella YARN-10843.
* MAPREDUCE-7341 (Add a task-manifest output committer for Azure and GCS) is ongoing.

## Community Health:
A number of community members spoke at the ApacheCon Asia held in August:
* Bigtop 3.0: Rerising community driven Hadoop distribution by Kengo Seki, Masatake Iwasaki.
* Technical tips for secure Apache Hadoop cluster by Akira Ajisaka, Kei KORI.
* Data Lake accelerator on Hadoop-COS in Tencent Cloud by Li Cheng.

A number of community members spoke at the ApacheCon@Home held in September:
* YARN Resource Management and Dynamic Max by Fang Liu, Fengguang Tian, Prashant Golash, Hanxiong Zhang, Shuyi Zhang
* Uber HDFS Unit Storage Cost 10x Deduction by Jeffrey Zhong, Jing Zhao, Leon Gao
* Scaling the Namenode - Lessons learnt by Dinesh Chitlangia
* How Uber achieved millions of savings by managing disk IO across HDFS cluster by Leon Gao, Ekanth Sethuramalingam
* Containing an Elephant: How we moved Hadoop/HBase into Kubernetes and Public Cloud by Dhiraj Hegde

I have been tracking the following metrics over the past 5 quarters and they have been steadily trending up. This is the first quarter we had more than a hundred code contributors! The number of commits is dwindling because we maintain only three branches now.

* 406 issues opened in JIRA, past quarter (-16% change)
* 287 issues closed in JIRA, past quarter (-25% change)
* 551 commits in the past quarter (-13% change)
* 101 code contributors in the past quarter (16% increase)
* 323 PRs opened on GitHub, past quarter (-3% change)
* 273 PRs closed on GitHub, past quarter (-9% change)

Statistics of the ASF slack channels:
  #hdfs: 138 users, up from 132.
  #hadoop: 148 users, up from 142.
  #yarn: 52 users, up from 49.
## Description: The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform * hadoop-thirdparty is a set of internal artifacts used by the project to mitigate the impact of our dependency choices on the wider ecosystem. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Hadoop was founded 2008-01-16 (13 years ago) There are currently 234 committers and 121 PMC members in this project. The Committer-to-PMC ratio is roughly 2:1. Community changes, past quarter: - Xiaoqiao He was added to the PMC on 2021-05-05 - Fengnan Li was added as committer on 2021-06-23 - Gergely Pollák was added as committer on 2021-05-26 - Qi Zhu was added as committer on 2021-05-14 ## Project Activity: We had one release, Hadoop 3.3.1, which was released on 2021-06-13. In preparation of the release, we also made two releases of hadoop-thirdparty. - hadoop-thirdparty-1.1.1 was released on 2021-06-01. - hadoop-thirdparty-1.1.0 was released on 2021-05-18. In parallel, we declared the EOL of the 3.1 release line. Currently, we maintain only three release lines: 3.3, 3.2 and 2.10. Notable feature development: completed in the quarter - HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features. - HDFS-15759 EC: Verify EC reconstruction correctness on DataNode - HDFS-13916 Distcp SnapshotDiff to support WebHDFS - HDFS-15790 Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist - Gautham added CI for several OSes: CentOS 7, CentOS 8 and Debian 10. Ongoing development - MAPREDUCE-7341 Add a task-manifest output committer for Azure and GCS - HDFS-15982 Deleted data using HTTP API should be saved to the trash - HDFS-14703 NameNode Fine-Grained Locking via Metadata Partitioning - HADOOP-11890 Uber-JIRA: Hadoop should support IPv6 ## Community Health: Looking at the JIRA and GitHub statistics, the project is a little quiet. 
We closed 255 PRs against 279 opened in the quarter (255/279 = 91%), leaving a number of PRs outstanding, but we managed to close slightly more PRs than in the previous quarter. Overall, activity has stayed in the same ballpark since Ozone became a TLP. - 413 issues opened in JIRA, past quarter (-29% change) - 342 issues closed in JIRA, past quarter (-27% change) - 571 commits in the past quarter (-29% change) - 89 code contributors in the past quarter (-10% change) - 279 PRs opened on GitHub, past quarter (-9% change) - 255 PRs closed on GitHub, past quarter (2% increase) Statistics of the ASF slack channels: I'm seeing more users and more activity in the slack channels, which is a good sign. #hdfs: 132 users #hadoop: 142 users #yarn: 49 users Notable mailing list statistics: - common-dev@hadoop.apache.org had a 30% increase in traffic in the past quarter (807 emails compared to 617) - dev@hadoop.apache.org had a 266% increase in traffic in the past quarter (44 emails compared to 12) - general@hadoop.apache.org had a 46% decrease in traffic in the past quarter (6 emails compared to 11) - mapreduce-dev@hadoop.apache.org had a 43% increase in traffic in the past quarter (405 emails compared to 282) - user@hadoop.apache.org had a 31% increase in traffic in the past quarter (38 emails compared to 29) - yarn-issues@hadoop.apache.org had a 53% decrease in traffic in the past quarter (1497 emails compared to 3144)
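The mailing list percentages quoted above can be reproduced from the raw email counts. The sketch below is an inference from the numbers only: the figures appear consistent with flooring the signed percent change, but the report does not say how the reporting tool rounds, and the `percent_change` helper and the `traffic` table are hypothetical names introduced for illustration.

```python
def percent_change(current: int, previous: int) -> int:
    """Signed percent change from previous to current, floored to an integer.

    Assumption: flooring (rather than rounding) is inferred from the quoted
    figures; it is not documented in the report.
    """
    # Python's // floors toward negative infinity, which matches the report
    # (for example, -45.45% is quoted as a 46% decrease).
    return (current - previous) * 100 // previous


# (current, previous) email counts quoted in the report
traffic = {
    "common-dev": (807, 617),
    "dev": (44, 12),
    "general": (6, 11),
    "mapreduce-dev": (405, 282),
    "user": (38, 29),
    "yarn-issues": (1497, 3144),
}

changes = {name: percent_change(cur, prev) for name, (cur, prev) in traffic.items()}

for name, pct in changes.items():
    direction = "increase" if pct >= 0 else "decrease"
    print(f"{name}: {abs(pct)}% {direction}")
```

Integer floor division is used instead of floating-point arithmetic so that the result does not depend on rounding error near whole percentages.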
## Description: The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform ## Issues: Security vulnerability handling is becoming a hot potato. There is increasing attention to vulnerabilities as well as to updating vulnerable third-party dependencies. I started a thread to discuss ways to expedite resolution. GitHub raised 28 alerts as of today, most of them for JavaScript packages used by the YARN UI, but we lack volunteers to update these packages. The AWS EMR team is interested in knowing about, and collaborating more with the Apache Hadoop project on, the vulnerabilities announced. Without a committer in the project, however, they cannot learn of or participate in addressing these vulnerabilities. Meanwhile, AWS EMR is one of the largest commercial providers of Hadoop, and it would be irresponsible toward our users if EMR can't take the appropriate actions. Can/should we find a way to include EMR (as well as other cloud providers) in the discussion of vulnerabilities? ## Membership Data: Apache Hadoop was founded 2008-01-16 (13 years ago) There are currently 230 committers and 120 PMC members in this project. The Committer-to-PMC ratio is roughly 2:1. Community changes, past quarter: - Szilard Nemeth was added to the PMC on 2021-04-02 - Jinglun was added as committer on 2021-03-27 - Mukund Thakur was added as committer on 2021-02-04 ## Project Activity: Hadoop 3.2.2 was officially released this quarter (2021-01-09). The Hadoop 3.3.1 release is being discussed/planned. Feature development: (completed) - HDFS-15714 HDFS Provided Storage Read/Write Mount Support On-the-fly - work started this quarter and resolved in early April. Release: 3.4.0 - HADOOP-16830 Add Public IOStatistics API - completed this Jan. Release: 3.3.1 - HADOOP-16492 Support HuaweiCloud Object Storage as a Hadoop Backend File System - this work started Aug'19 and finally completed this Jan.
Release: 3.4.0 - HADOOP-16524 Automatic keystore reloading for HttpServer2. Release: 3.4.0 and 3.3.1. (ongoing) - HDFS-15714 HDFS Provided Storage Read/Write Mount Support On-the-fly - HDFS-15747 RBF: Rename across sub-namespaces. -- this one is near completion. - YARN-10370 [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue Mapping rules -- this one is done, with remaining work moving to "Part II" - YARN-10534 Enable runC container transformations. - HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features - MAPREDUCE-6749 MR AM should reuse containers for Map/Reduce Tasks -- we will be creating a branch for this development. - HADOOP-17474 Optimise abfs incremental listings -- this work started this quarter. - YARN-10496 [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler -- this work started from last quarter. ## Community Health: Activity is picking up again after the holiday season. 359 issues opened in JIRA, past quarter (4% increase) 318 issues closed in JIRA, past quarter (21% increase) 778 commits in the past quarter (24% increase) 99 code contributors in the past quarter (26% increase) 266 PRs opened on GitHub, past quarter (13% increase) 214 PRs closed on GitHub, past quarter (5% increase)
## Description: The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform ## Issues: The Ozone sub project completed its split from the Hadoop project. The transition went well. ## Membership Data: Apache Hadoop was founded 2008-01-15 (13 years ago) There are currently 228 committers and 119 PMC members in this project. The Committer-to-PMC ratio is roughly 2:1. Community changes, past quarter: - Eric Badger was added to the PMC on 2020-11-17 - Takanobu Asanuma was added to the PMC on 2020-11-08 - No new committers. Last addition was Lisheng Sun on 2020-10-01. ## Project Activity: No new release was announced in this quarter. However, several RCs for 3.2.2 were cut and voted on in this quarter. 3.2.2 later passed its vote in January 2021. One CVE was announced: CVE-2018-11764 Apache Hadoop Privilege escalation in web endpoint Web endpoint authentication check is broken. Authenticated users may impersonate any user even if no proxy user is configured. Versions affected: 3.0.0-alpha4, 3.0.0-beta1, 3.0.0 Fixed versions: 3.0.1 Impact: privilege escalation Reporter: Daryn Sharp Reported Date: 2018/03/17 Issue Announced: 2020/10/21 Hadoop Common: The object store work is keeping people busy. * There's now a storage connector for HuaweiCloud Object Storage, so Hadoop and other applications using the FileSystem APIs can work with data stored in Huawei's cloud. * A new IOStatistics API has gone in to allow applications to query input classes (filesystems, streams, iterators) for IO performance details. This should allow tests and applications to identify performance issues during profiling and, hopefully, in production. * AWS S3 is now consistent. This enables the maintainers of the S3A connector to remove all the S3Guard code, which relied on DynamoDB for a consistent view of the data. They are looking forward to this. * The guava update is becoming a major source of friction for downstream applications adopting new Hadoop releases.
The community is working to shade guava as the solution. (HADOOP-16924) * The native compression libraries for Snappy and LZ4 are now shipped with the Hadoop binary, no longer requiring manual installation of the native libraries on the host machines, making them easier to use. (HADOOP-17125 and HADOOP-17292) HDFS: * A new encryption codec "SM4/CTR/NoPadding" was added (HDFS-15098). * HDFS Router Based Federation received a number of new improvements, including balancer (HDFS-15294) and isolation (HDFS-14090). Rename support is being worked on, starting this quarter. (HDFS-15747) * The new View FS implementation is near completion. (HDFS-15289) * The community is working to add dynamic mount support for both read and write for HDFS Provided Storage. (HDFS-15714) * Dynamic disk-level tiering (HDFS-15547) continued from last quarter. YARN: * The consolidation of FairScheduler and CapacityScheduler started in Q3 and is near completion. (YARN-10370) * Capacity scheduler is being enhanced to support auto queue creation. (YARN-10496) ## Community Health: Overall, the community participation appears relatively healthy despite Ozone's recent move to TLP. We had a steady supply of new contributors and new features this quarter. Erasure Coding appears to have gained traction over the last two quarters. Numerous EC bug fixes and improvements were raised this quarter. It looks like Hadoop 3 is getting adopted. Code development and mailing list traffic were both down significantly quarter over quarter, possibly due to the holiday season. Traffic on the ozone-dev and ozone-issues mailing lists was down because of Ozone's move to TLP.
dev@hadoop.apache.org had a 75% decrease in traffic in the past quarter (10 emails compared to 39) general@hadoop.apache.org had a 67% decrease in traffic in the past quarter (15 emails compared to 45) mapreduce-issues@hadoop.apache.org had a 39% increase in traffic in the past quarter (237 emails compared to 170) ozone-dev@hadoop.apache.org had a 93% decrease in traffic in the past quarter (13 emails compared to 174) ozone-issues@hadoop.apache.org had an 80% decrease in traffic in the past quarter (1180 emails compared to 5804) user@hadoop.apache.org had a 30% decrease in traffic in the past quarter (56 emails compared to 80) user-zh@hadoop.apache.org had a 45% decrease in traffic in the past quarter (5 emails compared to 9) 322 issues opened in JIRA, past quarter (31% decrease) 242 issues closed in JIRA, past quarter (34% decrease) 591 commits in the past quarter (3% decrease) 82 code contributors in the past quarter (16% decrease) 214 PRs opened on GitHub, past quarter (13% decrease) 185 PRs closed on GitHub, past quarter (20% decrease) In addition to mailing lists, JIRA and GitHub PRs, we are seeing more traffic in the official ASF slack hdfs (113 users), hadoop (119 users) and yarn (39 users) channels over the last quarter. They are being used to announce community online meetup events and to troubleshoot issues.
WHEREAS, the Board of Directors heretofore appointed Vinod Kumar Vavilapalli (vinodkv) to the office of Vice President, Apache Hadoop, and WHEREAS, the Board of Directors is in receipt of the resignation of Vinod Kumar Vavilapalli from the office of Vice President, Apache Hadoop, and WHEREAS, the Project Management Committee of the Apache Hadoop project has chosen by vote to recommend Wei-Chiu Chuang (weichiu) as the successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Vinod Kumar Vavilapalli is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and BE IT FURTHER RESOLVED, that Wei-Chiu Chuang be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Special Order 7F, Change the Apache Hadoop Project Chair, was approved by Unanimous Vote of the directors present.
## Description: The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform ## Issues: - The Hadoop community passed a proposal to spin off the Ozone project (a Hadoop subproject) to a Top Level Project. - Vinod stepped down from Chair. Wei-Chiu Chuang was elected as the new Chair. ## Membership Data: Apache Hadoop was founded 2008-01-15 (13 years ago) There are currently 228 committers and 117 PMC members in this project. The Committer-to-PMC ratio is roughly 5:3. Community changes, past quarter: - Ayush Saxena was added to the PMC on 2020-07-21 - Adam Antal was added as committer on 2020-07-21 - Andras Bokor was added as committer on 2020-09-23 - Hui Fei was added as committer on 2020-09-25 - Jim Brennan was added as committer on 2020-08-04 - Peter Bacsko was added as committer on 2020-07-27 - István Fajth was added as committer on 2020-09-03 (Ozone branch committer) - Prashant Pogde was added as committer on 2020-09-03 (Ozone branch committer) - Lisheng Sun was added as committer on 2020-10-01 ## Project Activity: Recent releases: 2.9 release line was declared EOL on 2020-09-07. 3.3.0 was released on 2020-07-14. 3.1.4 was released on 2020-08-03. 2.10.1 was released on 2020-09-21. 3.2.2 is being prepared by Xiaoqiao. A major milestone was achieved when the Ozone project announced the 1.0.0 release on 2020-09-02. During the recent ApacheCon@Home, 7 (and probably some more) Hadoop talks were given by community members. ## Community Health: The community is healthy. To highlight, 6 committers and 1 PMC member were added in the Hadoop Core project, and two branch committers were added to the Ozone project. Release activity has gone up dramatically, with four releases (including Ozone) announced and one being prepared.
## Description: The mission of Hadoop is the creation and maintenance of software related to Distributed computing platform ## Issues: As Ozone is gearing up for its GA release, Marton started a thread to discuss the plan to make Ozone a TLP. There is a general consensus within the community to move Ozone out of Hadoop. The proposal is still being discussed; no actual steps have been taken yet. [DISCUSS] making Ozone a separate Apache project https://s.apache.org/wpc3m ## Membership Data: Apache Hadoop was founded 2008-01-15 (12 years ago) There are currently 219 committers and 116 PMC members in this project. The Committer-to-PMC ratio is roughly 7:4. Community changes, past quarter: - Masatake Iwasaki was added to the PMC on 2020-04-16 - Lokesh Jain was added to the PMC on 2020-06-15 - David Mollitor was added as committer on 2020-04-11 - Xiaoqiao He was added as committer on 2020-06-11 - Li Cheng was added as committer on 2020-05-04 - Nilotpal Nandi was added as committer on 2020-04-10 - Siddharth Wagle was added as committer on 2020-06-11 - Vivek Ratnavel Subramanian was added as committer on 2020-04-11 - Yisheng Lien was added as committer on 2020-04-20 Adam Antal and Peter Bacsko were both voted in as committers and both accepted the invite at the end of the quarter. The karma is yet to be added. ## Project Activity: Diversity and inclusion has recently received attention. A discussion thread is under way on the private mailing list about actions to make the Hadoop project more inclusive, including removing offending branch names, source code, etc. Sammi Chen is the RM for the Ozone 0.6 release. Brahma Reddy Battula is continuing on the Hadoop 3.3.0 release and preparing the initial release candidate. Since the Submarine project has become its own TLP, the Submarine code has been removed from the Hadoop 3.3.0 release. Gabor started releasing Hadoop 3.1.4. ## Community Health: The weekly Ozone dev community sync is going strong.
Recently, a separate, Asia-Pacific time zone friendly sync for the Ozone community was started. The new user-zh@ mailing list was not well utilized in this quarter. We should promote it to make the project more inclusive. Community Diversity: Of the new committers added to the project, 4 out of 7 are affiliated with Cloudera. 4 out of 7 are located in Asia.
## Description: The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. ## Issues: There are no problematic issues requiring board attention at the moment. ## General - A new mailing list user-zh@hadoop.apache.org was created towards end of Feb 2020 to foster questions about Apache Hadoop in Chinese for individuals who feel more comfortable communicating in Chinese. ## Project Activity: ### Releases - Apache Hadoop Ozone 0.5.0, the first beta release of Ozone, was announced on March 24 2020 - hadoop-thirdparty-1.0.0 was released on 2020-03-18. hadoop-thirdparty is a set of internal artifacts used by the project to mitigate the impact of our dependency version updates on the wider ecosystem. ### Other release related news - Apache Hadoop 3.3.0 release originally planned for mid March 2020 is running late - Apache Hadoop 2.8.x release line is marked as end-of-life ## Membership Data: Apache Hadoop was founded 2008-01-16 (12 years ago) There are currently 215 committers and 114 PMC members in this project. The Committer-to-PMC ratio is roughly 7:4. 
### PMC changes, past quarter: - Currently 114 PMC members - New PMC members since last report: 1 - Zhankun Tang was added to the PMC on 2020-03-24 ### Committer base changes, past quarter: - Currently 215 committers - New committers since last report: 6 - Nilotpal Nandi was added as committer on 2020-04-11 - David Mollitor was added as committer on 2020-04-11 - Vivek Ratnavel Subramanian was added as committer on 2020-04-11 - Wilfred Spiegelenburg was added as committer on 2020-03-24 - Siyao Meng was added as committer on 2020-03-24 - Aravindan Vijayan was added as committer on 2020-02-03 ## Community Health: ### JIRA Activity Slightly down from last quarter - 1074 JIRA tickets created since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2020-01-15 ] - 841 JIRA tickets resolved since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >= 2020-01-15 ] ### Mailing list subscriptions & activity: Mailing list activity on existing JIRA related lists (issues, commits) continues to go down across the board - presumably due to lower release activities. The dev lists are a mixed bag with common-dev seeing more activity. The new list user-zh obviously has net positive activity.
## Description: The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. ## Issues: There are no problematic issues requiring board attention at the moment. ## General - Submarine as a TLP was approved by the board at the previous board meeting. Development and releases of the Submarine module inside Apache Hadoop have since moved over to the new TLP project. - Apache Hadoop 3.3.0 release is being planned for mid March 2020 ## Project Activity: ### Releases Apache Hadoop 2.10.0 was released on 2019-10-29 Apache Hadoop 3.1.3 was released on 2019-10-21 Ozone 0.4.1-alpha was released on 2019-10-13 ## Membership Data Apache Hadoop was founded 2008-01-16 (12 years ago) There are currently 209 committers and 113 PMC members in this project. The Committer-to-PMC ratio is roughly 7:4. ### PMC changes, past quarter: - Currently 113 PMC members. - New PMC members since last report: 5 - Chen Liang was added to the PMC on 2019-12-16 - Giovanni Matteo Fumarola was added to the PMC on 2019-12-24 - Nanda kumar was added to the PMC on 2019-10-17 - Shashikant Banerjee was added to the PMC on 2019-12-16 - Surendra Singh Lilhore was added to the PMC on 2020-01-06 ### Committer base changes, past quarter: - Currently 209 committers. 
- New committers since last report: 4 (1 new branch committer) - Attila Doroszlai was added as committer on 2019-12-17 - Prabhu Joseph was added as committer on 2019-10-23 - Stephen O'Donnell was added as a branch committer on 2019-11-08 (HDDS-1880-Decom branch) - Chao Sun (previously a branch committer on HDFS-12943, Standby reads) was added as a committer on 2019-12-24 ## Community Health: ### JIRA Activity Slightly down from last quarter - 1249 JIRA tickets created since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-10-14 ] - 977 JIRA tickets resolved since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >= 2019-10-14 ] ### Mailing list subscriptions & activity: Mailing list activity is down across the board on previously existing lists. The Submarine sub-module spinning out to a TLP should be a contributor. New lists created for the Ozone sub-module should also contribute to the reduced activity on the issue lists.
## Description The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. ## Issues There are no problematic issues requiring board attention at the moment. The community voted to spin off the Submarine module to a separate top-level Apache project and is pursuing the board's approval. ## General # A significant Hadoop Community Meetup @ Beijing happened in Aug 2019. Coverage here: https://blogs.apache.org/hadoop/entry/hadoop-community-meetup-beijing-aug # The branch EOL discussion finally happened and was resolved. Release lines 2.6, 2.7, 3.0 are marked EOL # Ozone moved to a separate source tree in addition to stand alone releases. Initial thoughts were exchanged on whether it should also go the Submarine way and become a TLP # 2.10 release process is underway # CVE Announcements: CVE-2018-11768 was announced on Oct 4 2019: HDFS FSImage Corruption # Comment on previous report > da: I don't understand "Branch Committer" are these people Committers or not? AFAIK we don't recognise any other role. vinodkv: They enjoy all the rights of a committer but their voting-in is expedited on a specific speculative branch. Please see the corresponding section in hadoop bylaws here: http://hadoop.apache.org/bylaws.html. Happy to add more pointers if need be. ## Membership Data: Apache Hadoop was founded 2008-01-16 (12 years ago) There are currently 206 committers and 108 PMC members in this project. The Committer-to-PMC ratio is roughly 7:4. ### PMC changes, past quarter: - Currently 108 PMC members. - New PMC members since last report: 4 - Bharat Viswanadham was added to the PMC on 2019-09-26 - Marton Elek was added to the PMC on 2019-07-29 - Hanisha Koneru was added to the PMC on 2019-09-26 - Jonathan Hung was added to the PMC on 2019-10-04 ### Committer base changes, past quarter: - Currently 206 committers.
- New committers since last report: 3 - Dinesh Chitlangia was added as committer on 2019-10-05 - Liu Xun was added as committer on 2019-10-05 - Zac Zhou was added as committer on 2019-10-09 ## Project Activity: ### Releases - Apache Hadoop 3.2.1 was released on 2019-09-22. - Apache Hadoop 3.1.3 release is getting wrapped up after the vote passed - Apache Hadoop Ozone 0.4.1 Alpha is being put to vote ## Community Health: ### JIRA Activity Significantly up compared to last quarter - 1341 JIRA tickets created since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-07-15 ] - 1136 JIRA tickets resolved since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >= 2019-07-15 ] ### Github Activity Significantly up, as more and more of the community is moving patch reviews from JIRA over to Github - 569 PRs opened on GitHub, past quarter (60% increase) - 606 PRs closed on GitHub, past quarter (108% increase) ### Mailing list subscriptions & activity: Mailing list traffic is significantly back up, the last quarter being down slightly was likely a one-off.
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. GENERAL - Release 3.0.4 / branch-3.0 EOL discussion happened in May 2019 - not fully concluded - A full day Hadoop Community Meetup happened at Cloudera Palo Alto on June 26: https://www.meetup.com/Hadoop-Contributors/events/262055924/ RELEASES - Apache Hadoop Ozone 0.4.0-alpha was released on May 7 2019 - Apache Hadoop Submarine 0.2.0 was released on Jul 2 2019 COMMUNITY ## PMC changes: - New PMC Members since last report: 3 - Currently 104 PMC members. - Mukul Kumar Singh was added to the PMC on Mon May 13 2019 - Billie Rinaldi was added to the PMC on Tue May 14 2019 - Aaron Fabbri was added to the PMC on Tue Jun 18 2019 ## Committer base changes: - New committers since last report: 7 - Currently 203 committers. - Thomas Marquardt was added as a committer on Wed June 19 2019 (was previously a branch committer for ABFS connector work HADOOP-15407 since Jun 2018). - Gabor Bota was added as a committer on Tue Jun 25 2019 - Daniel Zhou was added as a committer on Wed Jun 26 2019 - Szilard Nemeth was added as a committer on Sat Jun 29 2019 - Abhishek Modi was added as a committer on Sat Jul 06 2019 - Tao Yang was added as a committer on Tue Jul 09 2019 - Ayush Saxena was added as a committer on Tue July 11 2019 (was previously a branch committer to RBF HDFS-13891 branch since Mar 2019). ## JIRA Activity - 999 JIRA tickets created since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND createdDate >= 2019-04-15 ] - 720 JIRA tickets resolved since the last board report [ project in (YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) AND resolutiondate >= 2019-04-15 ] ## Mailing list subscriptions & activity: Slightly down (on both subscriber count as well as emails sent)
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. GENERAL - The community voted by Feb 10 2019 and created a new submodule named "hadoop-submarine" for enabling deep learning training & serving jobs on Hadoop. It follows an independent release cycle - a process already established for Ozone. - Branch-2.7 EOL is being discussed - CVE announcements: CVE-2018-1296, CVE-2018-11767 RELEASES - Apache Hadoop 3.1.2 was released on Mon Feb 04 2019 - Apache Hadoop 3.2.0 was released on Tue Jan 15 2019 - Apache Hadoop Ozone 0.4.0 is being voted on COMMUNITY ## PMC changes: - No new PMC additions in the last three months - Currently 101 PMC members. ## Committer base changes: - Currently 198 committers. - New committers since last report: 5 - Chandni Singh was added as a committer on Wed Mar 20 2019 - Ayush Saxena was added as a *branch committer* for HDFS-13891 branch on Wed Mar 13 2019 - Zhankun Tang was added as a committer on Tue Mar 12 2019 - Eric Badger was added as a committer on Tue Mar 05 2019 - Lokesh Jain was added as a committer on Thu Feb 21 2019 ## JIRA Activity (Previous reports were based on the reporter tool and were buggy. Now using the right keys - YARN, SUBMARINE, HADOOP, HDT, HDDS, HDFS, MAPREDUCE) - 534 JIRA tickets created since the last board meeting - 878 JIRA tickets resolved since the last board meeting ## Mailing list subscriptions & activity: Steady
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. GENERAL - Cloudera and Hortonworks, which employ committers and PMC members of this project, have merged. This merger is going to reduce community diversity w.r.t. contributors/reviewers/committers/PMC members. - The latest two Hadoop releases are done/being done by new release managers. This is helping spread release responsibilities further. - The Apache Hadoop git repository moved from the git-wip-us server to gitbox.apache.org per the larger ASF INFRA changes - The community is making progress on Java 11. With faster Java updates from Oracle, the community is looking at how to better track this moving target of Java support. RELEASES - Apache Hadoop 2.9.2 was released on Sun Nov 18 2018 - Apache Hadoop Ozone 0.3.0-alpha was released on Thu Nov 22 2018 - Apache Hadoop 3.2.0 release is being voted on; some security issues held up the release. COMMUNITY ## PMC changes: - Currently 101 PMC members. - New PMC members since last report: 3 - Haibo Chen was added to the PMC on Mon Nov 19 2018 - Iñigo Goiri was added to the PMC on Thu Dec 13 2018 - Yiqun Lin was added to the PMC on Mon Nov 19 2018 ## Committer base changes: - Currently 193 committers. - New committers since last report: 3 - Shashikant Banerjee was added as a committer on Thu Oct 11 2018 - Botong Huang was added as a committer (previously a branch committer) on Thu Oct 16 2018 - Suma Shivaprasad was added as a committer on Mon Nov 19 2018 ## JIRA Activity - 1543 JIRA tickets created in the last 3 months - 1209 JIRA tickets closed/resolved in the last 3 months ## Mailing list subscriptions & activity: Steady
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. GENERAL - The community moved to a newly built website by the first week of September. - Community events of note -- Hadoop Contributors meetup happened on Tue Sep 25 2018, hosted by Oath in Sunnyvale, CA with dial-in for remote attendees. -- Bay Area Hadoop User Group meetup happened on Wed Aug 29 2018, hosted by Hortonworks in Santa Clara RELEASES - Apache Hadoop 2.8.5 was released on Sat Sep 15 2018 - Apache Hadoop 3.1.1 was released on Tue Aug 08 2018 - Apache Hadoop 3.2.0 release is being worked on and is close to an RC. - Apache Hadoop Ozone 0.2.1-alpha was released on Mon Oct 01 2018 -- Ozone is a newer module in the project that is getting its own independently versioned release artifacts. COMMUNITY ## PMC changes: - Currently 98 PMC members. - New PMC members: 4 - Bibin Chundatt was added to the PMC on Mon Aug 13 2018 - Vrushali Channapattan was added to the PMC on Sun Jul 29 2018 - Weiwei Yang was added to the PMC on Mon Aug 13 2018 - Yufei Gu was added to the PMC on Mon Jul 30 2018 ## Committer base changes: - Currently 191 committers. - New committers: - Ajay Kumar was added as a committer on Thu Sep 13 2018 - Marton Elek was added as a committer on Mon Jul 30 2018 - Takanobu Asanuma was added as a committer on Tue Jul 24 2018 - Branch committers - Kasper Janssens was added as a branch committer (HDFS-12090) on Mon Jul 23 2018 - vote was in Jan 2018. ## JIRA Activity - 1846 JIRA tickets created in the last 3 months - 1557 JIRA tickets closed/resolved in the last 3 months ## Mailing list subscriptions & activity: Steady
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. UPDATES Maintaining 5+ active release branches is proving to be a pain in general, more so with handling of security vulnerabilities. RELEASES - 2.7.6 was released on Sun Apr 15 2018 - 3.0.2 was released on Sun Apr 22 2018 - 2.9.1 was released on Wed May 09 2018 - 2.8.4 was released on Mon May 14 2018 - 3.0.3 was released on Wed May 30 2018 COMMUNITY ## PMC changes: - Currently 94 PMC members. - New PMC members: - Sammi Chen was added to the PMC on Thu Jun 07 2018 - Sean Mackrory was added to the PMC on Wed Jun 13 2018 ## Committer base changes: - Currently 187 committers. - New committers: - Jonathan Hung was added as a committer on May 5 2018 [ Doesn't show up on https://reporter.apache.org ] - Shane Kumpf was added as a committer on Mon May 14 2018 - Nanda Kumar was added as a committer on Wed June 20 2018 [ Doesn't show up on https://reporter.apache.org ] - Ewan Higgs was added as a committer on Wed June 19 2018 [ Doesn't show up on https://reporter.apache.org ] - Giovanni Matteo Fumarola was added as a committer on Fri June 22 2018 [ Doesn't show up on https://reporter.apache.org ] - New branch committers: - Duo Zhang was added as a branch committer for work on Non-blocking HDFS Access for H3 (HDFS-13572) on Wed Jun 06 2018 [ Doesn't show up on https://reporter.apache.org ] - Esfandiar Manii was added as a branch committer for ABFS connector work (HADOOP-15407) on Fri Jun 08 2018 - Thomas Marquardt was added as a branch committer for ABFS connector work (HADOOP-15407) on Fri Jun 08 2018 - Botong Huang was added as a branch committer on Hadoop + Windows Server work (HADOOP-15461) on Mon Jun 25 2018. Already a branch committer on another branch YARN-7402.
## Mailing list activity: Steady SECURITY Announced CVEs - CVE-2016-6811 on April 30 2018: Apache Hadoop Privilege escalation vulnerability (The issue was fixed a long time ago, but the CVE announcement slipped through the cracks)
WHEREAS, the Board of Directors heretofore appointed Chris Douglas (cdouglas) to the office of Vice President, Apache Hadoop, and WHEREAS, the Board of Directors is in receipt of the resignation of Chris Douglas from the office of Vice President, Apache Hadoop, and WHEREAS, the Project Management Committee of the Apache Hadoop project has chosen by vote to recommend Vinod Kumar Vavilapalli (vinodkv) as the successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Chris Douglas is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and BE IT FURTHER RESOLVED, that Vinod Kumar Vavilapalli be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Special Order 7C, Change the Apache Hadoop Project Chair, was approved by Unanimous Vote of the directors present.
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Hadoop maintains five to seven active release branches. We hope this will be the peak. That said, the 3.x releases are a healthy balance of stabilization in 3.0.1 and new features merged and released in 3.1.0. RELEASES 3.0.1 was released 2018-03-22 3.1.0 was released 2018-04-05 COMMUNITY (+ PMC Rakesh Radhakrishnan 2018-01-23) (+ committer Hanisha Koneru 2018-01-10) (+ committer Mukul Kumar Singh 2018-02-09) (+ committer Rushabh Shah 2018-04-06) (+ committer Bharat Viswanadham 2018-04-06) (+ branch-HDFS-12090 Bert Verslyppe 2018-03-14) (+ branch-HDFS-12090 Ewan Higgs 2018-01-19) (+ branch-HDFS-12943 Chao Sun 2018-01-02) (+ branch-HDFS-12943 Erik Krogen 2018-01-10) (+ branch-YARN-7402 Botong Huang 2018-01-31) (+ branch-YARN-7402 Giovanni Matteo Fumarola 2018-01-31) auth: 183 committers (including branch) and 92 PMC members
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Hadoop 3.0.0 is GA. This not only unlocks new features and development, it also puts the project on stable footing to continue a steady cadence with less backporting. We're not there yet, as we currently have four active release branches to support existing users, but it is a significant milestone. Several feature branches have merged, or are preparing to merge, for a 3.1.0 release in February. RELEASES 3.0.0-beta1 was released 2017-10-02 2.8.2 was released 2017-10-23 2.9.0 was released 2017-11-16 2.8.3 was released 2017-12-12 3.0.0 was released 2017-12-12 2.7.5 was released 2017-12-13 COMMUNITY (+ PMC Brahma Reddy Battula 2017-12-14) (+ PMC Konstantinos Karanasos 2017-11-26) (+ committer Billie Rinaldi 2017-10-26) (+ committer Miklos Szegedi 2017-12-27) (+ committer Sammi Chen 2017-10-15) (+ committer Virajith Jalaparti 2017-12-29) (+ committer Inigo Goiri 2017-10-19) auth: 176 committers (including branch) and 91 PMC members
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 3.x series entered beta as the community merged the final set of features and is focused on stabilization. The 2.x series received a bugfix release in 2.7.4 as the community prepares for the 2.8.2 and 2.9 releases. Details of the release roadmap [1] and blockers/status for individual releases [2,3] are tracked in wiki. [1] https://cwiki.apache.org/confluence/display/HADOOP/Roadmap [2] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.9+Release [3] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates RELEASES 2.7.4 was released 2017-08-03 3.0.0-beta1 was released 2017-10-02 COMMUNITY (+ PMC Anu Engineer 2017-08-16) (+ PMC Daniel Templeton 2017-08-24) (+ PMC Eric Payne 2017-07-24) (+ PMC John Zhuge 2017-08-29) (+ PMC Kai Zheng 2017-08-07) (+ PMC Mingliang Liu 2017-09-21) (+ PMC Naganarasimha 2017-08-10) (+ PMC Ray Chiang 2017-08-24) (+ PMC Sunil G 2017-07-28) (+ PMC Varun Saxena 2017-07-21) (+ PMC Wei-Chiu Chuang 2017-08-29) (+ PMC Xiao Chen 2017-08-29) (+ committer Aaron Fabbri 2017-09-06) (+ committer Chen Liang 2017-09-05) (+ committer Sean Busbey 2017-09-14) (+ committer Surendra Singh Lilhore 2017-09-18) (+ committer Wei Yan 2017-08-23) (+ committer Weiwei Yang 2017-09-25) (+ branch-HDFS-7240 Mukul Kumar Singh 2017-09-21) (+ branch-HDFS-7240 Nanda kumar 2017-09-20) (+ branch-HDFS-7240 Yuanbo Liu 2017-09-20) (+ branch-YARN-1011 Miklos Szegedi 2017-09-29) auth: 175 committers (including branch) and 89 PMC members
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The community is completing the 3.x-alpha series of releases from trunk, moving to a stabilizing, -beta series. The 2.7.4 release series will receive a bugfix release, likely in the next few weeks. The 2.8 (and 2.9) release branches are also likely to be released this year while 3.x enters GA. RELEASES 3.0.0-alpha3 was released 2017-05-25 3.0.0-alpha4 was released 2017-07-06 COMMUNITY (+ PMC Subru Krishnan 2017-07-04) (+ committer Chris Trezzo 2017-04-24) (+ committer Vrushali Channapattan 2017-04-24) (+ committer Yufei Gu 2017-05-19) (+ committer Nathan Roberts 2017-05-22) (+ committer James Clampffer 2017-05-31) (+ committer Sean Mackrory 2017-06-16) (+ committer Manoj Govindassamy 2017-07-03) auth: 169 committers (including branch) and 77 PMC members. SECURITY CVE-2017-7669: Apache Hadoop privilege escalation
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The project cut a release from the long-lived 2.8 branch, iterating on its stable series of releases. Concurrently, it is stabilizing and adding new features to a 3.x series, anticipating one more alpha release before stabilizing in beta. Activity in both HDFS and YARN proceeds both in feature branches (e.g., object storage, "native task" support) and in a steady stream of fixes and smaller features in mainline branches. RELEASES 2.8.0 was released 2017-03-22 3.0.0-alpha2 was released 2017-01-24 COMMUNITY (+ PMC Ravi Prakash 2017-02-07) (+ committer John Zhuge 2017-02-24) (+ committer Yiqun Lin 2017-01-14) (+ committer Haibo Chen 2017-04-13) (+ branch-HADOOP-13335 Sean Mackrory 2017-02-13) (+ branch-HDFS-7240 Chen Liang 2017-03-31) (+ branch-HDFS-7240 Weiwei Yang 2017-04-13) auth: 164 committers (including branch) and 76 PMC members
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The community is working through criteria for its 3.x and 2.x series, particularly w.r.t. compatibility e.g., [1]. Progress on 3.0.0-alpha2 [2,3] and a 2.8 [4] will likely produce RCs soon. [1] https://issues.apache.org/jira/browse/HDFS-11096 [2] https://s.apache.org/zBhP [3] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release [4] https://s.apache.org/smEX RELEASES Last release: 2.6.5 2016-10-07 COMMUNITY (+ PMC Carlo Curino 2016-11-03) (+ PMC Li Lu 2017-01-10) (+ PMC Ming Ma 2016-11-03) (+ PMC Rohith Sharma K S 2016-11-17) (+ PMC Varun Vasudev 2016-10-20) (+ PMC Zhe Zhang 2016-11-03) (+ committer Bibin Chundatt 2016-12-12) (+ committer Konstantinos Karanasos 2017-01-12) (+ committer Rakesh Radhakrishnan 2016-12-30) (+ committer Sidharta Seethana 2016-12-15) (+ committer Sunil Govind 2016-10-27) (+ committer Yiqun Lin 2017-01-14) (+ branch-HDFS-9806 Thomas Demoor 2016-10-24) (+ branch-YARN-5734 Jonathan Hung 2016-12-13) (+ branch-YARN-5734 Min Shen 2016-12-13) (+ branch-YARN-5734 Ye Zhou 2016-12-13) auth: 161 committers and 75 PMC members SECURITY CVE-2016-3086: Apache Hadoop YARN NodeManager vulnerability CVE-2016-5001: Apache Hadoop Information Disclosure
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Apache Hadoop cut its first release from trunk since 2011. Releases in the 2.x series will continue to follow stricter compatibility guidelines on a branch. The 3.0.0-alpha1 release has kicked off several API cleanups, reasoning about dependencies, and other pains endured to avoid downstream breakage. RELEASES 2.6.5 was released 2016-10-07 2.7.3 was released 2016-08-24 3.0.0-alpha1 was released 2016-09-02 COMMUNITY (+ committer Anu Engineer 2016-07-27) (+ committer Mingliang Liu 2016-08-14) (+ committer Wei-Chiu Chuang 2016-07-20) (+ committer Xiao Chen 2016-07-20) (+ committer Larry McKay 2016-07-20) (+ branch-HADOOP-13345 Rajesh Balamohan 2016-08-06) (+ branch-HADOOP-10285 Rakesh Radhakrishnan 2016-09-26) (+ branch-HADOOP-12756 Mingfei Shi 2016-08-22) (+ branch-HADOOP-13345 Aaron Fabbri 2016-08-14) (+ branch-HDFS-9806 Ewan Higgs 2016-09-21) (+ branch-HDFS-9806 Virajith Jalaparti 2016-09-21) (+ branch-HDFS-9806 Pieter Reuse 2016-09-21) (+ branch-YARN-4752 Daniel Templeton 2016-08-14) (+ branch-YARN-5079 Billie Rinaldi 2016-08-12) (+ branch-YARN-5079 Gour K Saha 2016-08-12) auth: 154 committers and 69 PMC members TRADEMARKS The project updated its logo to include "Apache" [1]. We have an outstanding request to trademarks@ to register our yellow elephant logo with the USPTO. [1] https://issues.apache.org/jira/browse/HADOOP-13184
There was a discussion concerning who is responsible for trademark enforcement, and a general consensus that it isn't currently working. Shane and Chris Douglas to continue this discussion offline.
@shane report back on resolving the hadoop trademark enforcement issues.
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. YARN timeline server v2 (YARN-2928) merged. Container queuing and resource-aware scheduling made progress. HDFS intra-datanode rebalancing (HDFS-1312) merged. Object storage (Ozone) and the native client made progress. Design of an async FileSystem API and an implementation in HDFS started in JIRA. HADOOP integrations with cloud storage were particularly active this quarter. The S3A client received significant updates from a diverse set of community members. Clients for the Aliyun Object Store Service (OSS) and Microsoft Azure Data Lake Store (ADLS) also posted proposals and prototypes. MAPREDUCE was updated to work with the next generation of the YARN timeline service. The Yetus project has greatly improved CI and regression testing, particularly across branches. Given the Hadoop project's intent to cut releases from trunk again, Yetus's support for feature branches is particularly helpful. RELEASES Releases have been blocked on HADOOP-12893, bringing the NOTICE and LICENSE files up to date. It is recently resolved. COMMUNITY (+ PMC Xiaoyu Yao 2016-06-14) (+ PMC Lei Xu 2016-05-15) (+ PMC Arun Suresh 2016-06-23) (+ committer Brahma Reddy Battula 2016-06-11) (+ committer Ray Chiang 2016-06-17) (+ committer Subru Krishnan 2016-06-14) (+ committer Varun Saxena 2016-06-22) (+ branch-YARN-3368 Sreenath Somarajapuram 2016-06-21) (+ branch-YARN-3368 Sunil Govind 2016-06-03) auth: 142 committers (including branch), 69 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. YARN development yielded improvements in preemption, hardening of the timeline server, and a new web UI. Designs for resource-aware scheduling and long-running services are being shaped in JIRA. HDFS erasure coding, native client, object store (Ozone), and intra-datanode rebalancing are significant areas under development. MapReduce continues to receive bug fixes, but even maintenance has slowed. Trademark enforcement continues to be a challenge. While most vendors have engaged quickly and positively, slow (non-)compliance falls off the radar. We are working with trademarks@ to amortize the costs of engagement with templates and will track these incidents in the BRAND JIRA as appropriate, to track followup. RELEASES - 2.6.4 was released on Feb 10 2016 - 2.7.2 was released on Jan 26 2016 COMMUNITY (+ PMC Yongjun Zhang 2016-02-18) (+ PMC Sangjin Lee 2016-04-12) (+ committer Masatake Iwasaki 2016-01-20) (+ committer Eric Payne 2016-02-08) (+ committer Li Lu 2016-02-21) (+ committer Naganarasimha Garla 2016-03-29) (+ committer Kai Zheng 2016-04-07) (+ committer Larry McCay 2016-04-08) (+ branch-HDFS-1312 Anu Engineer 2016-03-03) (+ branch-HDFS-8707 Bob Hansen 2016-01-13) (+ branch-YARN-1011 Iñigo Goiri 2016-01-29) auth: 138 committers (including branch), 66 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. In YARN, resource-aware scheduling (YARN-1011) continues to make progress on a branch, particularly for oversubscription and distributed scheduling. Other areas of active development include node labels, reservations, docker support, and the timeline server. In HDFS, a long-awaited native client (HDFS-8707) has made progress in a branch. Other areas include the WebHDFS protocol, truncate, erasure coding, and intra-datanode rebalancing. Discussion on the dev list suggests that support for erasure coding will likely be pushed from the next release, to 2.9 or 3.0. Bug fixes and stability improvements continue to be filed and fixed in MapReduce. The community prepares maintenance releases (2.6.4 and 2.7.2) concurrently with a release of the head of branch-2 as 2.8.0. RELEASES - 2.6.3 was released on Wed Dec 16 2015 COMMUNITY (+ PMC Yi Liu 2015-11-09) (+ PMC Tsuyoshi Ozawa 2015-12-09) (+ PMC Wangda Tan 2015-12-09) (+ PMC Akira Ajisaka 2015-12-16) (+ PMC Robert Kanter 2016-01-12) (+ branch-YARN-2928 Varun Saxena 2015-12-04) (+ branch-YARN-2928 Naganarasimha 2015-12-04) (+ branch-HDFS-8707 Stephen Walkauskas 2016-01-07) auth: 134 committers (including branch), 64 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. In YARN, support for resizeable containers (YARN-1197) merged to trunk and branch-2, where its development continues. Application priorities (YARN-1963) and the v2 timeline server (YARN-2928) continue to make progress. Failover, HA, and rolling upgrade support were polished. Some issues related to resource-aware scheduling advanced, tentatively. Support for Docker containers (YARN-3611) improved, trending toward support for multiple runtimes (YARN-3853). In HDFS, support for erasure coding (HDFS-7285) merged to trunk. Many improvements focused on improving interactions between features (storage policies, upgrade domains, erasure coding, etc.). Separation between the namespace and block management, years in development, has received renewed attention (e.g., HDFS-8966). HDFS also separated its client(s) into a separate package. Another native client implementation (HDFS-8707) has made steady progress. In MapReduce, bug fixes, stability improvements, and documentation comprised most of the activity. It remains in maintenance mode. In Common, Hadoop dev support scripts were rewritten and split into Yetus, a new TLP. The s3a and wasb filesystem bindings also received many bug fixes and improvements. Portability of native code improved. The community continues to stabilize the 2.6.x and 2.7.x branches (currently voting on 2.7.2), and has discussed a 2.8.0 release. It also opened a discussion of patch workflows, as alternatives to JIRA/patch files/RTC. While the Github integration is currently enabled, project members are working with other communities and infra on alternatives (e.g., Gerrit).
RELEASES - hadoop-2.6.1 @ 2015-09-23 - hadoop-2.6.2 @ 2015-10-28 COMMUNITY (+ PMC Devaraj K 2015-07-20) (+ PMC Yi Liu 2015-11-09) (+ committer Zhihai Xu 2015-07-27) (+ committer Anubhav Dhoot 2015-09-22) (+ committer Sangjin Lee 2015-09-30) (+ committer Zhe Zhang 2015-10-16) (+ committer Walter Su 2015-10-27) Branch: Timeline service (+ branch-YARN-2928 Vrushali Channapattan 2015-09-14) (+ branch-YARN-2928 Li Lu 2015-09-29) Branch: IPv6 support (+ branch-HADOOP-11890 Elliott Clark 2015-09-03) (+ branch-HADOOP-11890 Nate Edel 2015-09-04) Branch: C++ HDFS client (+ branch-HDFS-8707 James Clampffer 2015-07-29) auth: 131 committers (including branch), 60 PMC members
No report was submitted.
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. In YARN, work generalizing node labels, improving Docker support, and implementing v2 of the timeline server made progress. Support for resizable containers also appears on-track. A proposal for federated YARN clusters has design docs and some preliminary code in a branch. In HDFS, the erasure coding work continues to make progress in a branch. The object store also has a design doc for discussion and a preliminary set of patches has been committed to a branch. A native client and prototype HTTP/2 protocol have also made progress in branches. In Common, work refining the test-patch scripts has expanded its scope to become a separable component that could support other projects in the ecosystem. Work revising the native build for Solaris also expanded to remake much of the native build infrastructure. The S3 filesystem shim was also updated extensively. 
RELEASES - hadoop-2.7.0 @ 2015-04-22 - hadoop-2.7.1 @ 2015-07-08 COMMUNITY (+ PMC Vinayakumar B 2015-07-07) (+ PMC Junping Du 2015-07-07) (+ PMC Xuan Gong 2015-07-07) (+ PMC Haohui Mai 2015-02-20) (+ committer Lei Xu 2015-06-14) (+ committer Ming Ma 2015-06-18) (+ committer Xiaoyu Yao 2015-04-16) (+ committer Varun Vasudev 2015-05-28) (+ committer Rohith Sharma K S 2015-06-17) Branch: Split test-patch off into its own TLP (HADOOP-12111) (+ branch-HADOOP-12111 Andrew Kyle Purtell 2015-06-27) (+ branch-HADOOP-12111 Nick Dimiduk 2015-06-27) (+ branch-HADOOP-12111 Andrew Bayer 2015-06-27) (+ branch-HADOOP-12111 Sean Busbey 2015-06-27) Branch: YARN Federation (YARN-2915) (+ branch-YARN-2915 Subru Krishnan 2015-07-06) (+ branch-YARN-2915 Kishore Chaliparambil 2015-07-06) Branch: Distributed scheduling (YARN-2877) (+ branch-YARN-2877 Sriram Rao 2015-05-21) (+ branch-YARN-2877 Konstantinos Karanasos 2015-05-21) Branch: Object store (HDFS-7240) (+ branch-HDFS-7240 Anu Engineer 2015-07-10) Branch: Data Transfer Protocol via HTTP/2 (HDFS-7966) (+ branch-HDFS-7966 Duo Zhang 2015-07-02) auth: 124 committers (including branch), 58 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. In YARN, the next iteration of the TimelineServer, work on network shaping, per-queue policies, and collecting node metrics for scheduling have made progress. Work on erasure coding in HDFS continues. A design document for the object store (HDFS-7240) also appeared. Activity is low in MapReduce, mostly bug fixes and repairs for unstable tests. Overhaul of shell scripts continues in Common, in addition to changes supporting pluggable authentication and authorization. RELEASES None COMMUNITY (+ PMC Haohui Mai 2015-02) (+ committer Arun Suresh 2015-03) (+ committer Xiaoyu Yao 2015-03) auth: 110 committers (including branch), 55 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 2.6 release added a large set of features and made many improvements, including transparent encryption, heterogeneous/tiered storage, support for Docker containers, reservation-based scheduling, node labels, S3a support, key management server (KMS), service registry, and rolling upgrades in YARN. Ongoing development in YARN includes a new round of improvements to the timeline server (YARN-2928), nodemanager decommission and work-preserving restart (YARN-914, YARN-1336, YARN-556), improved locking in the RM (YARN-3091), shared cache (YARN-1492), and disk as a resource (YARN-2139). Ongoing development in HDFS includes erasure coding (HDFS-7285), support for truncate (HDFS-3107), namenode synchronization (HDFS-7396), and a native client (HDFS-6994). MapReduce received a healthy set of bug fixes and stability improvements. RELEASES - hadoop-2.6.0 @ 2014-11-19 - hadoop-2.5.2 @ 2014-11-20 COMMUNITY (+ PMC Zhijie Shen @ 2014-11) (+ PMC Jian He @ 2014-11) (+ committer Yi Liu @ 2014-11) (+ committer Carlo Curino @ 2014-11) (+ committer Gera Shegalov @ 2014-12) (+ committer Robert Kanter @ 2014-12) (+ committer Tsuyoshi Ozawa @ 2014-12) (+ committer Akira Ajisaka @ 2015-01) (+ committer Wangda Tan @ 2015-01) (+ branch-HDFS-7285 Zhe Zhang @ 2014-11) (+ branch-HDFS-7285 Kai Zhang @ 2014-11) (+ branch-HDFS-7285 Bo Li @ 2014-11) (+ branch-YARN-2139 Wei Yan @ 2014-12) auth: 108 committers (including branch), 54 PMC members
No report was submitted.
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 2.6 release has branched for release. In addition to bug fixes, it adds several new features and refines existing work in the 2.x release series. Among the notable work in YARN: improvements to its fault tolerance and support for rolling upgrades (YARN-556, YARN-1336), timeline/history server (YARN-1530), log handling (YARN-2443), admission control/planning (YARN-1051), support for long-running services (YARN-913), node labels (YARN-796), and large container allocation (YARN-1769). Among the notable work in HDFS: tiered storage in archival (HDFS-6584), in-memory replicas (HDFS-6581), inotify support (HDFS-6634), extended attributes (HDFS-2006), encryption (HDFS-6134, HADOOP-10150), and a native client implementation. In MapReduce, a native collector (MAPREDUCE-2841) offers improved performance to many deployments. Work started earlier in the 2.x branch- particularly related to security, encryption, and high availability- continues apace. RELEASES - hadoop-2.5.0 @ 2014-08-12 - hadoop-2.5.1 @ 2014-09-11 COMMUNITY (+ PMC Karthik Kambatla @ 2014-09-18) (+ committer Benoy Antony @ 2014-08-07) (+ committer Akira Ajisaka @ 2014-08-21) (+ branch-MAPREDUCE-2841 Binglin Zhang @ 2014-07-14) (+ branch-MAPREDUCE-2841 Sean Zhong @ 2014-07-14) (+ branch-MAPREDUCE-2841 Manu Zhang @ 2014-08-21) auth: 101 committers (including branch), 52 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. YARN development of a generic TimelineServer, resource tracking for disk and network resources, caching of common dependencies, support for container preemption, and other features continues. The HDFS extended attributes feature branch merged to trunk (2014-06-11). Thorough specification of FileSystem semantics (HADOOP-9361) also successfully merged. Native checksumming, hedged reads, features built over HA interfaces, NFS, and ACLs are also actively developed in trunk and release branches. Across all projects, work adding encryption and security features continues in a development branch. The project changed its bylaws to allow 5 days for release votes, instead of the 7 allocated for other decisions. RELEASES - hadoop-0.23.11 @ 2014-06-27 - hadoop-2.4.1 @ 2014-06-29 COMMUNITY (+ PMC Andrew Wang @ 2014-06-01) (+ PMC Arpit Agarwal @ 2014-06-01) (+ PMC Brandon Li @ 2014-06-01) (+ PMC Chris Nauroth @ 2014-06-01) (+ PMC Colin McCabe @ 2014-06-01) (+ PMC Jing Zhao @ 2014-06-01) (+ PMC Sandy Ryza @ 2014-06-01) (+ branch-HADOOP-10388 Abraham Elmahrek @ 2014-05-01) (+ branch-HADOOP-10388 Yongjun Zhang @ 2014-05-01) (+ branch-HDFS-2006 Charles Lamb @ 2014-05-12) (+ branch-HDFS-2006 Yi Liu @ 2014-05-12) (+ branch-fs-encryption Charles Lamb @ 2014-05-14) (+ branch-fs-encryption Yi Liu @ 2014-05-14) (+ branch-YARN-1051 Carlo Curino @ 2014-06-15) (+ branch-YARN-1051 Subramaniam Venkatraman Krishnan @ 2014-06-15) auth: 94 committers (including branch), 51 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The YARN execution platform continues to evolve by generalizing from the specific requirements of the MapReduce framework. As one prominent example, a development branch implementing a more general application history server (YARN-321) merged to trunk and the 2.x release series. The operability and robustness of the platform is also improved by recent attention to failover and recovery in the ResourceManager and NodeManager components (e.g., YARN-1336, YARN-1815). The HDFS subproject also merged two significant development branches to trunk: rolling upgrades (HDFS-5535) and ACLs (HDFS-4685). Improvements in the Common RPC layer, short-circuit reads, and 'hedged' reads (HDFS-5776) evolve Hadoop storage toward more heterogeneous workloads and architectures. RELEASES - hadoop-2.3.0 @ 2014-02-20 - hadoop-2.4.0 @ 2014-04-07 COMMUNITY (+ committer Haohui Mai @ 2014-02-11) (+ committer Vinayakumar B @ 2014-03-04) (+ committer Xuan Gong @ 2014-03-13) (+ branch-HADOOP-10388 Binglin Chang @ 2014-03-13) (+ branch-HADOOP-10388 Wenwu Peng @ 2014-04-07) auth: 88 committers (including branch), 44 PMC members The last addition to the PMC was Bikas Saha 2013-10
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The Hadoop project reached a significant milestone, releasing Hadoop 2.2.0 as the first GA artifact in that series. Two development branches have merged to trunk: In-memory caching of HDFS blocks (HDFS-4949) (29-Oct-2013) and the first phase in presenting heterogeneous storage to applications (HDFS-2832) (13-Dec-2013). Development of these features continues in trunk. YARN continues to refine its resource model. Salient issues include modifying containers (YARN-1197), delegating cluster resources (YARN-1488), and improving its model for services (YARN-896). Work on improving high availability in the ResourceManager (YARN-149), particularly YARN-1029, has made very promising progress. RELEASES - hadoop-2.2.0 @ 2013-10-15 - hadoop-0.23.10 @ 2013-12-02 COMMUNITY (+ committer Roman Shaposhnik @ 2013-10-25) (+ committer Jun Ping Du @ 2013-12-04) (+ committer Jian He @ 2013-12-04) (+ committer Mayank Bansal @ 2013-12-04) (+ committer Karthik Kambatla @ 2013-12-04) (+ committer Ravi Prakash @ 2013-12-04) (+ committer Omkar Joshi @ 2013-12-04) (+ committer Zhijie Shen @ 2013-12-04) (+ branch-YARN-1492 Chris Trezzo 2013-12-18) (+ branch-YARN-1492 Sangjin Lee 2013-12-18) (+ branch-HDFS-4685 Haohui Mai 2013-12-29) auth: 84 committers (including branch), 44 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 2.x series reached beta status in August, targeting a GA release before the end of the year. Work hardening the release continues apace. The project updated its bylaws to allow for "branch committers" in support of feature development by newer contributors. The first set are collaborating on a security initiative that has been discussed on the dev list, in meetups, and in related JIRAs since last spring. No work on the branch has started, despite exchanges on the lists on possible, seminal issues to tackle. RELEASES - hadoop-1.2.1 @ 2013-08-05 - hadoop-2.0.6-alpha @ 2013-08-22 - hadoop-2.1.0-beta @ 2013-08-25 - hadoop-2.1.1-beta @ 2013-09-30 COMMUNITY (+ PMC Bikas Saha @ 2013-10-07) (+ committer Arpit Agarwal @ 2013-08-08) (+ committer Sanford Ryza @ 2013-07-25) (+ committer Andrew Wang @ 2013-07-25) (+ committer Devaraj K @ 2013-07-23) auth: 74 committers, 44 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. There is little to report, as we submitted an off-cycle report in June. Security discussions on the dev list converge slowly, but consensus is developing around implementation tasks, if not the precise shape of that work. Preparation for the 2.1-beta release continues. Contributors continue to stabilize APIs, iron out incompatibilities with the 1.x codebase, and integrate with related projects. When the Hadoop project spun off subprojects a few years ago, the projects adjusted their committer roles. We'd been ambivalent about finishing that, but finally did, removing about 11 accounts (none had participated since then). RELEASES - hadoop-0.23.9 @ 2013-07-09 COMMUNITY auth: 68 committers, 43 PMC members
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. This is an off-cycle report, as the last few weeks were eventful. The Hadoop project made five releases from three active development branches, elected seven members to the PMC, and added five committers. The project amended its bylaws to eliminate votes on "release plans". RELEASES - hadoop-0.23.7 @ 2013-04-18 - hadoop-2.0.4 @ 2013-04-23 - hadoop-1.2 @ 2013-05-13 - hadoop-0.23.8 @ 2013-06-05 - hadoop-2.0.5 @ 2013-06-09 COMMUNITY (+ PMC Jonathan Eagles 2013-05-29) (+ PMC Kihwal Lee 2013-05-29) (+ PMC Steve Loughran 2013-05-29) (+ PMC Luke Lu 2013-05-29) (+ PMC Uma Maheswar Rao G 2013-05-29) (+ PMC Hitesh Shah 2013-05-29) (+ PMC Daryn Sharp 2013-05-29) (+ committer Brandon Li 2013-05-21) (+ committer Colin McCabe 2013-05-21) (+ committer Jing Zhao 2013-05-22) (+ committer Ivan Mitic 2013-05-23) (+ committer Chris Nauroth 2013-05-23) auth: 79 committers, 43 PMC members The bylaws contained an obscure clause that required release managers to call a vote on a "release plan". Given that a majority vote of the PMC establishes a new release, the meaning of this rarely-observed ritual is ambiguous: there was a vote, but nothing in it was binding. After several weeks of heated exchanges that accomplished nothing, the PMC voted to remove the clause from the bylaws entirely. Now, any committer who wants to roll a release notifies the dev list to explain its motivation and get preliminary feedback, but there is no vote. The completely avoidable confusion these threads created has mostly resolved.
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Two significant development branches merged to trunk: - Support for Windows ( http://s.apache.org/e7c ) - (HDFS) Fast-path for local reads on Linux (merge vote closing presently) ( http://s.apache.org/gM ) ( http://s.apache.org/7y1 ) Developers have run Hadoop on Windows by emulating its *NIX dependencies, but the former branch effects a cleaner integration. The latter branch removed a performance hack for trusted services, replacing it with a more secure and general implementation for all HDFS clients. Developers on Windows requested that the workaround remain intact while comparable functionality is implemented on that platform. The two merge votes were nearly concurrent, so the development community discussed the tradeoffs in supporting the new platform, particularly given the present example of its impact. The informal consensus laid the burden of support, testing, and monitoring on the subset of developers working on Windows. Concretely, this extracted commitments to set up and maintain CI infrastructure while relieving others of requirements to fix breakage on a platform they may not run. As applied to the HDFS branch being merged, the implementor(s) of the feature restored the workaround. The dev community converged on these banal agreements fairly quickly. Increased collaboration with the Apache Bigtop project in the release process has improved early detection of downstream integration issues. The upcoming release of 2.0.4-alpha (currently being voted on) has benefitted significantly. Hadoop continues to be an umbrella hosting effectively independent projects (HDFS, MapReduce, YARN). The PMC has not discussed its disposition to partition them recently. While one of the prenominate merges is an example of cross-project work, such patches remain rare. No issues require board attention at this time. 
RELEASES - hadoop-1.1.2 @ 2013-03-06 COMMUNITY (+ PMC Jason Lowe 2013-02-28) auth: 74 committers, 36 PMC members mailing lists @ 2013-04-01 1805 general 3995 user COMMON Common is the shared libraries for HDFS and MapReduce. mailing lists @ 2013-04-01 390 common-commits 1789 common-dev 378 common-issues HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. mailing lists @ 2013-04-01 201 hdfs-commits 862 hdfs-dev 258 hdfs-issues MAPREDUCE MapReduce is an implementation of the map/reduce programming paradigm. mailing lists @ 2013-04-01 198 mapreduce-commits 904 mapreduce-dev 256 mapreduce-issues YARN YARN is a distributed computation framework for easily writing distributed applications. mailing lists @ 2013-04-01 57 yarn-commits 221 yarn-dev 81 yarn-issues
WHEREAS, the Board of Directors heretofore appointed Arun Murthy to the office of Vice President, Apache Hadoop, and WHEREAS, the Board of Directors is in receipt of the resignation of Arun Murthy from the office of Vice President, Apache Hadoop, and WHEREAS, the Project Management Committee of the Apache Hadoop project has chosen by vote to recommend Chris Douglas as the Successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Arun Murthy is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and BE IT FURTHER RESOLVED, that Chris Douglas be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Special Order 7A, Change the Apache Hadoop Chair, was approved by Unanimous Vote of the directors present.
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have new people joining our ranks. * We've added 3 new committers - Kihwal Lee, Arpit Gupta, Bikas Saha * We've added 1 new PMC member: Harsh J * We've elected a new PMC Chair, Chris Douglas. (Also added to the board agenda.) On the project side, we have made 4 releases: - hadoop-0.23.5 was released on 28th November, 2012 - hadoop-1.1.1 was released on 1st December, 2012 - hadoop-0.23.6 was released on 6th February, 2013 - hadoop-2.0.3-alpha was released on 13th February, 2013 PMC Chair Vote - We had a fairly contentious discussion after the vote for the PMC Chair resulted in a tie after STV. The discussions included *analysis* of voting patterns w.r.t. employers, accusations and counter-accusations about reasons for those patterns such as marketing etc., and a proposal to *rotate PMC chair organization* as one of the remedies, which eventually veered into a direction where one PMC member perceived it as a 'threat to remove all PMC members of an organization'; this was rapidly defused by a clarification from the other PMC member. In the end, one of the 2 candidates tied after the vote withdrew to allow for an amicable solution and also cited concerns about the nature of some of the discussions. Clearly, the lesson the Hadoop PMC has learnt is that, in future, voting should be done via the ASF Voting Tool. As the outgoing Chair, my personal recommendation is that splitting the Hadoop project into separate TLPs (HDFS, YARN, MapReduce) will not only break up the 'umbrella' Hadoop project to better reflect the fact that the communities are significantly disparate, but will also, more importantly, help avoid excessive fascination with the Hadoop brand. We've discussed this in the past (see October 2012 Board Report) - some people agree about this, others don't. We'll continue to talk. 
Overall, aside from these skirmishes, the community continues to function in a healthy manner as evinced by the fact that we continue to make a significant number of software releases, grow the community by adding new users/contributors/committers/PMC-members and generally make great forward progress. Hence, I feel there isn't any reason for the Board to take any action. Community: * 51 committers * 3932 user@ * 1783 subscribers on general@ COMMON Common is the shared libraries for HDFS and MapReduce. Community: * 1751 subscribers on common-dev HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Community: * 829 subscribers on hdfs-dev YARN YARN is a distributed computation framework for easily writing distributed applications. Community: * 185 subscribers to yarn-dev MAPREDUCE MapReduce is an implementation of the map/reduce programming paradigm. Community: * 867 subscribers to mapreduce-dev
No report was submitted.
AI: Roy to pursue a report for Hadoop
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have new people joining our ranks. * We've added one new committer - Jason Lowe * We've added 3 new PMC members: Siddharth Seth, Robert Evans, Thomas Graves On the project side, we have made 6 releases: - hadoop-2.0.1-alpha was released on 26th July, 2012 - hadoop-0.23.3 was released on 17th September, 2012 - hadoop-2.0.2-alpha was released on 9th October, 2012 - hadoop-1.0.4 was released on 11th October, 2012 - hadoop-1.1.0 was released on 14th October, 2012 - hadoop-0.23.4 was released on 15th October, 2012 Developer community is working well together, even though there was a fresh (but minor) outbreak of vendor wars with some participation by members of the PMC. No action from the Board is necessary now. We've added a new Hadoop YARN sub-project. We had a fairly contentious public discussion on splitting Apache Hadoop into separate projects since there are at least 3 very distinct developer communities in Apache Hadoop now: HDFS, YARN & MapReduce. For now the community has voted to merge separate committer lists, but there seems to be some emerging, albeit very early/tenuous consensus that after hadoop-2 is declared 'stable' we should split the project into separate projects (HDFS, YARN, MapReduce). This will better reflect reality that they have distinct communities. No action from the Board is necessary now. Community: * 48 committers * 3817 user@ * 1624 subscribers on general@ COMMON Common is the shared libraries for HDFS and MapReduce. Community: * 1681 subscribers on common-dev HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Community: * 735 subscribers on hdfs-dev YARN YARN is a distributed computation framework for easily writing distributed applications. 
Community: * 86 subscribers to yarn-dev MAPREDUCE MapReduce is an implementation of the map/reduce programming paradigm. Community: * 766 subscribers to mapreduce-dev
No report was submitted.
AI: Greg to pursue a report for Hadoop
Apache Hadoop status report for July 2012 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have new people joining our ranks. * We've added two new committers - Daryn Sharp, Jonathan Eagles * We've added one new PMC member: Alejandro Abdelnur On the project side, we have made 1 bug-fix release in the stable line and 1 major new release: - hadoop-1.0.3 was released on 16th May, 2012 - hadoop-2.0.0-alpha was released on 23rd May, 2012 - Work on further Hadoop 2.0.1-alpha (a security bug-fix release) is done, and is currently under vote. - Work on hadoop-1.1.0 is nearly done. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: - hadoop-1.0.3 was released on 16th May, 2012 - hadoop-2.0.0-alpha was released on 23rd May, 2012 Community: * 48 committers * 1613 subscribers on common-dev * 3151 subscribers on common-user * 1533 subscribers on general New committers: * 2 new committers have been added to this project. HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: - hadoop-1.0.3 was released on 16th May, 2012 - hadoop-2.0.0-alpha was released on 23rd May, 2012 New committers: * 1 new committer has been added to this project. Community: * 43 committers * 668 subscribers on hdfs-dev * 1205 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: - hadoop-1.0.3 was released on 16th May, 2012 - hadoop-2.0.0-alpha was released on 23rd May, 2012 New committers: * 1 new committer has been added to this project. Community: * 46 committers * 689 subscribers to mapreduce-dev * 1354 subscribers to mapreduce-user
(Hadoop)
Apache Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have had new people join our ranks: * We've added new committers - Thomas Graves, Robert Evans, Hitesh Shah & Uma M * We've added new PMC members: Aaron Myers, Matt Foley On the project side, we have made 3 releases: - hadoop-1.0.1 was released on 22nd Feb, 2012 - hadoop-0.23.1 was released on 28th Feb, 2012 - hadoop-1.0.2 was released on 4th April, 2012 - Work on further Hadoop 0.23.2 release is nearly done, and is scheduled for a release in the next few days. - Developer community is working well together. COMMON Common is the shared libraries for HDFS and MapReduce. Community: * 46 committers * 1520 subscribers on common-dev * 2952 subscribers on common-user * 1503 subscribers on general New committers: * 4 new committers (Thomas Graves, Robert Evans, Hitesh Shah & Uma M) have been added to this project. HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. New committers: * 1 new committer (Uma M) has been added to this project. Community: * 42 committers * 607 subscribers on hdfs-dev * 1092 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. New committers: * 3 new committers (Thomas Graves, Robert Evans & Hitesh Shah) have been added to this project. Community: * 45 committers * 637 subscribers to mapreduce-dev * 1250 subscribers to mapreduce-user
Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have had a new person join our ranks. * We've added one new committer - Siddharth Seth. On the project side, we have made some very exciting progress. We have had a total of 3 releases: - hadoop-0.23.0 released from trunk, first one off trunk in nearly 2 years. - hadoop-0.22.0 released, branched in early 2011. - hadoop-1.0.0 released off the branch-0.20.2xx baseline (now branch-1) - https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces21 - Work on further Hadoop 0.23.1 release is continuing, and is scheduled for release at the end of the month - Developer community is working well together. The public dialogue among vendors who employ many in the developer community seems to have died down since the last board report. No action from the board is required at this stage. - Some vendors are continuing to use the lists to promote their own products. A few PMC members have responded to discourage this practice, but not directly as the PMC. No action from the board is required at this stage. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * 0.23.0 was released on 11th Nov, 2011. * 0.22.0 was released on 10th Dec, 2011. * 1.0.0 was released on 29th Dec, 2011. Community: * 42 committers * 1433 subscribers on common-dev * 2761 subscribers on common-user * 1468 subscribers on general New committers: * 1 new committer has been added to this project. HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Community: * 41 committers * 567 subscribers on hdfs-dev * 985 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. 
New committers: * 1 new committer has been added to this project. Community: * 42 committers * 587 subscribers to mapreduce-dev * 1118 subscribers to mapreduce-user
WHEREAS, the Board of Directors heretofore appointed Ian Holsman to the office of Vice President, Apache Hadoop, and WHEREAS, the Board of Directors is in receipt of the resignation of Ian Holsman from the office of Vice President, Apache Hadoop, and WHEREAS, the Project Management Committee of the Apache Hadoop project has chosen by vote to recommend Arun Murthy as the Successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Ian Holsman is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and BE IT FURTHER RESOLVED, that Arun Murthy be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Resolution 7B was approved by unanimous roll call vote, with Doug Cutting abstaining.
Hadoop status report for October 2011 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have had a couple of new people join our ranks. * Giri Kesavan and Jitendra Pandey have accepted roles in the PMC * 4 people have accepted committership: Alejandro Abdelnur, Harsh J Chouraria, Eric Yang and Ramya Sunil * A new PMC Chair (Arun Murthy) is being recommended to the board for their approval. On the project side, we have made some exciting progress. - 0.20.205's vote has closed successfully, and it will be released shortly. This release integrates two major features (security & append), of which the append feature was the topic of much internal debate, so this is an excellent outcome for the health of Hadoop, and allows other projects like HBase to use an 'official' release. - Work on Hadoop 0.23 release is continuing, and is scheduled for release at the end of the month - Konstantin Shvachko is now leading the 0.22 Release process - Mavenization of our codebase is complete - Developer community is working well together - Vendors are continuing to use the lists to promote their own products. We are formulating appropriate responses to discourage this practice. No action from the board is required at this stage. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * 0.20.204.0 (beta) was released on 5 September. Community: * 2598 subscribers on common-dev * 1341 subscribers on common-user * 1392 subscribers on general HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. New committers: * 4 new committers have been added to this project. Community: * 41 committers * 499 subscribers on hdfs-dev * 864 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. 
Releases: * None this period. New committers: * 4 new committers have been added to this project. Community: * 41 committers * 528 subscribers to mapreduce-dev * 1016 subscribers to mapreduce-user
Hadoop status report for August 2011 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. * Hadoop Summit - 1600 people attended * HortonWorks launch * The 0.20.203.0 release and the divisive vote. * 0.20.204.0 rc1 is being voted on. * Hadoop naming debate * Lack of progress on contacting the potential trademark infringers * 0.22 stalling * More weight gathering behind 0.23 * Growing ecosystem as more Incubator projects are in the Hadoop ecosystem * Commercial forks of Hadoop (e.g. MapR) and how to respond to them on the lists and attending developer meetups * A number of developers active on the HA Jira (HDFS-1623) asked for an in-person, high-bandwidth meeting to get clarification on the design document posted on the Jira; this wasn't publicized on-list * Fixed our site to claim trademark for Hadoop and the other Apache projects. * Trademarks is proceeding with registering the Hadoop trademark. * Yahoo removed the references to the Yahoo Distribution of Hadoop. With regard to the releases, we have 3 releases going on: the 0.20.X release stream, which has some minor features and mainly bug fixes, and the 0.22 and 0.23 releases, which represent some major changes. 0.22 & 0.23 differ in featureset and 0.23 is a superset of 0.22. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * None this period. Community: * 1294 subscribers on common-dev * 2487 subscribers on common-user * 1375 subscribers on general HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. New committers: * 1 new committer has been added to this project. Community: * 38 committers * 465 subscribers on hdfs-dev * 788 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. 
New committers: * 3 new committers have been added to this project. Community: * 40 committers * 485 subscribers to mapreduce-dev * 938 subscribers to mapreduce-user
Larry asked and Owen answered what the "Hadoop naming debate" is. It is a reference to whether to accept http://wiki.apache.org/hadoop/Defining%20Hadoop which seeks to limit the name "Hadoop" to mean releases from Apache and to push all other derived products to be "powered by Hadoop." There was general support except from the companies that use the Hadoop name for derivative products. There was a request to suspend the vote for more discussion, but once the vote stopped the discussion stopped.
Report missing; will report next month.
Hadoop status report for April 2011 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. On the people side, we have had a couple of new people join our ranks. * Todd Lipcon has accepted a role in the PMC * Koji Noguchi has accepted a role as a committer in both HDFS & MR. * Matthew Foley has accepted a role as a committer in HDFS. * We have another invitation outstanding, and hope he will take up the committer role shortly. On the branding side, we and the trademark group have been actively engaging companies to make proper use and attribution of our Apache Hadoop Trademark. These discussions are ongoing, and generally positive. On the product release side, Nigel is continuing to progress with the 0.22 release. We have 18 outstanding blockers. HADOOP-7106, which re-organizes some SVN structure, should be committed by the end of next week. MAPREDUCE-2178 is the biggest outstanding blocker that many others depend on. There is still no clear plan for getting it fixed. Arun has taken over the 0.20.200 release (formerly known as 0.20.3). He pushed a giant patch to the branch-0.20-security branch. Then, based on the feedback from the community, Owen took over and committed individual patches for the same codebase to the branch. Currently we have a couple of unit tests failing; after fixing them we should be good to make an official release after getting necessary approvals from the PMC. Discussions around rationalizing the codebase have started, with mrunit being moved to the incubator, and further discussions about either maintaining the contrib modules or moving them to apache-extras/incubator. The biggest news is saved for last. Yahoo! has announced that they will stop maintaining their own internal codebase, and switch to actively developing on the Apache one. This is a great step forward, and they have also started having more discussions about architecture (MR-279) on the list. 
We look forward to more in-depth discussions happening in the public forums. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * None this period. Community: * 1194 subscribers on common-dev * 2293 subscribers on common-user * 1328 subscribers on general HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. New committers: * 2 new committers have been added to this project. Community: * 35 committers * 375 subscribers on hdfs-dev * 631 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. New committers: * 2 new committers have been added to this project. Community: * 37 committers * 400 subscribers to mapreduce-dev * 764 subscribers to mapreduce-user
Hadoop status report for January 2011 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Nigel has volunteered to RM the 0.22 release, and it is making progress; the previous RM stepped down due to not having enough time since the 6685 patch was not going to make this release. Work on 6685 has not really progressed. Owen has volunteered to RM the 0.20.3 release, and there are discussions about integrating the 'security' patch-set that Yahoo! is developing, that Arun has volunteered to RM. Both of these are separate branches. We have invited 11 new committers into the project this month; all have accepted and are in the process of getting their accounts set up. We also had 2-3 people who the PMC felt were not ready for committership yet. There is still a lot of discussion about the criteria for what makes a committer, but I think we are in a better place than before. We are working with the brand management team about Yahoo!'s and Cloudera's use of Hadoop's name. Both of these are showing good progress thanks to the brand management team's hard work. We are still having lots of discussions about future work on the 0.20 branch; this includes the security patch-set, adding append, and the 0.20.3 release. The security patch-set has its own issues, due to it requiring some work if it is to be contributed as separate patches, and also how the work will be applied to the upcoming 0.22 release. (see http://s.apache.org/NfJ & http://s.apache.org/uf for the discussions around the append branch & security branches) There have been a couple of misunderstandings around the security releases. We have also started discussions about why we have so many mailing lists, what they are used for, and the possibility of combining some of them (and 2 code bases). We have updated the website to provide better documentation. 
The codebase discussion is more about moving directories around, rather than combining them into a single one. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * None this period. Community: * 1123 subscribers on common-dev * 2140 subscribers on common-user * 1335 subscribers on general HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. New committers: * 5 new committers have been added to this project. Community: * 33 committers * 323 subscribers on hdfs-dev * 525 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. New committers: * 8 new committers have been added to this project. Community: * 35 committers * 342 subscribers to mapreduce-dev * 647 subscribers to mapreduce-user
The report indicates that changes have been made that satisfy the board. The project is back on a quarterly reporting schedule.
Hadoop status report for December 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. One contentious issue was raised (HADOOP-6685), and discussion continues about which technical direction is better moving forward. There is currently a veto on the patch. This patch is not critical to the health of the project. 6 new PMC members have been added, and votes for several new committers have started. We would like to welcome the following people to the Hadoop PMC: * Eli Collins * Jakob Homan * Amareshwari Ramadasu * Suresh Srinivas * Sharad Agarwal * Vinod Kumar Vavilapalli We have invited a new committer, but so far he has not responded. We are working with the brand management team about Yahoo!'s and Cloudera's use of Hadoop's name. The 0.22 release scheduled for November is still in progress. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * None this period. New Committers: * None this period. Community: * 1089 subscribers on common-dev * 2106 subscribers on common-user * 1294 subscribers on general HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * None this period. New committers: * None this period. Community: * 28 committers * 299 subscribers on hdfs-dev * 498 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. New committers: * None this period. Community: * 27 committers * 317 subscribers to mapreduce-dev * 612 subscribers to mapreduce-user ZOOKEEPER The ZooKeeper project is now a separate project, and will be removed from further notices going forward.
Hadoop status report for October 2010 to November 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Discussions have started on the issues that the board identified; we seem to have a general agreement on some issues, but we need an official consensus on the proposals, and need to have them discussed openly on the public mailing lists. Specifically: * Everyone is in general agreement that we need to release more often. The question revolves around how we test them to ensure they keep to the quality that Hadoop releases are known for. * The discussion of having 'mentors' to help guide new committers was started. * The Cloudera branding issue was forwarded to the trademarks group, where Shane & Karen are deciding how best to pursue the issue of their certification courses and branding on their website. * Bylaws have been discussed on general@ * Owen will be the release manager for the 0.22 release scheduled for later this month. * The ZooKeeper project has voted to become a separate TLP. This has been raised for the board's consideration. * People have started using reviews.apache.org to discuss patches. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * None this period. New Committers: * None this period. Community: * 1073 subscribers on common-dev * 2068 subscribers on common-user HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * None this period. New committers: * None this period. Community: * 26 committers * 286 subscribers on hdfs-dev * 463 subscribers on hdfs-user MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * None this period. New committers: * Scott Chen was voted in as a committer in August 2010. 
Community: * 26 committers * 303 subscribers to mapreduce-dev * 568 subscribers to mapreduce-user ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. Releases: * None this period. Two releases are in progress, near term a 3.3.2 fix release (1 blocker pending), and longer term 3.4.0 feature release. New committers: none Community: * 6 active committers, 2 PMC members * 176 subscribers on zookeeper-dev * 356 subscribers on zookeeper-user The ZooKeeper project has petitioned the board to become a TLP.
WHEREAS, the Board of Directors heretofore appointed Owen O'Malley to the office of Vice President, Apache Hadoop, and WHEREAS, with the desire of the Board of Directors to rotate the position of Vice President, Apache Hadoop, the Project Management Committee of the Apache Hadoop Project has chosen to recommend Ian Holsman as the successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Owen O'Malley is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Hadoop, and BE IT FURTHER RESOLVED, that Ian Holsman be and hereby is appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Approved by unanimous roll call vote with Doug abstaining.
Hadoop status report for July 2010 to October 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 2nd annual Hadoop World was held on 12 October in NYC. It had 900 attendees. The program is available here: http://www.cloudera.com/company/press-center/hadoop-world-nyc/agenda/. The divestiture of sub-projects has continued. We have promoted Hive, and Pig to be top level Apache projects and Chukwa to the Incubator. This has had the positive effect that the majority of the current PMC is involved in the core projects (Common, HDFS and MapReduce). The Hadoop PMC removed one member, who has completely dropped out of contact: * Jim Kellerman and as part of moving subprojects out, the following PMC members resigned: * Alan Gates * Ashish Thusoo * Daniel Dai * Namit Jain * Olga Natkovich * Pradeep Kamath The tension between Cloudera and Yahoo has dramatically increased this quarter and is past the breaking point. This was exacerbated by the board's sudden insistence that the Hadoop project pick a new PMC chair without discussing the issues with anyone other than the Cloudera employee sitting on the board. Over the last 2.5 years, I've done my best to do what was right for the Hadoop project and it is too bad the community has degenerated to the current state. I sincerely want to get the problems resolved so that we can get back to developing software and enjoying a community that can work together. Critical issues for the Hadoop PMC to address: * Change is difficult and this will involve change. * We need to enact bylaws so that there is a clear understanding of the rules. * The PMC needs to define and document the goals and processes that the project will follow going forward. * Expectations about committers reviewing each other's patches * Expectations about becoming a committer and PMC member. 
* Policies about expecting PMC members and committers to stay involved. People without skin in the game who vote without working on the project are just signing up other people for work. * Poisonous people within the project need to be managed. * Cloudera's abuse of the Hadoop trademark in their product names needs to be halted. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved. Committers: * We redefined the Common committers to be the union of all HDFS and MapReduce committers. Community: * 1062 subscribers on common-dev * 2067 subscribers on common-user HDFS HDFS is a distributed file system that support reliable replicated storage across the cluster using a single name space. Releases: * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved. New committers: None Community: * 26 committers * 280 subscribers on hdfs-dev * 454 subscribers on hdfs-user MAPREDUCE MapReduce is a distribute computation framework for easily writing applications that process large volumes of data. Releases: * 0.21.0 released 24 Aug 2010 with over 1300 jiras resolved. New committers: * Scott Chen Community: * 27 committers * 300 subscribers to mapreduce-dev * 553 subscribers to mapreduce-user ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. Releases: * No releases this quarter Two releases are in progress, near term a 3.3.2 fix release (1 blocker pending), and longer term 3.4.0 feature release. New committers: none Community: * 5 active committers, 2 PMC members * 176 subscribers on zookeeper-dev (up from 160 3 months ago) * 347 subscribers on zookeeper-user (up from 307 in the same timeframe) Three GSOC students completed their projects successfully. This resulted in significant new functionality being added to the project, and some renewed interest from a contributor standpoint. 
Two of the three students have indicated that they are interested in continuing to work in the community. The discussion to move ZooKeeper to TLP status has been reopened and is in progress at the time of this writing. Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits.
Good report; looks like progress is being made here.
Hadoop status report for April 2010 to July 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The 3rd annual Hadoop Summit was held on 29 June in Santa Clara. It sold out at 1,000 attendees 10 days before the conference. The program is available here: http://developer.yahoo.com/events/hadoopsummit2010/agenda.html. The slides and videos of the presentations are available online. The second Hadoop World was announced in NYC on 12 October. The call for presentations is open until 2 August. There are a large number of local Hadoop User Groups around the world. The Bay Area HUG meets monthly and has an audience of roughly 300 people. To increase communication and reduce tensions, the SF Bay Area core contributors (Common, HDFS, and MapReduce) have been having monthly meetings that rotate between venues (Cloudera, Facebook, and Yahoo!). We've discussed wide-ranging topics from process issues to new technical ideas. All of the notes and slides are distributed on the lists to engage developers who can't attend. The Hadoop PMC added the following members: * Sanjay Radia (Yahoo) * Hemanth Yamijala (indep) CHUKWA Chukwa is a distributed log collection framework that aggregates logs from across a cluster into a reasonable number of HDFS files. As part of the continuing Hadoop divestiture of sub-projects, Chukwa's developers were encouraged to move to Apache Incubator. Although Chukwa has already completed many of the Incubator graduation requirements (diversity of committers, code clearance, releases), they have not voted in new contributors or PMC members. Also, none of the Chukwa committers have been on any Apache PMCs and need more guidance than jumping into a TLP would have provided. Some of the work has been done (accepted by Incubator, moved subversion, added to Incubator wiki), but more is left to do (web site, mailing lists). 
They are scheduled to report next month as a Podling. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * 0.21.0 release candidates for Common, HDFS, and MapReduce have been rolled, but there are still some blockers. The hope is to get the blockers fixed and a release out next month. New committers: * Amareshwari Sriramadasu (Yahoo) Community: * 1013 subscribers on common-dev * 1965 subscribers on common-user HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * The 0.21 release is still solidifying. * A new branch called branch-0.20-append was created to support the append feature to HDFS files. HBase needs this feature to run without data loss in a production environment. New committers: * Eli Collins (Cloudera) Community: * 26 committers * 198 code contributors * 247 subscribers on hdfs-dev * 390 subscribers on hdfs-user * Design proposal to support distributed HDFS NameNode. HIVE Hive is a data warehouse written on top of Hadoop. It provides SQL to query and manage data stored in Hadoop in tables and partitions and provides a metastore for metadata about the data stored in Hadoop. Releases: 0.6.0 has been branched and we are preparing to release it. New committers: * John Sichi (Facebook) Community: * 164 contributors (commented, filed bugs or contributed to Hive). This was 115 at the last report time. MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * The 0.21 release is still solidifying. New committers: * Amareshwari Sriramadasu (Yahoo) Community: * 268 subscribers to mapreduce-dev * 465 subscribers to mapreduce-user PIG Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. 
Releases: * Pig 0.7.0 released on 5/13/2010 Community: * 12 committers and 5 emeriti (4 retired in the last month) * 191 developers (compared to 181 in the last report) * 452 users (compared to 402 in the last report) ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. Releases: * release 3.3.1 on 17/May/10 New committers: none Community: * 5 active committers, 2 PMC members * 160 subscribers on zookeeper-dev (up from 147 3 months ago) * 307 subscribers on zookeeper-user (compared to 269 in the same timeframe) Three student proposals to work on ZooKeeper projects were accepted for GSOC. Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits.
Noirin reminded the PMC to let ConCom know when there are events going on in their community, even if the PMC is not the one organizing them.
Hadoop status report for January 2010 to April 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. In response to the board's request that we evaluate the Hadoop sub-projects with respect to ensuring adequate supervision, here is the breakdown by sub-project: * Avro and HBase have decided to each become a TLP. * Pig and ZooKeeper have discussed the issue and would prefer to remain a sub-project for now. The three primary concerns are the work of splitting themselves out, a lack of organizational diversity in the committers, and loss of visibility if the Hadoop TLP site doesn't link to them. The last concern can be addressed by ensuring that the TLP *does* continue to link to their project pages. These projects are adequately monitored and have good representation on the PMC, but the PMC is still discussing what their recommendation to the board is. * Hive hasn't discussed the issue, which needs to be addressed. I expect that it is in the same group as Pig and ZooKeeper. * Chukwa still struggles to broaden its community from the original developers and to reach consensus on its goals. It has three committers, but no representation on the PMC, which makes it difficult to make releases and ensure adequate supervision. The PMC has not yet discussed what to do with Chukwa. * Common, HDFS, and MapReduce are still very tightly bound. Many patches cross 2 or 3 of the 3 sub-projects and each of the trunks only builds against the other project's trunks. They are branched and released in unison. They will likely remain together for a long time. We started the process of discussing the bylaws that Hadoop should adopt, but we need to drive this through to completion. I would suggest that in the future, projects which are becoming TLP establish bylaws as part of being created. 
Without explicit bylaws, there are many votes for which it isn't clear what the required level of consensus is. The Hadoop PMC added the following members: * Namit Jain AVRO Avro is an inter-language serialization and RPC library that supports versioning of schemas and protocols for both compiled and interpreted languages. A resolution to promote Avro to a top-level project is currently before the board. If the board passes this resolution, this will be Avro's last report as a Hadoop subproject. Avro made three releases this quarter, 1.3.0, 1.3.1 and 1.3.2. We expect to make a 1.4.0 release in the next quarter. Development has been active in all versions of Avro: C, C++, Java, Python, and Ruby. Three new, legally-independent committers were added this quarter: * Jeff Hodges * Scott Carey * Bruce Mitchener CHUKWA Chukwa is a distributed log collection framework that aggregates logs from across a cluster into a reasonable number of HDFS files. Release: * Testing Chukwa 0.4.0 RC1 to RC3 Current state of community * 4 active contributors * 15 subscribers on chukwa-dev * 17 subscribers on chukwa-user The upcoming 0.4 release will include a new real time Hadoop Activity monitor for small to mid scale Chukwa deployments and JMSAdaptor for pulling data from JMX. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * 0.20.2 (including HDFS and MapReduce) was released on 2/16/2010 with 29 patches * We plan to rebase the 0.21 branch to trunk this month New committers: * none Community: * 963 subscribers on common-dev * 1924 subscribers on common-user The development has continued to be active, including Kerberos-based and token-based authentication to the RPC. The previous 0.21 branch failed to be released and we expect to rebase the branch to the current trunk in the next few weeks. The challenge is learning how to adapt our project and processes to the growing importance of Hadoop. 
We are moving toward a release manager-based approach similar to the HTTPD one, in the hopes that will lead to stable releases without stagnating on the 0.20 branch forever. We are also requiring more thought out, documented, and tested changes. Changes that are backwards incompatible or potentially destabilizing must go through a lot of scrutiny. This is all part of the process of moving from a research prototype to a critical piece of infrastructure in our respective organizations. HBASE HBase is a distributed column-oriented database built on top of Hadoop Common and HDFS. A resolution to promote HBase to a top-level project is currently before the board. If the board passes this resolution, this will be HBase's last report as a Hadoop subproject. Releases: * 0.20.3 on 2010/01/25 -- 74 fixes. * There is currently a release candidate out for 0.20.4 New Committers: * None Community * HBase User Group 9 met at Mozilla, 03/10/2010 * HBase User Group 10 and Hackathon happening 04/19/2010 HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * We plan to rebase the 0.21 branch to trunk this month New committers: * None Community: * 11 new code contributors * 25 committers * 194 code contributors * 205 subscribers to hdfs-dev * 317 subscribers to hdfs-user Work is in progress to incorporate security features into HDFS. HIVE Hive is a data warehouse written on top of Hadoop. It provides SQL to query and manage data stored in Hadoop in tables and partitions and provides a metastore for metadata about the data stored in Hadoop. Releases: release 0.5.0 on 2010/02/23. This release has 106 bug fixes, 39 new features and 26 improvements. New committers: * John Sichi Community: * Hive User Group meetup at Facebook, 03/18/2010 attended by over 70 people. * A total of 138 people have commented, filed bugs or contributed on the Hive JIRA so far. 
This number was at 115 at the time of the last report. MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * We plan to rebase the 0.21 branch to trunk this month New committers: * None Community: * 178 subscribers to mapreduce-dev * 280 subscribers to mapreduce-user Features: Security features are being implemented that include both the Kerberos-based and token-based authentication and authorization so that users can define who is allowed to do what on their job. PIG Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Releases: * Pig 0.6.0 released on 3/1/2010 New committers: * Dmitriy Ryaboy * Thejas Nair Community: * 15 committers * 181 developers (compared to 171 in the last report) * 402 users (compared to 225 in the last report) We've put out 4 GSOC ideas and received 2 student proposals. The Pig community reviewed the board's request to promote some of the subprojects to TLP. The Pig community consensus is to stay as a Hadoop subproject for the time being. Detailed discussion can be found at http://www.mail-archive.com/pig-dev@hadoop.apache.org/msg08589.html. ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. Releases: * release 3.3.0 on 25/March/10 New committers: none Community: * 5 active committers, 2 PMC members * 147 subscribers on zookeeper-dev (up from 114 3 months ago) * 269 subscribers on zookeeper-user (up from 225 in the same timeframe) We've put out a number of GSOC ideas and seen 6 student proposals. Mentors are reviewing and we hope to gain a number of projects for GSOC 2010. Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits. 
Work is underway to certify and support running the ZooKeeper service in production on Windows servers. The ZooKeeper community reviewed the board's request to examine subprojects with an eye to graduation to TLP status. Please find the results of the ZooKeeper as TLP discussion here: http://bit.ly/c4fuZT There was consensus amongst the development team that we will stay as a subproject of Hadoop for the time being. Full details of the discussion can be found in the thread provided.
Wide concern that there is a disconnect between how Hadoop is run and the expectation from the board on how Apache projects are run; Jim to join the mailing list.
Hadoop status report for October 2009 to January 2010 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Hadoop World China was held on 2009/11/15 and was well attended. It had representation from Cloudera, Facebook, Google, and Yahoo. There was also a smaller Hadoop Conference in Japan on 2009/11/13. The Hadoop PMC added the following members: * Daniel Dai * Pradeep Kamath * Zheng Shao * Tsz Wo (Nicholas) Sze AVRO Avro is an inter-language serialization and RPC library that supports versioning of schemas and protocols for both compiled and interpreted languages. Development has been brisk this quarter. We're anticipating a 1.3 release in late January. New committers: * Philip Zeyliger * Jeff Hammerbacher Community: * 6 active committers * 94 subscribers on avro-dev * 114 subscribers on avro-user CHUKWA Chukwa is a distributed log collection framework that aggregates logs from across a cluster into a reasonable number of HDFS files. Release: * release 0.3.0 on 2009/11/09 with 40 issues * branch 0.4 planned for 2010/02 Current state of community * 4 active contributors * 15 subscribers on chukwa-dev * 17 subscribers on chukwa-user The upcoming 0.4 release will include a new real time Hadoop Activity monitor for small to mid scale Chukwa deployments. COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * branch 0.21 was made on 2009/09/18 New committers: * Boris Shkolnik * Jakob Homan Community: * 904 subscribers on common-dev * 1838 subscribers on common-user The development has continued to be active, but work on the blockers on the upcoming 0.21 release has been moving very slowly. Even after splitting HDFS and MapReduce out of Common, a large number of patches cross the sub-project boundaries. HBASE HBase is a distributed column-oriented database built on top of Hadoop Common and HDFS. Releases: * 0.20.2 on 2009/11/19 -- 40 fixes. 
* There is currently a release candidate out for 0.20.3 New committers: * Lars George. HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * No new releases this quarter. A considerable effort is being made to make the earlier release 0.21 stable. New committers: * Boris Shkolnik * Jakob Homan Community: * 25 committers * 183 code contributors * 245 subscribers on hdfs-user * 163 subscribers on hdfs-dev Features: The HDFS Append feature is now part of the latest HDFS 0.21 release. A design for implementing security in HDFS has been published in the Jira forum and is gathering feedback from developers. HIVE Hive is a data warehouse written on top of Hadoop. It provides SQL to query and manage data stored in Hadoop in tables and partitions. Releases: release 0.4.1 on 2009/12/17 with 7 issues New committers: none this quarter For the upcoming 0.5 release, there are 153 resolved issues and 3 open ones. MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * branch 0.21 was made on 2009/09/18 New committers: none this quarter Community: * 178 subscribers to mapreduce-dev * 280 subscribers to mapreduce-user Features: MapReduce 0.21 continues to stabilize relatively slowly. Security and changes to support Avro types through the shuffle continue to go in. PIG Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Releases: * release 0.5.0 on 29/Oct/09 with 48 issues * release 0.6.0 branched and no blockers; to be released shortly New Committers: * Ashutosh Chauhan * Dmitriy Ryaboy * Richard Ding * Jeff Zhang Community: * 354 subscribers to pig-user * 171 subscribers to pig-dev ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. 
On 12/4/09 we gained a new committer, Henry Robinson of Cloudera! Releases: * release 3.2.1 on 9/Sept/09 * release 3.1.2 on 14/Dec/09 * release 3.2.2 on 14/Dec/09 Community: * 114 subscribers on zookeeper-dev (up from 99 3 months ago) * 225 subscribers on zookeeper-user (up from 175 in the same timeframe) Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits.
Justin suggested that the board ask the Hadoop project to answer the same questions regarding spinning off subprojects that were asked of Lucene in the previous month. Doug indicated that this was in progress.
Hadoop status report for July to October 2009 Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. Hadoop World NYC on 2009/10/02 was well received by roughly 500 attendees. It was organized by Cloudera and sponsored by Yahoo, Facebook, Amazon WebServices, IBM, Rackspace, Softlayer, eHarmony, SuperMicro, Intel, Impetus, Booz Allen Hamilton, and Vertica. The format was similar to Hadoop Summit with a general session with six 20 minute talks in the morning and three tracks each with ten 30 minute talks in the afternoon. Hadoop World China will be held next month. AVRO Avro is an inter-language serialization and RPC library that supports versioning of schemas and protocols for both compiled and interpreted languages. Releases: * release 1.2.0 on 2009-10-14 * release 1.1.0 on 2009-09-08 * release 1.0.0 on 2009-07-09 New committers: * Matt Massie * Thiruvalluvan M. G. CHUKWA Chukwa is a distributed log collection framework that aggregates logs from across a cluster into a reasonable number of HDFS files. New committers: * Jerome Boulon The SALSA and Mochi suite of Hadoop log analysis and visualization tools, built at Carnegie Mellon, has been progressively phased in and integrated with the Chukwa log collection and processing infrastructure. The basic analysis and visualization components are available, and further work is being done to improve the user-friendliness of operating these added tools, and to improve the automated manageability for analysis and visualization. This can also serve as a roadmap for other analysis tools to be integrated with Chukwa. Development has been proceeding steadily. Chukwa is substantially more reliable, flexible and robust than it was a year ago, or even four months ago. The system is in production use at UC Berkeley, and a number of user suggestions have been incorporated. We intend to release 0.3 in the coming weeks. 
COMMON Common is the shared libraries for HDFS and MapReduce. Releases: * release 0.19.2 on 2009/06/30 with 40 issues * release 0.20.1 on 2009/09/01 with 87 issues * branch 0.21 was made on 2009/09/18 New committers: * Konstantin Boudnik for QA * Suresh Srinivas Community: * 784 subscribers on common-dev * 1738 subscribers on common-user The upcoming 0.21 release will include the new FileContext API, which will replace the FileSystem API, and the visibility and audience annotations that let us mark the intended public-ness of various classes. HBASE HBase is a distributed column-oriented database built on top of Hadoop Common and HDFS. HBase had a User Group meeting on August 7th and a Hackathon over the weekend of August 7-9. Both events were open to the public and hosted by StumbleUpon. Releases: * release 0.20.0 on 09/September/2009 - 465 issues addressed by this release * release 0.20.1 on 10/12/2009 - 60 issues addressed by this release Current state of community * 23 active contributors * 459 subscribers to hbase-user mailing list * 185 subscribers to hbase-dev mailing list HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. Releases: * release 0.19.2 on 2009/06/30 as part of common 0.19.2 * release 0.20.1 on 2009/09/01 as part of common 0.20.1 * branch 0.21 was made on 2009/09/18 New committers: * Konstantin Boudnik for QA * Suresh Srinivas Community: * 112 subscribers to hdfs-dev * 154 subscribers to hdfs-user HDFS 0.21 was feature frozen and branched. The biggest features are the much-requested abilities to append and sync to written files. There are 8 remaining blocker issues that need to be resolved. A developer meet focused entirely on HDFS testing was held at the Yahoo Sunnyvale campus. It was well represented by about 15 contributors from Yahoo, Cloudera, Facebook, etc. HIVE Hive is a data warehouse written on top of Hadoop. 
It provides SQL to query and manage data stored in Hadoop in tables and partitions. Releases: release 0.4.0 on 2009/10/14 with 209 issues New committers: * Edward Capriolo * He Yongqiang Hive 0.4.0 had 46 new features, 115 bug fixes, 6 optimizations, 35 improvements and 2 incompatible changes. At present there are 617 open issues with none of them as a blocker for 0.5.0. A total of 619 issues have been resolved so far. Community: we continue to see new contributors in the project. Since the last report the number of contributors in the project has grown from 21 to 48. Of these, 35 contributors are external to Facebook. A total of 94 people have commented, filed bugs or contributed on the Hive JIRA so far. This number was at 49 at the time of the last report. MAPREDUCE MapReduce is a distributed computation framework for easily writing applications that process large volumes of data. Releases: * release 0.19.2 on 2009/06/30 as part of common 0.19.2 * release 0.20.1 on 2009/09/01 as part of common 0.20.1 * branch 0.21 was made on 2009/09/18 New committers: * Konstantin Boudnik for QA Community: * 121 subscribers to mapreduce-dev * 172 subscribers to mapreduce-user MapReduce 0.21 will have substantially improved Capacity and FairShare schedulers that let administrators share clusters more effectively. It will also add the ability to run tasks as the submitting user and a standardized job history format written in Avro's JSON format. PIG Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Releases: * release 0.4.0 on 29/Sep/09 with 48 issues Community: * 155 subscribers on pig-dev * 269 subscribers on pig-user (I could not update this number because my request failed with mailbox full error) ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. 
Releases: * release 3.2.1 on 2009/09/09 Community: * 99 subscribers on zookeeper-dev * 175 subscribers on zookeeper-user Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits.
Hadoop is a trademark of the ASF, and Hadoop World is a conference and therefore needs to be approved by ConCom. Originally there was some confusion, but ConCom approves of this usage, so there is no issue.
ConCom will work to clarify policies, such as whether such conferences (in the future, not retroactively) need to be named Apache Hadoop World or the like.
Hadoop status report for April to July 2009. Hadoop is a set of related tools and frameworks for creating and managing distributed applications running on clusters of commodity computers. The Hadoop Summit '09 was held in Santa Clara on June 10 and was attended by more than 750 people. Registration for the event was $100. The morning was a general track and the afternoon had 3 tracks: developers, administration, and applications. Cloudera and Yahoo also offered two free Hadoop training sessions (basics and advanced) the following day that were filled very quickly. Two books were published about Hadoop: * Hadoop: The Definitive Guide by Tom White http://www.hadoopbook.com/ * Pro Hadoop by Jason Venner http://developers.apress.com/book/view/9781430219422 AVRO Avro is an inter-language serialization and RPC library that supports versioning of schemas and protocols for both compiled and interpreted languages. Releases: * coming soon release 1.0.0 with 52 jiras addressed from 12 contributors CHUKWA Chukwa is a distributed log collection framework that aggregates logs from across a cluster into a reasonable number of HDFS files. Releases: * release 0.1.2 on 14/May/2009 with 132 issues * currently voting on release 0.2.0 with 56 issues COMMON (was previously Core) Common is the shared libraries for HDFS and map/reduce. This quarter we split the Core subproject into Common, HDFS, and Map/Reduce. The old branches and releases are in Common, but for 0.21 the three subprojects will release independently. Releases: * release 0.20.0 on 22/Apr/09 with 114 issues * currently voting on 0.19.2 with 42 issues Community: * 784 subscribers on common-dev * 1703 subscribers on common-user HBASE HBase is a distributed column-oriented database, built on top of Hadoop Common and HDFS. 
Releases: * release 0.19.2 on 09/May/09 - 17 issues addressed by this release * release 0.19.3 on 27/May/09 - 15 issues addressed by this release * release 0.20.0 (alpha) on 17/Jun/09 * coming soon release 0.20.9 with 338 out of 354 issues addressed New Committers: * Andrew Purtell (previously missed from the board report) * Nitay Joffe * Ryan Rawson * Jonathan Gray Current state of community * 23 active contributors (159 contributors since project inception) * 459 subscribers to hbase-user mailing list * 185 subscribers to hbase-dev mailing list HDFS HDFS is a distributed file system that supports reliable replicated storage across the cluster using a single name space. A developer meet for Hadoop was held at the Yahoo Sunnyvale campus to discuss requirements for HDFS Appends. It was well represented by about 15 contributors from Yahoo, Microsoft, Facebook, etc. Another developer meet was held at the Cloudera campus in Burlingame. This meet discussed, among others, a few short-term HDFS issues that need attention. Community: * 50 subscribers on hdfs-dev * 51 subscribers on hdfs-user HIVE Hive is a data warehouse written on top of Hadoop Core. It provides SQL to query and manage data stored in Hadoop in tables and partitions. Releases: * release 0.3.0 on 29/Apr/09 with 52 issues * coming soon release 0.4.0 in the next month with 130 issues At present there are 248 open issues filed against Hive. Committers: * Yongqiang He Community: * 30 contributors (up from 21 in the last report) * 67 people have commented on Hive Jiras MAP/REDUCE Map/reduce is a distributed computation framework for easily writing applications that process large volumes of data. Community: * 51 subscribers to mapreduce-dev * 56 subscribers to mapreduce-user PIG Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. 
Releases: * release 0.3.0 on 25/Jun/09 with 33 issues Community: * 144 subscribers on pig-dev * 269 subscribers on pig-user ZOOKEEPER ZooKeeper is a reliable coordination service for distributed applications. Releases: * release 3.2.0 on 8/Jul/09. A number of major new features are included, in particular: extending the client libraries to include common ZK use cases (recipes), namespace support, added python binding support, REST based API to the server, Perl binding support, numerous optimizations and bug fixes (122 JIRAs in this release). Community: * 83 subscribers on zookeeper-dev * 141 subscribers on zookeeper-user Feedback and community involvement have been slowly increasing; we frequently meet with users during Hadoop meetups and hold site visits.
Regarding Hadoop Common: Greg wondered if Common could move over to commons.apache.org.
Doug suggested that that was premature, and that much of the code may not be useful to non-Hadoop applications.
Brett agreed that that would not make sense.
Regarding developer meet up: Justin: only committers were invited; if somebody else had shown up, would they have been allowed?
Doug: invitations were sent directly to committers.
Jim: would others have been allowed?
Doug: others did hear about it and did attend. I would appreciate clear guidelines.
Roy: committers only is normal for a dev meeting
Jim: I remember an issue with a "closed" meeting with Geronimo, and will dig up those minutes. You want to avoid any impression that it is by invitation only.
Roy: there is no problem with contributors only, the problem is if you only invite a subset of the contributors
Justin: the issue is that contributors may be a superset of committers
Doug: I would be happy with a rule that it should be discussed on the dev list, and be invitation only, with all committers being included
Jim: that makes sense
Roy: suggests updating /dev with this information
Jim: volunteers?
Roy: will do
Hadoop is a set of tools for creating and managing distributed applications, especially those with large data sets. Hadoop was the focus of a nice article in the New York Times (http://tinyurl.com/coafzr) on 17 March 2009. Unfortunately, the article failed to mention that Hadoop is an Apache project. The PMC added 8 new members: Raghu Angadi, Devaraj Das, Chris Douglas, Alan Gates, Mahadev Konar, Hairong Kuang, Konstantin Shvachko, and Ashish Thusoo. We've also voted to create two new subprojects: Chukwa and Avro. Chukwa is a distributed log aggregation and cluster monitoring system that was originally in Core's contrib directory. The initial committers for Chukwa are Ariel Rabkin and Eric Yang. Avro is a serialization and RPC library with a focus on supporting versioned persistent data and supporting scripting languages. The initial committers for Avro are Doug Cutting and Sharad Agarwal. Hadoop was well represented at ApacheCon EU, with a track of talks about Core, HBase, and Pig. A Hadoop Summit is being organized for June 10th in Santa Clara. CORE, HDFS, and MAP/REDUCE Core is the fundamental set of utilities, including RPC, serialization, and compression that the rest of Hadoop depends on. HDFS provides a distributed file system. Map/Reduce provides a framework for distributed applications that process large data sets. Amazon has started explicitly marketing and supporting Hadoop as a service on EC2 at a much lower cost than a standard EC2 virtual machine. We are still in the process of factoring Map/Reduce and HDFS out of Core. The code is separated and all that is left to be split are the unit test cases and their dependencies. Releases: 0.20.0 is nearing release, with 280 jiras addressed. 0.18.3 was released on 27 Jan 2009 with 51 jiras addressed. The current plan is to try and release Core, HDFS, and Map/Reduce 1.0 this year. Community: Core has added Sharad Agarwal, Giri Kesavan, Ariel Rabkin, Sanjay Radia, and Eric Yang as committers. 
The community is active and growing.

HBASE
HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases: 0.18.1 was released on 27 October 2008. 14 issues were addressed. 0.19.0 was released on 21 January 2009. 184 issues were addressed. 0.19.1 was released on 19 March 2009. 43 issues were addressed. Work is underway on release 0.20.0 with 97 of 174 issues resolved. It is expected that many of the open issues will be pushed to a subsequent release.

Meet-ups:
January 14, 2009; March 3, 2009 - HBase User Group meetings in San Francisco
January 30, 2009 - HBase Hackathon in Los Angeles

Community: There are no new committers since the last report. There are about 7 active contributors (of which 3 are committers). There are also a number of people who come by to "kick the tires" but then leave because of possible data loss due to the lack of a patch for HADOOP-4379.

HIVE
Hive is a data warehouse written on top of Hadoop Core. It provides a SQL dialect to query and manage data stored in Hadoop in tables and partitions.

Releases: Our 0.2.0 branch, which was to be released in Feb 2009, was not released and was not put up for a vote, as there were some significant fixes which the community felt should be checked in before it could be put to a vote. As this branch was not fully soak tested on Facebook production load, we decided to target the 0.3.0 branch for release. 0.3.0 was branched in Mar 2009. All the blockers in that branch have been fixed. We are going to put a release candidate from that branch up for a vote by Apr 15, 2009. At present there are 177 open issues, none of them a blocker for 0.3.0. 111 issues have been resolved since the last report in January.

Community: Hive continues to see growth in the number and diversity of contributors. Since the last report the number of contributors in the project has grown from 16 to 21. We added Prasad Chakka, Raghu Murthy, Johan Oskarsson, and Joydeep Sen Sarma as committers.

PIG
Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. A vote was called on Pig 0.2.0 on 3/27/09. This release is a major redesign of the system, including the addition of a type system, significant (2-10x) performance improvements, the addition of Limit and ORDER BY desc, and Grunt shell improvements.

ZOOKEEPER
ZooKeeper is a reliable coordination service for distributed applications.

Releases: 3.1.0 was released on 2009/02/13 with 70 jiras fixed. 3.1.1 was released on 2009/03/27 with 11 jiras fixed. Our next release, 3.2.0, is slated for 5/26/2009. A number of major new features will be included, in particular: extending the client libraries to include common ZK use cases (recipes), adding a REST-based API to the server, Perl binding support, and numerous optimizations.

Community: Feedback and community involvement have been slowly increasing. We frequently meet with users during Hadoop meetups and hold site visits.
There was a discussion around umbrella projects. There's some general concern about splitting up tightly coupled projects, in particular a fear of losing cross-pollination. J. Aaron will post to board@/members@.
Bertrand takes the action item to communicate the board view on umbrella projects; recommend thinking about spinning off self-contained projects to TLP.
Hadoop status report for September 2008 to January 2009.

Hadoop is a set of tools for creating and managing distributed applications.

There were various Hadoop user meetings:
* Beijing
* Berlin
* Los Angeles (HBase)
* New Orleans (as part of ApacheCon US)
* New York
* San Diego
* San Francisco (HBase)
* Santa Clara

CORE, HDFS, and MAP/REDUCE
Core is the fundamental set of utilities, including RPC, serialization, and compression, that the rest of Hadoop depends on. HDFS provides a distributed file system. Map/Reduce provides a framework for distributed applications that process large data sets.

The pace of development in Core is very rapid and the community is active. Some of the Chinese developers have translated the documentation for Core into Chinese and submitted it as a patch. The work to factor out Hive is complete, and the factoring for HDFS and Map/Reduce is nearly complete; they should become separate subprojects in the next 3 months. Discussions, plans, and work have continued toward a 1.0 release of Core, HDFS, and Map/Reduce. The hope is to achieve the desired levels of compatibility and stability and release 1.0 this year.

Releases: 0.20.0 is feature-frozen, but unreleased with 184 jiras fixed. 0.19.0 was released on 2008/11/18 with 360 jiras fixed. 0.18.2 was released on 2008/11/3 with 25 jiras fixed.

HBASE
HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases: 0.18.1 was released on 2008/10/27 with 14 jiras fixed. 10 of 11 issues have been addressed for 0.18.2, but it is unclear if 0.18.2 will be released given that 0.19.0 will be released soon. At this point, 176 of 176 issues have been addressed for hbase-0.19.0. Testing is in progress at this moment. If no new blocker issues are identified, a release candidate will be published in the next few days.

HIVE
Hive is a data warehouse written on top of Hadoop. It provides a SQL dialect to query and manage data stored in Hadoop in tables and partitions.
Hive was split out of Core on 11/12/2008. Most of the migration-related work from Hadoop contrib to Hadoop subproject has been completed. Enabling Hudson builds for Hive is still pending. Continuous builds on committed changes using CABIE are already enabled.

Releases: We are planning to make our first release, which is named 0.2.0, sometime in Feb 2009. At present we have 130 outstanding issues with 23 of those identified as blockers for a release. 103 issues have so far been resolved since Hive was open sourced.

Hive has added Ashish Thusoo and Namit Jain as committers. The number of contributors to the project has grown from 7 to 16 since Hive became a Hadoop subproject. 6 of these are contributors external to Facebook.

PIG
Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Pig graduated from the Apache Incubator and became a Hadoop subproject on 10/17/08.

Releases: 0.1.1 was released on 2008/12/8 with 2 jiras. Release 0.1.1 was primarily focused on integrating with Hadoop 0.18.

Pig welcomed Pradeep Kamath and Santhosh Srinivasan as new committers.

ZOOKEEPER
Zookeeper is a reliable coordination service for distributed applications.

Releases: 3.0.1 was released on 2008/11/24 with 16 jiras fixed. 3.0.0 was released on 2008/10/27 with 108 jiras fixed. Our next release, 3.1.0, is slated for 1/19/2009. A number of major new features will be included, in particular improved management (JMX) support and quota (i.e. filesystem quota) support.

Feedback and community involvement have been slowly increasing.
Hadoop is a set of tools for creating and managing distributed applications.

The PMC felt that the Core project had grown difficult to manage as a single subproject. With the core-dev email list topping 3600 messages last month, it is difficult to keep on top of the entire project. We therefore have voted to split Core into 4 pieces: Core, which is the common infrastructure; HDFS, which is the distributed file system; Map/Reduce, which is the distributed computation framework; and Hive, which is a higher-level query processor built on Map/Reduce. After release 0.19 has stabilized we will work on splitting up the code bases. Additionally, we have started a vote on whether to accept Pig as a subproject when it graduates from the incubator.

We have added Arun Murthy to the PMC.

Hadoop will be well represented at ApacheCon US next month. There will be 3 Hadoop talks in the main series and an assortment of related talks at the Hadoop Camp. There will be presentations about the Core, Hive, Pig, and Zookeeper subprojects at Hadoop Camp.

CORE
Core is a framework for building distributed applications, which includes a distributed file system and map/reduce.

Releases: 0.19.0 is feature-frozen, but unreleased, with 270 jiras. 0.18.2 is unreleased, with 3 jiras. 0.18.1 was released on 2008/09/17 with 6 jiras. 0.18.0 was released on 2008/08/19 with 266 jiras. 0.17.3 is unreleased, with 4 jiras. 0.17.2 was released on 2008/08/11, with 12 jiras.

Development has been ever increasing, with 0.19.0 having the largest number of patches in a release. Release 0.19 includes Hive as a contrib module. We have started discussions on the email lists about when we should release 1.0 and what level of forwards and backwards compatibility we should guarantee.

HBASE
HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases: 0.2.0 was released on 2008/08/08. 293 issues were addressed by this release. 0.2.1 was released on 2008/09/13.
44 issues were addressed by this release. 0.18.0 was released on 2008/09/21. 58 issues were addressed by this release. The hbase-0.2.x releases run on hadoop-0.17.x. With hbase-0.18.0, releases have been renumbered to reflect the version of hadoop that the hbase release runs on. Work has started on release 0.18.1. 5 of 9 issues have been addressed. Work has started on release 0.19.0. 25 of 58 issues have been addressed.

On Wednesday, October 8, Microsoft agreed to let two of their engineers, who are committers and PMC members, resume their contributions to HBase. The contributions had been blocked when Microsoft acquired Powerset last quarter.

Andrew Purtell was added as an HBase committer.

ZOOKEEPER
Zookeeper is a service for coordinating processes of distributed applications. Migration from SourceForge to Apache of source, documentation, wiki, issue tracking, mailing lists, etc. is complete. We are planning to make our first Apache release, which is named 3.0, on Oct 22nd with over 85 issues addressed.
We discussed growth of the project... no significant concerns.
Jim to check on Zookeeper's filling out of the Incubator's IP clearance.
Hadoop is a set of tools for distributed applications.

The PMC voted to add a new Hadoop subproject, named Zookeeper, which is a distributed coordination service. Zookeeper was developed by Yahoo and was granted to Apache. Zookeeper should form a great basis for Map/Reduce and HDFS high availability. The original committers are Patrick Hunt, Flavio Junqueira, Mahadev Konar, Andrew Kornev, and Ben Reed.

There are now monthly Hadoop user get-togethers in northern California (http://upcoming.yahoo.com/event/869166) and there is one scheduled for August in London (http://upcoming.yahoo.com/event/506444).

CORE
Core is a framework for building distributed applications, which includes a distributed file system and map/reduce.

Releases: 0.18.0 is feature frozen but unreleased, currently with 254 jiras. 0.17.2 is unreleased, currently with 4 jiras. 0.17.1 was released 23 June 2008 fixing 10 jiras. 0.17.0 was released 18 May 2008 fixing 200 jiras. 0.16.4 was released 5 May 2008 fixing 4 jiras. 0.16.3 was released 16 April 2008 fixing 7 jiras.

Core won the annual terabyte sort benchmark (http://tinyurl.com/4o8bns), which is the first time that either a Java or an open source program won the competition.

Core has added 4 committers: Johan Oskarsson, Lohit Vijaya Renu, Zheng Shao, and Tsz Wo Sze. We've had very active development and an active user base.

HBASE
HBase is a distributed column-oriented database, built on top of Hadoop Core.

Releases: 0.1.1 was released on 27 March 2008. 12 issues were addressed by this release. 0.1.2 was released on 13 May 2008. 27 issues were addressed by this release. 0.1.3 was released on 27 June 2008. 16 issues were addressed by this release. The hbase-0.1.x releases run on hadoop-0.16.x. Work continues on release 0.2.0 which will run on hadoop-0.17.x. 231 of 239 issues have been resolved. We are targeting the end of July for a release candidate.

On Tuesday, July 1, Microsoft and Powerset signed a deal for Microsoft to acquire Powerset.
Two of the HBase committers (who are also members of the Hadoop PMC) are employed by Powerset and may not be able to continue work on HBase after the deal closes. They and their manager are working with Microsoft to determine what will happen, but may not know for several weeks yet.

ZOOKEEPER
Zookeeper is a service for coordinating processes of distributed applications. Migration from SourceForge to Apache is in progress. Yahoo's code grant was filed with the ASF, the SourceForge SVN snapshot has been loaded into ASF SVN, and Hudson is now running daily builds on the codebase. The SourceForge tracker has been fully migrated to Jira and the developers are now using ASF Jira and mailing lists. Migration of documentation and website is in progress and expected to be completed in the next couple of weeks. A new release of ZooKeeper is being worked on in parallel with the move; completing this will be a major focus subsequent to the ASF migration.

Ben Reed (Yahoo) and Ted Dunning (Veoh) presented ZooKeeper at the latest Hadoop social, and the reaction was extremely positive. Many attendees were already using ZK, and almost all were at least familiar with the project.
It was noted that Zookeeper lacks an ip-clearance. Owen agreed to follow up.
Owen reported that Amazon has agreed to donate a few hundred dollars of computer time on EC2 to individual Hadoop developers for testing and benchmarking.
TLP
The Hadoop Summit (http://upcoming.yahoo.com/event/436226/) occurred on March 25 and had more than 300 people attending. It was well received by the community.

CORE
Development has been active this month with 0.16.2 being released on 2 April 2008. We will likely release 0.16.3 with 7 jiras this week. Release 0.17, which has 160 jiras, has been branched and will be released when it is stabilized. Hadoop Core was well represented at ApacheCon EU with a BOF and 3 talks by Owen O'Malley, Tom White, and Allen Wittenauer.

HBASE
The first version of HBase as a subproject, version 0.1.0, was released on March 28th. We are now working on patches for version 0.1.1, which will be released after hadoop-0.16.2. 6 of 8 identified issues have been resolved. With the focus on releasing 0.1.0, progress slowed a bit for release 0.2.0. Since last month, an additional 20 issues have been resolved and an additional 29 have been identified for a total of 74 out of 102 issues resolved.
TLP
We have filed the appropriate paperwork for using cryptography within Hadoop. The first use will be HADOOP-2239, which will likely be committed this week.

Yahoo and the Computing Community Consortium are sponsoring a Hadoop Summit (http://upcoming.yahoo.com/event/436226/) on March 25 to bring together users and developers. 215 people have signed up to attend.

CORE
We added two committers this month: Mukund Madhugiri for QA and release engineering, and Hemanth Yamjiala for contrib. Development has been active this month and we have released 0.16.1 this month, which fixed 40 jiras. Release 0.17 is scheduled to feature freeze in the first week of April and currently includes 70 committed jiras.

HBASE
Development has been focused on making our first subproject release, 0.1.0. The 0.1.0 release is feature frozen and runs against Hadoop Core 0.16.x. 20 of the 25 identified blocker issues have been resolved.

The priorities for the 0.2 release are robustness and scalability. The proposal is on the HBase Wiki at: http://wiki.apache.org/hadoop/Hbase/Plan-0.2. HBase 0.2 is based on Hadoop Core trunk and is making progress as well with 54 of 73 issues resolved.

An hbase contributor, Dennis Kubes, bought the domain hbase.org for the project, which points to hadoop.apache.org/hbase.

A second HBase Users Group meeting was held at Powerset on March 4, with approximately 30 people attending. The meeting was informal, mostly getting the user community to discuss problems they had encountered using HBase and to gather issues blocking the 0.1.0 release.
Greg to work with Owen to arrange for the transfer of the hbase.org domain to the ASF.
TLP
The top-level project completed the split of Hadoop out of Lucene and into a TLP. The subproject that was Hadoop is now called Hadoop Core. We have also moved HBase into a sub-project from being in Hadoop Core's contrib directory. Although Core and HBase have many ties, the contributor lists and code bases are largely disjoint between them, and the split will reduce the heavy traffic on both development lists.

CORE
Hadoop Core has released 0.16.0, 0.15.3, and 0.15.2. As we move toward more stability, we've moved our feature freezes to every 3 months (beginning of Jan, Apr, July, and Oct). Development has been very active, including adding user permissions to HDFS. (Fixed Jira counts: 23 unreleased, 180 for 0.16.0, 4 for 0.15.3, and 15 for 0.15.2.)

HBASE
HBase, which is a distributed storage system for structured data, has become a subproject of Hadoop. We have added Bryan Duxbury as a committer. Development has been very active. (Fixed Jira counts: 7 unreleased, 142 for 0.16.0.)
Approved by General Consent.
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to a distributed computing platform, including a distributed filesystem and an implementation of the map/reduce distributed computing metaphor, for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Hadoop Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Hadoop Project be and hereby is responsible for the creation and maintenance of software related to a distributed computing platform, including a distributed filesystem and an implementation of the map/reduce distributed computing metaphor; and be it further

RESOLVED, that the office of "Vice President, Apache Hadoop" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hadoop Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hadoop Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hadoop Project:
* Andrzej Bialecki <ab@apache.org>
* Doug Cutting <cutting@apache.org>
* Nigel Daley <ndaley@apache.org>
* Jim Kellerman <jimk@apache.org>
* Owen O'Malley <omalley@apache.org>
* Enis Soztutar <enis@apache.org>
* Michael Stack <stack@apache.org>
* Christophe Taton <taton@apache.org>
* Thomas E. White <tomwhite@apache.org>

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Owen O'Malley be appointed to the office of Vice President, Apache Hadoop, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache Hadoop Project be and hereby is tasked with the migration and rationalization of the Apache Lucene Hadoop sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Lucene Hadoop sub-project encumbered upon the Apache Lucene Project are hereafter discharged.

Special order 7C, Establish the Apache Hadoop Project, was approved by Unanimous Vote.