
This was extracted (@ 2021-01-20 20:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (7 months ago)
There are currently 12 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 4:3.

Community changes, past quarter:
- No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19.
- Jingsong Lee was added as committer on 2020-10-09
- Zheng Hu was added as committer on 2020-10-09

## Project Activity:
Recent releases:
* 0.10.0 was released on 2020-11-11.

The 0.10.0 release included:
* A new Flink module supporting DataStreams and SQL writes and (batch) reads
* A new Hive module supporting reads
* Row-level delete implementation, part of the v2 spec, for engine integration

More recently, the community has added:
* Stored procedures for Spark that perform table maintenance from SQL
* New catalog implementations for Nessie and Glue
* Writers to support Flink CDC events and Spark MERGE plans
* Handling for NaN values in metadata, and NaN predicates

The project is making significant progress.

## Community Health:
Community activity continues to increase. Recent video sync calls have had 20+ participants, code contributions are increasing in frequency (588 PRs opened and 552 PRs closed), and there are many new community members joining in. The community added two new committers this quarter.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members since graduation on 2020-05-19
- Shardul Mahadik was added as committer on 2020-07-25

## Project Activity:
Recent releases:
* 0.9.0 was released on 2020-07-13
* 0.9.1 was released on 2020-08-14

The community expects to release 0.10.0 soon with support for Hive reads, Flink writes, and the utilities needed to implement row-level deletes in external processing engines, like Presto.

Notable improvements this month include:
* Implemented end-to-end row-level deletes in the client library (direct reads)
* Committed Flink write support for both DataStreams and SQL
* Added Hive predicate pushdown and a runtime bundle
* Committed name mapping support for reading ORC files from non-Iceberg tables
* Added a new snapshot expiration action that runs in parallel using Spark
* Added metadata to configure tables with a preferred sort order

The community is actively working on Hive column pruning, Hive write support, Flink read support, and row-level deletes in more processing engines.

## Community Health:
The number of unique contributors increased in the last month to 26, from the previous high watermark of 21. Contributions are still healthy, with 74 commits in the past month. New community members have been contributing documentation and build improvements (PR labels, fixing warnings); it is great to have these valuable contributions in addition to features and bug fixes.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Shardul Mahadik was added as committer on 2020-07-25

## Project Activity:
0.9.0 was released, including support for Spark 3 and SQL DDL commands, support for JDK 11, vectorized Parquet reads, and an action to compact data files.

Since the 0.9.0 release, the community has made progress in several areas:
- The Hive StorageHandler now provides access to query Iceberg tables (work is ongoing to implement projection and predicate pushdown).
- Flink integration has made substantial progress toward using native RowData, and the first stage of the Flink sink (data file writers) has been committed.
- An action to expire snapshots using Spark was added and is an improvement on the incremental approach because it compares the reachable file sets.
- The implementation of row-level deletes is nearing completion. Scan planning now supports delete files, merge-based and set-based row filters have been committed, and delete file writers are under review. The delete file writers allow storing deleted row data in support of Flink CDC use cases.

Releases:
- 0.9.0 was released on 2020-07-13
- 0.9.1 has an ongoing vote

## Community Health:
The month since the last report has been one of the busiest since the project started. 80 pull requests were merged in the last 4 weeks, and more importantly, came from 21 different contributors. Both of these are new high watermarks.

Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive queries against Iceberg tables and working with petabyte-scale Iceberg tables. Iceberg was also mentioned in the keynotes.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
In July, the community held one sync meeting to discuss general topics, and one specifically to discuss how to include both groups that have been working on integration with Hive.

To address the question on the last board report, the community sync meetings are video conferences that anyone in the community is welcome to attend. The discussion is documented and summarized for anyone who can't attend. We have found these to be a good way to exchange context and ideas more quickly, but recognize that this isn't the best way for some people to participate and so we don't consider these a forum for making decisions or voting. If we come to a tentative conclusion on a topic, it is still open for further discussion on the dev list. The idea for this comes from the Parquet community, which has been doing this for several years.

Development activity:
* Spark vectorized reads for flat schemas were merged and benchmarked
* The Spark 3 integration branch was merged into master
* Name mapping for Parquet files without IDs was committed
* An action to compact data files was added
* Support was added for managing and adding delete files in table metadata
* Refactoring to support reusing Spark components for Flink
* Several PRs for Flink support have been committed and more are open
* CI tests for JDK 11 have been added

The community also plans to release 0.9.0 with Spark 3 support soon.

## Community Health:
Most community metrics have again increased in the last month, although dev list traffic is a bit lower. More importantly, the community has made further progress on several large areas with different groups leading the efforts, like Hive support, Spark 3 support, and Flink support.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (21 days ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
There were two community syncs in May, with good discussions on adding secondary indexes and fixing some persistent issues, like Guava library conflicts and how to support multiple Spark versions.

Development activity:
- Row-level delete progress continues with several PRs merged
- Added support for ORC predicate push-down and metrics filtering, which is a significant step toward performance parity with Parquet
- The vectorized Parquet read path is passing end-to-end tests for flat data
- Guava is now shaded and relocated, unblocking integration with Hive
- The build changed dependency locking plugins to unblock Hive and Spark 3 work
- Flink contributors opened pull requests to merge the prototype sink

## Community Health:
Nearly all metrics (list traffic, pull requests, and issues opened) are showing an increase in the last month, and the community has made significant progress on several large extensions (ORC and Flink, notably).
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Iceberg Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Iceberg Project be and hereby is responsible for the creation and maintenance of software related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use; and be it further

RESOLVED, that the office of "Vice President, Apache Iceberg" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Iceberg Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Iceberg Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Iceberg Project:

* Anton Okolnychyi <aokolnychyi@apache.org>
* Carl Steinbach <cws@apache.org>
* Daniel C. Weeks <dweeks@apache.org>
* James R. Taylor <jamestaylor@apache.org>
* Julien Le Dem <julien@apache.org>
* Owen O'Malley <omalley@apache.org>
* Parth Brahmbhatt <parth@apache.org>
* Ratandeep Ratti <rdsr@apache.org>
* Ryan Blue <blue@apache.org>

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ryan Blue be appointed to the office of Vice President, Apache Iceberg, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache Iceberg Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Iceberg podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator Iceberg podling encumbered upon the Apache Incubator PMC are hereafter discharged.

Special Order 7G, Establish the Apache Iceberg Project, was approved by Unanimous Vote of the directors present.
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:
1. Grow the Iceberg community
2. Add more committers and PPMC members

### Are there any issues that the IPMC or ASF Board need to be aware of?
No issues.

### How has the community developed since the last report?
In the 4 months since the last report, 138 pull requests were merged for an average of 34.5 per month. While this is down from the previous monthly average of 49.6 per month for June through August, this contribution rate is still very active and healthy. Contributions are coming from a regular group of contributors outside of the initial set of committers, which is a positive indication for adding new committers and PPMC members over the next few months.

The community released the first version of Apache Iceberg, 0.7.0-incubating. This release used the "standard" incubator disclaimer and included convenience binaries. The release candidate votes were very active with community members testing out the release and reporting problems.

There was an Apache Iceberg talk at ApacheCon NA in September.

### How has the project developed since the last report?
- The community is building support for the upcoming Spark 3.0 release
- The first PR from the vectorization branch has been merged into master
- Support for IN and NOT IN predicates was contributed
- Python added support for Hive metastore tables and the read path is near commit
- Flaky tests have been fixed
- Baseline checks (style, errorprone, findbugs) are now applied to all modules

### How would you assess the podling's maturity?
Please feel free to add your own commentary.
- [ ] Initial setup
- [ ] Working towards first release
- [x] Community building
- [x] Nearing graduation
- [ ] Other:

### Date of last release:
- 0.7.0-incubating was released 25 October 2019

### When were the last committers or PPMC members elected?
- Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?
Yes. 4 of 5 mentors voted on the 0.7.0-incubating IPMC vote. Thanks to our mentors for being active!

### Is the PPMC managing the podling's brand / trademarks?
Yes, the podling is managing the brand and is not aware of any issues. The project name has been approved.

### Signed-off-by:
- [x] (iceberg) Ryan Blue
  Comments:
- [ ] (iceberg) Julien Le Dem
  Comments:
- [X] (iceberg) Owen O'Malley
  Comments:
- [ ] (iceberg) James Taylor
  Comments:
- [ ] (iceberg) Carl Steinbach
  Comments:

### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:
1. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
2. Grow the Iceberg community
3. Add more committers and PPMC members

### Are there any issues that the IPMC or ASF Board need to be aware of?
No issues.

### How has the community developed since the last report?
The community continues to grow steadily. In the last month:
* 59 pull requests have been merged
* 17 people contributed the merged PRs
* 18 issues have been closed, 22 issues were opened

For comparison, the last report had 74 pull requests merged over 3 months.

### How has the project developed since the last report?
* License documentation has been completed for the Java project, unblocking the first release
* Added more documentation to iceberg.apache.org
* Started vectorized read branch with significantly better performance
* Added metadata tables
* Added configuration to control statistics and truncate long values
* Improved Hive Metastore integration
* A working Python read path has been submitted in PRs

### How would you assess the podling's maturity?
- [ ] Initial setup
- [x] Working towards first release
- [x] Community building
- [x] Nearing graduation
- [ ] Other:

### Date of last release:
* No release yet

### When were the last committers or PPMC members elected?
* Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?
Yes

### Signed-off-by:
- [x] (iceberg) Ryan Blue
  Comments:
- [ ] (iceberg) Julien Le Dem
  Comments:
- [X] (iceberg) Owen O'Malley
  Comments: The project also gave two presentations:
  * Berlin Buzzwords (June 2019)
  * ApacheCon NA (Sep 2019)

  Iceberg is being used in production at Netflix on huge tables, up to 25 petabytes.
- [X] (iceberg) James Taylor
  Comments:
- [X] (iceberg) Carl Steinbach
  Comments: Approval added by Ryan Blue; Carl had trouble editing the new report location

### IPMC/Shepherd notes:
Justin Mclean: The included stats don't really mean much to anyone outside of your project, please drop them from future reports. The community growth section might as well be blank. I find it surprising that this project thinks that it is near graduation. Please discuss this with your mentors.
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:
1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

### Are there any issues that the IPMC or ASF Board need to be aware of?
* No issues that require attention.

### How has the community developed since the last report?
* Community growth has continued with several new contributors and reviewers
* Community has decided on style and added checking to CI for most modules
* Community has started work on extending the spec for new use cases

### How has the project developed since the last report?
* Much more content on iceberg.apache.org has been added
* 74 pull requests have been merged, many reviewed by new community members
* Work has begun to add row-level deletes and upserts to the format
* Added support for Spark streaming, a catalog API, and numerous bug fixes
* Contributors are reviewing code, submitting substantial features, and improving dev practices

### How would you assess the podling's maturity?
Please feel free to add your own commentary.
- [ ] Initial setup (name clearance approval pending)
- [X] Working towards first release
- [X] Community building
- [ ] Nearing graduation
- [ ] Other:

### Date of last release:
None yet

### When were the last committers or PPMC members elected?
None yet

### Have your mentors been helpful and responsive?
Yes.

### Signed-off-by:
- [X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report.
- [ ](iceberg) Julien Le Dem
  Comments:
- [X](iceberg) Owen O'Malley
  Comments: +1 from discussion on dev list
- [ ](iceberg) James Taylor
  Comments:
- [ ](iceberg) Carl Steinbach
  Comments:

### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:
1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
* No issues that require attention.

How has the community developed since the last report?
* The community has continued to receive new contributors
* Several contributors are reliably helping review pull requests. Because of these review contributions and the small number of committers, the community voted to relax the RTC requirements and allow committers to push their own changes if the community has reviewed the PR. This helps develop reviewers and gets changes in faster. The vote also set reasonable limits for this practice: PRs must be up for at least 2 days and this is only for the first year, while we are working with a small set of committers.

How has the project developed since the last report?
* Podling name search concluded that Iceberg is a suitable name. (See PODLINGNAMESEARCH-163)
* The community voted to accept a large PR with a Python implementation.
* Contributors are fixing important predicate push-down issues, including case sensitivity, filtering on nested types, missing file metrics, etc.
* Contributors added support for plugging in file stream encryption.

How would you assess the podling's maturity?
Please feel free to add your own commentary.
[ ] Initial setup (name clearance approval pending)
[X] Working towards first release
[X] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:
None yet

When were the last committers or PPMC members elected?
None yet

Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed.
Yes.

Signed-off-by:
[X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report.
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: (Approval copied from +1 on dev list)
[ ](iceberg) James Taylor
  Comments:
[ ](iceberg) Carl Steinbach
  Comments:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:
1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
* No issues that require attention.

How has the community developed since the last report?
* Pull requests from 6 contributors were merged, 7 new contributors

How has the project developed since the last report?
* Submitted evidence for podling name search: PODLINGNAMESEARCH-163
* Netflix submitted a revised trademark agreement for counter-signing
* Abstracted data file locations for community use cases
* Reviewing proposed API update for file stream encryption plugins
* New contributor highlights:
  - A new contributor is fixing case sensitivity in expressions
  - A new contributor opened a PR to add a startsWith predicate
  - A new contributor reviewed 4 pull requests and opened another

How would you assess the podling's maturity?
Please feel free to add your own commentary.
[X] Initial setup (name clearance approval pending)
[X] Working towards first release
[X] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:
None yet

When were the last committers or PPMC members elected?
None yet

Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed.
Yes.

Signed-off-by:
[X](iceberg) Ryan Blue
  Comments: dev list traffic appears to be increasing also
[ ](iceberg) Julien Le Dem
  Comments:
[ ](iceberg) Owen O'Malley
  Comments:
[x](iceberg) James Taylor
  Comments:
[X](iceberg) Carl Steinbach
  Comments: From dev list: "Looks good to me. +1"

IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:
1. Finish the name clearance and trademark agreement.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
* Gitbox traffic is now going to issues@. The community was losing dev@ subscribers because of the high volume of traffic from Gitbox. However, now all updates are sent to issues@. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@.
* The trademark agreement proposed by Netflix was not acceptable to the ASF. It would be helpful if the ASF published the terms that the ASF requires to avoid trial and error. Netflix is drafting a new agreement.

How has the community developed since the last report?
* Moved gitbox notifications to avoid loss of dev@ subscribers (self-reported leaving dev@).
* New contributor activity: 3 new issues opened, 4 PRs submitted
* 5 PRs from non-committers merged
* 2 contributors started reviewing PRs
* New design doc proposed by a community contributor
* Moved issues from Netflix repository to Apache repository

How has the project developed since the last report?
* Planned blockers for first release, 0.1.0, in milestone 1
* Partial Python implementation submitted
* Manifest listing file added to the spec and implementation committed (blocker for initial release). Resulted in a significant improvement in query planning time for large tables.
* Abstracted file IO API to support community use cases
* Reviewing community proposal for external plugins to support file-level encryption
* Added doc strings to schemas

How would you assess the podling's maturity?
Please feel free to add your own commentary.
[X] Initial setup (name clearance pending)
[X] Working towards first release
[ ] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:
None yet

When were the last committers or PPMC members elected?
None yet

Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed.
Last month was December, so traffic has been low and both PPMC members and mentors were slow to respond. This is not abnormal, but the PPMC missed the deadline to file this report. We will ensure this doesn't recur.

Signed-off-by:
[X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report, but after the deadline.
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: Approval from +1 on dev list.
[ ](iceberg) James Taylor
  Comments:
[ ](iceberg) Carl Steinbach
  Comments:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:
1. Get the SGA accepted.
2. Finish the name clearance.
3. Make the first Apache release.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
* Gitbox integration has helped a lot, although it is frustrating that the team members are not allowed to configure the project and must go through infra for every change.
* The traffic on the dev list from GitHub pull requests and issues is pretty heavy. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@.

How has the community developed since the last report?
This is the first report.

How has the project developed since the last report?
This is the first report. Both the software grant and trademark agreements have been submitted. Code has been imported and updated to use the ASF license header. LICENSE and NOTICE files have been updated to comply with ASF policy. Podling website is up at https://iceberg.apache.org.

How would you assess the podling's maturity?
Please feel free to add your own commentary.
[X] Initial setup
[ ] Working towards first release
[ ] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:
None yet

When were the last committers or PPMC members elected?
None yet

Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed.
We're working through the issues as they come up.

Signed-off-by:
[X](iceberg) Ryan Blue
  Comments:
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: I wrote the first pass of the report.
[X](iceberg) James Taylor
  Comments:
[X](iceberg) Carl Steinbach
  Comments:

IPMC/Shepherd notes: