
This was extracted (@ 2023-09-20 23:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
Report was filed, but display is awaiting the approval of the Board minutes.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: none

## Membership Data:
Apache Iceberg was founded 2020-05-19 (3 years ago)
There are currently 24 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- Fokko Driesprong was added to the PMC on 2023-04-06
- Steven Wu was added to the PMC on 2023-04-06
- Szehon Ho was added to the PMC on 2023-04-20
- Yufei Gu was added to the PMC on 2023-04-06
- Amogh Jahagirdar was added as committer on 2023-04-25
- Eduard Tudenhoefner was added as committer on 2023-04-25

## Project Activity:
* 1.3.0 was released on 2023-05-26
* 1.2.1 was released on 2023-04-01
* 1.2.0 was released on 2023-03-20

The 1.3.0 release added support for Spark 3.4 and Flink 1.17. It also included several updates and fixes, including:
* Better Spark file distribution for row-level plans like MERGE
* Improved bit density in the object storage layout
* Readable metrics in metadata tables
* Optimized vectorized reads for decimal types
* Spark timestamp_ntz and UUID support

The Python implementation is nearing an 0.4.0 release that will include:
* Delete file support
* Metadata updates for tables
* Improved compatibility

The community is also continuing to build a view specification, expand REST catalog support, and add encryption to the table spec.

## Community Health:
The community continues to be healthy, with most metrics steady this quarter.
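The Committer-to-PMC ratio quoted in the membership data above can be reproduced by reducing the two head counts to lowest terms. The following is a minimal sketch of that arithmetic only; the function name `reduced_ratio` is hypothetical, and the minutes do not say how the reporting tool derives the figure or how it rounds the ratios reported elsewhere as "roughly".

```python
from math import gcd


def reduced_ratio(committers: int, pmc_members: int) -> str:
    """Reduce a committer and PMC head count to a ratio in lowest terms.

    Hypothetical helper for illustration; not taken from any ASF tooling.
    """
    if committers < 0 or pmc_members <= 0:
        raise ValueError("head counts must be non-negative, with at least one PMC member")
    divisor = gcd(committers, pmc_members)
    return f"{committers // divisor}:{pmc_members // divisor}"


# Figures from the report above: 24 committers and 16 PMC members.
print(reduced_ratio(24, 16))
```

Only the exact case is covered here, so the sketch makes no claim about the approximate ratios.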
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (3 years ago)
There are currently 22 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Fokko Driesprong was added to the PMC on 2023-04-06
- Steven Wu was added to the PMC on 2023-04-06
- Yufei Gu was added to the PMC on 2023-04-06
- No new committers. Last addition was Steven Wu on 2022-10-07.

## Project Activity:
Releases:
* 1.2.0 was released on 2023-03-20, followed by 1.2.1 on 2023-04-11
* Python 0.3.0 was released on 2023-02-09

The Python implementation has reached feature parity with the "legacy" codebase, so the legacy code that was never part of an ASF release has been removed! The Python implementation now supports full read planning, including parallel metadata reads, manifest pruning, partition pruning, and column stats pruning. Python frameworks that use Apache Arrow can use data from Iceberg tables, including Arrow compute, Pandas, DuckDB, and Ray. Write support is the next milestone for the Python implementation.

The Java implementation's latest release included several new capabilities:
* Branching and tagging, with support in Flink and Spark using VERSION AS OF
* Spark DDL for branches and tags
* Metadata query pushdown in Spark
* Changelog reads in Spark
* Throttling for streaming reads in Flink
* FileIO support for ORC readers and writers
* SigV4 support for REST catalog auth
* Remote signing client for S3
* The ability to read Snowflake Iceberg tables

There are also efforts to add encryption to the format and to support multi-table transactions. The community is also discussing a Rust or C++ implementation hosted by the ASF.

## Community Health:
The community remains healthy, with a reasonable increase in both opened and closed pull requests, as well as a steady number of unique contributors. The Python implementation has been bringing a lot of new contributors.

Iceberg was featured in 14 talks at Subsurface, as well as in a panel.
No report was submitted.
@Sander: pursue a roll call for Iceberg
No report was submitted.
No report was submitted.
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 22 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- Fokko Driesprong was added as committer on 2022-08-21
- Steven Wu was added as committer on 2022-10-07
- Yufei Gu was added as committer on 2022-08-25

## Project Activity:
The community had 2 releases in the 0.14.x line and an initial Python release, 0.1.0. In addition, the vote for a 1.0.0 release is currently passing.

The Python release is the result of significant community effort and includes a new CLI utility (pyiceberg), support for Hive and REST catalogs, and the ability to read table metadata. The next goal is a 0.2.0 release that can handle query planning to enable reads in Python and Python-based engines.

The 1.0.0 JVM release adds API guarantees to the API module, but is closely based on 0.14.1 to make transitioning to a new major version simple.

Next, the community is preparing a 1.1.0 release with significant new updates:
* The ability to read and write table branches
* Scan metrics reporting
* Support for Spark FunctionCatalog
* FLIP-27 reader support in Flink SQL
* Z-order support when rewriting or compacting data files
* Support for Puffin stats in table metadata

## Community Health:
The community continues to be healthy in terms of commits. The number of unique contributors decreased slightly, which indicates the community should ensure pull requests from contributors are getting enough attention.

The increase of issues closed is due to setting up a stale issues bot to help keep issues fresh and relevant. The community also added issue templates to make bug reports and feature requests better and more clear.

This year, there were 4 presentations about Iceberg at ApacheCon:
* Accelerate Data Lakehouse deployment with Apache Iceberg in Cloudera Data Platform (Attila Turoczy, Bill Zhang)
* Apache Iceberg's REST Catalog - A Gateway to Enriching Data Access via the Simplicity of an HTTP Service (Sam Redai)
* Iceberg's Best Secret: Exploring Metadata Tables (Szehon Ho)
* Integrated Audits: Streamlined Data Observability with Apache Iceberg (Sam Redai)

There were also 2 Iceberg presentations at Flink Forward:
* Batch Processing at Scale with Flink & Iceberg (Andreas Hailu)
* Tame the small files problem and optimize data layout for streaming ingestion to Iceberg (Steven Wu, Gang Ye)
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- No new committers. Last addition was Szehon Ho on 2022-03-07.

## Project Activity:
The community recently released 0.13.2 on 2022-06-13 and is currently voting on a candidate for the 0.14.0 release. 0.14.0 will be followed closely by a 1.0.0, which will make API stability guarantees.

The 0.14.0 release contains significant new features, including:
* Support for Apache Spark 3.3
* Support for Apache Flink 1.15
* MERGE and UPDATE plans using row-level deletes
* A FLIP-27 reader for Flink
* The new REST catalog implementation with change-based commits
* A new file format for index and stats data, Puffin
* Zorder sorting when rewriting data files
* Range and tail reads for IO
* Additional metrics collection

The community has also been working on new features, including:
* Table-level statistics and data sketches using Puffin
* Table branching and tagging
* View metadata tracking
* Default values in schemas
* A native Python implementation

The Python implementation has been making good progress and may see a release next quarter.

The project has also been working to improve documentation and has a new site design that tracks older versions.

## Community Health:
Community health continues to be good. The community's primary gauge of activity is pull requests and commits, and there were 1088 PRs opened this quarter (a 5% increase) and 780 commits (a slight decrease). There were approximately the same number of contributors as the previous quarter, 79.
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- Szehon Ho was added as committer on 2022-03-07

## Project Activity:
Iceberg 0.13.0 was released on 2022-02-01, and was followed quickly by 0.13.1 on 2022-02-14 to fix a performance regression.

The 0.13 release included many significant new features:
* Spark 3.2 support and overhauled row-level plans
* Flink 1.13 and 1.14 support
* Spark and Flink modules built and tested against each engine version
* GCS and Aliyun OSS IO integration

The community has also been working on some major features:
* Delta-based MERGE INTO and UPDATE plans for Spark (complete)
* Scala 2.13 support for Spark 3.2 (complete)
* IO metrics collection (complete)
* Vectorized reads with delete files (complete)
* An implementation of table branching and tagging
* Addition of a REST catalog spec, like the Hive Thrift interface for Iceberg
* Addition of a view spec that tracks SQL or other plan representations
* Spec updates for secondary indexes and metrics
* Spec updates for default values

In addition to features, the community also overhauled the ASF site. The new site better communicates Iceberg's major features and has version-specific docs.

There were 6 Iceberg talks at the Subsurface conference and the conference organizers noted it was a major theme.

On the last report, we were asked whether the presentations are available on the Iceberg site. They are present under the Talks tab.

## Community Health:
Community health continues to be good. There were some decreases this quarter in metrics like dev list traffic and issues opened, but this isn't concerning given the last few quarters of growth and the fact that this report includes December, when many people are on holiday. In addition, the number of unique contributors increased to 81 (20% higher).
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 18 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- Jack Ye was added to the PMC on 2021-11-14
- Russell Spitzer was added to the PMC on 2021-11-13
- No new committers. Last addition was Jack Ye on 2021-07-02.

## Project Activity:
0.12.1 was released on 2021-11-08. The community is also working on the next release, 0.13.0.

* A spec for table branching and tagging was written and is nearing completion
* Iceberg's documentation is being updated so that multiple versions can be easily maintained and updated.
* Delete file compaction was added to the rewrite files action and stored procedure. Additional compaction options are planned.
* Sort-based compaction was added
* Flink and Spark plugins have been refactored so that each version is independent and is compiled against the correct engine version. While this duplicates some code, it makes integrating new features easier and reduces the risk of runtime incompatibilities.
* Added support for Flink 1.14.x and Spark 3.2.x
* A REST catalog API spec is taking shape. This should standardize an interface for providing a table catalog, similar to the thrift metastore interface used by Hive.
* Aliyun OSS support was added as an IO module
* The community decided on goals for a 1.0 release, targeted for early next year
* Python implementation is making progress

## Community Health:
Community metrics show healthy growth. Notably, there were 66 unique contributors this quarter, up from 50 last quarter. The total number of PRs submitted was more than 750, about 50% more than the 500 last quarter. Similarly, PRs closed also increased to 682 from about 400 last quarter, a 64% increase. The most significant stat is the increase in unique contributors, which signals that more people are interested in the project.

This quarter, there were talks featuring Iceberg at AWS re:Invent (where Athena announced support), Trino summit, and community events for PrestoDB, lakeFS, and SF Big Analytics.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (a year ago)
There are currently 18 committers and 10 PMC members in this project.
The Committer-to-PMC ratio is 9:5.

Community changes, past quarter:
- Zheng Hu was added to the PMC on 2021-06-28
- Jack Ye was added as committer on 2021-07-02

## Project Activity:
0.12.0 was released on 2021-08-15 and is a significant update from 0.11.1.

The community voted to adopt version 2 of the Iceberg table format that adds row-level updates and deletes.

The community is also working on several improvements:
* Preparing for 1.0 of the Java reference implementation
* Adding an Iceberg specification for SQL views
* Spark implementations of MERGE and UPDATE that use row-level deletes
* Flink UPSERT support
* Z-order specification
* Relative path support for disaster recovery
* Branching and tagging table snapshots
* Additional storage integration modules (Dell EMC, Aliyun OSS)
* Encryption

## Community Health:
The community is healthy and continues to grow. Unique contributors grew by 2% this quarter to 50. Contributions increased to more than 500 PRs and more than 400 PRs were addressed.

The community is discussing how to scale and coordinate with a published roadmap and GitHub projects to link the roadmap to individual issues.

The community is also forming a new group of contributors around Python. This group has held sync meetings and is planning to make the Python API more pythonic and to get to an initial Python release.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

Apologies that this report is late. The community will report next month if needed.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (a year ago)
There are currently 17 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:5.

Community changes, past quarter:
- No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19.
- Ted Gooch was added as committer on 2021-05-11
- Russell Spitzer was added as committer on 2021-04-02
- Ryan Murray was added as committer on 2021-03-26
- Yan Yan was added as committer on 2021-03-23

## Project Activity:
0.11.1 was released on 2021-04-03.

The community is currently working on the 0.12.0 release, which will update support for Spark 3.1 to fix the Iceberg SQL extensions.

Several features were finished:
* Spark UPDATE support was committed
* Row identifier fields were added to schemas to support Flink UPSERT
* An action to import existing data files was added
* Hive integration has been updated to allow using multiple catalogs

In addition, there are several ongoing projects:
* The community is working on updates for Spark 3.1
* Spark data file compaction strategies and a new implementation have been discussed and should be available in 0.12.0
* A design for encryption support has been proposed that will support Parquet and ORC encryption, as well as encryption for the metadata tree.
* There have been design discussions for adding secondary indexes that can be updated asynchronously to keep commit latency low.
* There have been design discussions for adding default field values
* Support for Spark 3.0 structured streaming with the DSv2 API is under review
* A DynamoDB catalog has been submitted as a PR

## Community Health:
The community is healthy and showed an increase in contributors in the past quarter. New contributors are working on significant projects, like Spark streaming support and default values. The community also added 4 new committers in the past quarter!
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (9 months ago)
There are currently 13 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19.
- Peter Vary was added as committer on 2021-01-23

## Project Activity:
0.11.0 was released on 2021-01-26 and included several important new features:
* Support for partition evolution
* Spark SQL extensions with support for MERGE INTO, DELETE FROM, and new DDL
* Spark support for table maintenance through stored procedures
* Streaming reads, filter pushdown, and experimental CDC writes in Flink
* AWS module with better integration for S3 and Glue metastore
* Nessie metastore module

The community is working toward finalizing the v2 format spec and the next release. There is good progress on metrics collection for Avro data files, Hive integration, Spark UPDATE support, and more table maintenance actions.

## Community Health:
The overall number of pull requests merged in the last quarter decreased from the previous quarter, but this is mostly due to annual holidays. Leading up to the release in January, the project set a new record at 78 PRs merged in a week.

More importantly, although there was a decrease in PRs merged, the number of contributors still increased slightly, to 50 code contributors in the quarter.

Peter Vary was also added as a committer in January.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (7 months ago)
There are currently 12 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 4:3.

Community changes, past quarter:
- No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19.
- Jingsong Lee was added as committer on 2020-10-09
- Zheng Hu was added as committer on 2020-10-09

## Project Activity:
Recent releases:
* 0.10.0 was released on 2020-11-11.

The 0.10.0 release included:
* A new Flink module supporting DataStreams and SQL writes and (batch) reads
* A new Hive module supporting reads
* Row-level delete implementation, part of the v2 spec, for engine integration

More recently, the community has added:
* Stored procedures for Spark that perform table maintenance from SQL
* New catalog implementations for Nessie and Glue
* Writers to support Flink CDC events and Spark MERGE plans
* Handling for NaN values in metadata, and NaN predicates

The project is making significant progress.

## Community Health:
Community activity continues to increase. Recent video sync calls have had 20+ participants, code contributions are increasing in frequency (588 PRs opened and 552 PRs closed), and there are many new community members joining in. The community added two new committers this quarter.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members since graduation on 2020-05-19
- Shardul Mahadik was added as committer on 2020-07-25

## Project Activity:
Recent releases:
* 0.9.0 was released on 2020-07-13
* 0.9.1 was released on 2020-08-14

The community expects to release 0.10.0 soon with support for Hive reads, Flink writes, and the utilities needed to implement row-level deletes in external processing engines, like Presto.

Notable improvements this month include:
* Implemented end-to-end row-level deletes in the client library (direct reads)
* Committed Flink write support for both DataStreams and SQL
* Added Hive predicate pushdown and a runtime bundle
* Committed name mapping support for reading ORC files from non-Iceberg tables
* Added a new snapshot expiration action that runs in parallel using Spark
* Added metadata to configure tables with a preferred sort order

The community is actively working on Hive column pruning, Hive write support, Flink read support, and row-level deletes in more processing engines.

## Community Health:
The number of unique contributors increased in the last month to 26, from the previous high watermark of 21. Contributions are still healthy, with 74 commits in the past month.

New community members have been contributing documentation and build improvements (PR labels, fixing warnings); it is great to have these valuable contributions in addition to features and bug fixes.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Shardul Mahadik was added as committer on 2020-07-25

## Project Activity:
0.9.0 was released, including support for Spark 3 and SQL DDL commands, support for JDK 11, vectorized Parquet reads, and an action to compact data files.

Since the 0.9.0 release, the community has made progress in several areas:
- The Hive StorageHandler now provides access to query Iceberg tables (work is ongoing to implement projection and predicate pushdown).
- Flink integration has made substantial progress toward using native RowData, and the first stage of the Flink sink (data file writers) has been committed.
- An action to expire snapshots using Spark was added and is an improvement on the incremental approach because it compares the reachable file sets.
- The implementation of row-level deletes is nearing completion. Scan planning now supports delete files, merge-based and set-based row filters have been committed, and delete file writers are under review. The delete file writers allow storing deleted row data in support of Flink CDC use cases.

Releases:
- 0.9.0 was released on 2020-07-13
- 0.9.1 has an ongoing vote

## Community Health:
The month since the last report has been one of the busiest since the project started. 80 pull requests were merged in the last 4 weeks, and more importantly, came from 21 different contributors. Both of these are new high watermarks.

Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive queries against Iceberg tables and working with petabyte-scale Iceberg tables. Iceberg was also mentioned in the keynotes.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
In July, the community held one sync meeting to discuss general topics, and one specifically to discuss how to include both groups that have been working on integration with Hive.

To address the question on the last board report, the community sync meetings are video conferences that anyone in the community is welcome to attend. The discussion is documented and summarized for anyone that can't attend. We have found these to be a good way to exchange context and ideas more quickly, but recognize that this isn't the best way for some people to participate and so we don't consider these a forum for making decisions or voting. If we come to a tentative conclusion on a topic, it is still open for further discussion on the dev list. The idea for this comes from the Parquet community that has been doing this for several years.

Development activity:
* Spark vectorized reads for flat schemas was merged and benchmarked
* The Spark 3 integration branch was merged into master
* Name mapping for Parquet files without IDs was committed
* An action to compact data files was added
* Support was added for managing and adding delete files in table metadata
* Refactoring to support reusing Spark components for Flink
* Several PRs for Flink support have been committed and more are open
* CI tests for JDK 11 have been added

The community also plans to release 0.9.0 with Spark 3 support soon.

## Community Health:
Most community metrics have again increased in the last month, although dev list traffic is a bit lower. More importantly, the community has made further progress on several large areas with different groups leading the efforts, like Hive support, Spark 3 support, and Flink support.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (21 days ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
There were two community syncs in May, with good discussions on adding secondary indexes and fixing some persistent issues, like Guava library conflicts and how to support multiple Spark versions.

Development activity:
- Row-level delete progress continues with several PRs merged
- Added support for ORC predicate push-down and metrics filtering, which is a significant step toward performance parity with Parquet
- The vectorized Parquet read path is passing end-to-end tests for flat data
- Guava is now shaded and relocated, unblocking integration with Hive
- The build changed dependency locking plugins to unblock Hive and Spark 3 work
- Flink contributors opened pull requests to merge the prototype sink

## Community Health:
Nearly all metrics (list traffic, pull requests, and issues opened) are showing an increase in the last month, and the community has made significant progress on several large extensions (ORC and Flink, notably).
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Iceberg Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Iceberg Project be and hereby is responsible for the creation and maintenance of software related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use; and be it further

RESOLVED, that the office of "Vice President, Apache Iceberg" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Iceberg Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Iceberg Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Iceberg Project:

* Anton Okolnychyi <aokolnychyi@apache.org>
* Carl Steinbach <cws@apache.org>
* Daniel C. Weeks <dweeks@apache.org>
* James R. Taylor <jamestaylor@apache.org>
* Julien Le Dem <julien@apache.org>
* Owen O'Malley <omalley@apache.org>
* Parth Brahmbhatt <parth@apache.org>
* Ratandeep Ratti <rdsr@apache.org>
* Ryan Blue <blue@apache.org>

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ryan Blue be appointed to the office of Vice President, Apache Iceberg, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache Iceberg Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Iceberg podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator Iceberg podling encumbered upon the Apache Incubator PMC are hereafter discharged.

Special Order 7G, Establish the Apache Iceberg Project, was approved by Unanimous Vote of the directors present.
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. ### Three most important unfinished issues to address before graduating: 1. Grow the Iceberg community 2. Add more committers and PPMC members ### Are there any issues that the IPMC or ASF Board need to be aware of? No issues. ### How has the community developed since the last report? In the 4 months since the last report, 138 pull requests were merged for an average of 34.5 per month. While this is down from the previous monthly average of 49.6 per month for June through August, this contribution rate is still very active and healthy. Contributions are coming from a regular group of contributors outside of the initial set of committers, which is a positive indication for adding new committers and PPMC members over the next few months. The community released the first version of Apache Iceberg, 0.7.0-incubating. This release used the "standard" incubator disclaimer and included convenience binaries. The release candidate votes were very active with community members testing out the release and reporting problems. There was an Apache Iceberg talk at ApacheCon NA in September. ### How has the project developed since the last report? - The community is building support for the upcoming Spark 3.0 release - The first PR from the vectorization branch has been merged into master - Support for IN and NOT IN predicates was contributed - Python added support for Hive metastore tables and the read path is near commit - Flaky tests have been fixed - Baseline checks (style, errorprone, findbugs) are now applied to all modules ### How would you assess the podling's maturity? Please feel free to add your own commentary. - [ ] Initial setup - [ ] Working towards first release - [x] Community building - [x] Nearing graduation - [ ] Other: ### Date of last release: - 0.7.0-incubating was released 25 October 2019 ### When were the last committers or PPMC members elected? 
- Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?

Yes. 4 of 5 mentors voted on the 0.7.0-incubating IPMC vote. Thanks to our mentors for being active!

### Is the PPMC managing the podling's brand / trademarks?

Yes, the podling is managing the brand and is not aware of any issues. The project name has been approved.

### Signed-off-by:

- [x] (iceberg) Ryan Blue
  Comments:
- [ ] (iceberg) Julien Le Dem
  Comments:
- [X] (iceberg) Owen O'Malley
  Comments:
- [ ] (iceberg) James Taylor
  Comments:
- [ ] (iceberg) Carl Steinbach
  Comments:

### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:

1. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
2. Grow the Iceberg community
3. Add more committers and PPMC members

### Are there any issues that the IPMC or ASF Board need to be aware of?

No issues.

### How has the community developed since the last report?

The community continues to grow steadily. In the last month:

* 59 pull requests have been merged
* 17 people contributed the merged PRs
* 18 issues have been closed, 22 issues were opened

For comparison, the last report had 74 pull requests merged over 3 months.

### How has the project developed since the last report?

* License documentation has been completed for the Java project, unblocking the first release
* Added more documentation to iceberg.apache.org
* Started vectorized read branch with significantly better performance
* Added metadata tables
* Added configuration to control statistics and truncate long values
* Improved Hive Metastore integration
* A working python read path has been submitted in PRs

### How would you assess the podling's maturity?

- [ ] Initial setup
- [x] Working towards first release
- [x] Community building
- [x] Nearing graduation
- [ ] Other:

### Date of last release:

* No release yet

### When were the last committers or PPMC members elected?

* Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?

Yes

### Signed-off-by:

- [x] (iceberg) Ryan Blue
  Comments:
- [ ] (iceberg) Julien Le Dem
  Comments:
- [X] (iceberg) Owen O'Malley
  Comments: The project also gave two presentations:
  * Berlin Buzzwords (June 2019)
  * ApacheCon NA (Sep 2019)

  Iceberg is being used in production at Netflix on huge tables, up to 25 petabytes.
- [X] (iceberg) James Taylor
  Comments:
- [X] (iceberg) Carl Steinbach
  Comments: Approval added by Ryan Blue, Carl had trouble editing the new report location

### IPMC/Shepherd notes:

Justin Mclean: The included stats don't really mean much to anyone outside of your project, please drop them from future reports. The community growth section might as well be blank. I find it surprising that this project thinks that it is near graduation. Please discuss this with your mentors.
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:

1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

### Are there any issues that the IPMC or ASF Board need to be aware of?

* No issues that require attention.

### How has the community developed since the last report?

* Community growth has continued with several new contributors and reviewers
* Community has decided on style and added checking to CI for most modules
* Community has started work on extending the spec for new use cases

### How has the project developed since the last report?

* Much more content on iceberg.apache.org has been added
* 74 pull requests have been merged, many reviewed by new community members
* Work has begun to add row-level deletes and upserts to the format
* Added support for Spark streaming, a catalog API, and numerous bug fixes
* Contributors are reviewing code, submitting substantial features, and improving dev practices

### How would you assess the podling's maturity?

Please feel free to add your own commentary.

- [ ] Initial setup (name clearance approval pending)
- [X] Working towards first release
- [X] Community building
- [ ] Nearing graduation
- [ ] Other:

### Date of last release:

None yet

### When were the last committers or PPMC members elected?

None yet

### Have your mentors been helpful and responsive?

Yes.

### Signed-off-by:

- [X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report.
- [ ](iceberg) Julien Le Dem
  Comments:
- [X](iceberg) Owen O'Malley
  Comments: +1 from discussion on dev list
- [ ](iceberg) James Taylor
  Comments:
- [ ](iceberg) Carl Steinbach
  Comments:

### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

* No issues that require attention.

How has the community developed since the last report?

* The community has continued to receive new contributors
* Several contributors are reliably helping review pull requests.
  Because of these review contributions and the small number of committers, the community voted to relax the RTC requirements and allow committers to push their own changes if the community has reviewed the PR. This helps develop reviewers and gets changes in faster. The vote also set reasonable limits for this practice: PRs must be up for at least 2 days and this is only for the first year, while we are working with a small set of committers.

How has the project developed since the last report?

* Podling name search concluded that Iceberg is a suitable name. (See PODLINGNAMESEARCH-163)
* The community voted to accept a large PR with a Python implementation.
* Contributors are fixing important predicate push-down issues, including case sensitivity, filtering on nested types, missing file metrics, etc.
* Contributors added support for plugging in file stream encryption.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

[ ] Initial setup (name clearance approval pending)
[X] Working towards first release
[X] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:

None yet

When were the last committers or PPMC members elected?

None yet

Have your mentors been helpful and responsive or are things falling through the cracks?
In the latter case, please list any open issues that need to be addressed.

Yes.

Signed-off-by:

[X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report.
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: (Approval copied from +1 on dev list)
[ ](iceberg) James Taylor
  Comments:
[ ](iceberg) Carl Steinbach
  Comments:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

1. Update build for Apache release, add LICENSE/NOTICE to Jars.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

* No issues that require attention.

How has the community developed since the last report?

* Pull requests from 6 contributors were merged, 7 new contributors

How has the project developed since the last report?

* Submitted evidence for podling name search: PODLINGNAMESEARCH-163
* Netflix submitted a revised trademark agreement for counter-signing
* Abstracted data file locations for community use cases
* Reviewing proposed API update for file stream encryption plugins
* New contributor highlights:
  - A new contributor is fixing case sensitivity in expressions
  - A new contributor opened a PR to add a startsWith predicate
  - A new contributor reviewed 4 pull requests and opened another

How would you assess the podling's maturity?
Please feel free to add your own commentary.

[X] Initial setup (name clearance approval pending)
[X] Working towards first release
[X] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:

None yet

When were the last committers or PPMC members elected?

None yet

Have your mentors been helpful and responsive or are things falling through the cracks?
In the latter case, please list any open issues that need to be addressed.

Yes.

Signed-off-by:

[X](iceberg) Ryan Blue
  Comments: dev list traffic appears to be increasing also
[ ](iceberg) Julien Le Dem
  Comments:
[ ](iceberg) Owen O'Malley
  Comments:
[x](iceberg) James Taylor
  Comments:
[X](iceberg) Carl Steinbach
  Comments: From dev list: "Looks good to me. +1"

IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

1. Finish the name clearance and trademark agreement.
2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1)
3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

* Gitbox traffic is now going to issues@. The community was losing dev@ subscribers because of the high volume of traffic from Gitbox. However, now all updates are sent to issues@. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@.
* The trademark agreement proposed by Netflix was not acceptable to the ASF. It would be helpful if the ASF published the terms that the ASF requires to avoid trial and error. Netflix is drafting a new agreement.

How has the community developed since the last report?

* Moved gitbox notifications to avoid loss of dev@ subscribers (self-reported leaving dev@).
* New contributor activity: 3 new issues opened, 4 PRs submitted
* 5 PRs from non-committers merged
* 2 contributors started reviewing PRs
* New design doc proposed by a community contributor
* Moved issues from Netflix repository to Apache repository

How has the project developed since the last report?

* Planned blockers for first release, 0.1.0, in milestone 1
* Partial Python implementation submitted
* Manifest listing file added to the spec and implementation committed (blocker for initial release). Resulted in a significant improvement in query planning time for large tables.
* Abstracted file IO API to support community use cases
* Reviewing community proposal for external plugins to support file-level encryption
* Added doc strings to schemas

How would you assess the podling's maturity?
Please feel free to add your own commentary.
[X] Initial setup (name clearance pending)
[X] Working towards first release
[ ] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:

None yet

When were the last committers or PPMC members elected?

None yet

Have your mentors been helpful and responsive or are things falling through the cracks?
In the latter case, please list any open issues that need to be addressed.

Last month was December, so traffic has been low and both PPMC members and mentors were slow to respond. This is not abnormal, but the PPMC missed the deadline to file this report. We will ensure this doesn't recur.

Signed-off-by:

[X](iceberg) Ryan Blue
  Comments: I wrote the first pass of the report, but after the deadline.
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: Approval from +1 on dev list.
[ ](iceberg) James Taylor
  Comments:
[ ](iceberg) Carl Steinbach
  Comments:
Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

1. Get the SGA accepted.
2. Finish the name clearance.
3. Make the first Apache release.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

* Gitbox integration has helped a lot, although it is frustrating that the team members are not allowed to configure the project and must go through infra for every change.
* The traffic on the dev list from GitHub pull requests and issues is pretty heavy. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@.

How has the community developed since the last report?

This is the first report.

How has the project developed since the last report?

This is the first report.

Both the software grant and trademark agreements have been submitted. Code has been imported and updated to use the ASF license header. LICENSE and NOTICE files have been updated to comply with ASF policy. Podling website is up at https://iceberg.apache.org.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

[X] Initial setup
[ ] Working towards first release
[ ] Community building
[ ] Nearing graduation
[ ] Other:

Date of last release:

None yet

When were the last committers or PPMC members elected?

None yet

Have your mentors been helpful and responsive or are things falling through the cracks?
In the latter case, please list any open issues that need to be addressed.

We're working through the issues as they come up.

Signed-off-by:

[X](iceberg) Ryan Blue
  Comments:
[ ](iceberg) Julien Le Dem
  Comments:
[X](iceberg) Owen O'Malley
  Comments: I wrote the first pass of the report.
[X](iceberg) James Taylor
  Comments:
[X](iceberg) Carl Steinbach
  Comments:

IPMC/Shepherd notes: