
This was extracted (@ 2025-02-19 17:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers: search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Iceberg was founded 2020-05-19 (5 years ago)
There are currently 32 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is roughly 4:3.

Community changes, past quarter:
- No new PMC members. Last addition was Amogh Jahagirdar on 2024-08-12.
- Matthew Topol was added as committer on 2024-12-09
- Scott Donnelly was added as committer on 2024-12-10

## Project Activity:
Releases
- 1.7.1 was released on 2024-12-06.
- 1.7.0 was released on 2024-11-08.
- PyIceberg 0.8.1 was released on 2024-12-06.
- PyIceberg 0.8.0 was released on 2024-11-18.
- Go 0.1.0 was released on 2024-11-18.

Table format (v3)
- Added deletion vectors and synchronous maintenance to improve row-level operations
- Added row lineage fields and requirements for fine-grained row tracking
- Proposal for geography and geometry types is close to consensus
- Update to add Parquet's variant type is approved, waiting on Parquet upstream
- Finalized new type promotion rules

Puffin format
- Added deletion vector blob type to support DVs in tables

REST catalog spec
- Added storage credentials passing
- Added credential refresh
- Created a Docker image for catalog testing
- Discussing proposal for partial metadata commits
- Discussed partial metadata loading

Views
- Discussions about materialized view metadata are ongoing

Java
- Released new Kafka Connect sink
- Added default values implementation for Avro
- Added nanosecond timestamps
- Added v3 DV support in core, ongoing work in Spark
- Flink: Made FLIP-27 source the default
- Spark: Removed Spark 3.3 support
- Hive: Removing Hive 2.x and 3.x (Iceberg support is built into Hive 4.x and later)
- Pig: Removed the iceberg-pig module, which is no longer used

PyIceberg
- Support: Added Python 3.12, dropped Python 3.8

Rust
- Support for default values and type promotion in reads
- Added TableMetadataBuilder
- Implemented table requirements

Go
- Produced the first Go release!
- Supports scan planning and reading (data and metadata)
- Supports loading and listing tables with the Glue catalog
- Supports local and S3 storage

C++
- Added a C++ repository for a Puffin implementation

## Community Health:
The PMC has published guidelines on the Iceberg site for contributors who want to know more about how they can become committers. This guide should help contributors understand how Iceberg and other ASF communities decide on and add committers, and should set expectations clearly. This was the most important follow-up from discussions on the dev list earlier this year, where it became clear that contributors did not understand the requirements or process.

The community has started planning a second Iceberg Summit, intended to be held in spring 2025. The proposal details (such as the members of the selection committee) are being finalized, and the proposal will be submitted for approval in the next few weeks.

The community added two new committers this quarter and had a slight increase in the number of contributors. There were also a number of commercial announcements from companies adding or expanding support for Iceberg.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 years ago)
There are currently 31 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is roughly 4:3.

Community changes, past quarter:
- Amogh Jahagirdar was added to the PMC on 2024-08-12
- Eduard Tudenhoefner was added to the PMC on 2024-08-12
- Honah J. was added to the PMC on 2024-07-22
- Renjie Liu was added to the PMC on 2024-07-22
- Peter Vary was added to the PMC on 2024-08-12
- Piotr Findeisen was added as committer on 2024-07-24
- Kevin Liu was added as committer on 2024-07-24
- Sung Yun was added as committer on 2024-07-24
- Hao Ding was added as committer on 2024-07-23

## Project Activity:
Releases:
- Java 1.6.1 was released on 2024-08-28
- Rust 0.3.0 was released on 2024-08-20
- PyIceberg 0.7.1 was released on 2024-08-18
- PyIceberg 0.7.0 was released on 2024-07-30
- Java 1.6.0 was released on 2024-07-23

Table format:
- Work for v3 is picking up
- Committed timestamp_ns implementation
- Ongoing discussion/proposal for improvements to row-level deletes
- Ongoing discussion/proposal for row-level metadata for change tracking
- Discussion for adding variant type and where to maintain the spec (Parquet)
- Making progress on geometry types
- Clarified transform requirements to add transforms as needed (to support geo)
- Discovered issues affecting new type promotion cases, reduced scope

View format:
- Ongoing discussions for tracking metadata for materialized views

REST protocol specification:
- Added server-side scan planning
- Support for removing partition specs
- Support for endpoint discovery for future additions
- Clarified failure requirements for unknown actions or validations

Java:
- Added classes for v3 table writes
- Fixed rewrites in tables with 1000+ columns
- Added Kafka Connect runtime bundle
- Support for Flink 1.20
- Added range distribution support in Flink
- Dropped support for Java 8

PyIceberg:
- Discussed adding a dependency on iceberg-rust for native extensions
- Write support for time and identity transforms
- Parallelized large writes
- Support for deletes using filter predicates
- Staged table creation for atomic CTAS
- Support manifest merging on write
- Better integration with PyArrow to produce lazy readers from scans
- New API to add existing Parquet files
- Support custom catalogs

Rust:
- Established subproject pyiceberg_core to support PyIceberg
- Implemented OAuth for catalog REST client
- Added Parquet writer and reader capabilities with support for data projection
- Introduced memory catalog and memory file IO support
- Initialized SQL Catalog
- Added support for GCS storage and AWS session tokens
- Implemented concurrent table scans and data file fetching
- Enhanced predicate builders and expression evaluators
- Added support for timestamp columns in row filters

Go:
- Implemented expressions and expression visitors

## Community Health:
Several new committers and PMC members were added this quarter, which is a good indicator for community health.

There was also a significant number of threads on the mailing list about setting expectations for contributors and clearly documenting how the community operates. New guidelines for merging PRs have been added to the website, and the community is also discussing guidelines for how contributors can become committers. This builds on work from last quarter that clarified the process for design discussions.

Many of the topics under discussion were raised because of the acquisition that was noted in the last board report.
The community has been working to address the concerns raised, which fall primarily into 3 areas:
- How decisions are made about designs and commits (now clarified)
- How contributors become committers and PMC members (under discussion)
- How the community operates when people cannot reach consensus

The last concern has historically not been a problem; people have so far chosen to "disagree and commit" when a large majority in the community has a different opinion. However, the community encountered its first instance of this problem near the end of the quarter. The community and PMC need to discuss how to make progress on the issue.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 years ago)
There are currently 27 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Szehon Ho on 2023-04-20.
- No new committers. Last addition was Renjie Liu on 2024-03-06.

## Project Activity:
Releases:
- 1.5.1 was released on 2024-04-25
- 1.5.2 was released on 2024-05-09
- PyIceberg 0.6.1 was released on 2024-04-30

PyIceberg:
- Contributors are working to release more often
- Improved retries for Hive catalog locking
- Added register table support for Glue catalogs
- Adding metadata table support (snapshots, manifests, etc.)
- Working toward 0.7.0 release with partitioned writes and staged table creation

Rust:
- Implemented projection to support partition-based file pruning
- Implemented the inclusive metrics evaluator and predicate pushdown to Parquet
- Added Hive catalog support
- Improved REST catalog with OAuth2 and custom headers
- Added integration with DataFusion

Go:
- Working toward full expression support; added literals

Iceberg Java:
- The next Java release, 1.6.0, is targeted for release in June

Specs:
- Discussions about standardizing metadata for materialized views have made good progress. The community decided to use existing objects rather than creating a new combined table/view object and is working on metadata details.
- An extension to the REST protocol for privilege GRANT and REVOKE operations was proposed.
- Many discussions for extending the REST protocol are ongoing, including adding routes to plan scans, adding auth decisions, and appending data files
- There are also discussions for v3 features, like additional types (variant, timestamp_ns, and others)

## Community Health:
The Iceberg community continues to be healthy, with a large number of commits and individual contributors over the past quarter. Although overall commits decreased, the decrease tracks a corresponding drop in opened PRs, so it is not a health concern; PRs are getting reviewed.

The community is formalizing design discussions and has added GitHub labels and documented a process for making changes to community specs.

The community also held the first Iceberg Summit this quarter, with 32 sessions that are now available on YouTube (https://tinyurl.com/iceberg-summit). Community members also spoke at Community over Code EU.

A company that employs 3 PMC members and 2 committers was acquired. The PMC members (2 of whom are ASF members) have been reminded to act as individuals, not as representatives of their employer, when interacting in the community. Concentration of PMC members at a single employer is a risk that the community is aware of and will note in future board reports.

Other projects and announcements:
- Trino added support for Iceberg views
- Beam has added an Iceberg sink
- Confluent, Teradata, and Oracle announced Iceberg support
- Snowflake announced a new open source REST catalog project, Polaris
- Databricks released its Unity Catalog, which implements the REST protocol
- Nessie added support for the Iceberg REST catalog protocol
- Gravitino, which supports the REST protocol, was added to the incubator
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 years ago)
There are currently 27 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- No new PMC members. Last addition was Szehon Ho on 2023-04-20.
- Bryan Keller was added as committer on 2024-03-02
- Honah J. was added as committer on 2024-01-11
- Renjie Liu was added as committer on 2024-03-06

## Project Activity:
Releases:
- Java 1.5.0 was released on 2024-03-11
- Rust 0.2.0 was released on 2024-02-20 (first release!)
- PyIceberg 0.6.0 was released on 2024-02-19
- Java 1.4.3 was released on 2023-12-27

Java implementation:
- 1.5.0 is the first release supporting Iceberg Views
- Added View resolution support in Spark engine integration
- Added View commands to Spark (SHOW/CREATE/DROP/etc.)
- View support in Trino is unblocked by the 1.5.0 release
- Added View support to REST, Nessie, and JDBC catalogs
- Discussing Materialized View extensions to Iceberg specs
- Added EncryptingFileIO to minimize encryption-related API changes
- Added StandardEncryptionManager to implement Iceberg Encryption spec
- Added Parquet (native) and Avro (AES GCM) encryption support
- Added pagination to listing in the REST catalog protocol
- Discussing multiple extensions to the REST protocol (appends, planning)
- Added delete file cache to Spark
- Added support for Flink 1.18
- Removed support for Spark 3.2

PyIceberg Python implementation:
- 0.6.0 is the first release supporting native writes
- Append and full table overwrite are supported
- Only writes to unpartitioned tables are supported
- Added commit support to JDBC, Glue, and Hive catalogs
- Implemented name mapping support for reading Parquet files without field IDs
- Actively working on writes to partitioned tables and engine integration

Rust implementation:
- 0.2.0 is the first Rust release
- Supports reading metadata files
- Supports REST catalog interaction
- Scan planning is the next active area of work

Documentation:
- Switched to new site build in the iceberg repository so contributing is easier

## Community Health:
The Iceberg community continues to be healthy. Although commit and PR activity declined, the metrics indicate that activity was still strong (with 70 contributors and nearly 1,000 commits). This quarter also included holidays (which usually bring decreased activity) and a huge increase in mailing list traffic (60%) because the community has been holding many design discussions about evolving the REST spec, introducing new specs (materialized views), and keeping track of new design proposals.

The community also started organizing an Iceberg Summit, to be held May 14-15. The summit has been cleared by ASF Trademarks and the call for proposals has been posted.
More information can be found at:
* The Iceberg Summit website: https://iceberg-summit.org/
* The Call for Proposals: https://sessionize.com/iceberg-summit-2024/
@Jean-Baptiste: follow up with Iceberg PMC about committer requirements
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Iceberg was founded 2020-05-19 (4 years ago)
There are currently 24 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- No new PMC members. Last addition was Szehon Ho on 2023-04-20.
- Rushan Jiang was added as a committer on 2024-01-05.

## Project Activity:
Releases:
- 1.4.3 was released on 2023-12-27.
- 1.4.2 was released on 2023-11-02.
- Python 0.5.0 was released on 2023-09-18.

REST protocol spec:
- Considering an extension to delegate scan planning to the catalog
- Discussing how to exchange access decisions/restrictions for tables
- An extension was proposed for server-side commits

Java:
- Started planning for a 2.0.0 release to clean up deprecated APIs
- Added an encryption manager that supports Parquet native encryption
- Ongoing effort to add encryption for table metadata using AES GCM streams
- Added support for Flink 1.18
- Completed the view API and support in the REST and Nessie catalogs
- Added view read support in Spark
- Ongoing work to improve Spark delete file performance

PyIceberg:
- Write support is nearing completion

Rust:
- Working toward first release (documentation, additional tests)
- Readers and writers for manifests and manifest lists were committed

Documentation:
- The Iceberg site is moving back into the main repo to make contribution easier

## Community Health:
The project continues to be healthy, with no concerning changes to metrics. Technical progress is strong and growing in the new language implementations. The community also expects a proposal for a third-party-organized Iceberg conference.
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: none

## Membership Data:
Apache Iceberg was founded 2020-05-19 (3 years ago)
There are currently 24 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- No new PMC members. Last addition was Szehon Ho on 2023-04-20.
- No new committers. Last addition was Amogh Jahagirdar on 2023-04-25.

## Project Activity:
Releases:
* PyIceberg 0.4.0 was released on 2023-07-23
* 1.3.1 was released on 2023-07-25

Java:
* Preparing for a 1.4.0 release in Sept/Oct
* Added dependency bundles for AWS, GCP, and Azure
* Added Azure FileIO implementation
* Added API for multi-table commits
* Performance optimizations for delete file scan planning
* Spark: Implemented adaptive split sizing
* Spark: Implemented function pushdown in v2 expressions
* Flink: Added bucketing-only key-by strategy
* Build: Updated to Gradle version catalog
* Making progress on the reference implementation of common views
* Continuing work on table encryption

Python:
* 0.5.0 rc1 vote is under way
* Added support for serverless environments
* Implemented schema evolution
* Moved to Pydantic v2
* Added support for positional deletes
* Substantially improved Avro read performance
* Added conversion from Parquet to Iceberg schemas
* Added support for FSSpec and HDFS data
* Added SQL filter parsing

Rust:
* Created a repository for the Rust implementation, iceberg-rust
* 25 PRs merged
* Implemented base table metadata (e.g., types, transforms)
* Implemented visitors for working with nested structures
* Added Avro/Iceberg schema conversion
* Added build tooling

Go:
* Created a repository for the Go implementation, iceberg-go
* Added schema and types

## Community Health:
The largest development in the community is the addition of the Rust and Go repositories, which is reflected in the increase in code contributors this quarter. The new implementations will also lead to new committers and PMC members. The community has had good discussions about how to manage contributions, both to build confidence in the implementations and to help new contributors become familiar with the way the Apache community operates (along with ASF requirements like license documentation).

Two community metrics show decreases. Dev list traffic tends to vary because of how the community uses the dev list: mostly for large design discussions. The number of issues closed was also lower than normal, which is unexpected because that metric does not usually fluctuate much. We will take a look and see what caused the difference.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Project Status:
Current project status: Ongoing
Issues for the board: none

## Membership Data:
Apache Iceberg was founded 2020-05-19 (3 years ago)
There are currently 24 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- Fokko Driesprong was added to the PMC on 2023-04-06
- Steven Wu was added to the PMC on 2023-04-06
- Szehon Ho was added to the PMC on 2023-04-20
- Yufei Gu was added to the PMC on 2023-04-06
- Amogh Jahagirdar was added as committer on 2023-04-25
- Eduard Tudenhoefner was added as committer on 2023-04-25

## Project Activity:
* 1.3.0 was released on 2023-05-26
* 1.2.1 was released on 2023-04-01
* 1.2.0 was released on 2023-03-20

The 1.3.0 release added support for Spark 3.4 and Flink 1.17. It also included several updates and fixes, including:
* Better Spark file distribution for row-level plans like MERGE
* Improved bit density in the object storage layout
* Readable metrics in metadata tables
* Optimized vectorized reads for decimal types
* Spark timestamp_ntz and UUID support

The Python implementation is nearing a 0.4.0 release that will include:
* Delete file support
* Metadata updates for tables
* Improved compatibility

The community is also continuing to build a view specification, expand REST catalog support, and add encryption to the table spec.

## Community Health:
The community continues to be healthy, with most metrics steady this quarter.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (3 years ago)
There are currently 22 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Fokko Driesprong was added to the PMC on 2023-04-06
- Steven Wu was added to the PMC on 2023-04-06
- Yufei Gu was added to the PMC on 2023-04-06
- No new committers. Last addition was Steven Wu on 2022-10-07.

## Project Activity:
Releases:
* 1.2.0 was released on 2023-03-20, followed by 1.2.1 on 2023-04-11
* Python 0.3.0 was released on 2023-02-09

The Python implementation has reached feature parity with the "legacy" codebase, so the legacy code that was never part of an ASF release has been removed! The Python implementation now supports full read planning, including parallel metadata reads, manifest pruning, partition pruning, and column stats pruning. Python frameworks that use Apache Arrow, including Arrow compute, Pandas, DuckDB, and Ray, can use data from Iceberg tables. Write support is the next milestone for the Python implementation.

The Java implementation's latest release included several new capabilities:
* Branching and tagging, with support in Flink and Spark using VERSION AS OF
* Spark DDL for branches and tags
* Metadata query pushdown in Spark
* Changelog reads in Spark
* Throttling for streaming reads in Flink
* FileIO support for ORC readers and writers
* SigV4 support for REST catalog auth
* Remote signing client for S3
* The ability to read Snowflake Iceberg tables

There are also efforts to add encryption to the format and to support multi-table transactions. The community is also discussing a Rust or C++ implementation hosted by the ASF.
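Column stats pruning, one of the read-planning steps listed above, can be illustrated with a simplified sketch. Iceberg manifests record per-column lower and upper bounds for each data file; the sketch assumes a hypothetical `DataFileStats` record holding those bounds for a single integer column and evaluates only an equality predicate. It is not PyIceberg's API or its actual evaluator, just the underlying idea: a file is skipped only when its bounds prove that no row can match, and a file with missing bounds is always kept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFileStats:
    """Hypothetical per-file bounds for a single integer column."""
    path: str
    lower: Optional[int]  # None: no lower bound was recorded
    upper: Optional[int]  # None: no upper bound was recorded


def might_match_eq(stats: DataFileStats, value: int) -> bool:
    """Return False only if the bounds prove no row equals `value`."""
    if stats.lower is not None and value < stats.lower:
        return False
    if stats.upper is not None and value > stats.upper:
        return False
    # Inside the bounds, or bounds unknown: the file must be scanned.
    return True


def plan_files(files: List[DataFileStats], value: int) -> List[str]:
    """Keep only the files that may contain rows where column == value."""
    return [f.path for f in files if might_match_eq(f, value)]


files = [
    DataFileStats("a.parquet", 0, 99),
    DataFileStats("b.parquet", 100, 199),
    DataFileStats("c.parquet", None, None),  # stats missing
]
print(plan_files(files, 150))  # ['b.parquet', 'c.parquet']
```

The check is conservative by design: pruning may keep a file that turns out to have no matching rows, but it must never drop a file that does.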
## Community Health:
The community remains healthy, with a reasonable increase in both opened and closed pull requests, as well as a steady number of unique contributors. The Python implementation has been bringing in a lot of new contributors. Iceberg was featured in 14 talks at Subsurface, as well as in a panel.
No report was submitted.
@Sander: pursue a roll call for Iceberg
No report was submitted.
No report was submitted.
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 22 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- Fokko Driesprong was added as committer on 2022-08-21
- Steven Wu was added as committer on 2022-10-07
- Yufei Gu was added as committer on 2022-08-25

## Project Activity:
The community had 2 releases in the 0.14.x line and an initial Python release, 0.1.0. In addition, the vote for a 1.0.0 release is currently passing.

The Python release is the result of significant community effort and includes a new CLI utility (pyiceberg), support for Hive and REST catalogs, and the ability to read table metadata. The next goal is a 0.2.0 release that can handle query planning to enable reads in Python and Python-based engines.

The 1.0.0 JVM release adds API guarantees to the API module, but is closely based on 0.14.1 to make transitioning to a new major version simple. Next, the community is preparing a 1.1.0 release with significant new updates:
* The ability to read and write table branches
* Scan metrics reporting
* Support for Spark FunctionCatalog
* FLIP-27 reader support in Flink SQL
* Z-order support when rewriting or compacting data files
* Support for Puffin stats in table metadata

## Community Health:
The community continues to be healthy in terms of commits. The number of unique contributors decreased slightly, which indicates the community should ensure pull requests from contributors are getting enough attention. The increase in issues closed is due to setting up a stale issues bot to help keep issues fresh and relevant.
The community also added issue templates to make bug reports and feature requests clearer and more useful.

This year, there were 4 presentations about Iceberg at ApacheCon:
* Accelerate Data Lakehouse deployment with Apache Iceberg in Cloudera Data Platform (Attila Turoczy, Bill Zhang)
* Apache Iceberg's REST Catalog - A Gateway to Enriching Data Access via the Simplicity of an HTTP Service (Sam Redai)
* Iceberg's Best Secret: Exploring Metadata Tables (Szehon Ho)
* Integrated Audits: Streamlined Data Observability with Apache Iceberg (Sam Redai)

There were also 2 Iceberg presentations at Flink Forward:
* Batch Processing at Scale with Flink & Iceberg (Andreas Hailu)
* Tame the small files problem and optimize data layout for streaming ingestion to Iceberg (Steven Wu, Gang Ye)
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- No new committers. Last addition was Szehon Ho on 2022-03-07.

## Project Activity:
The community recently released 0.13.2 on 2022-06-13 and is currently voting on a candidate for the 0.14.0 release. 0.14.0 will be followed closely by 1.0.0, which will make API stability guarantees.

The 0.14.0 release contains significant new features, including:
* Support for Apache Spark 3.3
* Support for Apache Flink 1.15
* MERGE and UPDATE plans using row-level deletes
* A FLIP-27 reader for Flink
* The new REST catalog implementation with change-based commits
* A new file format for index and stats data, Puffin
* Z-order sorting when rewriting data files
* Range and tail reads for IO
* Additional metrics collection

The community has also been working on new features, including:
* Table-level statistics and data sketches using Puffin
* Table branching and tagging
* View metadata tracking
* Default values in schemas
* A native Python implementation

The Python implementation has been making good progress and may see a release next quarter. The project has also been working to improve documentation and has a new site design that tracks older versions.

## Community Health:
Community health continues to be good. The community's primary gauge of activity is pull requests and commits, and there were 1088 PRs opened this quarter (a 5% increase) and 780 commits (a slight decrease). The number of contributors, 79, was approximately the same as the previous quarter.
No report was submitted.
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Jack Ye on 2021-11-14.
- Szehon Ho was added as committer on 2022-03-07

## Project Activity:
Iceberg 0.13.0 was released on 2022-02-01 and was followed quickly by 0.13.1 on 2022-02-14 to fix a performance regression.

The 0.13 release included many significant new features:
* Spark 3.2 support and overhauled row-level plans
* Flink 1.13 and 1.14 support
* Spark and Flink modules built and tested against each engine version
* GCS and Aliyun OSS IO integration

The community has also been working on some major features:
* Delta-based MERGE INTO and UPDATE plans for Spark (complete)
* Scala 2.13 support for Spark 3.2 (complete)
* IO metrics collection (complete)
* Vectorized reads with delete files (complete)
* An implementation of table branching and tagging
* Addition of a REST catalog spec, like the Hive Thrift interface for Iceberg
* Addition of a view spec that tracks SQL or other plan representations
* Spec updates for secondary indexes and metrics
* Spec updates for default values

In addition to features, the community also overhauled the ASF site. The new site better communicates Iceberg's major features and has version-specific docs.

There were 6 Iceberg talks at the Subsurface conference, and the conference organizers noted that Iceberg was a major theme. In response to the last report, we were asked whether the presentations are available on the Iceberg site; they are listed under the Talks tab.

## Community Health:
Community health continues to be good.
There were some decreases this quarter in metrics like dev list traffic and issues opened, but this isn't concerning given the growth over the last few quarters and the fact that this report includes December, when many people are on holiday. In addition, the number of unique contributors increased to 81 (20% higher).
## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 years ago)
There are currently 18 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is 3:2.

Community changes, past quarter:
- Jack Ye was added to the PMC on 2021-11-14
- Russell Spitzer was added to the PMC on 2021-11-13
- No new committers. Last addition was Jack Ye on 2021-07-02.

## Project Activity:
0.12.1 was released on 2021-11-08. The community is also working on the next release, 0.13.0.

* A spec for table branching and tagging was written and is nearing completion
* Iceberg's documentation is being updated so that multiple versions can be easily maintained and updated.
* Delete file compaction was added to the rewrite files action and stored procedure. Additional compaction options are planned.
* Sort-based compaction was added
* Flink and Spark plugins have been refactored so that each version is independent and is compiled against the correct engine version. While this duplicates some code, it makes integrating new features easier and reduces the risk of runtime incompatibilities.
* Added support for Flink 1.14.x and Spark 3.2.x
* A REST catalog API spec is taking shape. This should standardize an interface for providing a table catalog, similar to the Thrift metastore interface used by Hive.
* Aliyun OSS support was added as an IO module
* The community decided on goals for a 1.0 release, targeted for early next year
* Python implementation is making progress

## Community Health:
Community metrics show healthy growth. Notably, there were 66 unique contributors this quarter, up from 50 last quarter. Total PRs submitted were more than 750, about 50% more than the 500 last quarter. Similarly, PRs closed increased to 682 from about 400 last quarter, a 64% increase.
The most significant stat is the increase in unique contributors, which signals that more people are interested in the project. This quarter, there were talks featuring Iceberg at AWS re:Invent (where Athena announced support), Trino Summit, and community events for PrestoDB, lakeFS, and SF Big Analytics.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (a year ago) There are currently 18 committers and 10 PMC members in this project. The Committer-to-PMC ratio is 9:5. Community changes, past quarter: - Zheng Hu was added to the PMC on 2021-06-28 - Jack Ye was added as committer on 2021-07-02 ## Project Activity: 0.12.0 was released on 2021-08-15 and is a significant update from 0.11.1. The community voted to adopt version 2 of the Iceberg table format that adds row-level updates and deletes. The community is also working on several improvements: * Preparing for 1.0 of the Java reference implementation * Adding an Iceberg specification for SQL views * Spark implementations of MERGE and UPDATE that use row-level deletes * Flink UPSERT support * Z-order specification * Relative path support for disaster recovery * Branching and tagging table snapshots * Additional storage integration modules (Dell EMC, Aliyun OSS) * Encryption ## Community Health: The community is healthy and continues to grow. Unique contributors grew by 2% this quarter to 50. Contributions increased to more than 500 PRs and more than 400 PRs were addressed. The community is discussing how to scale and coordinate with a published roadmap and GitHub projects to link the roadmap to individual issues. The community is also forming a new group of contributors around Python. This group has held sync meetings and is planning to make the Python API more Pythonic and to get to an initial Python release.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. Apologies that this report is late. The community will report next month if needed. ## Membership Data: Apache Iceberg was founded 2020-05-19 (a year ago) There are currently 17 committers and 9 PMC members in this project. The Committer-to-PMC ratio is roughly 9:5. Community changes, past quarter: - No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19. - Ted Gooch was added as committer on 2021-05-11 - Russell Spitzer was added as committer on 2021-04-02 - Ryan Murray was added as committer on 2021-03-26 - Yan Yan was added as committer on 2021-03-23 ## Project Activity: 0.11.1 was released on 2021-04-03. The community is currently working on the 0.12.0 release, which will update support for Spark 3.1 to fix the Iceberg SQL extensions. Several features were finished: * Spark UPDATE support was committed * Row identifier fields were added to schemas to support Flink UPSERT * An action to import existing data files was added * Hive integration has been updated to allow using multiple catalogs In addition, there are several on-going projects: * The community is working on updates for Spark 3.1 * Spark data file compaction strategies and a new implementation have been discussed and should be available in 0.12.0 * A design for encryption support has been proposed that will support Parquet and ORC encryption, as well as encryption for the metadata tree. * There have been design discussions for adding secondary indexes that can be updated asynchronously to keep commit latency low. 
* There have been design discussions for adding default field values * Support for Spark 3.0 structured streaming with the DSv2 API is under review * A DynamoDB catalog has been submitted as a PR ## Community Health: The community is healthy and showed an increase in contributors in the past quarter. New contributors are working on significant projects, like Spark streaming support and default values. The community also added 4 new committers in the past quarter!
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (9 months ago) There are currently 13 committers and 9 PMC members in this project. The Committer-to-PMC ratio is roughly 7:5. Community changes, past quarter: - No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19. - Peter Vary was added as committer on 2021-01-23 ## Project Activity: 0.11.0 was released on 2021-01-26 and included several important new features: * Support for partition evolution * Spark SQL extensions with support for MERGE INTO, DELETE FROM, and new DDL * Spark support for table maintenance through stored procedures * Streaming reads, filter pushdown, and experimental CDC writes in Flink * AWS module with better integration for S3 and Glue metastore * Nessie metastore module The community is working toward finalizing the v2 format spec and the next release. There is good progress on metrics collection for Avro data files, Hive integration, Spark UPDATE support, and more table maintenance actions. ## Community Health: The overall number of pull requests merged in the last quarter decreased from the previous quarter, but this is mostly due to annual holidays. Leading up to the release in January, the project set a new record at 78 PRs merged in a week. More importantly, although there was a decrease in PRs merged, the number of contributors still increased slightly, to 50 code contributors in the quarter. Peter Vary was also added as a committer in January.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (7 months ago) There are currently 12 committers and 9 PMC members in this project. The Committer-to-PMC ratio is 4:3. Community changes, past quarter: - No new PMC members. Last addition was Anton Okolnychyi on 2020-05-19. - Jingsong Lee was added as committer on 2020-10-09 - Zheng Hu was added as committer on 2020-10-09 ## Project Activity: Recent releases: * 0.10.0 was released on 2020-11-11. The 0.10.0 release included: * A new Flink module supporting DataStreams and SQL writes and (batch) reads * A new Hive module supporting reads * Row-level delete implementation, part of the v2 spec, for engine integration More recently, the community has added: * Stored procedures for Spark that perform table maintenance from SQL * New catalog implementations for Nessie and Glue * Writers to support Flink CDC events and Spark MERGE plans * Handling for NaN values in metadata, and NaN predicates The project is making significant progress. ## Community Health: Community activity continues to increase. Recent video sync calls have had 20+ participants, code contributions are increasing in frequency (588 PRs opened and 552 PRs closed), and there are many new community members joining in. The community added two new committers this quarter.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (4 months ago) There are currently 10 committers and 9 PMC members in this project. The Committer-to-PMC ratio is roughly 1:1. Community changes, past quarter: - No new PMC members since graduation on 2020-05-19 - Shardul Mahadik was added as committer on 2020-07-25 ## Project Activity: Recent releases: * 0.9.0 was released on 2020-07-13 * 0.9.1 was released on 2020-08-14 The community expects to release 0.10.0 soon with support for Hive reads, Flink writes, and the utilities needed to implement row-level deletes in external processing engines, like Presto. Notable improvements this month include: * Implemented end-to-end row-level deletes in the client library (direct reads) * Committed Flink write support for both DataStreams and SQL * Added Hive predicate pushdown and a runtime bundle * Committed name mapping support for reading ORC files from non-Iceberg tables * Added a new snapshot expiration action that runs in parallel using Spark * Added metadata to configure tables with a preferred sort order The community is actively working on Hive column pruning, Hive write support, Flink read support, and row-level deletes in more processing engines. ## Community Health: The number of unique contributors increased in the last month to 26, from the previous high watermark of 21. Contributions are still healthy, with 74 commits in the past month. New community members have been contributing documentation and build improvements (PR labels, fixing warnings); it is great to have these valuable contributions in addition to features and bug fixes.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (2 months ago) There are currently 10 committers and 9 PMC members in this project. The Committer-to-PMC ratio is roughly 1:1. Community changes, past quarter: - No new PMC members (project graduated recently). - Shardul Mahadik was added as committer on 2020-07-25 ## Project Activity: 0.9.0 was released, including support for Spark 3 and SQL DDL commands, support for JDK 11, vectorized Parquet reads, and an action to compact data files. Since the 0.9.0 release, the community has made progress in several areas: - The Hive StorageHandler now provides access to query Iceberg tables (work is ongoing to implement projection and predicate pushdown). - Flink integration has made substantial progress toward using native RowData, and the first stage of the Flink sink (data file writers) has been committed. - An action to expire snapshots using Spark was added and is an improvement on the incremental approach because it compares the reachable file sets. - The implementation of row-level deletes is nearing completion. Scan planning now supports delete files, merge-based and set-based row filters have been committed, and delete file writers are under review. The delete file writers allow storing deleted row data in support of Flink CDC use cases. Releases: - 0.9.0 was released on 2020-07-13 - 0.9.1 has an ongoing vote ## Community Health: The month since the last report has been one of the busiest since the project started. 80 pull requests were merged in the last 4 weeks, and more importantly, came from 21 different contributors. Both of these are new high watermarks. Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive queries against Iceberg tables and working with petabyte-scale Iceberg tables. 
Iceberg was also mentioned in the keynotes.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (2 months ago) There are currently 9 committers and 9 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members (project graduated recently). - No new committers were added. ## Project Activity: In July, the community held one sync meeting to discuss general topics, and one specifically to discuss how to include both groups that have been working on integration with Hive. To address the question raised on the last board report, the community sync meetings are video conferences that anyone in the community is welcome to attend. The discussion is documented and summarized for anyone who can't attend. We have found these to be a good way to exchange context and ideas more quickly, but recognize that this isn't the best way for some people to participate, and so we don't consider these a forum for making decisions or voting. If we come to a tentative conclusion on a topic, it is still open for further discussion on the dev list. The idea for this comes from the Parquet community, which has been doing this for several years. Development activity: * Spark vectorized reads for flat schemas were merged and benchmarked * The Spark 3 integration branch was merged into master * Name mapping for Parquet files without IDs was committed * An action to compact data files was added * Support was added for managing and adding delete files in table metadata * Refactoring to support reusing Spark components for Flink * Several PRs for Flink support have been committed and more are open * CI tests for JDK 11 have been added The community also plans to release 0.9.0 with Spark 3 support soon. 
## Community Health: Most community metrics have again increased in the last month, although dev list traffic is a bit lower. More importantly, the community has made further progress on several large areas with different groups leading the efforts, like Hive support, Spark 3 support, and Flink support.
## Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of use. ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Iceberg was founded 2020-05-19 (21 days ago) There are currently 9 committers and 9 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members (project graduated recently). - No new committers were added. ## Project Activity: There were two community syncs in May, with good discussions on adding secondary indexes and fixing some persistent issues, like Guava library conflicts and how to support multiple Spark versions. Development activity: - Row-level delete progress continues with several PRs merged - Added support for ORC predicate push-down and metrics filtering, which is a significant step toward performance parity with Parquet - The vectorized Parquet read path is passing end-to-end tests for flat data - Guava is now shaded and relocated, unblocking integration with Hive - The build changed dependency locking plugins to unblock Hive and Spark 3 work - Flink contributors opened pull requests to merge the prototype sink ## Community Health: Nearly all metrics (list traffic, pull requests, and issues opened) are showing an increase in the last month, and the community has made significant progress on several large extensions (ORC and Flink, notably).
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Iceberg Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache Iceberg Project be and hereby is responsible for the creation and maintenance of software related to managing huge analytic datasets using a standard at-rest table format that is designed for high performance and ease of use; and be it further RESOLVED, that the office of "Vice President, Apache Iceberg" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Iceberg Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Iceberg Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Iceberg Project: * Anton Okolnychyi <aokolnychyi@apache.org> * Carl Steinbach <cws@apache.org> * Daniel C. Weeks <dweeks@apache.org> * James R. Taylor <jamestaylor@apache.org> * Julien Le Dem <julien@apache.org> * Owen O'Malley <omalley@apache.org> * Parth Brahmbhatt <parth@apache.org> * Ratandeep Ratti <rdsr@apache.org> * Ryan Blue <blue@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ryan Blue be appointed to the office of Vice President, Apache Iceberg, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the Apache Iceberg Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Iceberg podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator Iceberg podling encumbered upon the Apache Incubator PMC are hereafter discharged. Special Order 7G, Establish the Apache Iceberg Project, was approved by Unanimous Vote of the directors present.
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. ### Three most important unfinished issues to address before graduating: 1. Grow the Iceberg community 2. Add more committers and PPMC members ### Are there any issues that the IPMC or ASF Board need to be aware of? No issues. ### How has the community developed since the last report? In the 4 months since the last report, 138 pull requests were merged for an average of 34.5 per month. While this is down from the previous monthly average of 49.6 per month for June through August, this contribution rate is still very active and healthy. Contributions are coming from a regular group of contributors outside of the initial set of committers, which is a positive indication for adding new committers and PPMC members over the next few months. The community released the first version of Apache Iceberg, 0.7.0-incubating. This release used the "standard" incubator disclaimer and included convenience binaries. The release candidate votes were very active with community members testing out the release and reporting problems. There was an Apache Iceberg talk at ApacheCon NA in September. ### How has the project developed since the last report? - The community is building support for the upcoming Spark 3.0 release - The first PR from the vectorization branch has been merged into master - Support for IN and NOT IN predicates was contributed - The Python implementation added support for Hive metastore tables, and its read path is nearly ready to commit - Flaky tests have been fixed - Baseline checks (style, errorprone, findbugs) are now applied to all modules ### How would you assess the podling's maturity? Please feel free to add your own commentary. - [ ] Initial setup - [ ] Working towards first release - [x] Community building - [x] Nearing graduation - [ ] Other: ### Date of last release: - 0.7.0-incubating was released 25 October 2019 ### When were the last committers or PPMC members elected? 
- Anton Okolnychyi was added 30 August 2019 ### Have your mentors been helpful and responsive? Yes. 4 of 5 mentors voted on the 0.7.0-incubating IPMC vote. Thanks to our mentors for being active! ### Is the PPMC managing the podling's brand / trademarks? Yes, the podling is managing the brand and is not aware of any issues. The project name has been approved. ### Signed-off-by: - [x] (iceberg) Ryan Blue Comments: - [ ] (iceberg) Julien Le Dem Comments: - [X] (iceberg) Owen O'Malley Comments: - [ ] (iceberg) James Taylor Comments: - [ ] (iceberg) Carl Steinbach Comments: ### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. ### Three most important unfinished issues to address before graduating: 1. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1) 2. Grow the Iceberg community 3. Add more committers and PPMC members ### Are there any issues that the IPMC or ASF Board need to be aware of? No issues. ### How has the community developed since the last report? The community continues to grow steadily. In the last month: * 59 pull requests have been merged * 17 people contributed the merged PRs * 18 issues have been closed, 22 issues were opened For comparison, the last report had 74 pull requests merged over 3 months. ### How has the project developed since the last report? * License documentation has been completed for the Java project, unblocking the first release * Added more documentation to iceberg.apache.org * Started vectorized read branch with significantly better performance * Added metadata tables * Added configuration to control statistics and truncate long values * Improved Hive Metastore integration * A working python read path has been submitted in PRs ### How would you assess the podling's maturity? - [ ] Initial setup - [x] Working towards first release - [x] Community building - [x] Nearing graduation - [ ] Other: ### Date of last release: * No release yet ### When were the last committers or PPMC members elected? * Anton Okolnychyi was added 30 August 2019 ### Have your mentors been helpful and responsive? Yes ### Signed-off-by: - [x] (iceberg) Ryan Blue Comments: - [ ] (iceberg) Julien Le Dem Comments: - [X] (iceberg) Owen O'Malley Comments: The project also gave two presentations: * Berlin Buzzwords (June 2019) * ApacheCon NA (Sep 2019) Iceberg is being used in production at Netflix on huge tables, up to 25 petabytes. 
- [X] (iceberg) James Taylor Comments: - [X] (iceberg) Carl Steinbach Comments: Approval added by Ryan Blue, Carl had trouble editing the new report location ### IPMC/Shepherd notes: Justin Mclean: The included stats don't really mean much to anyone outside of your project, please drop them from future reports. The community growth section might as well be blank. I find it surprising that this project thinks that it is near graduation. Please discuss this with your mentors.
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. ### Three most important unfinished issues to address before graduating: 1. Update build for Apache release, add LICENSE/NOTICE to Jars. 2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1) 3. Grow the Iceberg community ### Are there any issues that the IPMC or ASF Board need to be aware of? * No issues that require attention. ### How has the community developed since the last report? * Community growth has continued with several new contributors and reviewers * Community has decided on style and added checking to CI for most modules * Community has started work on extending the spec for new use cases ### How has the project developed since the last report? * Much more content on iceberg.apache.org has been added * 74 pull requests have been merged, many reviewed by new community members * Work has begun to add row-level deletes and upserts to the format * Added support for Spark streaming, a catalog API, and numerous bug fixes * Contributors are reviewing code, submitting substantial features, and improving dev practices ### How would you assess the podling's maturity? Please feel free to add your own commentary. - [ ] Initial setup (name clearance approval pending) - [X] Working towards first release - [X] Community building - [ ] Nearing graduation - [ ] Other: ### Date of last release: None yet ### When were the last committers or PPMC members elected? None yet ### Have your mentors been helpful and responsive? Yes. ### Signed-off-by: - [X](iceberg) Ryan Blue Comments: I wrote the first pass of the report. - [ ](iceberg) Julien Le Dem Comments: - [X](iceberg) Owen O'Malley Comments: +1 from discussion on dev list - [ ](iceberg) James Taylor Comments: - [ ](iceberg) Carl Steinbach Comments: ### IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. Three most important issues to address in the move towards graduation: 1. Update build for Apache release, add LICENSE/NOTICE to Jars. 2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1) 3. Grow the Iceberg community Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * No issues that require attention. How has the community developed since the last report? * The community has continued to receive new contributors * Several contributors are reliably helping review pull requests. Because of these review contributions and the small number of committers, the community voted to relax the RTC requirements and allow committers to push their own changes if the community has reviewed the PR. This helps develop reviewers and gets changes in faster. The vote also set reasonable limits for this practice: PRs must be up for at least 2 days, and this applies only for the first year, while we are working with a small set of committers. How has the project developed since the last report? * Podling name search concluded that Iceberg is a suitable name. (See PODLINGNAMESEARCH-163) * The community voted to accept a large PR with a Python implementation. * Contributors are fixing important predicate push-down issues, including case sensitivity, filtering on nested types, missing file metrics, etc. * Contributors added support for plugging in file stream encryption. How would you assess the podling's maturity? Please feel free to add your own commentary. [ ] Initial setup (name clearance approval pending) [X] Working towards first release [X] Community building [ ] Nearing graduation [ ] Other: Date of last release: None yet When were the last committers or PPMC members elected? None yet Have your mentors been helpful and responsive or are things falling through the cracks? 
In the latter case, please list any open issues that need to be addressed. Yes. Signed-off-by: [X](iceberg) Ryan Blue Comments: I wrote the first pass of the report. [ ](iceberg) Julien Le Dem Comments: [X](iceberg) Owen O'Malley Comments: (Approval copied from +1 on dev list) [ ](iceberg) James Taylor Comments: [ ](iceberg) Carl Steinbach Comments:
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. Three most important issues to address in the move towards graduation: 1. Update build for Apache release, add LICENSE/NOTICE to Jars. 2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1) 3. Grow the Iceberg community Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * No issues that require attention. How has the community developed since the last report? * Pull requests from 6 contributors were merged, 7 new contributors How has the project developed since the last report? * Submitted evidence for podling name search: PODLINGNAMESEARCH-163 * Netflix submitted a revised trademark agreement for counter-signing * Abstracted data file locations for community use cases * Reviewing proposed API update for file stream encryption plugins * New contributor highlights: - A new contributor is fixing case sensitivity in expressions - A new contributor opened a PR to add a startsWith predicate - A new contributor reviewed 4 pull requests and opened another How would you assess the podling's maturity? Please feel free to add your own commentary. [X] Initial setup (name clearance approval pending) [X] Working towards first release [X] Community building [ ] Nearing graduation [ ] Other: Date of last release: None yet When were the last committers or PPMC members elected? None yet Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. Yes. Signed-off-by: [X](iceberg) Ryan Blue Comments: dev list traffic appears to be increasing also [ ](iceberg) Julien Le Dem Comments: [ ](iceberg) Owen O'Malley Comments: [x](iceberg) James Taylor Comments: [X](iceberg) Carl Steinbach Comments: From dev list: "Looks good to me. +1" IPMC/Shepherd notes:
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. Three most important issues to address in the move towards graduation: 1. Finish the name clearance and trademark agreement. 2. Make the first Apache release. (https://github.com/apache/incubator-iceberg/milestone/1) 3. Grow the Iceberg community Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * Gitbox traffic is now going to issues@. The community was losing dev@ subscribers because of the high volume of traffic from Gitbox. However, now all updates are sent to issues@. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@. * The trademark agreement proposed by Netflix was not acceptable to the ASF. It would be helpful if the ASF published the terms that the ASF requires to avoid trial and error. Netflix is drafting a new agreement. How has the community developed since the last report? * Moved Gitbox notifications to avoid loss of dev@ subscribers (self-reported leaving dev@). * New contributor activity: 3 new issues opened, 4 PRs submitted * 5 PRs from non-committers merged * 2 contributors started reviewing PRs * New design doc proposed by a community contributor * Moved issues from Netflix repository to Apache repository How has the project developed since the last report? * Planned blockers for first release, 0.1.0, in milestone 1 * Partial Python implementation submitted * Manifest listing file added to the spec and implementation committed (blocker for initial release). Resulted in a significant improvement in query planning time for large tables. * Abstracted file IO API to support community use cases * Reviewing community proposal for external plugins to support file-level encryption * Added doc strings to schemas How would you assess the podling's maturity? Please feel free to add your own commentary. 
[X] Initial setup (name clearance pending) [X] Working towards first release [ ] Community building [ ] Nearing graduation [ ] Other: Date of last release: None yet When were the last committers or PPMC members elected? None yet Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. Last month was December, so traffic has been low and both PPMC members and mentors were slow to respond. This is not abnormal, but the PPMC missed the deadline to file this report. We will ensure this doesn't recur. Signed-off-by: [X](iceberg) Ryan Blue Comments: I wrote the first pass of the report, but after the deadline. [ ](iceberg) Julien Le Dem Comments: [X](iceberg) Owen O'Malley Comments: Approval from +1 on dev list. [ ](iceberg) James Taylor Comments: [ ](iceberg) Carl Steinbach Comments:
Iceberg is a table format for large, slow-moving tabular data. Iceberg has been incubating since 2018-11-16. Three most important issues to address in the move towards graduation: 1. Get the SGA accepted. 2. Finish the name clearance. 3. Make the first Apache release. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * Gitbox integration has helped a lot, although it is frustrating that the team members are not allowed to configure the project and must go through infra for every change. * The traffic on the dev list from GitHub pull requests and issues is pretty heavy. It would be nice to have emails from creation go to dev@, while updates and resolutions would go to issues@. How has the community developed since the last report? This is the first report. How has the project developed since the last report? This is the first report. Both the software grant and trademark agreements have been submitted. Code has been imported and updated to use the ASF license header. LICENSE and NOTICE files have been updated to comply with ASF policy. Podling website is up at https://iceberg.apache.org. How would you assess the podling's maturity? Please feel free to add your own commentary. [X] Initial setup [ ] Working towards first release [ ] Community building [ ] Nearing graduation [ ] Other: Date of last release: None yet When were the last committers or PPMC members elected? None yet Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. We're working through the issues as they come up. Signed-off-by: [X](iceberg) Ryan Blue Comments: [ ](iceberg) Julien Le Dem Comments: [X](iceberg) Owen O'Malley Comments: I wrote the first pass of the report. [X](iceberg) James Taylor Comments: [X](iceberg) Carl Steinbach Comments: IPMC/Shepherd notes: