This was extracted (@ 2020-09-29 22:10) from a list of minutes which have been approved by the Board.
Please note: The Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

Meeting times vary; the exact schedule is available to ASF Members and Officers, who can search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

Iceberg

16 Sep 2020 [Ryan Blue / Sam]

Report was filed, but display is awaiting the approval of the Board minutes.

19 Aug 2020 [Ryan Blue / Shane]

## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed
for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Shardul Mahadik was added as committer on 2020-07-25

## Project Activity:
0.9.0 was released, including support for Spark 3 and SQL DDL commands, support
for JDK 11, vectorized Parquet reads, and an action to compact data files.

Since the 0.9.0 release, the community has made progress in several areas:
- The Hive StorageHandler now provides access to query Iceberg tables
 (work is ongoing to implement projection and predicate pushdown).
- Flink integration has made substantial progress toward using native RowData,
 and the first stage of the Flink sink (data file writers) has been committed.
- An action to expire snapshots using Spark was added and is an improvement on
 the incremental approach because it compares the reachable file sets.
- The implementation of row-level deletes is nearing completion. Scan planning
 now supports delete files, merge-based and set-based row filters have been
 committed, and delete file writers are under review. The delete file writers
 allow storing deleted row data in support of Flink CDC use cases.
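
The set comparison behind the snapshot expiration action can be sketched as
follows. This is a minimal illustration, not Iceberg's implementation or API:
the `snapshots` mapping, the file names, and both helper functions are
hypothetical, and a real table tracks files through manifests rather than a
plain dictionary. The idea shown is only that a file is safe to remove when it
is reachable before expiration but not after.

```python
from typing import Dict, Iterable, Set


def reachable_files(snapshots: Dict[int, Set[str]], snapshot_ids: Iterable[int]) -> Set[str]:
    """Return the union of all files referenced by the given snapshots."""
    files: Set[str] = set()
    for snapshot_id in snapshot_ids:
        files |= snapshots[snapshot_id]
    return files


def files_to_delete(snapshots: Dict[int, Set[str]], expired_ids: Iterable[int]) -> Set[str]:
    """Files reachable before expiration that no retained snapshot references."""
    expired = set(expired_ids)
    retained = [sid for sid in snapshots if sid not in expired]
    before = reachable_files(snapshots, snapshots.keys())
    after = reachable_files(snapshots, retained)
    return before - after


# Hypothetical table history: snapshot id -> data files it references.
snapshots = {
    1: {"a.parquet", "b.parquet"},
    2: {"b.parquet", "c.parquet"},
    3: {"c.parquet", "d.parquet"},
}

# Expiring snapshots 1 and 2 leaves only the files that snapshot 3 still needs.
print(sorted(files_to_delete(snapshots, expired_ids=[1, 2])))
```

Comparing whole reachable sets means a file shared with a retained snapshot
(here `c.parquet`) is never selected for deletion, regardless of the order in
which snapshots are expired.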

Releases:
- 0.9.0 was released on 2020-07-13
- 0.9.1 has an ongoing vote

## Community Health:
The month since the last report has been one of the busiest since the project
started. 80 pull requests were merged in the last 4 weeks and, more importantly,
they came from 21 different contributors. Both of these are new high-water marks.

Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive
queries against Iceberg tables and working with petabyte-scale Iceberg tables.
Iceberg was also mentioned in the keynotes.

15 Jul 2020 [Ryan Blue / Craig]

## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed
for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
In July, the community held one sync meeting to discuss general topics, and
one specifically to discuss how to include both groups that have been working
on integration with Hive.

To address the question on the last board report, the community sync meetings
are video conferences that anyone in the community is welcome to attend. The
discussion is documented and summarized for anyone that can't attend. We have
found these to be a good way to exchange context and ideas more quickly, but
recognize that this isn't the best way for some people to participate and so
we don't consider these a forum for making decisions or voting. If we come to
a tentative conclusion on a topic, it is still open for further discussion
on the dev list. The idea for this comes from the Parquet community that has
been doing this for several years.

Development activity:
* Spark vectorized reads for flat schemas were merged and benchmarked
* The Spark 3 integration branch was merged into master
* Name mapping for Parquet files without IDs was committed
* An action to compact data files was added
* Support was added for managing and adding delete files in table metadata
* Refactoring to support reusing Spark components for Flink
* Several PRs for Flink support have been committed and more are open
* CI tests for JDK 11 have been added

The community also plans to release 0.9.0 with Spark 3 support soon.

## Community Health:
Most community metrics have again increased in the last month, although dev
list traffic is a bit lower. More importantly, the community has made further
progress on several large areas with different groups leading the efforts,
like Hive support, Spark 3 support, and Flink support.

17 Jun 2020 [Ryan Blue / Roy]

## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed
for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (21 days ago)
There are currently 9 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
There were two community syncs in May, with good discussions on adding secondary
indexes and fixing some persistent issues, like Guava library conflicts and how
to support multiple Spark versions.

Development activity:
- Row-level delete progress continues with several PRs merged
- Added support for ORC predicate push-down and metrics filtering, which is a
 significant step toward performance parity with Parquet
- The vectorized Parquet read path is passing end-to-end tests for flat data
- Guava is now shaded and relocated, unblocking integration with Hive
- The build changed dependency locking plugins to unblock Hive and Spark 3 work
- Flink contributors opened pull requests to merge the prototype sink

## Community Health:
Nearly all metrics (list traffic, pull requests, and issues opened) are showing
an increase in the last month, and the community has made significant progress
on several large extensions (ORC and Flink, notably).

20 May 2020

Establish the Apache Iceberg Project

 WHEREAS, the Board of Directors deems it to be in the best interests of
 the Foundation and consistent with the Foundation's purpose to establish
 a Project Management Committee charged with the creation and maintenance
 of open-source software, for distribution at no charge to the public,
 related to managing huge analytic datasets using a standard at-rest
 table format that is designed for high performance and ease of use.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
 (PMC), to be known as the "Apache Iceberg Project", be and hereby is
 established pursuant to Bylaws of the Foundation; and be it further

 RESOLVED, that the Apache Iceberg Project be and hereby is responsible
 for the creation and maintenance of software related to managing huge
 analytic datasets using a standard at-rest table format that is designed
 for high performance and ease of use; and be it further

 RESOLVED, that the office of "Vice President, Apache Iceberg" be and
 hereby is created, the person holding such office to serve at the
 direction of the Board of Directors as the chair of the Apache Iceberg
 Project, and to have primary responsibility for management of the
 projects within the scope of responsibility of the Apache Iceberg
 Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby are
 appointed to serve as the initial members of the Apache Iceberg Project:

  * Anton Okolnychyi <aokolnychyi@apache.org>
  * Carl Steinbach   <cws@apache.org>
  * Daniel C. Weeks  <dweeks@apache.org>
  * James R. Taylor  <jamestaylor@apache.org>
  * Julien Le Dem    <julien@apache.org>
  * Owen O'Malley    <omalley@apache.org>
  * Parth Brahmbhatt <parth@apache.org>
  * Ratandeep Ratti  <rdsr@apache.org>
  * Ryan Blue        <blue@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Ryan Blue be appointed to
 the office of Vice President, Apache Iceberg, to serve in accordance
 with and subject to the direction of the Board of Directors and the
 Bylaws of the Foundation until death, resignation, retirement, removal
 or disqualification, or until a successor is appointed; and be it
 further

 RESOLVED, that the Apache Iceberg Project be and hereby is tasked with
 the migration and rationalization of the Apache Incubator Iceberg
 podling; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache Incubator
 Iceberg podling encumbered upon the Apache Incubator PMC are hereafter
 discharged.

 Special Order 7G, Establish the Apache Iceberg Project, was
 approved by Unanimous Vote of the directors present.

15 Jan 2020

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:

 1. Grow the Iceberg community
 2. Add more committers and PPMC members

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No issues.

### How has the community developed since the last report?

 In the 4 months since the last report, 138 pull requests were merged for an
 average of 34.5 per month. While this is down from the previous monthly
 average of 49.6 per month for June through August, this contribution rate
 is still very active and healthy. Contributions are coming from a regular
 group of contributors outside of the initial set of committers, which is a
 positive indication for adding new committers and PPMC members over the
 next few months.

 The community released the first version of Apache Iceberg,
 0.7.0-incubating. This release used the "standard" incubator disclaimer and
 included convenience binaries. The release candidate votes were very active
 with community members testing out the release and reporting problems.

 There was an Apache Iceberg talk at ApacheCon NA in September.

### How has the project developed since the last report?

 - The community is building support for the upcoming Spark 3.0 release
 - The first PR from the vectorization branch has been merged into master
 - Support for IN and NOT IN predicates was contributed
 - Python added support for Hive metastore tables and the read path is
 near commit
 - Flaky tests have been fixed
 - Baseline checks (style, errorprone, findbugs) are now applied to all
 modules

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [x] Community building
 - [x] Nearing graduation
 - [ ] Other:

### Date of last release:

 - 0.7.0-incubating was released 25 October 2019

### When were the last committers or PPMC members elected?

 - Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?

 Yes. 4 of 5 mentors voted on the 0.7.0-incubating IPMC vote. Thanks to our
 mentors for being active!

### Is the PPMC managing the podling's brand / trademarks?

 Yes, the podling is managing the brand and is not aware of any issues.
 The project name has been approved.

### Signed-off-by:

 - [x] (iceberg) Ryan Blue
    Comments:
 - [ ] (iceberg) Julien Le Dem
    Comments:
 - [X] (iceberg) Owen O'Malley
    Comments:
 - [ ] (iceberg) James Taylor
    Comments:
 - [ ] (iceberg) Carl Steinbach
    Comments:

### IPMC/Shepherd notes:

18 Sep 2019

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:

 1. Make the first Apache release.
 (https://github.com/apache/incubator-iceberg/milestone/1)
 2. Grow the Iceberg community
 3. Add more committers and PPMC members

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No issues.

### How has the community developed since the last report?

 The community continues to grow steadily. In the last month:
 * 59 pull requests have been merged
 * 17 people contributed the merged PRs
 * 18 issues have been closed, 22 issues were opened

 For comparison, the last report had 74 pull requests merged over 3 months.

### How has the project developed since the last report?

 * License documentation has been completed for the Java project, unblocking
 the first release
 * Added more documentation to iceberg.apache.org
 * Started vectorized read branch with significantly better performance
 * Added metadata tables
 * Added configuration to control statistics and truncate long values
 * Improved Hive Metastore integration
 * A working Python read path has been submitted in PRs

### How would you assess the podling's maturity?

 - [ ] Initial setup
 - [x] Working towards first release
 - [x] Community building
 - [x] Nearing graduation
 - [ ] Other:

### Date of last release:

 * No release yet

### When were the last committers or PPMC members elected?

 * Anton Okolnychyi was added 30 August 2019

### Have your mentors been helpful and responsive?

 Yes

### Signed-off-by:

 - [x] (iceberg) Ryan Blue
    Comments:
 - [ ] (iceberg) Julien Le Dem
    Comments:
 - [X] (iceberg) Owen O'Malley
    Comments:
      The project also gave two presentations:
        * Berlin Buzzwords (June 2019)
        * ApacheCon NA (Sep 2019)
      Iceberg is being used in production at Netflix on huge tables, up to
      25 petabytes.

 - [X] (iceberg) James Taylor
    Comments:
 - [X] (iceberg) Carl Steinbach
    Comments:
      Approval added by Ryan Blue, Carl had trouble editing the new report
      location

### IPMC/Shepherd notes:
 Justin Mclean: The included stats don't really mean much to anyone
 outside of your project, please drop them from future reports.
 The community growth section might as well be blank.
 I find it surprising that this project thinks that it is near graduation.
 Please discuss this with your mentors.

17 Jul 2019

Iceberg is a table format for large, slow-moving tabular data.
Iceberg has been incubating since 2018-11-16.

### Three most important unfinished issues to address before graduating:

 1. Update build for Apache release, add LICENSE/NOTICE to Jars.
 2. Make the first Apache release.
 (https://github.com/apache/incubator-iceberg/milestone/1)
 3. Grow the Iceberg community

### Are there any issues that the IPMC or ASF Board need to be aware of?

 * No issues that require attention.

### How has the community developed since the last report?

 * Community growth has continued with several new contributors and
 reviewers
 * Community has decided on style and added checking to CI for most modules
 * Community has started work on extending the spec for new use cases

### How has the project developed since the last report?

 * Much more content on iceberg.apache.org has been added
 * 74 pull requests have been merged, many reviewed by new community
 members
 * Work has begun to add row-level deletes and upserts to the format
 * Added support for Spark streaming, a catalog API, and numerous bug fixes
 * Contributors are reviewing code, submitting substantial features, and
 improving dev practices

### How would you assess the podling's maturity?
 Please feel free to add your own commentary.

 - [ ] Initial setup (name clearance approval pending)
 - [X] Working towards first release
 - [X] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:

 None yet

### When were the last committers or PPMC members elected?

 None yet

### Have your mentors been helpful and responsive?

 Yes.

### Signed-off-by:

 - [X](iceberg) Ryan Blue
    Comments: I wrote the first pass of the report.
 - [ ](iceberg) Julien Le Dem
    Comments:
 - [X](iceberg) Owen O'Malley
    Comments: +1 from discussion on dev list
 - [ ](iceberg) James Taylor
    Comments:
 - [ ](iceberg) Carl Steinbach
    Comments:

### IPMC/Shepherd notes:

20 Mar 2019

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

 1. Update build for Apache release, add LICENSE/NOTICE to Jars.
 2. Make the first Apache release.
 (https://github.com/apache/incubator-iceberg/milestone/1)
 3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * No issues that require attention.

How has the community developed since the last report?

 * The community has continued to receive new contributors
 * Several contributors are reliably helping review pull requests. Because
   of these review contributions and the small number of committers, the
   community voted to relax the RTC requirements and allow committers to
   push their own changes if the community has reviewed the PR. This helps
   develop reviewers and gets changes in faster. The vote also set reasonable
   limits for this practice: PRs must be up for at least 2 days and this is only
   for the first year, while we are working with a small set of committers.

How has the project developed since the last report?

 * Podling name search concluded that Iceberg is a suitable name.
   (See PODLINGNAMESEARCH-163)
 * The community voted to accept a large PR with a Python implementation.
 * Contributors are fixing important predicate push-down issues, including
   case sensitivity, filtering on nested types, missing file metrics, etc.
 * Contributors added support for plugging in file stream encryption.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup (name clearance approval pending)
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 None yet

When were the last committers or PPMC members elected?

 None yet

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Yes.

Signed-off-by:

 [X](iceberg) Ryan Blue
    Comments: I wrote the first pass of the report.
 [ ](iceberg) Julien Le Dem
    Comments:
 [X](iceberg) Owen O'Malley
    Comments: (Approval copied from +1 on dev list)
 [ ](iceberg) James Taylor
    Comments:
 [ ](iceberg) Carl Steinbach
    Comments:

20 Feb 2019

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

 1. Update build for Apache release, add LICENSE/NOTICE to Jars.
 2. Make the first Apache release.
 (https://github.com/apache/incubator-iceberg/milestone/1)
 3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * No issues that require attention.

How has the community developed since the last report?

 * Pull requests from 6 contributors were merged, and there were 7 new
 contributors

How has the project developed since the last report?

 * Submitted evidence for podling name search: PODLINGNAMESEARCH-163
 * Netflix submitted a revised trademark agreement for counter-signing
 * Abstracted data file locations for community use cases
 * Reviewing proposed API update for file stream encryption plugins
 * New contributor highlights:
   - A new contributor is fixing case sensitivity in expressions
   - A new contributor opened a PR to add a startsWith predicate
   - A new contributor reviewed 4 pull requests and opened another

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [X] Initial setup (name clearance approval pending)
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 None yet

When were the last committers or PPMC members elected?

 None yet

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Yes.

Signed-off-by:

 [X](iceberg) Ryan Blue
    Comments: dev list traffic appears to be increasing also
 [ ](iceberg) Julien Le Dem
    Comments:
 [ ](iceberg) Owen O'Malley
    Comments:
 [x](iceberg) James Taylor
    Comments:
 [X](iceberg) Carl Steinbach
    Comments: From dev list: "Looks good to me. +1"

IPMC/Shepherd notes:

16 Jan 2019

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

 1. Finish the name clearance and trademark agreement.
 2. Make the first Apache release.
 (https://github.com/apache/incubator-iceberg/milestone/1)
 3. Grow the Iceberg community

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * Gitbox traffic is now going to issues@. The community was losing dev@
   subscribers because of the high volume of traffic from Gitbox. However,
   now all updates are sent to issues@. It would be nice to have emails
   from creation go to dev@, while updates and resolutions would go to
   issues@.
 * The trademark agreement proposed by Netflix was not acceptable to the
   ASF. It would be helpful if the ASF published the terms that the ASF
   requires to avoid trial and error. Netflix is drafting a new agreement.

How has the community developed since the last report?

 * Moved Gitbox notifications to issues@ to avoid further loss of dev@
 subscribers (subscribers reported leaving dev@ because of the traffic).
 * New contributor activity: 3 new issues opened, 4 PRs submitted
 * 5 PRs from non-committers merged
 * 2 contributors started reviewing PRs
 * New design doc proposed by a community contributor
 * Moved issues from Netflix repository to Apache repository

How has the project developed since the last report?

 * Planned blockers for first release, 0.1.0, in milestone 1
 * Partial Python implementation submitted
 * Manifest listing file added to the spec and implementation committed
 (blocker for initial release). Resulted in a significant improvement in
 query planning time for large tables.
 * Abstracted file IO API to support community use cases
 * Reviewing community proposal for external plugins to support file-level
 encryption
 * Added doc strings to schemas

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [X] Initial setup (name clearance pending)
 [X] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 None yet

When were the last committers or PPMC members elected?

 None yet

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Last month was December, so traffic has been low and both PPMC members and
 mentors were slow to respond. This is not abnormal, but the PPMC missed
 the deadline to file this report. We will ensure this doesn't recur.

Signed-off-by:

 [X](iceberg) Ryan Blue
    Comments: I wrote the first pass of the report, but after the deadline.
 [ ](iceberg) Julien Le Dem
    Comments:
 [X](iceberg) Owen O'Malley
    Comments: Approval from +1 on dev list.
 [ ](iceberg) James Taylor
    Comments:
 [ ](iceberg) Carl Steinbach
    Comments:

19 Dec 2018

Iceberg is a table format for large, slow-moving tabular data.

Iceberg has been incubating since 2018-11-16.

Three most important issues to address in the move towards graduation:

 1. Get the SGA accepted.
 2. Finish the name clearance.
 3. Make the first Apache release.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * Gitbox integration has helped a lot, although it is frustrating that
   the team members are not allowed to configure the project and must go
   through infra for every change.
 * The traffic on the dev list from GitHub pull requests and issues is
   pretty heavy. It would be nice to have emails from creation go to dev@,
   while updates and resolutions would go to issues@.

How has the community developed since the last report?

 This is the first report.

How has the project developed since the last report?

 This is the first report.
 Both the software grant and trademark agreements have been submitted.
 Code has been imported and updated to use the ASF license header. LICENSE
 and NOTICE files have been updated to comply with ASF policy.
 Podling website is up at https://iceberg.apache.org.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [X] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 None yet

When were the last committers or PPMC members elected?

 None yet

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 We're working through the issues as they come up.

Signed-off-by:

 [X](iceberg) Ryan Blue
    Comments:
 [ ](iceberg) Julien Le Dem
    Comments:
 [X](iceberg) Owen O'Malley
    Comments: I wrote the first pass of the report.
 [X](iceberg) James Taylor
    Comments:
 [X](iceberg) Carl Steinbach
    Comments:

IPMC/Shepherd notes: