This was extracted (@ 2024-11-20 22:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers (search for "calendar" in the Foundation's private index page, svn:foundation/private-index.html).
## Description:
A column-oriented data file format designed for efficient data storage and retrieval. It provides high-performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.

## Project Status:
Current project status: Ongoing
- Improve Parquet footer metadata using Flatbuffers.
- Define a Variant type based on the work from the Spark project.
- Encryption feature improvements.
- New geometry logical type.

Issues for the board: none

## Membership Data:
Apache Parquet was founded 2015-04-21 (9 years ago)
There are currently 39 committers and 30 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- Antoine Pitrou was added to the PMC on 2024-07-17
- Micah Kornfield was added to the PMC on 2024-07-17
- No new committers. Last addition was Xuwei Fu on 2024-07-11.

## Project Activity:
There are a couple of ongoing projects in the community. Highlighting three in particular:

Improved metadata footer:
- The mechanism for replacing the existing footer in a backwards-compatible way, with a transition period, has been approved.
- The new footer in Flatbuffers format is at the proof-of-concept (PoC) stage. Parquet users are encouraged to donate anonymized footers to test performance on real-life metadata.

New Variant type:
- The Parquet community has agreed in principle to adopt the binary format defined in the Spark community.
- The spec is being iterated on in the parquet-format repo.
- Implementations will be contributed as well.
- A columnar shredding algorithm is also under discussion.

New geometry type:
- Two PoC implementations (Java and C++) are finished.
- The spec proposal has reached consensus with other communities (GeoParquet, Sedona, Iceberg).
- The Parquet community is finalizing the spec, which will hopefully be released together with Iceberg V3.

## Community Health:
Healthy community.
Regular discussions are held:
- on the mailing list (traffic is back to normal after a surge of discussions around the start of the V3 effort)
- in a recurring bi-weekly online meeting that is open to all, with notes posted to the mailing list.
## Description:
A column-oriented data file format designed for efficient data storage and retrieval. It provides high-performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.

## Project Status:
Current project status: Parquet is an ongoing, fairly mature project. Because it is a file format that must preserve backward compatibility, new features are added relatively slowly. Activity has increased around changes to improve the format under the "Parquet V3" label (see Project Activity below).

Issues for the board: none

## Membership Data:
Apache Parquet was founded 2015-04-21 (9 years ago)
There are currently 38 committers and 28 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- Gang Wu was added to the PMC on 2024-05-10
- No new committers. Last addition was Gang Wu on 2023-02-28.
- Julien Le Dem is now the PMC chair. Thank you Xinli for your service!

## Project Activity:
- Discussions on adding Parquet extension support (Parquet extensions: https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit). The end goal is to allow fast iteration on new features and to accelerate innovation.
- Adding support for geo data types in Parquet. Work on this feature is advancing across the wider open source data ecosystem (including in Iceberg, for example).
- There are discussions to clarify the process for adopting new features in parquet-format and for releasing Parquet Java: https://lists.apache.org/thread/nq7n6pbp222txrfo232ybgpvlvpmykbp
- "Parquet V3": parquet-format 2.10.0 was released on 2023-11-20. There are a few discussions under the "Parquet V3" label. I put this in quotes because the goal is not to make a major incompatible release. Instead, the goal is to add functionality or change the format in a backwards-compatible way in a few areas:
  - Improve the footer metadata format for better access to wide schemas. Wide schemas are schemas with many columns (1000s, 10,000s or more). Currently, the footer is a single Thrift data structure. This means that when reading a few columns of a very wide file, one must scan the metadata of all columns to read the few interesting ones. When the metadata is large, this adds significant overhead. Current discussion includes splitting the Thrift metadata or using Flatbuffers (like the Arrow project). In particular, this requires a mechanism to add a new footer in a way that doesn't break old readers during the transition period.
  - New encodings: in particular, encodings that better compress time series or strings. Consensus is to add a few encodings that solve this well on average. A few research papers on this topic have been mentioned.
  - Cross validation: the ecosystem has grown quite a bit since the initial release of Parquet, so there are discussions to introduce a new cross-compatibility testing framework. It would ensure that various integrations in open source or proprietary projects are compatible and respect the same semantics. See https://github.com/apache/parquet-format/issues/441
- Parquet-MR has been renamed to Parquet-Java to better reflect what's in the repository. Parquet-Java has done two releases: 1.14.0 in May 2024 and 1.14.1 in June 2024.
- Parquet C++ implementation location: a while back, Parquet C++ was moved to the Arrow repo to ease dependency management between the two code bases. The C++ language in particular makes cross-repo dependencies difficult. This has raised questions about whether the Parquet C++ code base should move back to its own repo to clarify governance. The current consensus (across the Parquet and Arrow PMCs) is to keep it as is, because of the technical difficulty of moving it without making C++ development across the two repos painful.
- Issue migration to GitHub: because issue tracking was already being migrated for the parquet-cpp codebase, moving the other issues to GitHub added relatively little overhead. We migrated 2485 past and current issues from Parquet Jira to GitHub issue trackers. We strove to keep contents and metadata as close to the originals as possible, to minimize disruption to contributors' work and to keep the historical record. Comments, issue crosslinks, attachments, versions, priorities and labels were preserved wherever possible. Authorship is indicated with Jira and GitHub (where known) usernames. All issues for Apache Parquet are now tracked in the GitHub issue trackers of parquet-java, parquet-format, parquet-testing, parquet-site and arrow (for parquet-cpp).
- There is an effort, currently under discussion, to document the client feature compatibility matrix across the ecosystem: https://github.com/apache/parquet-site/pull/34

## Community Health:
There is a surge in email traffic linked to the "Parquet V3" discussion summarized above (~+300% on the dev list). We expect this to continue over the next few quarters as we make progress towards a V3.
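To make the wide-schema footer problem from the report above concrete: a Parquet file ends with the serialized Thrift `FileMetaData`, followed by a 4-byte little-endian footer length and the `PAR1` magic. A reader finds the footer from the last 8 bytes and must then read and deserialize that whole blob, even when it needs only a few columns. The following is a minimal Python sketch of the locating step. The helper name `read_footer` is hypothetical, and the sketch does not decode the Thrift structure itself.

```python
import struct
from typing import BinaryIO

MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def read_footer(f: BinaryIO) -> bytes:
    """Return the raw (still Thrift-encoded) FileMetaData bytes of a Parquet file.

    Layout of the file tail: <FileMetaData> <4-byte LE length> "PAR1".
    The whole metadata blob is returned, regardless of how many columns
    the caller actually wants; that is the overhead the report describes.
    """
    f.seek(0, 2)  # move to end of file to learn its size
    size = f.tell()
    if size < 12:  # leading magic + length + trailing magic
        raise ValueError("file too small to be Parquet")
    f.seek(size - 8)
    length, magic = struct.unpack("<I4s", f.read(8))
    if magic != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    if length > size - 12:
        raise ValueError("footer length exceeds file size")
    f.seek(size - 8 - length)  # start of the serialized FileMetaData
    return f.read(length)
```

Usage would look like `with open("example.parquet", "rb") as f: meta = read_footer(f)`, where `example.parquet` is a placeholder path. Decoding `meta` would then require a Thrift compact-protocol parser. With one monolithic structure, the parser has to process the metadata for every column.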
WHEREAS, the Board of Directors heretofore appointed Xinli Shang (shangxinli) to the office of Vice President, Apache Parquet, and WHEREAS, the Board of Directors is in receipt of the resignation of Xinli Shang from the office of Vice President, Apache Parquet, and WHEREAS, the Project Management Committee of the Apache Parquet project has chosen by vote to recommend Julien Le Dem (julien) as the successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Xinli Shang is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Parquet, and BE IT FURTHER RESOLVED, that Julien Le Dem be and hereby is appointed to the office of Vice President, Apache Parquet, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Special Order 7F, Change the Apache Parquet Project Chair, was approved by Unanimous Vote of the directors present.
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Project Status:
Current project status: Ongoing
Issues for the board: No

## Membership Data:
Apache Parquet was founded 2015-04-21 (9 years ago)
There are currently 38 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gang Wu on 2023-02-28.

## Project Activity:
Recent releases:
- Format 2.10.0 was released on 2023-11-20.
- 1.13.1 was released on 2023-05-18.
- MR-1.11.2 was released on 2021-10-06.

## Community Health:
- dev@ had an 87% decrease in traffic (190 emails compared to 1436)
- issues@ had a 352% increase in traffic (661 emails compared to 146)
- A low number of issues and PRs were seen in the past quarter.
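The traffic figures in these reports are relative changes against the previous quarter. As a sketch, assuming the usual definition (new - old) / old x 100, rounded to the nearest whole percent, the dev@ figure in the report above can be reproduced. The function name is hypothetical. The reporting tool's exact rounding is not stated, so other figures may differ from this calculation by one point.

```python
def percent_change(current: int, previous: int) -> int:
    """Relative change from `previous` to `current`, in whole percent (rounded)."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return round((current - previous) / previous * 100)


# dev@ traffic: 190 emails this quarter vs. 1436 the quarter before
print(percent_change(190, 1436))  # -87, i.e. an 87% decrease
```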
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Project Status:
Current project status: Ongoing
Issues for the board: n/a

## Membership Data:
Apache Parquet was founded 2015-04-21 (9 years ago)
There are currently 38 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gang Wu on 2023-02-28.

## Project Activity:
- Format 2.10.0 was released on 2023-11-20.
- 1.13.1 was released on 2023-05-18.
- MR-1.11.2 was released on 2021-10-06.

## Community Health:
- dev@parquet.apache.org had an 88% increase in traffic in the past quarter
- issues@parquet.apache.org had a big increase in traffic in the past quarter
- 47 issues opened in JIRA, past quarter (17% increase)
- 62 issues closed in JIRA, past quarter (195% increase)
- 100 commits in the past quarter (100% increase)
- 22 code contributors in the past quarter (100% increase)
- 77 PRs opened on GitHub, past quarter (37% increase)
- 86 PRs closed on GitHub, past quarter (79% increase)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Project Status:
Current project status: Ongoing
Issues for the board: No issues

## Membership Data:
Apache Parquet was founded 2015-04-21 (8 years ago)
There are currently 38 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gang Wu on 2023-02-28.

## Project Activity:
Recent releases:
- MR-1.13.1 was released on 2023-05-18.
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.

## Community Health:
- dev@parquet.apache.org had 842 emails in the past quarter (-42% change)
- 39 issues opened in JIRA, past quarter (-30% change)
- 23 issues closed in JIRA, past quarter (-41% change)
- 48 commits in the past quarter (-65% change)
- 12 code contributors in the past quarter (-55% change)
- 49 PRs opened on GitHub, past quarter (-46% change)
- 46 PRs closed on GitHub, past quarter (-47% change)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Project Status:
Current project status: green
Issues for the board: no issues

## Membership Data:
Apache Parquet was founded 2015-04-21 (8 years ago)
There are currently 38 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gang Wu on 2023-02-28.

## Project Activity:
Recent releases:
- 1.13.1 was released on 2023-05-18.
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.

## Community Health:
- dev@parquet.apache.org had a 28% decrease in traffic in the past quarter
- 43 issues opened in JIRA, past quarter (-15% change)
- 25 issues closed in JIRA, past quarter (-58% change)
- 109 commits in the past quarter (37% increase)
- 21 code contributors in the past quarter (-22% change)
- 65 PRs opened on GitHub, past quarter (-4% change)
- 68 PRs closed on GitHub, past quarter (17% increase)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
There were no issues found in the community.

## Membership Data:
Apache Parquet was founded 2015-04-21 (8 years ago)
There are currently 38 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- Gang Wu was added as committer on 2023-02-28

## Project Activity:
- MR-1.13.0 was released on 2023-04-06.
- MR-1.12.3 was released on 2022-05-26.
- MR-1.12.2 was released on 2021-10-06.
- MR-1.11.2 was released on 2021-10-06.

## Community Health:
- 40 issues opened in JIRA, past quarter (81% increase)
- 44 issues closed in JIRA, past quarter (238% increase)
- 48 commits in the past quarter (242% increase)
- 22 code contributors in the past quarter (100% increase)
- 46 PRs opened on GitHub, past quarter (100% increase)
- 40 PRs closed on GitHub, past quarter (100% increase)
- dev@parquet.apache.org had a 151% increase in traffic in the past quarter
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
No issues found

## Membership Data:
Apache Parquet was founded 2015-04-21 (8 years ago)
There are currently 37 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
Recent releases:
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.
- MR-1.12.0 was released on 2021-03-25.

## Community Health:
- dev@parquet.apache.org had a 65% decrease in traffic in the past quarter
- 28 issues opened in JIRA, past quarter (-22% change)
- 14 issues closed in JIRA, past quarter (40% increase)
- 16 commits in the past quarter (-27% change)
- 12 code contributors in the past quarter (-33% change)
- 25 PRs opened on GitHub, past quarter (-24% change)
- 22 PRs closed on GitHub, past quarter (-8% change)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
No issues found

## Membership Data:
Apache Parquet was founded 2015-04-21 (7 years ago)
There are currently 37 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
Recent releases:
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.
- MR-1.12.0 was released on 2021-03-25.

## Community Health:
- dev@parquet.apache.org had a 65% decrease in traffic
- 37 issues opened in JIRA, past quarter (32% increase)
- 9 issues closed in JIRA, past quarter (no change)
- 23 commits in the past quarter (-41% change)
- 17 code contributors in the past quarter (30% increase)
- 34 PRs opened on GitHub, past quarter (21% increase)
- 22 PRs closed on GitHub, past quarter (29% increase)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
There is no issue found.

## Membership Data:
Apache Parquet was founded 2015-04-21 (7 years ago)
There are currently 37 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
- MR-1.12.3 was released on 2022-05-26.
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.
- MR-1.12.0 was released on 2021-03-25.

## Community Health:
- dev@ had a 65% decrease in traffic in the past quarter (270 emails compared to 751)
- 27 issues opened in JIRA, past quarter (no change)
- 8 issues closed in JIRA, past quarter (-52% change)
- 38 commits in the past quarter (18% increase)
- 12 code contributors in the past quarter (20% increase)
- 27 PRs opened on GitHub, past quarter (-20% change)
- 17 PRs closed on GitHub, past quarter (-43% change)
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
No issues found

## Membership Data:
Apache Parquet was founded 2015-04-21 (7 years ago)
There are currently 37 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Gidon Gershinsky on 2021-11-23.
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
Recent releases:
- MR-1.11.2 was released on 2021-10-06
- MR-1.12.2 was released on 2021-10-06
- MR-1.12.0 was released on 2021-03-25

The new website parquet.apache.org was launched in March 2022.

## Community Health:
- 25 issues opened in JIRA, past quarter (150% increase)
- 18 issues closed in JIRA, past quarter (63% increase)
- 30 commits in the past quarter (172% increase)
- 10 code contributors in the past quarter (42% increase)
- 32 PRs opened on GitHub, past quarter (190% increase)
- 29 PRs closed on GitHub, past quarter (163% increase)
- dev@parquet.apache.org had a 65% decrease in traffic in the past quarter
## Description:
The mission of Parquet is the creation and maintenance of software related to columnar storage format available to any project in the Apache Hadoop ecosystem.

## Issues:
No issues found

## Membership Data:
Apache Parquet was founded 2015-04-21 (7 years ago)
There are currently 37 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- Gidon Gershinsky was added to the PMC on 2021-11-23
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
Recent releases:
- MR-1.11.2 was released on 2021-10-06.
- MR-1.12.2 was released on 2021-10-06.

## Community Health:
- dev@parquet.apache.org had a 65% decrease in traffic in the past quarter
- 9 issues opened in JIRA, past quarter (-75% change)
- 11 issues closed in JIRA, past quarter (-45% change)
- 7 commits in the past quarter (-85% change)
- 7 code contributors in the past quarter (-53% change)
- 11 PRs opened on GitHub, past quarter (-47% change)
- 10 PRs closed on GitHub, past quarter (-54% change)
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (6 years ago)
There are currently 37 committers and 26 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Xinli Shang on 2020-11-09.
- No new committers. Last addition was Gidon Gershinsky on 2021-04-05.

## Project Activity:
- Released parquet-mr 1.12.1 on 2021-09-13.
- Adding a high-throughput column encryption rewriter.
- Support for a native 'in' predicate in FilterAPI.
- Bug fixes.

## Community Health:
Commit activity dropped over the summer (-26%), and overall activity is lower than last quarter. We will see whether it picks up again next quarter.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (6 years ago)
There are currently 37 committers and 26 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Xinli Shang on 2020-11-09.
- Gidon Gershinsky was added as committer on 2021-04-05

## Project Activity:
- Defining core features / compliance levels for different implementations of parquet-format.
- Bug fixes.
- Improvements related to ZSTD, INT96, and reliability.

## Community Health:
The recovery in activity continues, although activity dipped slightly in June. This could be attributed to the start of summer.
WHEREAS, the Board of Directors heretofore appointed Julien Le Dem (julien) to the office of Vice President, Apache Parquet, and WHEREAS, the Board of Directors is in receipt of the resignation of Julien Le Dem from the office of Vice President, Apache Parquet, and WHEREAS, the Project Management Committee of the Apache Parquet project has chosen by vote to recommend Xinli Shang (shangxinli) as the successor to the post; NOW, THEREFORE, BE IT RESOLVED, that Julien Le Dem is relieved and discharged from the duties and responsibilities of the office of Vice President, Apache Parquet, and BE IT FURTHER RESOLVED, that Xinli Shang be and hereby is appointed to the office of Vice President, Apache Parquet, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. Special Order 7A, Change the Apache Parquet Project Chair, was approved by Unanimous Vote of the directors present.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (6 years ago)
There are currently 37 committers and 26 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Xinli Shang on 2020-11-09.
- Gidon Gershinsky was added as committer on 2021-04-05

## Project Activity:
Latest release: MR-1.12.0 was released on 2021-03-25.
Main features:
- encryption
- bloom filter
- BYTE_STREAM_SPLIT encoding
- many bug fixes

https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1120

## Community Health:
It is nice to see an increase in activity after a somewhat slower past year.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (6 years ago)
There are currently 36 committers and 26 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- Xinli Shang was added to the PMC on 2020-11-09
- No new committers. Last addition was Antoine Pitrou on 2020-05-21.

## Project Activity:
- bug fixes
- improvements related to the encryption feature
- dependency maintenance updates

## Community Health:
Activity is picking up again after the pandemic slowdown, in particular on the mailing list and among contributors on GitHub.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (5 years ago)
There are currently 36 committers and 25 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Gábor Szádovszky on 2019-06-27.
- No new committers. Last addition was Antoine Pitrou on 2020-05-21.

## Project Activity:
Ongoing efforts:
- encryption
- integration improvements / bug fixes (Avro, Thrift, protobuf)
- release

## Community Health:
- Activity is still lower at the moment, possibly because of the pandemic.
- GitHub activity is picking up.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Parquet was founded 2015-04-21 (5 years ago)
There are currently 36 committers and 25 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Gábor Szádovszky on 2019-06-27.
- Antoine Pitrou was added as committer on 2020-05-21
- Micah Kornfield was added as committer on 2020-05-21

## Project Activity:
Ongoing discussion regarding:
- encryption feature (now used in production at Uber)
- hardware acceleration (in particular for compression)
- bug fixes
- next release

## Community Health:
- Activity was somewhat lower this quarter, which might be related to the ongoing pandemic.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache Parquet was founded 2015-04-21 (5 years ago)
There are currently 34 committers and 25 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Gábor Szádovszky on 2019-06-27.
- Xinli Shang was added as committer on 2020-03-12

## Project Activity:
Work in progress:
- encryption
- bloom filters
- improvements to CLI

Working on release 1.11.1.

Recent releases:
- Parquet Format 2.8.0 was released on 2020-01-13.
- Parquet 1.11.0 was released on 2019-12-06.
- Parquet Format 2.7.0 was released on 2019-09-29.

## Community Health:
JIRA issues and PRs are opened and resolved at a healthy pace. Discussions are happening around releases, encryption, bloom filters and CLI improvements.
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache Parquet was founded 2015-04-21 (5 years ago)
There are currently 33 committers and 25 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Gábor Szádovszky on 2019-06-27.
- No new committers. Last addition was Fokko Driesprong on 2019-06-25.

## Project Activity:
We released 1.11.0: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1110
In particular, it includes:
- column indexes
- new logical types
- nanosecond precision timestamps

It also includes many bug fixes and dependency updates.
Still in progress: encryption

Recent releases:
- 1.11.0 was released on 2019-12-06.
- Format 2.7.0 was released on 2019-09-29.

## Community Health:
dev@parquet.apache.org had a 9% increase in traffic in the past quarter (624 emails compared to 569).
We're closing tickets at a reasonable rate:
- 61 issues opened in JIRA, past quarter (17% increase)
- 54 issues closed in JIRA, past quarter (8% increase)
## Description:
Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects:
- parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below)
- parquet-mr: Java APIs and implementation of the format along with integrations to various projects (thrift, pig, protobuf, avro, ...)
- parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration (now part of Apache Arrow)

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache Parquet was founded 2015-04-21 (4 years ago)
There are currently 33 committers and 25 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Gábor Szádovszky on 2019-06-27.
- No new committers. Last addition was Fokko Driesprong on 2019-06-25.

## Project Activity:
- Format 2.7.0 was released on 2019-09-29. We are working towards a parquet-mr release to go with it.

## Community Health:
JIRA activity is fairly stable; tickets are opened and closed at a similar rate.
- 51 issues opened in JIRA, past quarter (-21% decrease)
- 46 issues closed in JIRA, past quarter (-4% decrease)

There is some activity around finalizing big efforts that have been in the works for a while (encryption, bloom filters).
- 40 commits in the past quarter (21% increase)
- 19 code contributors in the past quarter (72% increase)
- 40 PRs opened on GitHub, past quarter (-16% decrease)
- 49 PRs closed on GitHub, past quarter (40% increase)

There has been a nice exploration of floating-point compression from the community.
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. (Now part of Apache Arrow) ## Issues: there are no issues requiring board attention at this time ## Activity: Following up on activity from the last report, we have recently added new committers and a PMC member. We are still working on releasing 1.10. There has been agreement on a plan to get there, but progress is still slow. ## Health report: The discussion volume on the mailing lists is stable. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 25 PMC members. - Gábor Szádovszky was added to the PMC on Fri Jun 28 2019 ## Committer base changes: - Currently 33 committers. - New committers: - Fokko Driesprong was added as a committer on Tue Jun 25 2019 - Nándor Kollár was added as a committer on Tue Jun 25 2019 ## Releases: - Last release was Format 2.6.0 on Tue Oct 02 2018 ## Mailing list activity: - dev@parquet.apache.org: - 238 subscribers (up 14 in the last 3 months): - 693 emails sent to list (684 in previous quarter) ## JIRA activity: - 63 JIRA tickets created in the last 3 months - 48 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. (Now part of Apache Arrow) ## Issues: there are no issues requiring board attention at this time ## Activity: We have been working towards releasing parquet 1.11.0. We have been slow to validate the release. This is due in part to the scope of the release, which affects the file format itself and warrants more scrutiny to ensure backwards compatibility. We are actively discussing how to improve our processes. Current actions considered: - Clarify the vetting process for such releases. - Simplify/Automate the release validation process. - Review potential PMC candidates in our current contributors. ## Health report: The discussion volume on the mailing lists is stable. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 24 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Zoltan Ivanfi on Sun Apr 15 2018 ## Committer base changes: - Currently 31 committers. - No new committers added in the last 3 months - Last committer addition was Benoit Hanotte on Mon May 28 2018 ## Releases: - Last release was Format 2.6.0 on Mon Oct 01 2018 ## Mailing list activity: - email volume is stable; JIRA tickets are opened and closed at a similar pace - dev@parquet.apache.org: - 224 subscribers (up 5 in the last 3 months): - 684 emails sent to list (517 in previous quarter) ## JIRA activity: - 66 JIRA tickets created in the last 3 months - 65 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: No issues at this time ## Activity: Current activity around: - encryption - page indexing - cutting a new release - improvements to parquet-proto ## Health report: The discussion volume on the mailing lists is stable. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 24 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Zoltan Ivanfi on Sun Apr 15 2018 ## Committer base changes: - Currently 31 committers. - No new committers added in the last 3 months - Last committer addition was Benoit Hanotte on Mon May 28 2018 ## Releases: - Last release was Format 2.6.0 on Mon Oct 01 2018 ## Mailing list activity: - dev@parquet.apache.org: - 216 subscribers (up 2 in the last 3 months): - 529 emails sent to list (757 in previous quarter) ## JIRA activity: - 49 JIRA tickets created in the last 3 months - 65 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: No issues at this time ## Activity: As mentioned in the Arrow report: - the Arrow and Parquet communities resolved by vote to merge their respective C++ codebases into the Apache Arrow repository. This work was completed this quarter. - We now need to update the parquet-cpp repository accordingly. Current activity around: - encryption - page indexing - bug fixes ## Health report: The discussion volume on the mailing lists is stable. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 24 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Zoltan Ivanfi on Sun Apr 15 2018 ## Committer base changes: - Currently 31 committers. - No new committers added in the last 3 months - Last committer addition was Benoit Hanotte on Mon May 28 2018 ## Releases: - CPP-1.5.0 was released on Wed Sep 19 2018 - Format 2.6.0 was released on Mon Oct 01 2018 ## Mailing list activity: - dev@parquet.apache.org: - 215 subscribers (up 7 in the last 3 months): - 797 emails sent to list (880 in previous quarter) ## JIRA activity: - 94 JIRA tickets created in the last 3 months - 55 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: there are no issues requiring board attention at this time ## Activity: - progress on encryption functionality - bloom filter discussions - page index implementation ## Health report: The discussion volume on the mailing lists is increasing. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 24 PMC members. - Zoltan Ivanfi was added to the PMC on Mon Apr 16 2018 ## Committer base changes: - Currently 31 committers. - New committers: - Benoit Hanotte was added as a committer on Mon May 28 2018 - Costi Muraru was added as a committer on Sat May 19 2018 - Gábor Szádovszky was added as a committer on Wed May 16 2018 ## Releases: - 1.8.3 was released on Fri May 11 2018 - Format 2.5.0 was released on Wed Apr 18 2018 ## Mailing list activity: Steady activity - dev@parquet.apache.org: - 208 subscribers (up 3 in the last 3 months): - 898 emails sent to list (875 in previous quarter) ## JIRA activity: - 77 JIRA tickets created in the last 3 months - 61 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: No issues at this time ## Activity: - statistics bug fixes - ongoing votes for parquet-format and parquet-mr releases - ongoing vote on the parquet-rust contribution - parquet-proto improvements and new contributors ## Health report: The discussion volume on the mailing lists is increasing. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 23 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Uwe Korn on Sun Mar 26 2017 ## Committer base changes: - Currently 28 committers. - No new committers added in the last 3 months - Last committer addition was Zoltan Ivanfi on Fri Oct 27 2017 ## Releases: - CPP-1.4.0 was released on Mon Mar 05 2018 ## Mailing list activity: - mailing list activity is up this quarter - dev@parquet.apache.org: - 202 subscribers (up 5 in the last 3 months): - 933 emails sent to list (573 in previous quarter) ## JIRA activity: - 77 JIRA tickets created in the last 3 months - 54 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: No issues at this time ## Activity: Current activity around: - deprecating the INT96 timestamp - preparing for a release - supporting .NET integration - min/max statistics improvements - page indexing - new features - bloom filter support ## Health report: The discussion volume on the mailing lists is stable. Tickets get created and closed at a reasonable pace. ## PMC changes: - Currently 23 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Uwe Korn on Sun Mar 26 2017 ## Committer base changes: - Currently 28 committers. - New committers: - Lars Volker was added as a committer on Mon Oct 16 2017 - Zoltan Ivanfi was added as a committer on Fri Oct 27 2017 ## Releases: - CPP-1.3.1 was released on Fri Oct 27 2017 - Format 2.4.0 was released on Sat Oct 21 2017 ## Mailing list activity: - dev@parquet.apache.org: - 200 subscribers (up 12 in the last 3 months): - 633 emails sent to list (432 in previous quarter) ## JIRA activity: - 54 JIRA tickets created in the last 3 months - 41 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. ## Issues: there are no issues requiring board attention at this time. ## Activity: - Ongoing work to add Bloom Filters to the Parquet format. Discussion around the prototype and Java<->C++ interoperability - A prototype is ready for adding page offset metadata in the footer and using it for better push-down. Ready to proceed with merging metadata. - compression with Brotli and Zstandard - Preparing a parquet-format release ## Health report: - issues: tickets are closed at about the same rate as they are opened - mailing list email level is stable. ## PMC changes: - Currently 23 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Uwe Korn on Sun Mar 26 2017 ## Committer base changes: - Currently 26 committers. - Deepak Majeti was added as a committer on Tue Aug 01 2017 ## Releases: - CPP-1.2.0 was released on Sun Jul 30 2017 - CPP-1.3.0 was released on Sun Sep 24 2017 ## Mailing list activity: - activity stable since the last report - dev@parquet.apache.org: - 188 subscribers (up 3 in the last 3 months): - 468 emails sent to list (618 in previous quarter) ## JIRA activity: - 74 JIRA tickets created in the last 3 months - 54 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: there are no issues requiring board attention at this time. ## Activity: - Ongoing work to add Bloom Filters to the Parquet format. Discussion around the prototype and Java<->C++ interoperability - Ongoing prototype for adding page offset metadata in the footer and using it for better push-down. - Preparing a patch-level release of parquet-mr - Planning release 1.2.0 of parquet-cpp - activity around Protocol Buffers integration ## Health report: - issues: tickets are closed at about the same rate as they are opened - mailing list email level is stable. ## PMC changes: - Currently 23 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Uwe Korn on Sun Mar 26 2017 ## Committer base changes: - Currently 25 committers. - No new committers added in the last 3 months - Last committer addition was Uwe Korn on Sun Sep 04 2016 ## Releases: - CPP-1.1.0 was released on Sun May 21 2017 ## Mailing list activity: - activity stable since the last report - dev@parquet.apache.org: - 185 subscribers (up 8 in the last 3 months): - 637 emails sent to list (638 in previous quarter) ## JIRA activity: - 101 JIRA tickets created in the last 3 months - 91 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with Thrift-based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the format along with integrations with various projects (Thrift, Pig, Protobuf, Avro, ...) - parquet-cpp: C++ APIs and implementation of the format along with Python bindings and Arrow integration. ## Issues: there are no issues requiring board attention at this time ## Activity: - We had our first parquet-cpp release (kudos Uwe and Wes!) - Several threads relating to time types and statistics. ## Health report: We host regular public sync-ups on Google Hangouts. Notes are sent to the mailing list, and follow-ups happen on JIRA and GitHub pull requests. Recently we have also been using Google Docs attached to JIRAs to drive the discussion. ## PMC changes: - Currently 23 PMC members. - Uwe Korn was added to the PMC on Sun Mar 26 2017 ## Committer base changes: - Currently 25 committers. - No new committers added in the last 3 months - Last committer addition was Uwe Korn on Sun Sep 04 2016 ## Releases: - 1.8.2 was released on Mon Jan 23 2017 - CPP-1.0.0 was released on Mon Mar 13 2017 ## Mailing list activity: Some of the extra activity relates to the 1.8.2 release for Spark, time types discussions, the parquet-cpp release, and statistics discussions. - dev@parquet.apache.org: - 178 subscribers (up 2 in the last 3 months): - 656 emails sent to list (418 in previous quarter) ## JIRA activity: - 120 JIRA tickets created in the last 3 months - 90 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. ## Issues: there are no issues requiring board attention at this time ## Activity: - parquet-arrow integration has been added in parquet-cpp - We're preparing a 1.8.2 patch release for the Apache Spark project - We're preparing parquet-cpp 0.1, its first release (PARQUET-713) ## Health report: Discussion is happening on the mailing list, on JIRA and at regular Hangouts sync-ups. Notes are sent to the mailing list. ## PMC changes: - Currently 22 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Wes McKinney on Thu Sep 01 2016 ## Committer base changes: - Currently 25 committers. - No new committers added in the last 3 months - Last committer addition was Uwe Korn on Sun Sep 04 2016 ## Releases: - 1.9.0 was released on Sun Oct 23 2016 ## Mailing list activity: - Activity on the mailing list is still relatively the same - JIRAs are resolved at about the same pace as they are opened. - dev@parquet.apache.org: - 176 subscribers (up 3 in the last 3 months): - 452 emails sent to list (436 in previous quarter) ## JIRA activity: - 81 JIRA tickets created in the last 3 months - 67 JIRA tickets closed/resolved in the last 3 months
Report from the Apache Parquet committee [Julien Le Dem] ## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. ## Issues: there are no issues requiring board attention at this time ## Activity: The community has been converging toward a 1.9 release. The vote will start in the coming days. Discussions about better encoding and vectorization APIs are ongoing. The parquet-cpp repo has reached a stable state and should see a release soon. Integration with arrow-cpp is now in the parquet-cpp repo. ## Health report: The PMC and committer list are growing. Discussion is happening on the mailing list, on JIRA and at regular Hangouts sync-ups. Notes are sent to the mailing list. ## PMC changes: - Currently 22 PMC members. - Wes McKinney was added to the PMC on Thu Sep 01 2016 ## Committer base changes: - Currently 25 committers. - Uwe Korn was added as a committer on Sun Sep 04 2016 ## Releases: - Last release was Format 2.3.1 on Thu Dec 17 2015 - parquet-mr 1.9.0 vote ongoing ## Mailing list activity: - Activity on the mailing list is still relatively the same - JIRAs are resolved at about the same pace as they are opened. - dev@parquet.apache.org: - 172 subscribers (up 9 in the last 3 months): - 486 emails sent to list (394 in previous quarter) ## JIRA activity: - 85 JIRA tickets created in the last 3 months - 74 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. ## Issues: there are no issues requiring board attention at this time ## Activity: - Work on stabilizing master in preparation for a parquet-mr release (ByteBuffer) - encoding strategy experiments - ByteBuffer stabilization - Brotli compression experiments - parquet-cpp development - discussion about vectorized reads and Apache Arrow integration ## Health report: - JIRAs opened and closed at the same rate - email activity was higher last quarter due to the parquet-cpp kickoff and discussions. ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Alex Levenson on Tue Apr 21 2015 ## Committer base changes: - Currently 24 committers. - No new committers added in the last 3 months - Last committer addition was Wes McKinney on Thu Mar 03 2016 ## Releases: - Last release was Format 2.3.1 on Thu Dec 17 2015 ## Mailing list activity: Last quarter had more email activity due to the kickoff of parquet-cpp - dev@parquet.apache.org: - 163 subscribers (up 5 in the last 3 months): - 427 emails sent to list (901 in previous quarter) ## JIRA activity: - 81 JIRA tickets created in the last 3 months - 80 JIRA tickets closed/resolved in the last 3 months
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. ## Issues: there are no issues requiring board attention at this time ## Activity: There is a surge of activity related to the development of the parquet-cpp library. Initially Parquet had a Java implementation as well as reference implementations of some encodings in C++. The C++ version is now being fully implemented. A new committer was recently invited based on that work. ## Health report: The project is healthy. We have new contributors. Communication happens on the mailing list and at regular public Hangouts sync-ups, for which notes are published on the mailing list. ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Alex Levenson on Tue Apr 21 2015 ## Committer base changes: - Currently 24 committers. - Wes McKinney was added as a committer on Thu Mar 03 2016 ## Releases: - Format 2.3.1 was released on Thu Dec 17 2015 ## Mailing list activity: A surge of emails related to the development of parquet-cpp - dev@parquet.apache.org: - 152 subscribers (up 14 in the last 3 months): - 940 emails sent to list (361 in previous quarter) ## JIRA activity: - 158 JIRA tickets created in the last 3 months - 109 JIRA tickets closed/resolved in the last 3 months
## Description: Apache Parquet is a general-purpose columnar storage format. ## Issues: there are no issues requiring board attention at this time ## Activity: All changes required by Apache Drill have been merged into Apache Parquet, getting Drill off its Parquet fork. Releases are ongoing to allow Drill to upgrade its dependencies. Several efforts are ongoing to improve vectorized reads from Java and C++. They involve collaboration between several organizations. Communication is happening in JIRA. ## Health report: We now have a rotation in which someone is responsible for answering JIRAs and emails each week. Ticket creation and resolution rates are about the same, keeping the number of open tickets reasonable. Typically, user activity shows up in the user lists of other projects that depend on Parquet (Drill, Impala, Presto, Spark, ...). ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Alex Levenson on Tue Apr 21 2015 ## Committer base changes: - Currently 23 committers. - New committers: - Cheng Lian was added as a committer on Wed Dec 02 2015 - Sergio Peña was added as a committer on Wed Dec 02 2015 ## Releases: - Parquet-Format 2.3.1 was released on Thu Dec 17 2015 - Parquet-mr 1.9.0 in preparation ## Mailing list activity: - dev@parquet.apache.org: - 147 subscribers (up 17 in the last 3 months): - 466 emails sent to list (396 in previous quarter) ## JIRA activity: - 40 JIRA tickets created in the last 3 months - 36 JIRA tickets closed/resolved in the last 3 months
## Description: Apache Parquet is a general-purpose columnar storage format. ## Issues: there are no issues requiring board attention at this time ## Activity: - Bloom filters: need to finalize the design. Have use cases to validate it (query execution, etc.) - Vectorized read API: refactoring of the code based on feedback. - Using dictionaries in filter push-down: rework to have better code reuse. - ByteBuffer: close to being merged. ## Health report: The project is fairly stable, with new features and compatibility testing underway. ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Alex Levenson on Tue Apr 21 2015 ## LDAP changes: - Currently 21 committers and 21 committee group members. - No new changes to the committee group or committership since last report. ## Releases: - 1.8.1 was released on Tue Jul 21 2015 ## Mailing list activity: - dev@parquet.apache.org: - 130 subscribers (up 13 in the last 3 months): - 367 emails sent to list (705 in previous quarter) ## JIRA activity: - 53 JIRA tickets created in the last 3 months - 25 JIRA tickets closed/resolved in the last 3 months
Apache Parquet is a general-purpose columnar storage format. ## Activity: We're working towards a 1.8.0 release and merging the ByteBuffer PR (ZeroCopy HDFS reads). Our goal is to keep master in a releasable state and to do releases quickly. ## Issues: - there are no issues requiring board attention at this time ## LDAP committee group/Committership changes: - Currently 21 committers and 21 LDAP committee group members. - No new changes to the LDAP committee group or committership since last report. Two new PMC members, Alex Levenson and Daniel Weeks, were added on Dec 28th 2014 ## Releases: - 1.7.0 was released on Mon May 18 2015 - 1.8.0 is being voted on. ## Mailing list activity: - dev@parquet.apache.org: - 116 subscribers (up 5 in the last 3 months): - 707 emails sent to list (722 in previous quarter) ## JIRA activity: - 79 JIRA tickets created in the last 3 months - 64 JIRA tickets closed/resolved in the last 3 months
## Description: Apache Parquet is a general-purpose columnar storage format. ## Activity: We're working towards a 1.8.0 release and merging the ByteBuffer PR (ZeroCopy HDFS reads). Our goal is to keep master in a releasable state and to do releases quickly. ## Issues: there are no issues requiring board attention at this time ## PMC/Committership changes: - Currently 21 committers and 21 PMC members in the project. - No new changes to the PMC or committership since last report. Two new PMC members, Alex Levenson and Daniel Weeks, were added on Dec 28th 2014 ## Releases: - 1.7.0 was released on Mon May 18 2015 ## Mailing list activity: - dev@parquet.apache.org: - 112 subscribers (up 12 in the last 3 months): - 829 emails sent to list (459 in previous quarter) ## JIRA activity: - 91 JIRA tickets created in the last 3 months - 57 JIRA tickets closed/resolved in the last 3 months
Parquet is a columnar file format for Hadoop. ## Project Status The project just graduated from the incubator and is voting on its first release as a TLP. No issues to report. ## Community - Two new PMC members, Alex Levenson and Daniel Weeks, were added on Dec 28th 2014 - No new committer or PMC member since last report in April - JIRA past 30 days: 30 created and 22 resolved as of May 18th https://issues.apache.org/jira/browse/PARQUET - 114 subscribers to the dev mailing list as of May 18th - emails on the dev list: Apr: 397, Mar: 319, Feb: 135, Jan: 112 http://mail-archives.apache.org/mod_mbox/parquet-dev/ - commits: Apr: 84, Mar: 38, Feb: 47, Jan: 9 http://mail-archives.apache.org/mod_mbox/parquet-commits/ - regular project sync-ups are held on Google Hangouts. They are open to anyone and advertised on the dev mailing list; notes are then published on the list as well - several Parquet-related presentations are scheduled at the Hadoop Summit in June http://2015.hadoopsummit.org/san-jose/agenda/ ## Community Objectives The community's main objectives (not excluding other efforts also ongoing): - Working towards merging the ByteBuffer access work - Vectorized execution improvements (and integration with Apache Drill, Apache Hive, Presto) - Improving Projection and Predicate APIs - Standardizing nested type representations (thrift and avro write-side) - Improving high-level type specs (microsecond time/timestamp) ## Releases - Last releases: - parquet-mr 1.6.0-incubating on Apr 12th: https://dist.apache.org/repos/dist/release/parquet/parquet-mr-1.6.0-incubating/ - parquet-mr 1.7.0 on May 18th (just voted): https://dist.apache.org/repos/dist/release/parquet/parquet-mr-1.7.0/ - Next release: a parquet-format release will happen soon.
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to a columnar storage format for Hadoop. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Parquet Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache Parquet Project be and hereby is responsible for the creation and maintenance of software related to a columnar storage format for Hadoop; and be it further RESOLVED, that the office of "Vice President, Apache Parquet" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Parquet Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Parquet Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Parquet Project: * Chris Aniszczyk <caniszczyk@apache.org> * Ryan Blue <blue@apache.org> * Jonathan Coveney <jcoveney@apache.org> * Tim <tianshuo@apache.org> * Jake Farrell <jfarrell@apache.org> * Marcel Kornacker <marcel@apache.org> * Mickael Lacour <mlacour@apache.org> * Julien Le Dem <julien@apache.org> * Alex Levenson <alexlevenson@apache.org> * Nong Li <nong@apache.org> * Todd Lipcon <todd@apache.org> * Chris Mattmann <mattmann@apache.org> * Aniket Mokashi <aniket486@apache.org> * Lukas Nalezenec <lukas@apache.org> * Brock Noland <brock@apache.org> * Wesley Graham Peck <wesleypeck@apache.org> * Remy Pecqueur <rpecqueur@apache.org> * Dmitriy Ryaboy <dvryaboy@apache.org> * Roman Shaposhnik <rvs@apache.org> * Daniel Weeks <dweeks@apache.org> * Thomas White <tomwhite@apache.org>
NOW, THEREFORE, BE IT FURTHER RESOLVED, that Julien Le Dem be appointed to the office of Vice President, Apache Parquet, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the initial Apache Parquet PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache Parquet Project; and be it further RESOLVED, that the Apache Parquet Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Parquet podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator Parquet podling encumbered upon the Apache Incubator Project are hereafter discharged. Special Order 7D, Establish the Apache Parquet Project, was approved by Unanimous Vote of the directors present.
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20. Three most important issues - 1st releases toward org.apache Parquet 1.6.0 GA - Expanding the community and adding new committers - Ensuring timely code reviews by committers, developing reviewers Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? - None at this time Latest Additions: * PMC addition: None * Committer addition: Dan Weeks and Alex Levenson (from last report) Issue backlog status since last report: * Created: 34 * Resolved: 50 Mailing list activity since last report: * dev 560 messages: 111 in Jan, 136 in Feb, and 313 in Mar How has the project developed since the last report? - Preparing last commits for the first parquet-mr release candidate - Planned parquet-mr 1.6.0 release schedule - ASF required changes to parquet-mr are finished - Released parquet-format 2.3.0, with org.apache packages - Parquet presentation at Strata 2015 San Jose and the Presto meetup Date of last release: - parquet-format 2.3.0 released 19 Feb - Not yet released: parquet-mr and parquet-cpp Signed-off-by: [ ](parquet) Todd Lipcon [X](parquet) Jake Farrell [X](parquet) Chris Mattmann [X](parquet) Roman Shaposhnik [ ](parquet) Tom White
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20. Three most important issues - Expanding the community and adding new committers - 1st releases toward org.apache Parquet 1.6.0 GA - Identifying how to ensure timely code reviews by committers Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? - None at this time Latest Additions: * PMC addition: None * Contributor addition: Dan Weeks and Alex Levenson Issue backlog status since last report: * Created: 45 * Resolved: 20 Mailing list activity since last report: * dev 310 messages: 90 in Oct, 126 in Nov, and 94 in Dec How has the project developed since the last report? - Completed first release, Apache Parquet Format (incubating) 2.2.0 - Established a bylaw for adding committers - Added 2 new committers - Parquet presentation accepted for Strata San Jose Date of last release: - parquet-format released 14 November 2014 - Not yet released: parquet-mr and parquet-cpp Signed-off-by: [ ](parquet) Todd Lipcon [X](parquet) Jake Farrell [X](parquet) Chris Mattmann [X](parquet) Roman Shaposhnik [ ](parquet) Tom White Shepherd/Mentor notes: Mailing lists are active; most mentors are active.
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20.
Three most important issues:
- Expanding the community and adding new committers
- 1st releases
- Identifying how to ensure timely code reviews by committers
Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
- None at this time
Latest Additions:
* PMC addition: N/A
* Contributor addition: N/A
Issue backlog status since last report:
* Created: 27
* Resolved: 19
Mailing list activity since last report:
* dev 144 messages
How has the project developed since the last report?
- Attempted the parquet-format release twice; next RC expected in early October
- Assembled tasks to complete for a parquet-mr release
- New push-down filter API and task-side block metadata reading
Date of last release:
- No releases as of yet.
Signed-off-by:
[ ](parquet) Todd Lipcon
[X](parquet) Jake Farrell
[ ](parquet) Chris Mattmann
[X](parquet) Roman Shaposhnik
[X](parquet) Tom White
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20.
Three most important issues to address in the move towards graduation:
1. Expanding the community and adding new committers
2. 1st release
3. Identifying how to ensure timely code reviews by committers
Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
None at this time
Latest Additions:
* PMC addition: N/A
* Contributor addition: N/A
Issue backlog status since last report:
* Created: 60
* Resolved: 17
Mailing list activity since last report:
* dev 212 messages
How has the project developed since the last report?
* New commit workflow has been documented, and commits made using the commit script have been increasing.
* Project website is posted: parquet.incubator.apache.org; work is underway to move more content from GitHub hosting.
* Moved to issues.apache.org for all new issues.
* Planning first release of parquet-format and parquet-mr, using the parquet-format release to identify the steps needed to release the larger projects (e.g., parquet-mr).
* Adding documentation on reviews and contacts for specific modules.
Signed-off-by:
[X](parquet) Jake Farrell
[ ](parquet) Chris Mattmann
[ ](parquet) Roman Shaposhnik
[X](parquet) Tom White
[X](parquet) Todd Lipcon
Shepherd/Mentor notes:
The mailing list has healthy traffic, mostly bug reports. Mentors are active and participating in the community.
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20.
Three most important issues:
- Finish bootstrapping project (completed), IP clearance (completed), initial website (in progress)
- Expanding the community and adding new committers
- 1st release
Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
- None at this time
Latest Additions:
* PMC addition: N/A
* Contributor addition: N/A
Issue backlog status since last report:
* Created: 8
* Resolved: 2
Mailing list activity since last report:
* @dev 69 messages
How has the project developed since the last report?
- All bootstrap tickets have been completed and the status page updated
- Mailing lists created, Jira set up, code imported
- Jira issues are starting to be imported to issues.apache.org
- Website is in the works and will be available soon; the infra for this is already set up
- Working on documenting the contributing guide and committers workflow
- We have now set up the mechanisms to accept contributions through the Apache GitHub and have already accepted one external contribution.
Date of last release:
- No releases as of yet.
Signed-off-by:
[X](parquet) Todd Lipcon
[X](parquet) Jake Farrell
[ ](parquet) Chris Mattmann
[X](parquet) Roman Shaposhnik
[X](parquet) Tom White
Parquet is a columnar storage format for Hadoop. Parquet has been incubating since 2014-05-20.
Three most important issues:
- Finish bootstrapping project, IP clearance, initial website
- Expanding the community and adding new committers
- 1st release
Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
- None at this time
How has the community developed since the last report?
- All initial committers have submitted ICLAs and the accounts have been created. The mailing lists have been set up and we are starting to use them for communication.
How has the project developed since the last report?
- We have set up the incubator status page and are waiting on the final SGA to be sent in to start the code import (INFRA-7782)
Date of last release:
- No releases as of yet. Working through initial IP clearance.
When were the last committers or PMC members elected?
- N/A, still bootstrapping the project.
Signed-off-by:
[ ](parquet) Todd Lipcon
[X](parquet) Jake Farrell
[ ](parquet) Chris Mattmann
[X](parquet) Roman Shaposhnik
[X](parquet) Tom White