This was extracted (@ 2024-12-18 22:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
Report was filed, but display is awaiting the approval of the Board minutes.
## Description:
The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (5 months ago)
There are currently 37 committers and 14 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:2.

Community changes, past quarter:
- Jay Zhan was added to the PMC on 2024-08-11
- Mehmet Ozan Kabak was added to the PMC on 2024-06-12
- Ruihang Xia was added to the PMC on 2024-06-12
- Berkay Şahin was added as committer on 2024-08-28
- Eduard Karacharov was added as committer on 2024-08-14
- Lewis Zhang was added as committer on 2024-06-14
- Tim Saucer was added as committer on 2024-09-07
- Weijun Huang was added as committer on 2024-08-27

## Project Activity:
The project continues to be active with many PRs and issues opened and closed per day.

We wrote two public blog posts about our work, [1] and [2], and DataFusion and the systems built on it are being featured in high-profile (for the database world) venues such as the [CMU Database Systems Seminar].

[1]: https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
[2]: https://datafusion.apache.org/blog/2024/07/20/datafusion-comet-0.1.0/
[CMU Database Systems Seminar]: https://db.cs.cmu.edu/seminar2024/

We are also working to [adopt] the sqlparser crate into the project.

[adopt]: https://github.com/sqlparser-rs/sqlparser-rs/issues/1294

### DataFusion core
https://github.com/apache/datafusion

We continue the monthly release cadence with versions 40.0.0 and 41.0.0, and are on track for version 42.0.0. The [41.0.0 release] had almost 70 unique contributors.

We are currently focused on performance, including for high cardinality aggregates, and on adding support for StringViewArrays. We completed a long running project to ensure all aggregate functions use the same API and are beginning the same project for window functions.

We have been discussing what [features] to include, and working to add LogicalTypes, as well as to create a more differentiated CLI experience. See the [roadmap ticket] for more details.

[features]: https://github.com/apache/datafusion/issues/12357
[roadmap ticket]: https://github.com/apache/datafusion/issues/11442

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python project has received significant contributions recently to make the project more “Pythonic” and now has regular activity from maintainers. Tim Saucer has been added as a committer who focuses more heavily on datafusion-python.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet project is very active and recently released its initial 0.1.0 source release. Blog post: https://datafusion.apache.org/blog/2024/07/20/datafusion-comet-0.1.0/

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not very actively maintained, but there have been some contributions recently to upgrade to more recent versions of the core DataFusion project.

### Recent Releases
* COMET-0.2.0 was released on 2024-08-28.
* 41.0.0 was released on 2024-08-11.
* PYTHON-39.0.0 was released on 2024-07-02.
* 39.0.0 was released on 2024-06-10.
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:
It is still hard to keep track of everything going on these days, which is a good thing. While it is always a struggle to get enough code review capacity, the committers keep things going and the community helps each other out with reviews. We continue to actively grow our committer and PMC ranks.

There are currently four meetups planned: New York City, San Francisco (for the second time!), Belgrade, and Seattle.
## Description:
The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (3 months ago)
There are currently 33 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:4.

Community changes, past month:
- Mehmet Ozan Kabak was added to the PMC on 2024-06-12
- Ruihang Xia was added to the PMC on 2024-06-12
- Lewis Zhang was added as committer on 2024-06-14

## Project Activity:
The project continues to be quite active with many PRs and issues opened and closed per day.

We started working on a project blog [1] (previously we used the Arrow blog) and hope to have our first blog post as an independent project later this month.

There was a well attended face to face meetup in San Francisco, CA USA in June [2]. We have one planned for Hangzhou, China in July [3]. There appears to be significant interest in these events, and there are at least 2 more planned for September: in New York, NY USA and in Belgrade, Serbia.

The community around DataFusion is growing too. For example, Spice AI has made an initial contribution of TableProviders to datafusion-contrib [4] for PostgreSQL, MySQL, DuckDB, and SQLite, enabling these data sources to be easily queried through DataFusion.

[1]: https://datafusion.apache.org/blog/
[2]: https://github.com/apache/datafusion/discussions/10800
[3]: https://github.com/apache/datafusion/discussions/10341#discussioncomment-9738748
[4]: https://github.com/datafusion-contrib/datafusion-table-providers

### DataFusion core
https://github.com/apache/datafusion

We released version 39.0.0, continuing our schedule of monthly releases, and are on track to release version 40.0.0 in the next day or two.

Some projects we have been working on recently involve adding support for more flexible use of Parquet files, including indexing and extracting statistics. We are also working with the community to make extending SQL planning[2] easier and extending file format support[3], as well as fixing bugs found with a SQL fuzzer[4], and improving performance with StringView[5]. It has been nice to see several good examples of cross contributor/company collaboration such as [6] and [7].

We have also been making external presentations[1].

[1]: https://github.com/apache/datafusion/issues/10969
[2]: https://github.com/apache/datafusion/issues/10534
[3]: https://github.com/apache/datafusion/pull/11060
[4]: https://github.com/apache/datafusion/issues/11030
[5]: https://github.com/apache/datafusion/issues/10918
[6]: https://github.com/apache/datafusion/pull/11203
[7]: https://github.com/apache/datafusion/issues/10534

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python project continues to receive updates as new versions of the core DataFusion project are released. There have also been some minor changes to improve the user experience.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet project is very active and is working towards an initial 0.1.0 source release. Initial benchmark results were published to https://datafusion.apache.org/comet/contributor-guide/benchmarking.html.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not very actively maintained, but there have been some contributions recently to upgrade to more recent versions of the core DataFusion project.

### Recent Releases
* PYTHON-39.0.0 was released on 2024-07-02.
* 39.0.0 was released on 2024-06-10.
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:
Community health is good -- we recently hit the 600 total contributors mark according to GitHub. This number is partially inflated from initially being part of the Arrow mono repo, but the trend is healthy nonetheless.

It is hard to keep track of everything going on these days, which is a good thing. While it is always a struggle to get enough code review, the committers keep things going and the community helps each other out with reviews.
## Description:
The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (2 months ago)
There are currently 32 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Ruihang Xia was added to the PMC on 2024-06-13
- Mehmet Ozan Kabak was added to the PMC on 2024-06-13
- Mustafa Akur was added to the PMC on 2024-05-09
- Oleks V. was added to the PMC on 2024-05-09

## Project Activity:
The project continues to be quite active with many PRs and issues opened and closed per day.

We have mostly completed tasks related to becoming a new top level project, including an ASF press release[0] announcing the new top level project and documenting more thoroughly the process of inviting new committers and PMC members[1]. We also began discussing adopting the SQL parser into the DataFusion ASF governance process[2]. There are also several regional meetups planned: in San Francisco in June and in China in July.

[0]: https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion
[1]: https://github.com/apache/datafusion/pull/10778
[2]: https://github.com/sqlparser-rs/sqlparser-rs/issues/1294

### DataFusion core
https://github.com/apache/datafusion

We made our first successful release as a new project, version 38.0.0.

In addition to the work related to moving to a top-level project, the community continues to work on making logical planning faster, making function packages (i.e. UDFs) modular and easier to mix/match, “de-parsing” logical plan expressions back to SQL, and improving type coercion. Recently there has been renewed interest in reading Parquet files and creating secondary indexes.

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject has become more active since the last board report, with contributions from several contributors. Version 37 was released, and version 38 is in the process of being released.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject has had face to face sync meetings which are recorded[1].

[1]: https://lists.apache.org/thread/9kqxkpwxf4oxonfboyfh8j6ko7r3fb3z

The Comet subproject is very active and is receiving significant contributions from new contributors. There is some initial documentation published at https://datafusion.apache.org/comet/.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:
We have added several new committers and PMC members (see above) in the last month, and we expect to continue to do so regularly. While it would always be nice to have more bandwidth to devote to PMC activities, we are currently doing well.

While most communications still happen through GitHub, the mailing lists are now fully active, as reflected in their metrics:
* dev@datafusion.apache.org had a big increase in traffic in the past quarter (71 emails compared to 0)
* github@datafusion.apache.org had a big increase in traffic in the past quarter (7685 emails compared to 0)
## Description:
The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (20 days ago)
There are currently 29 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Mustafa Akur was added as committer on 2024-04-20
- Brent Gardner was added as committer on 2024-04-20
- Oleks V. was added as committer on 2024-04-20
- Jay Zhan was added as committer on 2024-04-20
- Jeffrey Vo was added as committer on 2024-04-20
- Liu Jiayu was added as committer on 2024-04-20
- Metehan Yildirim was added as committer on 2024-04-20
- Wang Mingming was added as committer on 2024-04-20
- Marco Neumann was added as committer on 2024-04-20
- Zhong Yanghong was added as committer on 2024-04-20
- Mehmet Ozan Kabak was added as committer on 2024-04-20
- Paddy Horan was added as committer on 2024-04-20
- Rémi Dettai was added as committer on 2024-04-20
- Sun Chao was added as committer on 2024-04-20
- Daniel Harris was added as committer on 2024-04-20
- Raphael Taylor-Davies was added as committer on 2024-04-20
- Ruihang Xia was added as committer on 2024-04-20
- Xudong Wang was added as committer on 2024-04-20
- Yang Jiang was added as committer on 2024-04-20
- Yijie Shen was added as committer on 2024-04-20

## Project Activity:
The project is quite active with many PRs and issues opened and closed per day. We have spent significant time on tasks related to becoming a new top level project.

DataFusion became its own top level project after operating as a subproject of Apache Arrow for several years. We have been focused on the [tasks] required to operate as our own project, which are largely logistical, such as updating documentation, creating mailing lists, and creating a [DOAP file].

[tasks]: https://github.com/apache/datafusion/issues/9691
[DOAP file]: https://projects.apache.org/project.html?datafusion

### DataFusion core
https://github.com/apache/datafusion

In addition to the work related to moving to a top-level project, the community is focused on making logical planning faster, making function packages (i.e. UDFs) modular and easier to mix/match, and “de-parsing” logical plan expressions back to SQL.

We are preparing the first release as a new project, version 38.0.0.

For the DataFusion repo since 2024-04-16, as of 2024-05-07:
- 132 commits[1]
- 46 code contributors[2]
- 168 PRs opened on GitHub[3]
- 187 PRs closed on GitHub[4]
- 130 issues opened on GitHub[5]
- 94 issues closed on GitHub[6]

[1]: git log --since="2024-04-16" --pretty=format:"%h" | wc -l
[2]: git shortlog -sn --since="2024-04-16" | wc -l
[3]: https://s.apache.org/x5gkj
[4]: https://s.apache.org/rg9op
[5]: https://s.apache.org/sqlun
[6]: https://s.apache.org/l3clf

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject is not currently actively maintained, and there has been no release yet to upgrade to DataFusion version 37 or to prepare for the upcoming DataFusion 38 release.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject is very active and is receiving significant contributions from new contributors. There is some initial documentation published at https://datafusion.apache.org/comet/.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases
* 37.1.0 was released on 2024-04-22
* 37.0.0 was released on 2024-04-05

## Community Health:
Overall, the community seems excited about becoming a new top level project, and contributions continue to arrive and activity on the project continues. We have not made any significant change in day to day operations, and don’t have any plans to do so at the moment.

The PMC lists are now set up and we are actively discussing growing committers and the PMC. We expect both of these groups to grow in the near future. In the last 6 months or so, it has been hard to discuss potential committers within the Arrow PMC, as many contributors focused almost exclusively on DataFusion and did not also have substantial contributions to Arrow (which was more common earlier in the project's life).

We have also created a [Governance Page] to maintain project transparency, largely based on the content from the Arrow project.

[Governance Page]: https://s.apache.org/98bwp
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to an extensible query engine.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache DataFusion Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache DataFusion project be and hereby is responsible for the creation and maintenance of software related to an extensible query engine; and be it further

RESOLVED, that the office of "Vice President, Apache DataFusion" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache DataFusion Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache DataFusion Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache DataFusion Project:

* Andrew Grove <agrove@apache.org>
* Andrew Lamb <alamb@apache.org>
* Daniël Heres <dheres@apache.org>
* Jie Wen <jakevin@apache.org>
* Kun Liu <liukun@apache.org>
* L. C. Hsieh <viirya@apache.org>
* QP Hou <houqp@apache.org>
* Wes McKinney <wesm@apache.org>
* Will Jones <wjones127@apache.org>

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb be appointed to the office of Vice President, Apache DataFusion, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache DataFusion Project be and hereby is tasked with the migration and rationalization of the Apache Arrow DataFusion sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Arrow DataFusion sub-project encumbered upon the Apache Arrow Project are hereafter discharged.

Special Order 7C, Establish the Apache DataFusion Project, was approved by Unanimous Vote of the directors present.