This was extracted (@ 2024-08-21 21:10) from a list of minutes which have been approved by the Board.
Please Note: The Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

WARNING: these pages may omit some original contents of the minutes.
This is due to changes in the layout of the source minutes over the years. Fixes are being worked on.

Meeting times vary. The exact schedule is available to ASF Members and Officers; search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

DataFusion

17 Jul 2024 [Andrew Lamb / Justin]

## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None


## Membership Data:
Apache DataFusion was founded 2024-04-16 (3 months ago)
There are currently 33 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:4.

Community changes, past month:
- Mehmet Ozan Kabak was added to the PMC on 2024-06-12
- Ruihang Xia was added to the PMC on 2024-06-12
- Lewis Zhang was added as committer on 2024-06-14


## Project Activity:

The project continues to be quite active with many PRs and issues opened and
closed per day.

We started working on a project blog [1] (previously we used the Arrow blog)
and hope to publish our first blog post as an independent project later this
month.

There was a well attended face to face meetup in San Francisco, CA USA in June
[2]. We have one planned for Hangzhou, China in July[3]. There appears to be
significant interest in these events and there are at least 2 more planned
for September in New York, NY USA and in Belgrade, Serbia.

The community around DataFusion is growing too. For example, Spice AI has made
an initial contribution of TableProviders to datafusion-contrib [4] for
PostgreSQL, MySQL, DuckDB, and SQLite, enabling these data sources to be
easily queried through DataFusion.

[1]: https://datafusion.apache.org/blog/
[2]: https://github.com/apache/datafusion/discussions/10800
[3]: https://github.com/apache/datafusion/discussions/10341#discussioncomment-9738748
[4]: https://github.com/datafusion-contrib/datafusion-table-providers

### DataFusion core
https://github.com/apache/datafusion

We released version 39.0.0, continuing our schedule of monthly releases and
are on track to release version 40.0.0 in the next day or two.

Some projects we have been working on recently involve adding support for more
flexible use of Parquet files including indexing and extracting statistics. We
are also working with the community to make extending SQL planning[2] easier
and extending file format support[3], as well as fixing bugs found with a SQL
fuzzer[4], and improving performance with StringView[5].

It has been nice to see several good examples of cross contributor/company
collaboration such as [6] and [7].

We have also been making external presentations[1].

[1]: https://github.com/apache/datafusion/issues/10969
[2]: https://github.com/apache/datafusion/issues/10534
[3]: https://github.com/apache/datafusion/pull/11060
[4]: https://github.com/apache/datafusion/issues/11030
[5]: https://github.com/apache/datafusion/issues/10918
[6]: https://github.com/apache/datafusion/pull/11203
[7]: https://github.com/apache/datafusion/issues/10534

### Sub project: DataFusion Python

https://github.com/apache/datafusion-python

The DataFusion Python project continues to receive updates as new versions of
the core DataFusion project are released. There have also been some minor
improvements to the user experience.


### Sub project: DataFusion Comet

https://github.com/apache/datafusion-comet

The Comet project is very active and is working towards an initial 0.1.0
source release. Initial benchmark results were published to
https://datafusion.apache.org/comet/contributor-guide/benchmarking.html.


### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not very actively maintained, but there have been
some contributions recently to upgrade to more recent versions of the core
DataFusion project.

### Recent Releases
* PYTHON-39.0.0 was released on 2024-07-02.
* 39.0.0 was released on 2024-06-10.
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.


## Community Health:
Community health is good -- we recently hit the 600 total contributors mark
according to GitHub. This number is partially inflated because the project was
initially part of the Arrow mono repo, but the trend is healthy nonetheless.

It is hard to keep track of everything going on these days, which is a good
thing. While it is always a struggle to get enough code review, the
committers keep things going and the community helps each other out with
reviews.

19 Jun 2024 [Andrew Lamb / Sander]

## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None


## Membership Data:
Apache DataFusion was founded 2024-04-16 (2 months ago)
There are currently 32 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Ruihang Xia was added to the PMC on 2024-06-13
- Mehmet Ozan Kabak was added to the PMC on 2024-06-13
- Mustafa Akur was added to the PMC on 2024-05-09
- Oleks V. was added to the PMC on 2024-05-09

## Project Activity:

The project continues to be quite active with many PRs and issues opened and
closed per day.

We have mostly completed tasks related to becoming a new top level project,
including an ASF press release[0] announcing the new top level project and
documenting more thoroughly the process of inviting new committers and PMC
members[1].

We also began discussing adopting the SQL parser into the DataFusion ASF
governance process[2].

There are also several regional meetups planned: in San Francisco in June and
in China in July.

[0]: https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion
[1]: https://github.com/apache/datafusion/pull/10778
[2]: https://github.com/sqlparser-rs/sqlparser-rs/issues/1294


### DataFusion core
https://github.com/apache/datafusion

We made our first successful release as a new project, version 38.0.0.

In addition to the work related to moving to a top-level project, the
community continues to work on making logical planning faster, making function
packages (i.e. UDFs) modular and easier to mix/match, “de-parsing” logical
plan expressions back to SQL, and improving type coercion.

Recently there has been renewed interest in reading Parquet files and creating
secondary indexes.


### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject has become more active since the last board
report with contributions from several contributors. Version 37 was released,
and version 38 is in the process of being released.


### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject has had face to face sync meetings which are recorded[1].

[1]: https://lists.apache.org/thread/9kqxkpwxf4oxonfboyfh8j6ko7r3fb3z

The Comet subproject is very active and is receiving significant contributions
from new contributors. There is some initial documentation published at
https://datafusion.apache.org/comet/.


### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:

We have added several new committers and PMC members (see above) in the last
month, and we expect to continue to do so regularly. While it would always be
nice to have more bandwidth to devote to PMC activities, we are currently
doing well.

While most communications still happen through GitHub, the mailing lists are
now fully active, as reflected in their metrics:

* dev@datafusion.apache.org had a big increase in traffic in the past quarter
 (71 emails compared to 0)
* github@datafusion.apache.org had a big increase in traffic in the past
 quarter (7685 emails compared to 0)

15 May 2024 [Andrew Lamb / Jeff]

## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (20 days ago)
There are currently 29 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Mustafa Akur was added as committer on 2024-04-20
- Brent Gardner was added as committer on 2024-04-20
- Oleks V. was added as committer on 2024-04-20
- Jay Zhan was added as committer on 2024-04-20
- Jeffrey Vo was added as committer on 2024-04-20
- Liu Jiayu was added as committer on 2024-04-20
- Metehan Yildirim was added as committer on 2024-04-20
- Wang Mingming was added as committer on 2024-04-20
- Marco Neumann was added as committer on 2024-04-20
- Zhong Yanghong was added as committer on 2024-04-20
- Mehmet Ozan Kabak was added as committer on 2024-04-20
- Paddy Horan was added as committer on 2024-04-20
- Rémi Dettai was added as committer on 2024-04-20
- Sun Chao was added as committer on 2024-04-20
- Daniel Harris was added as committer on 2024-04-20
- Raphael Taylor-Davies was added as committer on 2024-04-20
- Ruihang Xia was added as committer on 2024-04-20
- Xudong Wang was added as committer on 2024-04-20
- Yang Jiang was added as committer on 2024-04-20
- Yijie Shen was added as committer on 2024-04-20

## Project Activity:

The project is quite active with many PRs and issues opened and closed per
day. We have spent significant time on tasks related to becoming a new top
level project.

DataFusion became its own top level project after operating as a subproject of
Apache Arrow for several years.

We have been focused on the [tasks] required to operate as our own project,
which are largely logistical, such as updating documentation and creating
mailing lists and a [DOAP file].

[tasks]: https://github.com/apache/datafusion/issues/9691
[DOAP file]:  https://projects.apache.org/project.html?datafusion


### DataFusion core
https://github.com/apache/datafusion

In addition to the work related to moving to a top-level project, the
community is focused on making logical planning faster, making function
packages (i.e. UDFs) modular and easier to mix/match, and “de-parsing” logical
plan expressions back to SQL.

We are preparing the first release as a new project, version 38.0.0.

For the DataFusion repo since 2024-04-16, as of 2024-05-07:

* 132 commits[1]
* 46 code contributors[2]
* 168 PRs opened on GitHub[3]
* 187 PRs closed on GitHub[4]
* 130 issues opened on GitHub[5]
* 94 issues closed on GitHub[6]


[1]: git log --since="2024-04-16" --pretty=format:"%h" | wc -l
[2]: git shortlog -sn --since="2024-04-16" | wc -l
[3]: https://s.apache.org/x5gkj
[4]: https://s.apache.org/rg9op
[5]: https://s.apache.org/sqlun
[6]: https://s.apache.org/l3clf
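
The commit and contributor counts in footnotes [1] and [2] come from counting lines of `git log` and `git shortlog` output. As a rough illustration only, the same tallies could be computed from captured command output along the lines below. This is a minimal sketch, not the script used for the report: the function names and the sample data are hypothetical, and it assumes one commit hash per line and the usual `<count><whitespace><name>` layout of `git shortlog -sn`. It counts non-empty lines rather than newline characters, so the result does not depend on whether the last line ends with a newline.

```python
from typing import Dict


def count_commits(log_output: str) -> int:
    """Count commits in captured `git log --pretty=format:"%h"` output.

    Assumes one abbreviated hash per line; blank lines are ignored.
    """
    return sum(1 for line in log_output.splitlines() if line.strip())


def parse_shortlog(shortlog_output: str) -> Dict[str, int]:
    """Parse captured `git shortlog -sn` output into {author: commit_count}.

    Assumes each non-empty line looks like "<count><whitespace><name>".
    """
    counts: Dict[str, int] = {}
    for line in shortlog_output.splitlines():
        line = line.strip()
        if not line:
            continue
        number, name = line.split(None, 1)  # split on the first whitespace run
        counts[name] = int(number)
    return counts


# Hypothetical sample data, not taken from the DataFusion repository.
SAMPLE_LOG = "a1b2c3d\ne4f5a6b\n0c1d2e3"  # no trailing newline on purpose
SAMPLE_SHORTLOG = "     2\tContributor A\n     1\tContributor B\n"

if __name__ == "__main__":
    authors = parse_shortlog(SAMPLE_SHORTLOG)
    print("commits:", count_commits(SAMPLE_LOG))
    print("code contributors:", len(authors))
```

In practice the two strings would be replaced by the real output of the commands in [1] and [2], run in a checkout of the repository.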


### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject is not currently actively maintained and
there has been no release yet to upgrade to DataFusion version 37 or to
prepare for the upcoming DataFusion 38 release.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject is very active and is receiving significant contributions
from new contributors. There is some initial documentation published at
https://datafusion.apache.org/comet/.


### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases

* 37.1.0 was released on 2024-04-22
* 37.0.0 was released on 2024-04-05


## Community Health:
Overall, the community seems excited by becoming a new top level
project, and contributions continue to arrive and activity on the
project continues. We have not made any significant change
in day to day operations, and don’t have any plans to do so
at the moment.

The PMC lists are now set up and we are actively discussing
growing committers and the PMC. We expect both of these groups
to grow in the near future.

In the last 6 months or so, it has been hard to discuss potential
committers within the Arrow PMC as many contributors focused
almost exclusively on DataFusion and did not also have substantial
contributions to Arrow (which was more common earlier in the
project's life).

We have also created a [Governance Page] to maintain project
transparency, largely based on the content from the Arrow project.

[Governance Page]: https://s.apache.org/98bwp

17 Apr 2024

Establish the Apache DataFusion Project

 WHEREAS, the Board of Directors deems it to be in the best interests
 of the Foundation and consistent with the Foundation's purpose to
 establish a Project Management Committee charged with the creation and
 maintenance of open-source software, for distribution at no charge to
 the public, related to an extensible query engine.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
 (PMC), to be known as the "Apache DataFusion Project", be and hereby
 is established pursuant to Bylaws of the Foundation; and be it
 further

 RESOLVED, that the Apache DataFusion project be and hereby is
 responsible for the creation and maintenance of software related to an
 extensible query engine; and be it further

 RESOLVED, that the office of "Vice President, Apache DataFusion" be
 and hereby is created, the person holding such office to serve at the
 direction of the Board of Directors as the chair of the Apache
 DataFusion Project, and to have primary responsibility for management
 of the projects within the scope of responsibility of the Apache
 DataFusion Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby are
 appointed to serve as the initial members of the Apache DataFusion
 Project:

 * Andrew Grove <agrove@apache.org>
 * Andrew Lamb <alamb@apache.org>
 * Daniël Heres <dheres@apache.org>
 * Jie Wen <jakevin@apache.org>
 * Kun Liu <liukun@apache.org>
 * L. C. Hsieh <viirya@apache.org>
 * QP Hou <houqp@apache.org>
 * Wes McKinney <wesm@apache.org>
 * Will Jones <wjones127@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb be appointed
 to the office of Vice President, Apache DataFusion, to serve in
 accordance with and subject to the direction of the Board of Directors
 and the Bylaws of the Foundation until death, resignation, retirement,
 removal or disqualification, or until a successor is appointed; and be
 it further

 RESOLVED, that the Apache DataFusion Project be and hereby is tasked
 with the migration and rationalization of the Apache Arrow DataFusion
 sub-project; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache Arrow
 DataFusion sub-project encumbered upon the Apache Arrow Project are
 hereafter discharged.

 Special Order 7C, Establish the Apache DataFusion Project, was
 approved by Unanimous Vote of the directors present.