This was extracted (@ 2024-04-17 22:10) from a list of minutes which have been approved by the Board.
Please note: the Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

WARNING: these pages may omit some original contents of the minutes.
This is due to changes in the layout of the source minutes over the years. Fixes are being worked on.

Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

DataFu

17 Apr 2024 [Eyal Allweil / Craig]

Report was filed, but display is awaiting the approval of the Board minutes.

17 Jan 2024 [Eyal Allweil / Sander]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Project Status:
Current project status: Between Ongoing and Dormant
Issues for the board: None

## Membership Data:
Apache DataFu was founded 2018-02-21 (6 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ohad Raviv on 2023-02-06.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
We are in the process of releasing DATAFU-SPARK-2.0.0 (the vote has been called
and approved, though the process is still ongoing).

## Community Health:
In order to increase participation in the project, once we complete the
release we will publish a blog entry to publicize it, encourage use of the new
version, and attract new contributors. We usually get an uptick of interest
after a release, and this one is a major release.

We are also reaching out to past non-committer contributors to see whether any
would like to step up and become committers. There is at least one promising
candidate.

18 Oct 2023 [Eyal Allweil / Craig]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Project Status:
Current project status: Between Ongoing and Dormant
Issues for the board: none

## Membership Data:
Apache DataFu was founded 2018-02-21 (6 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ohad Raviv on 2023-02-06.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
There has been progress on supporting Spark 3. We will probably release
DataFu-Spark 2.0.0, which will include this support, in the next quarter.

## Community Health:
All the work committed this quarter was for Spark 3 support, and it included
contributions from one new contributor.

19 Jul 2023 [Eyal Allweil / Justin]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Project Status:
Current project status: low: we released all our pending changes and will now
start working on a new version (our first to support Spark 3.x).

Issues for the board: none

## Membership Data:
Apache DataFu was founded 2018-02-21 (5 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ohad Raviv on 2023-02-06.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
DataFu-Spark 1.8.0 was just released, on 2023-07-08.

## Community Health:
Not much going on, but hopefully the community will join in on the tasks
necessary for our next release (some of which already exist as Jira issues).

19 Apr 2023 [Eyal Allweil / Sharan]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (5 years ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- Ohad Raviv was added to the PMC on 2023-02-06
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
DataFu-Spark 1.7.0 was just released, on 2023-01-22. We are planning another
non-backwards-compatible release soon.

## Community Health:
Relatively light activity this quarter. There is a small backlog of submitted
content that we want to release, but no new contributions.

15 Feb 2023 [Eyal Allweil / Sam]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (5 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Ohad Raviv, a committer, agreed to join the PMC today, on 2023-02-05.
No new committers. Last addition was Ohad on 2019-07-27.

## Project Activity:
DataFu-Spark 1.7.0 was just released, on 2023-01-22. Now that nearly all the
recent contributions are released, we are planning another release relatively
soon with features that break support for older Spark versions.

## Community Health:
The past few months have seen some more activity than usual; we are hoping
that this release helps us to build more momentum.

18 Jan 2023 [Eyal Allweil / Christofer]

No report was submitted.

19 Oct 2022 [Eyal Allweil / Rich]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (5 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
A number of open issues were closed, including the log4j upgrade (we weren't
using a vulnerable version). We are working on expanding our support for newer
Spark versions. We are delaying the next release in order to include this,
since it seems worth waiting for.

The last release, 1.6.1, was released on 2021-10-11.

## Community Health:
We added "newbie" and "good first contribution" tags to some open issues, and
this seems to have successfully encouraged some first-time contributions.

20 Jul 2022 [Eyal Allweil / Christofer]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (4 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
The last release, 1.6.1, was released on 2021-10-11. We recently received a
number of contributions which we plan to incorporate into an imminent release.

## Community Health:
The past few months have finally seen some significant contributions. Although
not all of them will ultimately be merged, enough look promising that we can
use them for our next release. We hope that this trend will continue.

20 Apr 2022 [Eyal Allweil / Sam]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (4 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
The last release was 1.6.1, on 2021-10-11.

Minor security fixes were made using GitHub's Dependabot tool.

## Community Health:
Still very light community activity. We're trying to find a way to increase this.

19 Jan 2022 [Eyal Allweil / Roman]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (4 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
Some work was done on adding static code analysis to our build (via GitHub
workflows). We verified that the project is not affected by the log4j
vulnerability but are planning on updating to newer versions anyway.

## Community Health:
Still relatively light activity, but a few contributions began or were
completed in this quarter.

20 Oct 2021 [Eyal Allweil / Sam]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (4 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
We just released version 1.6.1 on 2021-10-11. This release contains one new
feature and some build improvements.

## Community Health:
There have been some minor contributions from first-time contributors (some of
which are still under review), but overall activity is still pretty light.

21 Jul 2021 [Eyal Allweil / Sander]

## Description:
The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache DataFu was founded 2018-02-21 (3 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-21.
- No new committers. Last addition was Ohad Raviv on 2019-07-27.

## Project Activity:
The last version, 1.6.0, was released on 2020-03-31. We will probably do a
minor release in the next quarter.

A blog post describing datafu-spark was published on the "Technology at
PayPal" blog on 2021-07-13:

https://medium.com/paypal-tech/introducing-datafu-spark-ba67faf1933a

We will publish a variant of this post on our own DataFu blog in the next few
days. We hope that these posts will draw attention to datafu-spark and attract
contributions and contributors.

## Community Health:
Two issues were opened in JIRA in the past quarter.

As written above, I hope that the blog post introducing the new module
released last year will kindle renewed interest in our project, since it
represents the addition of Spark (a newer technology) to make up for the
dwindling interest in the older modules.

19 May 2021

Change the Apache DataFu Project Chair

 WHEREAS, the Board of Directors heretofore appointed Matthew Hayes
 (mhayes) to the office of Vice President, Apache DataFu, and

 WHEREAS, the Board of Directors is in receipt of the resignation of
 Matthew Hayes from the office of Vice President, Apache DataFu, and

 WHEREAS, the Project Management Committee of the Apache DataFu project
 has chosen by vote to recommend Eyal Allweil (eyal) as the successor to
 the post;

 NOW, THEREFORE, BE IT RESOLVED, that Matthew Hayes is relieved and
 discharged from the duties and responsibilities of the office of Vice
 President, Apache DataFu, and

 BE IT FURTHER RESOLVED, that Eyal Allweil be and hereby is appointed to
 the office of Vice President, Apache DataFu, to serve in accordance
 with and subject to the direction of the Board of Directors and the
 Bylaws of the Foundation until death, resignation, retirement, removal
 or disqualification, or until a successor is appointed.

 Special Order 7C, Change the Apache DataFu Project Chair, was
 approved by Unanimous Vote of the directors present.

21 Apr 2021 [Matthew Hayes / Roy]

## Description:

The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (3 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-20.
- No new committers. Last addition was Ohad Raviv on 2019-07-26.

## Project Activity:

* Added support for newer versions of Gradle.

## Community Health:

There hasn't been very much activity during the past quarter, aside from the
Gradle fix above.

20 Jan 2021 [Matthew Hayes / Bertrand]

## Description:

The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-20.
- No new committers. Last addition was Ohad Raviv on 2019-07-26.

## Project Activity:

* Spark explode array method added.

## Community Health:

* Eyal opened a discussion about whether to deprecate the DataFu Pig and
 Hourglass libraries.

21 Oct 2020 [Matthew Hayes / Shane]

## Description:

The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-20.
- No new committers. Last addition was Ohad Raviv on 2019-07-26.

## Project Activity:

Two JIRAs were filed for further datafu-spark improvements.

## Community Health:

There hasn't been any community activity in the past three months aside from
the JIRAs above being filed.

15 Jul 2020 [Matthew Hayes / Shane]

## Description:

The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-20.
- No new committers. Last addition was Ohad Raviv on 2019-07-26.

## Project Activity:

There has been no project activity in the past three months.

## Community Health:

There hasn't been any community activity in the past three months.

15 Apr 2020 [Matthew Hayes / Sam]

## Description:

The mission of Apache DataFu is the creation and maintenance of software
related to well-tested libraries that help developers solve common data
problems in Hadoop and similar distributed systems.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Casey Stella on 2018-02-20.
- No new committers. Last addition was Ohad Raviv on 2019-07-26.

## Project Activity:

- Released DataFu 1.6.0 on March 30th, 2020, which includes the new Spark
 library.
- Added Python 3 support for Spark library.
- Updated website instructions for new release.
- Unit test logging improvements.

## Community Health:

- There was more email activity compared to previous quarter due to release
 preparation.
- Large increase in closed JIRAs and PRs is due to closing many old issues
 that hadn't had recent activity.

@Justin: pursue release policy issues with DataFu

15 Jan 2020 [Matthew Hayes / Shane]

## Description:

DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to perform
data analysis. It provides functions for common statistics tasks
(e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag
operations. DataFu also provides Hadoop jobs for incremental data processing
in MapReduce. A new Spark package has recently been added.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members.
- No new committers.

## Project Activity:

- Wrote documentation and getting started guide for Apache DataFu Spark.
- Wrote documentation for using macros and sampling in DataFu Pig.

## Community Health:

There was less activity this quarter compared to previous quarters. Hopefully
this will change once we release the Spark library.

16 Oct 2019 [Matthew Hayes / Daniel]

## Description:

DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to perform
data analysis. It provides functions for common statistics tasks
(e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag
operations. DataFu also provides Hadoop jobs for incremental data processing
in MapReduce. A new Spark package has recently been added.

## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache DataFu was founded 2018-02-20 (2 years ago)
There are currently 19 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members.
- Ohad Raviv was added as a committer on 2019-07-26

## Project Activity:

- 1.5.0 was released on January 07 2019
- datafu-spark subproject merged into master on July 17 2019

## Community Health:

There were two commits to master in the last quarter, however one of those
commits was for the datafu-spark subproject, which was developed in a branch
consisting of many commits made over a few months that were squashed together.

@Ted: discuss commit squashing from outside branches and loss of code provenance

17 Jul 2019 [Matthew Hayes / Danny]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.  A new Spark package is in development.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - The new Spark package in development is nearly ready to merge into master.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months.
 - Last PMC addition predates incubator graduation in Feb 2018.

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.
 - Last committer addition predates incubator graduation in Feb 2018.

## Releases:

 - 1.5.0 was released on Mon Jan 07 2019

## Mailing list activity:

 - dev@datafu.apache.org:
    - 41 subscribers (down 1 in the last 3 months)
    - 73 emails sent to list (48 in previous quarter)

17 Apr 2019 [Matthew Hayes / Rich]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.  A new Spark package is in development.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Reviewing and testing new Spark package in development.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months.

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.

## Releases:

 - 1.5.0 was released on Mon Jan 07 2019

## Mailing list activity:

 - dev@datafu.apache.org:
    - 42 subscribers (unchanged in the last 3 months)
    - 48 emails sent to list (32 in previous quarter)

16 Jan 2019 [Matthew Hayes / Brett]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Improvements and bug fixes for the new Spark subproject have been
   contributed (from Eyal Allweil and new contributor Ohad Raviv).
 - Left outer join and dedup macros were contributed.
 - Version 1.5.0 released.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months
 - No new PMC members added since podling graduation in February 2018.
 - Last PMC member added during incubation was Eyal Allweil in July 2016.

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.
 - No new committers added since podling graduation in February 2018.
 - Last committer added during incubation was Eyal Allweil in July 2016.

## Releases:

 - 1.5.0 was released on Mon Jan 07 2019

## JIRA activity:

 - 0 JIRA tickets created in the last 3 months
 - 7 JIRA tickets closed/resolved in the last 3 months

17 Oct 2018 [Matthew Hayes / Ted]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Code for an initial Spark subproject has been contributed (from Eyal
   Allweil and new contributor Ohad Raviv).
 - Some work on adding a macro for deduping.
 - Javadoc updates.  Noted deprecated methods.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months
 - No new PMC members added since podling graduation in February 2018.

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.
 - No new committers added since podling graduation in February 2018.

## Releases:

 - Last release was 1.4.0 on Wed Mar 21 2018

## JIRA activity:

 - 1 JIRA ticket created in the last 3 months
 - 2 JIRA tickets closed/resolved in the last 3 months

18 Jul 2018 [Matthew Hayes / Rich]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Updated to compile with Java 8.
 - Updated Ruby gems used for website generation.
 - Upgraded build system to Gradle v4.8.1.
 - Added new macro.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Casey Stella on Tue Feb 20 2018

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.

## Releases:

 - Last release was 1.4.0 on Wed Mar 21 2018

## JIRA activity:

 - 2 JIRA tickets created in the last 3 months
 - 3 JIRA tickets closed/resolved in the last 3 months

16 May 2018 [Matthew Hayes / Shane]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - No activity to report in the past month.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months

## Committer base changes:

 - Currently 18 committers.
 - No new changes to the committer base since last report.

## Releases:

 - 1.4.0 was released on Wed Mar 21 2018

## JIRA activity:

 - 9 JIRA tickets created in the last 3 months
 - 11 JIRA tickets closed/resolved in the last 3 months

18 Apr 2018 [Matthew Hayes / Bertrand]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Post-graduation work has been completed.
 - Released 1.4.0, the first release since graduating from incubator.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months

## Committer base changes:

 - Currently 18 committers.
 - No changes (the PMC was established in the last 3 months)

## Releases:

 - 1.4.0 was released on Wed Mar 21 2018

## JIRA activity:

 - 13 JIRA tickets created in the last 3 months
 - 16 JIRA tickets closed/resolved in the last 3 months

21 Mar 2018 [Matthew Hayes / Bertrand]

## Description:

 - DataFu provides a collection of Hadoop MapReduce jobs and Pig UDFs to
   perform data analysis. It provides functions for common statistics tasks
   (e.g. quantiles, sampling), PageRank, stream sessionization, and set and
   bag operations. DataFu also provides Hadoop jobs for incremental data
   processing in MapReduce.

## Issues:

 - There are no issues requiring board attention at this time.

## Activity:

 - Much of the recent activity has focused on tasks related to incubator
   graduation.
 - A Download page (http://datafu.apache.org/docs/download.html) was added
   with clearer instructions for getting the most recent source release and
   validating it.
 - Infra set up the new domain for Apache DataFu: http://datafu.apache.org/
 - There is some upcoming work on other post-graduation items, such as
   updating the website to reflect being a TLP now, building artifacts without
   "incubating" in the name, etc.

## Health report:

 - A JIRA was filed by a new user. It was found not to be an issue.
 - The last release was in January 2018 while still in incubation.  Planning
   to do a new release soon now that the project has graduated to TLP.

## PMC changes:

 - Currently 11 PMC members.
 - No new PMC members added in the last 3 months

## Committer base changes:

 - Currently 18 committers.
 - No changes (the PMC was established in the last 3 months)

## Releases:

 - No releases so far since graduating to TLP.
 - Last release during incubation was 1.3.3 on January 26th, 2018.

## JIRA activity:

 - 10 JIRA tickets created in the last 3 months
 - 11 JIRA tickets closed/resolved in the last 3 months

21 Feb 2018

Establish the Apache DataFu Project

 WHEREAS, the Board of Directors deems it to be in the best interests
 of the Foundation and consistent with the Foundation's purpose to
 establish a Project Management Committee charged with the creation and
 maintenance of open-source software, for distribution at no charge to
 the public, consisting of well-tested libraries that help developers
 solve common data problems in Hadoop and similar distributed systems.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
 (PMC), to be known as the "Apache DataFu Project", be and hereby is
 established pursuant to Bylaws of the Foundation; and be it further

 RESOLVED, that the Apache DataFu Project be and hereby is responsible
 for the creation and maintenance of libraries that help solve common
 data problems and work with large-scale data in Hadoop and similar
 distributed systems; and be it further

 RESOLVED, that the office of
 Vice President, Apache DataFu be and hereby is created, the person
 holding such office to serve at the direction of the Board of
 Directors as the chair of the Apache DataFu Project, and to have
 primary responsibility for management of the projects within the scope
 of responsibility of the Apache DataFu Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby are
 appointed to serve as the initial members of the Apache DataFu
 Project:

   * Casey Stella <cestella@apache.org>
   * Evion Kim <evion@apache.org>
   * Eyal Allweil <eyal@apache.org>
   * Jarek Jarcec Cecho <jarcec@apache.org>
   * Josh Wills <jwills@apache.org>
   * Matthew Hayes <mhayes@apache.org>
   * Mitul Tiwari <mitultiwari@apache.org>
   * Roman Shaposhnik <rvs@apache.org>
   * Russell Jurney <rjurney@apache.org>
   * Sam Shah <samshah@apache.org>
   * William Vaughan <wvaughan@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matthew Hayes be
 appointed to the office of Vice President, Apache DataFu, to serve in
 accordance with and subject to the direction of the Board of Directors
 and the Bylaws of the Foundation until death, resignation, retirement,
 removal or disqualification, or until a successor is appointed; and be
 it further

 RESOLVED, that the initial Apache DataFu PMC be and hereby is tasked
 with the creation of a set of bylaws intended to encourage open
 development and increased participation in the Apache DataFu Project;
 and be it further

 RESOLVED, that the Apache DataFu Project be and hereby is tasked with
 the migration and rationalization of the Apache Incubator DataFu
 podling; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache Incubator
 DataFu podling encumbered upon the Apache Incubator Project are
 hereafter discharged.

 Special Order 7E, Establish the Apache DataFu Project, was
 approved by Unanimous Vote of the directors present.

17 Jan 2018

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Address IPMC feedback raised during graduation discussion
 2. Positive IPMC recommendation vote for graduation
 3. Continue releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 None

How has the community developed since the last report?

 One new contributor (Yuval Allweil)

How has the project developed since the last report?

 Upgraded Guava and Gradle versions. Addressed many website issues raised in
 graduation discussion.  Whimsy report now mostly green. Rat task
 automatically run as part of build. New UDFs for diffing tuples and
 computing hashes.

How would you assess the podling's maturity? Please feel free to add your own
commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2017-03-10

When were the last committers or PPMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
    Comments:
 [X](datafu) Roman Shaposhnik
    Comments: In my view the podling is ready to
  graduate at this point.
 [ ](datafu) Ted Dunning
    Comments:

IPMC/Shepherd notes:

18 Oct 2017

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Positive IPMC recommendation vote for graduation
 2. Address any IPMC feedback regarding graduation
 3. Continue releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 No new developments

How has the project developed since the last report?

 Added support for Pig macros distributed in JAR.
 Added a couple of counting macros and TFIDF.
 Started graduation discussion in general list.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2017-03-10

When were the last committers or PPMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
    Comments:
 [ ](datafu) Roman Shaposhnik
    Comments:
 [X](datafu) Ted Dunning
    Comments:

IPMC/Shepherd notes:

 johndament: The podling is attempting to graduate, however there are some concerns raised within the discussion over how big the actual PMC is.  There have also been concerns raised over the removal of committer status.

16 Aug 2017

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Positive IPMC recommendation vote for graduation
 2. Address any IPMC feedback regarding graduation
 3. Continue releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

* Community voted positively for graduation from Incubator.
* New contributor opened and fixed DATAFU-124

How has the project developed since the last report?

* Completed maturity evaluation checklist
 (https://cwiki.apache.org/confluence/display/DATAFU/Maturity+Evaluation)
* Drafted a graduation resolution
 (https://cwiki.apache.org/confluence/display/DATAFU/Graduation+Resolution)

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2017-03-10

When were the last committers or PPMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
    Comments:
 [x](datafu) Roman Shaposhnik
    Comments:
    Best of luck to this podling's graduation. They are a small but viable community.
 [x](datafu) Ted Dunning
    Comments: Good luck to the community

IPMC/Shepherd notes:
Project is ready to graduate. I noticed a problem with their answer to QU30 on the Maturity model.
It was in part a documentation issue. Raised it with the project and they are addressing the way that
they are asking for security issues. Raised the documentation issue with ComDev and we fixed it.
Dave Fisher

19 Apr 2017

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Complete maturity evaluation checklist
 2. Draft graduation resolution
 3. Continue releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 No changes

How has the project developed since the last report?

 Version 1.3.2 was released.  This addressed an issue with released
 convenience binaries not including LICENSE, NOTICE, and DISCLAIMER
 in META-INF of JARs.  This was considered an important item to
 tackle before graduation.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [x] Initial setup
 [x] Working towards first release (released 1.3.0, 1.3.1, and 1.3.2)
 [x] Community building (4 committers added since incubation, 24
     contributors in total)
 [x] Nearing graduation (maturity evaluation is nearly complete)
 [ ] Other:

Date of last release:

 2017-03-10

When were the last committers or PPMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
    Comments:
 [x](datafu) Roman Shaposhnik
    Comments: I believe the podling is ready for graduation
 [ ](datafu) Ted Dunning
    Comments:

27 Feb 2017

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Resolve NOTICE and LICENSE issues for binary distributions
 2. Continued releases
 3. Increased committer activity

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 Received a patch from a new contributor.

How has the project developed since the last report?

 No updates

Date of last release:

 2016-08-10

When were the last committers or PPMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

 Roman Shaposhnik:

   I really think we need to do one final push and either graduate or retire.

16 Nov 2016

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Resolve NOTICE and LICENSE issues for binary distributions
 2. Continued releases
 3. Increased committer activity

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 * No updates

How has the project developed since the last report?

 * Released 1.3.1.  Now using ASF-associated signing key.  Feedback from
   previous release addressed.
 * Website updated alongside 1.3.1 release.
 * Cleaned up release instructions.

Date of last release:

 2016-08-10

When were the last committers or PMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

 Roman Shaposhnik:

   Pushing this community towards graduation is pretty high on my TODO list.
   I think they are as ready as they are ever going to be.

19 Oct 2016

 johndament:

   Discussions on this podling seem to have stopped completely.  There was a
     graduation discussion back in August, which appears to have stalled
     after some release content issues were identified.

20 Jul 2016

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Grow user and contributor base
 2. Increased committer activity
 3. Continued releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 * Eyal Allweil was voted in as the newest committer and member of the
   PPMC.

How has the project developed since the last report?

 * A new UDF provided by Eyal was committed and another was submitted for
   review.
 * ASF-associated signing key committed in prep for next release,
   addressing feedback from previous release.

Date of last release:

 2015-11-14

When were the last committers or PMC members elected?

 July 2016 (Eyal Allweil)

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [x](datafu) Roman Shaposhnik
 [x](datafu) Ted Dunning

20 Apr 2016

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Grow user and contributor base
 2. Increased committer activity
 3. Continued releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 * A new contributor opened several JIRAs regarding improvements and
   contributed patches.  Two have been committed so far.

How has the project developed since the last report?

 * Improved instructions on loading projects in Eclipse based on discussion
   in JIRA.
 * Added checks to the build system to catch use of the wrong JDK version.
 * Some UDFs were improved to be more efficient.
 * A new UDF is pending review.

Date of last release:

 2015-11-14

When were the last committers or PMC members elected?

 November 2014

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [X](datafu) Ted Dunning

20 Jan 2016

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Grow user and contributor base
 2. Increased committer activity
 3. Continued releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * None

How has the community developed since the last report?

 * No new activity in the community since the last report.

How has the project developed since the last report?

 * Apache DataFu 1.3.0 source release completed, which is the first release
   since entering the Incubator.  DataFu 1.3.0 was also released to Maven.
 * Website (http://datafu.incubator.apache.org/) has been updated with
   instructions on how to use the source release or artifacts from Maven.

Date of last release:

 * 2015-11-14

When were the last committers or PMC members elected?



Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [x](datafu) Ted Dunning

Shepherd/Mentor notes:

 Roman Shaposhnik (rvs):

   The community appears to be in the final stretch before graduation,
   hopefully there's enough critical mass for it to happen.

18 Nov 2015

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Do first release
 2. Grow user and contributor base
 3. Increased committer activity

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 * Performing the initial release remains the most important milestone.

How has the community developed since the last report?

 * No new activity in the community since the last report.

How has the project developed since the last report?

 * The website documentation (http://datafu.incubator.apache.org/) has been
   updated and brought up to date with the current state of the project and
   build system, making it easier for newcomers to get started.  This was
   the last major task blocking release.
 * All the release tasks filed for our first release have now been
   completed.  A discussion has been opened in the dev mailing list on the
   topic of doing our first release.  A vote will likely be held in the
   next few days.

Date of last release:

 * Not yet released.  First release will likely happen within the coming
   weeks.

When were the last committers or PMC members elected?

 * November 2014

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [X](datafu) Ted Dunning

Shepherd/Mentor notes:

19 Aug 2015

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Do first release
 2. Grow user and contributor base
 3. Increased committer activity

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 * Performing the initial release remains the most important milestone.
   However, the majority of the known tasks for this have been completed.
   The remaining tasks are related to documentation.  We are able to
   generate a signed, versioned source release from the build system.

How has the community developed since the last report?

 * A couple of new contributors have submitted patches.  One of these has
   been committed and the other is nearly ready to be committed.

How has the project developed since the last report?

 * The build system has been updated to provide tasks for generating
   signed, versioned source releases.  Documentation has been updated as
   well with instructions on how to do this.
 * Apache DataFu has been updated to run against Hadoop 2.  There was an
   issue running the Hourglass integration tests against Hadoop 2, which
   had been blocking this update.
 * Two new contributors have submitted patches for Pig UDFs.  One of these
   is an improvement to the HyperLogLog UDF cardinality estimator that makes
   it much more efficient.  The other is a helper for getting a tuple out of
   a bag.
 * A patch has been submitted for a UDF to incrementally process
   date-partitioned data in Pig.  This provides functionality similar to
   that available in Hourglass.

Date of last release:

 * Not yet released

When were the last committers or PMC members elected?

 * November 2014

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [X](datafu) Ted Dunning

Shepherd/Mentor notes:

 Ted Dunning:

   The generation of signed releases on shared hardware has historically
   raised serious security questions from infra. I think that this process
   needs to be vetted very carefully.

21 Jan 2015

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Grow user and contributor base.
 2. Make first release.
 3. Increase activity for initial committers.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 1. Has not yet made a release, but is in process of preparing first.
 2. Need to dramatically grow the contributor base.

How has the community developed since the last report?

 New committer and PMC member.  Several JIRAs filed by new users.

How has the project developed since the last report?

 1. 16 issues created, several from new contributors.
 2. 8 issues closed.
 3. Reasonable amount of mailing list traffic.

Date of last release:

 None yet. Currently preparing release: DATAFU-53.

When were the last committers or PMC members elected?

 Nov 2014, Russell Jurney, both committer and PPMC.

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [X](datafu) Ted Dunning

15 Oct 2014

DataFu provides a collection of Hadoop MapReduce jobs and functions in higher
level languages based on it to perform data analysis. It provides functions
for common statistics tasks (e.g. quantiles, sampling), PageRank, stream
sessionization, and set and bag operations. DataFu also provides Hadoop jobs
for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Building an ASF-based community.
 2. Release.
 3. Adding support for Hadoop 2.x

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None.

How has the community developed since the last report?

 Three new users have contributed code since the last report.

How has the project developed since the last report?

 A couple more UDFs have been committed.  One bug fix was committed.  All
 JARs have been removed from the repo (a blocker for source release).  A
 build task has been added for creating a source release.  No open blockers
 for release left at this point.  Several more UDFs have been contributed but
 are still under review.

Date of last release:

 No release yet.

When were the last committers or PMC members elected?

 2014-02-22

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

(jmclean): Did not report on time. Low-level mentor activity but no obvious
issues other than missing release. (Release mentioned in last report and
DATAFU-53 blocking release has been resolved).

16 Jul 2014

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

  1. Building an ASF-based community.
  2. Release.
  3. Decide on the future home of the project.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

  None.

How has the community developed since the last report?

  Will Vaughan gave a talk on DataFu at ApacheCon in April, and
  Casey Stella gave a talk on Pig and DataFu at the Hadoop Summit in
  June.

How has the project developed since the last report?

 Lots of JIRAs on bug fixes and new features, especially in April and May.
 Work slowed significantly in June, which probably means it's time for a
 release to mark our progress thus far.

Date of last release:

  None. Sixth month of incubation.

When were the last committers or PMC members elected?

  2014-02-22

Signed-off-by:

  [ ](datafu) Ashutosh Chauhan
  [X](datafu) Roman Shaposhnik
  [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

(jmclean) :  Mentor active, no obvious issues.

16 Apr 2014

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Building ASF community
 2. Release
 3. Remaining incubator paperwork

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None.

How has the community developed since the last report?

 A talk was given at an Apache Pig meetup held on March 14th.  A talk
 is scheduled to be given at ApacheCon in Denver on April 7th.  Jian
 Wang accepted the invitation to become a committer.

How has the project developed since the last report?

 Two new JIRAs have been filed and have received patches.

Date of last release:

 None. Third month of incubation.

When were the last committers or PMC members elected?

 2014-02-22

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

 Justin Mclean (jmclean):

   Relatively new podling yet to make a release. One mentor is active on
   the public mailing list; no obvious issues need attention.

19 Mar 2014

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Building ASF community
 2. Release
 3. Remaining incubator paperwork

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None.

How has the community developed since the last report?

 More contributions have been received from Jian Wang, who has also
 been voted in as the newest committer and PPMC member.  A talk
 is planned at the Apache Pig meetup to be held on March 14th.

How has the project developed since the last report?

 Three JIRAs have been opened, four have been closed.  The project has
 migrated from Ant to the Gradle build system, which will make it easier
 to add libraries for Hive, Crunch, etc.

Date of last release:

 None. Second month of incubation.

When were the last committers or PMC members elected?

 2014-02-22

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [x](datafu) Ted Dunning

19 Feb 2014

DataFu provides a collection of Hadoop MapReduce jobs and functions in
higher level languages based on it to perform data analysis. It provides
functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
stream sessionization, and set and bag operations. DataFu also provides
Hadoop jobs for incremental data processing in MapReduce.

DataFu has been incubating since 2014-01-05.

Three most important issues to address in the move towards graduation:

 1. Building ASF community
 2. Release
 3. Remaining incubator paperwork

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None.

How has the community developed since the last report?

 Since initial incubation, have received contributions from two new
 contributors.

How has the project developed since the last report?

 First report.  Have obtained all the necessary infra (git/JIRA/wiki, etc.).
 Thirty JIRAs have been opened, 14 have been closed.  Active discussion on
 mailing list as to community development, etc.

Date of last release:

 None. First month of incubation.

When were the last committers or PMC members elected?

 None. First month of incubation.

Signed-off-by:

 [ ](datafu) Ashutosh Chauhan
 [X](datafu) Roman Shaposhnik
 [ ](datafu) Ted Dunning

Shepherd/Mentor notes:

 Dave Fisher (wave):

   New community to the incubator just getting started. Good guidance from
   mentors. Needs Apache trademark attribution on the site. Should have links
   to mailing lists on the site.