This was extracted (@ 2024-03-20 21:10) from a list of minutes which have been approved by the Board.
Please note: the Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

WARNING: these pages may omit some original contents of the minutes.
This is due to changes in the layout of the source minutes over the years. Fixes are being worked on.

Meeting times vary; the exact schedule is available to ASF Members and Officers, who can search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

Crunch

17 Jun 2020

Terminate the Apache Crunch Project

 WHEREAS, the Project Management Committee of the Apache Crunch project
 has arrived at a consensus to recommend moving the project to the
 Attic; and

 WHEREAS, the Board of Directors deems it no longer in the best interest
 of the Foundation to continue the Apache Crunch project due to
 inactivity;

 NOW, THEREFORE, BE IT RESOLVED, that the Apache Crunch project is
 hereby terminated; and be it further

 RESOLVED, that the Attic PMC be and hereby is tasked with oversight
 over the software developed by the Apache Crunch Project; and be it
 further

 RESOLVED, that the office of "Vice President, Apache Crunch" is hereby
 terminated; and be it further

 RESOLVED, that the Apache Crunch PMC is hereby terminated.

 Special Order 7B, Terminate the Apache Crunch Project, was
 approved by Unanimous Vote of the directors present.

20 May 2020 [Josh Wills / Bertrand]

## Description:
The mission of Crunch is the creation and maintenance of software related to
Simple and Efficient MapReduce Pipelines

## Issues:
We're discussing the future of the project on the PMC
mailing list and could use some input from the board.

## Membership Data:
Apache Crunch was founded 2013-02-19 (7 years ago)
There are currently 15 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Micah Whitacre on 2014-04-02.
- No new committers. Last addition was Stephen Durfey on 2018-02-09.

## Project Activity:
No activity since the last release in January; some COVID-19
related work and an incorrectly configured email server
caused the chair to miss the last report deadline, apologies
for that.

## Community Health:
Things are quiet; it feels like the core of the work is mostly
complete and we are talking about how best to wrap things up.

15 Apr 2020 [Josh Wills / Shane]

No report was submitted.

@Shane: pursue potential Attic resolution for Crunch

15 Jan 2020 [Josh Wills / Danny]

## Description:
The mission of Crunch is the creation and maintenance of software related to
building simple and efficient data pipelines on Hadoop and Spark.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Crunch was founded 2013-02-19 (7 years ago)
There are currently 15 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Micah Whitacre on 2014-04-02.
- No new committers. Last addition was Stephen Durfey on 2018-02-09.

## Project Activity:
We did our 1.0.0 release on 2019-10-24, and are currently working
on a major dependency upgrade to keep Crunch compatible with our
myriad upstream dependencies, likely followed quickly by yet
another release so that users who need to be on Hadoop 2.8.2 and
later versions can keep working:
https://issues.apache.org/jira/browse/CRUNCH-692

## Community Health:
Quiet quarter, aside from the release vote and a bit of
traffic related to the upgrade.

16 Oct 2019 [Josh Wills / Rich]

## Description:
The mission of Apache Crunch is to make it easy to create and maintain
large-scale data pipelines within the Apache Hadoop ecosystem.

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache Crunch was founded 2013-02-19 (7 years ago)
There are currently 15 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Micah Whitacre on 2014-04-02.
- No new committers. Last addition was Stephen Durfey on 2018-02-09.

## Project Activity:
We are currently in the middle of the 1.0.0 release vote. Feels good
to be reaching this milestone as a project.

## Community Health:
It will be interesting to see how the release vote progresses and
where the community wants to take the project going forward; we hope
that this section will be more interesting in our next report three
months from now.

17 Jul 2019 [Josh Wills / Ted]

## Description:
 Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 There are no issues requiring board attention at this time.

## Activity:
 Some good work the past couple of months resolving some long-standing issues
 with S3 compatibility/utility that put us in a good position for completing a
 release and adding new committers.

## Health report:
 Same structural issues as our last report; the utility of the project is
 primarily for developers who are using MapReduce pipelines in local
 Hadoop clusters and/or migrating them to the cloud, so
 there isn't much new work to do besides those efforts.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 15 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Stephen Durfey at Fri Feb 09 2018

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 7 JIRA tickets created in the last 3 months
 - 7 JIRA tickets closed/resolved in the last 3 months

15 May 2019 [Josh Wills / Rich]

## Description:
 - Apache Crunch is a JVM library for writing, testing, and running MapReduce
   and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - Most of the work this quarter focused on integrations with Hadoop's
   FileSystem extensions to better support reading/writing pipeline data
   from/to various object stores (mainly S3.)

## Health report:
 - The project's current work is focused on the needs of community members who
   are moving MapReduce-based pipelines that ran in legacy clusters to various
   cloud-based Hadoop clusters, which makes sense given Hadoop's general focus
   on this area as well.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 15 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Stephen Durfey at Fri Feb 09 2018

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 10 JIRA tickets created in the last 3 months
 - 6 JIRA tickets closed/resolved in the last 3 months

17 Apr 2019 [Josh Wills / Daniel]

No report was submitted.

16 Jan 2019 [Josh Wills / Ted]

## Description:
 - Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - Had a few issues filed this quarter related to extensions/fixes to better
 integrate with object stores (like S3) at the end of MapReduce jobs with a
 patch promised from a developer who has made some small contributions to
 the project in the past and looks like an excellent candidate to become
 a committer in the near future.

## Health report:
 - The needs of the community are largely unchanged and are mainly focused
 on integrating legacy MapReduce jobs with cloud platforms that only use
 HDFS as a caching layer while persisting data for the long-term in object
 storage; there isn't much other work to do on the core of the Crunch system
 aside from these compatibility updates and the occasional bug fix.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 15 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Stephen Durfey at Fri Feb 09 2018

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 2 JIRA tickets created in the last 3 months
 - 0 JIRA tickets closed/resolved in the last 3 months

17 Oct 2018 [Josh Wills / Phil]

## Description:
 - Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - After the Hive and HBase work last quarter, we had a relatively
   quiet block of work on S3-compatibility changes and some fixes
   for Avro support with the Apache Spark runtime engine, along
   with some preparation for changes to the public/private settings
   of APIs in the next release of Apache HBase.

## Health report:
 - It was a quiet quarter, just a couple of interesting bugs that
   needed to be worked through. The work for the next quarter
   should be focused on Java upgrades (ideally to JDK11) ahead
   of the next major version release.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 3 JIRA tickets created in the last 3 months
 - 2 JIRA tickets closed/resolved in the last 3 months

18 Jul 2018 [Josh Wills / Roman]

## Description:
 - Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - The only real activity this quarter was a few small changes to improve S3
 compatibility/reliability and an interesting discussion/debugging exercise
 on the user mailing list about an incompatibility between Crunch's Apache
 Spark runtime and the code for reading/writing Avro serialized data.

## Health report:
 - I believe the overall state of the project reflects the overall state of
   open-source data engineering in general, i.e., a steady move away from
   MapReduce-based pipelines to running pipelines on top of modern engines
   like Spark or the generalized APIs of a system like Apache Beam. There
   simply isn't much interest (or much need) to extend Crunch's functionality
   as opposed to providing a smooth migration off of it and on to Spark or Beam.
   The only real exception to this is certain extremely large workloads that
   cannot move to Spark or Beam for whatever reason, and for those workloads,
   we should be making it easy (as the rest of the Hadoop community has) to
   run those jobs in either an on-premise cluster or on the cloud by not assuming
   that data will be stored in HDFS permanently.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 15 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Stephen Durfey at Fri Feb 09 2018

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## Mailing list activity:

 - dev@crunch.apache.org:
    - 79 subscribers (down -2 in the last 3 months):
    - 14 emails sent to list (51 in previous quarter)

 - user@crunch.apache.org:
    - 152 subscribers (down -1 in the last 3 months):
    - 36 emails sent to list (6 in previous quarter)


## JIRA activity:

 - 2 JIRA tickets created in the last 3 months
 - 1 JIRA ticket closed/resolved in the last 3 months

18 Apr 2018 [Josh Wills / Isabel]

## Description:
 - Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - This quarter was relatively quiet and was primarily focused on fixing
 small bugs in the HBase, Kafka, and Hive integrations that Crunch provides.
 A number of these fixes came from Clement Mathieu, who looks like a good
 candidate to be a new committer on the project along with Stephen Durfey,
 who became a committer in February.

## Health report:
 - Crunch continues to move at a steady pace of commits and bug fixes,
 but there has not been a major push from the community to add or update
 the existing functionality beyond the set of things that Crunch does well
 already. The biggest obvious improvement to the project is an upgrade of
 the APIs for the HBase dependency, but that isn't necessarily the
 kind of work that a developer would be interested in doing for fun (as
 opposed to as a result of a specific need for their work and a desire to
 contribute that work back to the community.)

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 15 committers.
 - Stephen Durfey was added as a committer on Fri Feb 09 2018

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 8 JIRA tickets created in the last 3 months
 - 7 JIRA tickets closed/resolved in the last 3 months

17 Jan 2018 [Josh Wills / Chris]

## Description:
 - Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - The big push this quarter was to upgrade the Hive dependencies
   for Crunch in order to add a new HCatalog module for reading and
   writing data to the Hive metastore. This was one of the major
   upgrades to the project that we wanted to get done before the 1.0
   release, and we're working on getting the HBase version upgrades
   and fixes into the mainline of the codebase now.

## Health report:
 - Acting on the feedback we received after our last report, the PMC
   recently voted to add the primary developer of the Hive/HCatalog
   functionality as a new committer on the project, and are reaching
   out to him now to kick off the process with the ASF. We're optimistic
   about adding another new committer for the HBase work in the next
   quarter.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 4 JIRA tickets created in the last 3 months
 - 5 JIRA tickets closed/resolved in the last 3 months

15 Nov 2017 [Josh Wills / Chris]

## Description:
 Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 There are no issues requiring board attention at this time.

## Activity:
 Aside from a few small bug fixes, the main focus of development/discussion
 in the last quarter was how to execute the series of upgrades on our
 dependencies (mainly Apache Hadoop, HBase, and Hive) that we need to complete
 the 1.0 release. We need to do some heavy lifting and compatibility-breaking
 changes to the HBase module in order to make the move to HBase 2.x, which
 is necessary to make the move to Hive 2.x. The JIRA issue here tracks this:
 https://issues.apache.org/jira/browse/CRUNCH-659

## Health report:
 The fact that we're no longer really working on new features or functionality
 in favor of bug fixes and version upgrades is the most significant issue with
 the health of the project. The good news is that a number of new contributors
 have been driving the version upgrade effort, and several of them clearly
 have the potential to become committers and PMC members once this work is
 done. The biggest need (and where I have fallen short as PMC chair) is
 providing them with guidance, feedback, and support for their efforts so
 that they can complete this work and earn their committerships.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

 - 8 JIRA tickets created in the last 3 months
 - 2 JIRA tickets closed/resolved in the last 3 months

18 Oct 2017 [Josh Wills / Ted]

No report was submitted.

19 Jul 2017 [Josh Wills / Phil]

## Description:
Apache Crunch is a Java library for writing, testing, and running MapReduce
and Apache Spark pipelines on Apache Hadoop.

## Issues:
There are no issues requiring board attention at this time.

## Activity:
Activity since the most recent release (February 2017) has been focused on
upgrading the versions of major dependencies (especially HBase and Spark) in
preparation for a 1.0 release, which will be synced with the latest and
greatest from downstream projects and will allow us to clean up some
deprecated parts of the API.


## Health report:
## PMC changes:

- Currently 12 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

- Currently 14 committers.
- No new committers added in the last 3 months
- Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

- Last release was 0.15.0 on Sat Feb 25 2017

## JIRA activity:

- 8 JIRA tickets created in the last 3 months
- 2 JIRA tickets closed/resolved in the last 3 months

19 Apr 2017 [Josh Wills / Mark]

## Description:
 Apache Crunch is a Java library for writing, testing, and running MapReduce
 and Apache Spark pipelines on Apache Hadoop.

## Issues:
 There are no issues requiring board attention at this time.

## Activity:
 Activity since the most recent release (February 2017) has been focused
 on upgrading the versions of major dependencies (especially HBase and Spark)
 in preparation for a 1.0 release, which will be synced with the latest
 and greatest from downstream projects and will allow us to clean up some
 deprecated parts of the API.

## Health report:
 Although the current focus of the project is good and useful, the question
 now is what to do after the 1.0 release is complete, which brings us back
 to broader questions about the future of Crunch and how it should relate
 to similar top-level projects like Apache Beam. We'll begin this conversation
 in earnest on the dev mailing list once the 1.0 release is finished.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - 0.15.0 was released on Sat Feb 25 2017

## JIRA activity:

 - 10 JIRA tickets created in the last 3 months
 - 10 JIRA tickets closed/resolved in the last 3 months

18 Jan 2017 [Josh Wills / Brett]

Apache Crunch is a Java library for writing, testing, and running MapReduce
and Apache Spark pipelines on Apache Hadoop.

## Issues:

There are no issues requiring board attention at this time.

## Activity:

The JIRAs for the past few months have focused on core bug fixes and
iteration/fixes on the Apache Kafka support that we started on after our last
release in May.

## Health report:

The Crunch code does what it does well, and it has for at least a few
releases. Beyond bug fixes and supporting upgrades to new Hadoop/Spark
releases, there isn't an obvious new direction to take the project in that
would stay true to its original mission while remaining useful to developers.
In terms of project goals, API design, and even a subset of committers, Crunch
has a lot in common with the newly top-level Apache Beam project, which is
focused on the next generation of data processing engines that unify batch and
streaming use cases into a single API. Finding a way to join forces with Beam
is one available way forward for the project, but figuring out what that move
would look like would require some extensive discussions on the mailing lists,
both about the future of data pipelines in general as well as the role that
the Crunch community most wants to play.

## PMC changes:

- Currently 12 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

- Currently 14 committers.
- No new committers added in the last 3 months
- Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

- Last release was 0.14.0 on Wed May 04 2016

## JIRA activity:

- 9 JIRA tickets created in the last 3 months
- 7 JIRA tickets closed/resolved in the last 3 months

19 Oct 2016 [Josh Wills / Isabel]

## Description:
   Apache Crunch is a Java library for writing, testing, and running
   MapReduce and Apache Spark pipelines on Apache Hadoop.

## Issues:
   We had a report this quarter that a file that was used in some of our
   tests contained content that had an unclear copyright status outside
   of the US (maugham.txt, containing a sampling of work from W. Somerset
   Maugham.) We removed this file from our source repo and updated the
   tests that made use of it, as discussed in this JIRA issue:
   https://issues.apache.org/jira/browse/CRUNCH-616 If the board feels
   that any additional action is required here, please let the PMC know
   and we will take it.

## Activity:
   We had a normal amount of activity this month, primarily focused on
   performance and debugging for very large MapReduce pipelines executed by
   Crunch and improvements to the new Kafka Streams-based pipeline executor.

## Health report:
   Work proceeds apace to improve what Crunch does well (executing large and
   complex MapReduce pipelines), but as MapReduce gradually declines in
   use and is replaced by Apache Spark as the execution engine of choice, we
   expect that patches will come in more slowly and be primarily focused on
   fixing bugs as opposed to adding new functionality.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - 0.14.0 was released on Fri May 06 2016

## JIRA activity:

 - 10 JIRA tickets created in the last 3 months
 - 12 JIRA tickets closed/resolved in the last 3 months

20 Jul 2016 [Josh Wills / Greg]

## Description:
   Apache Crunch is a Java library for writing, testing, and running
   MapReduce and Apache Spark pipelines on Apache Hadoop.

## Issues:
   There are no issues requiring board attention at this time.

## Activity:
   There has been the normal bug fixing work after our most recent release
   in May, along with some new work to explore using Apache Kafka as a data
   source in Crunch pipelines that should make a good foundation for our
   next major release.

## Health report:
   Things are generally good: the core library does what it was designed to
   reasonably well, bugs are reported and addressed in a timely manner, and
   we have some new and interesting development avenues to explore in leveraging
   Crunch as a way of doing simplified stream processing without requiring the
   deployment of more heavyweight frameworks like Apache Storm or Apache Spark's
   streaming engine.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Micah Whitacre on Wed Apr 02 2014

## Committer base changes:

 - Currently 14 committers.
 - No new committers added in the last 3 months
 - Last committer addition was David Whiting at Mon Nov 30 2015

## Releases:

 - 0.14.0 was released on Fri May 06 2016

## JIRA activity:

 - 9 JIRA tickets created in the last 3 months
 - 4 JIRA tickets closed/resolved in the last 3 months

20 Apr 2016

Change the Apache Crunch Project Chair

 WHEREAS, the Board of Directors heretofore appointed Micah Whitacre
 (mkwhit) to the office of Vice President, Apache Crunch, and

 WHEREAS, the Board of Directors is in receipt of the resignation of
 Micah Whitacre from the office of Vice President, Apache Crunch, and

 WHEREAS, the Project Management Committee of the Apache Crunch project
 has chosen by vote to recommend Josh Wills (jwills) as
 the successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Micah Whitacre is relieved and
 discharged from the duties and responsibilities of the office of Vice
 President, Apache Crunch, and

 BE IT FURTHER RESOLVED, that Josh Wills be and hereby is appointed to
 the office of Vice President, Apache Crunch, to serve in accordance
 with and subject to the direction of the Board of Directors and the
 Bylaws of the Foundation until death, resignation, retirement, removal
 or disqualification, or until a successor is appointed.

 Special Order 7I, Change the Apache Crunch Project Chair, was
 approved by Unanimous Vote of the directors present.

20 Apr 2016 [Micah Whitacre / Greg]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a slightly slower pace than in the previous
quarter, with 13 new JIRAs being created since the previous board report
and 11 (10 new + 1 old) issues being resolved in that time. The majority of
the work on the project continues to focus on maintenance such as bug
fixes, but also on improvements to the Java 8 Lambda support and HBase
efficiencies.

There are no board-level issues at this time.

Community
---------

Community activity slowed but remained steady.
The user mailing list activity has dropped to about 1 question every 3 days.
Over the last reporting period the activity on the developer mailing list
has remained steady (2 per day).

David Whiting was added as a committer on Dec 2nd, 2015.
Josh Wills was re-elected as PMC Chair on April 10, 2016, and a resolution
was sent to the board.

Releases
--------

* Apache Crunch 0.13.0 was released August 5, 2015

20 Jan 2016 [Micah Whitacre / David]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a slightly slower pace than in the previous
quarter, with 19 new JIRAs being created since the previous board report
and 18 (11 new + 7 old) issues being resolved in that time. The majority of
the work on the project continues to focus on maintenance such as bug
fixes, but also on improvements to Scala and Spark support.
A significant design and implementation effort has been underway to add
Java 8 Lambda API support.

There are no board-level issues at this time.

Community
---------

Community activity continues to be similar to the previous reporting
period.
The user mailing list has maintained a steady rate of questions and answers
(1 per day).
Over the last reporting period the activity on the developer mailing list
has increased (2 per day).

Micah Whitacre was added to the PMC on April 3rd, 2014.
David Whiting was added as a committer on Dec 2nd, 2015.

Releases
--------

* Apache Crunch 0.13.0 was released August 5, 2015

21 Oct 2015 [Micah Whitacre / David]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a similar pace to the previous quarter, with
34 new JIRA issues logged since the previous board report, with 24 (23 new +
1 old) issues being resolved in that time. The majority of the work on the
project continues to focus on maintenance such as bug fixes with a heavier
focus on rounding out HBase support, building out better Spark support, and
support for Java 8 Lambdas.

The project also successfully released version 0.13.0 this quarter, which
featured 27 issues.  The release focussed on several bug fixes, but the major
effort was to upgrade to HBase 1.0 and remove support for Hadoop 1.0.

There are no board-level issues at this time.

Community
---------

Community activity continues to be similar to the previous reporting
period.  The user mailing list has maintained a steady rate of questions and
answers (1 per day).  Over the last reporting period the activity on the
developer mailing list has increased (2 per day).

Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

Releases
--------

* Apache Crunch 0.13.0 was released August 5, 2015

15 Jul 2015 [Micah Whitacre / Jim]

Apache Crunch is a Java library for writing, testing, and running MapReduce
and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a similar pace to the previous quarter, with 33
new JIRA issues logged since the previous board report, with 24 (22 new + 2
old) issues being resolved in that time. The majority of the work on the
project continues to focus on maintenance such as bug fixes, but also on
upgrades to the technology stack such as HBase and Java.

The project also successfully released version 0.12.0 this quarter, which
featured 35 issues.  The release focussed on several bug fixes but also
improvements to the project's Scala and Spark support.

There are no board-level issues at this time.

Community
---------

Community activity continues to be similar to the previous reporting period.
The user mailing list has maintained a steady rate of questions and answers (1
per day).
Over the last reporting period the activity on the developer mailing list has
increased (2 per day).

Micah Whitacre was added to the PMC on April 3rd, 2014. Micah Whitacre was
added as a committer on July 11th, 2013.

Releases
--------

* Apache Crunch 0.12.0 was released May 8, 2015

22 Apr 2015

Change the Apache Crunch Project Chair

 WHEREAS, the Board of Directors heretofore appointed Gabriel Reid
 to the office of Vice President, Apache Crunch, and

 WHEREAS, the Board of Directors is in receipt of the resignation
 of Gabriel Reid from the office of Vice President, Apache Crunch,
 and

 WHEREAS, the Project Management Committee of the Apache Crunch
 project has chosen by vote to recommend Micah Whitacre as the
 successor to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Gabriel Reid is relieved and
 discharged from the duties and responsibilities of the office
 of Vice President, Apache Crunch, and

 BE IT FURTHER RESOLVED, that Micah Whitacre be and hereby is
 appointed to the office of Vice President, Apache Crunch, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification, or
 until a successor is appointed.

 Special Order 7C, Change the Apache Crunch Project Chair, was
 approved by Unanimous Vote of the directors present.

22 Apr 2015 [Gabriel Reid / Brett]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a similar pace to the previous quarter, with
20 new JIRA issues logged since the previous board report, with 15 of them
being closed in that time. The majority of the work on the project continues
to focus on maintenance.

The addition of a new committer was successfully voted on, but the person in
question decided to make a major career change just at the same moment, which
was an unfortunate loss for the Crunch community.

This report also marks the resignation (due to the term of one year being up)
of Gabriel Reid, the current PMC chair. There has been a successful vote to
recommend Micah Whitacre as the new PMC chair.

There are no board-level issues at this time.

Community
---------

Community activity continues to be similar to the previous reporting period.
The user mailing list was more active, with an average of more than one
message per day (nearly double that of the previous reporting period), while
the dev mailing list activity has dropped slightly in comparison with the
previous reporting period.

Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

Releases
--------

There were no releases made in this quarter. The last releases were:
* Apache Crunch 0.11.0, released Sept 10, 2014
* Apache Crunch 0.8.4, released Sept 13, 2014

21 Jan 2015 [Gabriel Reid / Sam]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving at a slightly slower pace than in past quarters.
Since the last board report there have been 15 new issues logged in Jira, with
8 of them being closed in that time. Similar to the previous couple of board
reports, the majority of recent work has been focused on minor improvements
and bug fixes.

A particularly interesting recent Jira ticket was the donation of a number of
Crunch utilities from Spotify (CRUNCH-484). Spotify also posted an interesting
blog post about how they currently use Crunch for analytics pipelines [1].

There are no board-level issues at this time.

Community
---------

Community activity has been similar, although slightly lower, in comparison
with recent quarters, with an average of several mails on the developer list
per day and an average of a message every two or three days on the user list.

Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

Releases
--------

There were no releases made in this quarter.


[1] https://labs.spotify.com/2014/11/27/crunch/

15 Oct 2014 [Gabriel Reid / Chris]

Apache Crunch is a Java library for writing, testing, and running MapReduce
and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving along at a steady pace in the past quarter, at a
similar velocity to the previous two quarters, although in the final weeks
before this report things have quieted down quite a bit. Since the last board
report there have been 32 new issues logged in Jira, with 27 of them being
closed in that time.

Similar to the previous board report, the majority of recent work has been
focused on minor improvements and bug fixes.

There was a minor hiccup in releasing version 0.11.0, when the released Maven
artifacts were correctly pushed to Nexus, but not synced to Maven Central. This
was noticed by a user (reported on the user mailing list), and turned out to
be an infrastructure issue resolved in INFRA-8333.

There are no board-level issues at this time.

Community
---------

Community activity has continued to be in line with recent quarters, with an
average of several mails on the developer list per day and an average of a
message every day or two on the user list.

Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

Releases
--------

There were two releases made in this quarter:
* 0.11.0 was released on September 2nd, 2014
* 0.8.4 was released on September 13th, 2014

16 Jul 2014 [Gabriel Reid / Ross]

Apache Crunch is a Java library for writing, testing, and running
MapReduce and Spark pipelines on Apache Hadoop.

Project Status
--------------

The project has been moving along at a steady pace in the past quarter, at a
similar velocity to the previous two quarters. Since the last board report
there have been 53 new issues logged in Jira, with 44 of them being closed
in that time. The majority of recent work has been focused on minor
improvements and bug fixes, and there have also been quite a few tickets
related to improvements in Scrunch (the Scala API for Crunch).

There are no board-level issues at this time.

Community
---------

Community activity has continued to be in line with recent quarters, with an
average of several mails on the developer list per day and an average of a
message every day or two on the user list, and first-time contributions from
new contributors every few weeks.

Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

Releases
--------

The last two releases (0.10.0 and 0.8.3) were both made on June 9th 2014.

16 Apr 2014

Change the Apache Crunch Project Chair

 WHEREAS, the Board of Directors heretofore appointed Josh Wills
 to the office of Vice President, Apache Crunch, and

 WHEREAS, the Board of Directors is in receipt of the resignation
 of Josh Wills from the office of Vice President, Apache Crunch,
 and

 WHEREAS, the Project Management Committee of the Apache Crunch
 project has chosen by vote to recommend Gabriel Reid as the successor
 to the post;

 NOW, THEREFORE, BE IT RESOLVED, that Josh Wills is relieved and
 discharged from the duties and responsibilities of the office
 of Vice President, Apache Crunch, and

 BE IT FURTHER RESOLVED, that Gabriel Reid be and hereby is
 appointed to the office of Vice President, Apache Crunch, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification, or
 until a successor is appointed.

 Special Order 7B, Change the Apache Crunch Project Chair, was
 approved by Unanimous Vote of the directors present.

16 Apr 2014 [Josh Wills / Roy]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

General:
The Crunch community has been steadily fixing issues for the last
quarter, with 48 issues filed and fixed since the last release in December
2013, which means it's just about time for a new release. Since our
last report we dramatically improved the quality and depth of the user
guide [1] and getting started information [2] for the project on our website,
and we have a proposal for the board to approve a new PMC chair for the
project. We have also added one new PMC member since our last report.

[1] http://crunch.apache.org/user-guide.html
[2] http://crunch.apache.org/getting-started.html

Releases:
Last releases were 0.9.0 and 0.8.2, both made on December 17th, 2013.

Community:
Micah Whitacre was added to the PMC on April 3rd, 2014.
Micah Whitacre was added as a committer on July 11th, 2013.

15 Jan 2014 [Josh Wills / Greg]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

General:
The Crunch community had a large number of releases in the last
quarter, primarily focused on updating the libraries to work
against the major releases of Apache Hadoop (2.2.0) and Apache HBase
(0.96) that came out in the past quarter. 42 issues were created and
40 issues were resolved over this period, including bug fixes,
new features, and support for a new Hadoop-based execution engine
that is currently in the incubator, Apache Spark (incubating).

Releases:
The 0.9.0 release was made on December 17th, 2013.
The 0.8.2 release was made on December 17th, 2013.
The 0.8.1 release was made on November 20th, 2013.
The 0.8.0 release was made on November 8th, 2013.

Community:
Chao Shi was added to the PMC on August 20th, 2013.
Micah Whitacre was added as a committer on July 11th, 2013.

16 Oct 2013 [Josh Wills / Shane]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

General:
The Crunch community continues to develop features and fix bugs
at a healthy pace: there are currently 28 JIRA issues that have been
resolved since the 0.7.0 release, with input from 10 different contributors.
Given our current rate of issue resolution, I believe that the community
will vote to create a new release within the next few weeks.

Releases:
The 0.7.0 release was made on July 25, 2013.
The 0.6.0 release was made on May 13th, 2013.

Community:
Chao Shi was added to the PMC on August 20th, 2013, our first new
PMC member since leaving the Incubator.
Micah Whitacre was added as a committer on July 11th, 2013.

17 Jul 2013 [Josh Wills / Greg]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

Issues:
There are no issues requiring board attention at this time.

Releases:
We made one release last quarter, version 0.6.0 on May 13th, 2013.
Work is currently underway on version 0.7.0.

Community:
The Crunch PMC voted to add Micah Whitacre as a committer on the
project and he has accepted. There have been no changes to the PMC
since becoming a TLP in February 2013.

Activity on the development list has been at a steady cadence for the last
quarter, with new issues and patches being submitted by a diverse set of
new and veteran contributors almost every day.

Eli Collins gave a talk about using the Crunch libraries with Apache Avro
to build applications on top of Apache Hadoop at QCon in June 2013. [1]

[1] http://s.apache.org/Kr4 (PDF)

15 May 2013 [Josh Wills / Jim]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

Issues:
There are no issues requiring board attention at this time.

Releases:
We are currently holding a vote for our 0.6.0 release, our first
release since leaving the incubator at the end of February. We have
received three +1 votes for the current release candidate from PMC members
and expect the vote to pass when voting closes in a couple of days.

Community & Development:
No new PMC members or committers have been added since our report last
month, when we added two new committers.

We had a tough month on the dev list, primarily due to a strange and
somewhat random Java compiler error that caused Crunch builds to
fail consistently in some environments but not others, which was
frustrating to debug and caused lots of Jenkins failures. We believe
that we have resolved these issues with the latest release candidate
and are looking forward to completing this release and getting back to
working on new features and bug fixes for the one after it.

17 Apr 2013 [Josh Wills / Ross]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

ISSUES
* There are no issues to raise with the board.

COMMUNITY
* We added two new committers to the Crunch project since our last board
 report. These are our first new committers since graduation.
* Activity on the dev mailing list is stable and healthy. The user mailing
 list saw some questions from new users and roughly the same number of
 threads, but the overall volume of messages fell by about half.
* We completed an overhaul of the Crunch website and added an About page.
* Two Crunch PMC members are working on third-party projects that build on
 Crunch. Cloudera ML [1] integrates Apache Hive, Apache Mahout, and Crunch
 to perform data preparation and model evaluation tasks on Apache Hadoop.
 Also, work began to integrate Crunch with ElasticSearch's Hadoop
 libraries. [2]

RELEASES
* No new releases since our February 2013 release just before graduation.
 We expect to perform our first TLP release within the next few weeks.

[1] http://github.com/cloudera/ml
[2] http://github.com/tzolov/elasticsearch-hadoop#crunch

20 Mar 2013 [Josh Wills / Brett]

Apache Crunch is a Java library for writing, testing, and running
MapReduce pipelines on Apache Hadoop.

MILESTONES
We completed our most recent release (0.5.0-incubating) on 2/19/13,
just before we left the Incubator.

ACTIVITY
* Most of the Incubator transfer procedures have been completed.
 We still have our old release directory at the Incubator, but
 we plan to update that upon our next release.
* The PMC is currently voting on a set of bylaws for the project
 modeled after the bylaws of the Apache ZooKeeper project with
 some small tweaks based on the bylaws of the Apache Pig project.
* Nine JIRAs have been resolved since our most recent release,
 primarily small bug fixes. One major feature was adding the
 ability to start and monitor a MapReduce pipeline
 asynchronously, which was contributed by a new developer on
 the project.

COMMUNITY
* 50 subscribers to the dev mailing list, 62 subscribers to the
 user mailing list.
* There have been no changes to the PMC or committer composition
 since our recent graduation, although the PMC is currently
 holding a vote on adding two new committers.
* The user mailing list has seen small but steady traffic in the
 form of questions and requests from Crunch users.

INFRASTRUCTURE
* The major TLP creation tasks are completed, but we have one
 outstanding issue from the move from the Incubator: our
 GitHub mirror hasn't been updated to reflect the new repo
 name. This is tracked in INFRA-5933.
* The website has been updated to reflect the project's new
 status as a TLP.

LEGAL
No known issues.

BRANDING
No known issues.

20 Feb 2013

Establish the Apache Crunch Project

 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the Foundation's
 purpose to establish a Project Management Committee charged with
 the creation and maintenance of open-source software, for
 distribution at no charge to the public, related to the
 development of Java libraries for writing, testing, and running
 MapReduce pipelines.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the "Apache Crunch Project", be
 and hereby is established pursuant to Bylaws of the Foundation;
 and be it further

 RESOLVED, that the Apache Crunch Project be and hereby is
 responsible for the creation and maintenance of software related
 to development of Java libraries for writing, testing, and running
 MapReduce pipelines; and be it further

 RESOLVED, that the office of "Vice President, Apache Crunch" be
 and hereby is created, the person holding such office to serve at
 the direction of the Board of Directors as the chair of the Apache
 Crunch Project, and to have primary responsibility for management
 of the projects within the scope of responsibility of the Apache
 Crunch Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby
 are appointed to serve as the initial members of the Apache Crunch
 Project:

   * Brock Noland <brock@apache.org>
   * Christian Tzolov <tzolov@apache.org>
   * Gabriel Reid <greid@apache.org>
   * Josh Wills <jwills@apache.org>
   * Kiyan Ahmadizadeh <kiyan@apache.org>
   * Matthias Friedrich <mafr@apache.org>
   * Rahul Sharma <rsharma@apache.org>
   * Robert Chu <robertchu@apache.org>
   * Tom White <tomwhite@apache.org>
   * Vinod Kumar Vavilapalli <vinodkv@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Josh Wills be
 appointed to the office of Vice President, Apache Crunch, to serve
 in accordance with and subject to the direction of the Board of
 Directors and the Bylaws of the Foundation until death,
 resignation, retirement, removal or disqualification, or until a
 successor is appointed; and be it further

 RESOLVED, that the initial Apache Crunch PMC be and hereby is
 tasked with the creation of a set of bylaws intended to encourage
 open development and increased participation in the Apache Crunch
 Project; and be it further

 RESOLVED, that the Apache Crunch Project be and hereby is tasked
 with the migration and rationalization of the Apache Incubator
 Crunch podling; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache
 Incubator Crunch podling encumbered upon the Apache Incubator
 Project are hereafter discharged.

 Special Order 7B, Establish the Apache Crunch Project, was
 approved by Unanimous Vote of the directors present.

20 Feb 2013

Crunch is a Java library for writing, testing, and running pipelines of MapReduce
jobs on Apache Hadoop.

Crunch has been incubating since 2012-05-26.

Three most important issues to address in the move towards graduation:

  * None


Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

  * Nothing that currently requires IPMC attention.


How has the community developed since the last report?

The Apache Crunch development team released version 0.4.0-incubating
in November, the second release at the Apache Incubator. We have worked
with the Apache Bigtop project and our release is now part of Apache
Bigtop 0.5.0. For our next release, we have discussed and agreed on some
large-scale API cleanup and implemented the necessary changes.

We have performed the podling name search - the name Apache Crunch has
been approved by the trademarks team. This was our last blocker for
graduation; we have already started a vote on a graduation resolution
within the community and expect to start the vote on incubator-general in
February.

Development activity around Christmas has been a bit lower than usual but
is now picking up again. There has been a significant increase in traffic
on crunch-user; it is great to see that more and more users show up, file
bug reports and contribute patches or test cases.


How has the project developed since the last report?

 - 53 issues were created on the Crunch JIRA in November
   to January, 39 issues have been resolved
 - crunch-dev has seen 615 emails in the reporting period,
   while 126 emails were posted to crunch-user


Signed-off-by:
Arun Murthy: [ ](crunch)
Patrick Hunt: [X](crunch)
Tom White: [ ](crunch)


Shepherd notes:

21 Nov 2012

Crunch is a Java library for writing, testing, and running pipelines
of MapReduce jobs on Apache Hadoop.

Crunch entered incubation on 2012-05-29.

The most important steps towards graduation:

 - Create another release or two
 - Perform the name search

Nothing that currently requires IPMC attention.

Community:

 The Crunch community has been very active and continues to
 grow. Two new committers have been voted in and one existing
 committer has joined the PPMC. As a result, Crunch now has 10
 committers from 7 different organizations.

 We created our first release in September and published a website
 using the Apache CMS a few days later. Our second
 release will follow in November.

Development:

 - 76 issues were created on the Crunch JIRA in August to October,
   70 of those were resolved.
 - crunch-dev has been active: 922 emails in the reporting period
 - Apache CMS and ReviewBoard for Crunch are up and running
 - All ICLAs are in place, including those for the new committers

Signed-off-by: tomwhite, jukka

15 Aug 2012

Crunch is a Java library for writing, testing, and running pipelines of
MapReduce jobs on Apache Hadoop.

Crunch entered incubation on 2012-05-29.

The most important steps towards graduation:

 - Infrastructure setup (CMS for the Crunch website)
 - Add new committers
 - Create a release

Nothing that currently requires IPMC attention.

Community:

 The Crunch developer community continues to grow. The project
 received code submissions from six new developers representing
 five distinct organizations in the month of July. One of the new
 developers made such substantial contributions to the design
 and testability of the Crunch code base that the PPMC voted
 to add him as a committer, increasing the number of distinct
 organizations on the committer list from four to five. We look
 forward to adding new committers from our pool of contributors, and
 we have also added documentation to the wiki to explain to new
 contributors how to get started with the project.

Development:

 - 29 issues were created on the Crunch JIRA in the month of July,
   23 of those were resolved.
 - All ICLAs are in place, including the one for the committer the project
   just added.
 - Cloudera submitted the software grant documents to the Apache
   Secretary on 2012-07-11, and the Secretary registered the grant
   the same day.
 - crunch-dev has been active: 308 emails on the list in July.

Signed-off-by: jukka

25 Jul 2012

Crunch is a Java library for writing, testing, and running pipelines of
MapReduce jobs on Apache Hadoop.

Crunch entered incubation on May 27, 2012.

The most important steps towards graduation:

 - Infrastructure setup (JIRA, Confluence, etc.)
 - CCLA licensing of the existing Crunch code
 - Adding new contributors
 - Creating a release

Nothing that currently requires IPMC attention.

Community:

 The developer mailing list has been very active with bug fixes, new
 features, and discussions of infrastructure setup and project policies,
 both from the existing committers and other developers with an interest in
 the project. The first patch from a non-committer is currently being
 prepared for submission: the code is written, but we were blocking on
 getting JIRA set up so that the copyright on the code could cleanly be
 assigned to the ASF. The JIRA issues were resolved earlier this week.

 All ICLAs are in place. Cloudera has gathered all of the copyright
 assignments for the existing Crunch code from non-Cloudera developers
 and is preparing the CCLA to assign the copyrights on the existing Crunch
 code to the ASF.

Development:

 The 15 commits on the project this month were primarily for documentation
 and bug fixes, although we are evaluating two larger patches that bring
 additional functionality to the library: 1) adding map-side joins and
 2) supporting interactive pipeline creation and execution via the Scala REPL.

Signed off by mentor: phunt, tomwhite

20 Jun 2012

Crunch is a Java library for writing, testing, and running pipelines of
MapReduce jobs on Apache Hadoop.

Crunch entered incubation on May 27, 2012.

Community

 - Mailing lists have been created.
 - New committer accounts are being created, some pending ICLAs.
 - The Incubator status page has been created.

Issues Before Graduation

 - Create Confluence instance.
 - Create JIRA issue tracker (CRUNCH)
 - Migrate code to Apache Git repository from Cloudera's GitHub repository.
 - Create Crunch website.
 - Make an incubating release.
 - Grow the size and diversity of the community.

Licensing and other issues

 Work to obtain CCLA from Cloudera regarding license grant for existing
 Crunch GitHub repository is underway.

Signed off by mentor: phunt