This was extracted (@ 2017-11-15 21:10) from a list of minutes which have been approved by the Board.
Please note: The Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

Spark

15 Nov 2017 [Matei Alexandru Zaharia / Chris]

Report was filed, but display is awaiting the approval of the Board minutes.

16 Aug 2017 [Matei Zaharia / Rich]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We released Spark 2.2.0 on July 11th, with 1100 patches since the last
 version. Some of the major features released included a cost-based
 optimizer for Spark SQL / DataFrames, PyPI publishing, and the first
 production version of the new high-level Structured Streaming API (losing
 the experimental tag because the API has been stabilized). More details
 are available at spark.apache.org/releases/spark-release-2-2-0.html.

- The Spark Summit conference ran in June with around 3000 attendees.

- Work is under way for Spark 2.3.0, with the current target to close the
 new feature window and cut a release branch in November 2017.

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

- July 11, 2017: Spark 2.2.0
- May 02, 2017: Spark 2.1.1
- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3

Committers and PMC:

- The last committers were added on July 27th, 2017
 (Hyukjin Kwon and Sameer Agarwal).
- The last PMC members were added on June 16th, 2017
 (six new PMC members from the existing committers).

17 May 2017 [Matei Zaharia / Ted]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community released Apache Spark 2.1.1 on May 2nd with bug fixes for
 the 2.1 branch, and is currently voting on release candidates for 2.2.0.
 This will be a major release with various new features in streaming, SQL,
 machine learning and other areas of the project.

- We have been making significant progress to publish Apache Spark in the
 standard Python and R package repositories (PyPI and CRAN) to make it
 easier to install for Python and R users.

- We documented the "Spark improvement proposal" process described
 earlier for proposing large new features on our website. It just defines
 a short format for writing a proposal and a JIRA tag to place on such
 documents so that they can all be viewed in one place.

- The Spark Summit East conference ran Feb 7th to 9th in Boston.

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

- May 02, 2017: Spark 2.1.1
- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3
- Oct 03, 2016: Spark 2.0.1
- July 26, 2016: Spark 2.0.0

Committers and PMC:

- The last committer was added on Feb 10th, 2017
(Takuya Ueshin).
- The last PMC members were added on Feb 15th, 2016
(Joseph Bradley, Sean Owen and Yin Huai).

27 Feb 2017 [Matei Zaharia / Isabel]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community released Apache Spark 2.1.0 on Dec 28 with a variety of
 new features for the 2.x branch, most notably improvements to streaming
 (http://spark.apache.org/releases/spark-release-2-1-0.html). We also
 released Spark 2.0.2 on Nov 14 with bug fixes for the 2.0.x branch.

- The Spark Summit East conference is running Feb 7th to 9th in Boston.

- We've continued discussions on a "Spark Improvement Proposal" format
 for documenting large proposed additions over the dev list and are
 converging towards a final version that we want to post on our website.

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3
- Oct 03, 2016: Spark 2.0.1
- July 26, 2016: Spark 2.0.0

Committers and PMC:

- The last committers were added on Jan 24th, 2017
 (Holden Karau and Burak Yavuz).
- The last PMC members were added on Feb 15th, 2016
 (Joseph Bradley, Sean Owen and Yin Huai).

@Shane: follow up on brand action item

16 Nov 2016 [Matei Zaharia / Isabel]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community released Apache Spark 2.0.1 on October 3rd, 2016 as the
 first patch release for the 2.x branch. We also released Spark 1.6.3 on
 November 7th to continue patching the 1.x branch, and started voting on
 release candidates for Spark 2.0.2 with more patches to 2.x.

- The Spark Summit Europe conference ran in Brussels on Oct 25-27 with
 around 1000 attendees, including presentations on new use cases at
 Microsoft and Facebook.

- There've been several discussions on the dev list about making the
 development process easier to follow and giving feedback to contributors
 faster. One concrete thing we'd like to implement is a process to post
 "improvement proposals" scoping a new feature before detailed design
 begins, so that developers can solicit feedback from users earlier, and
 users can easily see the project's high-level roadmap in one place. The
 most recent writeup on this is at https://s.apache.org/ndAX and seems to
 be welcomed by contributors who've used a similar process in other ASF
 projects. Other things that contributors are working on are creating a
 template for design documents and cleaning up JIRA.

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

Nov 07, 2016: Spark 1.6.3
Oct 03, 2016: Spark 2.0.1
July 26, 2016: Spark 2.0.0
June 25, 2016: Spark 1.6.2
May 26, 2016: Spark 2.0.0-preview

Committers and PMC:

The last committer was added on Sept 29th, 2016 (Xiao Li).

The last PMC members were added on Feb 15th, 2016
(Joseph Bradley, Sean Owen and Yin Huai).

@Shane: Follow up with PMC and legal regarding potential trademark issues with a vendor

17 Aug 2016 [Matei Zaharia / Chris]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community released Apache Spark 2.0 on July 26, 2016. This was a big
 release after nearly 6 months of effort that puts in place a strong foundation
 for the 2.x line and multiple new components while remaining highly
 compatible with 1.x. Full release notes are available at
 http://spark.apache.org/releases/spark-release-2-0-0.html.

Trademarks:

- We posted a trademarks summary page on our website after discussions
 with trademarks@ to let users easily find out about the trademark policy:
 https://spark.apache.org/trademarks.html

- We are continuing engagement with the organizations discussed earlier.

Latest releases:

- July 26, 2016: Spark 2.0.0
- June 25, 2016: Spark 1.6.2
- May 26, 2016: Spark 2.0.0-preview
- Mar 9, 2016: Spark 1.6.1
- Jan 4, 2016: Spark 1.6.0

Committers and PMC:

The last committer was added on August 6th, 2016 (Felix Cheung).

The last PMC members were added Feb 15th, 2016
(Joseph Bradley, Sean Owen and Yin Huai)

20 Jul 2016 [Matei Zaharia / Mark]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community is continuing to make progress towards its 2.0 release,
 with two release candidates having been posted. Apache Spark 2.0 is a major
 release that includes a new SQL-based high-level streaming API, machine
 learning model persistence, and cleanup of Spark's dependencies and internal
 APIs. The full list of changes in Apache Spark 2.0 is available at
 http://s.apache.org/spark-2.0-features.

- We released Spark 1.6.2 on June 25th, with bug fixes for the 1.6
 branch of the project (https://s.apache.org/spark-1.6.2).

Trademarks:

- The PMC is engaging with several third parties that are using Spark
 in product names, branding, etc.

- The PMC has been working on a page about trademark guidelines to include
 on the Spark website (https://s.apache.org/PaXo). It would be great to get
 feedback on this (several board members said it was a good idea to create
 such a page after we suggested it in our last report).

- To make the project's association with the ASF clearer in news articles
 and corporate materials, we have updated its logo to include "Apache":
 https://s.apache.org/Jf7J. This change is live on the website, JIRA, etc.

Latest releases:

June 25, 2016: Spark 1.6.2
May 26, 2016: Spark 2.0.0-preview
Mar 9, 2016: Spark 1.6.1
Jan 4, 2016: Spark 1.6.0
Nov 09, 2015: Spark 1.5.2

Committers and PMC:

The last committer was added on May 23, 2016 (Yanbo Liang).

The last PMC members were added Feb 15, 2016
(Joseph Bradley, Sean Owen and Yin Huai)

Working with IBM to resolve the trademark issues is critical.

Shane: We need to think through the question of whether a simple "foo.x" is ever ok where foo is an Apache project name and x is any top level domain.

15 Jun 2016 [Matei Zaharia / Shane]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community is in the QA phase for Spark 2.0, our second major
 version since joining Apache. There are a large number of additions in
 2.0, including a higher-level streaming API, improved runtime code
 generation for SQL, and improved export for machine learning models.
 We are also using this release to clean up some experimental APIs,
 remove some dependencies, and add support for Scala 2.12. The full list
 of changes is available at http://s.apache.org/spark-2.0-features.
 We also released a 2.0.0-preview package to let users broadly
 participate in testing the new APIs.

- We released Spark 1.6.1 in March, with bug fixes for the 1.6 branch.

- For Apache Spark 2.0, the community decided to move some of the
 less used data source connectors for Spark Streaming to a separate
 project, Apache Bahir (http://bahir.apache.org). We proposed a new
 project in order to maintain ASF governance of these components.

- The project removed the role of "maintainers" for reviewing changes to
 specific components (originally added 1.5 years ago) in response to
 concerns from some ASF members that it makes the project appear less
 welcoming, as well as the conclusion that it did not have a noticeable
 impact in practice (https://s.apache.org/DUTB, https://s.apache.org/AgCt).

Trademarks:

In the past few weeks, there have been several discussions asking for more
attention to trademark use from the PMC. Some of the main issues were:
- A vendor offering a "technical preview" package of Apache Spark 2.0
 before there was any official PMC release.
- A vendor claiming to offer "early access" to the project's roadmap.
- Various corporate and open source products whose name includes "Spark".
- Corporate pages where the most prominent mention says "Spark"
 instead of "Apache Spark".

The PMC is addressing these issues in several ways:

- Reaching out to the organizations involved.

- To make the project's association with the ASF clearer in news articles
 and corporate materials, we are working to update the logo to include
 "Apache": https://s.apache.org/Jf7J. We also added a FAQ entry about
 using the logo that links to the ASF trademarks page.

- Continuing to review news articles, product announcements, etc.

- Starting with this board report, we will have a section on
 trademarks in our reports to track brand activity.

- Question for the board: Would it be helpful to put a summary of the
 trademark policy on spark.apache.org? It would be nice to have this
 more visible (e.g. in the site's navigation menu), but either way is
 fine. We can draft a version and send it to trademarks@.

Events:

- The Spark Summit community conference in San Francisco ran June 6-8.
 There were close to 100 talks from at least 50 organizations.

Latest releases:

- May 26, 2016: Spark 2.0.0-preview
- Mar 9, 2016: Spark 1.6.1
- Jan 4, 2016: Spark 1.6.0
- Nov 09, 2015: Spark 1.5.2
- Oct 02, 2015: Spark 1.5.1

Committers and PMC:

- The last committer was added on May 23, 2016 (Yanbo Liang).

- The last PMC members were added Feb 15, 2016
 (Joseph Bradley, Sean Owen and Yin Huai)

18 May 2016 [Matei Zaharia / Chris]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- The community is entering the QA phase for Spark 2.0, our second major
 version since joining Apache. There are a large number of additions in
 2.0, including a higher-level streaming API, improved runtime code
 generation for SQL, and improved export for machine learning models.
 We are also using this release to clean up some experimental APIs,
 remove some dependencies, and add support for Scala 2.12. The full list
 of changes is available at http://s.apache.org/spark-2.0-features.

- We released Spark 1.6.1 in March, with bug fixes for the 1.6 branch.
 In general, we have seen fast adoption of Spark 1.6, with many
 organizations adding support right away.

- For Apache Spark 2.0, the community decided to move some of the less
  used data source connectors for Spark Streaming to a separate ASF
  project, which has been proposed as Apache Bahir. We proposed a new
  project in order to maintain ASF governance of these components.

- In the past few weeks, there have been several discussions asking
  for more attention to trademark use from this PMC. Some of the main
  issues were:
  - A vendor offering a "technical preview" package of Apache Spark 2.0
    before there was any official PMC release.
  - A vendor claiming to offer "early access" to the project's roadmap.
  - Multiple vendors offering products where one component is labeled
    "Spark", without this component being an ASF release.
  - Corporate pages where the most prominent mention says "Spark"
    instead of "Apache Spark".
  In response to these issues, we will be reviewing all corporate uses
  of "Spark" on the trademarks list in the coming weeks and working to
  clarify the trademark rules on the project website as well as within
  the PMC and committer community.

Latest releases:

Mar 9, 2016: Spark 1.6.1
Jan 4, 2016: Spark 1.6.0
Nov 09, 2015: Spark 1.5.2
Oct 02, 2015: Spark 1.5.1
Sept 09, 2015: Spark 1.5.0

Committers and PMC:

The last committers were added on Feb 8, 2016
(Wenchen Fan) and Feb 3, 2016 (Herman von Hovell).

The last PMC members were added Feb 15, 2016
(Joseph Bradley, Sean Owen and Yin Huai)

Mailing list stats:

4509 subscribers to user list (up 249 in the last 3 months)
2570 subscribers to dev list (up 173 in the last 3 months)

Report was not approved; a report with more details is requested for next month.

Shane wants Spark to take ownership of the trademark issues; and for individuals on the PMC who work for companies in this space to ensure that their companies are exemplars. Trademarks won't engage until there is some evidence that there is a reasonable attempt made by the PMC.

Jim thanked Matei for attending, and outlined possible future actions the board might take if these concerns are not addressed.

17 Feb 2016 [Matei Zaharia / Brett]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We posted our 1.6.0 release in January, with contributions from 248
developers. This release included a new typed API for working with
DataFrames, faster state management in Spark Streaming, support
for persisting and loading ML pipelines, various optimizations, and a
variety of new advanced analytics APIs. Full release notes are at
http://spark.apache.org/releases/spark-release-1-6-0.html.

- We are currently collecting changes for a Spark 1.6.1 maintenance
release, which will likely happen within several weeks.

- The community also agreed to make our next release 2.0, which
will be a chance to fix small dependency and API problems in
addition to releasing new features. Partial list of planned changes:
http://s.apache.org/spark-2.0-features.

Latest releases:

Jan 4, 2016: Spark 1.6.0
Nov 09, 2015: Spark 1.5.2
Oct 02, 2015: Spark 1.5.1
Sept 09, 2015: Spark 1.5.0

Committers and PMC:

The last committers were added on Feb 8, 2016
(Wenchen Fan) and Feb 3, 2016 (Herman von Hovell).

We just voted in three PMC members on Feb 10, 2016
(Joseph Bradley, Sean Owen, Yin Huai).

Mailing list stats:

4249 subscribers to user list (up 286 in the last 3 months)
2380 subscribers to dev list (up 196 in the last 3 months)

18 Nov 2015 [Matei Zaharia / Shane]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We posted our 1.5.0 release in September, with contributions from 230
 developers. This release included many new APIs throughout Spark,
 more R support, UI improvements, and the start of a new low-level
 execution layer that acts directly on binary data (Tungsten). It had
 the most contributors of any release so far. Full release notes are at
 http://spark.apache.org/releases/spark-release-1-5-0.html.

- We made a Spark 1.5.1 maintenance release in October and a Spark
 1.5.2 release this week with bug fixes to the 1.5 line.

- The community is currently QAing Spark 1.6.0, which is expected to
 come out in about a month based on the QA process. Some notable
 features include a type-safe API on the Tungsten execution layer
 and better APIs for managing state in Spark Streaming.

Latest releases:

Nov 09, 2015: Spark 1.5.2
Oct 02, 2015: Spark 1.5.1
Sept 09, 2015: Spark 1.5.0
July 15, 2015: Spark 1.4.1

Committers and PMC:

The last committers were added on July 20th, 2015
(Marcelo Vanzin) and June 8th, 2015 (DB Tsai).

The last PMC members were added August 12th, 2014
(Joseph Gonzalez and Andrew Or).

Mailing list stats:

3946 subscribers to user list (up 419 in the last 3 months)
2181 subscribers to dev list (up 211 in the last 3 months)

19 Aug 2015 [Matei Zaharia / Brett]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We posted our 1.4.0 release in June, with contributions from 210
 developers. The biggest addition was support for the R programming
 language, along with many improvements in debugging tools, built-in
 libraries, SQL language coverage, and machine learning functions
 (http://spark.apache.org/releases/spark-release-1-4-0.html).

- We posted a Spark 1.4.1 maintenance release in July.

- We've started the QA process for Spark 1.5.0, which should be
 released in around one month. The biggest features here are large
 performance improvements for Spark SQL / DataFrames, as well
 as further enriched support for R (e.g. exposing Spark's machine
 learning libraries in R).

Latest releases:

July 15, 2015: Spark 1.4.1
June 11, 2015: Spark 1.4.0
April 17, 2015: Spark 1.2.2 and 1.3.1
March 13, 2015: Spark 1.3.0

Committers and PMC:

The last committers were added on July 20th, 2015
(Marcelo Vanzin) and June 8th, 2015 (DB Tsai).

The last PMC members were added August 12th, 2014
(Joseph Gonzalez and Andrew Or).

Mailing list stats:

3501 subscribers to user list (up 493 in the last 3 months)
1947 subscribers to dev list (up 255 in the last 3 months)

20 May 2015 [Matei Zaharia / Greg]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We posted our 1.3.0 release in March, with contributions from 174
developers. Major features included a DataFrame API for working with
structured data, a pluggable data source API, streaming input source
improvements, and many new machine learning algorithms
(http://spark.apache.org/releases/spark-release-1-3-0.html).

- We posted the Spark 1.2.2 and 1.3.1 maintenance releases in April.

- We cut a release branch and started QA for Spark 1.4.0, which should
be released in June. The biggest feature there is R language support,
along with SQL window functions, support for new Hive versions, and
quite a few improvements to debugging and monitoring tools.

Latest releases:

April 17, 2015: Spark 1.2.2 and 1.3.1
March 13, 2015: Spark 1.3.0
February 9, 2015: Spark 1.2.1
December 18, 2014: Spark 1.2.0

Committers and PMC:

We voted to add four new committers on May 2nd, 2015
(Sandy Ryza, Yin Huai, Kousuke Saruta, Davies Liu)

The last PMC members were added August 12th, 2014
(Joseph Gonzalez and Andrew Or).

Mailing list traffic:

2979 subscribers to user list, 7469 emails in past 3 months
1692 subscribers to dev list, 1622 emails in past 3 months

18 Feb 2015 [Matei Zaharia / Bertrand]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- We posted our 1.2.0 release in December, with contributions from 172
developers. Major features included stable APIs for Spark's graph
processing module (GraphX), a high-level pipeline API for machine
learning, an external data source API, better H/A for streaming, and
networking performance optimizations.

- We posted the Spark 1.2.1 maintenance release on February 9th, with
contributions from 69 developers.

- We cut a release branch and started QA for Spark 1.3.0, which should
be released sometime in March. Some features coming there include a
data frame API similar to R and Python, write support for external data
sources, and quite a few new machine learning algorithms.

- We had a discussion about adding a committer role to the project that
is separate from PMC (before, Spark had PMC = C) to bring in people
sooner, and decided to do that from this point on.

Releases:

Our last few releases were:

February 9, 2015: Spark 1.2.1
December 18, 2014: Spark 1.2.0
November 26, 2014: Spark 1.1.1
September 11, 2014: Spark 1.1.0

Committers and PMC:

The last committers were added February 2nd, 2015
(Joseph Bradley, Cheng Lian and Sean Owen)

The last PMC members were added August 12th, 2014
(Joseph Gonzalez and Andrew Or).

19 Nov 2014 [Matei Zaharia / Greg]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

This has been an eventful three months for Spark. Some major happenings are:

- We posted our 1.1.0 release in September, with contributions from 171
 developers (our largest number yet). Major features were performance and
 scalability optimizations, JSON import and schema inference in Spark SQL,
 feature extraction and statistics libraries, and a JDBC server.

- We recently cut a release branch and started QA for Spark 1.2.0, which
 is targeted for release in December.

- Apache Spark won this year's large-scale sort benchmark
 (http://sortbenchmark.org/), sorting 100 TB of data 3x faster than the
 previous record. It tied with a MapReduce-like system optimized for sorting.

- The community voted to implement a maintainer model for reviewing some
 modules, where changes in architecture and API should be reviewed by a
 maintainer before a merge (http://s.apache.org/Dqz). There was concern
 from some external commenters (Greg Stein, Arun Murthy, Vinod Vavilapalli)
 that this reduces the power of each PMC member (requiring a review from a
 specific set of people); we are looking to test how this works and possibly
 tweak the model.

Releases:

Our last few releases were:

September 11, 2014: Spark 1.1.0
August 5, 2014: Spark 1.0.2
July 23, 2014: Spark 0.9.2
July 11, 2014: Spark 1.0.1

Committers and PMC:

The last committers and PMC members were added August 12, 2014
(Joseph Gonzalez and Andrew Or).

20 Aug 2014 [Matei Zaharia / Sam]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

Spark made its 1.0.0 release on May 30th, bringing API stability for the 1.X
line and a variety of new features. The community is now QAing the 1.1.0
branch for release later this month. (We follow a regular 3-month schedule
for releases.) The community held a user conference, Spark Summit, in July,
sponsored by 25 companies. We continue to see growth in the number of users
and contributors, with over 120 people contributing to 1.1.0.

Some of the big features in 1.1 include JSON loading in Spark SQL, a new
statistics library, streaming machine learning algorithms, improvements to
the Python API, and many stability and performance improvements.

Releases:

Our last few releases were:

August 5, 2014: Spark 1.0.2
July 23, 2014: Spark 0.9.2
July 11, 2014: Spark 1.0.1
May 30, 2014: Spark 1.0.0

Committers and PMC:

We closed votes to add two new committers and PMC members on August 7th.
Before that, we added two committers and PMC members in May 2014.

21 May 2014 [Matei Zaharia / Sam]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

The project is closing out the work for its 1.0.0 release, which will be a
major milestone introducing both new functionality and API compatibility
guarantees across the 1.X series. We’ve had one release candidate posted and
are working on the next after a period of heavy QA. The project continues to
see fast community growth — over 100 people submitted patches for 1.0.

Some of the major features in 1.0 include:
- A new Spark SQL component for accessing structured data within Spark
 programs
- Java 8 lambda syntax support to make Spark programming in Java easier
- Sparse data support, model evaluation, matrix algorithms and decision trees
 in MLlib
- Long-lived monitoring dashboard
- Common job submission script for all cluster managers
- Revamped docs including new detailed docs for all the ML algorithms
- Full integration with Hadoop YARN security model
- API stability across the entire 1.X line

Releases:

Our last few releases were:

Apr 9, 2014: Spark 0.9.1
Feb 2, 2014: Spark 0.9.0-incubating
Dec 19, 2013: Spark 0.8.1-incubating
Sept 25, 2013: Spark 0.8.0-incubating

Committers and PMC:

We just opened votes for two new committers and PMC members on May 12th.
The last committers and (podling) PMC members were added on Dec 22, 2013

16 Apr 2014 [Matei Zaharia / Chris]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:
---------------

The project recently became a TLP and continues to grow in terms of community
size. We finished switching our infrastructure to spark.apache.org, including
recently importing our JIRA instance. We completed the vote on a 0.9.1 minor
release last week (it will be posted on April 9th), and we reached the
feature freeze and QA point for our 1.0 release, which is coming in a few
weeks. Apart from the new features coming in 1.0, a major update in the
community has been a change towards a Semantic Versioning-like policy, where
maintenance releases are clearly marked and API compatibility is preserved
across all minor releases (i.e. all 1.x.y will be compatible). This has been
put in action for both 0.9.x and 1.x.

Releases:
---------

Our last few releases were:

Apr 9, 2014: Spark 0.9.1
Feb 2, 2014: Spark 0.9.0-incubating
Dec 19, 2013: Spark 0.8.1-incubating
Sept 25, 2013: Spark 0.8.0-incubating

Committers and PMC:
-------------------

The last committers and (podling) PMC members were added on Dec 22, 2013.

19 Mar 2014 [Matei Zaharia / Bertrand]

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala and Python as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

The project recently became a TLP and continues to grow in terms of community
size. We switched all our infrastructure out of the incubator and to
spark.apache.org domains / repos (though the old site still needs a redirect).
We have a new minor release being finalized for later this month, and a Spark
1.0 release targeting end of April. Recent activity includes new machine
learning algorithms, updating the Spark Java API to work with Java 8 lambda
syntax, Python API extensions, and improved support for Hadoop YARN.

Releases:

Our last few releases were:

Feb 2, 2014: Spark 0.9.0-incubating
Dec 19, 2013: Spark 0.8.1-incubating
Sept 25, 2013: Spark 0.8.0-incubating

Committers and PMC:

The last committers and (podling) PMC members were added on Dec 22, 2013.

19 Feb 2014

Establish the Apache Spark Project

 WHEREAS, the Board of Directors deems it to be in the best interests
 of the Foundation and consistent with the Foundation's purpose to
 establish a Project Management Committee charged with the creation
 and maintenance of open-source software, for distribution at no
 charge to the public, related to fast and flexible large-scale data
 analysis on clusters.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
 (PMC), to be known as the "Apache Spark Project", be and hereby is
 established pursuant to Bylaws of the Foundation; and be it further

 RESOLVED, that the Apache Spark Project be and hereby is responsible
 for the creation and maintenance of software related to fast and
 flexible large-scale data analysis on clusters; and be it further

 RESOLVED, that the office of "Vice President, Apache Spark" be and
 hereby is created, the person holding such office to serve at the
 direction of the Board of Directors as the chair of the Apache Spark
 Project, and to have primary responsibility for management of the
 projects within the scope of responsibility of the Apache Spark
 Project; and be it further

 RESOLVED, that the persons listed immediately below be and hereby are
 appointed to serve as the initial members of the Apache Spark Project:

 * Mosharaf Chowdhury <mosharaf@apache.org>
 * Jason Dai <jasondai@apache.org>
 * Tathagata Das <tdas@apache.org>
 * Ankur Dave <ankurdave@apache.org>
 * Aaron Davidson <adav@apache.org>
 * Thomas Dudziak <tomdz@apache.org>
 * Robert Evans <bobby@apache.org>
 * Thomas Graves <tgraves@apache.org>
 * Andy Konwinski <andrew@apache.org>
 * Stephen Haberman <stephenh@apache.org>
 * Mark Hamstra <markhamstra@apache.org>
 * Shane Huang <shane_huang@apache.org>
 * Ryan LeCompte <ryanlecompte@apache.org>
 * Haoyuan Li <haoyuan@apache.org>
 * Sean McNamara <smcnamara@apache.org>
 * Mridul Muralidharan <mridulm80@apache.org>
 * Kay Ousterhout <kayousterhout@apache.org>
 * Nick Pentreath <mlnick@apache.org>
 * Imran Rashid <irashid@apache.org>
 * Charles Reiss <woggle@apache.org>
 * Josh Rosen <joshrosen@apache.org>
 * Prashant Sharma <prashant@apache.org>
 * Ram Sriharsha <harsha@apache.org>
 * Shivaram Venkataraman <shivaram@apache.org>
 * Patrick Wendell <pwendell@apache.org>
 * Andrew Xia <xiajunluan@apache.org>
 * Reynold Xin <rxin@apache.org>
 * Matei Zaharia <matei@apache.org>

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be
 appointed to the office of Vice President, Apache Spark, to serve
 in accordance with and subject to the direction of the Board of
 Directors and the Bylaws of the Foundation until death, resignation,
 retirement, removal or disqualification, or until a successor is
 appointed; and be it further

 RESOLVED, that the Apache Spark Project be and hereby is tasked
 with the migration and rationalization of the Apache Incubator Spark
 podling; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache Incubator
 Spark podling encumbered upon the Apache Incubator Project are
 hereafter discharged.

 Special Order 7C, Establish the Apache Spark Project, was
 approved by Unanimous Vote of the directors present.

15 Jan 2014

Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports
low-latency execution in several forms.

Spark has been incubating since 2013-06-19.

Three most important issues to address in the move towards graduation:

 1. Pretty much the only issue remaining is importing our old JIRA
    into Apache (https://issues.apache.org/jira/browse/INFRA-6419).
    Unfortunately, although we've been trying to do this since June,
    we haven't had much luck with it, as the INFRA people who tried
    to help out have been busy and software version numbers have
    often been incompatible (we have a hosted JIRA instance from
    Atlassian that they regularly update). We believe that there are
    some export dumps on that issue that are compatible with the ASF's
    current JIRA version, but if we can't get this resolved in the
    next 2-3 weeks, we may simply forgo importing our old issues.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 It would be really great to get a contact who can sit down with us
 and do the JIRA import. We're not sure who from INFRA leads these
 tasks.

How has the community developed since the last report?

 We made a Spark 0.8.1 release in December, and are working on a new
 major release (0.9) this month. We added two new committers, Aaron
 Davidson and Kay Ousterhout.

How has the project developed since the last report?

 We made the Spark 0.8.1 release mentioned above, with a number of
 new features detailed at
 http://spark.incubator.apache.org/releases/spark-release-0-8-1.html.
 We also have some exciting features coming up in Spark 0.9, such as
 support for Scala 2.10, parallel machine learning libraries in
 Python, and improvements to Spark Streaming.

Date of last release:

 2013-12-19

When were the last committers or PMC members elected?

 2013-12-30

Signed-off-by:

 [ ](spark) Chris Mattmann
 [ ](spark) Paul Ramirez
 [ ](spark) Andrew Hart
 [ ](spark) Thomas Dudziak
 [X](spark) Suresh Marru
 [X](spark) Henry Saputra
 [X](spark) Roman Shaposhnik

Shepherd/Mentor notes:

 Alan Cabrera (acabrera):

   Seems like a nice active project.  IMO, there's no need to wait for the
   JIRA import to graduate. Seems like they can graduate now.

16 Oct 2013

Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports
low-latency execution in several forms.

Spark has been incubating since 2013-06-19.

Three most important issues to address in the move towards graduation:

 1. Move JIRA over to Apache (we still haven't had success with INFRA
    on this: https://issues.apache.org/jira/browse/INFRA-6419)
 2. Add more committers under Apache process
 3. Make further Apache releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 We still need some help importing our JIRA -- see INFRA-6419. For some
 reason we've had a lot of trouble with this. It should be easier now
 because Apache's JIRA was updated and now matches our version.

How has the community developed since the last report?

 We made the Spark 0.8.0 release, which was the biggest so far, with 67
 developers from 24 organizations contributing. The release shows how far
 our community has grown -- our 0.6 release last October had only 17
 contributors, and our 0.7 release in February had 31. Most of the
 contributors are now external to the original UC Berkeley team.

How has the project developed since the last report?

 We made the Spark 0.8.0 release mentioned above, which so far seems to
 be doing well. It brings a number of deployability features, improved
 Python support, and a new standard library for machine learning; see
 http://spark.incubator.apache.org/releases/spark-release-0-8-0.html
 for what's new in the release.

Date of last release:

 2013-09-25

When were the last committers or PMC members elected?

 June 2013

Signed-off-by:

 [X](spark) Chris Mattmann
 [ ](spark) Paul Ramirez
 [ ](spark) Andrew Hart
 [ ](spark) Thomas Dudziak
 [ ](spark) Suresh Marru
 [X](spark) Henry Saputra
 [X](spark) Roman Shaposhnik

Shepherd notes:

 Dave Fisher (wave):

   Very active community on a fast track. Good report. Get your JIRA over
   and you are getting close.  (Oct. 7)

 Marvin Humphrey (marvin):

   Report not filed in time for shepherd review.

18 Sep 2013

Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports low-latency
execution in several forms.

Spark has been incubating since 2013-06-19.

Three most important issues to address in the move towards graduation:

 1. Make a first Apache release (we're in the final stages of this)
 2. Move JIRA over to Apache (https://issues.apache.org/jira/browse/INFRA-6419)
 3. Move development to Apache repo (in progress)

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 We still need some help importing our JIRA, though Michael Joyce and INFRA
 have looked into it (see <http://s.apache.org/fi>).

How has the community developed since the last report?

 We're continuing to get a lot of great contributions to Spark. UC Berkeley
 also recently hosted a two-day training on Spark and related technologies
 (http://ampcamp.berkeley.edu/3/) that was highly attended -- we sold out at
 over 200 on-site attendees, and had 1000+ people watch online. User meetups
 included a well-attended meetup on Shark (Hive on Spark) contributions at
 Yahoo!.

How has the project developed since the last report?

 We've made a lot of progress towards a first Apache release of Spark,
 including changing the package name to org.apache.spark, documenting the
 third-party licenses as required in LICENSE / NOTICE, and updating the
 documentation to reflect the transition. This month we've also moved our
 website to an apache.org domain (http://spark.incubator.apache.org) and
 updated the branding there. Finally, on the code side, we have continued to
 make bug fixes and improvements for the 0.8 release. Some recently merged
 improvements include simplified packaging and Python API support for
 Windows.

Date of last release:

 No Apache releases yet

When were the last committers or PMC members elected?

 June 2013

Signed-off-by:

 [x](spark) Chris Mattmann
 [ ](spark) Paul Ramirez
 [x](spark) Andrew Hart
 [ ](spark) Thomas Dudziak
 [x](spark) Suresh Marru
 [x](spark) Henry Saputra
 [x](spark) Roman Shaposhnik

Shepherd notes:

21 Aug 2013

Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports low-latency
execution in several forms.

Spark has been incubating since 2013-06-19.

Three most important issues to address in the move towards graduation:

 1. Finish bringing up Apache infrastructure (the only system missing
    is JIRA, but we also still need to move our website to Apache)
 2. Switch development to work directly against Apache repo
 3. Make a Spark 0.8 release through the Apache process

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 Nothing major. We've gotten a lot of help setting up infrastructure and the
 last piece missing is importing issues from our old JIRA, which we're
 working with INFRA on (https://issues.apache.org/jira/browse/INFRA-6419).

How has the community developed since the last report?

 We've continued to get and accept a number of external contributions,
 including metrics infrastructure, improved web UI, several optimizations and
 bug fixes.  We held a meetup on machine learning on Spark in San Francisco
 that got around 200 attendees. Finally, we've set up Apache mailing lists
 and warned users of the migration, which will complete at the beginning of
 September.

How has the project developed since the last report?

 We are finishing some bug fixes and merges to do a first Apache release of
 Spark later this month. During this release we'll go through the process of
 checking that the right license headers are in place, that the NOTICE file
 is present, etc., and we'll complete a website on Apache.

Date of last release:

 None yet.

Signed-off-by:

 [X](spark) Chris Mattmann
 [ ](spark) Paul Ramirez
 [ ](spark) Andrew Hart
 [ ](spark) Thomas Dudziak
 [X](spark) Suresh Marru
 [X](spark) Henry Saputra
 [X](spark) Roman Shaposhnik

Shepherd notes:

17 Jul 2013

Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports low-latency
execution in several forms.

Spark has been incubating since 2013-06-19.

Three most important issues to address in the move towards graduation:

 1. Finish bringing up infrastructure on Apache (JIRA, "user" mailing list,
    SVN repo for website)
 2. Migrate mailing lists and development to Apache
 3. Make a Spark 0.8 release under the Apache Incubator

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 While most of our infrastructure is now up, it has taken a while to get a
 JIRA, an SVN repo for our website (so we can use the CMS), and a
 user@spark.incubator.apache.org mailing list (so we can move our existing
 user list, which is large).

How has the community developed since the last report?

 We only entered the Apache Incubator at the end of June, but the existing
 developer community keeps expanding and we are seeing many new features from
 new contributors.

How has the project developed since the last report?

 In terms of the Apache incubation process, we filed our IP papers and got a
 decent part of the infrastructure set up (Git, dev list, wiki, Jenkins
 group).

Date of last release:

 None

Signed-off-by:

 [X](spark) Chris Mattmann
 [ ](spark) Paul Ramirez
 [ ](spark) Andrew Hart
 [ ](spark) Thomas Dudziak
 [X](spark) Suresh Marru
 [x](spark) Henry Saputra
 [ ](spark) Roman Shaposhnik

Shepherd notes: