This was extracted (@ 2024-11-20 22:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
Report was filed, but display is awaiting the approval of the Board minutes.
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We released Apache Spark 4.0 Preview 1 on June 3rd, 2024.
- We released Apache Spark 3.5.2 on August 10th, 2024.
- We added three new committers (Allison Wang, Martin Grund, and Haejoon Lee) and one new PMC member (Kent Yao) to the project.
- The votes for two infrastructure changes have passed: "Move Spark Connect server to built-in package (Client API layer remains external)" and "Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go."
- The votes on "SPIP: Stored Procedures API for Catalogs" and "Differentiate Spark without Spark Connect from Spark Connect" have passed.
- We clarified our committer guidelines at https://spark.apache.org/committers.html, including reminding committers about leaving sufficient time for reviews.
Trademarks:
- No changes since last report.
Latest releases:
- Spark 3.5.2 was released on August 10, 2024
- Spark 4.0 Preview 1 was released on June 3, 2024
- Spark 3.4.3 was released on April 18, 2024
Committers and PMC:
- The latest committers were added on July 10th, 2023 (Allison Wang, Martin Grund, and Haejoon Lee).
- The latest PMC member was added on Aug 8th, 2024 (Kent Yao).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We made two patch releases: Spark 3.5.1 on February 28, 2024, and Spark 3.4.3 on April 18, 2024.
- We've started working toward a preview release for Spark 4.0 to give the community an easy way to try the next major version.
- The votes on "SPIP: Structured Logging Framework for Apache Spark" and "Pure Python Package in PyPI (Spark Connect)" have passed.
- The votes for two behavior changes have passed: "SPARK-44444: Use ANSI SQL mode by default" and "SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false".
- The community decided that the Spark 4.0 release will drop support for Python 3.8.
- We started a discussion about the definition of behavior changes, which is critical for version upgrades and user experience.
- We've opened a dedicated repository for the Spark Kubernetes Operator at https://github.com/apache/spark-kubernetes-operator. Based on a vote result, we added a new version in the Apache Spark JIRA project for versioning the Spark operator.
Trademarks:
- No major changes since the last report.
Latest releases:
- Spark 3.4.3 was released on April 18, 2024
- Spark 3.5.1 was released on February 28, 2024
- Spark 3.3.4 was released on December 16, 2023
Committers and PMC:
- The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
- The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and Yikun Jiang).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We made two patch releases: Spark 3.3.4 (EOL release) on December 16, 2023, and Spark 3.4.2 on November 30, 2023.
- We have begun voting for a Spark 3.5.1 maintenance release.
- The vote on "SPIP: Structured Streaming - Arbitrary State API v2" has passed.
- We transitioned to an ASF-hosted analytics service, Matomo. For details, visit https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40.
- Arrow DataFusion Comet, a plugin designed to accelerate Spark query execution by leveraging DataFusion and Arrow, is in the process of being open-sourced under the Apache Arrow project. For more information, visit https://github.com/apache/arrow-datafusion-comet.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.3.4 was released on December 16, 2023
- Spark 3.4.2 was released on November 30, 2023
- Spark 3.5.0 was released on September 13, 2023
Committers and PMC:
- The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
- The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and Yikun Jiang).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We released Apache Spark 3.5 on September 15, a feature release with over 1300 patches. This release brought more Spark Connect scenarios to general availability, such as the Scala and Go clients, distributed training and inference support, and enhanced compatibility for Structured Streaming. It also introduced new PySpark and SQL functionality, including the SQL IDENTIFIER clause, named argument support for SQL function calls, SQL function support for HyperLogLog approximate aggregations, and Python user-defined table functions; simplified distributed training with DeepSpeed; introduced watermark propagation among operators; and added the dropDuplicatesWithinWatermark operation in Structured Streaming.
- We made a patch release, Spark 3.3.3, on August 21, 2023.
- Apache Spark 4.0.0-SNAPSHOT is now ready for Java 21. [SPARK-43831]
- We have begun planning for a Spark 3.4.2 maintenance release (discussion at https://lists.apache.org/thread/35o2169l5r05k2mknqjy9mztq3ty1btr) and a Spark 3.3.4 EOL branch release (targeting December 16th).
- The vote on "Updating documentation hosted for EOL and maintenance releases" has passed.
- The vote on the Spark Project Improvement Proposal (SPIP) for "State Data Source - Reader" has passed.
- The PMC has voted to add two new PMC members, Yuanjian Li and Yikun Jiang, and one new committer, Jiaan Geng, to the project.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.5.0 was released on September 13, 2023
- Spark 3.3.3 was released on August 21, 2023
- Spark 3.4.1 was released on June 23, 2023
Committers and PMC:
- The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
- The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and Yikun Jiang).
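[Editor's illustration, not part of the minutes] The 3.5 report above mentions the SQL IDENTIFIER clause and the dropDuplicatesWithinWatermark streaming operation. The following is a minimal PySpark sketch of both, assuming a local PySpark 3.5+ installation; table and column names are made up for the example.

```python
# Illustrative sketch only: two Spark 3.5 additions named in the report above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark35-sketch").getOrCreate()

# SQL IDENTIFIER clause: resolve an object name from a constant string
# expression instead of hard-coding it in the query text.
spark.range(5).createOrReplaceTempView("nums")
spark.sql("SELECT * FROM IDENTIFIER('nums') WHERE id > 1").show()

# dropDuplicatesWithinWatermark: drop duplicate events that arrive within the
# watermark delay without keeping deduplication state forever.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
# columns: timestamp, value

deduped = (
    events.withWatermark("timestamp", "10 minutes")
          .dropDuplicatesWithinWatermark(["value"])
)

query = deduped.writeStream.format("console").outputMode("append").start()
query.awaitTermination(30)   # run briefly for demonstration purposes
query.stop()
spark.stop()
```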
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We cut the branch Spark 3.5.0 on July 17th 2023. The community is working on bug fixes, tests, stability and documentation.
- We made a patch release, Spark 3.4.1, on June 23, 2023.
- We are preparing a Spark 3.3.3 release for later this month (https://lists.apache.org/thread/0kgnw8njjnfgc5nghx60mn7oojvrqwj7).
- Votes on three Spark Project Improvement Proposals (SPIPs) passed: "XML data source support", "Python Data Source API", and "PySpark Test Framework".
- A vote for "Apache Spark PMC asks Databricks to differentiate its Spark version string" did not pass. This was asking a company to change the string returned by Spark APIs in a product that packages a modified version of Apache Spark.
- The community decided to release Apache Spark 4.0.0 after the 3.5.0 version. We are tracking issues that may target this release at https://issues.apache.org/jira/browse/SPARK-44111.
- An official Apache Spark Docker image is now available at https://hub.docker.com/_/spark
- A new repository, https://github.com/apache/spark-connect-go, was created for the Go client of Spark Connect.
- The PMC voted to add two new committers to the project, XiDuo You and Peter Toth.
Trademarks:
- No changes since the last report.
Latest releases:
- We released Apache Spark 3.4.1 on June 23, 2023
- We released Apache Spark 3.2.4 on April 13, 2023
- We released Spark 3.3.2 on February 17, 2023
Committers and PMC:
- The latest committers were added on July 11th, 2023 (XiDuo You and Peter Toth).
- The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong Meng and Ruifeng Zheng) and May 14th, 2023 (Yuming Wang).
Issues for the board:
- None
Project status:
- We released Apache Spark 3.4 on April 13th, a feature release with over 2600 patches. This release introduces a Python client for Spark Connect, augments Structured Streaming with async progress tracking and Python arbitrary stateful processing, increases Pandas API coverage and provides NumPy input support, simplifies the migration from traditional data warehouses to Apache Spark by improving ANSI compliance and implementing dozens of new built-in functions, and boosts development productivity and debuggability with memory profiling.
- We made two patch releases: Spark 3.2.4 on April 13th and Spark 3.3.2 on February 17th. These have bug fixes to the corresponding branches of the project.
- The PMC voted to add three new PMC members to the project.
- A vote on a Spark Project Improvement Proposal (SPIP) for "Lazy Materialization for Parquet Read Performance Improvement" passed.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.4.0 was released on April 13, 2023
- Spark 3.2.4 on April 13, 2023
- Spark 3.3.2 on February 17, 2023
Committers and PMC:
- The latest committer was added on Oct 2nd, 2022 (Yikun Jiang).
- The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong Meng and Ruifeng Zheng).
Issues for the board:
- None
Project status:
- We cut the branch Spark 3.4.0 on Jan 24th 2023. The community is working on bug fixes, tests, stability and documentation.
- We are preparing a Spark 3.3.2 release for later this month (https://lists.apache.org/thread/nwzr3o2cxyyf6sbb37b8yylgcvmbtp16)
- Starting in Spark 3.4, we are also attaching an SBOM to Apache Spark Maven artifacts [SPARK-41893] in line with other ASF projects.
- We released Apache Spark 3.2.3, a bug fix release for the 3.2 line, on Nov 28th 2022.
- Votes on the Spark Project Improvement Proposals (SPIPs) for "Asynchronous Offset Management in Structured Streaming" and "Better Spark UI scalability and Driver stability for large applications" passed.
- The DStream API will be deprecated in the upcoming Apache Spark 3.4 release to focus work on the Structured Streaming APIs. [SPARK-42075]
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.2.3 was released on Nov 28, 2022.
- Spark 3.3.1 was released on Oct 25, 2022.
- Spark 3.3.0 was released on June 16, 2022.
Committers and PMC:
- The latest committer was added on Oct 2nd, 2022 (Yikun Jiang).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We released Apache Spark 3.3.1, a bug fix release for the 3.3 line, on October 25th. We are also currently preparing a Spark 3.2.3 release.
- The vote on the Spark Project Improvement Proposal (SPIP) for "Support Docker Official Image for Spark" passed. We created a new GitHub repository https://github.com/apache/spark-docker for building the official Docker image.
- We decided to drop the Apache Spark Hadoop 2 binary distribution in future releases.
- We added a new committer, Yikun Jiang, in October 2022.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.3.1 was released on Oct 25, 2022.
- Spark 3.3.0 was released on June 16, 2022.
- Spark 3.2.2 was released on July 17, 2022.
Committers and PMC:
- The latest committer was added on Oct 2nd, 2022 (Yikun Jiang).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- Apache Spark was honored to receive the SIGMOD System Award this year, given by SIGMOD (the ACM’s data management research organization) to impactful real-world and research systems.
- We recently released Apache Spark 3.3.0, a feature release that improves join query performance via Bloom filters, increases the Pandas API coverage with the support of popular Pandas features such as datetime.timedelta and merge_asof, simplifies the migration from traditional data warehouses by improving ANSI SQL compliance and supporting dozens of new built-in functions, and boosts development productivity with better error handling, autocompletion, performance, and profiling.
- We released Apache Spark 3.2.2, a bug fix release for the 3.2 line, on July 17th.
- A Spark Project Improvement Proposal (SPIP) for Spark Connect was voted on and accepted. Spark Connect introduces a lightweight client/server API for Spark (https://issues.apache.org/jira/browse/SPARK-39375) that will allow applications to submit work to a remote Spark cluster without running the heavyweight query planner in the client, and will also decouple the client version from the server version, making it possible to update Spark without updating all the applications.
- The community started a major effort to improve Structured Streaming performance, usability, APIs, and connectors called Project Lightspeed (https://issues.apache.org/jira/browse/SPARK-40025), and we'd love to get feedback and contributions on that.
- We added three new PMC members, Huaxin Gao, Gengliang Wang and Maxim Gekk, in June 2022.
- We added a new committer, Xinrong Meng, in July 2022.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.3.0 was released on June 16, 2022.
- Spark 3.2.2 was released on July 17, 2022.
- Spark 3.1.3 was released on February 18, 2022.
Committers and PMC:
- The latest committer was added on July 13th, 2022 (Xinrong Meng).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).
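[Editor's illustration, not part of the minutes] The report above describes the Spark Connect SPIP, a thin client/server split where the query planner runs server-side. A minimal sketch of what that looks like from Python, using the client API that shipped in later releases (PySpark 3.4+ with the connect extras); the host/port and the start command are assumptions about a locally running Connect server.

```python
# Illustrative sketch only: connecting a lightweight Python client to a
# remote Spark Connect server (default gRPC port 15002), assuming the server
# has already been started, e.g. via sbin/start-connect-server.sh.
from pyspark.sql import SparkSession

# The "remote" builder creates a thin client; planning and execution happen
# on the server, so the client process never hosts a full Spark driver.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.range(10).selectExpr("id", "id * id AS squared")
df.show()   # the unresolved plan is sent over gRPC and executed remotely
```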
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We are working on the release of Spark 3.3.0, with Release Candidate 1 currently being tested and voted on.
- We released Apache Spark 3.1.3, a bug fix release for the 3.1 line, on February 18th.
- We started publishing official Docker images of Apache Spark in Docker Hub, at https://hub.docker.com/r/apache/spark/tags
- A new Spark Project Improvement Proposal (SPIP) is being discussed by the community to offer a simplified API for deep learning inference, including built-in integration with popular libraries such as Tensorflow, PyTorch and HuggingFace (https://issues.apache.org/jira/browse/SPARK-38648).
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.1.3 was released on February 18, 2022.
- Spark 3.2.1 was released on January 26, 2022.
- Spark 3.2.0 was released on October 13, 2021.
Committers and PMC:
- The latest committer was added on Dec 20th, 2021 (Yuanjian Li).
- The latest PMC member was added on Jan 19th, 2022 (Maciej Szymkiewicz).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We released Apache Spark 3.2.1, a bug fix release for the 3.2 line, in January.
- Two Spark Project Improvement Proposals (SPIPs) were recently accepted by the community: Support for Customized Kubernetes Schedulers (https://issues.apache.org/jira/browse/SPARK-36057) and Storage Partitioned Join for Data Source V2 (https://issues.apache.org/jira/browse/SPARK-37375).
- We've migrated away from Spark’s original Jenkins CI/CD infrastructure, which had been graciously hosted by UC Berkeley on their clusters since 2013, to GitHub Actions. Thanks to the Berkeley EECS department for hosting this for so long!
- We added a new committer, Yuanjian Li, in December 2021.
- We added a new PMC member, Maciej Szymkiewicz, in January 2022.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.2.1 was released on January 26, 2022.
- Spark 3.2.0 was released on October 13, 2021.
- Spark 3.1.2 was released on June 23rd, 2021.
Committers and PMC:
- The latest committer was added on Dec 20th, 2021 (Yuanjian Li).
- The latest PMC member was added on Jan 19th, 2022 (Maciej Szymkiewicz).
Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We recently released Apache Spark 3.2, a feature release that adds several large pieces of functionality. Spark 3.2 includes a new Pandas API for Apache Spark based on the Koalas project, a new push-based shuffle implementation, a more efficient RocksDB state store for Structured Streaming, native support for session windows, error message standardization, and significant improvements to Spark SQL, such as the use of adaptive query execution by default and GA status for the ANSI SQL language mode.
- We updated the Apache Spark homepage with a new design and more examples.
- We added a new committer, Chao Sun, in November 2021.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.2.0 was released on October 13, 2021.
- Spark 3.1.2 was released on June 23rd, 2021.
- Spark 3.0.3 was released on June 1st, 2021.
Committers and PMC:
- The latest committer was added on November 5th, 2021 (Chao Sun).
- The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
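[Editor's illustration, not part of the minutes] The 3.2 report above highlights the pandas API on Spark (derived from Koalas). A minimal sketch of how it is used, assuming a PySpark 3.2+ installation; the toy data is made up for the example.

```python
# Illustrative sketch only: the pandas API on Spark introduced in 3.2.
import pyspark.pandas as ps

# A pandas-like DataFrame whose operations are executed by Spark under the hood.
psdf = ps.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
print(psdf.groupby("group").value.mean())

# Convert to a regular Spark DataFrame when the Spark SQL API is needed.
sdf = psdf.to_spark()
sdf.show()
```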
Description: Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We made a number of maintenance releases in the past three months. We released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug fix release for the Spark 2.x line. This may be the last release on 2.x unless major new bugs are found.
- We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi Yamamuro.
- We are working on Spark 3.2.0 as our next release, with a release candidate likely to come soon. Spark 3.2 includes a new Pandas API for Apache Spark based on the Koalas project, a new push-based shuffle implementation, a more efficient RocksDB state store for Structured Streaming, native support for session windows, error message standardization, and significant improvements to Spark SQL, such as the use of adaptive query execution by default.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 3.1.2 was released on June 23rd, 2021.
- Spark 3.0.3 was released on June 1st, 2021.
- Spark 2.4.8 was released on May 17th, 2021.
Committers and PMC:
- The latest committers were added on March 11th, 2021 (Atilla Zsolt Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
- The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
Description: Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Issues for the board:
- None
Project status:
- We released Apache Spark 3.1.1, a major update release for the 3.x branch, on March 2nd. This release includes updates to improve Python usability and error messages, ANSI SQL support, the streaming UI, and support for running Apache Spark on Kubernetes, which is now marked GA. Overall, the release includes about 1500 patches.
- We are voting on an Apache Spark 2.4.8 bug fix release for the Spark 2.x line. This may be the last release on 2.x.
- We added six new committers in the past three months: Atilla Zsolt Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu.
- Several SPIPs (major project improvement proposals) were voted on and accepted, including adding a Function Catalog in Spark SQL and adding a Pandas API layer for PySpark based on the Koalas project. We've also started an effort to standardize error message reporting in Apache Spark (https://spark.apache.org/error-message-guidelines.html) so that messages are easier to understand and users can quickly figure out how to fix them.
Trademarks:
- The PMC is investigating a potential trademark issue with another open source project.
Latest releases:
- Spark 3.1.1 was released on March 2nd, 2021.
- Spark 3.0.2 was released on February 19th, 2021.
- Spark 2.4.7 was released on September 12th, 2020.
Committers and PMC:
- The latest committers were added on March 18th, 2021 (Atilla Zsolt Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun). The PMC has been discussing some new PMC candidates.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community is close to finalizing the first Spark 3.1.x release, which will be Spark 3.1.1. There was a problem with our release candidate packaging scripts that caused us to accidentally publish a 3.1.0 version to Maven Central before it was ready, so we’ve deleted that and will not use that version number. Several release candidates for 3.1.1 have gone out to the dev mailing list and we’re tracking the last remaining issues.
- Several proposals for significant new features are being discussed on the dev mailing list, including a function catalog for Spark SQL, a RocksDB based state store for streaming applications, and public APIs for creating user-defined types (UDTs) in Spark SQL. We would welcome feedback on these from interested community members.
Trademarks:
- No changes since the last report.
Latest releases:
- Spark 2.4.7 was released on September 12th, 2020.
- Spark 3.0.1 was released on September 8th, 2020.
- Spark 3.0.0 was released on June 18th, 2020.
Committers and PMC:
- The latest committers were added on July 14th, 2020 (Huaxin Gao, Jungtaek Lim and Dilip Biswal).
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun). The PMC has been discussing some new PMC candidates.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Apache Spark 3.0.1 on September 8th and Spark 2.4.7 on September 12th as maintenance releases with bug fixes to these two branches.
- The community is working on a number of new features in the Spark 3.x branch, including improved data catalog APIs, a push-based shuffle implementation, and better error messages to make Spark applications easier to debug. The largest changes are being discussed as SPIPs on our mailing list.
- The new policy about -1 votes on patches that we discussed in the last report is now agreed-upon and active, although some developers in one area of the project are still concerned that their feedback was inappropriately ignored in the past. The PMC is communicating with those developers to understand their perspectives and suggest ways to improve trust and collaboration (including clarifying what behavior is acceptable).
Trademarks:
- One of the two software projects we reached out to in July to change its name due to a trademark issue has changed it. We are still waiting for a reply from the other one, but it may be that development there has stopped.
Latest releases:
- Spark 2.4.7 was released on September 12th, 2020.
- Spark 3.0.1 was released on September 8th, 2020.
- Spark 3.0.0 was released on June 18th, 2020.
Committers and PMC:
- The latest committers were added on July 14th, 2020 (Huaxin Gao, Jungtaek Lim and Dilip Biswal).
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun). The PMC has been discussing some new candidates to add as PMC members.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Apache Spark 3.0.0 on June 18th, 2020. This was our largest release yet, containing over 3400 patches from the community, including significant improvements to SQL performance, ANSI SQL compatibility, Python APIs, SparkR performance, error reporting and monitoring tools. This release also enhances Spark’s job scheduler to support adaptive execution (changing query plans at runtime to reduce the need for configuration) and workloads that need hardware accelerators.
- We released Apache Spark 2.4.6 on June 5th with bug fixes to the 2.4 line.
- The community is working on 3.0.1 and 2.4.7 releases with bug fixes to these two branches. There are also a number of new SPIPs proposed for large features to add after 3.0, including Kotlin language support, push-based shuffle, materialized views and support for views in the catalog API. These discussions can be followed on our dev list and the corresponding JIRAs.
- We had a discussion on the dev list about clarifying our process for handling -1 votes on patches, as well as other discussions on the development process. The PMC is working to resolve any misunderstandings and make the expected process around consensus and -1 votes clear on our website.
- We added three new committers to the project since the last report: Huaxin Gao, Jungtaek Lim and Dilip Biswal.
Trademarks:
- We engaged with three organizations that had created products with “Spark” in the name to ask them to follow our trademark guidelines.
Latest releases:
- Spark 3.0.0 was released on June 18th, 2020.
- Spark 2.4.6 was released on June 5th, 2020.
- Spark 2.4.5 was released on Feb 8th, 2020.
Committers and PMC:
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committers were added on July 7th, 2020 (Huaxin Gao, Jungtaek Lim and Dilip Biswal).
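[Editor's illustration, not part of the minutes] The 3.0 report above mentions adaptive execution, which re-optimizes query plans at runtime. A minimal configuration sketch, assuming PySpark 3.0+ (adaptive execution is opt-in in 3.0 and became the default in a later release); the application name and data are made up.

```python
# Illustrative sketch only: enabling adaptive query execution in Spark 3.0+.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("aqe-sketch")
    .config("spark.sql.adaptive.enabled", "true")                     # re-optimize plans at runtime
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
    .getOrCreate()
)

# With AQE on, the number of post-shuffle partitions is chosen from the actual
# shuffle statistics rather than a fixed configuration value.
df = spark.range(1_000_000).groupBy((F.col("id") % 10).alias("k")).count()
df.show()
```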
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- Progress is continuing on the upcoming Apache Spark 3.0 release, with the first votes on release candidates. This will be a major release with various API and SQL language updates, so we’ve tried to solicit broad input on it through two preview releases and a lot of JIRA and mailing list discussion.
- The community is also voting on a release candidate for Apache Spark 2.4.6, bringing bug fixes to the 2.4 branch.
Trademarks:
- Nothing new to report in the past 3 months.
Latest releases:
- Spark 2.4.5 was released on Feb 8th, 2020.
- Spark 3.0.0-preview2 was released on Dec 23rd, 2019.
- Spark 3.0.0-preview was released on Nov 6th, 2019.
- Spark 2.3.4 was released on Sept 9th, 2019.
Committers and PMC:
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committer was added on Sept 9th, 2019 (Weichen Xu).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including SQL, streaming, machine learning, and graph analytics.
Project status:
- We have cut a release branch for Apache Spark 3.0, which is now undergoing testing and bug fixes before the final release. In December, we also published a new preview release for the 3.0 branch that the community can use to test and give feedback: https://spark.apache.org/news/spark-3.0.0-preview2.html. Spark 3.0 includes a range of new features and dependency upgrades (e.g. Java 11) but remains largely compatible with Spark’s current API.
- We published Apache Spark 2.4.5 on Feb 8th with bug fixes for the 2.4 branch of Spark.
Trademarks:
- Nothing new to report in the past 3 months.
Latest releases:
- Spark 2.4.5 was released on Feb 8th, 2020.
- Spark 3.0.0-preview2 was released on Dec 23rd, 2019.
- Spark 3.0.0-preview was released on Nov 6th, 2019.
- Spark 2.3.4 was released on Sept 9th, 2019.
Committers and PMC:
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committer was added on Sept 9th, 2019 (Weichen Xu). We also added Ryan Blue, L.C. Hsieh, Gengliang Wang, Yuming Wang and Ruifeng Zheng as committers in the past three months.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We made the first preview release for Spark 3.0 on November 6th. This release aims to get early feedback on the new APIs and functionality targeting Spark 3.0 but does not provide API or stability guarantees. We encourage community members to try this release and leave feedback on JIRA. More info about what's new and how to report feedback is available at https://spark.apache.org/news/spark-3.0.0-preview.html.
- We published Spark 2.4.4 and 2.3.4 as maintenance releases to fix bugs in the 2.4 and 2.3 branches.
- We added one new PMC member and six committers to the project in August and September, covering data sources, streaming, SQL, ML and other components of the project.
Trademarks:
- Nothing new to report since August.
Latest releases:
- Spark 3.0.0-preview was released on Nov 6th, 2019.
- Spark 2.3.4 was released on Sept 9th, 2019.
- Spark 2.4.4 was released on Sept 1st, 2019.
Committers and PMC:
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committer was added on Sept 9th, 2019 (Weichen Xu). We also added Ryan Blue, L.C. Hsieh, Gengliang Wang, Yuming Wang and Ruifeng Zheng as committers in the past three months.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- Discussions are continuing about our next feature release, which will likely be Spark 3.0, on the dev and user mailing lists. Some key questions include whether to remove various deprecated APIs, and which minimum versions of Java, Python, Scala, etc to support. There are also a number of new features targeting this release. We encourage everyone in the community to give feedback on these discussions through our mailing lists or issue tracker.
- We announced a plan to stop supporting Python 2 in our next major release, as many other projects in the Python ecosystem are now dropping support (https://spark.apache.org/news/plan-for-dropping-python-2-support.html).
- We added three new PMC members to the project in May: Takuya Ueshin, Jerry Shao and Hyukjin Kwon.
- There is an ongoing discussion on our dev list about whether to consider adding project committers who do not contribute to the code or docs in the project, and what the criteria might be for those. (Note that the project does solicit committers who only work on docs, and has also added committers who work on other tasks, like maintaining our build infrastructure).
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- May 8th, 2019: Spark 2.4.3
- April 23rd, 2019: Spark 2.4.2
- March 31st, 2019: Spark 2.4.1
- Feb 15th, 2019: Spark 2.3.3
Committers and PMC:
- The latest committer was added on Jan 29th, 2019 (Jose Torres).
- The latest PMC members were added on May 21st, 2019 (Jerry Shao, Takuya Ueshin and Hyukjin Kwon).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Apache Spark 2.4.1, 2.4.2, 2.4.3 and 2.3.3 in the past three months to fix issues in the 2.3 and 2.4 branches.
- Discussions are under way about the next feature release, which will likely be Spark 3.0, on our dev and user mailing lists. Some key questions include whether to remove various deprecated APIs, and which minimum versions of Java, Python, Scala, etc to support. There are also a number of new features targeting this release. We encourage everyone in the community to give feedback on these discussions through our mailing lists or issue tracker.
- Several Spark Project Improvement Proposals (SPIPs) for major additions to Spark were discussed on the dev list in the past three months. These include support for passing columnar data efficiently into external engines (e.g. GPU based libraries), accelerator-aware scheduling, new data source APIs, and .NET support. Some of these have been accepted (e.g. table metadata and accelerator aware scheduling proposals) while others are still being discussed.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- May 8th, 2019: Spark 2.4.3
- April 23rd, 2019: Spark 2.4.2
- March 31st, 2019: Spark 2.4.1
- Feb 15th, 2019: Spark 2.3.3
Committers and PMC:
- The latest committer was added on Jan 29th, 2019 (Jose Torres).
- The latest PMC member was added on Jan 12th, 2018 (Xiao Li).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We created a security@spark.apache.org mailing list to discuss security reports in their own location (as was also suggested by Mark T in November).
- We released Apache Spark 2.2.3 on January 11th to fix bugs in the 2.2 branch. The community is also currently voting on a 2.3.3 release to bring recent fixes to the Spark 2.3 branch.
- Discussions are under way about the next feature release, which will likely be Spark 3.0, on our dev and user mailing lists. Some key questions include whether to remove various deprecated APIs, and which minimum versions of Java, Python, Scala, etc to support. There are also a number of new features targeting this release. We encourage everyone in the community to give feedback on these discussions through our mailing lists or issue tracker.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- Jan 11th, 2019: Spark 2.2.3
- Nov 2nd, 2018: Spark 2.4.0
- Sept 24th, 2018: Spark 2.3.2
Committers and PMC:
- There was a discussion about lack of available review bandwidth for streaming on the dev list in January. The PMC discussed this and added a new committer, Jose Torres, specializing in streaming. We are continuing to look for other contributors who'd make good committers here and in other areas.
- The latest committer was added on January 29th, 2019 (Jose Torres).
- The latest PMC member was added on January 12th, 2018 (Xiao Li).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Apache Spark 2.4.0 on Nov 2nd, 2018 as our newest feature release. Spark 2.4's features include a barrier execution mode for machine learning computations, higher-order functions in Spark SQL, pivot syntax in SQL, a built-in Apache Avro data source, Kubernetes improvements, and experimental support for Scala 2.12, as well as multiple smaller features and fixes. The release notes are available at http://spark.apache.org/releases/spark-release-2-4-0.html.
- We released Apache Spark 2.3.2 on Sept 24th, 2018 as a bug fix release for the 2.3 branch.
- Multiple dev discussions are under way about the next feature release, which is likely to be Spark 3.0, on our dev and user mailing lists. Some of the key questions are which JDK, Scala, Python, R, Hadoop and Hive versions to support, as well as whether to remove certain deprecated APIs. We encourage everyone in the community to give feedback on these discussions through the mailing lists and JIRA.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- Nov 2nd, 2018: Spark 2.4.0
- Sept 24th, 2018: Spark 2.3.2
- July 2nd, 2018: Spark 2.2.2
Committers and PMC:
- We added six new committers since the last report: Shane Knapp, Dongjoon Hyun, Kazuaki Ishizaki, Xingbo Jiang, Yinan Li, and Takeshi Yamamuro.
- The latest committer was added on Sept 18th, 2018 (Kazuaki Ishizaki).
- The latest PMC member was added on Jan 12th, 2018 (Xiao Li).
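[Editor's illustration, not part of the minutes] The 2.4.0 report above mentions higher-order functions in Spark SQL. A minimal sketch of the lambda syntax they introduced, assuming PySpark 2.4+; the arrays are made up for the example.

```python
# Illustrative sketch only: SQL higher-order functions added in Spark 2.4,
# which apply a lambda expression to each element of an array column.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark24-sketch").getOrCreate()

spark.sql("""
    SELECT transform(array(1, 2, 3), x -> x + 1)      AS incremented,
           filter(array(1, 2, 3, 4), x -> x % 2 = 0)  AS evens
""").show(truncate=False)
```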
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We made several maintenance releases in the past 3 months, including Spark 2.3.1, 2.2.2 and 2.1.3, to fix various bugs and issues present in the past 3 released branches.
- We are close to cutting a branch for Spark 2.4, which will then go through community testing over the next several weeks to produce RCs and then the final release. Spark 2.4 is slated to include several large features, such as a barrier execution mode to run MPI-like machine learning computations in Spark jobs, various improvements to the millisecond-latency Continuous Processing mode for Structured Streaming, and much of the groundwork for supporting Scala 2.12.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- July 2nd, 2018: Spark 2.2.2
- June 29th, 2018: Spark 2.1.3
- June 8th, 2018: Spark 2.3.1
Committers and PMC:
- The latest committer was added on March 22nd, 2018 (Zhenhua Wang).
- The latest PMC member was added on Jan 12th, 2018 (Xiao Li).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Apache Spark 2.3.0 on Feb 28, 2018. This includes Kubernetes support, a low-latency continuous processing mode for streaming applications that wish to prioritize latency, faster UDFs in Python using data batching through Apache Arrow, images as a data type in the machine learning library, and various other new features.
- Work is under way to expand several of these new features in upcoming minor and major releases.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- February 28, 2018: Spark 2.3.0
- December 1, 2017: Spark 2.2.1
- October 9, 2017: Spark 2.1.2
- July 11, 2017: Spark 2.2.0
Committers and PMC:
- We added seven committers in the past three months: Anirudh Ramanathan, Bryan Cutler, Cody Koeninger, Erik Erlandson, Matt Cheah, Seth Hendrickson and Zhenhua Wang.
- The latest committer was added on March 28th, 2018 (Zhenhua Wang).
- The latest PMC member was added on Jan 12th, 2018 (Xiao Li).
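[Editor's illustration, not part of the minutes] The 2.3.0 report above mentions faster Python UDFs that batch data through Apache Arrow. A minimal sketch using the 2.3-era pandas UDF API, assuming PySpark 2.3+ with pyarrow installed; the function name and data are made up.

```python
# Illustrative sketch only: an Arrow-backed "vectorized" Python UDF (Spark 2.3).
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()

# Rows are shipped to the Python worker in Arrow batches and processed as
# pandas Series, avoiding the per-row serialization cost of classic UDFs.
@pandas_udf("double", PandasUDFType.SCALAR)
def times_two(v):
    return v * 2.0

spark.range(5).select(times_two("id").alias("doubled")).show()
```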
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Spark 2.2.1 on Dec 1st, 2017, with bug fixes for the 2.2 line. Like our previous release, this was done by a new release manager.
- Voting is under way for Spark 2.3.0, a new feature release that will bring several large features. These include support for running on Kubernetes (now merged into the project), a low-latency continuous processing mode for applications that wish to prioritize latency, faster UDFs in Python using data batching through Apache Arrow, images as a data type for the ML library, and other features. All of the larger features mentioned here were proposed as SPIPs in the last year.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- December 1, 2017: Spark 2.2.1
- October 9, 2017: Spark 2.1.2
- July 11, 2017: Spark 2.2.0
- May 2, 2017: Spark 2.1.1
Committers and PMC:
- We added four new PMC members in the past three months (Felix Cheung, Holden Karau, Yanbo Liang and Xiao Li).
- The latest committer was added on September 22nd, 2017 (Tejas Patil). Votes are currently in progress for several other new committers based on recent contributions.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Spark 2.1.2 on October 9th, with maintenance fixes for the 2.1 branch. This release was also managed by a new committer, which helped expose issues in the release process documentation that we've fixed. We are encouraging more new committers to be RMs for upcoming releases like 2.2.1.
- The Spark Summit Europe conference ran in Dublin, Ireland in October with 1200 attendees.
- Work is under way to merge Kubernetes support for Spark 2.3.0, with the first major pull request having undergone substantial review and getting close to merging. We need one more pull request beyond this for basic support.
SPIPs:
We wanted to give an update on Spark Project Improvement Proposals (SPIPs), the process we started to formally propose large changes before having an implementation. Since we started the process, there have been seven SPIPs proposed on the mailing list with the first in June 2017, which are all listed in JIRA at https://s.apache.org/aMHI. So far all the voted-on SPIPs have been accepted and it seems that the discussions, both on our dev list and in JIRA, have been useful, resulting in design changes, better understanding of each idea, and feedback from a wide range of Spark users. Some of the major SPIPs discussed and accepted include Kubernetes support, images as a first-class data type in MLlib, updates to the data source API, and low latency continuous processing. We will continue to encourage people to write large proposals as SPIPs to generate this type of discussion.
Trademarks:
- No large issues to report in the past 3 months.
Latest releases:
- October 9, 2017: Spark 2.1.2
- July 11, 2017: Spark 2.2.0
- May 02, 2017: Spark 2.1.1
- Dec 28, 2016: Spark 2.1.0
Committers and PMC:
- The latest committer was added on September 22nd, 2017 (Tejas Patil).
- The latest PMC members were added on June 16th, 2017 (six new PMC members from the existing committers).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- We released Spark 2.2.0 on July 11th, with 1100 patches since the last version. Some of the major features released included a cost-based optimizer for Spark SQL / DataFrames, PyPI publishing, and the first production version of the new high-level Structured Streaming API (losing the experimental tag because the API has been stabilized). More details are available at spark.apache.org/releases/spark-release-2-2-0.html.
- The Spark Summit conference ran in June with around 3000 attendees.
- Work is under way for Spark 2.3.0, with the current target to close the new feature window and cut a release branch in November 2017.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- July 11, 2017: Spark 2.2.0
- May 02, 2017: Spark 2.1.1
- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3
Committers and PMC:
- The last committers were added on July 27th, 2017 (Hyukjin Kwon and Sameer Agarwal).
- The last PMC members were added on June 16th, 2017 (six new PMC members from the existing committers).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community released Apache Spark 2.1.1 on May 2nd with bug fixes for the 2.1 branch, and is currently voting on release candidates for 2.2.0. This will be a major release with various new features in streaming, SQL, machine learning and other areas of the project.
- We have been making significant progress to publish Apache Spark in the standard Python and R package repositories (PyPI and CRAN) to make it easier to install for Python and R users.
- We documented the "Spark improvement proposal" process described earlier for proposing large new features on our website. It just defines a short format for writing a proposal and a JIRA tag to place on such documents so that they can all be viewed in one place.
- The Spark Summit East conference ran Feb 7th to 9th in Boston.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- May 02, 2017: Spark 2.1.1
- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3
- Oct 03, 2016: Spark 2.0.1
- July 26, 2016: Spark 2.0.0
Committers and PMC:
- The last committer was added on Feb 10th, 2017 (Takuya Ueshin).
- The last PMC members were added on Feb 15th, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community released Apache Spark 2.1.0 on Dec 28 with a variety of new features for the 2.x branch, most notably improvements to streaming (http://spark.apache.org/releases/spark-release-2-1-0.html). We also released Spark 2.0.2 on Nov 14 with bug fixes for the 2.0.x branch.
- The Spark Summit East conference is running Feb 7th to 9th in Boston.
- We've continued discussions on a "Spark Improvement Proposal" format for documenting large proposed additions over the dev list and are converging towards a final version that we want to post on our website.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- Dec 28, 2016: Spark 2.1.0
- Nov 14, 2016: Spark 2.0.2
- Nov 07, 2016: Spark 1.6.3
- Oct 03, 2016: Spark 2.0.1
- July 26, 2016: Spark 2.0.0
Committers and PMC:
- The last committers were added on Jan 24th, 2017 (Holden Karau and Burak Yavuz).
- The last PMC members were added on Feb 15th, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
@Shane: follow up on brand action item
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community released Apache Spark 2.0.1 on October 3rd, 2016 as the first patch release for the 2.x branch. We also released Spark 1.6.3 on November 7th to continue patching the 1.x branch, and started voting on release candidates for Spark 2.0.2 with more patches to 2.x.
- The Spark Summit Europe conference ran in Brussels on Oct 25-27 with around 1000 attendees, including presentations on new use cases at Microsoft and Facebook.
- There've been several discussions on the dev list about making the development process easier to follow and giving feedback to contributors faster. One concrete thing we'd like to implement is a process to post "improvement proposals" scoping a new feature before detailed design begins, so that developers can solicit feedback from users earlier, and users can easily see the project's high-level roadmap in one place. The most recent writeup on this is at https://s.apache.org/ndAX and seems to be welcomed by contributors who've used a similar process in other ASF projects. Other things that contributors are working on are creating a template for design documents and cleaning up JIRA.
Trademarks:
- We are continuing engagement with various organizations.
Latest releases:
- Nov 07, 2016: Spark 1.6.3
- Oct 03, 2016: Spark 2.0.1
- July 26, 2016: Spark 2.0.0
- June 25, 2016: Spark 1.6.2
- May 26, 2016: Spark 2.0.0-preview
Committers and PMC:
- The last committer was added on Sept 29th, 2016 (Xiao Li).
- The last PMC members were added on Feb 15th, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
@Shane: Follow up with PMC and legal regarding potential trademark issues with a vendor
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community released Apache Spark 2.0 on July 26, 2016. This was a big release after nearly 6 months of effort; it lays a strong foundation for the 2.x line and adds multiple new components while remaining highly compatible with 1.x. Full release notes are available at http://spark.apache.org/releases/spark-release-2-0-0.html.
Trademarks:
- We posted a trademarks summary page on our website after discussions with trademarks@ to let users easily find out about the trademark policy: https://spark.apache.org/trademarks.html
- We are continuing engagement with the organizations discussed earlier.
Latest releases:
- July 26, 2016: Spark 2.0.0
- June 25, 2016: Spark 1.6.2
- May 26, 2016: Spark 2.0.0-preview
- Mar 9, 2016: Spark 1.6.1
- Jan 4, 2016: Spark 1.6.0
Committers and PMC:
- The last committer was added on August 6th, 2016 (Felix Cheung).
- The last PMC members were added Feb 15th, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community is continuing to make progress towards its 2.0 release, with two release candidates having been posted. Apache Spark 2.0 is a major release that includes a new SQL-based high-level streaming API, machine learning model persistence, and cleanup of Spark's dependencies and internal APIs. The full list of changes in Apache Spark 2.0 is available at http://s.apache.org/spark-2.0-features.
- We released Spark 1.6.2 on June 25th, with bug fixes for the 1.6 branch of the project (https://s.apache.org/spark-1.6.2).
Trademarks:
- The PMC is engaging with several third parties that are using Spark in product names, branding, etc.
- The PMC has been working on a page about trademark guidelines to include on the Spark website (https://s.apache.org/PaXo). It would be great to get feedback on this (several board members said it was a good idea to create such a page after we suggested it in our last report).
- To make the project's association with the ASF clearer in news articles and corporate materials, we have updated its logo to include "Apache": https://s.apache.org/Jf7J. This change is live on the website, JIRA, etc.
Latest releases:
- June 25, 2016: Spark 1.6.2
- May 26, 2016: Spark 2.0.0-preview
- Mar 9, 2016: Spark 1.6.1
- Jan 4, 2016: Spark 1.6.0
- Nov 09, 2015: Spark 1.5.2
Committers and PMC:
- The last committer was added on May 23, 2016 (Yanbo Liang).
- The last PMC members were added Feb 15, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
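[Editor's illustration, not part of the minutes] The 2.0 reports above refer to a new SQL-based high-level streaming API, i.e. Structured Streaming. A minimal sketch of the programming model, assuming Spark 2.0+ and a text source on localhost:9999 (the host/port are made up; something like `nc -lk 9999` would need to be feeding it).

```python
# Illustrative sketch only: Structured Streaming expresses a streaming job
# with the same DataFrame operators as a batch job; the engine incrementalizes
# the query and updates the result as new data arrives.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("structured-streaming-sketch").getOrCreate()

lines = (
    spark.readStream.format("socket")
    .option("host", "localhost").option("port", 9999)
    .load()
)
counts = lines.groupBy("value").count()   # running count of each distinct input line

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()   # blocks; stop with Ctrl-C or query.stop()
```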
Working with IBM to resolve the trademark issues is critical.
Shane: We need to think through the question of whether a simple "foo.x" is ever ok where foo is an Apache project name and x is any top level domain.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
Project status:
- The community is in the QA phase for Spark 2.0, our second major version since joining Apache. There are a large number of additions in 2.0, including a higher-level streaming API, improved runtime code generation for SQL, and improved export for machine learning models. We are also using this release to clean up some experimental APIs, remove some dependencies, and add support for Scala 2.12. The full list of changes is available at http://s.apache.org/spark-2.0-features. We also released a 2.0.0-preview package to let users broadly participate in testing the new APIs.
- We released Spark 1.6.1 in March, with bug fixes for the 1.6 branch.
- For Apache Spark 2.0, the community decided to move some of the less used data source connectors for Spark Streaming to a separate project, Apache Bahir (http://bahir.apache.org). We proposed a new project in order to maintain ASF governance of these components.
- The project removed the role of "maintainers" for reviewing changes to specific components (originally added 1.5 years ago) in response to concerns from some ASF members that it makes the project appear less welcoming, as well as the conclusion that it did not have a noticeable impact in practice (https://s.apache.org/DUTB, https://s.apache.org/AgCt).
Trademarks:
In the past few weeks, there have been several discussions asking for more attention to trademark use from the PMC. Some of the main issues were:
- A vendor offering a "technical preview" package of Apache Spark 2.0 before there was any official PMC release.
- A vendor claiming to offer "early access" to the project's roadmap.
- Various corporate and open source products whose name includes "Spark".
- Corporate pages where the most prominent mention says "Spark" instead of "Apache Spark".
The PMC is addressing these issues in several ways:
- Reaching out to the organizations involved.
- To make the project's association with the ASF clearer in news articles and corporate materials, we are working to update the logo to include "Apache": https://s.apache.org/Jf7J. We also added a FAQ entry about using the logo that links to the ASF trademarks page.
- Continuing to review news articles, product announcements, etc.
- Starting with this board report, we will have a section on trademarks in our reports to track brand activity.
- Question for the board: Would it be helpful to put a summary of the trademark policy on spark.apache.org? It would be nice to have this more visible (e.g. in the site's navigation menu), but either way is fine. We can draft a version and send it to trademarks@.
Events:
- The Spark Summit community conference in San Francisco ran June 6-8. There were close to 100 talks from at least 50 organizations.
Latest releases:
- May 26, 2016: Spark 2.0.0-preview
- Mar 9, 2016: Spark 1.6.1
- Jan 4, 2016: Spark 1.6.0
- Nov 09, 2015: Spark 1.5.2
- Oct 02, 2015: Spark 1.5.1
Committers and PMC:
- The last committer was added on May 23, 2016 (Yanbo Liang).
- The last PMC members were added Feb 15, 2016 (Joseph Bradley, Sean Owen and Yin Huai).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - The community is entering the QA phase for Spark 2.0, our second major version since joining Apache. There are a large number of additions in 2.0, including a higher-level streaming API, improved runtime code generation for SQL, and improved export for machine learning models. We are also using this release to clean up some experimental APIs, remove some dependencies, and add support for Scala 2.12. The full list of changes is available at http://s.apache.org/spark-2.0-features. - We released Spark 1.6.1 in March, with bug fixes for the 1.6 branch. In general, we have seen fast adoption of Spark 1.6, with many organizations adding support right away. - For Apache Spark 2.0, the community decided to move some of the lesser used data source connectors for Spark Streaming to a separate ASF project, which has been proposed as Apache Bahir. We proposed a new project in order to maintain ASF governance of these components. - In the past few weeks, there have been several discussions asking for more attention to trademark use from this PMC. Some of the main issues were: - A vendor offering a "technical preview" package of Apache Spark 2.0 before there was any official PMC release. - A vendor claiming to offer "early access" to the project's roadmap. - Multiple vendors offering products where one component is labeled "Spark", without this component being an ASF release. - Corporate pages where the most prominent mention says "Spark" instead of "Apache Spark". In response to these issues, we will be reviewing all corporate uses of "Spark" on the trademarks list in the coming weeks and working to clarify the trademark rules on the project website as well as within the PMC and committer community. Latest releases: Mar 9, 2016: Spark 1.6.1 Jan 4, 2016: Spark 1.6.0 Nov 09, 2015: Spark 1.5.2 Oct 02, 2015: Spark 1.5.1 Sept 09, 2015: Spark 1.5.0 Committers and PMC: The last committers were added on Feb 8, 2016 (Wenchen Fan) and Feb 3, 2016 (Herman van Hovell). The last PMC members were added Feb 15, 2016 (Joseph Bradley, Sean Owen and Yin Huai) Mailing list stats: 4509 subscribers to user list (up 249 in the last 3 months) 2570 subscribers to dev list (up 173 in the last 3 months)
Report was not approved; a report with more details is requested for next month.
Shane wants Spark to take ownership of the trademark issues; and for individuals on the PMC who work for companies in this space to ensure that their companies are exemplars. Trademarks won't engage until there is some evidence that there is a reasonable attempt made by the PMC.
Jim thanked Matei for attending, and outlined possible future actions the board might take if these concerns are not addressed.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - We posted our 1.6.0 release in January, with contributions from 248 developers. This release included a new typed API for working with DataFrames, faster state management in Spark Streaming, support for persisting and loading ML pipelines, various optimizations, and a variety of new advanced analytics APIs. Full release notes are at http://spark.apache.org/releases/spark-release-1-6-0.html. - We are currently collecting changes for a Spark 1.6.1 maintenance release, which will likely happen within several weeks. - The community also agreed to make our next release 2.0, which will be a chance to fix small dependency and API problems in addition to releasing new features. Partial list of planned changes: http://s.apache.org/spark-2.0-features. Latest releases: Jan 4, 2016: Spark 1.6.0 Nov 09, 2015: Spark 1.5.2 Oct 02, 2015: Spark 1.5.1 Sept 09, 2015: Spark 1.5.0 Committers and PMC: The last committers were added on Feb 8, 2016 (Wenchen Fan) and Feb 3, 2016 (Herman van Hovell). We just voted in three PMC members on Feb 10, 2016 (Joseph Bradley, Sean Owen, Yin Huai). Mailing list stats: 4249 subscribers to user list (up 286 in the last 3 months) 2380 subscribers to dev list (up 196 in the last 3 months)
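For illustration (not part of the original minutes): a minimal Scala sketch of the new typed API for working with DataFrames (the Dataset API) introduced in 1.6. The case class and sample data are placeholders, and a SQLContext is assumed, as in the 1.6 API.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object DatasetSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DatasetSketch"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // A Dataset is a typed view over the same optimized engine as DataFrames.
        val people = Seq(Person("Alice", 34), Person("Bob", 19)).toDS()

        // Transformations use plain Scala functions and keep static types.
        val adultNames = people.filter(_.age >= 21).map(_.name)
        adultNames.collect().foreach(println)
      }
    }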
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - We posted our 1.5.0 release in September, with contributions from 230 developers. This release included many new APIs throughout Spark, more R support, UI improvements, and the start of a new low-level execution layer that acts directly on binary data (Tungsten). It had the most contributors of any release so far. Full release notes are at http://spark.apache.org/releases/spark-release-1-5-0.html. - We made a Spark 1.5.1 maintenance release in October and a Spark 1.5.2 release this week with bug fixes to the 1.5 line. - The community is currently QAing Spark 1.6.0, which is expected to come out in about a month based on the QA process. Some notable features include a type-safe API on the Tungsten execution layer and better APIs for managing state in Spark Streaming. Latest releases: Nov 09, 2015: Spark 1.5.2 Oct 02, 2015: Spark 1.5.1 Sept 09, 2015: Spark 1.5.0 July 15, 2015: Spark 1.4.1 Committers and PMC: The last committers added were on July 20th, 2015 (Marcelo Vanzin) and June 8th, 2015 (DB Tsai). The last PMC members were added August 12th, 2014 (Joseph Gonzalez and Andrew Or). Mailing list stats: 3946 subscribers to user list (up 419 in the last 3 months) 2181 subscribers to dev list (up 211 in the last 3 months)
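For illustration (not part of the original minutes): a minimal Scala sketch of the better state management API for Spark Streaming that 1.6 was adding (mapWithState), keeping a running count per word. The socket source, checkpoint path, and batch interval are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object StatefulWordCount {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("StatefulWordCount"), Seconds(10))
        ssc.checkpoint("/tmp/state-checkpoint")   // placeholder path, required for stateful streams

        val pairs = ssc.socketTextStream("localhost", 9999)   // placeholder source
          .flatMap(_.split(" "))
          .map(word => (word, 1))

        // Update the managed per-key state and emit the new running count.
        val updateCount = (word: String, one: Option[Int], state: State[Int]) => {
          val newCount = state.getOption.getOrElse(0) + one.getOrElse(0)
          state.update(newCount)
          (word, newCount)
        }

        val counts = pairs.mapWithState(StateSpec.function(updateCount))
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }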
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - We posted our 1.4.0 release in June, with contributions from 210 developers. The biggest addition was support for the R programming language, along with many improvements in debugging tools, built-in libraries, SQL language coverage, and machine learning functions (http://spark.apache.org/releases/spark-release-1-4-0.html). - We posted a Spark 1.4.1 maintenance release in July. - We've started the QA process for Spark 1.5.0, which should be released in around one month. The biggest features here are large performance improvements for Spark SQL / DataFrames, as well as further enriched support for R (e.g. exposing Spark's machine learning libraries in R). Latest releases: July 15, 2015: Spark 1.4.1 June 11, 2015: Spark 1.4.0 April 17, 2015: Spark 1.2.2 and 1.3.1 March 13, 2015: Spark 1.3.0 Committers and PMC: The last committers added were on July 20th, 2015 (Marcelo Vanzin) and June 8th, 2015 (DB Tsai). The last PMC members were added August 12th, 2014 (Joseph Gonzalez and Andrew Or). Mailing list stats: 3501 subscribers to user list (up 493 in the last 3 months) 1947 subscribers to dev list (up 255 in the last 3 months)
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - We posted our 1.3.0 release in March, with contributions from 174 developers. Major features included a DataFrame API for working with structured data, a pluggable data source API, streaming input source improvements, and many new machine learning algorithms (http://spark.apache.org/releases/spark-release-1-3-0.html). - We posted the Spark 1.2.2 and 1.3.1 maintenance releases in April. - We cut a release branch and started QA for Spark 1.4.0, which should be released in June. The biggest feature there is R language support, along with SQL window functions, support for new Hive versions, and quite a few improvements to debugging and monitoring tools. Latest releases: April 17, 2015: Spark 1.2.2 and 1.3.1 March 13, 2015: Spark 1.3.0 February 9, 2015: Spark 1.2.1 December 18, 2014: Spark 1.2.0 Committers and PMC: We voted to add four new committers on May 2nd, 2015 (Sandy Ryza, Yin Huai, Kousuke Saruta, Davies Liu). The last PMC members were added August 12th, 2014 (Joseph Gonzalez and Andrew Or). Mailing list traffic: 2979 subscribers to user list, 7469 emails in past 3 months 1692 subscribers to dev list, 1622 emails in past 3 months
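For illustration (not part of the original minutes): a minimal Scala sketch of the DataFrame API for structured data introduced in 1.3. The column names and sample rows are placeholders, and a SQLContext is assumed, as in the 1.3 API.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object DataFrameSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DataFrameSketch"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Build a DataFrame from a local collection of (name, age) rows.
        val people = Seq(("Alice", 34), ("Bob", 19), ("Carol", 27)).toDF("name", "age")

        // Relational-style operations that go through the query optimizer.
        people.filter($"age" > 21).groupBy("name").count().show()

        // The same data can be registered and queried with SQL.
        people.registerTempTable("people")
        sqlContext.sql("SELECT name FROM people WHERE age > 21").show()
      }
    }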
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: - We posted our 1.2.0 release in December, with contributions from 172 developers. Major features included stable APIs for Spark's graph processing module (GraphX), a high-level pipeline API for machine learning, an external data source API, better H/A for streaming, and networking performance optimizations. - We posted the Spark 1.2.1 maintenance release on February 9th, with contributions from 69 developers. - We cut a release branch and started QA for Spark 1.3.0, which should be released sometime in March. Some features coming there include a data frame API similar to R and Python, write support for external data sources, and quite a few new machine learning algorithms. - We had a discussion about adding a committer role to the project that is separate from PMC (before, Spark had PMC = C) to bring in people sooner, and decided to do that from this point on. Releases: Our last few releases were: February 9, 2015: Spark 1.2.1 December 18, 2014: Spark 1.2.0 November 26, 2014: Spark 1.1.1 September 11, 2014: Spark 1.1.0 Committers and PMC: The last committers were added February 2nd, 2015 (Joseph Bradley, Cheng Lian and Sean Owen) The last PMC members were added August 12th, 2014 (Joseph Gonzalez and Andrew Or).
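For illustration (not part of the original minutes): a minimal Scala sketch of the GraphX API whose interfaces the 1.2 release stabilized, building a tiny property graph and running PageRank over it. The vertex and edge data are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.{Edge, Graph}

    object GraphXSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GraphXSketch"))

        // Vertices carry user names; edges carry a relationship label.
        val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
        val follows = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
        val graph = Graph(users, follows)

        // Run PageRank and join the ranks back to the user names.
        val ranks = graph.pageRank(0.0001).vertices
        users.join(ranks)
          .map { case (_, (name, rank)) => (name, rank) }
          .collect()
          .foreach(println)
      }
    }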
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: This has been an eventful three months for Spark. Some major happenings are: - We posted our 1.1.0 release in September, with contributions from 171 developers (our largest number yet). Major features were performance and scalability optimizations, JSON import and schema inference in Spark SQL, feature extraction and statistics libraries, and a JDBC server. - We recently cut a release branch and started QA for Spark 1.2.0, which is targeted for release in December. - Apache Spark won this year's large-scale sort benchmark (http://sortbenchmark.org/), sorting 100 TB of data 3x faster than the previous record. It tied with a MapReduce-like system optimized for sorting. - The community voted to implement a maintainer model for reviewing some modules, where changes in architecture and API should be reviewed by a maintainer before a merge (http://s.apache.org/Dqz). There was concern from some external commenters (Greg Stein, Arun Murthy, Vinod Vavilapalli) that this reduces the power of each PMC member (requiring a review from a specific set of people); we are looking to test how this works and possibly tweak the model. Releases: Our last few releases were: September 11, 2014: Spark 1.1.0 August 5, 2014: Spark 1.0.2 July 23, 2014: Spark 0.9.2 July 11, 2014: Spark 1.0.1 Committers and PMC: The last committers and PMC members were added August 12, 2014 (Joseph Gonzalez and Andrew Or).
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: Spark made its 1.0.0 release on May 30th, bringing API stability for the 1.X line and a variety of new features. The community is now QAing the 1.1.0 branch for release later this month. (We follow a regular 3-month schedule for releases.) The community held a user conference, Spark Summit, in July, sponsored by 25 companies. We continue to see growth in the number of users and contributors, with over 120 people contributing to 1.1.0. Some of the big features in 1.1 include JSON loading in Spark SQL, a new statistics library, streaming machine learning algorithms, improvements to the Python API, and many stability and performance improvements. Releases: Our last few releases were: August 5, 2014: Spark 1.0.2 July 23, 2014: Spark 0.9.2 July 11, 2014: Spark 1.0.1 May 30, 2014: Spark 1.0.0 Committers and PMC: We closed votes to add two new committers and PMC members on August 7th. Before that, we added two committers and PMC members in May 2014.
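For illustration (not part of the original minutes): a minimal Scala sketch of the JSON loading and schema inference in Spark SQL mentioned above, using the API roughly as it looked around the 1.1 release. The input path is a placeholder.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JsonSqlSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("JsonSqlSketch"))
        val sqlContext = new SQLContext(sc)

        // Load newline-delimited JSON; the schema is inferred from the records.
        val events = sqlContext.jsonFile("/tmp/events.json")   // placeholder path
        events.printSchema()

        // Query the inferred structure with SQL.
        events.registerTempTable("events")
        sqlContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)
      }
    }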
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: The project is closing out the work for its 1.0.0 release, which will be a major milestone introducing both new functionality and API compatibility guarantees across the 1.X series. We’ve had one release candidate posted and are working on the next after a period of heavy QA. The project continues to see fast community growth — over 100 people submitted patches for 1.0. Some of the major features in 1.0 include: - A new Spark SQL component for accessing structured data within Spark programs - Java 8 lambda syntax support to make Spark programming in Java easier - Sparse data support, model evaluation, matrix algorithms and decision trees in MLlib - Long-lived monitoring dashboard - Common job submission script for all cluster managers - Revamped docs including new detailed docs for all the ML algorithms - Full integration with Hadoop YARN security model - API stability across the entire 1.X line Releases: Our last few releases were: Apr 9, 2014: Spark 0.9.1 Feb 2, 2014: Spark 0.9.0-incubating Dec 19, 2013: Spark 0.8.1-incubating Sept 25, 2013: Spark 0.8.0-incubating Committers and PMC: We just opened votes for two new committers and PMC members on May 12th. The last committers and (podling) PMC members were added on Dec 22, 2013
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: --------------- The project recently became a TLP and continues to grow in terms of community size. We finished switching our infrastructure to spark.apache.org, including recently importing our JIRA instance. We completed the vote on a 0.9.1 minor release last week (it will be posted on April 9th), and we reached the feature freeze and QA point for our 1.0 release, which is coming in a few weeks. Apart from the new features coming in 1.0, a major update in the community has been a change towards a Semantic Versioning-like policy, where maintenance releases are clearly marked and API compatibility is preserved across all minor releases (i.e. all 1.x.y will be compatible). This has been put in action for both 0.9.x and 1.x. Releases: --------- Our last few releases were: Apr 9, 2014: Spark 0.9.1 Feb 2, 2014: Spark 0.9.0-incubating Dec 19, 2013: Spark 0.8.1-incubating Sept 25, 2013: Spark 0.8.0-incubating Committers and PMC: ------------------- The last committers and (podling) PMC members were added on Dec 22, 2013.
Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics. Project status: The project recently became a TLP and continues to grow in terms of community size. We switched all our infrastructure out of the incubator and to spark.apache.org domains / repos (though the old site still needs a redirect). We have a new minor release being finalized for later this month, and a Spark 1.0 release targeting end of April. Recent activity includes new machine learning algorithms, updating the Spark Java API to work with Java 8 lambda syntax, Python API extensions, and improved support for Hadoop YARN. Releases: Our last few releases were: Feb 2, 2014: Spark 0.9.0-incubating Dec 19, 2013: Spark 0.8.1-incubating Sept 25, 2013: Spark 0.8.0-incubating Committers and PMC: The last committers and (podling) PMC members were added on Dec 22, 2013.
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to fast and flexible large-scale data analysis on clusters. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Spark Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache Spark Project be and hereby is responsible for the creation and maintenance of software related to fast and flexible large-scale data analysis on clusters; and be it further RESOLVED, that the office of "Vice President, Apache Spark" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Spark Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Spark Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Spark Project: * Mosharaf Chowdhury <mosharaf@apache.org> * Jason Dai <jasondai@apache.org> * Tathagata Das <tdas@apache.org> * Ankur Dave <ankurdave@apache.org> * Aaron Davidson <adav@apache.org> * Thomas Dudziak <tomdz@apache.org> * Robert Evans <bobby@apache.org> * Thomas Graves <tgraves@apache.org> * Andy Konwinski <andrew@apache.org> * Stephen Haberman <stephenh@apache.org> * Mark Hamstra <markhamstra@apache.org> * Shane Huang <shane_huang@apache.org> * Ryan LeCompte <ryanlecompte@apache.org> * Haoyuan Li <haoyuan@apache.org> * Sean McNamara <smcnamara@apache.org> * Mridul Muralidharan <mridulm80@apache.org> * Kay Ousterhout <kayousterhout@apache.org> * Nick Pentreath <mlnick@apache.org> * Imran Rashid <irashid@apache.org> * Charles Reiss <woggle@apache.org> * Josh Rosen <joshrosen@apache.org> * Prashant Sharma <prashant@apache.org> * Ram Sriharsha <harsha@apache.org> * Shivaram Venkataraman <shivaram@apache.org> * Patrick Wendell <pwendell@apache.org> * Andrew Xia <xiajunluan@apache.org> * Reynold Xin <rxin@apache.org> * Matei Zaharia <matei@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be appointed to the office of Vice President, Apache Spark, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the Apache Spark Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Spark podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator Spark podling encumbered upon the Apache Incubator Project are hereafter discharged. Special Order 7C, Establish the Apache Spark Project, was approved by Unanimous Vote of the directors present.
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. Spark has been incubating since 2013-06-19. Three most important issues to address in the move towards graduation: 1. Pretty much the only issue remaining is importing our old JIRA into Apache (https://issues.apache.org/jira/browse/INFRA-6419). Unfortunately, although we've been trying to do this since June, we haven't had much luck with it, as the INFRA people who tried to help out have been busy and software version numbers have often been incompatible (we have a hosted JIRA instance from Atlassian that they regularly update). We believe that there are some export dumps on that issue that are compatible with the ASF's current JIRA version, but if we can't get this resolved in the next 2-3 weeks, we may simply forgo importing our old issues. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? It would be really great to get a contact who can sit down with us and do the JIRA import. We're not sure who from INFRA leads these tasks. How has the community developed since the last report? We made a Spark 0.8.1 release in December, and are working on a new major release (0.9) this month. We added two new committers, Aaron Davidson and Kay Ousterhout. How has the project developed since the last report? We made the Spark 0.8.1 release mentioned above, with a number of new features detailed at http://spark.incubator.apache.org/releases/spark-release-0-8-1.html. We also have some exciting features coming up in Spark 0.9, such as support for Scala 2.10, parallel machine learning libraries in Python, and improvements to Spark Streaming. Date of last release: 2013-12-19 When were the last committers or PMC members elected? 2013-12-30 Signed-off-by: [ ](spark) Chris Mattmann [ ](spark) Paul Ramirez [ ](spark) Andrew Hart [ ](spark) Thomas Dudziak [X](spark) Suresh Marru [X](spark) Henry Saputra [X](spark) Roman Shaposhnik Shepherd/Mentor notes: Alan Cabrera (acabrera): Seems like a nice active project. IMO, there's no need to wait for the JIRA import to graduate. Seems like they can graduate now.
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. Spark has been incubating since 2013-06-19. Three most important issues to address in the move towards graduation: 1. Move JIRA over to Apache (still haven't gotten success from INFRA on this: https://issues.apache.org/jira/browse/INFRA-6419) 2. Add more committers under Apache process 3. Make further Apache releases Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? We still need some help importing our JIRA -- see INFRA-6419. For some reason we've had a lot of trouble with this. It should be easier now because Apache's JIRA was updated and now matches our version. How has the community developed since the last report? We made the Spark 0.8.0 release, which was the biggest so far, with 67 developers from 24 organizations contributing. The release shows how far our community has grown -- our 0.6 release last October had only 17 contributors, and our 0.7 release in February had 31. Most of the contributors are now external to the original UC Berkeley team. How has the project developed since the last report? We made the Spark 0.8.0 release mentioned above, which so far seems to be doing well. It brings a number of deployability features, improved Python support, and a new standard library for machine learning; see http://spark.incubator.apache.org/releases/spark-release-0-8-0.html for what's new in the release. Date of last release: 2013-09-25 When were the last committers or PMC members elected? June 2013 Signed-off-by: [X](spark) Chris Mattmann [ ](spark) Paul Ramirez [ ](spark) Andrew Hart [ ](spark) Thomas Dudziak [ ](spark) Suresh Marru [X](spark) Henry Saputra [X](spark) Roman Shaposhnik Shepherd notes: Dave Fisher (wave): Very active community on a fast track. Good report. Get your JIRA over and you are getting close. (Oct. 7) Marvin Humphrey (marvin): Report not filed in time for shepherd review.
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. Spark has been incubating since 2013-06-19. Three most important issues to address in the move towards graduation: 1. Make a first Apache release (we're in the final stages of this) 2. Move JIRA over to Apache (https://issues.apache.org/jira/browse/INFRA-6419) 3. Move development to Apache repo (in progress) Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? We still need some help importing our JIRA, though Michael Joyce and INFRA have looked into it (see <http://s.apache.org/fi>). How has the community developed since the last report? We're continuing to get a lot of great contributions to Spark. UC Berkeley also recently hosted a two-day training on Spark and related technologies (http://ampcamp.berkeley.edu/3/) that was highly attended -- we sold out at over 200 on-site attendees, and had 1000+ people watch online. User meetups included a well-attended meetup on Shark (Hive on Spark) contributions at Yahoo!. How has the project developed since the last report? We've made a lot of progress towards a first Apache release of Spark, including changing the package name to org.apache.spark, documenting the third-party licenses as required in LICENSE / NOTICE, and updating the documentation to reflect the transition. This month we've also moved our website to an apache.org domain (http://spark.incubator.apache.org) and updated the branding there. Finally, on the code side, we have continued to make bug fixes and improvements for the 0.8 release. Some recently merged improvements include simplified packaging and Python API support for Windows. Date of last release: No Apache releases yet When were the last committers or PMC members elected? June 2013 Signed-off-by: [x](spark) Chris Mattmann [ ](spark) Paul Ramirez [x](spark) Andrew Hart [ ](spark) Thomas Dudziak [x](spark) Suresh Marru [x](spark) Henry Saputra [x](spark) Roman Shaposhnik Shepherd notes:
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. Spark has been incubating since 2013-06-19. Three most important issues to address in the move towards graduation: 1. Finish bringing up Apache infrastructure (the only system missing is JIRA, but we also still need to move our website to Apache) 2. Switch development to work directly against Apache repo 3. Make a Spark 0.8 release through the Apache process Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? Nothing major. We've gotten a lot of help setting up infrastructure and the last piece missing is importing issues from our old JIRA, which we're working with INFRA on (https://issues.apache.org/jira/browse/INFRA-6419). How has the community developed since the last report? We've continued to get and accept a number of external contributions, including metrics infrastructure, improved web UI, several optimizations and bug fixes. We held a meetup on machine learning on Spark in San Francisco that got around 200 attendees. Finally, we've set up Apache mailing lists and warned users of the migration, which will complete at the beginning of September. How has the project developed since the last report? We are finishing some bug fixes and merges to do a first Apache release of Spark later this month. During this release we'll go through the process of checking that the right license headers are in place, NOTICE file is present, etc, and we'll complete a website on Apache. Date of last release: None yet. Signed-off-by: [X](spark) Chris Mattmann [ ](spark) Paul Ramirez [ ](spark) Andrew Hart [ ](spark) Thomas Dudziak [X](spark) Suresh Marru [X](spark) Henry Saputra [X](spark) Roman Shaposhnik Shepherd notes:
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. Spark has been incubating since 2013-06-19. Three most important issues to address in the move towards graduation: 1. Finish bringing up infrastructure on Apache (JIRA, "user" mailing list, SVN repo for website) 2. Migrate mailing lists and development to Apache 3. Make a Spark 0.8 release under the Apache Incubator Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? While most of our infrastructure is now up, it has taken a while to get a JIRA, an SVN repo for our website (so we can use the CMS), and a user@spark.incubator.apache.org mailing list (so we can move our existing user list, which is large). How has the community developed since the last report? We only entered the Apache Incubator at the end of June, but the existing developer community keeps expanding and we are seeing many new features from new contributors. How has the project developed since the last report? In terms of the Apache incubation process, we filed our IP papers and got a decent part of the infrastructure set up (Git, dev list, wiki, Jenkins group). Date of last release: None Signed-off-by: [X](spark) Chris Mattmann [ ](spark) Paul Ramirez [ ](spark) Andrew Hart [ ](spark) Thomas Dudziak [X](spark) Suresh Marru [x](spark) Henry Saputra [ ](spark) Roman Shaposhnik Shepherd notes: