
This was extracted (@ 2025-02-19 21:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
The mission of Apache MADlib is the creation and maintenance of software related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists

## Project Status:
Current project status: The project is essentially in a stalled state.

Issues for the board: We will be taking forward the discussion to formalize moving Apache MADlib closer to an active community project (i.e. Apache Cloudberry (Incubating)). Two PMC members, Roman Shaposhnik (rvs) and Greg Chase, are in support of this initiative. I (Ed Espino) am on the Apache Cloudberry PPMC. I will formalize the discussion of the potential move under the Apache Cloudberry umbrella of components (see Plan forward (proposal) below).

## Membership Data:
Community changes, past quarter:
- No new PMC members. Last addition was Ed Espino on 2023-03-22.
- No new committers. Last addition was Ed Espino on 2023-03-23.

## Project Activity:
- No releases have taken place.
- No releases are planned.

## Community Health:
The only communication has been on the PMC level to discuss the project's options moving forward. For reference, this is what was sent with only Roman Shaposhnik and Greg Chase responding:

## Plan forward (proposal)
As I have conversations with both projects and communities, I could use some guidance on what process to follow. I have not observed any movement of one project into another. Any guidance would be greatly appreciated. Here is a proposal.

Initial Discussion and Consensus Building
- First, formalize discussions within both communities through their respective dev mailing lists
- Get clear confirmation from the MADlib PMC about their willingness to move under Cloudberry
- Document support from both communities' members
- Create a written proposal outlining the benefits and technical synergies

Technical Assessment
- Create an inventory of MADlib's current assets (codebase, documentation, infrastructure)
- Identify potential integration points with Cloudberry
- Draft a technical migration plan
- Consider any naming/branding implications

Community Alignment
- Share merger details with communities
- Address any concerns from either community
- Establish how the existing MADlib committers and PMC members would transition
- Define the governance model post-merger

Apache Infrastructure & Legal
- Consult with ASF Legal regarding any IP or licensing considerations
- Plan repository migration strategy
- Document infrastructure needs (CI/CD, websites, mailing lists)

Formal Proposal
- Draft a formal proposal for the ASF Board
  - Current state of both projects
  - Rationale for the merger
  - Community support evidence
  - Technical migration plan
  - Timeline
  - Post-merger governance structure

Implementation
- Get board approval
- Execute technical migration
- Update documentation and websites
- Announce to broader Apache community
## Description:
The mission of Apache MADlib is the creation and maintenance of software related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists

## Project Status:
Current project status: At Risk - The project is mostly inactive due to the resignation of Broadcom's PMC members, following the decision to transition Greenplum Database and related repositories to a closed-source model.

Issues for the board: A few of us, including Roman Shaposhnik (rvs@apache.org), have discussed the possibility of joining the Cloudberry project as a next step. The Apache Incubation proposal for Cloudberry can be found here: https://lists.apache.org/thread/qzfb38dzb1x3cg29snq4doy95gd6pzy8

If joining Cloudberry is not an option, we need to determine if the Cloudberry project can still engage with Apache MADlib to address the gap left by Greenplum Database.

## Membership Data:
Apache MADlib was founded 2017-07-18 (7 years ago)
There are currently 17 committers and 5 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:1.

Community changes, past quarter:
- No new PMC members.
- No new committers.

## Project Activity:
- v2.1.0 released on 2023-09-08
- v2.0.0 was released on 2023-06-23
- v1.21.0 was released on 2023-03-01

## Community Health:
- Aside from last quarter's roll call and requests for input (which went unanswered) from dev and pmc members, there is virtually no active community.
@Rich: find out about attic plans
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Project Status:
- The project is in a critical state due to several factors (below).
- Apache MADlib provides support for PostgreSQL and Greenplum Database.
- Without any public announcement from Broadcom or in the Greenplum open source community, on May 24, 2024, the majority of Greenplum's GitHub repositories were archived (https://github.com/greenplum-db). The Greenplum Slack instance and supporting mailing lists have been deactivated.
- Broadcom is the corporate backer of the Greenplum Database and supporting components. One of those components is Apache MADlib.
- A PMC request for input on the Greenplum closed source move went unanswered. "Input on Validating MADlib Branches and Releases Against Greenplum"
- Christofer Dutz performed a roll call (Thu, Jun 13, 2024) of the 12 PMC members; eight members who did not respond were removed from the PMC.

## Community Health:
- On July 7th, 2024, Dianjing Wang, the Community Manager for the Cloudberry Database (a derivative of the Greenplum Database), expressed interest in contributing to Apache MADlib. https://lists.apache.org/list?dev@madlib.apache.org:2024-7
- The Cloudberry Database open source project is currently hosted on GitHub (https://github.com/cloudberrydb). They are working on an Apache incubation proposal: https://cwiki.apache.org/confluence/display/INCUBATOR/CloudberryProposal

## Mailing list activity:
- Aside from the roll call and requests for input (which went unanswered) from dev and pmc members, there is virtually no active community.

## Releases:
- v2.1.0 released on 2023-09-08
- v2.0.0 was released on 2023-06-23
- v1.21.0 was released on 2023-03-01
No report was submitted.
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Project Status:
- Two community developers (Nikhil Kak & Ekta Khanna) participated in addressing MADLIB-1517. As part of that work, several project updates (NOTICE year, Wiki) were performed as well.
- The community will mainly continue to work on bug fixes for existing features based on severity; no new features are planned for upcoming releases.
- With few bugs reported, there are no plans for a new release anytime soon.
- The project is still actively used by the Greenplum Database project (mainly sponsored by Broadcom).
- The Greenplum Database project continues to strive to improve adoption of Apache MADlib by ensuring that plans generated by Greenplum are more performant. This work, however, is on the Greenplum side rather than MADlib.

## Project Activity:
- Release 2.1.0 occurred on September 8, 2023 which was the 13th release as an Apache TLP project.
- Community plans to work on the following JIRAs for the next release:
  * Fix empty string handling of grouping columns in regression model training

## Community Health:
- The community is small with 2 active committers since last report.
- Community focus is mainly on adoption of existing features and fixing bugs as reported.
- There are no future releases planned.

## Membership Data:
- Currently stands at 12 PMC members, no new members added since last report
- Last addition was Chris Hajas on 2023-03-22.

## Committer base changes:
- Currently 23 committers, no new committers since last report.
- Last addition was David Kimura on 2023-03-23.

## Releases:
- v2.1.0 released on 2023-09-08
- v2.0.0 was released on 2023-06-23
- v1.21.0 was released on 2023-03-01

## Mailing list activity:
No activity on mailing lists since release announcement.
@Christofer: pursue a roll call vote
## Description:
Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph and machine learning methods for structured and unstructured data.

## Project Status:
- The project has been relatively quiet the past five months.
- There are no issues requiring board attention at this time.

## Membership Data:
Apache MADlib was founded 2017-07-18 (6 years ago)
There are currently 23 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

## Project Activity:
- No releases have occurred since last project report (Oct 2023).
- Two issues (improvement: MADLIB-1516, bug: MADLIB-1517) may be worked on for the next release.

## Community Health:
We continue to have good voting participation from the newly formed PMC members.
@Rich: follow up on MADlib report prior to April
## Description:
Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph and machine learning methods for structured and unstructured data.

## Project Status:
- On the Apache MADlib v2 code base, the project completed its first minor (2.1.0) release.
- The project is maintaining a healthy Jira issue management level.
- There are no issues requiring board attention at this time.

## Membership Data:
Apache MADlib was founded 2017-07-18 (6 years ago)
There are currently 23 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

## Project Activity:
Apache MADlib v2.1.0 was released on 2023-09-08

Improvements
- Build: Fix PG 15 support
- Assoc_rules: Fix SERIAL cache issue
- DL: Remove SERIAL from load_keras_model
- Build: Add ubuntu flag for PyXB installation
- Build: Add the actual path of $libdir to dynamic_library_path
- Build: Remove PyXB as a packaged dependency and replace it with external pyxb-x dependency.
- Build: Use PG15 in Jenkins CI
- CRF: Fix anyarray -> anycompatiblearray change for PG14

Release Manager
- Orhan Kislal

Vote Results
- The vote for releasing Apache MADlib 2.1.0 (RC2) passed with 4 binding +1s and no 0 or -1 votes.

## Community Health:
We continue to have good voting participation from the newly formed PMC members.
@Sharan: follow up on committer and PMC membership changes
## Description
Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph and machine learning methods for structured and unstructured data.

## Project Status
- The project completed a major (2.0.0) release. This is an important milestone for the project.
- The project is maintaining a healthy Jira issue management level.
- There are no issues requiring board attention at this time.

## Membership Data
Apache MADlib was founded 2017-07-18 (6 years ago)
There are currently 23 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

## Project Activity
* v2.0.0 was released on 2023-06-23
* New features
  - Build: Add support for python3
  - Build: Add support for GP7 Beta, GP6 python3 extension, Postgres 13/14/15
* Improvements
  - XGBoost: Add support for version 1.7.5
  - DL: Add support for tensorflow 2.10.1 and keras 2.10.0
  - DBScan: Add support for rtree 1.0.1
** Release Manager
   Ed Espino served in the release manager capacity.
** Vote Results
   The vote for releasing Apache MADlib 2.0.0 (RC1) passed with 4 binding +1s and no 0 or -1 votes.
* Upgrading from MADlib 1.X to MADlib 2.X is not supported.

## Community Health
* We continue to have good voting participation from the newly formed PMC members.
* The dev community has started discussions on expanding the Apache MADlib adoption by discussing various projects (PostgreSQL, other PostgreSQL compliant backends - e.g. https://neon.tech/).
## Description:
Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph and machine learning methods for structured and unstructured data.

## Issues:
- There are no issues requiring board attention at this time.

## Membership Data:
At the ASF Board Meeting on March 22, 2023, Roman Shaposhnik (rvs) put forth a proposal to "Reboot the Apache MADlib Project PMC". The proposal passed unanimously. Here is the new PMC roster:
- Atri Sharma (2017-07-19)
- Chris Hajas (2023-03-23)
- David Kimura (2023-03-23)
- Ed Espino (2023-03-23)
- Ekta Khanna (2021-02-16)
- Greg Chase (2017-07-19)
- Hansome Yuan (2023-03-23)
- Jingyu Wang (2023-03-23)
- Nikhil Kak (2019-02-20)
- Orhan Kislal (2017-07-19)
- Roman Shaposhnik (2017-07-19)
- Venkatesh Raghavan (2023-03-23)

Ed Espino now serves as the Apache MADlib PMC chair. Ed has been added to pmc-chairs (INFRA-24380). Ed has been added as a moderator to the project's mailing lists (commits, dev, issues, private, user).

All team members have been subscribed to the private@madlib.apache.org and dev@madlib.apache.org mailing lists.

Apache MADlib was founded 2017-07-18 (6 years ago)
There are currently 23 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

## Project Activity:
* v1.21.0 was released on 2023-03-01

  New features include:
  - Graph: Add warm start for weakly connected components.
  - Graph: Add multicolumn identifier support for SSSP and APSP.
  - Build: Add support for Photon3 OS.

  Improvements:
  - XGBoost: Add support for bigint and varchar columns.
  - XGBoost: Enable eval_metrics parameter.

  Venkatesh (Venky) Raghavan served in the release manager capacity.

  The vote for releasing Apache MADlib 1.21.0 (RC2) passed with 4 binding +1s, 1 non-binding +1, and no 0 or -1 votes.
  Official Vote thread: https://s.apache.org/5eghz

* The Apache MADlib project's web site was deficient in adhering to the Website Navigation Links Policy (https://www.apache.org/foundation/marks/pmcs#navigation). The following have been corrected (as reported in https://whimsy.apache.org/site/project/madlib)
  - Events
  - License
  - Thanks
  - Security
  - Sponsorship
  - Privacy
  - External resources

## Community Health:
* New PMC members have confirmed they are on the private and dev mailing lists.
* On March 2nd, Orhan Kislal put forth a proposal to support Python 3. This work will be targeted for the Apache MADlib v2.0 release. The proposal passed unopposed. Proposal Thread: https://s.apache.org/pt3oa
* Thank you Roman Shaposhnik (rvs) for your dedication to ASF and in turn to the Apache MADlib project.
* Thank you ASF infrastructure for assisting with the project reboot.
* For the next project report, we hope to review and report on various facets of the project (project wiki & website, roadmap, jira, infrastructure, release processes, dev community participation).
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- Apache MADlib held a vote regarding moving it to the attic on 10/15/2022. The vote was not held in accordance with the Apache rules and was subsequently invalidated.

## Project Activity:
There hasn't been a new release since the last report.

## Community Health:
The community is small with a single active committer since the last report. Two Apache members, Venkatesh Raghavan and Ed Espino, have shown interest in joining as PMC members to shepherd the project.

## Membership Data:
- Currently stands at 16 PMC members, no new members added since the last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since the last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: Currently working on v1.20.0
- v1.19.0 released on 2022-03-08
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
The mailing list activity was 7 posts to dev@ and 0 posts to user@ for the last 3 months Nov 2022-Jan 2023.

## JIRA Statistics:
- 0 JIRA tickets were created in the 3 months
- 0 JIRA tickets were resolved in the 3 months
@Roman: follow up on pending interested PMC members to MADlib
No report was submitted.
WHEREAS, the Project Management Committee of the Apache MADlib project has chosen by vote to recommend moving the project to the Attic; and

WHEREAS, the Board of Directors deems it no longer in the best interest of the Foundation to continue the Apache MADlib project due to inactivity;

NOW, THEREFORE, BE IT RESOLVED, that the Apache MADlib project is hereby terminated; and be it further

RESOLVED, that the Attic PMC be and hereby is tasked with oversight over the software developed by the Apache MADlib Project; and be it further

RESOLVED, that the office of "Vice President, Apache MADlib" is hereby terminated; and be it further

RESOLVED, that the Apache MADlib PMC is hereby terminated.

Special Order 7C, Terminate the Apache MADlib Project, was tabled.
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Project Activity:
- Release 1.20.0 occurred on Aug 5, 2022 which was the 10th release as an Apache TLP project.
  New features include:
    XGBoost: Python based XGBoost with single and grid search executions (MADLIB-1425, MADLIB-1490)
    Graph: Add multicolumn support for WCC and Pagerank (MADLIB-1502, MADLIB-1503)
  Improvements:
    Utilities: Reuse update plan in GroupIterationController
    Documentation: Update online examples for various modules
  Bug fixes:
    Elastic Net - GLM - SVM: Adjust ORCA to reduce planning time

## Community Health:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases, and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 2 in the 3rd quarter of the calendar year 2022. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## Membership Data:
- Currently stands at 16 PMC members, no new members added since the last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since the last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: Currently working on v1.21.0
- v1.20.0 released on 2022-08-03
- v1.19.0 released on 2022-03-08
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
The mailing list activity was 46 posts to dev@ and 6 posts to user@ for the last 3 months Jul-Oct 2022.

## JIRA Statistics:
- 6 JIRA tickets were created in the 3 months
- 2 JIRA tickets were resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Project Activity:
- Release 1.19.0 occurred on Mar 8, 2022 which was the 9th release as an Apache TLP project.
  New features include:
    DBSCAN: Fast parallel-optimized DBSCAN.
    MLP: Add rmsprop and Adam optimization techniques.
  Improvements:
    Graph: Improve WCC subtx count and catalog entry frequency.
    MLP: Set lambda value for minibatch.
    GLM-multinom: Use non-temp tables in GroupIterationController.
    Jenkins: Add new dockerfile for PG11.
    Build: Use dynamic_library_path for module pathname.

## Community Health:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases, and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 3 in the 2nd quarter of the calendar year 2022. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## Membership Data:
- Currently stands at 16 PMC members, no new members added since the last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since the last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: Currently working on v1.20.0
- v1.19.0 released on 2022-03-08
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
The mailing list activity was 31 posts to dev@ and 5 posts to user@ for the last 3 months Apr-Jul 2022.

## JIRA Statistics:
- 7 JIRA tickets were created in the 3 months
- 6 JIRA tickets were resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Project Activity:
- Release 1.19.0 occurred on Mar 8, 2022 which was the 9th release as an Apache TLP project.
  New features include:
    DBSCAN: Fast parallel-optimized DBSCAN.
    MLP: Add rmsprop and Adam optimization techniques.
  Improvements:
    Graph: Improve WCC subtx count and catalog entry frequency.
    MLP: Set lambda value for minibatch.
    GLM-multinom: Use non-temp tables in GroupIterationController.
    Jenkins: Add new dockerfile for PG11.
    Build: Use dynamic_library_path for module pathname.

## Community Health:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 3 in the 2nd quarter of calendar year 2022. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## Membership Data:
- Currently stands at 16 PMC members, no new members added since last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: Currently working on v1.20.0
- v1.19.0 released on 2022-03-08
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
Mailing list activity was 26 posts to dev@ and 14 posts to user@ for the last 3 months Jan-Mar 2022.

## JIRA Statistics:
- 4 JIRA tickets created in the 3 months
- 2 JIRA tickets resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- Release 1.18.0 occurred on Apr 5, 2021 which was the 8th release as an Apache TLP project.
- Community is working on the 1.19.0 release including the following JIRAs:
  * WCC: Optimize subtx count and catalog entry frequency
  * next phase of DBSCAN clustering algorithm
  * Deep learning minor fixes
  * multilayer perceptron - added Adam and RMSprop optimizers
  * Fix build failures for PMML and gppkg

## Health report:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 3 in the 1st quarter of calendar year 2022. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## PMC changes:
- Currently stands at 16 PMC members, no new members added since last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: Currently working on v1.19.0
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
Mailing list activity was 4 posts to dev@ and 4 posts to user@ for the last 3 months Oct-Jan 2022.

## JIRA Statistics:
- 2 JIRA tickets created in the 3 months
- 3 JIRA tickets resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- Release 1.18.0 occurred on Apr 5, 2021 which was the 8th release as an Apache TLP project.
- Community is working on the 1.19.0 release including the following JIRAs:
  * next phase of DBSCAN clustering algorithm - merged in
  * Deep learning minor fixes
  * multilayer perceptron - added Adam and RMSprop optimizers
  * Fix build failures for PMML and gppkg

## Health report:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 3 in the 3rd quarter of calendar year 2021. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## PMC changes:
- Currently stands at 16 PMC members, no new members added since last report
- The most recent PMC members added were:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: v1.19.0 planned for 2H 2021
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
Mailing list activity was 20 posts to dev@ and 5 posts to user@ for the last 3 months Apr-Jun 2021.

## JIRA Statistics:
- 3 JIRA tickets created in the 3 months
- 1 JIRA ticket resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- Release 1.18.0 occurred on Apr 5, 2021 which was the 8th release as an Apache TLP project.
- Community is working on the 1.19.0 release including the following JIRAs:
  * multilayer perceptron - add Adam and RMSprop optimizers
  * ARIMA - add GROUP BY feature
  * weakly connected components and other graph methods - add incremental methods
  * next phase of DBSCAN clustering algorithm
- Upcoming VLDB 2021 paper that includes recent work on Apache MADlib: https://adalabucsd.github.io/papers/2021_Cerebro-DS.pdf
  Several MADlib committers are co-authors on the paper together with UC San Diego researchers.

## Health report:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 5 in the 2nd quarter of calendar year 2021. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## PMC changes:
- Most recent PMC members added:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)
- Currently stands at 16 PMC members.

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: v1.19.0 planned for 2H 2021
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
Mailing list activity was 20 posts to dev@ and 5 posts to user@ for the last 3 months Apr-Jun 2021.

## JIRA Statistics:
- 3 JIRA tickets created in the 3 months
- 8 JIRA tickets resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- Release 1.18.0 occurred on Apr 5, 2021 which was the 8th release as an Apache TLP project.
- Community is working on the 1.19.0 release including the following JIRAs:
  * multilayer perceptron - add Adam and RMSprop optimizers
  * ARIMA - add GROUP BY feature
  * weakly connected components and other graph methods - add incremental methods
  * next phase of DBSCAN clustering algorithm
- Recent blog post on Apache MADlib regarding the autoML 1.18.0 release feature: https://tanzu.vmware.com/content/blog-tag-thought-leadership/massively-parallel-automated-model-building-for-deep-learning

## Health report:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 5 in the 1st quarter of calendar year 2021. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## PMC changes:
- Added 2 new PMC members since last report:
    Ekta Khanna (Feb 2021)
    Domino Valdano (Feb 2021)
- Currently stands at 16 PMC members.

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: v1.19.0 planned for 1H 2021
- v1.18.0 released on 2021-04-05
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08

## Mailing list activity:
Mailing list activity was 85 posts to dev@ and 4 posts to user@ for the last 3 months Jan-Mar 2021.

## JIRA Statistics:
- 23 JIRA tickets created in the 3 months
- 19 JIRA tickets resolved in the 3 months
## Description:
- Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:
- Release 1.17.0 occurred on Apr 9, 2020 which was the 7th release as an Apache TLP project.
- Community is working on the 1.18.0 release with JIRAs related to deep learning and other ML methods:
  * deep learning - improve GPU efficiency
  * deep learning - support custom loss functions and custom metrics
  * deep learning - add autoML methods Hyperband and Hyperopt
  * DBSCAN clustering algorithm
- Recent blog post mentioning Apache MADlib: https://tanzu.vmware.com/content/blog/analytic-workloads-bi-ai-vmware-tanzu-greenplum

## Health report:
The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 5 in the 4th quarter of calendar year 2020. We will constantly be on the lookout for new community members to be invited either as committers or PMC.

## PMC changes:
- No changes in the last quarter. Currently stands at 14 PMC members.

## Committer base changes:
- Currently 17 committers, no new committers since last report.
- The most recent committers added were:
    Ekta Khanna (2019-07-27)
    Himanshu Pandey (2019-07-27)
    Domino Valdano (2019-07-27)

## Releases:
- Next release: v1.18.0 planned for 1H 2021
- v1.17.0 released on 2020-04-09
- v1.16.0 released on 2019-07-08
- v1.15.1 released on 2018-10-15

## Mailing list activity:
Mailing list activity was 32 posts to dev@ and 2 posts to user@ for the last 3 months Oct-Dec 2020.

## JIRA Statistics:
- 8 JIRA tickets created in the 3 months
- 5 JIRA tickets resolved in the 3 months
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Release 1.17.0 occurred on Apr 9, 2020 which was the 7th release as an Apache TLP project. - Community is working on the 1.18.0 release with JIRAs related to deep learning and other ML methods: * deep learning - improve GPU efficiency * deep learning - support custom loss functions and custom metrics * deep learning - add autoML methods Hyperband and Hyperopt * DBSCAN clustering algorithm - Community members presented sessions mentioning Apache MADlib on Greenplum at the VMworld conference in Sept 2020, e.g. https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1586467547979001ehEa https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1589580297282001SUMh ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 5 in the 3rd quarter of calendar year 2020. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. ## Committer base changes: - Currently 17 committers, no new committers since last report. - The most recent committers added were: Ekta Khanna (2019-07-27) Himanshu Pandey (2019-07-27) Domino Valdano (2019-07-27) ## Releases: - Next release: v1.18.0 planned for 2H 2020 - v1.17.0 released on 2020-04-09 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 ## Mailing list activity: Mailing list activity was 40 posts to dev@ and 1 post to user@ for the last 3 months Jul-Sep 2020.
## JIRA Statistics: - 18 JIRA tickets created in the 3 months - 18 JIRA tickets resolved in the 3 months
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Release 1.17.0 occurred on Apr 9, 2020 which was the 7th release as an Apache TLP project. - Community is working on the 1.18.0 release with JIRAs related to deep learning and other ML methods: * deep learning - improve GPU efficiency * deep learning - support custom loss functions and custom metrics * DBSCAN clustering algorithm * add new solvers to multi-layer perceptron method - Several new Jupyter notebook examples have been published to the community artifacts repo https://github.com/apache/madlib-site/tree/asf-site/community-artifacts ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 6 in the 2nd quarter of calendar year 2020. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. ## Committer base changes: - Currently 17 committers, no new committers since last report. - The most recent committers added were: Ekta Khanna (2019-07-27) Himanshu Pandey (2019-07-27) Domino Valdano (2019-07-27) ## Releases: - Next release: v1.18.0 planned for 2H 2020 - v1.17.0 released on 2020-04-09 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 ## Mailing list activity: Average monthly mailing list activity was 10 posts to dev@ and 4 posts to user@ for the last 3 months Apr-Jun 2020. ## JIRA Statistics: - 14 JIRA tickets created in the 3 months - 2 JIRA tickets resolved in the 3 months
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Code complete and release in progress for 1.17 (as of time of this writing) which will be the 7th release as an Apache TLP project. - Main 1.17 JIRAs include: * feature improvements for deep learning including training multiple models in parallel for parameter selection (hyper-parameter tuning and model architecture search), inference on models trained outside of MADlib, and performance improvements to mini-batch preprocessor and DL training * performance improvements to correlation/covariance, association rules, and weakly connected components graph algorithm * stopping criteria on LDA using perplexity * auto selection of number of centroids for K-mean clustering * Postgres 12 support - Next will be the 1.18 release with JIRAs related to deep learning and other ML methods — Frank McQuillan (MADlib committer and PMC member) presented the latest deep learning work at FOSDEM'20 https://fosdem.org/2020/schedule/event/mppdb/ in a talk called: "Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases" - Several new Jupyter notebook examples have been published to the community artifacts repo https://github.com/apache/madlib-site/tree/asf-site/community-artifacts ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 7 in the 1st quarter of calendar year 2020. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. 
## Committer base changes: - Currently 17 committers, no new committers since last report. - The most recent committers added were: Ekta Khanna (2019-07-27) Himanshu Pandey (2019-07-27) Domino Valdano (2019-07-27) ## Releases: - Next release: v1.18 planned for 2H 2020 - v1.17.0 released early April 2020 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 ## Mailing list activity: Average monthly mailing list activity was 56 posts to dev@ and 5 posts to user@ for the last 3 months Jan-Mar 2020. ## JIRA Statistics: - 8 JIRA tickets created in the last month - 15 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Community is at work on the 1.17 release, which will be the 7th release as an Apache TLP project. Main JIRAs include: * feature improvements for deep learning including training multiple models in parallel for parameter selection (hyper-parameter tuning and model architecture search), inference on models trained outside of MADlib, and performance improvements to mini-batch preprocessor * performance improvements to correlation/covariance, association rules, and weakly connected components graph algorithm * stopping criteria on LDA using perplexity * auto selection of number of centroids for K-mean clustering * Postgres 12 support - After that will be the 2.0 release with JIRAs related to versioning models. — Frank McQuillan (MADlib committer and PMC member) will present the latest deep learning work at FOSDEM'20 https://fosdem.org/2020/schedule/event/mppdb/ in a talk called: "Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases" ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 7 in the 4th quarter of calendar year 2019. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. ## Committer base changes: - Currently 17 committers, no new committers since last report. 
- The most recent committers added were: Ekta Khanna (2019-07-27) Himanshu Pandey (2019-07-27) Domino Valdano (2019-07-27) ## Releases: - Next release: v1.17 planned for Jan 2020 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 - v1.15.0 released on 2018-08-10 ## Mailing list activity: Average monthly mailing list activity was 138 posts to dev@ and 11 posts to user@ for the last 3 months Oct-Dec 2019. ## JIRA Statistics: - 2 JIRA tickets created in the last month - 10 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Community is at work on the 1.17 release, which will be the 7th release as an Apache TLP project. Main JIRAs include: * feature improvements for deep learning including training multiple models in parallel for parameter selection (hyper-parameter tuning and model architecture search), inference on models trained outside of MADlib, and performance improvements to mini-batch preprocessor * performance improvements to correlation/covariance, association rules, and weakly connected components graph algorithm * stopping criteria on LDA using perplexity * auto selection of number of centroids for K-mean clustering - After that will be the 2.0 release with JIRAs related to versioning models. — Nikhil Kak and Nandish Jayaram (MADlib committers and PMC members) presented a community call on 2019-Aug-1 on the MADlib 1.16 release features: https://www.youtube.com/watch?v=uLW5By66Lf0 - Yuhao Zhang, a PhD candidate at University of California, San Diego completed his internship at Pivotal in Palo Alto on parameter selection in MADlib, which is an important area for deep learning practitioners. Yuhao's advisor at UCSD is Arun Kumar in the Department of Computer Science and Engineering, whose research has contributed to MADlib in the past. A presentation by Yuhao on his work on MADlib is at: https://www.youtube.com/watch?v=aZlKXqhyRKY ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 7 in the 3rd quarter of calendar year 2019. We will constantly be on a lookout for new community members to be invited either as committers or PMC. 
## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. ## Committer base changes: - Currently 17 committers. - New committers added since last report: Ekta Khanna (2019-07-27) Himanshu Pandey (2019-07-27) Domino Valdano (2019-07-27) ## Releases: - Next release: v1.17 planned for 4Q2019 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 - v1.15.0 released on 2018-08-10 ## Mailing list activity: Average monthly mailing list activity was 503 posts to dev@ and 11 posts to user@ for the last 3 months Jul-Sep 2019. ## JIRA Statistics: - 3 JIRA tickets created in the last month - 3 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Last release was 1.16 which was the 6th release as an Apache TLP project. This was a significant release that included initial support for distributed training of deep learning models with GPU acceleration, utilities to load model architectures and weights, preprocessing of images for mini-batch gradient descent, and support for Greenplum 6 and PostgreSQL 11. Plus the usual bug fixes and minor improvements. - Community is at work on the 1.17 release. Scope is still being decided by the community, but JIRAs call for improvements to deep learning as a follow on to 1.16, and improvements to correlation/covariance, association rules and decision tree. - After that will be the 2.0 release with JIRAs related to versioning models. - Frank McQuillan (MADlib committer and PMC member) presented at Dell Tech World on 2019-Apr-30 on MADlib and Greenplum Database in a talk called "AI in a Box". - Yuhao Zhang, a PhD candidate at University of California, San Diego is doing an internship at Pivotal in Palo Alto to work on parameter selection in MADlib, which is an important area for deep learning practitioners. Yuhao's advisor at UCSD is Arun Kumar in the Department of Computer Science and Engineering, whose research has contributed to MADlib in the past. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 8 in the 2nd quarter of calendar year 2019. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in the last quarter. Currently stands at 14 PMC members. 
## Committer base changes: - Currently 14 committers. - Last committer additions were Jingyi Mei on 2018-06-14 and Nikhil Kak on 2018-06-27. ## Releases: - Next release: v1.17 planned for 3Q2019 - v1.16.0 released on 2019-07-08 - v1.15.1 released on 2018-10-15 - v1.15.0 released on 2018-08-10 ## Mailing list activity: Average monthly mailing list activity was 620 posts to dev@ and 11 posts to user@ for the last 3 months Apr-Jun. ## JIRA Statistics: - 12 JIRA tickets created in the last month - 13 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Last release was 1.15.1 which was the 5th release as an Apache TLP project. This was a minor release that included support for Ubuntu 16.04 as well as various feature improvements. - Community is at work on the 1.16 release. Key features are PostgreSQL 11 support, a new method for k-NN nearest neighbors, and an early stage implementation of deep learning. - After that will be the 2.0 release with JIRAs related to versioning models. — Frank McQuillan (MADlib committer and PMC member) presented at FOSDEM’19 on 2019-Feb-03 on deep learning on parallel databases, using MADlib and Greenplum Database as an example. - Frank McQuillan also presented at PostgresConf 2019 March 18-22, New York on AI from model perspective ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 12 in the 1st quarter of calendar year 2019. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - Added Jingyi Mei (jingyimei@apache.org) and Nikhil Kak (nkak@apache.org) as new PMC members on 2019-Feb-20 - Jim Jagielski asked to be removed from the PMC - Currently stands at 14 PMC members. ## Committer base changes: - Currently 14 committers. - Last committer additions were Jingyi Mei on 2018-06-14 and Nikhil Kak on 2018-06-27. 
## Releases: - Next release: v1.16 planned for 1H2019 - v1.15.1 released on 2018-10-15 - v1.15.0 released on 2018-08-10 - v1.14.0 released on 2018-05-01 ## Mailing list activity: Average monthly mailing list activity was 243 posts to dev@ and 6 posts to user@ for the last 3 months Jan-Mar. ## JIRA Statistics: - 16 JIRA tickets created in the last month - 3 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Last release was 1.15.1 which was the 5th release as an Apache TLP project. This was a minor release that included support for Ubuntu 16.04 as well as various feature improvements. - Community is at work on the 1.16 release which is anticipated in the next month or so. Key features are PostgreSQL 11 support, and a new method for k-NN nearest neighbors. - Community is also at work on the 2.0 release at the same time, with good progress on a JIRA related to versioning models. - On 2018-Oct-30, Nandish Jayaram (MADlib committer) and Frank McQuillan (MADlib committer and PMC member) visited Arun Kumar, Assistant Professor of Computer Science and Engineering at the University of California, San Diego regarding possible collaboration projects related to in-database machine learning. Discussions went well and we will report back to the community if this collaboration moves forward. Note that Professor Kumar is a former colleague of MADlib PMC Chair Aaron Feng when both were at the University of Wisconsin-Madison. — Frank McQuillan (MADlib committer and PMC member) will be presenting at FOSDEM’19 on 2019-Feb-03 on deep learning on parallel databases, using MADlib and Greenplum Database as an example. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 6 in the 4th quarter of calendar year 2018. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 15 committers. 
- Last committer additions were Jingyi Mei on 2018-06-14 and Nikhil Kak on 2018-06-27. ## Releases: - Next release: v1.16 planned for Feb 2019 - v1.15.1 released on 2018-10-15 - v1.15.0 released on 2018-08-10 - v1.14.0 released on 2018-05-01 ## Mailing list activity: Average monthly mailing list activity was 119 posts to dev@ and 16 posts to user@ for the last 3 months Oct-Dec. ## JIRA Statistics: - 5 JIRA tickets created in the last month - 3 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Aaron Feng (PMC Chair) visited Pivotal in Palo Alto in August and held discussions with a few other committers on the topics of new features and testing infrastructure. - A MADlib community call on the topic of the last 1.15 release occurred on 2018-Aug-23. The presenters were project committers Jingyi Mei and Frank McQuillan who reviewed the main 1.15 features and gave some demos using Jupyter notebooks. Demos included: variable importance in decision trees, column/vector operations, and momentum methods for neural networks (multi-layer perceptron). Here is the link to the community call: https://youtu.be/9JpPWuiqweU - Community is currently working on the 1.15.1 release which will be the 5th release as an Apache TLP project. We expect voting on release artifacts in Oct. - Ideas for the 2.0 release are being discussed in the JIRAs and mailing list and may include model management and deep learning, depending on community interest and contributions. - Apache MADlib has been referenced by two 2018 VLDB papers: “In-RDBMS Hardware Acceleration of Advanced Analytics” http://www.vldb.org/pvldb/vol11/p1317-mahajan.pdf Proceedings of the VLDB Endowment, Vol. 11, No. 11 “A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics” http://www.vldb.org/pvldb/vol11/p2168-thomas.pdf Proceedings of the VLDB Endowment, Vol. 11, No. 13 ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 9 in the 3rd quarter of calendar year 2018, which is about the same as the last report.
We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 15 committers. - Last committer additions were Jingyi Mei on 2018-06-14 and Nikhil Kak on 2018-06-27. ## Releases: - Next release: v1.15.1 planned for October 2018 - v1.15.0 released on 2018-08-10 - v1.14.0 released on 2018-05-01 - v1.13.0 released on 2017-12-22 ## Mailing list activity: Average monthly mailing list activity was 169 posts to dev@ and 22 posts to user@ for the last 3 months Jul-Sep. ## JIRA Statistics: - 7 JIRA tickets created in the last month - 8 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - A MADlib community call on the topic of the last 1.14 release occurred on 2018-May-10. This included demos of new features and improvements to existing features. - Community is currently working on the 1.15 release which will be the 4th release as an Apache TLP project. We expect voting on release artifacts in late July. - Ideas are being generated for the 2.0 release which will come after 1.15. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 8 in the 2nd quarter of calendar year 2018, which is about the same as the last report. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 15 committers. - Two new committers added in the last 3 months. - Last committer additions were Jingyi Mei on 2018-06-14 and Nikhil Kak on 2018-06-27. ## Releases: - Next release: v1.15.0 planned for late July 2018 - v1.14.0 released on 2018-05-01 - v1.13.0 released on 2017-12-22 - v1.12.0 released on 2017-08-29 ## Mailing list activity: Average monthly mailing list activity was 83 posts to dev@ and 11 posts to user@ for the last 3 months Apr-Jun. ## JIRA Statistics: - 8 JIRA tickets created in the last month - 8 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, big data, SQL-driven machine learning framework for data scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Community is currently finalizing the 1.14 release which will be the third release as an Apache TLP project. We expect voting on release artifacts to commence during the week of 2018-April-16. - A MADlib community call on the topic of the 1.14 release will be scheduled towards the end of April. - There was a MADlib community call on the topic of the 1.13 release on 2018-January-17. - Community is working on defining the scope of the 1.15 release in JIRA and mailing lists. - Community has been building and posting data science notebooks as a quick start guide to using MADlib. There are currently more than 25 notebooks available at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation is approximately 9 in the first quarter of the calendar year, which is about the same as the last report. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 13 committers. - No new committers added in the last 3 months - Last committer addition was Nandish Jayaram on 2016-09-08 ## Releases: - Next release: v1.14.0 planned for April 2018 - v1.13.0 released on 2017-12-22 - v1.12.0 released on 2017-08-29 - v1.11.0-incubating released on 2017-05-17 ## Mailing list activity: Mailing activity has remained relatively stable with 223 posts to dev@ and 7 posts to user@ during the month of 2018-March.
## JIRA Statistics: - 13 JIRA tickets created in the last month - 57 JIRA tickets resolved in the last month
[REPORT] MADlib - January 2018 ## Description: - Apache MADlib is a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - MADlib 1.13 was released on 2017-December-22, and this is the second release as an Apache TLP project. - There is a MADlib community call on the topic of the 1.13 release scheduled for 2018-January-17. - As a final (we think) post-graduation task, we cleaned up dist/incubator/madlib - Community is currently working on 1.14 JIRAs. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and new functionality being developed by contributors. The number of developers actively contributing to the code/documentation has increased to around 9 per month, up from 6 at the time of the last report. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 13 committers. - No new committers added in the last 3 months - Last committer addition was Nandish Jayaram on 2016-09-08 ## Releases: - Next release: v1.14.0 planned for Feb 2018 - v1.13.0 released on 2017-12-22 - v1.12.0 released on 2017-08-29 - v1.11.0-incubating released on 2017-05-17 ## Mailing list activity: Mailing activity has remained relatively stable with 148 posts to dev@ and 7 posts to user@ during the month of December. ## JIRA Statistics: - 10 JIRA tickets created in the last month - 13 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Community activity related to post-graduation tasks has largely been completed (website, wiki, ASF infrastructure, etc.) - A MADlib community call on the topic of the 1.12 release happened on Sept 7, 2017. - The community decided to take some of the proposed 2.0 release JIRAs and put them into a 1.13 release to do first, targeted for November 2017. The reason is that more time is needed to plan out the proposed interface changes for 2.0. Work on 1.13 JIRAs is in flight currently. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and a bunch of new functionality being developed by contributors. The number of committers actively contributing to the code/documentation has been steady and remains at a level of half a dozen active committers each month. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 13 committers. - No new committers added in the last 3 months - Last committer addition was Nandish Jayaram on 2016-09-08 ## Releases: - Next release: v1.13.0 planned for Nov 2017 - v1.12.0 released on 2017-08-29 - v1.11.0-incubating released on 2017-05-17 - v1.10.0-incubating released on 2017-03-10 ## Mailing list activity: Mailing activity has remained relatively stable with 144 posts to dev@ and 18 posts to user@ during the month of Sep. ## JIRA Statistics: - 6 JIRA tickets created in the last month - 4 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - Community activity focused on post-graduation tasks related to MADlib's promotion to ASF TLP status. This included working on the code base, website, wiki, and ASF infrastructure. - Trademark transfer from Pivotal to ASF was completed in the month of August, 2017. - The first TLP release of MADlib 1.12 happened on Aug 29, 2017. - A MADlib community call on the topic of the 1.12 release happened on Sept 7, 2017. - A set of 2.0 JIRAs has been proposed to review by the community. ## Health report: The community is relatively small but very engaged with robust mailing list traffic, interest in doing frequent releases and a bunch of new functionality being developed by contributors. Work has begun in earnest on the next release 2.0 planned in the fall. The number of committers actively contributing to the code/documentation has been steady and remains at a level of half a dozen active committers each month. We will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - No changes in PMC, currently 13 PMC members. ## Committer base changes: - Currently 13 committers. - No new committers added in the last 3 months - Last committer addition was Nandish Jayaram on 2016-09-08 ## Releases: - v1.12.0 released on 2017-08-29 - v1.11.0-incubating released on 2017-05-17 - v1.10.0-incubating released on 2017-03-10 ## Mailing list activity: Mailing activity has remained relatively stable with 324 posts to dev@ and 19 posts to user@ during the month of Aug. ## JIRA Statistics: - 9 JIRA tickets created in the last month - 13 JIRA tickets resolved in the last month
## Description: - Apache MADlib is a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. ## Issues: - There are no issues requiring board attention at this time. ## Activity: - The bulk of the community activity is now focused on post-graduation tasks of promoting MADlib to the ASF's TLP status. This includes working on the code base, website, wiki, and ASF infrastructure. - We expect to finalize the trademark transfer from Pivotal to ASF within the month of August, 2017. - Ed Espino volunteered to drive the first TLP release of MADlib 1.12 which is expected to happen within the next couple of months. ## Health report: The project has just graduated to the status of a TLP at ASF. The community is small but very engaged with robust mailing list traffic, interest in doing frequent releases and a bunch of new functionality being developed by contributors. The number of committers actively contributing to the code/documentation has been steady and remains at a level of half a dozen active committers each month. Since the project has just graduated we haven't had a chance to actively grow our PMC roster, but it must be noted that at this point all of our active committers are also PMC members. Of course, we will constantly be on a lookout for new community members to be invited either as committers or PMC. ## PMC changes: - PMC has just been formalized as part of the graduation resolution - Currently 13 PMC members. ## Committer base changes: - Currently 13 committers. - No new committers added in the last 3 months - Last committer addition was Nandish Jayaram on 2016-09-08 ## Releases: - v1.11.0-incubating released on 2017-05-17 - v1.10.0-incubating released on 2017-03-10 - v1.9.1-incubating released on 2016-09-19 ## Mailing list activity: Mailing activity remains steady with 203 posts to dev@ and 23 posts to user@ ## JIRA Statistics: - 10 JIRA tickets created in the last month - 3 JIRA tickets resolved in the last month
@Jim: follow up with trademark assignment issue
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache MADlib Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache MADlib Project be and hereby is responsible for the creation and maintenance of software related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. RESOLVED, that the office of "Vice President, Apache MADlib" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache MADlib Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache MADlib Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache MADlib Project: Sarah Aerni <saerni@apache.org> Greg Chase <gregchase@apache.org> Aaron Feng <aaronfeng@apache.org> Rahul Iyer <riyer@apache.org> Jim Jagielski <jim@apache.org> Nandish Jayaram <njayaram@apache.org> Anirudh Kondaveeti <akondave@apache.org> Orhan Kislal <okislal@apache.org> Frank McQuillan <fmcquillan@apache.org> Srivatsan R <vatsan@apache.org> Rashmi Raghu <rashmiraghu@apache.org> Roman Shaposhnik <rvs@apache.org> Atri Sharma <atri@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Aaron Feng be appointed to the office of Vice President, Apache MADlib, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, 
resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the initial Apache MADlib PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache MADlib Project; and be it further RESOLVED, that the Apache MADlib Project be and hereby is tasked with the migration and rationalization of the Apache Incubator MADlib podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator MADlib podling encumbered upon the Apache Incubator Project are hereafter discharged. Special Order 7A, Establish the Apache MADlib Project, was approved by Unanimous Vote of the directors voting, with Shane Curcuru abstaining. As the MADlib trademark hasn't been transferred to the Foundation yet: 1) MADlib is required to include a disclaimer on their homepage and in their releases, indicating that the mark doesn't belong to the ASF so far, until the trademark is transferred. 2) The expectation is that the trademark handover will be completed before the end of 2017.
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Finalize trademark transfer from Pivotal to ASF. 2. Continue to produce regular Apache (incubating) releases. 3. Continue to execute and manage the project according to governance model of the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? 1. The Apache MADlib Project is ready for graduation out of the incubator. Discussion by Project: https://lists.apache.org/thread.html/070c6764fcd0448b2db8975936b52f7a28bd0e231c0e690288a6968e@%3Cdev.madlib.apache.org%3E Vote by IPMC and community: https://lists.apache.org/thread.html/733920464e8f8170d9cc831b701f275d757ee9448a7bfd05a1bf8dfd@%3Cgeneral.incubator.apache.org%3E Trademark transfer from Pivotal to ASF is being tracked in: https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125 2. The resolution for graduation was tabled by the board last month due to the trademark issue, and is now being re-submitted. How has the community developed since the last report? 1. Some related events in Q2 2017: * May 25, 2017 - MADlib community call. Topic: New Features in Apache MADlib 1.11 (Frank McQuillan) * Jun 21, 2017 - Greenplum meetup in San Francisco. Topic: Apache Solr & MADlib (incubating): Enabling Massive Text Analytics In-Database (Bharath Sitaraman) * Jul 5-7, 2017 - PG Day Russia. Topic: Various on “Greenplum Day” Jul 5 including in-database analytics (Roman Shaposhnik and others) * Jul 25, 2017 (upcoming) - SF Bay ACM Chapter meetup. Topic: Advanced Analytics for Security: Lateral Movement Detection (Anirudh Kondaveeti) 2. See material technical conversations on user/dev mailing lists and in the appropriate JIRAs and pull requests. How has the project developed since the last report? 1. TLP readiness - maturity evaluation matrix https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation 2. 
TLP readiness - graduation resolution https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution 3. TLP readiness - documented release process https://cwiki.apache.org/confluence/display/MADLIB/Release+Process 4. Active work in progress for 6th ASF release MADlib v1.12 scheduled for Jul/Aug 2017. Features include: more graph analytics (weakly connected components, breadth first search, all pairs shortest path, multiple graph measures), neural nets, stratified sampling, train-test split, improvements to decision tree & random forest, improvements to summary function 5. Mailing list activity in Q2: 295 postings to dev, 77 postings to user. How would you assess the podling's maturity? Please feel free to add your own commentary. [ ] Initial setup [ ] Working towards first release [ ] Community building [X] Nearing graduation [ ] Other: Date of last release: MADlib v1.11 on 5/16/17. When were the last committers or PPMC members elected: Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16. Signed-off-by: [ ](madlib) Konstantin Boudnik Comments: [X](madlib) Ted Dunning Comments: [ ](madlib) Roman Shaposhnik Comments:
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache MADlib Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache MADlib Project be and hereby is responsible for the creation and maintenance of software related to a scalable, Big Data, SQL-driven machine learning framework for Data Scientists. RESOLVED, that the office of "Vice President, Apache MADlib" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache MADlib Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache MADlib Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache MADlib Project: Sarah Aerni <saerni@apache.org> Greg Chase <gregchase@apache.org> Aaron Feng <aaronfeng@apache.org> Rahul Iyer <riyer@apache.org> Jim Jagielski <jim@apache.org> Nandish Jayaram <njayaram@apache.org> Anirudh Kondaveeti <akondave@apache.org> Orhan Kislal <okislal@apache.org> Frank McQuillan <fmcquillan@apache.org> Srivatsan R <vatsan@apache.org> Rashmi Raghu <rashmiraghu@apache.org> Roman Shaposhnik <rvs@apache.org> Atri Sharma <atri@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Aaron Feng be appointed to the office of Vice President, Apache MADlib, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, 
resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, that the initial Apache MADlib PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache MADlib Project; and be it further RESOLVED, that the Apache MADlib Project be and hereby is tasked with the migration and rationalization of the Apache Incubator MADlib podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator MADlib podling encumbered upon the Apache Incubator Project are hereafter discharged. Special Order 7D, Establish the Apache MADlib Project, was tabled.
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Continue to produce regular Apache (incubating) releases. 2. Continue to execute and manage the project according to governance model of the "Apache Way". 3. Continue to build community. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? 1. The next release v1.11 will be the 5th as an incubating project. We believe this release will meet all requirements for a clean ASF release, based on listening to guidance from the IPMC over the previous releases. After that, the community would ideally like to move towards top level status. 2. The licensing issues have been resolved. Should anyone want to review, we have summarized the issue and resolution with relevant links on the MADlib wiki at https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance How has the community developed since the last report? 1. Some related events in Q1 2017: * Feb 4, 2017 - Presentation at FOSDEM’17 Graph devroom. Topic: Graph Analytics on Massively Parallel Processing Databases (Frank McQuillan) * Feb 2, 2017 - Greenplum meetup in SF. Topic: Machine Learning and Cyber Security with Greenplum and Apache MADlib (Anirudh Kondaveeti, Frank McQuillan) * Mar 23, 2017 - MADlib community call. Topic: New Features in Apache MADlib 1.10 (Frank McQuillan) 2. See material technical conversations on user/dev mailing lists and in the appropriate JIRAs and pull requests. How has the project developed since the last report? 1. Build infra set up on Apache infra https://builds.apache.org/job/madlib-master-build/ 2. Docker image with necessary dependencies required to compile and test MADlib on PostgreSQL 9.6 https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers#QuickStartGuideforDevelopers-Dock 3. 
Active work in progress for 5th ASF release MADlib v1.11 scheduled for Apr 2017. Features include: PageRank, connected components, stratified sampling, improvements to decision tree & random forest, array & sparse vector output for pivot 4. Mailing list activity in Q1 to date: 274 postings to dev, 111 postings to user. How would you assess the podling's maturity? Please feel free to add your own commentary. [ ] Initial setup [ ] Working towards first release [ ] Community building [X] Nearing graduation [ ] Other: Date of last release: MADlib v1.10 on 3/10/17. When were the last committers or PMC members elected: Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16. Signed-off-by: [X](madlib) Konstantin Boudnik Comments: [ ](madlib) Ted Dunning Comments: [x](madlib) Roman Shaposhnik Comments: we hope to submit a TLP resolution next month
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Need guidance from Incubator PMC on how to resolve the BSD licensing switch over to Apache License. What should be the content of the license headers for files that were previously BSD licensed and then granted to ASF? Related legal-discuss threads: http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/%3CCALGG8z03zHhbFegXoi4fH+vXtF+9m7x6hak9RjKQjapuzi67gQ@mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201603.mbox/%3C9D1AF43C-370B-4E58-B0EF-2E29D242F50B%40jaguNET.com%3E 2. Continue to produce regular Apache (incubating) releases. 3. Continue to execute and manage the project according to governance model of the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? 1. Yes, please see #1 above and provide guidance. 2. The next release v1.10 will be the 4th as an incubating project. After that, the community would ideally like to move towards top level status. How has the community developed since the last report? 1. Some related events in Q4 2016 and upcoming: * Feb 4, 2017 - Presentation accepted at FOSDEM’17 Graph devroom. Topic: Graph Analytics on Massively Parallel Processing Databases (Frank McQuillan) * Dec 1, 2016 - MADlib community call. Topic: New features in R interface and MADlib user survey results (hosted by Greg Chase, Orhan Kislal, Frank McQuillan) * Nov 16, 2016 - Presentation at PGConf Silicon Valley. Topic: Distributed In-Database Machine Learning with Apache MADlib (incubating) (Frank McQuillan) * Nov 14, 2016 - Presentation at Apache Big Data Europe. Topic: Distributed In-Database Machine Learning with Apache MADlib (incubating) (Roman Shaposhnik) 2. Material technical conversations on user/dev mailing lists and in the appropriate JIRAs and pull requests. 3. 
New contributors to the project have been working on the KNN module and Python interface. How has the project developed since the last report? 1. Active work in progress for 4th ASF release MADlib v1.10 scheduled for Jan 2017. Features include: single source shortest path graph algorithm, completely new module for encoding categorical variables, R interface update, grouping support in elastic net and PCA, cross validation in elastic net, verbose output option for decision tree visualization. 2. Mailing list activity in Q4: 227 postings to dev, 66 postings to user. Date of last release: MADlib v1.9.1 on 9/19/16. When were the last committers or PMC members elected: Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16. Signed-off-by: [ ](madlib) Konstantin Boudnik [X](madlib) Ted Dunning [X](madlib) Roman Shaposhnik Shepherd/Mentor notes: (rvs) I had a chat with ASF VP Legal and the proposal is to go ahead with the release like it is. If there will be concerns raised by IPMC during the review of this upcoming release Jim volunteered to be directly involved to work through these concerns.
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Need guidance from Incubator PMC on how to resolve the BSD licensing switch over to Apache License. What should be the content of the license headers for files that were previously BSD licensed and then granted to ASF? Related legal-discuss threads: http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/%3CCALGG8z03zHhbFegXoi4fH+vXtF+9m7x6hak9RjKQjapuzi67gQ@mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201603.mbox/%3C9D1AF43C-370B-4E58-B0EF-2E29D242F50B%40jaguNET.com%3E 2. Continue to produce regular Apache (incubating) releases. 3. Continue to execute and manage the project according to governance model required by the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? Yes, please see #1 above and provide guidance. How has the community developed since the last report? 1. Two new committers added to the project: * Orhan Kislal (9/7/16) * Nandish Jayaram (9/7/16) 2. MADlib related events in Q3 2016: * Jul 27 - MADlib community call. Topic: Open discussion on Apache MADlib project (hosted by Greg Chase, Frank McQuillan) * Aug 19 - Presentation to Hortonworks. Topic: Apache MADlib, Apache HAWQ (incubating) and Apache Zeppelin (Rahul Iyer, Frank McQuillan) * Sep 13 - MADlib community call. Topic: Deep dive on MADlib 1.9.1 release (hosted by Greg Chase, presentation by Frank McQuillan) * Sep 21 - Meetup at Hortonworks San Francisco. Topic: Future of data - Apache MADlib and Apache HAWQ (Tushar Pednekar) * Sep 22 - Meetup at Hortonworks Santa Clara. Topic: Future of data - Apache MADlib and Apache HAWQ (Tushar Pednekar) 3. Material technical conversations on dev mailing lists and in the appropriate JIRAs and pull requests. How has the project developed since the last report? 1. 
3rd ASF release MADlib v1.9.1 released on Sep 19, 2016. Features include: path functions (phase 2), 1-class support vector machines for novelty detection, prediction metrics, sessionization, pivoting. 2. Community has started active development on the v1.10 release. 3. 13 JIRAs created and 5 resolved in last 30 days. Date of last release: MADlib v1.9.1 on 9/19/16. When were the last committers or PMC members elected: Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16. Signed-off-by: [x](madlib) Konstantin Boudnik [x](madlib) Ted Dunning [x](madlib) Roman Shaposhnik Shepherd/Mentor notes: tdunning: This project seems to be ticking along pretty reasonably. The only worry I have about it is that it seems to be strongly centered around a few (or even just one) very strong contributors. That is a worry relative to longevity and community building. Overall, I don't think that the project is getting much marginal value from incubation. johndament: It's unclear what guidance from the IPMC is required if the podling is already reaching out to legal, which would be the main thing I can think of to recommend to them right now. rvs: @johndament: I think we need to formalize whatever decision by legal. I'll create a formal LEGAL JIRA soon.
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Continue to produce regular Apache (incubating) releases. 2. Expand the community, increase dev list activity and add new contributors. 3. Execute and manage the project according to governance model required by the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? 1. MADlib related events in Q2 2016: * April 19 - Joint community call MADlib - Greenplum Database. Topic: MADlib v1.9 new features (Nandish Jayaram, Ivan Novick, Cesar Rojas, Frank McQuillan) * May 5 - MADlib community call. Topic: Detailed review of the MADlib v1.9 release (Xiaocheng Tang, Frank McQuillan) * June 21 - MADlib community call. Topic: Apache Zeppelin meets Apache MADlib (incubating) and Apache HAWQ (incubating) (Moon soo Lee, Rahul Iyer, Frank McQuillan) * June 21 - Data Engineers Guild meetup in Palo Alto. Topic: The Analytics and Science Behind Connected Transportation (Srivatsan Ramanujam, Esther Vasiete, Ralph Rabbat, Frank McQuillan) 2. Material technical conversations on dev mailing lists and in the appropriate JIRAs and pull requests. 3. We are seeing some PostgreSQL experts chipping in on SQL coding and making good suggestions in the pull requests. How has the project developed since the last report? 1. 2nd ASF release MADlib v1.9 released on April 6, 2016. The goal of this 2nd release was general availability of MADlib v1.9 for community use. 2. 3rd ASF release MADlib v1.9.1 anticipated this summer depending on community input. Features include: path functions (phase 2), 1-class support vector machines for novelty detection, prediction metrics, sessionization, pivoting. 3. 2 JIRAs created and 14 resolved in last 30 days. Date of last release: MADlib v1.9 on 4/6/16. 
When were the last committers or PMC members elected? Xiaocheng Tang on 1/14/16. Signed-off-by: [ ](madlib) Konstantin Boudnik [x](madlib) Ted Dunning [x](madlib) Roman Shaposhnik
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Continue to produce regular Apache (incubating) releases. 2. Expand the community, increase dev list activity and add new contributors. 3. Execute and manage the project according to governance model required by the "Apache Way”. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? 1. Held three community calls in Q1 2016. Each call featured a different member of the Apache MADlib community presenting on a topic of interest to them: * Jan 15th - Bayesian Analysis of Binomial Response Models on MPP Databases (Gautam Muralidhar) * Feb 16th - An Overview of GWR Analysis of Spatial Data (Chenliang Wang) * Mar 16th - MADlib on PostgreSQL and PGXN (AJ Welch) 2. One new committer has been added to the project (Xiaocheng Tang) 3. Presentation of Apache MADlib at FOSDEM’16 in Brussels (Frank McQuillan) 4. Material technical conversations on dev mailing lists and in the appropriate JIRAs (e.g., 111 emails on dev@ mailing list in Feb) 5. Several Google Summer of Code (GSoC) candidates have expressed interest in working on MADlib projects via dev@ mailing list. How has the project developed since the last report? 1. 1st ASF release MADlib v1.9alpha on 3/11/16 which was intended to clear all potential IP issues in the code base and make it legally ready to be adopted by the community. 2. 2nd ASF release MADlib v1.9 is currently in IPMC voting as of this writing on 4/6/16. The goal of this 2nd release is general availability of MADlib v1.9 for community use. 3. Some features in the latest release: path functions, support vector machines, advanced matrix operations, covariance matrix, proportion of variance for PCA, support for Apache HAWQ (incubating) 2.0. 4. 15 JIRAs created and 27 resolved in last 30 days. 
Date of last release: Apache MADlib (incubating) v1.9alpha on 3/11/16. When were the last committers or PMC members elected? Xiaocheng Tang on 1/14/16. Signed-off-by: [ ](madlib) Konstantin Boudnik [X](madlib) Ted Dunning [X](madlib) Roman Shaposhnik
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Produce a first Apache (incubating) release. 2. Expand the community, increase dev list activity and add new committers/pmc members. 3. Execute and manage the project according to governance model required by the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? 1. Second community call held 12/18/15. On the call Roman suggested that the MADlib community do a 1.9 alpha release in the near term. There was general agreement and this is planned for January. A Release Manager has not yet been identified. 2. Material technical conversations on dev mailing lists and in the appropriate JIRAs. E.g., 51 emails on dev in Dec. 3. One new committer has been proposed and voting is under way on the private mailing list. 4. Two comprehensive proposals were posted to the dev mailing list from community members. One relates to the addition of geographically weighted regression (GWR) algorithms. The second involves Bayesian analysis of binomial response models for MPP databases, which makes extensive use of MADlib’s new matrix operations. Both proposals are under active discussion on the mailing list currently. 5. We have been accepted to present a full talk at FOSDEM 2016/Brussels in the HPC, Big Data & Data Science Devroom on Jan 31. The title of the talk is: "MADlib: Distributed In-Database Machine Learning for Fun and Profit" How has the project developed since the last report? 1. 5 JIRAs created and 4 resolved in last 30 days. 2. A SQL API guide has been added to the MADlib wiki https://cwiki.apache.org/confluence/display/MADLIB/SQL+API+Guide. Date of last release: No release yet. When were the last committers or PMC members elected? 
One new committer has been proposed and voting is under way on the private mailing list. Signed-off-by: [x](madlib) Konstantin Boudnik [x](madlib) Ted Dunning [x](madlib) Roman Shaposhnik Shepherd/Mentor notes:
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Produce a first Apache (incubating) release. 2. Expand the community, increase dev list activity and add new committers/pmc members. 3. Execute and manage the project according to governance model required by the "Apache Way”. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? 1. First community call held 11/20/15. There were approximately 10 attendees, about half were from outside of the current group of MADlib contributors. This will be a monthly call, possibly moving to 2x per month in the future. 2. Meetup 12/3/15 @ Pivotal Labs, San Francisco: “MADlib and HAWQ for Advanced SQL Machine Learning on Hadoop”. One goal of this meetup is to invite new community participation in MADlib. 3. Material technical conversations are now happening on the dev mailing lists and in the appropriate JIRAs. E.g., 53 emails on dev in Nov compared with 7 in Oct. How has the project developed since the last report? 1. 31 JIRAs created and 7 resolved in last 30 days. 2. Mailing list subscribers: user - 19, dev - 20 3. Proposed scope for first Apache MADlib release has been described to the community for comment. This release includes IP cleanliness and new features. 4. The MADlib wiki <http://s.apache.org/0lQ> has been updated with new content, including a new contributors guideline, an FAQ and a page listing suggestions for first time contributors (these have also been labeled “starter” in the JIRAs). Date of last release: No release yet. When were the last committers or PMC members elected? No new members added on top of the initial committer list. Signed-off-by: [X](madlib) Konstantin Boudnik [X](madlib) Ted Dunning [X](madlib) Roman Shaposhnik Shepherd/Mentor notes:
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Produce a first Apache (incubating) release. 2. Expand the community, increase dev list activity and add new committers/pmc members. 3. Execute and manage the project according to governance model required by the "Apache Way". Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? 1. Meetup 10/1/15 @ Pivotal Labs, New York, NY: “MADlib and HAWQ for Advanced SQL Machine Learning on Hadoop” http://s.apache.org/VbG 2. Meetup 10/29/15 @ Pivotal Palo Alto, CA: “Data Science at Scale for IoT” http://www.meetup.com/Pivotal-Open-Source-Hub/events/225426787/ How has the project developed since the last report? 1. All known issues related to IP cleanliness described in https://wiki.apache.org/incubator/MADlibProposal have been fixed and pushed to the Apache repo. 2. All software activity tracking has migrated to Apache MADlib JIRA from the previous tool. 18 JIRAs created and 2 resolved in last 30 days. 3. All commits and code are now being done on the Apache Git repo. 4. Three new quick start guides have been written: i) install, ii) user, and iii) developer. The goal is to make it easier to onboard new community members. 5. A new Greenplum DB sandbox VM with MADlib pre-installed has been created and made available publicly at https://github.com/greenplum-db/gpdb-sandbox-tutorials. The goal is to make it easier to onboard new community members - they can download and start trying MADlib right away with no install/setup. 6. A "catchup JIRA" was filed https://issues.apache.org/jira/browse/MADLIB-912 in order to catch up between the time of the code grant to Apache and bringing in dev work that was already in flight at the time. 
We apologize for any inconvenience in clubbing together these multiple items; it was a one-time operation. Date of last release: No release yet. When were the last committers or PMC members elected? No new members added on top of the initial committer list. Signed-off-by: [X](madlib) Konstantin Boudnik [X](madlib) Ted Dunning [X](madlib) Roman Shaposhnik Shepherd/Mentor notes: Konstantin Boudnik (cos): I don't see much info on the community development. How many new contributors has the project gained? Were there any additions in the mailing lists? Please consider providing this information in the next report.
Big Data Machine Learning in SQL for Data Scientists. MADlib has been incubating since 2015-09-15. Three most important issues to address in the move towards graduation: 1. Produce a first Apache release. 2. Finalize infrastructure migration and ICLAs from committers. 3. Expand the community, increase dev list activity and add new committers/pmc members. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? Just started incubation, nothing specific to report at this time. How has the community developed since the last report? 1. We are approximately 2 weeks into the incubation process, so still at an early stage. 2. Most of the core contributors have completed their ICLAs and have established apache ids. 3. Several presentations and meet-ups at ApacheCon EU and Strata NYC to discuss MADlib's move to ASF governance. 4. Formal announcements from Pivotal, press briefings and blogs related to the move of the project into Apache aimed at growing awareness and interest in the project. Specialty press have picked up the story and reported widely. How has the project developed since the last report? Early activity: 1. Initial code drop provided to Apache 2. Core infrastructure is in the process of being migrated from existing infrastructure: mailing lists, git, jira, wiki, website Date of last release: No release yet. When were the last committers or PMC members elected? Most of initial list of committers have been on-boarded, some still outstanding. No new members added on top of the initial committer list. Signed-off-by: [x](madlib) Konstantin Boudnik [X](madlib) Ted Dunning [ ](madlib) Roman Shaposhnik Shepherd/Mentor notes: