
This was extracted (@ 2025-02-19 17:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: none.

## Membership Data:
Apache Hudi was founded 2020-05-19 (5 years ago)
There are currently 40 committers and 19 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- No new PMC members. Last addition was Sagar Sumit on 2023-11-05.
- Vova Kolmakov was added as committer on 2024-09-13

## Project Activity:
Exciting times for the project. The community worked on two major releases.

The second hudi-rs release, 0.2.0, is now out, expanding the project into the broader Rust/Python community. hudi-rs has already been integrated into popular query engines in the ecosystem like Ray and Daft, and the community is planning a roadmap to bring hudi-rs on par with the core Java feature set.

The Hudi 1.0 GA release has been successfully voted upon and ratified by the community, and will be announced next week with due process. 1.0 is the largest release to date, with multiple beta versions spanning a year, which has resulted in a large reimagination for the next few years. Notable innovations the project is bringing to the community and the space at large are: secondary indexes and other index types, new streaming-friendly concurrency control, deep storage format optimizations to reduce write/read latency, and much more. Many of these are industry-first features, so we are proud of the 62 or so contributors on this release (compared against 0.15.0).

Several leading companies, including Amazon and Peloton, have presented their usage of Hudi to build data lakes in community syncs.

## Community Health:
We're seeing steady growth in the community in terms of engagement on GitHub (https://ossinsight.io/analyze/apache/hudi). The dev and users lists are primarily used by the community for low-bandwidth communication. GitHub issues continue to be the main engagement model for community support for issues with the project.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hudi was founded 2020-05-19 (4 years ago)
There are currently 38 committers and 19 PMC members in this project.
The Committer-to-PMC ratio is 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sagar Sumit on 2023-11-05.
- No new committers. Last addition was Jonathan Vexler on 2024-02-13.

## Project Activity:
The community has released 1.0.0-beta2 as the next beta release for the 1.x release line and is, in parallel, crunching new features towards 1.0 GA. Another release is being worked on to stabilize the 0.x line. A new repo, apache/hudi-rs, had its first official release with version 0.1.0.

The community continues to be active with monthly community syncs where we present major developments and showcase user talks. New blogs written by Hudi users have been added to the Hudi website's blog page.

## Community Health:
We continue to see active developer engagement on the project in terms of code contributions and dev emails. These can be attributed to new RFC discussions, expansion of the Hudi ecosystem, commits for release items, release email threads, and so on. Overall, the community has been working on stabilizing the 0.x line, crunching features for 1.0 GA, and expanding into the Rust/Python ecosystem.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Hudi was founded 2020-05-19 (4 years ago)
There are currently 38 committers and 19 PMC members in this project.
The Committer-to-PMC ratio is 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sagar Sumit on 2023-11-05.
- No new committers. Last addition was Jonathan Vexler on 2024-02-13.

## Project Activity:
The community has released 0.15.0 as a stable release for the 0.x release line. In parallel, active PRs are being worked on in the master branch towards a 1.0.0-beta2 release, while taking feedback from 1.0.0-beta1, which was released late last year. A new repo, apache/hudi-rs, has been set up and is under development towards its first official release, 0.1.0.

The community continues to be active with monthly community syncs where we present major developments and showcase user talks. New blogs written by Hudi users have been added to the Hudi website's blog page.

## Community Health:
We continue to see more active developer engagement on the project in terms of code contributions, and a jump in dev emails. These can be attributed to commits for release items, release email threads, and the new repo's automated emails triggered by GH activities (which have been moved to the commits email list). Overall, the community has been working on stabilizing the branch for 0.15.0 (landing fixes and certifying the artifacts), and expanding into the Rust/Python ecosystem.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: none.

## Membership Data:
Apache Hudi was founded 2020-05-19 (4 years ago)
There are currently 38 committers and 19 PMC members in this project.
The Committer-to-PMC ratio is 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sagar Sumit on 2023-11-05.
- Jing Zhang was added as committer on 2024-02-07
- Jonathan Vexler was added as committer on 2024-02-13

## Project Activity:
The community is working on two release lines. The 0.15.0 release is another major release in the current 0.X release line. In parallel, active PRs are being worked on in the master branch towards a 1.0.0-beta2 release, while taking feedback from 1.0.0-beta1, which was released late last year.

The community continues to be active with monthly community syncs where we present major developments and showcase user talks. New blogs written by Hudi users have been added to the Hudi website's blog page.

## Community Health:
We continue to see healthy, steady developer engagement on the project in terms of code contributions. We don't have any specific causes or reasons for the marginal dips in PR activity. Our GH issue traffic reflects support issues filed by the community, so a reduction there could also signify progress. Overall, the community has been working on improving the developer experience. For example, our CI runtime has been reduced from 3 hours to almost 1 hour, helping us land PRs faster.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: none.

## Membership Data:
Apache Hudi was founded 2020-05-19 (4 years ago)
There are currently 36 committers and 19 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:5.

Community changes, past quarter:
- Sagar Sumit was added to the PMC on 2023-11-05
- Prashant Wason was added to the PMC on 2023-11-05
- Hui An was added as committer on 2023-12-01
- Qijun Fu was added as committer on 2023-11-13
- Voon Hou was added as committer on 2023-12-01

## Project Activity:
The community made two big releases since the last report. 0.14.0 shipped with a new fast indexing mechanism and ease-of-use improvements around Spark SQL. The community is now working towards a 0.14.1 patch release on top. We expect 0.14 to be the last major release in the 0.X release line.

Over the past few months, a re-imagination and redesign of the project has been proposed for the 1.0 release and is being worked on actively. 1.0.0-beta-1 was released to allow the community to test early versions and provide feedback. This has been particularly useful around brand new features like non-blocking concurrency control. Development is continuing towards a GA release in Q1.

## Community Health:
The project continues to see healthy developer and user engagement. There are a couple of PMC members who are helping with the PR review backlog. We are not aware of any particular reason for the drop in PRs opened, for example. Scaling the review process and merging contributions remains an active area of investment. We have had a couple of CI instability issues. Some ongoing efforts to reduce the test runtimes will help ease some of these challenges.

On the user side, we have had users share their usage of Hudi, and vendors have also launched new product integrations around Hudi.

A good chunk of the dev email spike is new contributors requesting access. Another key process to scale is supporting community questions on a day-to-day basis. We have about 442 total open support issues (of which we closed 200 in the last quarter, per the wizard). We are continuing to explore creative ways to scale this.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: Ongoing
Issues for the board: none.

## Membership Data:
Apache Hudi was founded 2020-05-19 (3 years ago)
There are currently 33 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:5.

Community changes, past quarter:
- No new PMC members. Last addition was Y Ethan Guo on 2023-03-13.
- No new committers. Last addition was Yue Zhang on 2022-12-31.

## Project Activity:
The Hudi community is going through the process of voting on the 0.14 release, which brings record-level indexing to Hudi, with RC1 voted down and RC2 being worked on. The community hopes to release 0.14 by the end of the month.

In parallel, the community is also working on finalizing the design details for the upcoming Hudi 1.0, which will introduce a major revamp of Hudi's storage format aimed at generalizing the core transactional layer, with database-like semantics. The project's master has already moved to 1.0.0-SNAPSHOT and a number of major changes have already landed. Public tracking of the effort is maintained in cwiki:
https://cwiki.apache.org/confluence/display/HUDI/1.0+Execution+Planning

## Community Health:
The project continues to see healthy engagement from a large number of users and contributors. Other than the increase in the backlog of pull requests due to committers and PMC members being busy with the two releases, we don't cite any specific reasons for the drop in code commits. We expect to pick up pace once 0.14 is finalized and more resources are available.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Project Status:
Current project status: ongoing, with high activity (400+ contributors, ~1500 commits/quarter)
Issues for the board: none.

## Membership Data:
Apache Hudi was founded 2020-05-19 (3 years ago)
There are currently 33 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:5.

Community changes, past quarter:
- No new PMC members. Last addition was Y Ethan Guo on 2023-03-13.
- No new committers. Last addition was Yue Zhang on 2022-12-31.

## Project Activity:
Since the last report, we made two patch releases: 0.12.3 on the 0.12.X LTS release line, and 0.13.1 on the latest major release. The community is working on 0.14.0, which again adds several new features, and 550+ commits have landed already. The plan is to be code complete by the end of June.

Outside of the 0.X releases, the most exciting development is the RFC for Hudi 1.X, which is a powerful re-imagination of Hudi as a database for multi-modal data lakes. https://github.com/apache/hudi/pull/8679 has seen very active engagement, and prototyping is underway to fully define what constitutes the alpha, beta and first stable 1.0 release.

## Community Health:
As noted in the previous report, we spent effort landing a lot of PRs and backporting bug fixes, which has resulted in increased project activity across the board.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (3 years ago)
There are currently 33 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:5.

Community changes, past quarter:
- Y Ethan Guo was added to the PMC on 2023-03-13
- Yue Zhang was added as committer on 2022-12-31

## Project Activity:
Since the last report we put out two Hudi releases. 0.12.2 is a minor release on the 0.12.x release line, where the community currently gets LTS for migrating from much older releases. In addition, we released a new major release, 0.13.0, which brings significant improvements (new record merger APIs, new CLI bundle, Flink 1.16 support) as well as brand new innovations (new CDC format, consistent hashing index, eager conflict detection and much more).

We put together a "Hudi in 2022" blog, https://hudi.apache.org/blog/2022/12/29/Apache-Hudi-2022-A-Year-In-Review, that showcases the various community events and collaboration through 2022. Other similar blogs are featured at https://hudi.apache.org/blog

Going forward, we are gearing up for our 1.0 release plans, which will be a major revamp of our underlying storage format, based on our learnings in the community and industry over the past 3+ years.

## Community Health:
We continue to see very healthy engagement from the community. The slowdown in code commits comes from the 0.13 release taking up a lot of committer cycles to review and land code.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (3 years ago)
There are currently 32 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Danny Chen on 2022-01-10.
- Alexey Kudinkin was added as committer on 2022-10-05
- Bi Yan was added as committer on 2022-10-05

## Project Activity:
The Hudi community approved the 0.12.1 release, the first in a series of patch releases we are starting to maintain. We noticed that many Hudi users are on older versions, and this effort will help them migrate to a stable release while also picking up the newer features of the last 18 months. As we write, the 0.12.2 RC is being voted on.

Towards the next major release, 0.13, the community has landed large contributions like RFC-46, which overhauls the merge APIs for the first time since project inception. Several other key RFCs have also been proposed on our RFC list. https://github.com/apache/hudi/tree/master/rfc

## Community Health:
We observed similar levels of engagement from developers and users quarter over quarter. In the last report, we noted that we were investing effort in landing commits/PRs faster. We made some good strides there, leading to an increase in total commits landed, i.e. we cleared some backlog.

We had several Hudi-related tech talks and events. All of them are being cataloged on the site at https://hudi.apache.org/talks. We also plan on surfacing Hudi tutorial videos generated by community members on the site.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-20 (2 years ago)
There are currently 30 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Danny Chen on 2022-01-11.
- No new committers. Last addition was Vinoth Govindarajan on 2022-05-07.

## Project Activity:
Hudi community syncs have been happening regularly every month, and are now also held separately in Chinese. Videos are being uploaded to the Apache Hudi YouTube channel owned by the PMC. https://www.youtube.com/channel/UCs7AhE0BWaEPZSChrBR-Muw

In addition, there have been several talks and blogs in the last quarter, which are collected here: https://hudi.apache.org/blog.

Many query engine projects and vendors expressed interest in a storage spec document to help integrate with Hudi as a format, for example for non-Java engines. We have published the Hudi tech spec documents here: https://hudi.apache.org/tech-specs/

The Hudi community also approved the 0.11.1 and 0.12.0 releases, which contain major feature, performance and reliability improvements. In the meantime, the 0.13 release is currently planned with some really large new features like a metaserver, an overhaul of the merge APIs, new indexes and many more. Based on feedback from the community, we are planning to maintain patch versions on top of 0.12.0, to help users who are on much older releases migrate to a recent version.

## Community Health:
We observed similar levels of engagement from developers and users quarter over quarter. Our total open PRs and issues are roughly 25% higher compared to June, and the PMC is looking for ways to improve turnaround on code reviews and to scale test infrastructure.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (2 years ago)
There are currently 30 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Danny Chen on 2022-01-10.
- Forward Xu was added as committer on 2022-04-14
- Vinoth Govindarajan was added as committer on 2022-05-06
- Zhaojing Yu was added as committer on 2022-03-23

## Project Activity:
The Hudi community put out yet another strong major release in 0.11.0 and is working towards the first minor release on top, 0.11.1. This release had 638 commits from about 61 contributors, and shipped the first ever multi-modal indexing system for data lakes.

The community held 3 monthly syncs with talks from the Hudi PMC, contributors and users. Several Hudi users, including HaloDoc, Zendesk and AWS, published blogs on their Hudi usage.

## Community Health:
We continue to see a trend where developers prefer to engage directly on GitHub PRs and issues for technical discussions. The PMC, though, consistently uses the dev list for any decisions made, whether major, like RCs and release timelines, or smaller, like adjusting community sync times.

All in all, this quarter we kept up the momentum and onboarded net new code contributors to the project, while we slowed down a bit on our support issue resolution rate (GH issues closed). We will pay close attention to this going forward.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (2 years ago)
There are currently 27 committers and 16 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Danny Chen was added to the PMC on 2022-01-10
- Tao Meng was added as committer on 2021-12-17

## Project Activity:
The Hudi community made two releases since the last board report: a major 0.10.0 release and a minor 0.10.1 bug fix release on top of that. 0.10.0 delivered some major technical features like z-order clustering optimizations, a new metadata storage model and a new Apache Kafka Connect sink for Hudi. 0.10.1 delivered over 100 bug fixes on top. Given the recent increase in developer bandwidth, we are now planning to provide minor releases on top of the last major release, roughly each month.

The community has put up a roadmap for the months ahead at hudi.apache.org/roadmap.

The community has been organizing a monthly sync for presenting new Hudi use-cases, and so far the Dec 2021, Jan 2022 and Feb 2022 occurrences have been successfully completed, with tens of total participants.

## Community Health:
Community health is looking steady. The decline in dev mailing list traffic could be seasonal; we cannot think of any particular reason for it.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (2 years ago)
There are currently 26 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:4.

Community changes, past quarter:
- Udit Mehrotra was added to the PMC on 2021-11-16
- Sagar Sumit was added as committer on 2021-11-03
- Y Ethan Guo was added as committer on 2021-11-23

## Project Activity:
The community is currently voting on the 0.10.0 release, which adds a new streaming sink for Apache Kafka Connect, simplified config management, and more. For the first time in open source data lake storage, we have also added support for multi-dimensional indexing, aka space-filling curves.

We focused a lot on integrations with other open source projects in the data ecosystem, including a Trino connector and a redo of our Presto integration. Hudi is also now integrated into the popular dbt framework for authoring ETL pipelines.

We revived our monthly community syncs and weekly office hours, post pandemic. Members of the Hudi community continued to give various talks and presentations about their work.

## Community Health:
Our GitHub engagement remains strong and growing. As noted in previous reports as well, users tend to prefer reporting issues there over the users list.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (a year ago)
There are currently 24 committers and 14 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Raymond Xu was added to the PMC on 2021-07-13
- Danny Chen was added as committer on 2021-07-12
- Zhiwei Peng was added as committer on 2021-07-12

## Project Activity:
The Hudi community released version 0.9.0, which brings several large features including Spark SQL DML/DDL support, performance enhancements for uncommitted data rollbacks, virtual keys and a number of new data sources.

The community also discussed and ratified a clearer description of all the components currently in the project and a future set of new components we will build out. This was shared in a manifesto blog on our site. Members of the community continued to present their work at different conferences and tech talks.

We also redid our website, moving away from the Jekyll theme to Docusaurus, and are in the process of realigning the docs to the new platform vision. We will also be presenting this at ApacheCon later in September.

In other news, we moved our CI away from Travis (shared with other Apache projects) to Azure Pipelines. Long queuing delays in Travis were slowing our development significantly in the early part of the quarter.

## Community Health:
There was significant dev activity leading up to the release, which explains the large number of commits and PRs. We use GitHub issues for user support, and fewer were opened this reporting period.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (a year ago)
There are currently 22 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Gary Li was added to the PMC on 2021-04-23
- No new committers. Last addition was Wenning Ding on 2021-02-17.
- PMC sent appreciation emails to a couple of promising contributors

## Project Activity:
Hudi 0.8.0 was released in April and contained two major features. The Flink support matrix is now complete, and users have the ability to write concurrently to Hudi tables, with optimistic concurrency control between writers, while preserving MVCC between writers and Hudi's table services. The community is gearing up for a 0.9.0 release, which again brings an oft-requested feature: Spark SQL DML support for Hudi tables.

The community deliberated on the future vision and a more accurate positioning of the project, and agreed to rebrand Hudi as "the data lake platform". We plan to revamp our docs and release a new blog outlining the vision together with the 0.9.0 release.

On community evangelization, we held another meetup hosted by Uber, with Robinhood and LogicalClocks engineers presenting. Our PMC also delivered several talks, including at PrestoCon Day and Data Summit. Vinoth Chandar went on the Software Engineering Daily podcast to discuss Hudi and the future of data architectures.

## Community Health:
In general, we have had pretty healthy interactions from the community. Increasingly, users prefer GitHub Issues over the users ML for discussing support issues, because the mailing list lacks image sharing and has problems with hyperlinking. We resolved a lot more support issues than in the previous quarter. Developers have been interacting on RFCs (Hudi's design docs) on cWiki over 200 times.

We also spent time organizing our user support and PR review work into boards, so the community has transparency over how their contributions are being prioritized or, if they observe delays, why that is the case.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Hudi was founded 2020-05-19 (10 months ago)
There are currently 22 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Sivabalan Narayanan on 2020-10-07.
- Li Way was added as committer on 2021-01-26
- Xianghu Wang was added as committer on 2021-02-02
- Wenning Ding was added as committer on 2021-02-17

## Project Activity:
We released a very significant 0.7.0 version, with Flink support and foundations for supporting different types of table metadata. We also take pride in calling out that fully managed data clustering is available in Hudi, as a first of its kind in open source big data management. We have also made several bug fixes and improvements. We are preparing for the 0.8.0 release soon, which vastly improves Flink integration and also provides different multi-writer concurrency modes.

Community members gave presentations at an Uber-organized meetup that had over 250 signups. Engineers from Uber also shared content from an engineering-wide learning series with the rest of the community. We added a few more names to our powered-by list.

## Community Health:
As evidenced by the metrics, there was a significant uptick in activity leading up to the 0.7.0 release and beyond. We had noted that the PMC and committers were working on process and code abstractions to help improve the velocity of the community. Some of those efforts bore fruit, by way of organizing and streamlining work in umbrella JIRAs etc.

Going forward, the challenge we need to tackle is scaling code reviews. We also faced significant issues with Travis CI during our release (it seems to have recovered since). We are exploring Azure CI, which provides unlimited minutes (similar to Flink's setup), and investing in more regression/stress tests to gain much higher confidence for landing PRs.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention

## Membership Data:
Apache Hudi was founded 2020-05-19 (7 months ago)
There are currently 19 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- Sivabalan Narayanan was added to the PMC on 2020-10-07
- Prashant Wason was added as committer on 2020-11-20
- Satish Kotha was added as committer on 2020-11-23

## Project Activity:
We are preparing for a 0.7.0 release by the end of Dec 2020, which will quite literally be our most feature-packed release so far. Some highlights include: clustering support, built-in metadata tracking to eliminate file listings, Flink support, the insert_overwrite operation, many bug fixes/perf improvements, nClouds training session.

Our contributors, committers and users have written a number of blogs and delivered talks since the last report. Talks include: an ApacheCon talk, a PrestoDB panel discussion at PrestoCon, QCon China, and a DC_THURS session. New blogs have also been written: Grofers' data lake article, a Towards Data Science article, an AWS blog, and an nClouds blog. All of these can be found on our site https://hudi.apache.org/docs/powered_by.html

## Community Health:
While we think contributor health and overall community activity are healthy, we do acknowledge the relative downward trend in the past quarter. Absolute numbers are good. Below we try to explain them as best as we can.

On the dev mailing list, dev discussions have largely happened over PRs and cWiki pages. We have had several design discussions over cWiki RFC pages that don't quite get captured here. RFC-15, 18 and 19, which form the bulk of the upcoming release, all have a great amount of cWiki activity.

We use GH Issues primarily as a support mechanism, so fewer issues can also be a sign of users facing fewer problems after our 0.6.0 release. We are actively tracking the issues almost every day for potentially unknown issues affecting users.

On code contributions, we have landed a lot of larger PRs this time around, so the review process as well as the number of PRs have taken a dip.

Before the next report, we plan to build out an analytics pipeline using Hudi, which will help us delve deeper into root-causing these numbers.

## Description:
The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data

## Issues:
There are no issues requiring board attention

## Membership Data:
Apache Hudi was founded 2020-05-19 (4 months ago)
There are currently 18 committers and 11 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- No new PMC members. Last addition was Anbu Cheeralan on 2020-05-19.
- Pre-graduation, last PMC addition was Bhavani Sudha Saktheeswaran on 2020-03-27
- Gary Li was added as committer on 2020-08-21
- Pratyaksh Sharma was added as committer on 2020-09-04
- Udit Mehrotra was added as committer on 2020-08-24
- Raymond Xu was added as committer on 2020-08-21

## Project Activity:
We released Apache Hudi 0.6.0, a major release that provides a lot of critical functionality such as Spark Streaming/compaction, bulk ingestion tools, data bootstrapping and much more. All in all, it was a community effort to stabilize the RC, test, fix blockers and put this release out.

We had a few blogs on hudi.apache.org/blog that feature contributors sharing ideas and explaining how features work. Our PMC members (Vinoth Chandar/Nishith Agarwal) were featured on DataCouncil DC_THURS to discuss the history and evolution of the project.

Planning ahead, the community is discussing several improvements to our release process, including major/minor release principles and making testing/CI more continuous. 0.7.0 is the next major release and will deliver elimination of filesystem listing, new indexes and Flink support.

## Community Health:
Compared to the previous report timeline, there was similar activity in terms of GitHub issues/PRs/JIRAs, with some upticks in the number of participants on the dev ML and its related metrics. Drilling down, we continue to handle more and more user support tickets via GH issues, as users tend to prefer its ability to share code and stack traces over the mailing list. Last month, we merged close to 68 PRs, with several large PRs under review: https://github.com/apache/hudi/pulse/monthly

## Description: The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data ## Issues: There are no issues requiring board attention. ## Membership Data: Apache Hudi was founded 2020-05-19 (3 months ago) There are currently 14 committers and 11 PMC members in this project. The Committer-to-PMC ratio is roughly 7:6. Community changes, past quarter: - No new PMC members (project graduated recently). - No new committers were added. - Ongoing discussions in PMC on committer/PMC candidates ## Project Activity: Development - We made great progress towards our 0.6.0 major release, expecting the first RC this week, as planned by the community. - Notably, we have merged support for bootstrapping any parquet datasets into Hudi tables seamlessly, Spark streaming/async compaction support, and several performance fixes - Author of RFC-15 (design docs in Hudi) has an initial version working; we also made progress on several key RFCs like record indexing and clustering - A few large PRs could not make it into 0.6.0, due to timeline risks/expanded scope. The plan is to target these for the first bug fix release in 0.6.x Outreach - Hudi PMC and Contributors from Amazon gave a talk around the past, present and future of PrestoDB/Hudi integration. They also authored a blog on the Presto site. ## Community Health: - We held our very first community code/design walkthrough session, attended by ~10 contributors across time zones. Slides/Video recording shared with the entire community - Since the last report, we saw a good uptick (38%) in conversations on the dev mailing list. We also have a users mailing list now, but users still prefer dev@ - We are nearing 500 members on our Slack channel. - Our project JIRA/GitHub activity recorded a 34% uptick, as we picked up pace towards 0.6.0
## Description: The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data ## Issues: There are no issues requiring board attention ## Membership Data: Apache Hudi was founded 2020-05-19 (2 months ago) There are currently 14 committers and 11 PMC members in this project. The Committer-to-PMC ratio is roughly 7:6. Community changes, past quarter: - No new PMC members (project graduated recently). - No new committers were added. - 4 committer candidates and 1 PMC candidate in the pipeline ## Project Activity: Apache Hudi released 0.5.3, which marks the first release since graduation. The release contained more than 30 bug/performance fixes. The community also used this opportunity to rework the release guide as a TLP. This sets us up well for future releases. We continued to make steady progress towards the 0.6.0 release, which delivers several large features. To this end, we have merged ~25 pull requests, and contributors have proposed ~15 new pull requests. Press/Articles: - Uber published an article on Apache Hudi graduation https://eng.uber.com/apache-hudi-graduation/ - PMC Member Nishith Agarwal presented Hudi at Berlin Buzzwords ## Community Health: 158 emails (-33%) on dev mailing list, across 38 topics, 41 participants. 1841 (-25%) interactions across GitHub Issues, Pull requests, JIRA issues. ~500 messages on Slack. Engagement metrics are lower month over month, even as the absolute values remain healthy. We don't clearly understand any patterns here (seasonal or otherwise).
## Description: The mission of Apache Hudi is the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data ## Issues: There are no issues requiring board attention ## Membership Data: Apache Hudi was founded 2020-05-19 (20 days ago) There are currently 14 committers and 11 PMC members in this project. The Committer-to-PMC ratio is roughly 7:6. Community changes, past quarter: - No new PMC members (project graduated recently). - No new committers were added (since graduation). ## Project Activity: Apache Hudi is currently in the process of finalizing the 0.5.3 release, which delivers large performance and usability improvements. The community continues to work towards the next 0.6.0 major release, planned over the next month or so. We are also adding more testing to further improve developer velocity and quality. The community also participated in a bug bash that ran over 10 days. We have had 4 major design proposals (RFCs) submitted and under review, targeting major releases beyond 0.6.0. Blogs/Talks - Blogs have been moved over to the hudi.apache.org site and contributors have written some useful new blogs - Hudi PMC has authored a very popular info.cn article on technical underpinnings of Hudi, that was featured as a top story. - A couple of planned talks were cancelled due to COVID. ## Community Health: Dev mailing list activity is a mix of user questions and technical discussions, which have been pretty steady. We continue to steadily ship code, with contributors driving a large chunk of it. We use GitHub Issues as the support channel, and there was a lot of growth in users engaging with us on GitHub to file support issues. We are triaging some flaky test issues now, which have slightly affected our ability to land PRs quickly. The community is actively working on mitigating this.
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to providing atomic upserts and incremental data streams on Big Data. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Hudi Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache Hudi be and hereby is responsible for the creation and maintenance of software related to providing atomic upserts and incremental data streams on Big Data; and be it further RESOLVED, that the office of "Vice President, Apache Hudi" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hudi Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hudi Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hudi Project: * Nishith Agarwal <nagarwal@apache.org> * Vinoth Chandar <vinoth@apache.org> * Anbu Cheeralan <anchee@apache.org> * Shaofeng Li <leesf@apache.org> * Suneel Marthi <smarthi@apache.org> * Prasanna Rajaperumal <prasanna@apache.org> * Luciano Resende <lresende@apache.org> * Bhavani Sudha <bhavanisudha@apache.org> * Balaji Varadarajan <vbalaji@apache.org> * Thomas Weise <thw@apache.org> * Vino Yang <vinoyang@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Vinoth Chandar be appointed to the office of Vice President, Apache Hudi, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a 
successor is appointed; and be it further RESOLVED, that the Apache Hudi Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Hudi podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator Hudi podling encumbered upon the Apache Incubator PMC are hereafter discharged. Special Order 7H, Establish the Apache Hudi Project, was approved by Unanimous Vote of the directors present.
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. ### Three most important unfinished issues to address before graduating: 1. Project is ready to graduate from incubator. 2. 3. ### Are there any issues that the IPMC or ASF Board need to be aware of? None ### How has the community developed since the last report? 1. 683 conversations on dev ML across 130 topics [1] 2. 75 participants during this period ### How has the project developed since the last report? 1. ~180 Commits in gitbox [2] 2. ~260 issues opened on Jira [3]. ~140 issues resolved in Jira [4] 3. Hudi 0.5.1 released on Jan 31, 2020 4. Hudi 0.5.2 released on March 26, 2020 5. Work in progress for Hudi 0.6.0 planned for Apr 2020. 6. 2 new committers - Sivabalan Narayanan, Lamber-ken 7. 3 new PPMCs - Leesf, Vino Yang, Bhavani Sudha Saktheeswaran 8. Completed the Apache Maturity Matrix for the project [5] 9. Apache Hudi talk at Hadoop Summit Bangalore [6] 10. Apache Hudi & Apache Kylin Online Meetup, China [7] [8] 11. Steve Blackmon was added as a mentor on April 3, 2020 [1] https://lists.apache.org/trends.html?dev@hudi.apache.org:lte=3M [2] git log --since="2019-12-25" --no-merges | grep -e 'commit [a-zA-Z0-9]*' | wc -l [3] project = HUDI AND created >= 2019-12-25 AND created <=now() [4] project = HUDI AND status = Closed AND status changed to Closed DURING ("2019/12/25",now()) [5] https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix [6] https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium [7] https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn [8] https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e ### How would you assess the podling's maturity? Please feel free to add your own commentary. 1. The project now has a diverse developer and user community, and excellent community traction. 2. 
The project’s committers and PPMC members are drawn from diverse places - Tencent, Uber, Confluent, Snowflake, Lyft, Shopify, Double Verify. 3. Apache Hudi is being used across various industries for creating data lakes and also for managing Machine Learning feature stores. - AWS, Alibaba, Uber, Tencent, Kyligence, EMIS Health, Tathastu.ai, Logical Clocks - [ ] Initial setup - [ ] Working towards first release - [ ] Community building - [X] Nearing graduation - [ ] Other: ### Date of last release: 2020-03-26 Apache Hudi-incubating 0.5.2 Release 2020-01-31 Apache Hudi-incubating 0.5.1 Release ### When were the last committers or PPMC members elected? Sivabalan Narayanan was made committer on Feb 15, 2020 Vino Yang and Leesf were added to PPMC on Feb 15, 2020 Bhavani Sudha was added to PPMC on April 1, 2020 Lamber-ken was made a committer on March 31, 2020 ### Have your mentors been helpful and responsive? Yes, very helpful ### Is the PPMC managing the podling's brand / trademarks? Yes. ### Signed-off-by: - [X] (hudi) Suneel Marthi Comments: The project is ready to graduate incubator. - [X] (hudi) Thomas Weise Comments: Ready for graduation. - [ ] (hudi) Luciano Resende Comments: - [ ] (hudi) Kishore Gopalakrishnan Comments: - [X] (hudi) Steve Blackmon Comments: ### IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. ### Three most important unfinished issues to address before graduating: 1. Making a sufficient number of Apache releases. 2. Continue to grow the community. 3. Work towards graduation. Finish pending issues in the Maturity Matrix document: https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix ### Are there any issues that the IPMC or ASF Board need to be aware of? None ### How has the community developed since the last report? 1. 630 conversations on dev ML across ~100 topics 2. 70 participants during this period ### How has the project developed since the last report? 1. ~130 Commits in gitbox 2. ~200 issues opened on Jira. ~80 issues resolved in Jira 3. Hudi 0.5.0 (first Apache Release) released. Next release 0.5.1 planned for January 2020. 4. 3 new committers (vinoyang, leesf and bhavanisudha) added to project. 5. Apache Hudi is now packaged as part of AWS EMR. The Apache Hudi talk at AWS re:Invent was well received 6. Project took a first pass at assessing Apache Maturity Model for the project. ### How would you assess the podling's maturity? The project now has a diverse developer and user community, and is seeing increased adoption. - [ ] Initial setup - [ ] Working towards first release - [X] Community building - [X] Nearing graduation - [ ] Other: ### Date of last release: 2019-10-24 ### When were the last committers or PPMC members elected? 2019-11-08 - Bhavani Sudha Saktheeswaran, Vino Yang and Leesf. ### Have your mentors been helpful and responsive? Yes. Very helpful! ### Is the PPMC managing the podling's brand / trademarks? Yes ### Signed-off-by: - [x] (hudi) Thomas Weise Comments: - [x] (hudi) Luciano Resende Comments: - [ ] (hudi) Kishore Gopalakrishnan Comments: - [X] (hudi) Suneel Marthi Comments: ### IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. ### Three most important unfinished issues to address before graduating: 1. Making a sufficient number of releases in the Apache way 2. Growing community further by grooming contributors to committers 3. ### Are there any issues that the IPMC or ASF Board need to be aware of? None ### How has the community developed since the last report? 1. ~400 conversations on dev ML across ~50 topics 2. 20-30 participants for each one month period 3. ~40 support issues opened on GitHub ### How has the project developed since the last report? 1. ~1500 gitbox activities over the three months 2. 122 JIRA issues created, 77 resolved 3. Community voted on two release candidates so far. RC3 underway towards first release 4. ApacheCon NA talk was well received at the conference 5. Hudi was also featured in a few industry blogs as an interesting project in the category. ### How would you assess the podling's maturity? Please feel free to add your own commentary. - [ ] Initial setup - [X] Working towards first release - [X] Community building - [ ] Nearing graduation - [ ] Other: ### Date of last release: N/A ### When were the last committers or PPMC members elected? During inception into incubator ### Have your mentors been helpful and responsive? Yes. Very helpful! ### Signed-off-by: - [X] (hudi) Thomas Weise Comments: Nice open collaboration on mailing list, close to first incubator release. - [X] (hudi) Luciano Resende Comments: - [ ] (hudi) Kishore Gopalakrishnan Comments: - [X] (hudi) Suneel Marthi Comments: ### IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. ### Three most important unfinished issues to address before graduating: 1. Making a sufficient number of releases in the Apache way 2. Legal/IP Clearance of software artifacts (LEGAL-461) 3. Growing community further by grooming contributors to committers ### Are there any issues that the IPMC or ASF Board need to be aware of? 1. PODLINGNAMESEARCH-162 has been completed, but is not reflected on Whimsy 2. Software grant has been signed by Uber, but is not reflected on Whimsy ### How has the community developed since the last report? 1. Mailing list subscribers grew to >50, 65 new mailing list threads 2. Slack is about 99 signups total (20-30 WAU), 39 total contributors on GitHub, ~25 support issues closed on GitHub 3. 3 new organizations reported usage on the Hudi site ### How has the project developed since the last report? 1. ~66 commits from ~15 contributors/committers, across 2 releases 2. All development now happening on ASF infrastructure, with source code being prepared for ASF release 3. External talks at DataCouncil SF19 and SF BigAnalytics Meetup ### How would you assess the podling's maturity? Please feel free to add your own commentary. - [ ] Initial setup - [X] Working towards first release - [ ] Community building - [ ] Nearing graduation - [ ] Other: ### Date of last release: N/A ### When were the last committers or PPMC members elected? During inception into incubator. ### Have your mentors been helpful and responsive? No Answer. ### Signed-off-by: - [X] (hudi) Thomas Weise Comments: Nice work on the collaboration side. Are there new contributors that could become committer candidates? - [X] (hudi) Luciano Resende Comments: The issues mentioned above that need board attention are probably just a question of updating the project page file with the proper done status/date. Please speak up if the community needs help from mentors updating the file. 
- [ ] (hudi) Kishore Gopalakrishnan Comments: - [X] (hudi) Suneel Marthi Comments: ### IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. Three most important unfinished issues to address before graduating: 1. Make frequent releases as per Apache guidelines 2. Grow community 3. Complete SGA, transfer code to ASF infra Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * PODLINGNAMESEARCH-162 has been completed, but is not reflected on Whimsy * SGA process has been delayed due to inability to quickly interact with ASF legal counsel. Help expediting this communication would go a long way towards getting the SGA done by Uber. How has the community developed since the last report? - Project source code/docs/issue management all now hosted on Apache infrastructure - HIP, a process for proposing large changes to the project, has been formalized by the community - 30+ new threads on dev ML, with ~10 non-PPMC contributors How has the project developed since the last report? 1. Code has been moved over to apache/incubator-hudi 2. hudi.apache.org site has been restructured and simplified for community consumption 3. Hudi Improvement Plan (based off Apache Kafka KIP) ratified and formalized. First few HIPs written 4. Submitted a Hudi talk abstract for Kafka Summit 2019 5. ~20 PRs merged How would you assess the podling's maturity? Please feel free to add your own commentary. [X] Initial setup [ ] Working towards first release [ ] Community building [ ] Nearing graduation [ ] Other: Date of last release: Project still being established in Incubator When were the last committers or PPMC members elected? No new committers since incubation. Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. Yes. Mentors are continuing to help us make things better Signed-off-by: [X](hudi) Thomas Weise Comments: [X](hudi) Luciano Resende Comments: The podling is claiming 'Initial Setup'. 
What is still missing? Any help required from mentors ? [ ](hudi) Kishore Gopalakrishnan Comments: [X](hudi) Suneel Marthi Comments: IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. Three most important issues to address in the move towards graduation: 1. Make frequent releases as per Apache guidelines 2. Grow community 3. Complete SGA, transfer code to ASF infra Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? * PODLINGNAMESEARCH-162 has been completed, but is not reflected on Whimsy How has the community developed since the last report? Project is still being set up. A few new user inquiries, 10+ signups on dev@ ML How has the project developed since the last report? 1. Podling name search has been completed (PODLINGNAMESEARCH-162) 2. hudi.apache.org site has been published with community guidelines 3. Hudi Improvement Plan (based off Apache Kafka KIP) under review 4. Agreement over code migration method, JIRA vs GitHub Issues 5. Submitted a Hudi talk abstract for upcoming Berlin Buzzwords in June 2019 6. Issues migrated to Apache JIRA from GitHub. How would you assess the podling's maturity? Please feel free to add your own commentary. [X] Initial setup [ ] Working towards first release [ ] Community building [ ] Nearing graduation [ ] Other: Date of last release: Project still being established When were the last committers or PPMC members elected? No new committers since incubation. Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. Yes. Mentors are actively following up on ML questions and pointing out gaps Signed-off-by: [X](hudi) Thomas Weise Comments: Nice uptick on mailing list activity and collaboration thinking. [x](hudi) Luciano Resende Comments: A formal improvement process such as one based on Kafka KIP might be overkill for a podling that is actively looking to grow the community. 
Usually these processes are implemented in big communities that want to have some control over stability or backward compatibility of the code. [ ](hudi) Kishore Gopalakrishnan Comments: [X](hudi) Suneel Marthi Comments: Good adaption of the Apache Way by PPMC. IPMC/Shepherd notes:
Hudi provides atomic upserts and incremental data streams on Big Data Hudi has been incubating since 2019-01-17. Three most important issues to address in the move towards graduation: 1. Make frequent releases as per Apache guidelines 2. Grow community 3. Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None so far How has the community developed since the last report? Project is still being set up How has the project developed since the last report? 1. Initial set of committers have filed ICLAs 2. Jira and GitHub repo have been set up 3. Mailing lists have been set up How would you assess the podling's maturity? Please feel free to add your own commentary. [X] Initial setup [ ] Working towards first release [ ] Community building [ ] Nearing graduation [ ] Other: Date of last release: Project is still being established When were the last committers or PPMC members elected? Initial set of committers added to the project Have your mentors been helpful and responsive or are things falling through the cracks? In the latter case, please list any open issues that need to be addressed. The mentors have been very helpful in getting this podling established. No open issues. Signed-off-by: [X](hudi) Thomas Weise Comments: [X](hudi) Luciano Resende Comments: [ ](hudi) Kishore Gopalakrishnan Comments: [X](hudi) Suneel Marthi Comments: IPMC/Shepherd notes: