This was extracted (@ 2024-11-20 22:10) from a list of minutes
which have been approved by the Board.
Please Note
The Board typically approves the minutes of the previous meeting at the
beginning of every Board meeting; therefore, the list below does not
normally contain details from the minutes of the most recent Board meeting.
WARNING: these pages may omit some original contents of the minutes.
Meeting times vary; the exact schedule is available to ASF Members and Officers, who can search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- The community needs to decide whether to only maintain existing features for version stability or to encourage new contributors to develop new features; this needs to be discussed in the community.
## Activity:
- Optimize the community building: https://github.com/apache/carbondata/pull/4358
- Upgrade Thrift version: https://github.com/apache/carbondata/pull/4355, https://github.com/apache/carbondata/pull/4356
- Upgrade Spark version: https://github.com/apache/carbondata/pull/4354
- hulk, a new contributor, optimized documents
## Health Report:
- Commit activity:
  - 8 commits in the past quarter
  - 4 code contributors in the past quarter
## Releases:
* Currently, the community is working on upgrading the Spark version: https://github.com/apache/carbondata/pull/4354
* 2.3.1 was released on 2023-11-25.
* 2.3.0 was released on 2022-01-24.
* 2.2.0 was released on 2021-08-05.
## Project Composition:
- There are currently 28 committers and 19 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5
## Community changes, past quarter:
- Bo Xu was added to the PMC on 2023-04-22
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 156 subscribers (change 9)
  - dev@carbondata.apache.org had a 59% decrease in traffic in the past quarter (14 emails compared to 34)
## GitHub issues activity:
- 5 issues handled
## GitHub PR activity:
- 6 PRs opened on GitHub, past quarter
- 5 PRs closed on GitHub, past quarter
@Kanchana: pursue a roll call for PMC
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- The community needs to decide whether to only maintain existing features for version stability or to encourage new contributors to develop new features; this needs to be discussed in the community.
## Activity:
- Jacky Li raised two pull requests to upgrade the Thrift version: https://github.com/apache/carbondata/pull/4355, https://github.com/apache/carbondata/pull/4356
- David cai qiang is working on one big pull request to upgrade the Spark version: https://github.com/apache/carbondata/pull/4354
- hulk, a new contributor, optimized documents
- Liang Chen reviewed PRs and did some maintenance
## Health Report:
- Commit activity:
  - 8 commits in the past quarter
  - 4 code contributors in the past quarter
## Releases:
* Currently, the community is working on upgrading the Spark version: https://github.com/apache/carbondata/pull/4354
* 2.3.1 was released on 2023-11-25.
* 2.3.0 was released on 2022-01-24.
* 2.2.0 was released on 2021-08-05.
## Project Composition:
- There are currently 28 committers and 19 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5
## Community changes, past quarter:
- Bo Xu was added to the PMC on 2023-04-22
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 158 subscribers (change 8)
  - dev@carbondata.apache.org had a 59% decrease in traffic in the past quarter (14 emails compared to 34)
## GitHub issues activity:
- 3 issues handled
## GitHub PR activity:
- 5 PRs opened on GitHub, past quarter
- 3 PRs closed on GitHub, past quarter
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on bug fixes and discussed some good points on the dev mailing list:
  - Liang Chen, as release manager, finished the 2.3.1 release.
  - David caiqiang fixed CarbonData integration issues and made the CI run properly.
  - David is working on the Spark integration upgrade as per the dev mailing list discussion: https://lists.apache.org/thread/bskfdyqfhsdk9jb56otc2rc3l3y8781c
  - hulk optimized documents
  - Bo xu is working on some examples of how to use an AI notebook with existing historical CarbonData data
  - Jacky Li started a discussion thread on a C++ implementation of the CarbonData reader and writer: https://lists.apache.org/thread/xt99wj4lk5gzgyymg4wbmmtt1q95lyfv
## Health Report:
- Commit activity:
  - 13 commits in the past quarter
  - 4 code contributors in the past quarter
## Releases:
* Currently, the community is working on upgrading the Spark version as per this discussion: https://lists.apache.org/thread/bskfdyqfhsdk9jb56otc2rc3l3y8781c
* 2.3.1 was released on 2023-11-25.
* 2.3.0 was released on 2022-01-24.
* 2.2.0 was released on 2021-08-05.
## Project Composition:
- There are currently 28 committers and 19 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5
## Community changes, past quarter:
- New contributor git-hulk: https://github.com/apache/carbondata/commits?author=git-hulk
- Bo Xu was added to the PMC on 2023-04-22
- Brijoo Bopanna was added to the PMC on 2022-09-24
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 158 subscribers (change 8)
  - dev@carbondata.apache.org had a 557% increase in traffic in the past quarter (125 emails compared to 19)
## GitHub issues activity:
- 5 issues handled
## GitHub PR activity:
- 6 PRs opened on GitHub, past quarter
- 8 PRs closed on GitHub, past quarter
@Justin: follow up with CarbonData around accuracy of board report
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on bug fixes and discussed how to integrate with an AI computing engine:
  - In Nov, we finished the 2.3.1 release.
  - David caiqiang fixed issues in the CarbonData integration with Spark and made the CI run properly.
  - Bo xu contributed the integration of Apache CarbonData with notebooks, so machine learning functions can also query CarbonData.
  - Jacky Li started a new discussion thread on a C++ implementation of the CarbonData reader and writer
## Health Report:
- Commit activity:
  - 29 commits in the past quarter
  - 6 code contributors in the past quarter
## Releases:
- We are preparing 2.3.2; the community is focused on fixing issues
- 2.3.1 was released on 2023-11-25.
- 2.3.0 was released on 2022-01-24.
- 2.2.0 was released on 2021-08-05.
## Project Composition:
- There are currently 28 committers and 19 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5
## Community changes, past quarter:
- No new PMC members. Last addition was Bo Xu on 2023-04-21.
- No new committers. Last addition was Brijoo Bopanna on 2022-09-20.
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 160 subscribers (change 13)
  - dev@carbondata.apache.org had a 557% increase in traffic in the past quarter (125 emails compared to 19)
## GitHub issues activity:
- 4 issues handled
## GitHub PR activity:
- 7 PRs opened on GitHub, past quarter
- 57 PRs closed on GitHub, past quarter
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on bug fixes:
  - Built the CarbonData notebook Docker image manually and from a Dockerfile
  - Supported using Apache CarbonData in a notebook
  - Added a new example: using CarbonData for visualization in a notebook
  - Fixed CI issues
## Health Report:
- Commit activity:
  - 15 commits in the past quarter
  - 5 code contributors in the past quarter
## Releases:
- We are preparing 2.3.x; the community is focused on fixing issues
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
## Project Composition:
- There are currently 28 committers and 19 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5
## Community changes, past quarter:
- Bo Xu was added to the PMC on 2023-04-22
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 160 subscribers (change 13)
  - dev@carbondata.apache.org had a 23% decrease in traffic in the past quarter (28 emails compared to 36)
## GitHub issues activity:
- 8 issues opened on GitHub.
## GitHub PR activity:
- 10 PRs opened on GitHub.
- 7 PRs closed.
@Christofer: perform a roll call
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on fixing some issues:
  - Fixed performance issues by changing index id.
  - Fixed Secondary Index till segment level with SI as datamap; made Secondary Index a coarse-grained datamap and used SI for Presto queries
  - Fixed an exception in loading data with overwrite on a partition table.
## Health Report:
- Commit activity:
  - 11 commits in the past quarter
  - 4 code contributors in the past quarter
## Releases:
- We are preparing 2.3.x; the community is focused on fixing issues
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
## Project Composition:
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Brijoo Bopanna was added as committer on 2022-09-20
- There are currently 28 committers and 18 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
- On the private mailing list, we are discussing a new PMC member.
## Community changes, past quarter:
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 161 subscribers (change 13)
  - dev@carbondata.apache.org had a 23% decrease in traffic in the past quarter (28 emails compared to 36)
## GitHub issues activity:
- 5 issues opened on GitHub.
## GitHub PR activity:
- 5 PRs opened on GitHub.
- 7 PRs closed.
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on fixing issues; some key issues are listed below:
  - Fixed Spark integration compile issues.
  - Fixed an exception in loading data with overwrite on a partition table.
  - Fixed index issues of "sort_columns"
## Health Report:
- Commit activity:
  - 13 commits in the past quarter
  - 5 code contributors in the past quarter
## Releases:
- We are preparing 2.3.x; the community is focused on fixing issues
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
## Project Composition:
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Brijoo Bopanna was added as committer on 2022-09-20
- There are currently 28 committers and 18 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
- On the private mailing list, we are discussing a new PMC member.
## Community changes, past quarter:
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 161 subscribers (change 13)
  - dev@carbondata.apache.org had a 23% decrease in traffic in the past quarter (28 emails compared to 36)
## GitHub issues activity:
- 5 issues opened on GitHub.
## GitHub PR activity:
- 5 PRs opened on GitHub.
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on fixing issues; some key issues are listed below:
  - Operations failed when index loaded
  - [CARBONDATA-4338] Moving dropped partition data to trash
  - Remove list files while query and invalid cache.
  - Fix wrong projection name displayed if MV create query has alias.
  - Fix Show Schema issue.
## Health Report:
- Commit activity:
  - 23 commits in the past quarter (10 increase)
  - 11 code contributors in the past quarter (10% change)
## Releases:
- We are preparing 2.3.1 to fix some issues
- 2.3.0 was released on 2022-01-24
## Project Composition:
- There are currently 28 committers and 18 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
## Community changes, past quarter:
- New PMC member Brijoo Bopanna was added on 2022-09-24
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 165 subscribers (change 13)
  - dev@carbondata.apache.org had a 36% decrease in traffic in the past quarter (1012 emails compared to 1566)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 80% decrease in traffic in the past quarter (67 emails compared to 324)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## Busiest GitHub issues/PRs:
- carbondata/pull/4291: Bump jackson-databind from 2.10.0 to 2.13.4.1 in /integration/presto (4 comments)
- carbondata/pull/4292: Bump solr-core from 6.3.0 to 8.8.2 in /index/lucene (3 comments)
- carbondata/pull/4293: Bump jackson-databind from 2.10.0 to 2.12.7.1 in /integration/presto (0 comments)
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## 2022-08-17 from sharan:
I see ASF Infra has sent the PMC a couple of emails regarding the failure of a signature key for your 2.0.1 download artifacts. Is any work currently being done to resolve this?
Re: Yes, our community solved this issue.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on fixing issues; some key issues are listed below:
  - Fixed index issues about multiple sessions table.
  - Fixed an exception in loading data with overwrite on a partition table.
  - Fixed DDM sentence about column query failed.
## Health Report:
- Commit activity:
  - 13 commits in the past quarter (10% decrease)
  - 9 code contributors in the past quarter (10% change)
## Releases:
- We are preparing 2.3.x; the community is focused on fixing issues
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
## Project Composition:
- Brijoo Bopanna was added to the PMC on 2022-09-24
- Brijoo Bopanna was added as committer on 2022-09-20
- There are currently 28 committers and 18 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
## Community changes, past quarter:
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 162 subscribers (change 13)
  - dev@carbondata.apache.org had a 36% decrease in traffic in the past quarter (1012 emails compared to 1566)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 80% decrease in traffic in the past quarter (67 emails compared to 324)
- user@carbondata.apache.org:
  - 75 subscribers (no change)
## JIRA activity:
- 9 issues opened in JIRA, past quarter (10% increase)
- 7 issues closed in JIRA, past quarter (10% increase)
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we focused on fixing issues; some key issues are listed below:
  - Update/delete operations failed when other format segments were deleted from a carbon table.
  - Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL_DICTIONARY_EXCLUDE column: does not exist in table".
  - Fix NullPointerException in load overwrite on partition table.
  - Fix Desc Columns showing a new column as added even though the ALTER ADD column query failed.
  - Incremental data load of Average aggregate in MV.
  - Fix multiple issues with External table.
  - Fix query performance issue for Spark 3.1.
  - Fix MV not hitting with multiple sessions issue.
## Health Report:
- Commit activity:
  - 23 commits in the past quarter (10 increase)
  - 11 code contributors in the past quarter (10% change)
## Releases:
- We are preparing 2.3.x; the community is focused on fixing issues
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
- 2.1.0 was released on 2020-11-12.
- 2.0.1 was released on 2020-06-01.
- 2.0.0 was released on 2020-05-20.
- 1.6.1 was released on 2019-10-25.
## Project Composition:
- There are currently 27 committers and 17 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
## Community changes, past quarter:
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 162 subscribers (change 13)
  - dev@carbondata.apache.org had a 12% decrease in traffic in the past quarter (970 emails compared to 1012)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 20% increase in traffic in the past quarter (83 emails compared to 67)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## JIRA activity:
- 11 issues opened in JIRA, past quarter (10% increase)
- 9 issues closed in JIRA, past quarter (10% increase)
No report was submitted.
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we released 2.3.0; some key features are listed below:
  - Alter schema for complex columns.
  - Support for Dynamic Partition Pruning for Spark-3.1 to enhance performance
  - Support spatial index creation using data frame
  - Introduce Streamer tool for CarbonData
- Cleaned up the CarbonData dist area.
## Health Report:
- Commit activity:
  - 20 commits in the past quarter (41% decrease)
  - 5 code contributors in the past quarter (-44% change)
- GitHub PR activity:
  - 10 PRs opened on GitHub, past quarter (-23% change)
  - 10 PRs closed on GitHub, past quarter (-37% change)
## Releases:
- 2.3.0 was released on 2022-01-24
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
- 2.1.0 was released on 2020-11-12.
- 2.0.1 was released on 2020-06-01.
- 2.0.0 was released on 2020-05-20.
- 1.6.1 was released on 2019-10-25.
## Project Composition:
- There are currently 27 committers and 17 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:5.
## Community changes, past quarter:
- Indhumathi was added to the PMC on 2022-02-16
- Vikram Ahuja was added as committer on 2022-02-10
- Akash R Nilugal was added to the PMC on 2021-04-11
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 163 subscribers (change 13)
  - dev@carbondata.apache.org had a 36% decrease in traffic in the past quarter (1012 emails compared to 1566)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 80% decrease in traffic in the past quarter (67 emails compared to 324)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## JIRA activity:
- 9 issues opened in JIRA, past quarter (26% decrease)
- 8 issues closed in JIRA, past quarter (19% increase)
## Description:
- Apache CarbonData is a data store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, we have been preparing the 2.3.0 release; some key features are listed below:
  - Support spatial index creation using data frame
  - Upgrade Prestosql to version 333
  - Support CarbonData Streamer tool to fetch data incrementally and merge
  - Support DPP for carbon filters
  - Alter support for complex types
- Apache CarbonData double-checked and analyzed the Log4j2 vulnerabilities (CVE-2021-44228, CVE-2021-45046, CVE-2021-45105). We currently believe that the Apache CarbonData platform is not impacted. Apache CarbonData does not directly use a version of log4j known to be affected by the vulnerability. We have reviewed the code and run the vulnerability tool; as per the tool report, these three vulnerabilities (CVE-2021-44228, CVE-2021-45046, CVE-2021-45105) were not identified.
## Health Report:
- Commit activity:
  - 34 commits in the past quarter (7% decrease)
  - 9 code contributors in the past quarter (28% increase)
- GitHub PR activity:
  - 13 PRs opened on GitHub, past quarter (10% increase)
  - 16 PRs closed on GitHub, past quarter (6% decrease)
## Releases:
- 2.3.0 RC1 is in progress as of December 2021
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
- 2.1.0 was released on 2020-11-12.
- 2.0.1 was released on 2020-06-01.
- 2.0.0 was released on 2020-05-20.
- 1.6.1 was released on 2019-10-25.
## Project Composition:
- There are currently 26 committers and 16 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:4.
## Community changes, past quarter:
- Akash R Nilugal was added to the PMC on 2021-04-11
- Ajantha Bhat U was added to the PMC on 2020-11-15
- Indhumathi was added as committer on 2020-10-02
- Kunal Kapoor was added to the PMC on 2020-03-29
- Tao Li was added as committer on 2020-02-04
- Zhi Liu was added as committer on 2020-02-27
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 179 subscribers (increase 6)
  - dev@carbondata.apache.org had a 36% decrease in traffic in the past quarter (1012 emails compared to 1566)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 48% increase in traffic in the past quarter (302 emails compared to 204)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## JIRA activity:
- 20 issues opened in JIRA, past quarter (26% decrease)
- 16 issues closed in JIRA, past quarter (19% increase)
## Description:
- Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, our focus has been mainly on developing these features in the community:
  - Add, drop, and rename column support for complex columns
  - Secondary Index support for Presto
  - Local sort partition load and compaction improvement
  - Improve table status and metadata writing
  - Geospatial query enhancements
  - Integrate CarbonData with spark-3.1
- We actively participated in ApacheCon Asia 2021:
  - [Development Bank of Singapore] Data Platform Drives Real-time Insights & Analytics using Apache CarbonData - Ravindra Pesala, Kumar Vishal
  - Faster Bigdata Analytics by maneuvering Apache CarbonData’s Indexes - Akash R Nilugal, Kunal Kapoor
## Health Report:
- Commit activity:
  - 48 commits in the past quarter (7% decrease)
  - 18 code contributors in the past quarter (28% increase)
- GitHub PR activity:
  - 55 PRs opened on GitHub, past quarter (10% increase)
  - 47 PRs closed on GitHub, past quarter (6% decrease)
## Releases:
- 2.2.0 was released on 2021-08-05
- 2.1.1 was released on 2021-03-29
- 2.1.0 was released on 2020-11-12.
- 2.0.1 was released on 2020-06-01.
- 2.0.0 was released on 2020-05-20.
- 1.6.1 was released on 2019-10-25.
## Project Composition:
- There are currently 26 committers and 16 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:4.
## Community changes, past quarter:
- Akash R Nilugal was added to the PMC on 2021-04-11
- Ajantha Bhat U was added to the PMC on 2020-11-15
- Indhumathi was added as committer on 2020-10-02
- Kunal Kapoor was added to the PMC on 2020-03-29
- Tao Li was added as committer on 2020-02-04
- Zhi Liu was added as committer on 2020-02-27
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 179 subscribers (increase 6)
  - dev@carbondata.apache.org had a 8% decrease in traffic in the past quarter (1674 emails compared to 1819)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 48% increase in traffic in the past quarter (302 emails compared to 204)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## JIRA activity:
- 57 issues opened in JIRA, past quarter (26% decrease)
- 49 issues closed in JIRA, past quarter (19% increase)
## Description:
- Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, and Apache Flink, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases.
## Issues:
- There are no new issues requiring board attention at this time.
## Activity:
- In the last 3 months, our focus has been mainly on developing these features in the community:
  - Integrate CarbonData with spark-3.1
  - Leverage Secondary Index till segment level with SI as datamap
  - Make Secondary Index a coarse-grained datamap and use SI for Presto queries
  - Geospatial query enhancements
  - CDC merge performance improvements
- In the next 3 months, our main focus will be on:
  - CDC enhancements with support for schema change capture
  - Transaction manager and segment interface refactor to support time travel
  - Integration of DPP to leverage partition-based performance improvements in Spark 3.1
  - Improve load performance for wide-table schemas
- We are developing Apache CarbonData 2.2.0; it is currently at RC1.
- Apache CarbonData has two topics in ApacheCon Asia 2021:
  - [Development Bank of Singapore] Data Platform Drives Real-time Insights & Analytics using Apache CarbonData - Ravindra Pesala, Kumar Vishal
  - Faster Bigdata Analytics by maneuvering Apache CarbonData’s Indexes - Akash R Nilugal, Kunal Kapoor
## Health Report:
- Commit activity:
  - 52 commits in the past quarter (10% increase)
  - 14 code contributors in the past quarter (-6% change)
- GitHub PR activity:
  - 49 PRs opened on GitHub, past quarter (8% increase)
  - 50 PRs closed on GitHub, past quarter (8% increase)
## Releases:
- 2.1.1 was released on 2021-03-29
- 2.1.0 was released on 2020-11-12.
- 2.0.1 was released on 2020-06-01.
- 2.0.0 was released on 2020-05-20.
- 1.6.1 was released on 2019-10-25.
## Project Composition:
- There are currently 26 committers and 16 PMC members in this project.
- The Committer-to-PMC ratio is roughly 7:4.
## Community changes, past quarter:
- Akash R Nilugal was added to the PMC on 2021-04-11
- Ajantha Bhat U was added to the PMC on 2020-11-15
- Indhumathi was added as committer on 2020-10-02
- Kunal Kapoor was added to the PMC on 2020-03-29
- Tao Li was added as committer on 2020-02-04
- Zhi Liu was added as committer on 2020-02-27
## Notable mailing list trends:
Mailing list activity stays at a high level
- dev@carbondata.apache.org:
  - 179 subscribers (increase 6)
  - dev@carbondata.apache.org had a 1268% increase in traffic in the past quarter (1930 emails compared to 141)
- issues@carbondata.apache.org:
  - issues@carbondata.apache.org had a 89% decrease in traffic in the past quarter (221 emails compared to 1938)
- user@carbondata.apache.org:
  - 76 subscribers (no change)
## JIRA activity:
- 75 issues opened in JIRA, past quarter (20% increase)
- 40 issues closed in JIRA, past quarter (-14% change)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Flink among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Apache CarbonData has finished integration with Apache spark, Apache Flink, Apache Kafka, Presto etc. - The community released 2.1.1 , Some key features and improvements as belows: - Geospatial index algorithm improvement and UDFs enhancement. - Adding global sort support for Second Index segments data file merge operation. - Refactor CarbonDataSourceScan without Spark filter - Size control of minor compaction - Clean files become data trash manager - Fix error when loading string field with high cardinality(local dictionary fallback issue) - We already worked out two topics and already submitted CFP for ApacheCon Asia. ## Health Report: - Commit activity: - 46 commits in the past quarter (-31% decrease) - 14 code contributors in the past quarter (-22% decrease) - GitHub PR activity: - 41 PRs opened on GitHub, past quarter (-54% decrease) - 45 PRs closed on GitHub, past quarter (-52% decrease) ## Releases: - 2.1.1 was released on 2021-03-29 - 2.1.0 was released on 2020-11-12. - 2.0.1 was released on 2020-06-01. - 2.0.0 was released on 2020-05-20. - 1.6.1 was released on 2019-10-25. ## Project Composition: - There are currently 26 committers and 16 PMC members in this project. - The Committer-to-PMC ratio is roughly 7:4. 
## Community changes, past quarter: - Akash R Nilugal was added to the PMC on 2021-04-11 - Ajantha Bhat U was added to the PMC on 2020-11-15 - Indhumathi was added as committer on 2020-10-02 - Kunal Kapoor was added to the PMC on 2020-03-29 - Tao Li was added as committer on 2020-02-04 - Zhi Liu was added as committer on 2020-02-27 ## Notable mailing list trends: Mailing list activity stays at a high level - dev@carbondata.apache.org: - 179 subscribers (up 6): - dev@carbondata.apache.org had a 253% increase in traffic in the past quarter (368 emails compared to 104) - issues@carbondata.apache.org: - issues@carbondata.apache.org had a 50% decrease in traffic in the past quarter (1881 emails compared to 3732) - user@carbondata.apache.org: - 76 subscribers (no change): ## JIRA activity: - 57 issues opened in JIRA, past quarter (20% decrease) - 47 issues closed in JIRA, past quarter (33% decrease)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Flink among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Apache CarbonData has finished integration with Apache Spark, Apache Flink, Apache Kafka, Presto, etc. - The community released 2.1.0. Key features and improvements are as follows: - Support Float and Decimal in the Merge Flow - Implement delete and update feature in CarbonData SDK. - Support array<string> with SI - Support IndexServer with Presto Engine - Insert from stage command supports partition tables. - Implement a new Reindex command to repair missing SI segments - Support Change Column Comment - Presto complex type read support - SI global sort support - We organized two online discussions to plan the 2021 feature list. ## Health Report: - Commit activity: - 75 commits in the past quarter (31% decrease) - 17 code contributors in the past quarter (19% decrease) - GitHub PR activity: - 98 PRs opened on GitHub, past quarter (28% decrease) - 103 PRs closed on GitHub, past quarter (25% decrease) ## Releases: - 2.1.0 was released on 2020-11-12. - 2.0.1 was released on 2020-06-01. - 2.0.0 was released on 2020-05-20. - 1.6.1 was released on 2019-10-25. ## Project Composition: - There are currently 26 committers and 15 PMC members in this project. - The Committer-to-PMC ratio is roughly 7:4. 
## Community changes, past quarter: - Ajantha Bhat U was added to the PMC on 2020-11-15 - Indhumathi was added as committer on 2020-10-02 - Kunal Kapoor was added to the PMC on 2020-03-29 - Tao Li was added as committer on 2020-02-04 - Zhi Liu was added as committer on 2020-02-27 ## Notable mailing list trends: Mailing list activity stays at a high level - dev@carbondata.apache.org: - 178 subscribers (down 6): - dev@carbondata.apache.org had an 8% decrease in traffic in the past quarter (104 emails compared to 113) - issues@carbondata.apache.org: - issues@carbondata.apache.org had a 15% decrease in traffic in the past quarter (3746 emails compared to 4400) - user@carbondata.apache.org: - 76 subscribers (no change): ## JIRA activity: - 76 issues opened in JIRA, past quarter (41% decrease) - 85 issues closed in JIRA, past quarter (26% decrease)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Flink among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Apache CarbonData has finished integration with Apache Spark, Apache Flink, Apache Kafka, Presto, etc. - We are preparing 2.1.0. Key features and improvements in this release: - Support Float and Decimal in the Merge Flow - Implement delete and update feature in CarbonData SDK. - Support array<string> with SI - Support IndexServer with Presto Engine - Insert from stage command supports partition tables. - Implement a new Reindex command to repair missing SI segments - Support Change Column Comment - David Caiqiang (PMC) gave a presentation at ApacheCon on 29th Sep, introducing in detail "New Features of Apache CarbonData 2.0" ## Health Report: - Commit activity: - 114 commits in the past quarter (8% increase) - 21 code contributors in the past quarter (12% decrease) - GitHub PR activity: - 136 PRs opened on GitHub, past quarter (2% decrease) - 138 PRs closed on GitHub, past quarter (17% increase) ## Releases: - The community is preparing 2.1.0; the current release candidate is RC1. - 2.0.1 was released on 2020-06-01. - 2.0.0 was released on 2020-05-20. - 1.6.1 was released on 2019-10-25. ## Project Composition: - There are currently 25 committers and 14 PMC members in this project. - The Committer-to-PMC ratio is roughly 7:4. 
## Community changes, past quarter: - Indhumathi was added as committer on 2020-10-02 - Kunal Kapoor was added to the PMC on 2020-03-29 - Tao Li was added as committer on 2020-02-04 - Zhi Liu was added as committer on 2020-02-27 ## Notable mailing list trends: Mailing list activity stays at a high level - dev@carbondata.apache.org: - 186 subscribers (up 5 in the last 3 months): - dev@carbondata.apache.org had a 35% increase in traffic in the past quarter (126 emails compared to 93) - issues@carbondata.apache.org: - issues@carbondata.apache.org had a 76% increase in traffic in the past quarter (4323 emails compared to 2448): - user@carbondata.apache.org: - 76 subscribers (no change): ## JIRA activity: - 127 issues opened in JIRA, past quarter (no change) - 115 issues closed in JIRA, past quarter (55% increase)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Flink among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Flink, Apache Kafka, and Presto, further extending the ecosystem. - In the past 3 months, we focused on preparing the 2.0 and 2.0.1 releases, which provide many significant features, such as: pre-priming cache support in the Index cache server, Carbon Extension for Spark 2.4/3.0 without Carbon Session, MV time-series support with rollup and multiple granularity, spatial index datamap support, Secondary Index support, CDC merge functionality, Flink streaming write to CarbonData, Hive leveraging the index for query performance enhancement, Hive write support, support for the latest stable Spark 2.4.5 version, support for prestodb-0.217 and prestosql-316, INSERT INTO performance improvement, Bucket Table optimization, pycarbon support for AI cases, and materialized views on all tables. - The community organized an online webinar for the 2.0 release on 3rd June, 2020. ## Health Report: - Commit activity: - 131 commits in the past quarter (1% decrease) - 35 code contributors in the past quarter (6% increase) - GitHub PR activity: - 156 PRs opened on GitHub, past quarter (13% decrease) - 135 PRs closed on GitHub, past quarter (3% increase) ## Releases: - 2.0.1 was released on 2020-06-01. - 2.0.0 was released on 2020-05-20. - 1.6.1 was released on 2019-10-25. ## Project Composition: - There are currently 25 committers and 14 PMC members in this project. - The Committer-to-PMC ratio is roughly 7:4. 
## Community changes, past quarter: - Kunal Kapoor was added to the PMC on 2020-03-29 - Tao Li was added as committer on 2020-02-04 - Zhi Liu was added as committer on 2020-02-27 ## Notable mailing list trends: Mailing list activity stays at a high level - dev@carbondata.apache.org: - 181 subscribers (up 10 in the last 3 months): - dev@carbondata.apache.org had a 28% increase in traffic in the past quarter (99 emails compared to 77) - issues@carbondata.apache.org: - 4305 emails sent to list - user@carbondata.apache.org: - 76 subscribers (up 3 in the last 3 months): ## JIRA activity: - 105 JIRA tickets created in the last 3 months (3% increase) - 79 JIRA tickets closed/resolved in the last 3 months (10% increase)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Flink among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Board comment in the last report: druggeri: The previous quarter's report indicated that there are potential PMC additions in the pipeline. Is anything preventing growth there? It's cool to see the Flink integration making it into the report! Re: Nothing is preventing growth; we are waiting for the new PMC candidates to complete some contributions end to end, as discussed on the community mailing list. We start a VOTE for a new candidate once the merit is sufficient. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Flink, further extending the ecosystem. - In the past 3 months, we focused on preparing the 2.0 release, which provides many significant features, such as: pre-priming cache support in the Index cache server, Carbon Extension for Spark 2.4/3.0 without Carbon Session, MV time-series support with rollup and multiple granularity, spatial index datamap support, Secondary Index support, CDC merge functionality, Flink streaming write to CarbonData, Hive leveraging the index for query performance enhancement, Hive write support, support for the latest stable Spark 2.4.5 version, support for prestodb-0.217 and prestosql-316, INSERT INTO performance improvement, Bucket Table optimization, pycarbon support for AI cases, and materialized views on all tables. - The community organized an online webinar on 5th Mar, 2020. 
## Health Report: - Commit activity: - 122 commits in the past quarter (3% decrease) - 30 code contributors in the past quarter (11% increase) - GitHub PR activity: - 133 PRs opened on GitHub, past quarter (14% decrease) - 155 PRs closed on GitHub, past quarter (4% increase) ## Releases: - 1.5.4 was released on June 10 2019 - 1.6.0 was released on August 29 2019 - 1.6.1 was released on October 25 2019 - 2.0 rc1 is being prepared ## Project Composition: - There are currently 25 committers and 14 PMC members in this project. - The Committer-to-PMC ratio is roughly 7:4. ## Community changes, past quarter: - Kunal Kapoor was added to the PMC on 2020-03-29 - Tao Li was added as committer on 2020-02-04 - Zhi Liu was added as committer on 2020-02-27 ## Notable mailing list trends: Mailing list activity stays at a high level - dev@carbondata.apache.org: - 181 subscribers (up 10 in the last 3 months): - 83 emails sent to list (138 in previous quarter) - issues@carbondata.apache.org: - 4943 emails sent to list (4277 in previous quarter) - user@carbondata.apache.org: - 76 subscribers (up 3 in the last 3 months): ## JIRA activity: - 113 JIRA tickets created in the last 3 months (3% increase) - 84 JIRA tickets closed/resolved in the last 3 months (10% increase)
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Hive among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Flink, further extending the ecosystem. - In the last 3 months we focused on the releases (2.0, 1.6.1, 1.6.0), which provide many significant features, such as: page-level Bloom filter support, time series support for MV datamaps with automatic loading of time series datamaps, geospatial indexing support, etc. - Presented one topic at a big data conference in China on 7th Dec, 2019. ## Health Report: - Commit activity: - 125 commits in the past quarter (15% increase) - 27 code contributors in the past quarter (68% increase) - GitHub PR activity: - 156 PRs opened on GitHub, past quarter (75% increase) - 145 PRs closed on GitHub, past quarter (83% increase) ## Releases: - 1.5.3 was released on April 10 2019 - 1.5.4 was released on June 10 2019 - 1.6.0 was released on August 29 2019 - 1.6.1 was released on October 25 2019 ## PMC changes: - Currently 13 PMC members. - No new PMC members in the past quarter; some candidates are in the pipeline ## Committer base changes: - Currently 23 committers. 
- kevinjmh was added as committer on 2019-08-27 - Ajantha Bhat was added as committer on 2019-10-04 - Some contributors are actively contributing to the 2.0 release ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 171 subscribers (down 18 in the last 3 months): - 143 emails sent to list (63 in previous quarter) - issues@carbondata.apache.org: - 10 subscribers (down 1 in the last 3 months): - 4542 emails sent to list (5095 in previous quarter) - user@carbondata.apache.org: - 73 subscribers (down 5 in the last 3 months): - 12 emails sent to list (6 in previous quarter) ## JIRA activity: - 110 JIRA tickets created in the last 3 months - 75 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, Apache Hive among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Hive, Presto, and Alluxio, further extending the ecosystem. - In the last 3 months we focused on the releases (2.0, 1.5.3, 1.5.4, 1.6.0), which provide many significant features, such as: reading the Map data type through Hive, writing long strings for streaming tables, index server enhancements, mixed data formats in CarbonData, a segment move command in CarbonData, etc. - Presented one topic at a big data summit in China on 20th Sep, 2019. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). ## Releases: - 1.5.3 was released on April 10 2019 - 1.5.4 was released on June 10 2019 ## PMC changes: - Currently 13 PMC members. - Chuanyin Xu was added to the PMC on Mon Dec 31 2018 ## Committer base changes: - Currently 21 committers. 
- kevinjmh was added as committer on 2019-08-27 - Ajantha Bhat was added as committer on 2019-10-04 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 171 subscribers (down 18 in the last 3 months): - 149 emails sent to list (63 in previous quarter) - issues@carbondata.apache.org: - 10 subscribers (down 1 in the last 3 months): - 3061 emails sent to list (5095 in previous quarter) - user@carbondata.apache.org: - 73 subscribers (down 5 in the last 3 months): - 12 emails sent to list (6 in previous quarter) ## JIRA activity: - 115 JIRA tickets created in the last 3 months - 42 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Hive, Presto, and Alluxio, further extending the ecosystem. - In the last 3 months we focused on the releases (1.5.3, 1.5.4, 1.6.0), which provide many significant features, such as: configurable page size, the Binary data type, compaction on range-sorted segments, a gzip compressor for a better compression ratio, etc. - A meetup was organized in China on 5th June, 2019. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). ## Releases: - 1.5.3 was released on April 10 2019 - 1.5.4 was released on June 10 2019 ## PMC changes: - Currently 12 PMC members. - Chuanyin Xu was added to the PMC on Mon Dec 31 2018 - No new PMC members were added in the last 3 months; the community is planning to invite new PMC members based on their merit and contributions to the project. ## Committer base changes: - Currently 21 committers. 
- Akash R Nilugal was added as a committer on Fri May 03 2019 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 171 subscribers (down 18 in the last 3 months): - 63 emails sent to list (177 in previous quarter) - issues@carbondata.apache.org: - 10 subscribers (down 1 in the last 3 months): - 5111 emails sent to list (3991 in previous quarter) - user@carbondata.apache.org: - 73 subscribers (down 5 in the last 3 months): - 6 emails sent to list (9 in previous quarter) ## JIRA activity: - 122 JIRA tickets created in the last 3 months - 93 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Besides integration with Apache Spark, CarbonData can now integrate with Apache Hive, Presto, and Alluxio, further extending the ecosystem. - In the last 3 months we focused on the releases (1.5.1, 1.5.2, 1.5.3), which provide many significant features, such as: improved scan performance, an improved compaction feature, further enhancements to the MV feature, a gzip compressor for a better compression ratio, etc. - The published TPC-H reports comparing CarbonData (1.5.2) with ORC and CarbonData (1.5.2) with Parquet, on Presto 2.10 and Spark 2.3.2 respectively, are available at https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Performance+Reports - A meetup (CarbonData + Spark) was organized in China on 1st Mar, 2019. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). ## Releases: - 1.5.0 was released on Tue Oct 16 2018 - 1.5.1 was released on Wed Dec 05 2018 - 1.5.2 was released on Mon Feb 04 2019 ## PMC changes: - Currently 12 PMC members. - Chuanyin Xu was added to the PMC on Mon Dec 31 2018 ## Committer base changes: - Currently 20 committers. - Bo Xu was added as a committer on Sat Dec 08 2018 - The CarbonData community is voting on a new committer. 
## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 188 subscribers (up 6 in the last 3 months): - 969 emails sent to list in the last 3 months - issues@carbondata.apache.org: - 11 subscribers (up 2 in the last 3 months): - 4033 emails sent to list (7687 in the previous cycle) - user@carbondata.apache.org: (actually, many users are used to discussing on the dev mailing list) - 78 subscribers (up 9 in the last 3 months): - 9 emails sent to list ## JIRA activity: - 114 JIRA tickets created in the last 3 months - 83 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store solution for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Board comment in the last report: rb: A reprise of a comment from an earlier month: You refer to active contributors from 30+ organizations, and 130 contributors, but you then mention only 19 committers. Who are these contributors, and are you actively considering which of them may become committers? Don't make the mistake of setting the bar so high that these contributors go away feeling that they have not received the recognition that they've earned! Reply: Thanks for the kind reminder. Sure, we will actively identify active contributors to become committers. Last month, the CarbonData community added 1 new committer and 1 new PMC member. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is active, with contributors from many different organizations; the number of contributors is now more than 140, and we receive around 200 pull requests per month. - In the last 3 months we focused on two releases (1.5.1 and 1.5.2), which provide many significant features, such as: integration with Spark 2.3.2 and Hadoop 3.1.1, improved scan performance, support for modifying a column name after table creation, multi-threaded C++ interfaces to query data from CarbonData, further optimization of complex data types, a gzip compressor for a better compression ratio, etc. - The community helped many users test CarbonData and deploy it in production via the mailing list. - A meetup (CarbonData + Spark) was organized in Shenzhen on 1st Dec, 2018, with more than 100 onsite attendees and more than 300 online attendees. 
## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). ## Releases: - 1.5.0 was released on Tue Oct 16 2018 - 1.5.1 was released on Wed Dec 05 2018 ## PMC changes: - Currently 12 PMC members. - Chuanyin Xu was added to the PMC on Mon Dec 31 2018 ## Committer base changes: - Currently 20 committers. - Bo Xu was added as a committer on Sat Dec 08 2018 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 181 subscribers (up 3 in the last 3 months): - 301 emails sent to list in the last 3 months - issues@carbondata.apache.org: - 9 subscribers (up 0 in the last 3 months): - 7922 emails sent to list (9974 in the previous cycle) - user@carbondata.apache.org: (actually, many users are used to discussing on the dev mailing list) - 68 subscribers (up 1 in the last 3 months): - 10 emails sent to list ## JIRA activity: - 236 JIRA tickets created in the last 3 months - 160 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is active, with contributors from 30+ different organizations; the number of contributors is now more than 130, and we receive around 200 pull requests per month. - We completed two releases (1.4.1 and 1.5.0) in the last 3 months, which provide many significant features, such as: integration with Spark 2.3.2 and Hadoop 3.1.1, local dictionary support, C++ interfaces to query data from CarbonData, the Map data type, ZSTD for a better compression ratio, etc. - The community helped many users deploy CarbonData in production via the mailing list. - We organized two meetups in the past 3 months (3rd Aug, 8th Sep), where we shared experience on how to use CarbonData, how to contribute to CarbonData, and the CarbonData community's next plans. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). - We added 1 new committer in Sep, 2018. - No new PMC members in the last 3 months. Some active committers will be considered as new PMC members. ## Releases: - 1.3.0 was released on Fri Mar 02 2018 - 1.3.1 was released on Mon Mar 12 2018 - 1.4.0 was released on Thu May 31 2018 - 1.4.1 was released on Wed Aug 15 2018 - 1.5.0 is in release candidate voting ## PMC changes: - Currently 11 PMC members. - No new PMC members in the last 3 months. 
- New PMC members: - Kumar Vishal was added to the PMC on Tue Jan 09 2018 - David Cai was added to the PMC on Tue Jan 09 2018 ## Committer base changes: - Currently 19 committers. - Raghunandan S was added as a committer on Wed Sep 26 2018 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 178 subscribers (down 1 in the last 3 months): - 294 emails sent to list - issues@carbondata.apache.org: - 9 subscribers (up 0 in the last 3 months): - 10199 emails sent to list (9967 in the previous cycle) - user@carbondata.apache.org: (actually, many users are used to discussing on the dev mailing list) - 67 subscribers (down 2 in the last 3 months): - 10 emails sent to list ## JIRA activity: - 285 JIRA tickets created in the last 3 months - 271 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is active, with contributors from 30+ different organizations; the number of contributors is now more than 120, and we receive around 150-200 pull requests per month. - We completed the milestone release (1.4.0) in the last 3 months, which provides many significant features, such as: load performance improved by 300%, external tables with a location, cloud storage support, streaming on pre-aggregate tables, materialized views, Bloom filter support, Lucene support for LIKE queries, etc. - Many users have tried 1.4.0 for better performance; based on the mailing list discussion, we observed that most of them focused on these features: streaming on pre-aggregation, MV, Lucene, etc. - We are working on versions 1.5.0 and 1.4.1: further optimizing MV, fixing complex data type issues, improving the local dictionary, and providing merge index. - We organized one meetup in the past 3 months (22nd May), which focused on helping users deploy CarbonData. - Liang Chen gave a presentation at an educational big data conference on how Apache CarbonData can serve as a unified storage platform for collecting live class data to support data analytics. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). - We added 2 new committers in May, 2018. - No new PMC members in the last 3 months. Some active committers and contributors are contributing to 1.5.0 and might be considered as new PMC members and new committers in the future. 
## Releases: - 1.3.0 was released on Fri Mar 02 2018 - 1.3.1 was released on Mon Mar 12 2018 - 1.4.0 was released on Thu May 31 2018 ## PMC changes: - Currently 11 PMC members. - New PMC members: - Kumar Vishal was added to the PMC on Tue Jan 09 2018 - David Cai was added to the PMC on Tue Jan 09 2018 ## Committer base changes: - Currently 16 committers. - Chuanyin Xu was added as a committer on Tue May 01 2018 - Zhichao Zhang was added as a committer on Tue May 01 2018 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 177 subscribers (down 1 in the last 3 months): - 167 emails sent to list - issues@carbondata.apache.org: - 9 subscribers (up 0 in the last 3 months): - 10522 emails sent to list (9967 in the previous cycle) - user@carbondata.apache.org: - 69 subscribers (up 4 in the last 3 months): - 15 emails sent to list (12 in the previous cycle) ## JIRA activity: - 374 JIRA tickets created in the last 3 months - 238 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is active, with contributors from 30+ different organizations; the number of contributors is now more than 120, and we receive around 100-200 pull requests per month. - We completed two releases (1.3.0 and 1.3.1) in the last 3 months. - Many users have upgraded to 1.3.1 for better performance; based on the mailing list discussion, we observed that most of them focused on these features: Spark 2.2.1 integration, pre-aggregation, partition optimization, and streaming ingestion. - We are working on version 1.4.0. First, we are continuing to improve the features mentioned above, for example partition with pre-aggregation and streaming with pre-aggregation. Second, we are adding CarbonData support for cloud storage, such as AWS S3 or other cloud object storage. - We organized one meetup in the past 3 months (23rd Mar), which focused on helping users upgrade to CarbonData 1.3.1. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). - We added 2 new PMC members and 1 committer in Jan, 2018. ## Releases: - 1.3.0 was released on Fri Mar 02 2018 - 1.3.1 was released on Mon Mar 12 2018 ## PMC changes: - Currently 11 PMC members. - New PMC members: - Kumar Vishal was added to the PMC on Tue Jan 09 2018 - David Cai was added to the PMC on Tue Jan 09 2018 ## Committer base changes: - Currently 16 committers. 
- No new committers added in the last 3 months - Last committer addition was Kunal Kapoor on Sat Jan 06 2018 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 178 subscribers (up 8 in the last 3 months): - 150 emails sent to list (250 in previous quarter) - issues@carbondata.apache.org: - 9 subscribers (up 0 in the last 3 months): - 10583 emails sent to list (10026 in previous quarter) - user@carbondata.apache.org: - 65 subscribers (up 5 in the last 3 months): - 14 emails sent to list (15 in previous quarter) ## JIRA activity: - 322 JIRA tickets created in the last 3 months - 288 JIRA tickets closed/resolved in the last 3 months
## Description: - The Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Board comment in the last month's report: - rb: I'm fascinated by the statement that you helped 5 companies deploy CarbonData. Was this via the mailing lists, or was this more direct involvement? - Reply: All of these discussions and support took place only via the mailing list ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is active, with contributors from 20+ different organizations; the number of contributors is now more than 100, and we receive around 100-200 pull requests per month. - A new release, 1.3.0, is preparing rc1; we plan to finish this release in January. - 1.3.0 contains many good features (Spark 2.2.x integration, streaming ingestion, partition optimization, pre-aggregation, etc.); in particular, the pre-aggregation feature can improve performance by more than 10 times for some "group by" scenarios. - We organized one meetup in the past 3 months (19th Nov), which focused on helping users use CarbonData (how to use the index feature, the dictionary feature, the update and delete feature, etc.) - Liang attended the annual Chinese open source conference in Nov and shared the story of CarbonData at Apache. ## Health Report: - The project is healthy; the community remains active across all categories (dev mailing list, JIRAs, and pull requests). - We added 2 new PMC members and 1 committer in Jan, 2018, because all of them made very good contributions to the 1.3.0 version. 
## Releases: - Apache CarbonData 1.1.1 released on 2017-07-10 - Apache CarbonData 1.2.0 released on 2017-09-28 ## PMC changes: - Kumar Vishal was added to the PMC on Tue Jan 09 2018 - David Cai was added to the PMC on Tue Jan 09 2018 - Currently 11 PMC members ## Committer base changes: Currently 16 committers, 1 new committer added in the past quarter: - Kunal Kapoor was added as a committer on Sat Jan 06 2018 - Currently 16 committers ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 170 subscribers (up 4 in the last 3 months): - 264 emails sent to list (244 in previous quarter) ## JIRA activity: - 459 JIRA tickets created in the last 3 months - 320 JIRA tickets closed/resolved in the last 3 months
Report from the Apache CarbonData Project [Liang Chen] ## Description: - Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is quite active: there are more than 100 contributors from 20+ different organizations, and we are receiving around 100-200 pull requests per month. - A new release, 1.2.0, was completed in Sep. - We added more tools to ensure code quality, such as find-bugs, coverage, cluster CI, etc. - On the dev mailing list, we have started the 1.3.0 scope discussion. 1.3.0 will have some significant features (Spark 2.0 support, streaming ingestion, partition optimization, pre-aggregation, etc.). - We are optimizing the API for easier integration with other big data projects (Presto, Hive, etc.). - We organized 2 meetups in the past 3 months; the meetup on 2nd Sep had more than 300 attendees. - We helped 5 companies deploy CarbonData in production in the past 3 months. All of these discussions and help actions were supported on the mailing list and Apache JIRA. ## Health Report: - The project is healthy; the community remains active in all the various categories (dev mailing list, JIRAs, and pull requests). - There are 5 potential new committers who will work on the 1.3.0 release. ## Releases: - Apache CarbonData 1.1.1 released on 2017-07-10 - Apache CarbonData 1.2.0 released on 2017-09-28 ## PMC changes: - No new PMC members in the last 3 months. - Last PMC addition: Mon May 22 2017 (Ravindra Pesala). - Currently 9 PMC members.
## Committer base changes: Currently 15 committers, two new committers added in the past quarter: - Lionel Cao was added as a committer on Wed Sep 13 2017 - Manish Gupta was added as a committer on Thu Aug 24 2017 ## Mailing list activity: - Mailing list activity stays at a high level - dev@carbondata.apache.org: - 165 subscribers (up 9 in the last 3 months): - 254 emails sent to list (790 in previous quarter) ## JIRA activity: - 261 JIRA tickets created in the last 3 months - 176 JIRA tickets closed/resolved in the last 3 months
## Description: - Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - The community is quite active; we are receiving around 100-200 pull requests per month from community contributors. - A new patch release, 1.1.1, was completed on 10th July. - Most contributors are working on the next major release, Apache CarbonData 1.2.0; it has some significant features (partition, sort column, etc.) that will further improve performance and usability. - We are abstracting and refactoring the index framework (datamap), aiming to let users add further index techniques (for example, Lucene for fast search over text data). - We are optimizing the API for easier integration with other big data projects (Beam, Presto, Hive, Flink, etc.). - We are improving test cases by adding Hadoop and Spark cluster test cases. - Liang gave a presentation at the L3C conference on 21st June. - We plan 3 meetups in the 2nd half of 2017: a Shanghai meetup in Sep, a Bangalore meetup in Oct, and a Bay Area meetup in Nov/Dec. ## Health Report: - The project is healthy; the community remains active in all the various categories (dev mailing list, JIRAs, and pull requests). - There are 3 potential new committers who are working on the partition feature and the update & delete feature. ## Releases: - Apache CarbonData 1.1.0 released on 2017-05-16 - Apache CarbonData 1.1.1 released on 2017-07-10 ## PMC changes: - Ravindra Pesala was added to the PMC on Mon May 22 2017 - Currently 9 PMC members ## Committer base changes: Currently 13 committers, two new committers added in the past quarter: - hexiaoqiao was added as a committer on 2017-02-21.
- qiangcai was added as a committer on 2017-05-09. ## Mailing list activity: - Mailing list activity stays at a high level
## Description: - Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Board comment on last month's report: - For future reports, please include the date of the last PMC addition (as well as the last committer addition) Re: this month's report already includes this information. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - We finished our first release as a top-level project; Apache CarbonData 1.1.0 is a milestone release with the V3 format, which improves performance by 50% for aggregation cases. - Liang Chen gave a talk introducing Apache CarbonData during ApacheCon in Miami. - Jackylk presented "CarbonData with SparkSQL" practices at China Spark summit on 2017-05-19. - We are preparing Apache CarbonData 1.1.1 and Apache CarbonData 1.2.0. - Important work is in progress on providing an index framework that lets users add more indexes, for example integrating with Apache Lucene to search data. ## Health Report: - The project is healthy; the community remains active in all the various categories (dev mailing list, JIRAs, and pull requests). ## Releases: - Apache CarbonData 1.1.0 released on 2017-05-16 ## PMC changes: - Currently 9 PMC members. ## Committer base changes: Currently 13 committers, two new committers added in the past quarter: - hexiaoqiao was added as a committer on 2017-02-21. - qiangcai was added as a committer on 2017-05-09. ## Mailing list activity: dev@carbondata.apache.org: - 152 subscribers (up 34 in the last 3 months). - 862 emails sent in the past 3 months, 758 in the previous cycle issues@carbondata.apache.org: - 7 subscribers (up 1 in the last 3 months).
- 4951 emails sent in the past 3 months, 4266 in the previous cycle user@carbondata.apache.org: - 35 subscribers (up 29 in the last 3 months) (27 emails sent in the past 3 months, 0 in the previous cycle) ## JIRA activity: - 393 JIRA tickets created in the last 3 months - 266 JIRA tickets closed/resolved in the last 3 months
Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data, with the aim of using a unified file format to satisfy all kinds of data analysis cases. ## Issues: - There are no new issues requiring board attention at this time. ## Activity: - Apache CarbonData graduated on April 19, 2017. Prepared press work to announce the graduation, content work to remove incubation disclaimers, and coordination with Infra for the necessary adjustments there. - Nominated 1 committer and 1 PMC member on the private mailing list. - Preparing Apache CarbonData 1.1.0, to be released in May. - Jackyli and Jihongma presented Apache CarbonData at Spark Summit East on 2017-02-08. - Liang Chen presented Apache CarbonData at the Beijing big data meetup on 2017-05-05. - Liang Chen will present Apache CarbonData at ApacheCon on 2017-05-16. - Jackylk will present Apache CarbonData at China Spark summit on 2017-05-19. ## Health Report: - The project is healthy; the community remains active in all the various categories (dev mailing list, JIRAs, and pull requests). ## Releases: - Apache CarbonData 1.0.0-incubating released on 2017-01-29 - We have started the new release, Apache CarbonData 1.1.0, and have submitted RC2 for PMC vote. ## PMC changes: - Currently 8 PMC members. We are discussing a new PMC candidate. ## Committer base changes: Currently 13 committers, two new committers added in the past quarter: - hexiaoqiao was added as a committer on 2017-02-21. - qiangcai was added as a committer on 2017-05-09. ## Mailing list activity: dev@carbondata.apache.org: - 141 subscribers (up 50 in the last 3 months). - 654 emails sent to list in March, April, May (672 in previous quarter). issues@carbondata.apache.org: - 7 subscribers (up 1 in the last 3 months). - 3552 emails sent to list in March, April, May (3964 in previous quarter).
## JIRA activity: - 320 JIRA tickets created in March, April, May. - 268 JIRA tickets closed/resolved in the last 3 months.
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to an indexed columnar data format for fast analytics on big data platforms. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache CarbonData Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache CarbonData Project be and hereby is responsible for the creation and maintenance of software related to an indexed columnar data format for fast analytics on big data platforms; and be it further RESOLVED, that the office of "Vice President, Apache CarbonData" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache CarbonData Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache CarbonData Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache CarbonData Project: * Liang Chen <chenliang613@apache.org> * Jean-Baptiste Onofré <jbonofre@apache.org> * Henry Saputra <hsaputra@apache.org> * Uma Maheswara Rao G <umamahesh@apache.org> * Jihong Ma <jihongma@apache.org> * Jacky Li <jackylk@apache.org> * Vimal Das Kammath <vimaldas@apache.org> * Heng Qiu <jarray888@apache.org> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Liang Chen be appointed to the office of Vice President, Apache CarbonData, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further RESOLVED, 
that the initial Apache CarbonData PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache CarbonData Project; and be it further RESOLVED, that the Apache CarbonData Project be and hereby is tasked with the migration and rationalization of the Apache Incubator CarbonData podling; and be it further RESOLVED, that all responsibilities pertaining to the Apache Incubator CarbonData podling encumbered upon the Apache Incubator Project are hereafter discharged. Special Order 7E, Establish the Apache CarbonData Project, was approved by Unanimous Vote of the directors present.
Apache CarbonData is a new Apache Hadoop native file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data. CarbonData has been incubating since 2016-06-02. Three most important issues to address in the move towards graduation: 1. Prepare a couple of new releases 2. Grow the communities 3. Prepare website Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? The community activity increased: many new users started to use and test CarbonData, and we had more than 300 issues created up to Nov. Two new finance enterprises have formally deployed CarbonData to their business systems, and query performance sped up 10-70 times compared with the old systems (both are banking enterprises in China). We held the 2nd Meetup in Beijing on 29th Oct, and CarbonData gained 10+ contributors in the last month. How has the project developed since the last report? Code donation has been done and all resources have been created by INFRA (git, github mirror, mailing list, Jira, ...). We also created the Jenkins CI jobs, and are preparing the org website. We did the 2nd release (0.1.1-incubating) in Oct and we are preparing a new one (0.2.0) in Nov. We held 2 technical talks in the Bay Area with Databricks and Alluxio in the last month to discuss ecosystem integration with Spark and Alluxio. Date of last release: 2016-10-10 When were the last committers or PMC members elected? We elected a new committer Kumar Vishal on 2016-10-15. Signed-off-by: [X](carbondata) Henry Saputra [X](carbondata) Jean-Baptiste Onofré [X](carbondata) Uma Maheswara Rao G
Apache CarbonData is a new Apache Hadoop native file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data. CarbonData has been incubating since 2016-06-02. Three most important issues to address in the move towards graduation: 1. Prepare new releases 2. Increase both dev and user communities 3. Publish and update website to promote CarbonData Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? The community activity increased: many new users started to use and test CarbonData, and we had more than 100 new issues created during Aug. One user formally deployed CarbonData to their business system, and query efficiency sped up 50 times (the user is the biggest internet finance company in China). How has the project developed since the last report? Code donation has been done and all resources have been created by INFRA (git, github mirror, mailing list, Jira, ...). We also created the Jenkins CI jobs. We did the first release (0.1.0-incubating) and we are preparing a new one. We're also working on the website (with improved look and feel). We're also preparing some talks and presentations about CarbonData. Date of last release: 2016-08-27 When were the last committers or PMC members elected? 2016-07-15 Signed-off-by: [X](carbondata) Henry Saputra [X](carbondata) Jean-Baptiste Onofre [X](carbondata) Uma Maheswara Rao G
Apache CarbonData is a new Apache Hadoop native file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data. CarbonData has been incubating since 2016-06-02. Three most important issues to address in the move towards graduation: 1. Prepare first CarbonData release 2. Prepare first CarbonData website 3. Promote the project and grow user and dev community Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? None How has the community developed since the last report? We voted in two new committers: Venkata Ramana and Ravindra Pesala. We are preparing the CarbonData website, blog posts and talks to promote CarbonData and grow the user and dev communities. The community activity increased: many new users started to use and test CarbonData, and we had more than 100 new issues created during July. In addition, presentation material has been created and a first talk has been given: http://www.slideshare.net/liangchen18/apache-carbondatanew-high-performance-data-format-for-faster-data-analysis How has the project developed since the last report? Code donation has been done and all resources have been created by INFRA (git, github mirror, mailing list, Jira, ...). We also created the Jenkins CI jobs. We are now in the process of cleaning up and polishing the build and legal aspects to prepare the first release. Date of last release: Not yet available When were the last committers or PMC members elected? 2016-07-15 Signed-off-by: [ ](carbondata) Henry Saputra [X](carbondata) Jean-Baptiste Onofre [X](carbondata) Uma Maheswara Rao G
Apache CarbonData is a new Apache Hadoop native file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data. CarbonData has been incubating since 2016-06-02. Three most important issues to address in the move towards graduation: 1. Finalize code cleanup and code 2. Prepare releases 3. Grow the community Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of? No How has the community developed since the last report? This is the first CarbonData report. How has the project developed since the last report? We created the resources (git, github integration, Jira, ...). The code donation has been done, and the first PR has been merged. We are in the process of creating the website (CWIKI requested) and the Jenkins CI jobs. Date of last release: XXXX-XX-XX When were the last committers or PMC members elected? N/A Signed-off-by: [X](carbondata) Henry Saputra [X](carbondata) Jean-Baptiste Onofre [ ](carbondata) Uma Maheswara Rao G Shepherd/Mentor notes: Jean-Baptiste Onofre: This report is the first CarbonData report. The activity is focused on resource creation.