This was extracted (@ 2020-10-21 21:10) from a list of minutes which have been approved by the Board.
Please note: the Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

Gobblin

21 Oct 2020

Report was filed, but display is awaiting the approval of the Board minutes.

15 Jul 2020

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Review of maturity model and associated tasks (in progress).
 2. Address gaps identified on whimsy, podling namesearch (in progress).

### Are there any issues that the IPMC or ASF Board need to be aware of?
 No.

### How has the community developed since the last report?
 - Email stats since last report: dev@gobblin.incubator.apache.org : 410
 (May), 561 (June)
 - There have been 64 Commits since last report: git log --format='%ci' |
 grep -cE '((2020-0(5|6)))'
 - 41 ie. 64% of those commits were by non-committers: git log
 --format='%ae %ci' | grep -E '((2020-0(5|6)))' | cut -d ' ' -f 1 | sort |
 uniq -c | sort -n
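
 The two pipelines above can also be reproduced in a few lines of Python.
 This is only a sketch under stated assumptions: the report does not say how
 committers were told apart from non-committers, so the `committer_emails`
 set, the function name `commit_stats`, and the example addresses are all
 hypothetical. The month pattern is matched against the date field only,
 which is slightly stricter than the `grep` over the whole line.

```python
import re
from collections import Counter


def commit_stats(log_lines, month_pattern, committer_emails):
    """Summarise `git log --format='%ae %ci'` output for the given months.

    log_lines        -- iterable of "<author email> <commit date>" lines
    month_pattern    -- regex for the reporting months, e.g. r"2020-0(5|6)"
    committer_emails -- assumed, hand-maintained set of committer addresses
    Returns (total commits, non-committer commits, non-committer percent).
    """
    month_re = re.compile(month_pattern)
    per_author = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # First field is the author email, the rest is the commit date.
        email, _, date = line.partition(" ")
        if month_re.match(date):
            per_author[email] += 1

    total = sum(per_author.values())
    non_committer = sum(
        count for email, count in per_author.items()
        if email not in committer_emails
    )
    percent = round(100 * non_committer / total) if total else 0
    return total, non_committer, percent


# Hypothetical sample input; real input would come from `git log`.
sample = [
    "alice@example.org 2020-05-03 10:00:00 -0700",
    "bob@example.org 2020-06-11 09:30:00 +0200",
    "alice@example.org 2020-06-20 14:00:00 -0700",
    "carol@example.org 2020-04-28 08:00:00 +0000",
]
print(commit_stats(sample, r"2020-0(5|6)", {"alice@example.org"}))
```

 Real input could be collected with
 `subprocess.run(["git", "log", "--format=%ae %ci"], capture_output=True, text=True).stdout.splitlines()`.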

### How has the project developed since the last report?
 - Owen O'Malley joined the Gobblin community as a mentor.
 - Discussion about graduation has started, and community is working
 towards it.
 - Two PPMC members were voted in.
 - Work on new release has started.

 On technical side:
 - Compaction suite was revamped to make action configurable.
 - Flow remove feature for Spec executors was added.
 - LogCopier was improved for long running jobs.
 - New API for proxy users in Azkaban.
 - Support for common properties in Helix job scheduler.
 - Hive Distcp supports filtering on partitioned or snapshot tables.
 - Generic wrapper producer client added for Kafka.
 - Autocommit added in JDBCWriters.
 - Metrics added in all SpecStore implementations.
 - Support in GobblinYarnAppLauncher to detach from Yarn app.
 - Support for overprovisioning Gobblin Yarn containers.
 - Enabled dataset cleaner to emit Kafka events.
 - Several other enhancements and bug fixes.

### How would you assess the podling's maturity?
 Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09 (work on new release has started)

### When were the last committers or PPMC members elected?

 Tamás Németh and Sudarshan Vasudevan for PPMC in June, 2020.

### Have your mentors been helpful and responsive?
 Yes.

### Is the PPMC managing the podling's brand / trademarks?
 Yes, but we have to perform podling name search.

### Signed-off-by:

 - [X] (gobblin) Jean-Baptiste Onofre
    Comments:
 - [ ] (gobblin) Olivier Lamy
    Comments:
 - [ ] (gobblin) Owen O'Malley
    Comments:

### IPMC/Shepherd notes:

20 May 2020

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Revisit Apache Maturity Model assessment. [In progress since last
 report]
 2. Complete house-keeping tasks like revamp website, podling namesearch.
 [In progress since last report]

### Are there any issues that the IPMC or ASF Board need to be aware of?

 Yes, we were asked to report again this month since our mentors couldn't
 sign off the report. We would recommend that the IPMC or ASF Board
 establish a documented process for this situation.

### How has the community developed since the last report?

 * Email stats since last report: dev@gobblin.incubator.apache.org : 504
 (April), 79 (May, so far)
 * There have been 30 Commits since last report: git log --format='%ci' |
 grep -cE '((2020-0(4|5)))'
 * 17 ie. 56% of those commits were by non-committers: git log
 --format='%ae %ci' | grep -E '((2020-0(4|5)))' | cut -d ' ' -f 1 | sort |
 uniq -c | sort -n

### How has the project developed since the last report?
 * Support for common job properties in Helix job scheduler
 * New API for getting list of proxy users from Azkaban project
 * New API for adding proxy user to Azkaban project
 * Refresh capability in LogCopier for long running job use-cases
 * Back flow remove feature for Spec executors in DAG manager
 * Support for complete action configuration in Compaction suite
 * New metrics to measure job status state store performance
 * Orchestration delay reporter for Gobblin service flows
 * Dependency version upgrades for Helix, ORC, MySQL
 * Bug fixes in YarnService to use new token for new containers
 * Enhance HelixManager to reinitialize when Helix participant check happens
 * Enable close-on-flush for quality checker
 * Enable record count verification for ORC format
 * Add flow level data movement authorization in GaaS
 * OrcValueMapper schema evolution up-conversion support
 * Multiple bug fixes and optimizations

### How would you assess the podling's maturity?
 Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09

### When were the last committers or PPMC members elected?
 Kuai Yu in January 2020 and Lei Sun in February 2020

### Have your mentors been helpful and responsive?
 Yes, but they missed signing off on the last quarterly report.

### Is the PPMC managing the podling's brand / trademarks?
 Yes, but we have to perform podling namesearch.

### Signed-off-by:

 - [X] (gobblin) Jean-Baptiste Onofre
    Comments:  I think the podling is close to graduation. Maybe worth to
    start a discussion.
 - [ ] (gobblin) Olivier Lamy
    Comments:
 - [ ] (gobblin) Jim Jagielski
    Comments:

### IPMC/Shepherd notes:
 Justin Mclean: If a report doesn't get sign off you need to report next
 month. This is documented incubator policy. I suggest you reach out to
 your mentors if you don't see sign-off on your report. The IPMC also
 notifies mentors of late reports or reports without sign offs.

15 Apr 2020

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Revisit Apache Maturity Model assessment. [In progress since last
 report]
 2. Complete house-keeping tasks like revamp website, podling namesearch.
    [In progress since last report]

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No.

### How has the community developed since the last report?
 * New committers Lei Sun (lesun) and Kuai Yu (kuyu).
 * Email stats since last report: user@gobblin.incubator.apache.org : 9
 dev@gobblin.incubator.apache.org : 1689
 * There have been 76 Commits since last report: git log
   --format='%ci' | grep -cE '((2020-0(1|2|3)))'
 * 43 ie. 56% of those commits were by non-committers: git log
   --format='%ae %ci' | grep -E '((2020-0(1|2|3)))' | cut -d ' ' -f 1 |
   sort | uniq -c | sort -n

### How has the project developed since the last report?
 * Handle orphaned Yarn containers in Gobblin-on-Yarn clusters
 * Track and report histogram of observed lag from Gobblin Kafka pipeline
 * Refresh flowgraph when templates are modified
 * HighLevelConsumer re-design by removing references to ConsumerConnector
 and KafkaStream
 * Add SFTP DataNode type in Gobblin-as-a-Service
 * Optimize unnecessary RPCs in distcp-ng
 * Supporting Avro logical type recognition in Avro-to-ORC transformation
 * Support for direct Avro and Protobuf formats through Parquet writer

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09

### When were the last committers or PPMC members elected?
 Kuai Yu in January 2020 and Lei Sun in February 2020

### Have your mentors been helpful and responsive?
 Yes.

### Is the PPMC managing the podling's brand / trademarks?
 Yes, but we have to perform podling namesearch.

### Signed-off-by:

 - [ ] (gobblin) Jean-Baptiste Onofre
    Comments:
 - [ ] (gobblin) Olivier Lamy
    Comments:
 - [ ] (gobblin) Jim Jagielski
    Comments:

### IPMC/Shepherd notes:

15 Jan 2020

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Revisit Apache Maturity Model assessment. [In progress since last
 report]
 2. Ensure heavy contributors are awarded committership. [In progress
 since last report]
 3. Complete house-keeping tasks like revisiting website, podling
 namesearch. [In progress since last report]

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No.

### How has the community developed since the last report?

 * 84% of commits were from non-committer contributors. (Active
 contributors are being discussed for being voted as committers)
 * Healthy engagement and activity of committers and contributors.
 * Email stats since last report: user@gobblin.incubator.apache.org : 23
 dev@gobblin.incubator.apache.org : 2010
 * There have been 94 Commits since last report: git log --format='%ci' |
 grep -cE '((2019-1(0|1|2)))'
 * 79 ie. 84% of those commits were by non-committers: git log
 --format='%ae %ci' | grep -E '((2019-1(0|1|2)))' | cut -d ' ' -f 1 |
 sort | uniq -c | sort -n

### How has the project developed since the last report?
 * Add support to deploy GaaS in Azure.
 * Converter to eliminate recursion in Avro schemas.
 * Make token refresh mechanism pluggable for long running Gobblin-on-Yarn
 applications.
 * Refactor code for reporting Kafka Extractor stats to allow greater
 reuse.
 * Add support in GaaS to recognize Http and Hive based datasets.
 * Add multi-dataset support in GaaS to allow movement of multiple
 datasets in a single flow.
 * Add support to recognize datasets with Unix timestamp based versions
 for file based distcp.
 * Custom progress reporting from jobs running in MR mode to enable
 speculative execution.
 * Source-based PK chunking for the Salesforce connector to use a single
 PK chunking query to improve chunk distribution and conserve batch API
 calls.
 * Parquet support for complex types, covering both Apache Parquet and
 Twitter Parquet.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09

### When were the last committers or PPMC members elected?

 Sudarshan Vasudevan in January, 2019. (Active contributors are being
 discussed for being voted as committers)

### Have your mentors been helpful and responsive?
 Yes.

### Is the PPMC managing the podling's brand / trademarks?
 Yes.

### Signed-off-by:

 - [X] (gobblin) Jean-Baptiste Onofre
    Comments:
 - [ ] (gobblin) Olivier Lamy
    Comments:
 - [X] (gobblin) Jim Jagielski
    Comments:

### IPMC/Shepherd notes:

16 Oct 2019

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Revisit Apache Maturity Model assessment. [In progress since last
 report]
 2. Ensure heavy contributors are awarded committership. [In progress
 since last report]
 3. Complete house-keeping tasks like revisiting website, podling
 namesearch. [In progress since last report]

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No.

### How has the community developed since the last report?

 *  65% of commits were from non-committer contributors. (Active
 contributors are being discussed for being voted as committers)
 *  Healthy engagement and activity of committers and contributors.
 *  Email stats since last report: user@gobblin.incubator.apache.org : 14
 dev@gobblin.incubator.apache.org : 1426
 *  There have been 101 Commits since last report: git log --format='%ae
 %ci' | grep -E '((2019-(0|1)(4|5|6|7|0)))' | cut -d ' ' -f 1 | sort |
 uniq -c | sort -n
 *  66 ie. 65% of those commits were by non-committers: git log
 --format='%ae %ci' | grep -E '((2019-(0|1)(4|5|6|7|0)))' | cut -d ' ' -f
 1 | sort | uniq -c | sort -n
 *  Gobblin was presented in ApacheCon NA 2019. (jointly by Paypal and
 LinkedIn engineers).

### How has the project developed since the last report?

 *  Support for filtering and tagging job status in GaaS.
 *  General purpose UniversalKafkaSource, and enhanced metrics.
 *  Docker support for Gobblin.
 *  Revamped Gobblin launcher and setup process.
 *  Secure template support in GaaS.
 *  ORC schema evolution support in MR mode.
 *  Support for new Couchbase version connectors.
 *  Pluggable Workunit packer and size-estimators.
 *  Encryption support in SFDC connector.
 *  Addition of flow level SLAs.
 *  Dynamic config support for JobSpec, and DAG enhancements.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [x] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09

### When were the last committers or PPMC members elected?

 Sudarshan Vasudevan in January, 2019.
 (Active contributors are being discussed for being voted as committers)

### Have your mentors been helpful and responsive?

 Yes.

### Signed-off-by:

 - [X] (gobblin) Jean-Baptiste Onofre
    Comments:
 - [X] (gobblin) Olivier Lamy
    Comments:
 - [ ] (gobblin) Jim Jagielski
    Comments:

### IPMC/Shepherd notes:

17 Jul 2019

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

### Three most important unfinished issues to address before graduating:

 1. Revisit Apache Maturity Model assessment. [In progress]
 2. Ensure heavy contributors are awarded committership. [In progress]
 3. Complete house-keeping tasks like revisiting website, podling
 namesearch. [In progress]

### Are there any issues that the IPMC or ASF Board need to be aware of?

 No.

### How has the community developed since the last report?

 * 62% of commits were from non-committer contributors.
 * Healthy engagement and activity of committers and contributors.
 * Email stats since last report:
   user@gobblin.incubator.apache.org : 21
   dev@gobblin.incubator.apache.org : 1744
 * There have been 82 Commits since last report:
   git log --format='%ci' | grep -cE '((2019-0(4|5|6|7)))'
 * 51 ie. 62% of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '((2019-0(4|5|6|7)))'
   | cut -d ' ' -f 1 | sort | uniq -c | sort -n
 * Community's proposal to present in ApacheCon NA 2019 was accepted.
   (joint presentation by Paypal and LinkedIn engineers).

### How has the project developed since the last report?

 * Encryption support for Salesforce connector.
 * GobblinEventBuilder enhancements.
 * Metric reporter integration with dataset discovery.
 * Enhancement to RateBasedLimiter.
 * Dynamic config support in JobSpecs.
 * GaaS disaster recovery mode skeleton.
 * Addition of MySQL based DAG State store.
 * New filesystem based SpecProducer.
 * Auto-scalability in Gobblin on Yarn mode.
 * Container request and allocation optimizations.
 * New SQL dataset descriptor for JDBC sourced datasets.
 * Speculative safety checks in HiveWritable writer.
 * New Async loadable FlowSpecs.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 2018-12-09

### When were the last committers or PPMC members elected?

 Sudarshan Vasudevan in January, 2019.

### Have your mentors been helpful and responsive?

 Yes.

### Signed-off-by:

 - [X] (gobblin) Jean-Baptiste Onofre
    Comments:
 - [ ] (gobblin) Olivier Lamy
    Comments:
 - [ ] (gobblin) Jim Jagielski
    Comments:

### IPMC/Shepherd notes:

17 Apr 2019

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 Nothing at this time.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No.

How has the community developed since the last report?

 * 53% of commits were from non-committer contributors.
 * Another committer was voted in, building a healthy cadence of
   contributors stepping up and being voted in as committers.
 * Email stats since last report:
   user@gobblin.incubator.apache.org : 30
   dev@gobblin.incubator.apache.org : 692
 * There have been 53 Commits since last report:
   git log --format='%ci' | grep -cE '((2019-0(1|2|3|4)))'
 * 28 ie. 53% of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '((2019-0(1|2|3|4)))'|
   cut -d ' ' -f 1 | sort | uniq -c | sort -n
 * After presenting at ApacheCon NA 2018 and CrunchConf Budapest 2018, the
   community is planning to present at ApacheCon NA 2019 and ApacheCon EU
   2019.

How has the project developed since the last report?

 * Enhancement to GaaS scheduler (more features like query for last k
   flow executions, explain query, auto state store cleanup,
   Azkaban client improvement, etc.).
 * Watermark manager improvements for streaming use-cases.
 * Lineage support for filesystem based sources.
 * Job catalog memory usage optimizations.
 * New versioning strategy for config based datasets in Distcp.
 * Dynamic mappers support.
 * Pluggable format-specific components in Gobblin compaction.
 * ORC based Gobblin compaction.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2018-12-09

When were the last committers or PPMC members elected?

 Sudarshan Vasudevan in January, 2019.

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Yes.

Signed-off-by:

 [X](gobblin) Jean-Baptiste Onofre
    Comments:
 [X](gobblin) Olivier Lamy
    Comments: Very healthy project with a lot of activities!
 [X](gobblin) Jim Jagielski
    Comments:

IPMC/Shepherd notes:

16 Jan 2019

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 Nothing at this time.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No.

How has the community developed since the last report?

 * 92% of commits were from non-committer contributors.
 * Email stats since last report:
   user@gobblin.incubator.apache.org : 27
   dev@gobblin.incubator.apache.org : 218
 * There have been 61 Commits since last report:
   git log --format='%ci' | grep -cE '((2018-1(0|1|2))|(2019-01))'
 * 56 ie. 92% of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '((2018-1(0|1|2))|(2019-01))'|
   cut -d ' ' -f 1 | sort | uniq -c | sort -n
 * Recurring video conference based meet-up has been happening every
   month with a healthy attendance.
 * After ApacheCon NA 2018, Gobblin was also presented in CrunchConf
   Budapest 2018, and has independently been featured in various
   meet-ups / conferences around the world.

How has the project developed since the last report?

 * Multi-hop support in Gobblin-as-a-Service with in built workflow
 manager.
 * Multicast through Multi-hop flow compiler.
 * Gobblin-as-a-Service integration with Azkaban.
 * New Elasticsearch writer integration.
 * Optimized block level distcp-ng copy support.
 * HOCON support for flow requests to GaaS.
 * Ability to fork jobs when concatenating Dags
 * ServiceManager to manage GitFlowGraphMonitor in multihop flow compiler.
 * Distributed job launcher with Helix tagging support.
 * Several more enhancements and feature add-ons.
   Full list across last two releases.
 * Release 0.14.0 done.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2018-12-09

When were the last committers or PPMC members elected?

 Tamas Nemeth in November, 2018.

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Yes.

Signed-off-by:

 [X](gobblin) Jean-Baptiste Onofre
    Comments:
 [ ](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

IPMC/Shepherd notes:
 Justin Mclean: Given that 92% of commits are from non-committers, why does
 the project not vote more committers in? I can only see one committer
 voted in in the previous year. For a project nearing graduation I'd expect
 to see a lot more people voted in. I also don't see what is discussed in the
 video conferences being brought back to the list.

17 Oct 2018

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 Nothing at this time.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No.

How has the community developed since the last report?

 * 74% of commits were from non-committer contributors.
 * Email stats since last report:
   user@gobblin.incubator.apache.org : 10
   dev@gobblin.incubator.apache.org : 243
 * There have been 61 Commits since last report:
   git log --format='%ci' | grep -cE '((2018-0(7|8|9))|(2018-10))'
 * 45 ie. 74% of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '((2018-0(7|8|9))|(2018-10))'|
   cut -d ' ' -f 1 | sort | uniq -c | sort -n
 * Recurring video conference based meet-up has been happening every
   month with a healthy attendance.
 * Gobblin had a presentation in ApacheCon NA 2018, and has independently
   been featured in various meet-ups / conferences around the world.

How has the project developed since the last report?

 * Gobblin's evolution as Platform-as-a-Service is near GA - driven by
   a couple of non-committers.
 * Comprehensive work to stabilize Gobblin cluster at extreme scale by
   non-committer contributor.
 * Streaming pipeline simplification and enhancements.
 * New ElasticSearch support.
 * Gobblin - Azkaban integration.
 * Job quotas in Gobblin cluster mode through Apache Helix.
 * Couchbase integration enhancement.
 * New optimized Config store implementation.
 * Block level distcp-ng in progress.
 * Release 0.13.0 done.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [x] Nearing graduation
 [ ] Other:

Date of last release:

 2018-09-20

When were the last committers or PPMC members elected?

 Joel Baranick in December, 2017.

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Yes.

Signed-off-by:

 [X](gobblin) Jean-Baptiste Onofre
    Comments: Great presentation at ApacheCon (which convince me again to
    contribute on the codebase).
 [X](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

IPMC/Shepherd notes:

18 Jul 2018

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

   1. Make frequent releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No

How has the community developed since the last report?

* Various major components and futuristic features are being driven by the
 community (non-committers) thus building a very healthy pool of
 contributors that can be voted in as committers.
* Continued growth in engagement over Gitter IRC, and mailing lists.
* 79% of commits (a record in Gobblin community) were from non-committer
 contributors.
* Email stats since last report:
 user@gobblin.incubator.apache.org : 44
 dev@gobblin.incubator.apache.org : 200
* There have been 66 Commits since last report:
   git log --format='%ci' | grep -cE '(2018-0(4|5|6|7))'
* 52 ie. 79% of those commits were by non-committers:
  git log --format='%ae %ci' | grep -E '(2018-0(4|5|6|7))' | cut -d ' ' -f 1
  | sort | uniq -c | sort -n
* Recurring video conference based meetup has been happening every month
 with a healthy attendance.
* Gobblin was presented and well received in various meetups / conferences
 around the world (independently by Apache community members).

How has the project developed since the last report?

* Major progress in Gobblin's evolution as Platform-as-a-Service - being
 driven by a couple of non-committers.
* Comprehensive work being driven by a non-committer for stability of
 Gobblin cluster at extreme scale.
* Enhancements to key integrations such as Salesforce, Couchbase, Kafka,
 etc.
* Addition of features for compliance and security. Increased adoption in
 this area by the community (for critical use-cases such as GDPR).
* Release 0.12.0 done.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [x] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 2018-07-02

When were the last committers or PPMC members elected?

 Joel Baranick in December, 2017.

Signed-off-by:

 [X](gobblin) Jean-Baptiste Onofre
    Comments:
 [X](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

IPMC/Shepherd notes:

18 Apr 2018

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 1. Make frequent releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 None

How has the community developed since the last report?

* Gobblin community has continued to grow and engage more (on mailing lists
 and Gitter IRC).
* 51% of commits have been from non-committer contributors.
* Email stats since last report:
 user@gobblin.incubator.apache.org : 47 dev@gobblin.incubator.apache.org :
 694
* There have been 121 Commits since last report:
   git log --format='%ci' | grep -cE '(2018-0(1|2|3|4))'
* 62 ie. 51% of those commits were by non-committers:
  git log --format='%ae %ci' | grep -E '(2018-0(1|2|3|4))' | cut -d ' ' -f 1
  | sort | uniq -c | sort -n
* Recurring video conference based meetup has been happening every month with
 healthy attendance.
* Gobblin was presented and well received in various meetups / conferences
 around the world (independently by Apache community members) eg. Strata etc.

How has the project developed since the last report?

* Various new connectors for integration with more systems, and several
 enhancements / feature development.
* Continued development of Gobblin-as-a-Service (PaaS for Gobblin as well as
 non-Gobblin systems). More engagement of community on this front.
* Enhancements to website, and packaging / distribution of Gobblin.
* Release v0.12.0 is being voted on right now.

How would you assess the podling's maturity?

 [ ] Initial setup
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 v0.12.0 is being voted on right now.

When were the last committers or PPMC members elected?

 Joel Baranick in December, 2017.
 (A few more contributors in the community are ready to be elected.)

Signed-off-by:

 [X](gobblin) Jean-Baptiste Onofré
    Comments:
 [X](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments: release process being better defined
  the 0.12 RC efforts. NOTICE requirements better understood.

17 Jan 2018

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 1. Make frequent releases

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 None

How has the community developed since the last report?

* Gobblin has seen an exciting growth on the community front. It has grown
 into a diverse self-sustained community, where non-committer members are
 often seen helping out each other on mailing lists and Gitter IRC (on most
 days more than the committers). Many contributors have also stepped up and
 contributed with important features and taken up ownership of critical
 components.
* 70% of commits have been from non-committer contributors.
* Email stats since last report: user@gobblin.incubator.apache.org : 92
 dev@gobblin.incubator.apache.org : 671
* Heavy activity on Gitter IRC channel (while the community uses Gitter IRC,
 it also does self policing and consciously moves any discussion-thread
 beyond casual chatter to the mailing lists)
* There have been 148 commits since last report:
   git log --format='%ci' | grep -cE '(2017-0(9))|(2017-1(0|1|2)|(2018-0(1)))'
* 103 (i.e. 70%) of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '(2017-0(9))|(2017-1(0|1|2)|(2018-0(1)))' | cut -d ' ' -f 1 | sort | uniq -c | sort -n
* A recurring video-conference meetup has been held every month with
 healthy attendance.
* Gobblin was presented and well received at various conferences, e.g.
 Strata.
* More companies have adopted Gobblin, and various PPMC members have
 received positive feedback and interest.
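
The non-committer share quoted in the list above (103 of 148 commits, about
70%) is read off the per-author counts printed by the second git log
pipeline. A minimal Python sketch of that calculation follows, under stated
assumptions: the function name non_committer_share, the committer_emails
argument and any addresses passed to it are hypothetical, and the report does
not say how committers were told apart from other authors. Unlike the grep
pattern, which matches the month anywhere in the line, the sketch only
inspects the date field.

```python
from collections import Counter


def non_committer_share(log_lines, months, committer_emails):
    """Summarise lines shaped like `git log --format='%ae %ci'` output.

    log_lines: iterable of strings such as
        "someone@example.org 2017-09-03 10:00:00 -0700"
    months: set of "YYYY-MM" prefixes that make up the reporting period
    committer_emails: set of committer addresses (hypothetical input; the
        report does not describe how this list was obtained)

    Returns (total_commits, non_committer_commits, rounded_percent).
    """
    committers = {email.lower() for email in committer_emails}
    per_author = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        email, date = parts[0].lower(), parts[1]
        if date[:7] in months:  # compare only the YYYY-MM part of the date
            per_author[email] += 1

    total = sum(per_author.values())
    outside = sum(n for email, n in per_author.items()
                  if email not in committers)
    percent = round(100 * outside / total) if total else 0
    return total, outside, percent
```

The lines would come from running the git log command inside a checkout of
the repository, for example via subprocess, with the period given as
{"2017-09", "2017-10", "2017-11", "2017-12", "2018-01"}.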

How has the project developed since the last report?

* Several powerful new features have been added that make Gobblin as
 valuable for stream processing as it is in the batch data world.
* Gobblin has, interestingly, started to evolve into an ecosystem rather than
 a singular platform, with the addition of major sub-systems such as
 Gobblin-as-a-Service (PaaS for Gobblin as well as non-Gobblin systems),
 Global Throttling (can be used with any distributed system) and the existing
 Gobblin metrics.
* Documentation and stability have improved across the board.
* Release v0.12.0 is being voted on right now.
* The Apache way has become the normal way of doing things.

How would you assess the podling's maturity?

Gobblin has made good progress on the community front and overall as a
project. However, before describing it as nearing graduation, we would like
to make at least a couple of releases.

 [ ] Initial setup
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 v0.12.0 is being voted on right now.

When were the last committers or PPMC members elected?

 Joel Baranick in December, 2017.
 (We have a few more strong contributors that we are looking to vote in soon)

Signed-off-by:

 [ ](gobblin) Jean-Baptiste Onofré
    Comments:
 [X](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

18 Oct 2017

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 1. Cut our first release.
 2. Elect new Committer(s) / PPMC.
 3. Update links on website and documentation.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

* 15+ major companies, startups, universities and research institutes are now using Gobblin (refer to the Powered-by section [1] here: https://gobblin.apache.org/ )
* Email stats for last month:
 user@gobblin.incubator.apache.org : 25
 dev@gobblin.incubator.apache.org : 163
* There have been 40 commits in the last month:
   git log --format='%ci' | grep -cE '2017-0(9)'
* 29 of those commits were by non-committers:
  git log --format='%ae %ci' | grep -E '2017-0(9)' | cut -d ' ' -f 1 | sort | uniq -c | sort -n
* Another video conference based meetup happened last month with good attendance and interest.
* We are continuing to work towards our first release.

[1] This data was collected before incubation via a survey. It was expanded to include more companies as and when requested by respective contributors.

How has the project developed since the last report?

* Continued active development.
* Progress continues to be tracked via JIRA / Sprint dashboard.

How would you assess the podling's maturity?

There is all-around progress, and the podling is working towards its first release.

 [ ] Initial setup
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 N/A

When were the last committers or PPMC members elected?

 N/A

Signed-off-by:

 [ ](gobblin) Jean-Baptiste Onofré
    Comments:
 [ ](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

IPMC/Shepherd notes:

 johndament: The podling has the right notion of next steps, website is probably the biggest area of work needed.

20 Sep 2017

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 1. Cut our first release

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

* We are working towards our first release.
* Email stats for last month:
 user@gobblin.incubator.apache.org : 14
 dev@gobblin.incubator.apache.org : 259
* There have been 54 commits in the last month:
   git log --format='%ci' | grep -cE '2017-0(8)'
* 30 of those commits were by non-committers:
  git log --format='%ae %ci' | grep -E '2017-0(8)' | cut -d ' ' -f 1 | sort | uniq -c | sort -n
* A video conference based meetup happened last month.

How has the project developed since the last report?

* The site has been set up.
* Apache wiki has been populated with relevant content.
* Code development is actively being tracked via JIRA / Sprint dashboard.

How would you assess the podling's maturity?

The podling is working towards its first release. As in the last report, there is continued progress and activity on all fronts.

 [ ] Initial setup
 [X] Working towards first release
 [X] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 N/A

When were the last committers or PPMC members elected?

 N/A

Signed-off-by:

 [ ](gobblin) Jean-Baptiste Onofré
    Comments: Waiting the board report. I will help/ping.
 [X](gobblin) Olivier Lamy
    Comments:
 [ ](gobblin) Jim Jagielski
    Comments:

16 Aug 2017

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

Three most important issues to address in the move towards graduation:

 1. Cut our first release

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 We are at the very first steps of the project.

How has the project developed since the last report?

* The code has now been migrated to the Apache Git infrastructure
* Issues have been migrated to the Apache Jira infrastructure
* Site infrastructure has been created (now working on importing the content)
* Discussion on setting up a Jenkins build

How would you assess the podling's maturity?

The podling is still at an early stage, but a lot of progress has been made recently.

 [ ] Initial setup
 [X] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 N/A

When were the last committers or PPMC members elected?

 N/A

Signed-off-by:

 [X](gobblin) Olivier Lamy
    Comments:
 [ ](gobblin) Jean-Baptiste Onofre
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

21 Jun 2017

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

A few first steps have been made:

* mailing list setup
* jira setup
* a few Apache accounts created for new committers.

Three most important issues to address in the move towards graduation:

 1. Code import. Still need agreement from LinkedIn/Microsoft

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 None

How has the community developed since the last report?

 We are at the very first steps of the project.

How has the project developed since the last report?

 Not much. We are waiting for the code donation before we start building the community.


How would you assess the podling's maturity? Please feel free to add your own
commentary.

 [X] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 N/A

When were the last committers or PPMC members elected?

 N/A

Signed-off-by:

 [X](gobblin) Olivier Lamy
    Comments:
 [ ](gobblin) Jean-Baptiste Onofre
    Comments:
 [X](gobblin) Jim Jagielski
    Comments:

19 Apr 2017

Gobblin is a distributed data integration framework that simplifies common
aspects of big data integration such as data ingestion, replication,
organization and lifecycle management for both streaming and batch data
ecosystems.

Gobblin has been incubating since 2017-02-23.

A few first steps have been made:

* mailing list setup
* jira setup
* a few Apache accounts created for new committers.

Three most important issues to address in the move towards graduation:

 1. Code import. Still need agreement from LinkedIn/Microsoft

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 None

How has the community developed since the last report?

 We are at the very first steps of the project.

How has the project developed since the last report?

 First report :-)

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [X] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 N/A

When were the last committers or PPMC members elected?

 N/A

Signed-off-by:

 [X](gobblin) Olivier Lamy
    Comments:
 [X](gobblin) Jean-Baptiste Onofre
    Comments:
 [X](gobblin) Jim Jagielski
    Comments: