The Apache Software Foundation

This was extracted (@ 2020-11-18 21:10) from a list of minutes which have been approved by the Board.
Please note: The Board typically approves the minutes of the previous meeting at the beginning of every Board meeting; therefore, the list below does not normally contain details from the minutes of the most recent Board meeting.

Meeting times vary; the exact schedule is available to ASF Members and Officers. Search for "calendar" in the Foundation's private index page (svn:foundation/private-index.html).

DataSketches

18 Nov 2020

Report was filed, but display is awaiting the approval of the Board minutes.

19 Aug 2020

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.
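
To make the idea of a sketch concrete, the following is a minimal illustrative example of one classic technique, a K-Minimum-Values (KMV) distinct-count estimator. It is not the DataSketches library's API or implementation; the class name `KMVSketch`, its methods, and the default `k` are hypothetical and chosen only for illustration. The sketch keeps at most `k` hash values no matter how long the stream is, and estimates the number of distinct items from the k-th smallest hash.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values sketch for approximate distinct counting.

    Illustrative only: this is NOT the DataSketches API.
    """

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes, so heap[0] is the largest retained hash
        self._members = set()  # hashes currently retained, for duplicate detection

    @staticmethod
    def _hash01(item):
        # Map an item to a pseudo-uniform float in [0, 1) using SHA-256.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        h = self._hash01(item)
        if h in self._members:
            return  # duplicate of a retained item: state is unchanged
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(50000):
        sketch.update(f"user-{i}")
        sketch.update(f"user-{i}")  # duplicates do not change the estimate
    print(round(sketch.estimate()))
```

Memory use is bounded by `k` regardless of stream size, and the relative error of this estimator shrinks roughly as 1/sqrt(k), which is the kind of accuracy-for-space trade-off the description above refers to.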

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Adding more committers. We have just added our first new committer
    since incubation! We have a few more individuals that have been
    consistent contributors to the project that we will soon want to
    go through the new committer election process. This is a big change
    from our last report where we had no candidates at all.
 2. Fill out the Maturity Model
 3. Prepare for Graduation.

### Are there any issues that the IPMC or ASF Board need to be aware of?

 We could use some help in finding people who would find working in
 the sketching algorithms area really interesting and would want
 to work with us to become committers.

### How has the community developed since the last report?

 The word is getting out! We presented talks at the USPTO 2020 tech
 conference and the Spark & AI 2020 conference, mentioned in the last
 report, with lots of good feedback.

 We will be co-authors in a tutorial on sketching technology at the
 upcoming ACM-KDD conference in August with one of the world's
 leading scientists in streaming algorithms and sketching.

 We have been invited to give a keynote talk at the upcoming
 DataCon2020 in Taiwan in early September.

 We have been accepted for a talk at ApacheCon again this year.

 We also are seeing a big increase in the number of single PRs coming
 from a number of different people, especially for our C++ components,
 which is very good news. This proves that there is growing
 interest in the project and there are folks out there that want to
 contribute to the project.

### How has the project developed since the last report?

 See the releases since the last report below.

 In addition we have made significant improvements to our website
 thanks to some external contributors!

 To the best of our knowledge all of our licensing and website issues
 have been addressed and have been implemented in formal releases or
 are in master-branch staging, awaiting the next release.

### How would you assess the podling's maturity?
 Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [X] Community building
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 - 2020-07-06 incubating-datasketches-hive 1.1.0
 - 2020-06-19 incubating-datasketches-cpp  2.0.0
 - 2020-05-07 incubating-datasketches-java 1.3.0

### When were the last committers or PPMC members elected?

 August, 2020

### Have your mentors been helpful and responsive?

 Yes, in general. However, we do have to prod them with reminders
 to check-off our releases. Our releases have been taking
 longer and longer to get through the voting process especially
 when it is in the 2nd IPMC phase. A little help here would
 be appreciated.

### Is the PPMC managing the podling's brand / trademarks?

 To the best of our knowledge, yes.

 * Are 3rd parties respecting and correctly using the podling's
   name and brand?

   As far as we know, yes.

 * If not what actions has the PPMC taken to correct this?

   We have not had to face this issue yet.

 * Has the VP, Brand approved the project name?

   Yes, and it is clearly stated as such on
   http://incubator.apache.org/projects/datasketches.html

### Signed-off-by:

 - [X] (datasketches) Liang Chen
    Comments:
 - [X] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:
 - [X] (datasketches) Dave Fisher
    Comments:
 - [X] (datasketches) Evans Ye
    Comments:

### IPMC/Shepherd notes:

20 May 2020

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a
stream and can provide approximate answers, with mathematical
guarantees, to computationally difficult queries orders-of-magnitude
faster than traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Clearly, the most important issue for us is to add more committers.
    From the Clutch and Podling Website reports, this is the last
    major issue for us.

    We have tried to encourage folks that ask questions or raise issues
    to get more involved, and we have one or two folks that have
    expressed interest in submitting PRs or even a new sketch. But,
    alas, none have followed through, yet.

    Developing sketch code is very tricky and understanding how these
    algorithms work, and the math and statistics behind them, is a hurdle
    for most people. Yet, we have been very clear that we are prepared to
    train someone to become a committer.  All we ask is that the
    candidate be open to learning about these fascinating algorithms and
    committed to work with us. We could use some active help from our
    Mentors or from the Board to help us find someone that would find
    this work interesting.

    I am convinced that there are folks in the greater Apache community
    that would really enjoy working on this library; we just need to
    discover who they are!

 2. Referring to last month's report, we have made progress in setting up
    TODO lists on our major sites: Java and C++. And we keep working
    away at these lists.  We have also improved our Downloads page and
    brought it up to Apache standards. I don't feel these should be
    issues for graduation.

### Are there any issues that the IPMC or ASF Board need to be aware of?

 The issue mentioned above. We could use some help in finding someone
 who would find working in the sketching algorithms area really
 interesting and would want to work with us to become a committer.

### How has the community developed since the last report?

 We have been accepted to present at two conferences this Summer, the
 USPTO technology conference and the Spark & AI conference.

 We also have interest from Apache Flink and Apache Impala to
 integrate sketches into their systems. There has also been interest
 from Apache Beam, but so far no action.

### How has the project developed since the last report?

 We have done a lot of work making the C++ code more robust and will
 likely have a major new release of the C++ library before this
 report is read by the Board.  We are also in the voting process for a
 new Java release that cleans up some licensing glitches and fixes
 a bug found by Druid.

 Our activity on Slack has increased quite a bit with
 interesting queries from all over.

 We also have done a lot of work on the website, adding content and
 improving navigation. The Community and Downloads pages are all new.
 Please have a look!

 We continue to improve our release process with more guided scripts
 and fix issues as we discover them.

### How would you assess the podling's maturity?
 Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [X] Community building -- this is a continuous, on-going effort
 - [X] Nearing graduation
 - [ ] Other:

### Date of last release:

 * 2020-01-26 Java release 1.2.0-incubating.
 * The Java 1.3.0-incubating release will be out before the Board
   meeting.
 * A new C++ 2.0.0-incubating release may be out before
   the Board meeting.

### When were the last committers or PPMC members elected?

 No new committers since April, 2019.

### Have your mentors been helpful and responsive?

 Yes. No open issues.

### Is the PPMC managing the podling's brand / trademarks?

  To the best of our knowledge, yes.

 * Are 3rd parties respecting and correctly using the podling's name and
  brand?

  As far as we know, yes.

 * If not what actions has the PPMC taken to correct this?

  We have not had to face this issue yet.

 * Has the VP, Brand approved the project name?

  Yes, and it is clearly stated as such on
  http://incubator.apache.org/projects/datasketches.html

### Signed-off-by:

 - [X] (datasketches) Liang Chen
    Comments:
 - [ ] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:
 - [X] (datasketches) Dave Fisher
    Comments:
 - [X] (datasketches) Evans Ye
    Comments:

### IPMC/Shepherd notes:
 Justin Mclean: Perhaps one way of attracting more interest is to have more
 conversation on the mailing list?

19 Feb 2020

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Be more communicative and document our code changes more clearly.
 2. We need to have more substantive discussions on dev@ especially about
    our growing TODO list and how we plan to address them -- create a
    roadmap as a guide for others to contribute.
 3. Find / Attract new code committers outside Yahoo!

### Are there any issues that the IPMC or ASF Board need to be aware of?
 No

### How has the community developed since the last report?
 We are presenting at more conferences which has attracted some interest.
 We are definitely getting more traffic on our forum, GitHub issues
 and email lists.  We recently added two channels on the-asf@slack:
 #datasketches and #datasketches-dev. The traffic has been fairly low on
 Slack as well as the forum. We could do more to publicize the slack
 channels.  I could be optimistic and believe the low traffic is due to
 the holidays -- or that the code just works :)

 Nonetheless, the download traffic measured by repository.a.o
 has grown exponentially since our first Apache release on Sep 23. We are
 over 1000 unique IPs/month and had a recent high of 22K downloads/month.
 Bear in mind that this is all traffic that has migrated from the older,
 pre-Apache artifacts at com.yahoo.datasketches and is already higher than
 our peak downloads prior to Apache. These numbers also do not reflect any
 downloads of our Zip artifacts from a.o./dist (which includes our C++
 artifacts) or other external download repositories (for example, specific
 to PostgreSQL).

### How has the project developed since the last report?
 Our releases are becoming easier, more polished and routine.
 Nonetheless, our website needs a lot of work (as mentioned above) and
 this will become our focus for the next month or so.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [X] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:
 These are the major components and their last release dates:

 * DataSketches-Java       2020-01-26
 * DataSketches-Memory     2019-11-21
 * DataSketches-CPP        2019-09-17
 * DataSketches-Hive       2019-10-11
 * DataSketches-Pig        2019-10-18
 * DataSketches-Postgresql 2019-10-29

### When were the last committers or PPMC members elected?
 No new committers since April, 2019.

### Have your mentors been helpful and responsive?
 Yes.
 No open issues.

### Is the PPMC managing the podling's brand / trademarks?
 To the best of our knowledge, yes.

 * Are 3rd parties respecting and correctly using the podling's name and
 brand?
   As far as we know, yes.

 * If not what actions has the PPMC taken to correct this?
   We have not had to face this issue yet.

 * Has the VP, Brand approved the project name?
   Yes, and it is clearly stated as such on
   http://incubator.apache.org/projects/datasketches.html

### Signed-off-by:

 - [X] (datasketches) Liang Chen
    Comments:
 - [X] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:
 - [X] (datasketches) Dave Fisher
    Comments:
 - [X] (datasketches) Evans Ye
    Comments:

### IPMC/Shepherd notes:

20 Nov 2019

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Finish the transfer and bring-up of our website to
    github.com/apache/...  This is now in process.
 2. __Team Interactions:__ We want to have our exchanges on the ASF
    Slack DataSketches-dev channel posted to our dev@datasketches.a.o
    list on a daily basis for improved visibility and searchability.
    We have an open INFRA ticket on this issue.
    We are searching for a solution to provide more open access to
    our video conference sessions when we have them. We are in the
    process of moving more of our interactions into the slack
    DS-dev channel and dev@ list. This is a culture change for us
    and will take some getting used to. We clearly want open
    access to our team discussions.
 3. We would like to see a few more folks
    join our contributors list.  We have several folks that
    have come forward and offered help because they are interested
    in the project.  This is great.  It is our hope that they will
    grow into active contributors.

### Are there any issues that the IPMC or ASF Board need to be aware of?
 None

### How has the community developed since the last report?
 * We have added 1 new Mentor, Dave Fisher (thank you!) to our project
   and we have been approached by another Apache member
   who would also like to be a mentor, and eventually a contributor
   as well. This is very positive!

### How has the project developed since the last report?

 * We have now managed 7 releases,  6 Java releases and 1 C++ release.
   We have one more C++ release pending.  These are across 6 different
   components of the DataSketches library.  With the last pending C++
   release, all of the code components targeted for release will
   be complete.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [ ] Initial setup
 - [ ] Working towards first release
 - [X] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:

 2019-10-19  01:55 GMT DataSketches-pig

### When were the last committers or PPMC members elected?
 * Dave Fisher: 16 Sep 2019

### Have your mentors been helpful and responsive?
 * Helpful and responsive, Yes.
   Having additional mentors has helped the voting
   move forward more expeditiously!
 * I want to thank Dave Fisher for jumping in and helping us
   with a number of issues!


### Signed-off-by:

 - [X] (datasketches) Liang Chen
    Comments:
 - [x] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:
 - [X] (datasketches) Dave Fisher
    Comments:

### IPMC/Shepherd notes:

21 Aug 2019

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Our vote letter on general@ had no responses from anyone (not just
    IPMC members) for the first 73 hours. After sending a pleading
    reminder email I finally got 3 +1 binding votes. I'm trying to be
    polite and not needle folks, but I need guidance on how to get IPMC
    members' attention. I realize the vote must stay open for at least
    72 hours, but having to wait until the last minute to get any response
    is very aggravating. Would it be fair to send out reminder notices at
    24-hour intervals?
 2. Continue to perfect the release process.
 3. After we get this first release, we need to finish migrating the
    remaining repos.

### Are there any issues that the IPMC or ASF Board need to be aware of?
 1. Yes. In addition to #1 above, not all of our Mentors have been
    involved. Why do Mentors sign up if they do not or cannot mentor?

### How has the community developed since the last report?
 Not too much at the committer level. We have drawn the
 interest of a few new scientists in our work, but they did not
 learn of our work from Apache.
 It is still very early.  I am speaking at ApacheCon
 in September; hopefully we can attract some interest there.
 I am hoping to attract some committers.

### How has the project developed since the last report?
 We continue to evolve the project and make commits to the code base.
 We are also heavily integrated into the Druid platform.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [X] Initial setup
 - [X] Working towards next release
 - [ ] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:

 2019-08-02 Our First release of our first component!
 Thanks to: Kenneth Knowles, Furkan Kamaci, Paul King and
 Justin Mclean for their help.

### When were the last committers or PPMC members elected?
 When we entered incubation.

### Have your mentors been helpful and responsive?
 Two (of 3) of our Mentors have been responsive when they are not
 otherwise unavailable (vacation, work, etc.)

### Signed-off-by:

 - [X] (datasketches) Liang Chen
    Comments:
 - [ ] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:

### IPMC/Shepherd notes:
 Justin Mclean: 72 hours is a minimum and a podling may not attract
 all needed votes in that time. I understand it may be frustrating
 but remember IPMC members are volunteers and mostly do this work
 unpaid in their spare time. If you need more Mentors just ask on
 the incubator general list.

17 Jul 2019

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Complete a successful 1st snapshot release of Memory repo to DIST and
 Nexus. This is a blocking issue.
 2. Finish refactoring/snapshot releasing the other repos, which depend on
 #1.
 3. Move, refactor Website.

### Are there any issues that the IPMC or ASF Board need to be aware of?
 For the IPMC:
 As a newbie podling, my experience so far has been exasperating. Finding
 how to accomplish key tasks is difficult.  The information is spread all
 over and the essential details of how to actually
 accomplish tasks are often missing.

 I have run into multiple roadblocks, especially with regards to
 permissions. I have to keep filing new tickets with INFRA to set up access
 to infrastructure and they reply that the Mentors need to do this. When
 I ask on general@incubator, the replies I get suggest I need to file
 tickets with INFRA. So I am confused.

### How has the community developed since the last report?
 Not much. I wish I could spend more time on this, but I need to get
 the migration done.

### How has the project developed since the last report?
 We continue to evolve the project's functionality with commits to our
 GitHub repos.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [x] Initial setup
 - [x] Working towards first release
 - [ ] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:

 No releases yet.

### When were the last committers or PPMC members elected?
 At the initial incubation date.

### Have your mentors been helpful and responsive?

 1. I have opened INFRA issues that have not yet been addressed and there
 will be more to come.
 2. I could REALLY use some 1:1 help from an experienced release engineer
 (perhaps from another project) who is very familiar with the Apache/Maven
 release process and POM to get us off the ground.
 Once we have created our first release, we can continue from there. But
 getting this first one out is turning out to be quite a challenge.
 I don't think we need more than an hour with an experienced Apache
 release engineer; our project just isn't that complicated.
 3. I haven't heard from any of the mentors for the last week or so,
 perhaps they are all on vacation.

### Signed-off-by:

 - [ ] (datasketches) Liang Chen
    Comments:
 - [X] (datasketches) Kenneth Knowles
    Comments:
 - [ ] (datasketches) Furkan Kamaci
    Comments:

### IPMC/Shepherd notes:
 Justin Mclean: Please ask your mentors for help; they can set up most
 things or direct you to where you can get help. If your mentors can't help
 then ask on the incubator general list.

19 Jun 2019

 DataSketches is an open source, high-performance library of stochastic
 streaming algorithms commonly called "sketches" in the data sciences.
 Sketches are small, stateful programs that process massive data as a stream
 and can provide approximate answers, with mathematical guarantees, to
 computationally difficult queries orders-of-magnitude faster than
 traditional, exact methods.

 DataSketches has been incubating since 2019-03-30.

### Three most important unfinished issues to address before graduating:

 1. Finish code migration
 2. Set up automated builds
 3. Establish code review practices

### Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

 No

### How has the community developed since the last report?

 We are still in the process of setting up permissions and figuring out
 the Apache environment.

### How has the project developed since the last report?

 Most DataSketches repos have been moved to Apache repos.

### How would you assess the podling's maturity?
Please feel free to add your own commentary.

 - [X] Initial setup
 - [ ] Working towards first release
 - [ ] Community building
 - [ ] Nearing graduation
 - [ ] Other:

### Date of last release:

 No releases yet

### When were the last committers or PPMC members elected?

 We have just signed up our initial committers

### Have your mentors been helpful?

 Yes, very helpful.

### Signed-off-by:

 - [ ] (datasketches) Liang Chen
    Comments:
 - [X] (datasketches) Kenneth Knowles
    Comments:
 - [X] (datasketches) Furkan Kamaci
    Comments:

### IPMC/Shepherd notes:

15 May 2019

DataSketches is an open source, high-performance library of stochastic
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a stream
and can provide approximate answers, with mathematical guarantees, to
computationally difficult queries orders-of-magnitude faster than
traditional, exact methods.

DataSketches has been incubating since 2019-03-30.

Three most important unfinished issues to address before graduating:

 1. Finish IP Assignments
 2. Code Migration
 3. Perform a Release

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No

How has the community developed since the last report?

 We have the key committers signed up. We are all learning how
 to navigate in the Apache environment and how to find things.

How has the project developed since the last report?

 This is our first report.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [X] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [ ] Nearing graduation
 [ ] Other:

Date of last release:

 Our DataSketches.GitHub.io site is quite active as we are
 very active with new code and releases from this site.
 For example, our latest release of sketches-core was yesterday,
 25 April 2019.

 We are a long way from being able to release from the migrated
 Apache code base as it doesn't yet exist.

 XXXX-XX-XX

When were the last committers or PPMC members elected?

 We have just signed up our initial list of committers.

Have your mentors been helpful and responsive or are things falling
through the cracks? In the latter case, please list any open issues
that need to be addressed.

 Kenneth Knowles has been extremely helpful! Thank you!

Signed-off-by:

 [X](datasketches) Liang Chen
 Comments:
 [X](datasketches) Kenneth Knowles
 Comments: Initial set up has been a bit slow; that's on me
 [X](datasketches) Furkan Kamaci
 Comments:

IPMC/Shepherd notes: