Count: 64 PPMCs

Mean age: 418 days

Median age: 552 days

Currently in incubation, sorted by age

Project Description Sponsor (Champion) Mentors Start Date
Wave A wave is a hosted, live, concurrent data structure for rich communication. It can be used like email, chat, or a document. Incubator Upayavira 2010-12-04
ODF Toolkit Java modules that allow programmatic creation, scanning and manipulation of OpenDocument Format (ISO/IEC 26300 == ODF) documents Incubator Nick Burch, Yegor Kozlov 2011-08-01
Blur Blur is a search platform capable of searching massive amounts of data in a cloud computing environment. Incubator(Patrick Hunt) Doug Cutting, Patrick Hunt, Tim Williams 2012-07-24
Streams Apache Streams is a lightweight server for ActivityStreams. Incubator(Matt Franklin) Matt Franklin, Ate Douma, Craig McClanahan, Suneel Marthi 2012-11-20
MRQL MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, Spark, and Flink. Incubator(Edward J. Yoon) Alan Cabrera, Edward J. Yoon, Mohammad Nour El-Din 2013-03-13
BatchEE The BatchEE project aims to provide a JBatch implementation (aka JSR352) and a set of useful extensions for this specification. Incubator(FIXME) Jean-Baptiste Onofré, Olivier Lamy, Mark Struberg 2013-10-03
Sirona Monitoring Solution. Incubator(Olivier Lamy) Henri Gomez, Jean-Baptiste Onofre, Tammo van Lessen, Mark Struberg 2013-10-15
log4cxx2 Logging for C++. N.B. This is a reboot of the Log4cxx podling which previously graduated. Logging Services(Christian Grobmeier) Ralph Goers 2013-12-09
DataFu DataFu provides a collection of Hadoop MapReduce jobs and functions in higher level languages based on it to perform data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce. Incubator(Jakob Homan) Ashutosh Chauhan, Roman Shaposhnik, Ted Dunning 2014-01-05
Slider Slider is a collection of tools and technologies to package, deploy, and manage long running applications on Apache Hadoop YARN clusters. Incubator(Vinod K) Arun C Murthy, Devaraj Das, Jean-Baptiste Onofré, Mahadev Konar 2014-04-29
Taverna Taverna is a domain-independent suite of tools used to design and execute data-driven workflows. Incubator(Andy Seaborne) Andy Seaborne, Daniel J Debrunner, Marlon Pierce, Stian Soiland-Reyes, Suresh Marru, Suresh Srinivas 2014-10-20
HTrace HTrace is a tracing framework intended for use with distributed systems written in Java. Incubator(Roman Shaposhnik) Jake Farrell, Todd Lipcon, Lewis John McGibbney, Andrew Purtell, Billie Rinaldi, Michael Stack 2014-11-11
Tamaya Tamaya is a highly flexible configuration solution based on a modular, extensible and injectable key/value based design, which should provide a minimal but extendible modern and functional API leveraging SE, ME and EE environments. Incubator(David Blevins) John D. Ament, David Blevins 2014-11-14
SAMOA SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms that run on top of distributed stream processing engines (DSPEs). It features a pluggable architecture that allows it to run on several DSPEs such as Apache Storm, Apache S4, and Apache Samza. Incubator(Daniel Dai) Alan Gates, Ashutosh Chauhan, Enis Soztutar, Ted Dunning 2014-12-15
Myriad Myriad enables co-existence of Apache Hadoop YARN and Apache Mesos together on the same cluster and allows dynamic resource allocations across both Hadoop and other applications running on the same physical data center infrastructure. Incubator(Benjamin Hindman) Benjamin Hindman, Danese Cooper, Ted Dunning, Luciano Resende 2015-03-01
Singa Singa is a distributed deep learning platform. Incubator(Thejas Nair) Daniel Dai, Alan Gates, Ted Dunning, Thejas Nair 2015-03-17
Atlas Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the complete enterprise data ecosystem. Incubator(Jitendra Nath Pandey) Arun Murthy, Chris Douglas, Jakob Homan, Vinod Kumar Vavilapalli 2015-05-05
Trafodion Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Hadoop. Incubator(Michael Stack) Andrew Purtell, Devaraj Das, Enis Söztutar, Lars Hofhansl, Michael Stack 2015-05-24
FreeMarker FreeMarker is a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers. Incubator(Jacopo Cappellato) Jacopo Cappellato, Jean-Frederic Clere, David E. Jones, Ralph Goers, Sergio Fernández 2015-07-01
HAWQ HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum Database. Incubator(Roman Shaposhnik) Alan Gates, Konstantin Boudnik, Justin Erenkrantz, Thejas Nair, Roman Shaposhnik 2015-09-04
HORN HORN is a neuron-centric programming API and execution framework for large-scale deep learning, built on top of Apache Hama. Incubator(Edward J. Yoon) Luciano Resende, Edward J. Yoon 2015-09-04
MADlib Big Data Machine Learning in SQL for Data Scientists. Incubator(Roman Shaposhnik) Konstantin Boudnik, Ted Dunning, Roman Shaposhnik 2015-09-15
Rya Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that supports SPARQL queries. Rya is a scalable RDF data management system built on top of Accumulo. Rya uses novel storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes. Rya provides fast and easy access to the data through SPARQL, a conventional query mechanism for RDF data. Incubator(Adam Fuchs) Josh Elser, Edward J. Yoon, Venkatesh Seetharam, Billie Rinaldi 2015-09-18
Unomi Unomi is a reference implementation of the OASIS Context Server specification currently being worked on by the OASIS Context Server Technical Committee. It provides a high-performance user profile and event tracking server. Incubator(Jean-Baptiste Onofre) Bertrand Delacretaz, Jean-Baptiste Onofre 2015-10-05
Mynewt Mynewt is a real-time operating system for constrained embedded systems like wearables, lightbulbs, locks and doorbells. It works on a variety of 32-bit MCUs (microcontrollers), including ARM Cortex-M and MIPS architectures. Incubator(Marvin Humphrey) Sterling Hughes, Jim Jagielski, Justin Mclean, Greg Stein, P. Taylor Goetz 2015-10-20
SystemML SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations such as Apache Hadoop MapReduce and Apache Spark. Incubator(Luciano Resende) Luciano Resende, Patrick Wendell, Reynold Xin, Rich Bowen 2015-11-02
S2Graph S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support fast traversal of extremely large graphs. Incubator(Hyunsik Choi) Andrew Purtell, Seetharam Venkatesh, Sergio Fernández 2015-11-29
Toree Toree provides applications with a mechanism to interactively and remotely access Apache Spark. Incubator(Sam Ruby) Luciano Resende, Reynold Xin, Hitesh Shah, Julien Le Dem 2015-12-02
Impala Impala is a high-performance C++ and Java SQL query engine for data stored in Apache Hadoop-based clusters. Incubator(Tom White) Tom White, Todd Lipcon, Carl Steinbach, Brock Noland 2015-12-03
Metron Metron is a project dedicated to providing an extensible and scalable advanced network security analytics tool. It has strong foundations in the Apache Hadoop ecosystem. Incubator(Owen O'Malley) Billie Rinaldi, Chris Mattmann, Owen O'Malley, P. Taylor Goetz, Vinod Kumar Vavilapalli 2015-12-06
Fineract Fineract is an open source system for core banking as a platform. Incubator(Ross Gardler) Ross Gardler, Greg Stein, Roman Shaposhnik 2015-12-15
Milagro Distributed Cryptography; M-Pin protocol for Identity and Trust Incubator(Nick Kew) Sterling Hughes, Jan Willem Janssen, Nick Kew 2015-12-21
iota Open source system that enables the orchestration of IoT devices. Incubator(Hadrian Zbarcea) Daniel Gruno, Sterling Hughes, Justin Mclean, Hadrian Zbarcea 2016-01-20
Guacamole Guacamole is an enterprise-grade, protocol-agnostic, remote desktop gateway. Combined with cloud hosting, Guacamole provides an excellent alternative to traditional desktops. Guacamole aims to make cloud-hosted desktop access preferable to traditional, local access. Incubator(Jean-Baptiste Onofre) Jean-Baptiste Onofre, Daniel Gruno, Jim Jagielski, Greg Trasuk 2016-02-10
Joshua Joshua is a statistical machine translation toolkit. Incubator(Chris Mattmann) Paul Ramirez, Lewis John McGibbney, Chris Mattmann, Tom Barber, Henri Yandell 2016-02-13
Edgent Edgent is a stream processing programming model and lightweight runtime to execute analytics at devices on the edge or at the gateway. Incubator(Katherine Marsden) Daniel Debrunner, Luciano Resende, Katherine Marsden, Justin Mclean 2016-02-29
Mnemonic Mnemonic is a Java based non-volatile memory library for in-place structured data processing and computing. Incubator(Patrick Hunt) Patrick Hunt, Andrew Purtell, James Taylor, Henry Saputra 2016-03-03
Tephra Tephra is a system for providing globally consistent transactions on top of Apache HBase and other storage engines. Incubator(James Taylor) Alan Gates, Andrew Purtell, James Taylor, Lars Hofhansl 2016-03-07
Gearpump Gearpump is a reactive real-time streaming engine based on the micro-service Actor model. Incubator(Andrew Purtell) Andrew Purtell, Jarek Jarcec Cecho, Reynold Xin, Todd Lipcon, Xuefu Zhang 2016-03-08
Omid Omid is a flexible, reliable, high-performance and scalable ACID transactional framework that allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores (currently Apache HBase) providing Snapshot Isolation guarantees on the accessed data. Incubator(Daniel Dai) Alan Gates, Lars Hofhansl, Flavio P. Junqueira, Thejas Nair, James Taylor 2016-03-28
Quickstep Quickstep is a high-performance database engine. Incubator(Roman Shaposhnik) Konstantin Boudnik, Julian Hyde, Roman Shaposhnik 2016-03-29
Airflow Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. Incubator(Chris Riccomini) Chris Nauroth, Hitesh Shah, Jakob Homan 2016-03-31
Gossip Gossip is an implementation of the Gossip Protocol. Incubator(P. Taylor Goetz) P. Taylor Goetz, Josh Elser, Drew Farris 2016-04-28
Fluo Fluo is a distributed system for incrementally processing large data sets stored in Accumulo. Incubator(Billie Rinaldi) Billie Rinaldi, Drew Farris, Josh Elser 2016-05-17
PredictionIO PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks. Incubator(Andrew Purtell) Andrew Purtell, James Taylor, Lars Hofhansl, Luciano Resende, Xiangrui Meng, Suneel Marthi 2016-05-26
Pony Mail Pony Mail is a mail-archiving, archive-viewing, and interaction service that can be integrated with many email platforms. Incubator(Suneel Marthi) Andrew Bayer, John D. Ament 2016-05-27
CarbonData Apache CarbonData is a new Apache Hadoop native file format for faster interactive queries, using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data. Incubator(Jean-Baptiste Onofre) Henry Saputra, Jean-Baptiste Onofre, Uma Maheswara Rao G 2016-06-02
Pirk Pirk is a framework for scalable Private Information Retrieval (PIR). Incubator(Billie Rinaldi) Billie Rinaldi, Joe Witt, Josh Elser, Suneel Marthi, Tim Ellison 2016-06-17
DistributedLog DistributedLog is a high-performance replicated log service. It offers durability, replication and strong consistency, which provides a fundamental building block for building reliable distributed systems. Incubator(Flavio Junqueira) Flavio Junqueira, Chris Nauroth, Henry Saputra 2016-06-24
Juneau Apache Juneau is a toolkit for marshalling POJOs to a wide variety of content types using a common framework, and for creating sophisticated self-documenting REST interfaces and microservices using VERY little code. Incubator(John D. Ament) Craig Russell, Jochen Wiedmann, John D. Ament 2016-06-24
Traffic Control Traffic Control allows you to build a large scale content delivery network using open source. Incubator Phil Sorber, Eric Covener, Daniel Gruno, J. Aaron Farr 2016-07-12
SensSoft SensSoft is a software tool usability testing platform. Incubator(Lewis John McGibbney) Paul Ramirez, Lewis John McGibbney, Chris Mattmann 2016-07-13
AriaTosca The ARIA TOSCA project offers an easily consumable Software Development Kit (SDK) and a Command Line Interface (CLI) to implement TOSCA (Topology and Orchestration Specification of Cloud Applications) based solutions. Incubator(Suneel Marthi) Suneel Marthi, John D. Ament, Jakob Homan 2016-08-27
Annotator Annotator provides annotation enabling code for browsers, servers, and humans. Incubator(Daniel Gruno) Nick Kew, Brian McCallister, Daniel Gruno, Jim Jagielski 2016-08-30
Hivemall Hivemall is a library for machine learning implemented as Hive UDFs/UDAFs/UDTFs. Incubator(Roman Shaposhnik) Reynold Xin, Markus Weimer, Xiangrui Meng, Daniel Dai 2016-09-13
Spot Apache Spot is a platform for network telemetry built on an open data model and Apache Hadoop. Incubator(Doug Cutting) Jarek Jarcec Cecho, Brock Noland, Andrei Savu, Uma Maheswara Rao G 2016-09-23
NetBeans NetBeans is a development environment, tooling platform and application framework. Incubator(Bertrand Delacretaz) Ate Douma, Bertrand Delacretaz, Emmanuel Lecharny, Daniel Gruno, Mark Struberg 2016-10-01
RocketMQ RocketMQ is a fast, low latency, reliable, scalable, distributed, easy to use message-oriented middleware, especially for processing large amounts of streaming data. Incubator(Bruce Snyder) Bruce Snyder, Brian McCallister, Willem Ning Jiang, Luke Han, Justin McLean 2016-11-21
OpenWhisk Distributed serverless computing platform. Incubator(Sam Ruby) Felix Meschberger, Isabel Drost-Fromm, Sergio Fernández 2016-11-23
Weex Weex is a framework for building cross-platform, high-performance mobile UI. Incubator(Edward J. Yoon) Luke Han, Willem Jiang, Stephan Ewen, Niclas Hedhman 2016-11-30
Griffin Griffin is an open source data quality solution for distributed data systems at any scale, in both streaming and batch data contexts. Incubator(Henry Saputra) Kasper Sørensen, Uma Maheswara Rao Gangumalla, Luciano Resende 2016-12-05
Ratis Ratis is a Java implementation of the Raft consensus protocol. Incubator(Jitendra Pandey) Chris Nauroth, Devaraj Das, Jakob Homan, Uma Maheswara Rao G 2017-01-03
MXNet A Flexible and Efficient Library for Deep Learning Incubator(Henri Yandell) Sebastian Schelter, Suneel Marthi, Markus Weimer, Henri Yandell 2017-01-23
Gobblin Gobblin is a distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems. Incubator(Olivier Lamy) Jean-Baptiste Onofre, Jim Jagielski 2017-02-23