Count: 57 PPMCs

Mean age: 270 days

Median age: 438 days

Currently in incubation, sorted by start date (oldest first)

Project | Description | Sponsor (Champion) | Mentors | Start Date
Wave | A wave is a hosted, live, concurrent data structure for rich communication. It can be used like email, chat, or a document. | Incubator | Christian Grobmeier, Upayavira | 2010-12-04
ODF Toolkit | Java modules that allow programmatic creation, scanning, and manipulation of OpenDocument Format (ODF, ISO/IEC 26300) documents. | Incubator | Sam Ruby, Nick Burch, Yegor Kozlov | 2011-08-01
Blur | Blur is a search platform capable of searching massive amounts of data in a cloud computing environment. | Incubator (Patrick Hunt) | Doug Cutting, Patrick Hunt, Tim Williams | 2012-07-24
Streams | Apache Streams is a lightweight server for ActivityStreams. | Incubator (Matt Franklin) | Matt Franklin, Ate Douma, Craig McClanahan | 2012-11-20
MRQL | MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, Spark, and Flink. | Incubator (Edward J. Yoon) | Alan Cabrera, Edward J. Yoon, Mohammad Nour El-Din | 2013-03-13
BatchEE | The BatchEE project aims to provide a JBatch (JSR 352) implementation and a set of useful extensions to this specification. | Incubator (FIXME) | Jean-Baptiste Onofré, Olivier Lamy, Mark Struberg | 2013-10-03
Sirona | Sirona is a monitoring solution. | Incubator (Olivier Lamy) | Olivier Lamy, Henri Gomez, Jean-Baptiste Onofre, Tammo van Lessen, Mark Struberg | 2013-10-15
Twill | Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their business logic. | Incubator (Vinod K) | Arun C Murthy, Tom White, Patrick Hunt, Andrei Savu | 2013-11-14
log4cxx2 | Logging for C++. N.B. This is a reboot of the Log4cxx podling, which previously graduated. | Logging Services (Christian Grobmeier) | Ralph Goers | 2013-12-09
DataFu | DataFu provides a collection of Hadoop MapReduce jobs, and of functions in higher-level languages built on MapReduce, for performing data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce. | Incubator (Jakob Homan) | Ashutosh Chauhan, Roman Shaposhnik, Ted Dunning | 2014-01-05
Slider | Slider is a collection of tools and technologies to package, deploy, and manage long-running applications on Apache Hadoop YARN clusters. | Incubator (Vinod K) | Arun C Murthy, Devaraj Das, Jean-Baptiste Onofré, Mahadev Konar | 2014-04-29
Ranger | Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. | Incubator (Owen O'Malley) | Alan Gates, Daniel Gruno, Devaraj Das, Jakob Homan, Owen O'Malley | 2014-07-24
Taverna | Taverna is a domain-independent suite of tools used to design and execute data-driven workflows. | Incubator (Andy Seaborne) | Andy Seaborne, Daniel J Debrunner, Suresh Srinivas, Suresh Marru, Marlon Pierce | 2014-10-20
HTrace | HTrace is a tracing framework intended for use with distributed systems written in Java. | Incubator (Roman Shaposhnik) | Jake Farrell, Todd Lipcon, Lewis John Mcgibbney, Andrew Purtell, Billie Rinaldi, Michael Stack | 2014-11-11
Tamaya | Tamaya is a highly flexible configuration solution based on a modular, extensible, and injectable key/value design. It aims to provide a minimal but extensible, modern, and functional API for Java SE, ME, and EE environments. | Incubator (David Blevins) | John D. Ament, Mark Struberg, Gerhard Petracek, David Blevins | 2014-11-14
SAMOA | SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression. It also provides programming abstractions for developing new algorithms that run on top of distributed stream processing engines (DSPEs). Its pluggable architecture allows it to run on several DSPEs, such as Apache Storm, Apache S4, and Apache Samza. | Incubator (Daniel Dai) | Alan Gates, Ashutosh Chauhan, Enis Soztutar, Ted Dunning | 2014-12-15
Zeppelin | A collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark and Apache Flink. | Incubator (Roman Shaposhnik) | Konstantin Boudnik, Henry Saputra, Roman Shaposhnik, Ted Dunning, Hyunsik Choi | 2014-12-23
TinkerPop | TinkerPop is a graph computing framework written in Java. | Incubator (David Nalley) | Rich Bowen, Daniel Gruno, Hadrian Zbarcea, Matt Franklin, David Nalley | 2015-01-16
OpenAz | Tools and libraries for developing Attribute-based Access Control (ABAC) systems in a variety of languages. | Incubator (Paul Fremantle) | Emmanuel Lecharny, Colm O Heigeartaigh, Hadrian Zbarcea | 2015-01-20
Myriad | Myriad enables Apache Hadoop YARN and Apache Mesos to coexist on the same cluster. It allows dynamic resource allocation across Hadoop and other applications running on the same physical data center infrastructure. | Incubator (Benjamin Hindman) | Benjamin Hindman, Danese Cooper, Ted Dunning, Luciano Resende | 2015-03-01
CommonsRDF | Commons RDF is a set of interfaces and classes for RDF 1.1 concepts and behaviours. The commons-rdf-api module defines the interfaces and a testing harness. The commons-rdf-simple module provides a basic reference implementation to exercise the test harness and clarify API contracts. | Incubator (Lewis John McGibbney) | John D Ament, Gary Gregory | 2015-03-06
Singa | Singa is a distributed deep learning platform. | Incubator (Thejas Nair) | Daniel Dai, Alan Gates, Ted Dunning, Thejas Nair | 2015-03-17
Geode | Geode is a data management platform that provides real-time, consistent access to data for data-intensive applications throughout widely distributed cloud architectures. | Incubator (Roman Shaposhnik) | Konstantin Boudnik, Chip Childers, Justin Erenkrantz, Jan Iversen, Chris Mattmann, William A. Rowe Jr., Roman Shaposhnik | 2015-04-27
Atlas | Apache Atlas is a scalable and extensible set of core foundational governance services. It enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and allows integration with the complete enterprise data ecosystem. | Incubator (Jitendra Nath Pandey) | Arun Murthy, Chris Douglas, Jakob Homan, Vinod Kumar Vavilapalli | 2015-05-05
Climate Model Diagnostic Analyzer | CMDA provides web services for multi-aspect, physics-based, and phenomenon-oriented climate model performance evaluation and diagnosis. It does so through the comprehensive and synergistic use of multiple observational data sets, reanalysis data, and model outputs. | Incubator (Chris Mattmann) | James W. Carman, Chris Mattmann, Michael James Joyce, Kim Whitehall, Gregory D. Reddin | 2015-05-08
Trafodion | Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Hadoop. | Incubator (Michael Stack) | Andrew Purtell, Devaraj Das, Enis Söztutar, Lars Hofhansl, Michael Stack | 2015-05-24
FreeMarker | FreeMarker is a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers. | Incubator (Jacopo Cappellato) | Jacopo Cappellato, Jean-Frederic Clere, David E. Jones, Ralph Goers, Sergio Fernández | 2015-07-01
HAWQ | HAWQ is an advanced enterprise SQL-on-Hadoop analytic engine. It is built around a robust, high-performance massively parallel processing (MPP) SQL framework evolved from Pivotal Greenplum Database. | Incubator (Roman Shaposhnik) | Alan Gates, Konstantin Boudnik, Justin Erenkrantz, Thejas Nair, Roman Shaposhnik | 2015-09-04
HORN | HORN is a neuron-centric programming API and execution framework for large-scale deep learning, built on top of Apache Hama. | Incubator (Edward J. Yoon) | Luciano Resende, Robin Anil, Edward J. Yoon | 2015-09-04
MADlib | Big Data machine learning in SQL for data scientists. | Incubator (Roman Shaposhnik) | Konstantin Boudnik, Ted Dunning, Roman Shaposhnik | 2015-09-15
Rya | Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that supports SPARQL queries. Rya is a scalable RDF data management system built on top of Accumulo. It uses novel storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes. Rya provides fast and easy access to the data through SPARQL, a conventional query mechanism for RDF data. | Incubator (Adam Fuchs) | Josh Elser, Edward J. Yoon, Sean Busbey, Venkatesh Seetharam | 2015-09-18
Unomi | Unomi is a reference implementation of the OASIS Context Server specification, which is being developed by the OASIS Context Server Technical Committee. It provides a high-performance user profile and event tracking server. | Incubator (Jean-Baptiste Onofre) | Bertrand Delacretaz, Chris Mattmann | 2015-10-05
Mynewt | Mynewt is a real-time operating system for constrained embedded systems such as wearables, lightbulbs, locks, and doorbells. It works on a variety of 32-bit MCUs (microcontrollers), including ARM Cortex-M and MIPS architectures. | Incubator (Marvin Humphrey) | Sterling Hughes, Jim Jagielski, Justin Mclean, Greg Stein, P. Taylor Goetz | 2015-10-20
Eagle | Eagle is a monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities, and takes action in real time. | Incubator (Henry Saputra) | Owen O'Malley, Henry Saputra, Julian Hyde, P. Taylor Goetz, Amareshwari Sriramdasu | 2015-10-26
SystemML | SystemML provides declarative large-scale machine learning (ML). It aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans, ranging from single-node, in-memory computations to distributed computations on platforms such as Apache Hadoop MapReduce and Apache Spark. | Incubator (Luciano Resende) | Luciano Resende, Patrick Wendell, Reynold Xin, Rich Bowen | 2015-11-02
S2Graph | S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support fast traversal of extremely large graphs. | Incubator (Hyunsik Choi) | Andrew Purtell, Seetharam Venkatesh, Sergio Fernández | 2015-11-29
Toree | Toree provides applications with a mechanism to interactively and remotely access Apache Spark. | Incubator (Sam Ruby) | Luciano Resende, Reynold Xin, Hitesh Shah, Julien Le Dem | 2015-12-02
Impala | Impala is a high-performance C++ and Java SQL query engine for data stored in Apache Hadoop-based clusters. | Incubator (Tom White) | Tom White, Todd Lipcon, Carl Steinbach, Brock Noland | 2015-12-03
Kudu | Kudu is a distributed columnar storage engine built for the Apache Hadoop ecosystem. | Incubator (Todd Lipcon) | Jake Farrell, Brock Noland, Michael Stack, Jarek Jarcec Cecho, Chris Mattmann, Julien Le Dem, Carl Steinbach | 2015-12-03
Metron | Metron is a project dedicated to providing an extensible and scalable advanced network security analytics tool. It has strong foundations in the Apache Hadoop ecosystem. | Incubator (Owen O'Malley) | Billie Rinaldi, Chris Mattmann, Owen O'Malley, P. Taylor Goetz, Vinod Kumar Vavilapalli | 2015-12-06
Fineract | Fineract is an open source system for core banking as a platform. | Incubator (Ross Gardler) | Ross Gardler, Greg Stein, Roman Shaposhnik | 2015-12-15
Milagro | Distributed cryptography; M-Pin protocol for identity and trust. | Incubator (Nick Kew) | Sterling Hughes, Jan Willem Janssen, Nick Kew | 2015-12-21
iota | Open source system that enables the orchestration of IoT devices. | Incubator (Hadrian Zbarcea) | Daniel Gruno, Sterling Hughes, Justin Mclean, Hadrian Zbarcea | 2016-01-20
Beam | Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, as well as data ingestion and integration flows. It supports Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes, such as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also provides DSLs in different languages, allowing users to easily implement their data integration processes. | Incubator (Jean-Baptiste Onofre) | Jean-Baptiste Onofre, Jim Jagielski, Venkatesh Seetharam, Bertrand Delacretaz, Ted Dunning | 2016-02-01
Guacamole | Guacamole is an enterprise-grade, protocol-agnostic remote desktop gateway. Combined with cloud hosting, Guacamole provides an excellent alternative to traditional desktops. Guacamole aims to make cloud-hosted desktop access preferable to traditional, local access. | Incubator (Jean-Baptiste Onofre) | Jean-Baptiste Onofre, Daniel Gruno, Olivier Lamy, Jim Jagielski, Greg Trasuk | 2016-02-10
Joshua | Joshua is a statistical machine translation toolkit. | Incubator (Chris Mattmann) | Paul Ramirez, Lewis John McGibbney, Chris Mattmann, Tom Barber, Henri Yandell | 2016-02-13
Quarks | Quarks is a stream processing programming model and lightweight runtime for executing analytics on devices at the edge or at the gateway. | Incubator (Katherine Marsden) | Daniel Debrunner, Luciano Resende, Katherine Marsden, Justin Mclean | 2016-02-29
Mnemonic | Mnemonic is a Java-based non-volatile memory library for in-place structured data processing and computing. | Incubator (Patrick Hunt) | Patrick Hunt, Andrew Purtell, James Taylor, Henry Saputra | 2016-03-03
Tephra | Tephra is a system for providing globally consistent transactions on top of Apache HBase and other storage engines. | Incubator (James Taylor) | Alan Gates, Andrew Purtell, Henry Saputra, James Taylor, Lars Hofhansl | 2016-03-07
Gearpump | Gearpump is a reactive real-time streaming engine based on the micro-service Actor model. | Incubator (Andrew Purtell) | Andrew Purtell, Jarek Jarcec Cecho, Reynold Xin, Todd Lipcon, Xuefu Zhang | 2016-03-08
Omid | Omid is a flexible, reliable, high-performance, and scalable ACID transactional framework. It allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores (currently Apache HBase), providing Snapshot Isolation guarantees on the accessed data. | Incubator (Daniel Dai) | Alan Gates, Lars Hofhansl, Flavio P. Junqueira, Thejas Nair, James Taylor | 2016-03-28
Quickstep | Quickstep is a high-performance database engine. | Incubator (Roman Shaposhnik) | Konstantin Boudnik, Julian Hyde, Roman Shaposhnik | 2016-03-29
Airflow | Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. | Incubator (Chris Riccomini) | Chris Nauroth, Hitesh Shah, Jakob Homan | 2016-03-31
Gossip | Gossip is an implementation of the Gossip Protocol. | Incubator (P. Taylor Goetz) | P. Taylor Goetz, Josh Elser, Sean Busbey | 2016-04-28
Fluo | Fluo is a distributed system for incrementally processing large data sets stored in Accumulo. | Incubator (Billie Rinaldi) | Billie Rinaldi, Drew Farris, Josh Elser | 2016-05-17
PredictionIO | PredictionIO is an open source machine learning server built on top of a state-of-the-art open source stack. It enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks. | Incubator (Andrew Purtell) | Andrew Purtell, James Taylor, Lars Hofhansl, Luciano Resende, Xiangrui Meng, Suneel Marthi | 2016-05-26
Pony Mail | Pony Mail is a mail-archiving, archive-viewing, and interaction service that can be integrated with many email platforms. | Incubator (Suneel Marthi) | Andrew Bayer, John D. Ament | 2016-05-27