Project |
Aliases |
Description |
Sponsor (Champion) |
Mentors |
Start Date |
Toree |
|
Toree provides applications with a mechanism to interactively and remotely
access Apache Spark. |
Incubator
(Sam Ruby)
|
Luciano Resende, Ryan Blue, Weiwei Yang
|
2015-12-02 |
Pony Mail |
|
Pony Mail is a mail-archiving, archive-viewing, and interaction service that
can be integrated with many email platforms. |
Incubator
(Suneel Marthi)
|
John D. Ament, Dave Fisher
|
2016-05-27 |
Livy |
|
Livy is a web service that exposes a REST interface for managing long-running
Apache Spark contexts in your cluster. With Livy, new applications can be built on top
of Apache Spark that require fine-grained interaction with many Spark contexts. |
Incubator
(Sean Busbey)
|
Bikas Saha, Luciano Resende, Jean-Baptiste Onofré, Madhawa Kasun Gunasekara
|
2017-06-05 |
Pegasus |
|
Pegasus is a distributed key-value storage system which is designed to be
simple, horizontally scalable, strongly consistent and high-performance. |
Incubator
(Von Gosling)
|
Duo Zhang, Liang Chen, Von Gosling, Liu Xun
|
2020-06-28 |
Wayang |
|
Wayang is a cross-platform data processing system that aims at decoupling the
business logic of data analytics applications from concrete data processing platforms,
such as Apache Flink or Apache Spark. Hence, it tames the complexity that arises from
the "Cambrian explosion" of novel data processing platforms that we currently witness. |
Incubator
(Christofer Dutz)
|
Christofer Dutz, Lars George, Bernd Fondermann, Jean-Baptiste Onofré
|
2020-12-16 |
HugeGraph |
|
A large-scale and easy-to-use graph database |
Incubator
(Willem Ning Jiang)
|
Lidong Dai, Trista Pan, Xiangdong Huang, Yu Li, Willem Ning Jiang
|
2022-01-23 |
Baremaps |
|
Apache Baremaps is a toolkit and a set of infrastructure components for
creating, publishing, and operating online maps. |
Incubator
(Bertrand Delacretaz)
|
Bertrand Delacretaz, Martin Desruisseaux, Julian Hyde, Calvin Kirs, George Percivall
|
2022-10-10 |
KIE |
|
KIE (Knowledge is Everything) is a community of solutions and supporting
tooling for knowledge engineering and process automation, focusing on events, rules, and
workflows. |
Incubator
(Brian Proffitt)
|
Brian Proffitt, Claus Ibsen, Andrea Cosentino
|
2023-01-13 |
ResilientDB |
|
ResilientDB is a distributed blockchain framework that is open-source,
lightweight, modular, and highly performant. |
Incubator
(Atri Sharma)
|
Junping Du, Kevin Ratnasekera, Jean-Baptiste Onofré
|
2023-10-21 |
Seata |
|
Seata (Simple Extensible Autonomous Transaction Architecture) is an easy-to-use,
high-performance distributed transaction solution for solving data consistency
problems. |
Incubator
(Sheng Wu)
|
Sheng Wu, Justin Mclean, Huxing Zhang, Heng Du, Xin Wang
|
2023-10-29 |
HoraeDB |
|
HoraeDB is a high-performance, distributed, cloud-native time-series database. |
Incubator
(tison)
|
tison, Shaofeng Shi, Gang Li, Von Gosling
|
2023-12-11 |
Gluten |
|
Gluten is a middle layer responsible for offloading JVM-based SQL engines'
execution to native engines. |
Incubator
(Shaofeng Shi)
|
Yu Li, Wenli Zhang, Kent Yao, Shaofeng Shi, Felix Cheung
|
2024-01-11 |
XTable |
|
XTable is an omni-directional converter for table formats that facilitates
interoperability across data processing systems and query engines. |
Incubator
(Jesús Camacho Rodríguez)
|
Jesús Camacho Rodríguez, Stamatis Zampetakis, Jean-Baptiste Onofré
|
2024-02-11 |
Amoro |
|
Amoro is a Lakehouse management system built on open data lake formats like
Apache Iceberg and Apache Paimon. |
Incubator
|
Justin Mclean, Zhongyi Tan, Yu Li, Xinyu Zhou, Kent Yao
|
2024-03-11 |
GraphAr |
|
GraphAr is an open-source and language-independent data file format designed
for efficient graph data storage and retrieval. |
Incubator
(Yu Li)
|
Calvin Kirs, tison, Xiaoqiao He, Yu Li
|
2024-03-25 |
OpenServerless |
|
OpenServerless is an open-source, cloud-agnostic, serverless platform. It
offers a complete environment for serverless application development, based on
Kubernetes. With Apache OpenWhisk as its FaaS engine, it provides a unified developer
experience with a plethora of services (SQL or NoSQL databases, key-value stores, object
storage, LLM services, function schedulers) managed by the platform's core, the
operator, along with tooling (the CLI) to simplify and interact with deployments, an
integrated IDE, starter applications, and optimized runtimes integrated with the
starters. |
Incubator
(JB Onofré)
|
Bertrand Delacretaz, Enrico Olivelli, François Papon, JB Onofré, PJ Fanning
|
2024-06-17 |
OzHera |
|
OzHera is a cloud-native application observation platform (APM) with the
application as its core, integrating capabilities such as metric monitoring, tracing,
logging, and alerting. |
Incubator
(Duo Zhang)
|
Yu Xiao, Yu Li, Kevin Ratnasekera, Duo Zhang
|
2024-07-11 |
Polaris |
|
Polaris is a catalog for data lakes. It provides new levels of choice,
flexibility and control over data, with full enterprise security and Apache Iceberg
interoperability across a multitude of engines and infrastructure. |
Incubator
(JB Onofré)
|
Bertrand Delacretaz, Holden Karau, Kent Yao, Ryan Blue, JB Onofré
|
2024-08-09 |
Cloudberry |
|
Cloudberry Database, built on the latest PostgreSQL kernel, is one of the most
advanced and mature open-source MPP (Massively Parallel Processing) databases available. |
Incubator
(Roman Shaposhnik)
|
Roman Shaposhnik, Willem Ning Jiang, Kent Yao
|
2024-10-11 |
Otava |
hunter |
Otava is a command-line tool, written in Python, that detects statistically
significant changes in time-series data stored either in databases or CSV files. Otava
entered incubation as Hunter. |
Incubator
(Mick Semb Wever)
|
Dave Fisher, Enrico Olivelli, Lari Hotari, Mick Semb Wever
|
2024-11-27 |
Iggy |
|
Iggy is a high-performance, ultra-low-latency, large-scale persistent
message streaming platform written in Rust. |
Incubator
(Yonik Seeley)
|
Hao Ding, Yonik Seeley, Zili Chen, Hulk Lin
|
2025-02-04 |
Hamilton |
|
Hamilton is a lightweight in-process framework to define, execute, and observe
directed acyclic graphs (DAGs) that express data transformations. In Hamilton, one can
express complex DAGs of transformations, ranging from dataframe transformations (using
pandas, polars, PySpark) and machine learning pipelines through to regular software
engineering API request and LLM API-based workflows. Observability hooks are built into
the framework. The Hamilton UI is a self-hostable service to capture observability
output from workflow runs. |
Incubator
(PJ Fanning)
|
Kevin Ratnasekera, Ayush Saxena, PJ Fanning
|
2025-04-12 |
Texera |
|
Texera is an open-source system to support collaborative data science, AI, and
ML using GUI-based workflows. Our vision is to develop a system to support cloud
platforms on which users can easily analyze data and use AI/ML techniques provided as
operators. Users with various backgrounds, whether or not they can code,
can collaborate on the same project to construct a pipeline. Experienced users can
use programming languages such as Python, R, Java, and Scala to implement customized
computation logic. The platform allows users to pause the execution of a workflow to
investigate the operator states, and resume the execution at a later time. The platform
can be used by a research community to publish valuable resources such as data sets,
workflows, and ML models to share their domain-specific knowledge and support
reproducibility of scientific research. The platform also allows users to elastically
request computing resources from public clouds for computationally intensive tasks. |
Incubator
(PJ Fanning)
|
Cezar Andrei, Gordon King, PJ Fanning, Ian Maxon
|
2025-04-12 |
PouchDB |
|
PouchDB is an open-source JavaScript database inspired by Apache CouchDB that
is designed to run well within the browser. |
Incubator
(Jan Lehnardt)
|
PJ Fanning, Jean-Baptiste Onofré
|
2025-04-15 |
BifroMQ |
|
BifroMQ is a Java-based, high-performance, distributed MQTT broker with native
multi-tenancy support, designed for large-scale connections and message delivery. |
Incubator
(Willem Ning Jiang)
|
Christofer Dutz, Xiangdong Huang, Calvin Kirs, Penghui Li, Sheng Wu
|
2025-04-22 |
Burr |
|
Burr is a lightweight in-process Python framework that standardizes the
expression and execution of state machines as action-driven graphs, while making graph
execution easily observable. It is particularly suited for AI agent workflows,
simulations, and other dynamic systems, and comes with a self-hostable observability UI
that integrates with OpenTelemetry. |
Incubator
(PJ Fanning)
|
Kevin Ratnasekera, Ayush Saxena, PJ Fanning
|
2025-05-24 |
Fluss |
|
Fluss is a streaming storage system built for real-time analytics that can serve as
the real-time data layer for Lakehouse architectures. |
Incubator
(Yu Li)
|
Jean-Baptiste Onofré, Becket Qin, Yu Li, Jingsong Lee, Zili Chen
|
2025-06-04 |
GeaFlow |
|
GeaFlow is a distributed graph compute engine that integrates stream and batch processing. |
Incubator
(Willem Ning Jiang)
|
Willem Ning Jiang, Xin Wang, Jingsong Lee, Paul Klingelhuber, Justin Mclean
|
2025-06-06 |
Auron |
|
Auron accelerates Apache Spark SQL by providing an alternative vectorized
execution layer implemented in Rust, enabling native performance while maintaining full
Spark compatibility. |
Incubator
(Calvin Kirs)
|
Becket Qin, Calvin Kirs, Hao Ding, Nicholas Jiang
|
2025-08-05 |
Fesod |
|
Fesod is a high-performance and memory-efficient Java library for reading and
writing Excel files, designed to simplify development and ensure reliability. |
Incubator
(tison)
|
tison, Dave Fisher, Huajie Wang, PJ Fanning
|
2025-09-17 |