Datapipe

Real-time, incremental ETL library for Python with record-level dependency tracking

Currently in alpha. Join the waitlist for detailed documentation and use cases.

Features

Incremental processing

Datapipe processes only new or modified data, significantly reducing computation time and resource usage.
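To make "only new or modified data" concrete, here is a minimal sketch of one common way to detect changes: keep a content hash per input record and rerun the transformation only when the hash is new or different. This is not Datapipe's API or its internal mechanism. `IncrementalStep`, `record_hash` and the dict-of-records layout are hypothetical, and the sketch ignores deleted inputs.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def record_hash(record: Record) -> str:
    """Stable content hash of a JSON-serialisable record (key order does not matter)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class IncrementalStep:
    """Hypothetical transformation step that remembers which inputs it has already processed."""

    def __init__(self, transform: Callable[[Record], Record]) -> None:
        self.transform = transform
        self.seen_hashes: Dict[str, str] = {}  # input id -> hash at last processing
        self.outputs: Dict[str, Record] = {}   # input id -> transformed record
        self.calls = 0                         # how many times `transform` actually ran

    def run(self, inputs: Dict[str, Record]) -> Dict[str, Record]:
        for rec_id, record in inputs.items():
            h = record_hash(record)
            if self.seen_hashes.get(rec_id) == h:
                continue  # unchanged since the last run: keep the stored output
            self.outputs[rec_id] = self.transform(record)
            self.seen_hashes[rec_id] = h
            self.calls += 1
        return self.outputs


step = IncrementalStep(lambda r: {"text": r["text"].upper()})
step.run({"a": {"text": "cat"}, "b": {"text": "dog"}})
step.run({"a": {"text": "cat"}, "b": {"text": "dogs"}, "c": {"text": "emu"}})
print(step.calls)  # 4: "a" and "b" on the first run, then only the changed "b" and the new "c"
```

On the second run the unchanged record "a" is skipped, which is where the savings come from once most of a dataset stays the same between runs.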

Real-time

The library supports real-time data extraction, transformation, and loading.

Dependency tracking

Datapipe automatically tracks data dependencies and the processing state of each record.

Python integration

Seamlessly integrates with Python applications, offering a Pythonic way to describe data pipelines.

Datapipe lets you build complex pipelines and track data dependencies, so you can trace how a specific data row evolves from the first transformation to the last.
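Tracing a row back through a pipeline amounts to walking a record-level lineage graph. The sketch below stores, for every output row, the rows it was computed from and walks that graph breadth-first. It only illustrates the idea under the assumption that lineage is kept as parent links; `Lineage`, `RowRef` and the table names are hypothetical and not part of Datapipe.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

RowRef = Tuple[str, str]  # (table name, row id); hypothetical naming


class Lineage:
    """Hypothetical record-level lineage store: which rows each row was computed from."""

    def __init__(self) -> None:
        self.parents: DefaultDict[RowRef, List[RowRef]] = defaultdict(list)

    def record(self, output: RowRef, inputs: List[RowRef]) -> None:
        """Remember that `output` was produced from `inputs`."""
        self.parents[output].extend(inputs)

    def trace(self, row: RowRef) -> List[RowRef]:
        """Return all upstream rows of `row`, breadth-first (nearest ancestors first)."""
        ordered: List[RowRef] = []
        seen: Set[RowRef] = set()
        frontier = [row]
        while frontier:
            next_frontier: List[RowRef] = []
            for ref in frontier:
                for parent in self.parents.get(ref, []):
                    if parent not in seen:
                        seen.add(parent)
                        ordered.append(parent)
                        next_frontier.append(parent)
            frontier = next_frontier
        return ordered


lineage = Lineage()
lineage.record(("detections", "img1-box0"), [("images", "img1")])
lineage.record(("crops", "img1-box0"), [("detections", "img1-box0")])
lineage.record(("labels", "img1-box0"), [("crops", "img1-box0"), ("models", "clf-v2")])
print(lineage.trace(("labels", "img1-box0")))
# [('crops', 'img1-box0'), ('models', 'clf-v2'), ('detections', 'img1-box0'), ('images', 'img1')]
```

Note that a model version can be an input just like a data row, which is what lets a label be traced back to both the picture and the classifier that produced it.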

Datapipe early adopters

Projects with complex ML pipelines with a human-in-the-loop component
ML projects that require real-time model retraining based on newly labeled data
Projects that require content moderation

GitHub

Get started with Datapipe.

Testimonials

Hear from our satisfied users.

Imagine you have an ML pipeline with detection and classification steps. As development progresses, you eventually have 100 detection and 100 classification models to apply to a dataset of 1000 pictures. Running the full processing for the first time would take a few days. However, if you add more images or models later, only the new data will be incrementally recalculated, eliminating the need to write any additional code.

Renat Shakirov, ML engineer
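The arithmetic behind this testimonial can be sketched as set differences over (model, image) pairs: the first run computes every pair, and later runs compute only the pairs that are missing. This simplification assumes every model runs once per picture, ignoring that classifiers may actually run on detected regions. `pending_pairs` and the identifiers are hypothetical, not Datapipe's API.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (model id, image id); hypothetical identifiers


def pending_pairs(models: Iterable[str], images: Iterable[str], done: Set[Pair]) -> Set[Pair]:
    """Return the (model, image) pairs that have not been computed yet."""
    return set(product(models, images)) - done


models = [f"det-{i}" for i in range(100)] + [f"cls-{i}" for i in range(100)]
images = [f"img-{i}" for i in range(1000)]
done: Set[Pair] = set()

first_run = pending_pairs(models, images, done)
print(len(first_run))  # 200000 = 200 models x 1000 images
done |= first_run

images += [f"img-{i}" for i in range(1000, 1050)]  # 50 new pictures arrive
new_images = pending_pairs(models, images, done)
print(len(new_images))  # 10000 = 200 models x 50 new images
done |= new_images

models.append("cls-100")  # one more classification model
print(len(pending_pairs(models, images, done)))  # 1050 = 1 new model x 1050 images
```

Adding 50 pictures costs 10,000 model runs instead of 210,000, and adding one model costs 1,050 runs; the pending work scales with what changed rather than with the total.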

The integration of human input into ML models through Datapipe is extremely efficient. Annotators' input is seamlessly incorporated into the pipeline, enabling real-time retraining of all models. This allows annotators to observe immediate improvements, thus accelerating their work.

Arseniy Koryagin, ML team lead

The ability to run pipelines in different modes – full or partial calculation – is invaluable. For real-time predictions and quick integration of new data, you can run the pipeline with a subset of data, ensuring agility and efficiency.

Renat Shakirov, ML engineer

Datapipe leaves room for errors. If you miss a case in a transformation and it crashes, you can quickly fix it and resume the calculation without having to start all over again.

Andrey Serov, lead engineer
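Resuming after a crash works if each result is persisted as soon as it is computed, so a rerun skips everything already done. The sketch below assumes results are stored per record in a dict (standing in for a real state table); `run_step`, `buggy` and `fixed` are hypothetical names, not Datapipe functions.

```python
from typing import Callable, Dict


def run_step(inputs: Dict[str, int], transform: Callable[[int], float],
             done: Dict[str, float]) -> None:
    """Apply `transform` to each input whose result is not in `done` yet.

    Each result is saved as soon as it is computed, so if `transform` raises,
    everything finished before the failure is kept for the next run.
    """
    for rec_id, value in inputs.items():
        if rec_id in done:
            continue  # computed by an earlier run: do not redo it
        done[rec_id] = transform(value)


calls = []  # log of values actually passed to a transform


def buggy(x: int) -> float:
    calls.append(x)
    return 1 / x  # forgets the x == 0 case


def fixed(x: int) -> float:
    calls.append(x)
    return 1 / x if x else 0.0


inputs = {"a": 4, "b": 0, "c": 2}
done: Dict[str, float] = {}

try:
    run_step(inputs, buggy, done)
except ZeroDivisionError:
    print("crashed, kept:", done)  # crashed, kept: {'a': 0.25}

run_step(inputs, fixed, done)  # resume with the fixed transform
print(done)   # {'a': 0.25, 'b': 0.0, 'c': 0.5}
print(calls)  # [4, 0, 0, 2]: "a" was not recomputed
```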

When you delete something in Datapipe, the cleanup is automatic and thorough. It saves a lot of time and ensures data integrity.

Aleksander Filatov, data analyst
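Automatic cleanup on deletion can be pictured as a cascade over the dependency graph: deleting a source row also deletes every row derived from it, directly or indirectly. This is a simplified sketch, not Datapipe's implementation. `Store` is hypothetical, a derived row is removed as soon as any one of its inputs is deleted (a real system might recompute rows that still have other inputs), and stale child links are not pruned.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

RowRef = Tuple[str, str]  # (table name, row id); hypothetical naming


class Store:
    """Hypothetical store that keeps rows plus links from each row to the rows derived from it."""

    def __init__(self) -> None:
        self.rows: Set[RowRef] = set()
        self.children: DefaultDict[RowRef, Set[RowRef]] = defaultdict(set)

    def add(self, row: RowRef, *parents: RowRef) -> None:
        """Add `row`, recording that it was derived from `parents`."""
        self.rows.add(row)
        for parent in parents:
            self.children[parent].add(row)

    def delete(self, row: RowRef) -> Set[RowRef]:
        """Delete `row` and everything derived from it; return what was removed."""
        removed: Set[RowRef] = set()
        stack = [row]
        while stack:
            ref = stack.pop()
            if ref in removed:
                continue
            removed.add(ref)
            self.rows.discard(ref)
            stack.extend(self.children.pop(ref, set()))  # follow links to derived rows
        return removed


store = Store()
store.add(("images", "img1"))
store.add(("images", "img2"))
store.add(("detections", "img1-box0"), ("images", "img1"))
store.add(("labels", "img1-box0"), ("detections", "img1-box0"))
store.add(("detections", "img2-box0"), ("images", "img2"))

print(sorted(store.delete(("images", "img1"))))
# [('detections', 'img1-box0'), ('images', 'img1'), ('labels', 'img1-box0')]
print(sorted(store.rows))  # [('detections', 'img2-box0'), ('images', 'img2')]
```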

Contact Us

Have questions? We're here to help.

Andrey Tatarinov

Founder

Andrey is the founder of Epoch8.co, an agency for AI/ML projects. He developed Datapipe to speed up client ML projects.

Alexander Kozlov

ML developer

A pioneering early adopter of Datapipe. He has developed more than 1,000 pipeline nodes across numerous client projects.

Andrey Serov

Lead Engineer

Andrey initially worked at our first client, where he implemented Datapipe. He fixed numerous issues in the Datapipe core and eventually joined our team.

Olga Tatarinova

DevRel

She had to learn Python to work on Datapipe.