Realtime Computation
Compute realtime ML features
Chalk makes it easy to integrate data from any API or data source and compute features just-in-time. With Chalk, models operate on the freshest possible data and you don’t pay to fetch data you don’t need. Chalk automatically orchestrates compute, caching, scheduling, and streaming infrastructure, and executes your Python on a Rust-based runtime for maximum performance.
Instead of complex ETL jobs and streaming pipelines, make direct calls to your data sources. Chalk makes it fast and easy to call external APIs, query production databases and data warehouses, or fetch parquet files from S3. Chalk orchestrates pipeline stages automatically to achieve the maximum possible parallelism, and makes it easy to pre-compute and cache features when necessary.
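The idea behind this orchestration can be sketched in plain Python. This is a conceptual illustration, not Chalk's actual API: the fetcher functions and feature names are hypothetical stand-ins for an external API, a production database, and a parquet file on S3. Because the three sources don't depend on each other, they can be fetched concurrently rather than sequentially:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fetchers standing in for an external API, a
# production database, and a parquet file on S3.
def fetch_credit_score(user_id: int) -> int:
    return 720

def fetch_account_balance(user_id: int) -> float:
    return 1042.50

def fetch_transaction_count(user_id: int) -> int:
    return 37

def compute_features(user_id: int) -> dict:
    # The three fetches have no dependencies on each other,
    # so they run in parallel instead of back-to-back.
    with ThreadPoolExecutor() as pool:
        score = pool.submit(fetch_credit_score, user_id)
        balance = pool.submit(fetch_account_balance, user_id)
        txns = pool.submit(fetch_transaction_count, user_id)
        return {
            "credit_score": score.result(),
            "account_balance": balance.result(),
            "transaction_count": txns.result(),
        }
```

With three independent sources, total latency is roughly the slowest single fetch rather than the sum of all three.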
When you pre-fetch data, it can quickly become stale and drive unnecessary costs. Chalk’s just-in-time (JIT) feature pipelines fetch data on demand so you have fresh data for predictions and don’t waste money on data you don’t need. Chalk’s feature pipelines give you fine-grained control over freshness at a per-model level, so you can get the most accurate data possible for sensitive use cases.
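A per-request staleness budget is the core of this idea. The sketch below is a minimal, hypothetical illustration (not Chalk's implementation): a cached value is served only while it is fresher than the caller's `max_staleness`, so a latency-tolerant model can reuse cached data while a sensitive model forces a fresh fetch:

```python
import time

class JITFeatureCache:
    """Serve a cached value only while it is fresher than the
    caller's staleness budget; otherwise re-fetch on demand."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}  # key -> (value, fetched_at)

    def get(self, key, max_staleness: float):
        entry = self._cache.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] <= max_staleness:
            return entry[0]  # fresh enough for this caller
        value = self._fetch(key)
        self._cache[key] = (value, now)
        return value
```

Two models can share the same pipeline while requesting different freshness: a fraud model might pass a budget of seconds, a marketing model a budget of hours.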
Tired of heavy frameworks and confusing abstractions? Use the libraries and patterns you know and love — SQLAlchemy, DataFrames, Pydantic, Pendulum. Use requirements.txt or a custom Dockerfile to add any dependencies you need. Chalk automatically distributes and parallelizes Python methods so that you can write business logic without worrying about infrastructure.
Don’t repeat yourself. Chalk lets you declaratively specify dependencies between pipeline stages and easily compose resolvers. Just add an argument to your resolver to add a new data dependency, and let Chalk automatically re-orchestrate pipelines to supply your function with the data it needs. Elegantly handle errors, retries, and missing data with type annotations.
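The "add an argument, gain a dependency" pattern can be demonstrated with a tiny dependency-injection sketch. This is not Chalk's actual resolver API; the registry, `resolver` decorator, and feature names below are hypothetical, and it wires dependencies by argument name where a real system would use richer type information:

```python
import inspect

# Registry mapping a feature name to the resolver that produces it.
RESOLVERS = {}

def resolver(fn):
    """Register fn as the producer of the feature named after it."""
    RESOLVERS[fn.__name__] = fn
    return fn

def resolve(name, **given):
    """Supply each resolver argument, either from the inputs we
    were given or by recursively running the resolver that
    produces it."""
    if name in given:
        return given[name]
    fn = RESOLVERS[name]
    args = {
        param: resolve(param, **given)
        for param in inspect.signature(fn).parameters
    }
    return fn(**args)

@resolver
def full_name(user_id):
    return {1: "Ada Lovelace"}.get(user_id, "unknown")

@resolver
def greeting(full_name):
    # Depends on full_name simply by naming it as an argument.
    return f"Hello, {full_name}!"
```

Calling `resolve("greeting", user_id=1)` walks the dependency graph for you; adding a second argument to `greeting` would automatically pull in another resolver with no pipeline rewiring.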
Application developers demand modern CI/CD workflows and preview environments. Why should data engineering be different? Chalk enables you to iterate on feature pipelines quickly with isolated preview deployments for pull requests and git branches. Run integration tests and manual QA against sandboxed deployments to verify that updates perform as expected, and share your in-progress work with other developers.
Sick of re-implementing production feature pipelines in Java, Go, or C++ when you’re worried about performance? Let Chalk do the hard work of making your code execute fast. Chalk executes simple Python in a Rust-based runtime that automatically parallelizes data fetches, multithreads dataframe operations, and pushes operations down to your underlying data sources. The result? Pure Python with native performance.
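Predicate pushdown, one of the optimizations mentioned above, is easy to illustrate in miniature. The example below uses Python's built-in sqlite3 as a stand-in data source; the table and query are hypothetical. The point is that the filter and aggregation execute inside the database, so only a single scalar crosses the wire instead of the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO txns VALUES (?, ?)",
    [(1, 10.0), (1, 25.0), (2, 99.0)],
)

# Pushed down: the WHERE clause and SUM run inside the database,
# rather than loading every row into Python and filtering there.
total, = conn.execute(
    "SELECT SUM(amount) FROM txns WHERE user_id = ?", (1,)
).fetchone()
```

The alternative — fetching all rows and filtering in application code — scales with table size; the pushed-down version scales with the size of the answer.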
The hardest part of data infrastructure is debugging production issues with opaque tooling. Make your infrastructure transparent: Chalk automatically instruments all Python functions with logging, metrics, and data capture so that you can easily debug production issues and understand how your code is behaving in real time. You can even export logs and metrics from Chalk to tools like Datadog for a full view of how your data infrastructure integrates with the rest of your system.
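The kind of automatic instrumentation described here is essentially a wrapper applied to every function. Here is a minimal, hypothetical sketch (not Chalk's implementation) using only the standard library — timing, success logging, and error capture, without the resolver having to do anything itself:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)

def instrumented(fn):
    """Wrap a function with timing, logging, and error capture —
    roughly the telemetry an instrumented runtime adds for free."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            logging.exception("%s raised", fn.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logging.info("%s took %.2f ms", fn.__name__, elapsed_ms)
    return wrapper

@instrumented
def score_user(user_id: int) -> float:
    # Hypothetical resolver body; the wrapper records its latency.
    return 0.87
```

Because the decorator uses `functools.wraps`, the wrapped function keeps its name and signature, so the emitted logs and metrics stay attributable to the original code.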