Branches

Instantly fork + experiment

Chalk Branches enable you to instantly fork feature engineering pipelines and experiment with new features. Define a new resolver in one notebook cell and use it to generate training sets in the next. You can seamlessly iterate on your definitions and visualize the impact of your changes, with deployment times measured in milliseconds.

Jupyter Notebook
In []:
from datetime import datetime
from chalk import online
from chalk.client import ChalkClient

client = ChalkClient(branch="testing")
In []:
@online
def name_match_score_v2(
  name: User.full_name,
  plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
  return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(
  input=[User.uid],
  input_times=[datetime.now()],
  output=[User.name_match_score]
)
Out[]:
{ "name_match_score": 0.82 }
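
The jaccard_similarity helper used by the resolver above is not defined in the snippet. A minimal pure-Python sketch, assuming a Jaccard index over lowercased, whitespace-split token sets, might look like:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over lowercased, whitespace-split token sets."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty names are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

Because the resolver body is ordinary Python, helpers like this can be unit tested outside of any Chalk deployment.
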
Compute point-in-time-correct training sets

Chalk makes it easy to generate historically accurate training sets. Use online resolvers, define new ones, and integrate historical data, all with a single SDK. Automatically run aggregations and feature derivations without polluting datasets with future knowledge.
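
The core of point-in-time correctness is restricting each training row to observations made at or before its label timestamp. A minimal sketch of that idea in plain Python (an illustration of the concept, not Chalk's internal implementation):

```python
from datetime import datetime

def point_in_time_value(events, as_of):
    """Return the most recent value observed at or before `as_of`.

    `events` is a list of (timestamp, value) pairs; anything after
    `as_of` is future knowledge and must be excluded from the row.
    """
    visible = [(ts, v) for ts, v in events if ts <= as_of]
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]

balance_history = [
    (datetime(2023, 1, 1), 100.0),
    (datetime(2023, 2, 1), 250.0),
    (datetime(2023, 3, 1), 80.0),
]
# A label dated Feb 15 may only see the Feb 1 balance, never the Mar 1 one.
point_in_time_value(balance_history, datetime(2023, 2, 15))
```
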

Re-use your online pipelines

Instead of building an alternate pipeline for training set generation, Chalk allows you to automatically re-use online serving infrastructure to compute historical datasets. Chalk transforms inference pipelines into efficient batch pipelines and automatically time-filters data to make historical accuracy easy.
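
Because resolvers are plain Python functions, the same logic that answers a single online request can be mapped over historical rows. A simplified illustration of that reuse (a toy batch runner, not Chalk's actual executor):

```python
def name_match_score(name: str, plaid_name: str) -> float:
    """The same per-request logic that would serve online traffic."""
    a, b = set(name.lower().split()), set(plaid_name.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def run_as_batch(resolver, rows):
    """Apply an online resolver across historical rows to build a dataset."""
    return [resolver(**row) for row in rows]

rows = [
    {"name": "Jane Doe", "plaid_name": "JANE DOE"},
    {"name": "John A Smith", "plaid_name": "John Smith"},
]
scores = run_as_batch(name_match_score, rows)
```

One definition, two execution modes: the online path calls the function once per request, while the batch path fans it out over the historical inputs.
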

Batch and Streaming

In addition to online data sources, Chalk supports batch and streaming data sources. To source historical data points, Chalk can automatically read from data warehouses instead of online APIs or databases.
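
Conceptually, a live read hits the online store while a historical read scans the warehouse for the value visible at the requested time. A toy sketch of that dispatch (hypothetical names and stores, not Chalk's API):

```python
from datetime import datetime

def fetch_feature(key, as_of, *, online_store, warehouse, now):
    """Route a read: live queries hit the online store; historical
    queries scan the warehouse for the value visible at `as_of`."""
    if as_of >= now:
        return online_store[key]  # low-latency current value
    visible = [(ts, v) for ts, v in warehouse[key] if ts <= as_of]
    return max(visible, key=lambda p: p[0])[1] if visible else None

now = datetime(2023, 6, 1)
online_store = {"user:1:balance": 42.0}
warehouse = {
    "user:1:balance": [
        (datetime(2023, 1, 1), 10.0),
        (datetime(2023, 4, 1), 30.0),
    ]
}
fetch_feature("user:1:balance", now,
              online_store=online_store, warehouse=warehouse, now=now)
```
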

Notebook Support

Chalk’s Python SDK works in the Jupyter notebook of your choice: local Jupyter, Google Colab, Deepnote, Hex, or Databricks. If it can execute Python, you can generate dataframes of training data.

Dataset Governance

Every dataset that you generate is automatically versioned and retained, which lets you seamlessly travel back in time to view the output of any past computation. Datasets can be named and shared so that teammates can use your work. Track which features are used in which datasets to help with discovery.
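
The governance model above can be pictured as a registry keyed by dataset name, where each run appends an immutable version recording which features it used. A small sketch of the concept (not Chalk's API):

```python
class DatasetRegistry:
    """Toy registry: every generated dataset is retained as a new
    immutable version, queryable by name for time travel and sharing."""

    def __init__(self):
        self._versions = {}  # name -> list of {"features", "rows"} dicts

    def record(self, name, features, rows):
        self._versions.setdefault(name, []).append(
            {"features": list(features), "rows": rows})
        return len(self._versions[name])  # 1-indexed version number

    def get(self, name, version=-1):
        """Fetch a named dataset; defaults to the latest version."""
        versions = self._versions[name]
        return versions[version - 1] if version > 0 else versions[version]

    def datasets_using(self, feature):
        """Discovery: which named datasets used this feature?"""
        return [n for n, vs in self._versions.items()
                if any(feature in v["features"] for v in vs)]

registry = DatasetRegistry()
v1 = registry.record("fraud-train", ["user.name_match_score"],
                     rows=[{"uid": 1}])
v2 = registry.record("fraud-train", ["user.name_match_score", "user.age"],
                     rows=[{"uid": 1}, {"uid": 2}])
```
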

Get Started with Code Examples
Unlock the power of real-time data pipelines
SEE ALL EXAMPLES
Feature Discovery
Tags & Owners
Assigning tags & owners to features.
GitHub Actions
Preview deployments
Set up preview deployments for all PRs.
Resolvers
Multi-Tenancy
Serve many end-customers with differentiated behavior.
Testing
Unit tests
Resolvers are just Python functions, so they are easy to unit test.