Branches
Instantly fork + experiment
Chalk Branches enable you to instantly fork feature engineering pipelines and experiment with new features. Define a new resolver in one notebook cell and use it to generate training sets in the next. You can seamlessly iterate on your definitions and visualize the impact of your changes, with deployment times measured in milliseconds.
In []:
from datetime import datetime

from chalk import online
from chalk.client import ChalkClient

# Connect to an isolated branch deployment; nothing here touches production.
client = ChalkClient(branch="testing")
In []:
@online
def name_match_score_v2(
    name: User.full_name,
    plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
    # User and jaccard_similarity are defined elsewhere in your project.
    return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(
    input=[User.uid],
    input_times=[datetime.now()],
    output=[User.name_match_score],
)
Out[]:
{ "name_match_score": 0.82 }
In []:
client = ChalkClient(branch="testing")
In []:
@online
def name_match_score_v2(
name: User.full_name,
plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(
input=[User.uid],
input_times=[datetime.now()],
output=[User.name_match_score]
)
Out[]:
{ "name_match_score": 0.82 }
Chalk makes it easy to generate historically accurate training sets. Use existing online resolvers, define new ones, and integrate historical data, all with a single SDK. Aggregations and feature derivations run automatically without leaking future knowledge into your datasets.
Instead of requiring you to build an alternate pipeline for training set generation, Chalk lets you reuse your online serving infrastructure to compute historical datasets. Chalk transforms inference pipelines into efficient batch pipelines and automatically time-filters data, making historical accuracy easy.
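For example, a single offline_query can request feature values as of multiple past timestamps, and Chalk excludes anything observed after each input time. A minimal sketch, assuming the dict form of input accepted by chalkpy (the user IDs and timestamps are hypothetical):
In []:
# Each row is computed as of its input_time; observations recorded
# after that timestamp are filtered out, so no future knowledge
# leaks into the training set.
df = client.offline_query(
    input={User.uid: ["u_123", "u_456"]},
    input_times=[
        datetime(2023, 1, 15),
        datetime(2023, 3, 1),
    ],
    output=[User.name_match_score],
)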
In addition to online data sources, Chalk supports batch and streaming data sources. When computing historical datasets, Chalk can automatically source data points from your data warehouse instead of calling online APIs or databases.
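One way this swap can look is an online resolver paired with an offline counterpart for the same feature: the online version calls a live API at serving time, while the offline version reads from the warehouse when generating training data. A hedged sketch; User.account_balance, bank_api, and warehouse are hypothetical stand-ins, not part of the example above:
In []:
from chalk import offline, online

@online
def get_account_balance(uid: User.uid) -> User.account_balance:
    # Serving time: fetch the current value from a live API
    # (bank_api is a hypothetical helper).
    return bank_api.get_balance(uid)

@offline
def get_account_balance_batch(uid: User.uid) -> User.account_balance:
    # Historical queries: source the same feature from the warehouse
    # (warehouse is a hypothetical helper).
    return warehouse.latest_balance(uid)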
Chalk’s Python SDK works in the Jupyter notebook of your choice, whether that’s local Jupyter, Google Colab, Deepnote, Hex, or Databricks. If your environment can execute Python, you can generate dataframes of training data.
Every dataset that you generate is automatically versioned and retained, which lets you seamlessly travel back in time to view the output of any past computation. Datasets can be named and shared so that teammates can build on your work, and Chalk tracks which features are used in which datasets to aid discovery.
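For instance, naming a dataset at query time lets anyone on the team recall it later. A minimal sketch, assuming chalkpy’s dataset_name parameter and get_dataset method (consult the SDK reference for the exact signatures):
In []:
# Name the dataset when generating it.
df = client.offline_query(
    input=[User.uid],
    input_times=[datetime.now()],
    output=[User.name_match_score],
    dataset_name="name-match-v2-training",
)
In []:
# Later, from a teammate's notebook: recall the stored dataset by name.
dataset = client.get_dataset(dataset_name="name-match-v2-training")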