Branches
Instantly fork + experiment
Chalk Branches enable you to instantly fork feature engineering pipelines and experiment with new features. Define a new resolver in one notebook cell and use it to generate training sets in the next. You can seamlessly iterate on your definitions and visualize the impact of your changes, with deployment times measured in milliseconds.
In []:
from datetime import datetime

from chalk import online
from chalk.client import ChalkClient

# Connect to an isolated branch deployment; nothing here touches production.
client = ChalkClient(branch="testing")
In []:
@online
def name_match_score_v2(
    name: User.full_name,
    plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
    # User and jaccard_similarity are defined elsewhere in your project.
    return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(
    input=[User.uid],
    input_times=[datetime.now()],
    output=[User.name_match_score],
)
Out[]:
{ "name_match_score": 0.82 }
In []:
client = ChalkClient(branch="testing")
In []:
@online
def name_match_score_v2(
name: User.full_name,
plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(
input=[User.uid],
input_times=[datetime.now()],
output=[User.name_match_score]
)
Out[]:
{ "name_match_score": 0.82 }
Chalk makes it easy to generate historically accurate training sets. Use existing online resolvers, define new ones, and integrate historical data, all with a single SDK. Aggregations and feature derivations run automatically without leaking future knowledge into your datasets.
Instead of requiring you to build an alternate pipeline for training set generation, Chalk lets you reuse your online serving infrastructure to compute historical datasets. Chalk transforms inference pipelines into efficient batch pipelines and automatically time-filters data, making historical accuracy easy.
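For example, a single offline_query can request feature values as of multiple past timestamps, and Chalk excludes anything observed after each input time. A minimal sketch, assuming the dict form of input accepted by chalkpy (the user IDs and timestamps are hypothetical):
In []:
# Each row is computed as of its input_time; observations recorded
# after that timestamp are filtered out, so no future knowledge
# leaks into the training set.
df = client.offline_query(
    input={User.uid: ["u_123", "u_456"]},
    input_times=[
        datetime(2023, 1, 15),
        datetime(2023, 3, 1),
    ],
    output=[User.name_match_score],
)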
In addition to online data sources, Chalk supports batch and streaming data sources. When computing historical datasets, Chalk can automatically source data points from your data warehouse instead of calling online APIs or databases.
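One way this swap can look is an online resolver paired with an offline counterpart for the same feature: the online version calls a live API at serving time, while the offline version reads from the warehouse when generating training data. A hedged sketch; User.account_balance, bank_api, and warehouse are hypothetical stand-ins, not part of the example above:
In []:
from chalk import offline, online

@online
def get_account_balance(uid: User.uid) -> User.account_balance:
    # Serving time: fetch the current value from a live API
    # (bank_api is a hypothetical helper).
    return bank_api.get_balance(uid)

@offline
def get_account_balance_batch(uid: User.uid) -> User.account_balance:
    # Historical queries: source the same feature from the warehouse
    # (warehouse is a hypothetical helper).
    return warehouse.latest_balance(uid)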
Chalk’s Python SDK works in the Jupyter notebook of your choice, whether that’s local Jupyter, Google Colab, Deepnote, Hex, or Databricks. If your environment can execute Python, you can generate dataframes of training data.
Every dataset that you generate is automatically versioned and retained, which lets you seamlessly travel back in time to view the output of any past computation. Datasets can be named and shared so that teammates can build on your work, and Chalk tracks which features are used in which datasets to aid discovery.
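For instance, naming a dataset at query time lets anyone on the team recall it later. A minimal sketch, assuming chalkpy’s dataset_name parameter and get_dataset method (consult the SDK reference for the exact signatures):
In []:
# Name the dataset when generating it.
df = client.offline_query(
    input=[User.uid],
    input_times=[datetime.now()],
    output=[User.name_match_score],
    dataset_name="name-match-v2-training",
)
In []:
# Later, from a teammate's notebook: recall the stored dataset by name.
dataset = client.get_dataset(dataset_name="name-match-v2-training")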