Feature Store
Serve features for real-time decisions and model development.
Chalk is a centralized place to store, serve, and discover features for machine learning. Accelerate new model and feature development by reusing engineering work from previous models. Fetch dataframes directly in Jupyter notebooks, so your production and training data are guaranteed to be identical.
In []:
from datetime import datetime

from chalk.client import ChalkClient

client = ChalkClient()

# `labels` is a dataframe of labeled examples keyed by User.uid
df = client.offline_query(
    input=labels[[User.uid]],
    input_times=[datetime.now()] * len(labels),
    output=[
        User.name,
        User.credit_report,
        User.plaid_account.mean_balance,
    ],
)
Out[]:
# dataframe of point-in-time correct features, one row per label
In []:
# xgboost train / predict
from xgboost import XGBClassifier

xgb = XGBClassifier(
    eval_metric="logloss",
    use_label_encoder=False,
)
# "is_fraud" is an illustrative label column; to_pandas() assumes
# offline_query returns a Chalk Dataset
xgb.fit(df.to_pandas(), labels[["is_fraud"]])
Traditionally, teams write one pipeline to fetch data for training and another to fetch the same data for production. To make matters worse, teams often rewrite these pipelines for each model that relies on the data. You wind up with dozens of implementations fetching the same data, none of them quite identical, which leads to significant bugs. Chalk solves this with a unified feature repository: a single pipeline to fetch data, accessible to all training and production models.
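A minimal sketch of that single pipeline, assuming Chalk's documented @features and @online decorators; the resolver body and the `db` data source are illustrative:

from chalk import online
from chalk.client import ChalkClient
from chalk.features import features


@features
class User:
    uid: int
    name: str


# One resolver definition: Chalk runs it for live queries and reuses the
# same logic when materializing offline training sets.
@online
def get_name(uid: User.uid) -> User.name:
    return db.fetch_name(uid)  # `db` is a hypothetical upstream data source


client = ChalkClient()

# Production: fetch features for a single user at decision time.
client.query(input={User.uid: 1}, output=[User.name])

# Training: fetch the same features in bulk (`labels` as above:
# a dataframe of labeled examples).
client.offline_query(input=labels[[User.uid]], output=[User.name])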
Typically, developing new features or iterating on existing ones is slow and painful. You edit your pipeline or create a new one, then wait days, weeks, or months for results to roll in from production before you can evaluate whether the feature is correct and valuable for your predictions. Alternatively, you might attempt to synthesize historical results with a complex one-off script.
With Chalk, you can preview feature updates in real time, so you can iterate quickly and converge on better pipelines. Once your changes are finalized, Chalk automatically backfills the new features from historical data, so you can immediately start using them to train models.
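As a sketch of that iteration loop, assuming offline_query accepts a recompute_features flag that recomputes outputs from raw data rather than sampling previously materialized values:

# After editing a resolver, preview the updated feature against
# historical inputs instead of waiting for production data to accumulate.
preview = client.offline_query(
    input=labels[[User.uid]],
    input_times=[datetime.now()] * len(labels),
    output=[User.plaid_account.mean_balance],
    recompute_features=True,  # assumption: force recomputation for preview
)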
When computing historical training data, it’s common for teams to accidentally include data “from the future.” Models trained on these faulty data sets fail to perform in production because you can’t actually see into the future at prediction time. Chalk's time-travel functionality makes it easy to compute historical feature sets that accurately show how your features would have appeared in the past.
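Concretely, passing each label's own timestamp as the input time, rather than datetime.now(), yields features exactly as they stood at that moment; `label_time` is an illustrative column name:

# Each row is computed "as of" its label's timestamp, so no data from
# after the label time can leak into the training set.
df = client.offline_query(
    input=labels[[User.uid]],
    input_times=list(labels["label_time"]),
    output=[User.credit_report],
)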
Spark and Snowflake can't serve production traffic at interactive latencies. Chalk's feature-serving platform scales horizontally out of the box and handles extremely high-volume production workloads at low latency. Don't spend your time engineering a complex tiered storage system; let us handle the complexity and operational overhead of scaling feature serving.
Stop reinventing the wheel for each new model. Chalk’s feature catalog makes it easy to discover and organize the features that your team develops. Chalk’s tags and usage tracking make it easy for model authors to discover relevant features that others have already engineered. Version and track feature definitions and metadata in code stored in git for easy review and change control.
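A sketch of catalog metadata declared alongside the feature itself; the owner and tags parameters shown are assumptions modeled on Chalk's feature() metadata options:

from chalk.features import feature, features


@features
class User:
    uid: int
    # Metadata lives in code, gets reviewed in git, and surfaces in the
    # feature catalog where teammates can discover and reuse it.
    credit_score: int = feature(
        owner="risk-team@example.com",  # hypothetical owning team
        tags=["credit", "underwriting"],
    )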
Ever need to debug an issue, complete an audit, or otherwise figure out why your model made the prediction it did? Chalk automatically tracks detailed metadata about the provenance of each feature computed and served to your models: the exact code version, data sources, and upstream features used to compute a particular value. Chalk's audit capability makes it easy to debug issues, justify decisions to regulators, and perform exploratory data analysis.
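For example, production queries can be annotated so that served values trace back to the decision that requested them; the query_name and meta parameters shown are assumptions modeled on Chalk's query annotations:

# Annotate a production query so its served feature values can later be
# tied to this specific decision during debugging or an audit.
client.query(
    input={User.uid: 1},
    output=[User.credit_report],
    query_name="loan_decision",      # assumed query label
    meta={"request_id": "req_123"},  # assumed free-form audit metadata
)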