Feature Store

Store, serve, and discover features
Share and re-use features across models
Unify production and training data
neobank
resolvers.py
100%
$
Jupyter Notebook
In []:
df = client.offline_query(
  input=labels[[User.uid]],
  input_times=[datetime.now()] * len(labels),
  output=[
    User.name,
    User.credit_report,
    User.account.bal,
  ]
)
Out[]:
# xgboost train / predict
xgb = XGBClassifier(
    eval_metric="logloss",
    use_label_encoder=False
)
One source of truth
Traditionally, teams write one pipeline to fetch data for training and another to fetch that same data for production. To make matters worse, teams often re-write pipelines for each model that relies on that data which can lead to significant bugs. Chalk solves this with a unified feature repository – a single pipeline to fetch data which is then accessible for all training and production models.
Hot-reload your pipelines
Developing new features or iterating on existing ones can be a slow and painful process. You edit your pipeline or create a new one, then wait days, weeks, or months for results to roll in from production. With Chalk, preview feature updates in real time, so you can iterate quickly and develop better pipelines faster. Chalk even backfills features using historical data so you can immediately start using it to train models.
Time-Travel
When computing historical training data, it’s common to accidentally include data “from the future.” However, models trained on faulty data sets fail to perform in production. Chalk's time-travel functionality makes it easy to compute historical feature sets that accurately show how your features would have appeared in the past.
Low latency serving
Spark and Snowflake can’t serve production traffic. Chalk's feature serving platform scales horizontally out-of-the-box and can handle high-volume production workloads at low-latency. Let us handle the complexity and operational overhead of scaling feature serving.
Feature Discovery
Stop reinventing the wheel for each new model. Chalk’s feature catalog makes it easy to discover and organize the features that your team develops. Version and track feature definitions and metadata in code stored in git for easy review and change control.
Audit
Ever need to debug, complete an audit, or figure out why your model made the prediction it made? Chalk automatically tracks detailed metadata about the provenance of each feature computed and served to your models. This makes it easy to debug issues, justify decisions to regulators, and perform exploratory data analysis.
Get Started with Code Examples
Unlock the power of real-time data pipelines.
Feature Time
Features
Access and override the time at which a feature should be recorded.
Resolvers are shared between all models.
Descriptions
Feature Discovery
Describe features at a feature class or feature level.
Has One
Features
Define a has-one relationship between feature classes.
Explore All