by Amy Nguyen

Chalk Summer Product Update

July 19, 2024

I joined Chalk two weeks ago as our first developer advocate! One of my first orders of business was to find out what my new teammates have been working on over the past few months and collect everything here in one place. I'll be writing product updates for the team going forward, so please watch this space and keep up with changes as they're announced on our changelog.

What's new

Chalk gRPC

We shipped a gRPC engine for Chalk that improves performance by at least 2x through better data serialization, more efficient data transfer, and a migration to our C++ server. You can now use ChalkGRPCClient to run queries with the gRPC engine and fetch enriched metadata about your feature sets and resolvers through the get_graph method.

Use SQL in offline queries

With ChalkPy v2.38.8 or later, you can pass a spine SQL query to an offline query instead of passing input data directly. Chalk runs the query against your offline data store and uses the resulting rows as input to the offline query. Chalk computes an efficient query plan to retrieve your SQL data, so you don't need to load the data and transform it into input before sending it back to Chalk. Here's an example:

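from datetime import datetime, timedelta, timezone

# Assumes `chalk_client` and the Seller, Buyer, and Txn feature classes
# are already defined in your project.
# The cutoff is quoted in the query so it is a valid SQL timestamp literal.
cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).isoformat()

output = chalk_client.offline_query(
    spine_sql_query=f"""
        SELECT
            t.txn_time AS ts,
            t.seller_id AS "seller.id",
            t.buyer_id AS "buyer.id",
            t.amount AS "txn.amount",
            t.payment_type AS "txn.payment_type"
        FROM transactions AS t
        WHERE t.update_at >= '{cutoff}'
    """,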
    outputs=[
        Seller.id,
        Buyer.id,
        Buyer.account_created_date,
        Txn.payment_type,
        # Computed in the seller namespace from the 'seller.id' spine feature.
        Seller.recent_transactions_volume,
        # Computed in the buyer namespace from the 'buyer.id' spine feature.
        Buyer.total_spent_last_30d,
        # Passed through from the SQL query
        Txn.amount,
    ],
)
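The column aliases are what tie each spine column to a feature, so it can help to keep them in one place. The following is a minimal sketch of building the same query from a mapping. `SPINE_COLUMNS` and `build_spine_sql` are hypothetical names for illustration, not part of ChalkPy, and the quoted ISO 8601 cutoff assumes your offline store accepts that literal format.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from source columns to the aliases used in the
# example above. Dotted aliases such as "seller.id" name Chalk features.
SPINE_COLUMNS = {
    "t.txn_time": "ts",
    "t.seller_id": '"seller.id"',
    "t.buyer_id": '"buyer.id"',
    "t.amount": '"txn.amount"',
    "t.payment_type": '"txn.payment_type"',
}


def build_spine_sql(now: datetime, lookback_days: int = 30) -> str:
    """Build the spine query text for rows updated in the lookback window."""
    cutoff = (now - timedelta(days=lookback_days)).isoformat()
    select_list = ",\n    ".join(
        f"{column} AS {alias}" for column, alias in SPINE_COLUMNS.items()
    )
    return (
        f"SELECT\n    {select_list}\n"
        "FROM transactions AS t\n"
        f"WHERE t.update_at >= '{cutoff}'"
    )


sql = build_spine_sql(datetime(2024, 7, 19, tzinfo=timezone.utc))
print(sql)
```

The resulting string can then be passed as `spine_sql_query`. Because the cutoff is generated by your own code rather than taken from user input, plain string formatting is acceptable here; anything user-supplied should be validated first.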

Role-based access control for data sources and features

We expanded the functionality of our service tokens to enable role-based access control (RBAC) at both the data source and feature level. At the data source level, you can now restrict a token so that it resolves features only from data sources with matching tags. At the feature level, you can restrict a token's access to tagged features in one of two ways: block the token from returning tagged features in any query while still allowing their values to be used to compute other features, or block the token from accessing tagged features entirely.
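To make the two feature-level modes concrete, here is a small conceptual model in Python. It is not Chalk's API: `TagAccess`, `Token`, `can_return`, and `can_compute_with` are hypothetical names, and the rule that the most restrictive matching tag wins is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class TagAccess(Enum):
    ALLOW = "allow"                    # feature may be returned and used
    ALLOW_INTERNAL = "allow_internal"  # usable in computation, never returned
    DENY = "deny"                      # not accessible at all


@dataclass
class Token:
    # Hypothetical rule table: feature tag -> access level for this token.
    tag_rules: dict = field(default_factory=dict)

    def access_for(self, feature_tags: set) -> TagAccess:
        # Assumption: the most restrictive matching rule wins.
        rules = {self.tag_rules[t] for t in feature_tags if t in self.tag_rules}
        if TagAccess.DENY in rules:
            return TagAccess.DENY
        if TagAccess.ALLOW_INTERNAL in rules:
            return TagAccess.ALLOW_INTERNAL
        return TagAccess.ALLOW


def can_return(token: Token, feature_tags: set) -> bool:
    """True if a query made with this token may return the feature."""
    return token.access_for(feature_tags) is TagAccess.ALLOW


def can_compute_with(token: Token, feature_tags: set) -> bool:
    """True if the feature may feed the computation of other features."""
    return token.access_for(feature_tags) is not TagAccess.DENY


token = Token({"pii": TagAccess.ALLOW_INTERNAL, "restricted": TagAccess.DENY})
print(can_return(token, {"pii"}), can_compute_with(token, {"pii"}))
```

In this model a feature tagged `pii` can still feed a downstream score, but it never appears in a query response, while a feature tagged `restricted` is unavailable in both roles.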

Track incremental resolver status via CLI

You can now get the current progress state for your incremental resolvers by using the incremental status command:

chalk incremental status --scheduled_query run_this_query_daily
✓ Fetched resolver progress state
Resolver:                 N/A
Query:                    run_this_query_daily
Environment:              chalk12345
Max Ingested Timestamp:   2024-07-01T16:01:46+00:00
Last Execution Timestamp: 2024-07-01T00:01:27.421873+00:00
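If you want to use these values in a script, the output can be parsed as simple "key: value" lines. This is a rough sketch that assumes the human-readable layout shown above stays the same, which is not guaranteed. `parse_status` is a hypothetical helper, not part of the Chalk CLI.

```python
from datetime import datetime

# Sample output copied from the command above.
SAMPLE_OUTPUT = """\
✓ Fetched resolver progress state
Resolver:                 N/A
Query:                    run_this_query_daily
Environment:              chalk12345
Max Ingested Timestamp:   2024-07-01T16:01:46+00:00
Last Execution Timestamp: 2024-07-01T00:01:27.421873+00:00
"""


def parse_status(text: str) -> dict:
    """Split each 'Key: value' line on its first colon; skip other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


status = parse_status(SAMPLE_OUTPUT)
max_ingested = datetime.fromisoformat(status["Max Ingested Timestamp"])
last_execution = datetime.fromisoformat(status["Last Execution Timestamp"])
print(status["Query"], max_ingested, last_execution)
```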

Chalk deployment tags

You can now add tags to your deployments. Each tag must be unique within an environment: if you add an existing tag to a new deployment, Chalk removes that tag from the old deployment. Use the --deployment-tag flag:

chalk apply --deployment-tag=latest --deployment-tag=v1.0.4
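The uniqueness rule behaves like a mapping from tag to deployment, where tagging a new deployment overwrites the previous owner. Here is a minimal model of that behavior. `apply_tags` and the deployment IDs are hypothetical and only illustrate the rule, not how Chalk stores tags.

```python
def apply_tags(tag_owner: dict, deployment_id: str, tags: list) -> dict:
    """Return a new tag -> deployment mapping after tagging `deployment_id`."""
    updated = dict(tag_owner)
    for tag in tags:
        # Reassigning a tag implicitly removes it from its old deployment.
        updated[tag] = deployment_id
    return updated


owners = apply_tags({}, "deploy-1", ["latest", "v1.0.3"])
owners = apply_tags(owners, "deploy-2", ["latest", "v1.0.4"])
print(owners)
```

After the second call, `latest` points at `deploy-2` only, while `v1.0.3` still identifies `deploy-1`.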

Heartbeat monitoring for long-running queries

Chalk now uses heartbeats to poll the status of long-running queries and resolvers. If a hanging run's heartbeat is no longer detected, the run is marked as "failed" after a certain period of time.
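The general pattern can be sketched as a periodic sweep that fails any run whose last heartbeat is older than a timeout. This illustrates the technique only; the `Run` class, `mark_stale_runs`, and the 60-second timeout are assumptions, not Chalk's implementation or its actual threshold.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    last_heartbeat: float  # seconds on some monotonic clock
    status: str = "running"


def mark_stale_runs(runs: list, now: float, timeout_s: float) -> list:
    """Mark running runs with no heartbeat within `timeout_s` as failed.

    Returns the IDs of the runs that were marked failed in this sweep.
    """
    failed = []
    for run in runs:
        if run.status == "running" and now - run.last_heartbeat > timeout_s:
            run.status = "failed"
            failed.append(run.run_id)
    return failed


runs = [Run("healthy", last_heartbeat=100.0), Run("hanging", last_heartbeat=10.0)]
newly_failed = mark_stale_runs(runs, now=130.0, timeout_s=60.0)
print(newly_failed)
```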

Miscellaneous improvements

  • Trino and Spanner are now supported as data sources.
  • You can search and filter features in the Chalk feature catalog by their tags and owners.
  • Windowed resolvers now support hourly cadences.
  • SQL file resolvers now check your target return columns for typos and suggest the closest named features.
  • Failed annotation parsing now raises a type error with a more helpful message.
  • SQL resolvers have improved error reporting for failures related to type conversion (e.g., if your resolver selects an int column, but the feature’s type is string).
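As an illustration of the typo suggestions, closest-name matching can be done with Python's standard `difflib` module. This sketch shows the general idea and is not necessarily how Chalk implements the check; `suggest_feature` and the 0.6 similarity cutoff are assumptions.

```python
import difflib
from typing import Optional


def suggest_feature(column: str, known_features: list) -> Optional[str]:
    """Return the known feature name closest to `column`, if any is close."""
    matches = difflib.get_close_matches(column, known_features, n=1, cutoff=0.6)
    return matches[0] if matches else None


known = ["recent_transactions_volume", "total_spent_last_30d", "account_created_date"]

# A misspelled return column (missing "s") resolves to the intended feature.
suggestion = suggest_feature("recent_transaction_volume", known)
print(suggestion)

# A column with no similar feature name yields no suggestion.
no_match = suggest_feature("zzz", known)
print(no_match)
```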