Get Started with Code Examples
Unlock the power of real-time data pipelines
Setup
To run a Chalk resolver in Airflow, you'll need to add the CHALK_CLIENT_ID, CHALK_CLIENT_SECRET, and CHALK_ENVIRONMENT environment variables to Airflow.
User-Seller Affinity
Create Chalk features for Users and Sellers and evaluate whether a user and seller have matching categories.
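The affinity check described above comes down to a set intersection. The following is a minimal plain-Python sketch of that idea, not Chalk's API: the `User`, `Seller`, `favorite_categories`, `categories`, and `favorites_match` names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    id: str
    # Hypothetical field: categories the user has shown interest in.
    favorite_categories: Set[str] = field(default_factory=set)


@dataclass
class Seller:
    id: str
    # Hypothetical field: categories the seller lists products under.
    categories: Set[str] = field(default_factory=set)


def favorites_match(user: User, seller: Seller) -> bool:
    """Return True when the user and seller share at least one category."""
    return bool(user.favorite_categories & seller.categories)


user = User(id="u1", favorite_categories={"books", "garden"})
seller = Seller(id="s1", categories={"garden", "tools"})
print(favorites_match(user, seller))
```

In a Chalk project the same comparison would presumably live in a resolver that reads both category features and writes the boolean result; the dataclasses here only stand in for feature classes.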
Custom Feature Types
Failing Sensors
Combine batch, caching, and DataFrames to create a powerful predictive maintenance pipeline.
Pre-Fetching
Keep the cache warm by scheduling a resolver to run more frequently than the max-staleness.
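To see why the refresh interval has to be shorter than the staleness limit, here is a small simulation in plain Python. It is a sketch under assumptions, not Chalk code: `StalenessCache`, `count_cold_reads`, and the one-read-per-second clock are hypothetical and only model the timing argument.

```python
from typing import Dict, Tuple


class StalenessCache:
    """Toy cache: an entry is fresh while its age is at most max_staleness seconds."""

    def __init__(self, max_staleness: int) -> None:
        self.max_staleness = max_staleness
        self._store: Dict[str, Tuple[float, int]] = {}  # key -> (value, written_at)

    def put(self, key: str, value: float, now: int) -> None:
        self._store[key] = (value, now)

    def is_fresh(self, key: str, now: int) -> bool:
        entry = self._store.get(key)
        return entry is not None and (now - entry[1]) <= self.max_staleness


def count_cold_reads(refresh_every: int, max_staleness: int, horizon: int) -> int:
    """Simulate one read per second and count reads that find no fresh entry."""
    cache = StalenessCache(max_staleness)
    cold = 0
    for now in range(horizon):
        # Stand-in for the scheduled resolver: rewrite the value on a fixed interval.
        if now % refresh_every == 0:
            cache.put("user:1", float(now), now)
        if not cache.is_fresh("user:1", now):
            cold += 1
    return cold


# Refreshing every 30 s against a 60 s staleness limit never serves a cold read.
print(count_cold_reads(refresh_every=30, max_staleness=60, horizon=600))
# Refreshing every 120 s leaves a gap in each cycle where the entry is stale.
print(count_cold_reads(refresh_every=120, max_staleness=60, horizon=600))
```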
Downstream Scalars
Resolvers chain together through their required dependencies and declared outputs.
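The chaining idea can be illustrated without any framework. The sketch below is plain Python rather than Chalk's resolver API: the `resolver` decorator, the `RESOLVERS` registry, the `resolve` function, and the `user.*` feature names are all hypothetical. Each function declares which features it needs and which one it produces, and the lookup walks those declarations backwards from the requested feature.

```python
from typing import Any, Callable, Dict, List, Tuple

# Registry mapping an output feature name to (input feature names, function).
RESOLVERS: Dict[str, Tuple[List[str], Callable[..., Any]]] = {}


def resolver(inputs: List[str], output: str):
    """Register a function as the producer of `output` from `inputs`."""
    def register(fn):
        RESOLVERS[output] = (inputs, fn)
        return fn
    return register


@resolver(inputs=["user.id"], output="user.email")
def get_email(user_id: int) -> str:
    return f"user{user_id}@example.com"


@resolver(inputs=["user.email"], output="user.email_domain")
def get_domain(email: str) -> str:
    return email.split("@")[1]


def resolve(feature: str, known: Dict[str, Any]) -> Any:
    """Compute `feature`, first resolving whatever its resolver depends on."""
    if feature in known:
        return known[feature]
    inputs, fn = RESOLVERS[feature]
    known[feature] = fn(*(resolve(name, known) for name in inputs))
    return known[feature]


# Only user.id is supplied; user.email is computed on the way to the domain.
print(resolve("user.email_domain", {"user.id": 7}))
```

Requesting `user.email_domain` with only `user.id` known causes `get_email` to run first, because its declared output is the declared input of `get_domain`.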
Define features
First, we define features representing our users and their card transactions:
Isolated Python Environment
To isolate the chalkpy dependency from your Python environment, you can use Airflow's @task.virtualenv decorator. Note that this is slightly slower, since a Python virtual environment is created for the task, but it can be a useful approach if you want to avoid conflicts with other Python dependencies.
Override Cache Values
Supply a feature value in the input to skip the cache and any resolver entirely.
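The lookup order implied here (supplied input first, then cache, then resolver) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Chalk's implementation: `get_feature`, `slow_credit_score`, and the `user.credit_score` feature name are hypothetical.

```python
from typing import Any, Callable, Dict, List


def get_feature(
    name: str,
    supplied: Dict[str, Any],
    cache: Dict[str, Any],
    resolvers: Dict[str, Callable[[], Any]],
) -> Any:
    """Look up a feature: supplied input first, then cache, then resolver."""
    if name in supplied:
        # The caller provided the value, so neither cache nor resolver is consulted.
        return supplied[name]
    if name in cache:
        return cache[name]
    value = resolvers[name]()
    cache[name] = value
    return value


calls: List[str] = []


def slow_credit_score() -> int:
    # Hypothetical expensive resolver; records each invocation.
    calls.append("resolver ran")
    return 640


cache = {"user.credit_score": 700}
resolvers = {"user.credit_score": slow_credit_score}

# Supplying the value in the input bypasses both the cache and the resolver.
overridden = get_feature("user.credit_score", {"user.credit_score": 810}, cache, resolvers)
# Without a supplied value, the cached entry is returned.
cached = get_feature("user.credit_score", {}, cache, resolvers)
print(overridden, cached, calls)
```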
OpenAI
Chalk also makes it easy to integrate LLMs like ChatGPT into your resolvers. In the following example, we use ChatGPT to answer questions about our Users.
Identity Verification
Use vendor APIs to verify identities, and control costs with Chalk's platform.
Polling the Resolver Run
To wait for the resolver run to complete in Airflow, you can use the get_run_status Chalk method to poll the status of the resolver run. One way to accomplish this is by using Airflow's Sensor framework.
Define LLM resolvers
In the rest of this readme, we will focus on the LLM-dependent features: completion, clean_memo, category, is_nsf, and is_ach. (You can check out the full code and our documentation to see how we resolve the rest of the features using SQL file resolvers and windowed aggregations.)
Track Interactions
Identify the number of interactions that have occurred between users and sellers.
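As a rough picture of what such a count looks like, the sketch below tallies interactions per user-seller pair in plain Python. It is not the Chalk example itself: `count_interactions` and the `(user_id, seller_id)` event shape are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_interactions(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count interactions per (user_id, seller_id) pair."""
    return Counter(events)


# Hypothetical interaction log: one tuple per event.
events = [("u1", "s1"), ("u1", "s2"), ("u1", "s1"), ("u2", "s1")]
counts = count_interactions(events)
print(counts[("u1", "s1")])
```

A `Counter` returns 0 for pairs that never interacted, so callers do not need a separate existence check.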