A few weeks ago, Chalk won Best Technology at The GenAI Collective's Demo Night!
In his demo, Elliot showed how Chalk achieves the best of both worlds by combining structured data with LLM analysis in a unified feature store. He used Chalk to ingest financial transactions from a traditional database, then used Gemini to analyze their unstructured memo lines, extracting the merchant category, cleaning up the memo, and surfacing other meaningful information.
Here's an abridged preview of how we implemented a feature pipeline for analyzing transaction data:
import json
import textwrap

import google.generativeai as genai

from chalk import feature, online
from chalk.features import Features, features
@features
class Transaction:
    id: int
    memo: str  # raw memo line from the source transaction
    # ... See full code for more features

    # Gemini features
    # Store the completion as a separate feature so that we can iterate on
    # response parsing without retrying Gemini repeatedly.
    # (default_completion is defined in the full demo code.)
    completion: str = feature(max_staleness="infinity", default=default_completion)
    clean_memo: str
    category: str = "unknown"
    is_nsf: bool = False  # NSF: insufficient funds
    is_ach: bool = False  # ACH: Automated Clearing House transfer (e.g., direct deposit)
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash-latest")
@online
async def get_transaction_classification(memo: Transaction.memo) -> Transaction.completion:
    """Ask Gemini to classify the transaction's memo line, returning raw JSON text."""
    response = await model.generate_content_async(
        textwrap.dedent(
            f"""\
            Please return JSON for classifying a financial transaction
            using the following schema.
            {{"category": str, "is_nsf": bool, "clean_memo": str, "is_ach": bool}}
            All fields are required. Return EXACTLY one JSON object with NO other text.
            Memo: {memo}"""
        ),
        generation_config={"response_mime_type": "application/json"},
    )
    return response.candidates[0].content.parts[0].text
@online
def get_structured_outputs(completion: Transaction.completion) -> Features[
    Transaction.category,
    Transaction.is_nsf,
    Transaction.is_ach,
    Transaction.clean_memo,
]:
    """Given the completion, we parse it into a structured output."""
    body = json.loads(completion)
    return Transaction(
        category=body["category"],
        is_nsf=body["is_nsf"],
        is_ach=body["is_ach"],
        clean_memo=body["clean_memo"],
    )
Writing a feature pipeline with Chalk is as easy as writing Python. Prompt engineering stays simple, too: Chalk handles passing structured data into your LLM prompts, and it caches completions indefinitely (via max_staleness) to cut API costs.
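Once the resolvers are deployed, pulling the structured features is a single online query. Here's a rough sketch of what that could look like with Chalk's Python ChalkClient; the transaction id and memo below are made-up example inputs, and the exact call in Elliot's full demo may differ.

from chalk.client import ChalkClient

client = ChalkClient()

# Hand Chalk the raw transaction; it runs the Gemini resolver (or serves the
# cached completion) plus the parsing resolver to fill in the structured outputs.
result = client.query(
    input={
        Transaction.id: 1,  # example id, not from the demo
        Transaction.memo: "NSF FEE 09/14 REF #4421",  # made-up memo line
    },
    output=[
        Transaction.category,
        Transaction.clean_memo,
        Transaction.is_nsf,
        Transaction.is_ach,
    ],
)

print(result.get_feature_value(Transaction.category))
print(result.get_feature_value(Transaction.is_nsf))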
You can find Elliot's full demo on GitHub.
It was a great night all around, and we always love hanging out with other members of the ML/AI community in San Francisco!