Introduction
SQLMesh is a next-generation data transformation framework that offers virtual data environments, column-level lineage, automatic data contracts, and built-in CI/CD. This tutorial will walk you through using SQLMesh with the VS Code extension and the sushi example project.
What You’ll Learn
How to set up SQLMesh with VS Code
Understanding the sushi example project
Using the VS Code extension features
Creating and running SQLMesh plans
Deploying to production
Prerequisites
Before starting, ensure you have:
Python 3.8 or higher installed
VS Code installed
Git installed
Basic understanding of SQL and data modeling
System Requirements
macOS, Linux, or Windows
At least 4GB of RAM
2GB of free disk space
Setting Up the Environment
Step 1: Create a Project Directory
# Create a directory for the tutorial
mkdir sqlmesh-tutorial
cd sqlmesh-tutorial
Step 2: Set Up Python Virtual Environment
# Create virtual environment
python3 -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
Step 3: Install SQLMesh with LSP Support
# Install SQLMesh with VS Code Language Server Protocol support
pip install 'sqlmesh[lsp]'
# Verify installation
sqlmesh --version
Installing the VS Code Extension
Step 1: Install from VS Code Marketplace
Open VS Code
Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
Search for “SQLMesh”
Click “Install” on the official SQLMesh extension by TobikoData
Step 2: Configure Python Interpreter
Open Command Palette (Ctrl+Shift+P / Cmd+Shift+P)
Type “Python: Select Interpreter”
Choose the Python interpreter from your virtual environment (.venv)
Step 3: Verify Extension Installation
The SQLMesh icon should appear in the Activity Bar on the left side of VS Code.
Setting Up the Sushi Example Project
Step 1: Clone the SQLMesh Examples Repository
# Clone the examples repository
git clone https://github.com/TobikoData/sqlmesh-examples.git
# Navigate to the sushi moderate example
cd sqlmesh-examples/001_sushi/2_moderate
Step 2: Install Additional Dependencies
# Install Jupyter for notebooks (optional)
pip install jupyter
Step 3: Open the Project in VS Code
Open the 2_moderate folder in VS Code (for example, by running code . from the project directory).
Understanding the Sushi Project Structure
Project Overview
The sushi example simulates a fictional sushi restaurant’s data pipeline with:
Entities: Waiters, Customers, Orders
Transactions: Order details and items
Analytics: Revenue calculations and top performers
Directory Structure
Key Models
Source Models (from raw schema):
Transformation Models:
Working with the VS Code Extension
Feature 1: Interactive Lineage Visualization
Open any model file (e.g., models/top_waiters.sql)
Click on the “Lineage” tab in the bottom panel
Explore the interactive lineage graph:
Click nodes to see model details
Trace dependencies upstream and downstream
View column-level lineage
Feature 2: Intelligent Code Assistance
When editing models, the extension provides:
Auto-completion: Press Ctrl+Space for model names and SQLMesh keywords
Tooltips: Hover over model names to see descriptions, linter diagnostics, and the rendered query
Go-to-definition: Ctrl+Click on model names to navigate
Feature 3: Real-time Error Detection
The extension validates your models in real-time:
Syntax errors appear with red squiggles
Missing dependencies are highlighted
Invalid configurations are flagged
Feature 4: Format on Save
Enable automatic formatting:
Go to VS Code Settings
Search for “Format on Save”
Enable the option
SQLMesh models will auto-format when saved
Creating and Modifying Models
Understanding Model Structure
Let’s examine a model from the sushi project: top_waiters.sql
/*
  View of 10 waiters with highest revenue on most recent day of data.
*/
MODEL (
  name sushimoderate.top_waiters,
  kind VIEW,
  cron '@daily',
  grain waiter_id,
  audits (
    unique_values(columns=[waiter_id])
  )
);

SELECT
  waiter_id::INT AS waiter_id, /* Waiter ID */
  name::TEXT AS waiter_name, /* Waiter name */
  revenue::DOUBLE AS revenue /* Waiter revenue on most recent day of data */
FROM sushimoderate.waiter_revenue_by_day AS r
LEFT JOIN sushimoderate.waiter_names AS n
  ON r.waiter_id = n.id
WHERE
  ds = (
    SELECT MAX(ds)
    FROM sushimoderate.waiter_revenue_by_day
  )
ORDER BY revenue DESC
LIMIT 10;
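The filter-and-rank logic of this query can be sketched in plain pandas. The DataFrame below is made-up illustrative data, not part of the project:

```python
import pandas as pd

# Hypothetical stand-in for sushimoderate.waiter_revenue_by_day.
rev = pd.DataFrame([
    {"waiter_id": 1, "revenue": 100.0, "ds": "2023-01-01"},
    {"waiter_id": 2, "revenue": 80.0, "ds": "2023-01-02"},
    {"waiter_id": 1, "revenue": 50.0, "ds": "2023-01-02"},
])

# Keep only the most recent day (the WHERE ds = (SELECT MAX(ds) ...) clause),
# then sort by revenue and take the top 10 (ORDER BY revenue DESC LIMIT 10).
latest = rev[rev["ds"] == rev["ds"].max()]
top = latest.sort_values("revenue", ascending=False).head(10)
print(top["waiter_id"].tolist())  # [2, 1]
```

Note that only the rows from 2023-01-02 survive the filter, so waiter 1's higher revenue from the previous day is ignored, just as in the SQL.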
Model Components
MODEL block: Defines metadata
SELECT statement: The transformation logic
Creating a New Model
Let’s create a new model for analyzing customer ordering patterns: customer_order_patterns.sql
-- models/customer_order_patterns.sql
MODEL (
  name sushimoderate.customer_order_patterns,
  kind VIEW,
  cron '@daily',
  grain 'customer_id',
  description 'Customer ordering patterns and preferences'
);

SELECT
  o.customer_id::INT AS customer_id,
  COUNT(DISTINCT o.id)::INT AS total_orders,
  COUNT(DISTINCT oi.item_id)::INT AS unique_items_ordered,
  SUM(oi.quantity)::INT AS total_items_quantity,
  AVG(oi.quantity)::DOUBLE AS avg_items_per_order,
  MAX(o.ds)::DATE AS last_order_date
FROM sushimoderate.orders AS o
JOIN sushimoderate.order_items AS oi
  ON o.id = oi.order_id
LEFT JOIN sushimoderate.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY o.customer_id;
Adding Audits
Add data quality checks to your model. Create an audits folder inside the 2_moderate project and add a customer_model_audit.sql file with the following:
AUDIT (
  name does_not_exceed_threshold
);

SELECT * FROM @this_model
WHERE @column >= @threshold;
Running Plans and Deployments
Understanding Environments
SQLMesh uses virtual data environments:
dev: Development environment for testing changes
prod: Production environment with validated data
Step 1: Create a Development Plan
sqlmesh plan dev
Step 2: Apply the Plan
Type y to apply the changes. SQLMesh will:
Create the new model
Backfill historical data
Update the lineage
Step 3: Validate the Results
# Query the new model
sqlmesh fetchdf "SELECT * FROM sushimoderate__dev.customer_order_patterns LIMIT 5"
# Run tests
sqlmesh test
Step 4: Deploy to Production
# Create a production plan
sqlmesh plan
# Review changes
# Apply when ready
Advanced Features
1. Python Models
Create models using Python for complex logic:
# models/customer_segments.py
import typing as t

import pandas as pd
from sqlmesh import model


@model(
    "sushimoderate.customer_segments",
    kind="FULL",  # FULL models are rebuilt in their entirety on every run
    cron="@daily",
    columns={
        "customer_id": "INT",
        "segment": "TEXT",
        "revenue_percentile": "DOUBLE",
    },
)
def execute(context, start, end, execution_time, **kwargs: t.Any) -> t.Iterator[pd.DataFrame]:
    # Read customer revenue data. Because this is a FULL model, we read all
    # data from the upstream model rather than filtering by start/end. If
    # sushimoderate.customer_revenue_by_day is incremental, this query still
    # fetches ALL of its data, which is typical for a FULL downstream model.
    df = context.fetchdf(
        """
        SELECT
          customer_id,
          SUM(revenue) AS total_revenue
        FROM sushimoderate.customer_revenue_by_day
        GROUP BY customer_id
        """
    )

    if df.empty:
        # No upstream revenue data: yield from an empty generator,
        # as recommended by SQLMesh for empty results.
        yield from ()
        return

    # Calculate percentiles, guarding against all-NaN revenue.
    if df["total_revenue"].notna().any():
        df["revenue_percentile"] = df["total_revenue"].rank(pct=True)
    else:
        df["revenue_percentile"] = pd.NA

    # Assign segments based on percentile bins.
    if df["revenue_percentile"].notna().any():
        df["segment"] = pd.cut(
            df["revenue_percentile"],
            bins=[0, 0.25, 0.75, 1.0],
            labels=["Low", "Medium", "High"],
            include_lowest=True,  # important for including the 0th percentile
        )
    else:
        df["segment"] = pd.NA

    # Select and order columns explicitly to match the 'columns'
    # definition in the @model decorator.
    yield df[["customer_id", "segment", "revenue_percentile"]]
2. Incremental Models
Handle large datasets efficiently. Create a new SQL model file ‘daily_order_summary.sql’:
MODEL (
  name sushimoderate.daily_order_summary,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column ds -- 'ds' is the time column used for incremental processing
  ),
  start '2023-01-01',
  cron '@daily',
  grain (ds, waiter_id), -- The grain of the output table
  description 'Daily order summary by waiter, including order count and total revenue.'
);

SELECT
  o.ds::DATE AS ds, -- The date of the summary
  o.waiter_id::INT AS waiter_id, -- The waiter ID
  COUNT(DISTINCT o.id)::INT AS order_count, -- Unique orders handled by the waiter that day
  SUM(oi.quantity * i.price)::DOUBLE AS total_revenue -- Total revenue from those orders
FROM raw.orders AS o
JOIN raw.order_items AS oi
  ON o.id = oi.order_id AND o.ds = oi.ds -- Join orders with order_items on order ID and date
JOIN raw.items AS i
  ON oi.item_id = i.id AND oi.ds = i.ds -- Join on item ID and date to get that day's price
WHERE
  o.ds BETWEEN @start_ds AND @end_ds -- Filter for the incremental time range
GROUP BY
  o.ds,
  o.waiter_id;
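The key idea is that each run only processes rows whose ds falls inside the interval SQLMesh passes in via @start_ds and @end_ds, so several small runs cover the same rows as one big backfill. A toy pandas sketch with hypothetical data illustrates this:

```python
import pandas as pd

# Hypothetical stand-in for the raw orders data.
raw_orders = pd.DataFrame([
    {"ds": "2023-01-01", "waiter_id": 1, "revenue": 10.0},
    {"ds": "2023-01-02", "waiter_id": 1, "revenue": 20.0},
    {"ds": "2023-01-02", "waiter_id": 2, "revenue": 5.0},
])

def run_interval(start_ds: str, end_ds: str) -> pd.DataFrame:
    # Mirrors WHERE o.ds BETWEEN @start_ds AND @end_ds followed by GROUP BY.
    mask = raw_orders["ds"].between(start_ds, end_ds)
    return raw_orders[mask].groupby(["ds", "waiter_id"], as_index=False)["revenue"].sum()

# Two single-day runs together produce the same rows as one two-day backfill.
day1 = run_interval("2023-01-01", "2023-01-01")
day2 = run_interval("2023-01-02", "2023-01-02")
print(len(day1), len(day2))  # 1 2
```

Because the grouping key includes ds, rows produced by different runs never overlap, which is what makes the model safe to process one interval at a time.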
3. Macros
Create reusable SQL functions:
# macros/date_utils.py
from sqlmesh import macro
@macro()
def fiscal_quarter(evaluator, date_col):
"""Convert date to fiscal quarter (Q1 starts in April)"""
return f"CASE WHEN MONTH({date_col}) >= 4 THEN 'Q' || ((MONTH({date_col}) - 4) / 3 + 1) ELSE 'Q' || ((MONTH({date_col}) + 8) / 3 + 1) END"
Use in models:
SELECT
@fiscal_quarter(ds) AS fiscal_quarter,
SUM(revenue) AS quarterly_revenue
FROM raw.orders
GROUP BY fiscal_quarter;
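As a sanity check on the arithmetic, the same month-to-quarter mapping can be reproduced with Python integer division (this helper is illustrative, not part of SQLMesh):

```python
def fiscal_quarter(month: int) -> str:
    # Fiscal Q1 starts in April: months 4-6 -> Q1, 7-9 -> Q2,
    # 10-12 -> Q3, and 1-3 -> Q4.
    if month >= 4:
        return f"Q{(month - 4) // 3 + 1}"
    return f"Q{(month + 8) // 3 + 1}"

print([fiscal_quarter(m) for m in range(1, 13)])
# ['Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
```

One caveat: the Python sketch uses // for integer division, while the macro's SQL uses /. In dialects where / performs floating-point division (DuckDB, for example), you would need integer division or FLOOR in the generated SQL for the quarter numbers to come out whole.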
4. Unit Tests
Create comprehensive tests:
# tests/test_customer_order_patterns.yaml
test_customer_order_patterns:
  model: sushimoderate.customer_order_patterns
  inputs:
    sushimoderate.orders:
      - id: 1
        customer_id: 101
        waiter_id: 1
        ds: '2023-01-01'
      - id: 2
        customer_id: 101
        waiter_id: 2
        ds: '2023-01-02'
    sushimoderate.order_items:
      - order_id: 1
        item_id: 1
        quantity: 2
      - order_id: 2
        item_id: 2
        quantity: 1
    sushimoderate.customers:
      - customer_id: 101
  outputs:
    query:
      - customer_id: 101
        total_orders: 2
        unique_items_ordered: 2
        total_items_quantity: 3
        avg_items_per_order: 1.5
        last_order_date: '2023-01-02'
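To double-check the expected values, the same aggregates can be reproduced with a few lines of pandas, using the same two orders and two order items as the test inputs:

```python
import pandas as pd

orders = pd.DataFrame([
    {"id": 1, "customer_id": 101, "ds": "2023-01-01"},
    {"id": 2, "customer_id": 101, "ds": "2023-01-02"},
])
order_items = pd.DataFrame([
    {"order_id": 1, "item_id": 1, "quantity": 2},
    {"order_id": 2, "item_id": 2, "quantity": 1},
])

# Join orders to their items, then aggregate per customer,
# mirroring the model's JOIN and GROUP BY.
joined = orders.merge(order_items, left_on="id", right_on="order_id")
summary = joined.groupby("customer_id").agg(
    total_orders=("id", "nunique"),
    unique_items_ordered=("item_id", "nunique"),
    total_items_quantity=("quantity", "sum"),
    avg_items_per_order=("quantity", "mean"),
    last_order_date=("ds", "max"),
)
print(summary.loc[101].to_dict())
```

The computed row matches the expected output block: 2 orders, 2 unique items, 3 items in total, 1.5 items per order on average, and 2023-01-02 as the last order date.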
Best Practices
1. Model Organization
Use clear, descriptive model names
Group related models in subdirectories
Follow a consistent naming convention
2. Documentation
Add descriptions to all models
Document column purposes using column_descriptions
Include business logic explanations in comments
3. Testing Strategy
Write unit tests for critical models
Use audits for data quality checks
Test edge cases and null handling
4. Performance Optimization
Use appropriate model kinds (VIEW vs FULL vs INCREMENTAL)
Add indexes for frequently queried columns
Monitor query performance in production
5. Version Control
Commit config.yaml and all model files
Use meaningful commit messages
Review changes before deploying to production
6. Development Workflow
Create a feature branch
Develop models in VS Code
Run sqlmesh plan dev to test
Validate results with sqlmesh fetchdf
Run tests with sqlmesh test
Create a pull request
Deploy to production after review
Troubleshooting
Common Issues
Extension not detecting models: Make sure VS Code is opened at the project root (the directory containing config.yaml) and that the selected Python interpreter is the virtual environment where sqlmesh[lsp] is installed.
Lineage not displaying: Confirm the project loads without errors; fix anything reported in the Problems panel, then reopen the Lineage tab.
Plan failures: Read the error output from sqlmesh plan; common causes are syntax errors in models, references to missing upstream models, or an invalid connection configuration in config.yaml.
Getting Help
SQLMesh Documentation:
https://sqlmesh.readthedocs.io/
Summary
You’ve now learned how to:
Set up SQLMesh with the VS Code extension
Work with the sushi example project
Create and modify models
Use advanced features like Python models and macros
Deploy changes safely using virtual environments
SQLMesh’s powerful features, like virtual data environments and column-level lineage, make it an excellent choice for modern data transformation workflows. Continue exploring the documentation and examples to master more advanced features!
Head of Education & Evangelism at Tobiko
Founder, Insights x Design