
Table-Valued Python UDFs to demonstrate JSON Expansion


Setup: Database Schema

In [1]:

%%sql
-- Clean up any existing resources
DROP TABLE IF EXISTS orders;

-- Create sample table with nested JSON
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    line_items JSON NOT NULL,
    metadata JSON
);

Out [1]:

Insert Sample Data with Nested JSON

In [2]:

%%sql
INSERT INTO orders (order_id, customer_id, order_date, line_items, metadata) VALUES
(1001, 501, '2026-01-15',
 '[{"product_id": "P123", "product_name": "Laptop", "quantity": 1, "unit_price": 1299.99},
   {"product_id": "P456", "product_name": "Mouse", "quantity": 2, "unit_price": 29.99}]',
 '{"source": "web", "promo_code": "WINTER25", "shipping_method": "express"}'),

(1002, 502, '2026-01-16',
 '[{"product_id": "P789", "product_name": "Keyboard", "quantity": 1, "unit_price": 149.99},
   {"product_id": "P101", "product_name": "Monitor", "quantity": 2, "unit_price": 399.99},
   {"product_id": "P202", "product_name": "USB Cable", "quantity": 3, "unit_price": 12.99}]',
 '{"source": "mobile_app", "shipping_method": "standard"}'),

(1003, 503, '2026-01-17',
 '[{"product_id": "P303", "product_name": "Headphones", "quantity": 1, "unit_price": 89.99}]',
 '{"source": "web", "promo_code": "SAVE10"}');

Out [2]:

3 rows affected.

Verify Sample Data

In [3]:

%%sql
-- Review the raw JSON structure
SELECT
    order_id,
    customer_id,
    line_items,
    metadata
FROM orders
LIMIT 3;

Out [3]:

3 rows affected.

Define Table-Valued Python UDF: Line Items Expansion

In [4]:

import typing
import json
import pandas as pd
import logging
from singlestoredb.functions import udf, Table

# Configure logging
logger = logging.getLogger(__name__)

class LineItemOutput(typing.NamedTuple):
    product_id: str
    product_name: str
    quantity: int
    unit_price: float
    line_total: float

@udf(returns=LineItemOutput)
async def expand_line_items(json_array: str) -> Table[pd.DataFrame]:
    """
    Expands a JSON array of line items into individual rows.

    Parameters
    ----------
    json_array : str
        JSON string containing array of line item objects

    Returns
    -------
    Table[pd.DataFrame]
        Table with columns: product_id, product_name, quantity,
        unit_price, line_total (computed)
    """
    # Use NamedTuple fields to ensure schema consistency
    columns = LineItemOutput._fields

    # Handle null/empty input gracefully
    if not json_array:
        logger.debug("Received null or empty json_array, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # Parse JSON - let decode errors bubble up to caller
    items = json.loads(json_array)

    if not items:
        logger.debug("Parsed empty array, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    rows = []
    for idx, item in enumerate(items):
        try:
            quantity = int(item.get('quantity', 0))
            unit_price = float(item.get('unit_price', 0.0))

            rows.append({
                'product_id': item.get('product_id', ''),
                'product_name': item.get('product_name', ''),
                'quantity': quantity,
                'unit_price': unit_price,
                'line_total': quantity * unit_price
            })
        except (ValueError, TypeError) as e:
            # Log the problematic item but continue processing others
            logger.warning(f"Skipping item at index {idx} due to type conversion error: {e}")
            continue

    # DataFrame columns automatically match NamedTuple schema
    return Table(pd.DataFrame(rows, columns=columns))
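Because the UDF body is ordinary Python, the expansion logic can be sanity-checked outside the database before publishing. A minimal local sketch that mirrors the UDF's behavior without the `@udf` decorator or the `Table` wrapper (the helper name `expand_line_items_local` is ours, not part of the `singlestoredb` API):

```python
import json
import pandas as pd

def expand_line_items_local(json_array: str) -> pd.DataFrame:
    """Local stand-in mirroring expand_line_items, minus the Table wrapper."""
    columns = ['product_id', 'product_name', 'quantity', 'unit_price', 'line_total']
    if not json_array:
        return pd.DataFrame(columns=columns)
    rows = []
    for item in json.loads(json_array):
        quantity = int(item.get('quantity', 0))
        unit_price = float(item.get('unit_price', 0.0))
        rows.append({
            'product_id': item.get('product_id', ''),
            'product_name': item.get('product_name', ''),
            'quantity': quantity,
            'unit_price': unit_price,
            'line_total': quantity * unit_price,
        })
    return pd.DataFrame(rows, columns=columns)

# Same shape as order 1001's line_items
sample = ('[{"product_id": "P123", "product_name": "Laptop", "quantity": 1, "unit_price": 1299.99},'
          ' {"product_id": "P456", "product_name": "Mouse", "quantity": 2, "unit_price": 29.99}]')
df = expand_line_items_local(sample)
print(df[['product_name', 'quantity', 'line_total']])
```

The computed `line_total` column (1299.99 and 59.98 here) is exactly what the server-side TVF emits per expanded row.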

Define Table-Valued Python UDF: Metadata Expansion

In [5]:

class MetadataOutput(typing.NamedTuple):
    key: str
    value: str

@udf(returns=MetadataOutput)
async def expand_metadata(json_obj: str) -> Table[pd.DataFrame]:
    """
    Expands a JSON object into key-value pairs (one row per field).
    Useful for dynamic schemas where field names aren't known in advance.

    Parameters
    ----------
    json_obj : str
        JSON string containing object with arbitrary fields

    Returns
    -------
    Table[pd.DataFrame]
        Table with columns: key, value
    """
    # Use NamedTuple fields to ensure schema consistency
    columns = MetadataOutput._fields

    # Handle null/empty input gracefully
    if not json_obj:
        logger.debug("Received null or empty json_obj, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # Parse JSON - let decode errors bubble up to caller
    obj = json.loads(json_obj)

    rows = [{'key': k, 'value': str(v)} for k, v in obj.items()]

    if not rows:
        logger.debug("Parsed empty object, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # DataFrame columns automatically match NamedTuple schema
    return Table(pd.DataFrame(rows, columns=columns))
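As with the line-items UDF, the key/value expansion can be verified locally. A small sketch of the same logic (`expand_metadata_local` is a hypothetical helper name used only for this check):

```python
import json

def expand_metadata_local(json_obj: str) -> list[dict]:
    """Key/value expansion matching the UDF's output schema (key, value)."""
    if not json_obj:
        return []
    obj = json.loads(json_obj)
    # Values are stringified so heterogeneous types fit one TEXT column
    return [{'key': k, 'value': str(v)} for k, v in obj.items()]

rows = expand_metadata_local('{"source": "web", "promo_code": "WINTER25"}')
print(rows)
# [{'key': 'source', 'value': 'web'}, {'key': 'promo_code', 'value': 'WINTER25'}]
```

Note that every value comes back as a string; callers that need typed values must cast after expansion.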

Start the Python UDF Server

This registers all @udf-annotated functions as external functions in your database.

In [6]:

import singlestoredb.apps as apps
import logging

# Configure logging level
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

connection_info = await apps.run_udf_app()
logger.info("Python UDF server is running")
logger.info("Functions registered with '_test' suffix for notebook testing")

Run the following commands in a new SQL Editor to demonstrate the production use case.

Verify Function Registration

In [7]:

%%sql
-- Check registered functions
SHOW FUNCTIONS LIKE '%expand%';

Out [7]:

4 rows affected.

In [8]:

%%sql
-- View function signature
SHOW CREATE FUNCTION expand_line_items_test;

Out [8]:

1 rows affected.

Why Not Use Native JSON Functions?

Native JSON functions work for simple extractions but become verbose and brittle for complex transformations.

%%sql
-- Native approach: requires knowing array size, returns strings, no computed fields
SELECT
    order_id,
    JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 0), 'product_name') as item1_name,
    JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 0), 'quantity') as item1_qty,
    JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 1), 'product_name') as item2_name
    -- Breaks if order has different number of items!
FROM orders
LIMIT 3;

With Table-Valued Python UDFs:

  • Dynamic: Works regardless of array size

  • Clean: Single function call vs. nested extractions

  • Computed fields: Calculate line_total during expansion

  • Type safety: Returns proper int/float types

  • Reusable: Define once, use everywhere
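The "dynamic" point is easy to demonstrate in plain Python: fixed-index access (the native-SQL style above) breaks as soon as an order has fewer items, while iteration handles any array length. A minimal illustration:

```python
import json

# Two orders with different line-item counts
orders = [
    '[{"product_name": "Laptop"}, {"product_name": "Mouse"}]',
    '[{"product_name": "Headphones"}]',  # only one item
]

# Fixed-index access fails on the shorter array
try:
    second = json.loads(orders[1])[1]['product_name']
except IndexError:
    second = None  # equivalent of the broken item2_name column

# Dynamic iteration (the TVF style) works regardless of length
names = [item['product_name'] for o in orders for item in json.loads(o)]
print(second, names)
```

The TVF applies the same iteration server-side, so one function call replaces a column per possible array position.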

Publish the above code as a Python UDF, then run the following SQL queries to verify that it works.

Use Case 1: Basic JSON Array Expansion

Instead of querying raw JSON, we've normalized the nested array into relational rows that are easy to filter and aggregate.

-- Flatten nested line items into individual rows
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    li.product_id,
    li.product_name,
    li.quantity,
    li.unit_price,
    li.line_total
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
ORDER BY o.order_id, li.product_id;
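For intuition, the same flattening can be reproduced locally in pandas over the sample orders (a local illustration only; the real query runs inside the database via the TVF):

```python
import json
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003],
    'line_items': [
        '[{"product_id": "P123", "quantity": 1, "unit_price": 1299.99},'
        ' {"product_id": "P456", "quantity": 2, "unit_price": 29.99}]',
        '[{"product_id": "P789", "quantity": 1, "unit_price": 149.99},'
        ' {"product_id": "P101", "quantity": 2, "unit_price": 399.99},'
        ' {"product_id": "P202", "quantity": 3, "unit_price": 12.99}]',
        '[{"product_id": "P303", "quantity": 1, "unit_price": 89.99}]',
    ],
})

# Parse each JSON array, explode to one row per item, then normalize the dicts
exploded = orders.assign(item=orders['line_items'].map(json.loads)).explode('item')
flat = pd.concat(
    [exploded[['order_id']].reset_index(drop=True),
     pd.json_normalize(exploded['item'].tolist())],
    axis=1)
flat['line_total'] = flat['quantity'] * flat['unit_price']
print(flat)
```

Three orders become six relational rows, one per line item, mirroring what `CROSS JOIN expand_line_items(...)` produces.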

Use Case 2: Product Revenue Analysis

We can aggregate across all orders to understand product performance.

-- Calculate total revenue and units sold per product
SELECT
    li.product_name,
    SUM(li.quantity) as total_units_sold,
    SUM(li.line_total) as total_revenue,
    COUNT(DISTINCT o.order_id) as num_orders,
    ROUND(AVG(li.unit_price), 2) as avg_unit_price
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
GROUP BY li.product_name
ORDER BY total_revenue DESC;
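The aggregation itself can be sanity-checked locally. A pandas sketch over the flattened sample rows (hand-copied here from the sample data, as `expand_line_items` would return them):

```python
import pandas as pd

# Flattened line items, one row per (order, product)
flat = pd.DataFrame([
    (1001, 'Laptop',     1, 1299.99), (1001, 'Mouse',      2, 29.99),
    (1002, 'Keyboard',   1, 149.99),  (1002, 'Monitor',    2, 399.99),
    (1002, 'USB Cable',  3, 12.99),   (1003, 'Headphones', 1, 89.99),
], columns=['order_id', 'product_name', 'quantity', 'unit_price'])
flat['line_total'] = flat['quantity'] * flat['unit_price']

# Equivalent of the GROUP BY product_name query
revenue = (flat.groupby('product_name')
               .agg(total_units_sold=('quantity', 'sum'),
                    total_revenue=('line_total', 'sum'),
                    num_orders=('order_id', 'nunique'))
               .sort_values('total_revenue', ascending=False))
print(revenue)
```

With the sample data, Laptop tops the revenue ranking at 1299.99, followed by Monitor at 799.98.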

Use Case 3: Dynamic Metadata Extraction

We don't need to know metadata field names in advance; the TVF extracts whatever keys exist.

-- Find all orders that used promo codes (filtering on dynamically extracted fields)
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    m.value as promo_code_used
FROM orders o
CROSS JOIN expand_metadata(o.metadata) AS m
WHERE m.key = 'promo_code'
ORDER BY o.order_id;

Use Case 4: Multi-TVF Analytics - Revenue by Channel

Join multiple TVF expansions to answer complex business questions: which marketing channels drive the most revenue?

-- Calculate revenue breakdown by traffic source (combines both TVFs)
SELECT
    m.value as channel,
    COUNT(DISTINCT o.order_id) as orders,
    SUM(li.line_total) as total_revenue,
    ROUND(AVG(li.line_total), 2) as avg_line_item_value,
    SUM(li.quantity) as total_units
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
CROSS JOIN expand_metadata(o.metadata) AS m
WHERE m.key = 'source'
GROUP BY m.value
ORDER BY total_revenue DESC;
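The join of the two expansions can also be checked locally. A pandas sketch with per-order revenue and the `source` metadata value precomputed from the sample data:

```python
import pandas as pd

# Per-order revenue: sums of line_total from the sample line items
order_totals = pd.DataFrame({
    'order_id': [1001, 1002, 1003],
    'order_revenue': [1359.97, 988.94, 89.99],
})

# Value of the 'source' key from each order's metadata
channels = pd.DataFrame({
    'order_id': [1001, 1002, 1003],
    'source': ['web', 'mobile_app', 'web'],
})

# Equivalent of joining both TVFs and grouping by channel
by_channel = (order_totals.merge(channels, on='order_id')
                          .groupby('source')['order_revenue'].sum()
                          .sort_values(ascending=False))
print(by_channel)
```

Web orders (1001 and 1003) combine to 1449.96 of revenue versus 988.94 from the mobile app, matching what the SQL above returns.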

Use Case 5: Full Order Denormalization with Pivot

Complete denormalization: transform nested JSON into a flat, analytics-ready table. This output is usable by BI tools, dashboards, and reporting pipelines.

-- Create fully denormalized view combining line items with pivoted metadata
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    li.product_name,
    li.quantity,
    li.line_total,
    MAX(CASE WHEN m.key = 'source' THEN m.value END) as order_source,
    MAX(CASE WHEN m.key = 'promo_code' THEN m.value END) as promo_code,
    MAX(CASE WHEN m.key = 'shipping_method' THEN m.value END) as shipping_method
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
CROSS JOIN expand_metadata(o.metadata) AS m
GROUP BY o.order_id, o.customer_id, o.order_date, li.product_name, li.quantity, li.line_total
ORDER BY o.order_id;
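The `MAX(CASE WHEN ...)` construct is a manual pivot over the key/value rows. A local pandas sketch of the same transformation (key/value rows hand-copied from the sample metadata):

```python
import pandas as pd

# Key/value rows as expand_metadata returns them, tagged with order_id
kv = pd.DataFrame([
    (1001, 'source', 'web'), (1001, 'promo_code', 'WINTER25'),
    (1001, 'shipping_method', 'express'),
    (1002, 'source', 'mobile_app'), (1002, 'shipping_method', 'standard'),
    (1003, 'source', 'web'), (1003, 'promo_code', 'SAVE10'),
], columns=['order_id', 'key', 'value'])

# Equivalent of the MAX(CASE WHEN key = ... THEN value END) pivot
pivoted = kv.pivot(index='order_id', columns='key', values='value')
print(pivoted)
```

Keys missing for a given order (order 1002 has no `promo_code`) come back as NULL/NaN, just as in the SQL pivot.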

When to Use Table-Valued Python UDFs

Best Use Cases

✓ Complex transformations: logic that is difficult with native SQL/JSON functions

✓ Dynamic schemas: field names/structure varies by record

✓ Computed fields: calculations during expansion (like line_total)

✓ Reusable logic: define once, use across queries and applications

Optimization Tips

  1. Materialize frequent queries in views

  2. Apply WHERE clauses before TVF when possible

  3. Index supporting columns (order_id, customer_id, etc.)

  4. Use EXPLAIN to verify query plans

When NOT to Use

✗ Simple single-field extractions (use JSON_EXTRACT_STRING)

✗ Very high-frequency point queries (consider denormalization)

Details


About this Template

Learn how to use table-valued Python UDFs to expand nested JSON data into normalized relational rows.

This Notebook can be run in Standard and Enterprise deployments.

Tags

advanced, notebooks, python


License

This Notebook has been released under the Apache 2.0 open source license.