Table-Valued Python UDFs to demonstrate JSON Expansion
Notebook
Setup: Database Schema
In [1]:
%%sql
-- Clean up any existing resources
DROP TABLE IF EXISTS orders;

-- Create sample table with nested JSON
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    line_items JSON NOT NULL,
    metadata JSON
);
Out [1]:
Insert Sample Data with Nested JSON
In [2]:
%%sql
INSERT INTO orders (order_id, customer_id, order_date, line_items, metadata) VALUES
(1001, 501, '2026-01-15',
 '[{"product_id": "P123", "product_name": "Laptop", "quantity": 1, "unit_price": 1299.99},
   {"product_id": "P456", "product_name": "Mouse", "quantity": 2, "unit_price": 29.99}]',
 '{"source": "web", "promo_code": "WINTER25", "shipping_method": "express"}'),

(1002, 502, '2026-01-16',
 '[{"product_id": "P789", "product_name": "Keyboard", "quantity": 1, "unit_price": 149.99},
   {"product_id": "P101", "product_name": "Monitor", "quantity": 2, "unit_price": 399.99},
   {"product_id": "P202", "product_name": "USB Cable", "quantity": 3, "unit_price": 12.99}]',
 '{"source": "mobile_app", "shipping_method": "standard"}'),

(1003, 503, '2026-01-17',
 '[{"product_id": "P303", "product_name": "Headphones", "quantity": 1, "unit_price": 89.99}]',
 '{"source": "web", "promo_code": "SAVE10"}');
Out [2]:
Verify Sample Data
In [3]:
%%sql
-- Review the raw JSON structure
SELECT
    order_id,
    customer_id,
    line_items,
    metadata
FROM orders
LIMIT 3;
Out [3]:
Define Table-Valued Python UDF: Line Items Expansion
In [4]:
import typing
import json
import pandas as pd
import logging
from singlestoredb.functions import udf, Table

# Configure logging
logger = logging.getLogger(__name__)

class LineItemOutput(typing.NamedTuple):
    product_id: str
    product_name: str
    quantity: int
    unit_price: float
    line_total: float

@udf(returns=LineItemOutput)
async def expand_line_items(json_array: str) -> Table[pd.DataFrame]:
    """
    Expands a JSON array of line items into individual rows.

    Parameters
    ----------
    json_array : str
        JSON string containing array of line item objects

    Returns
    -------
    Table[pd.DataFrame]
        Table with columns: product_id, product_name, quantity,
        unit_price, line_total (computed)
    """
    # Use NamedTuple fields to ensure schema consistency
    columns = LineItemOutput._fields

    # Handle null/empty input gracefully
    if not json_array:
        logger.debug("Received null or empty json_array, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # Parse JSON - let decode errors bubble up to caller
    items = json.loads(json_array)

    if not items:
        logger.debug("Parsed empty array, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    rows = []
    for idx, item in enumerate(items):
        try:
            quantity = int(item.get('quantity', 0))
            unit_price = float(item.get('unit_price', 0.0))

            rows.append({
                'product_id': item.get('product_id', ''),
                'product_name': item.get('product_name', ''),
                'quantity': quantity,
                'unit_price': unit_price,
                'line_total': quantity * unit_price
            })
        except (ValueError, TypeError) as e:
            # Log the problematic item but continue processing others
            logger.warning(f"Skipping item at index {idx} due to type conversion error: {e}")
            continue

    # DataFrame columns automatically match NamedTuple schema
    return Table(pd.DataFrame(rows, columns=columns))
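Before deploying, the core expansion logic can be sanity-checked locally. The sketch below is a hypothetical standalone re-statement of the same parsing (the name `expand_line_items_local` is ours, not part of the UDF API) using only the standard library, so it runs without the `@udf` decorator, pandas, or a database connection:

```python
import json

def expand_line_items_local(json_array):
    """Standalone sketch of the expansion logic above, for local testing
    without the @udf decorator, pandas, or a database connection."""
    if not json_array:
        return []
    rows = []
    for item in json.loads(json_array):
        quantity = int(item.get('quantity', 0))
        unit_price = float(item.get('unit_price', 0.0))
        rows.append({
            'product_id': item.get('product_id', ''),
            'product_name': item.get('product_name', ''),
            'quantity': quantity,
            'unit_price': unit_price,
            'line_total': quantity * unit_price,  # computed during expansion
        })
    return rows

# One item from the sample data: 2 x 29.99 should yield line_total 59.98
sample = ('[{"product_id": "P456", "product_name": "Mouse", '
          '"quantity": 2, "unit_price": 29.99}]')
rows = expand_line_items_local(sample)
print(rows[0]['product_id'], rows[0]['line_total'])
```

Each dict in `rows` corresponds to one output row of the TVF, matching the `LineItemOutput` schema field for field.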
Define Table-Valued Python UDF: Metadata Expansion
In [5]:
class MetadataOutput(typing.NamedTuple):
    key: str
    value: str

@udf(returns=MetadataOutput)
async def expand_metadata(json_obj: str) -> Table[pd.DataFrame]:
    """
    Expands a JSON object into key-value pairs (one row per field).
    Useful for dynamic schemas where field names aren't known in advance.

    Parameters
    ----------
    json_obj : str
        JSON string containing object with arbitrary fields

    Returns
    -------
    Table[pd.DataFrame]
        Table with columns: key, value
    """
    # Use NamedTuple fields to ensure schema consistency
    columns = MetadataOutput._fields

    # Handle null/empty input gracefully
    if not json_obj:
        logger.debug("Received null or empty json_obj, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # Parse JSON - let decode errors bubble up to caller
    obj = json.loads(json_obj)

    rows = [{'key': k, 'value': str(v)} for k, v in obj.items()]

    if not rows:
        logger.debug("Parsed empty object, returning empty result")
        return Table(pd.DataFrame(columns=columns))

    # DataFrame columns automatically match NamedTuple schema
    return Table(pd.DataFrame(rows, columns=columns))
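The key-value flattening is simple enough to verify locally as well. This is a hypothetical stdlib-only sketch of the same logic (the helper name `expand_metadata_local` is ours), handy for checking behavior on sample metadata before publishing the UDF:

```python
import json

def expand_metadata_local(json_obj):
    """Standalone sketch of the metadata flattening above: one
    {'key': ..., 'value': ...} dict per field of the JSON object."""
    if not json_obj:
        return []
    # All values are stringified, matching the MetadataOutput schema
    return [{'key': k, 'value': str(v)} for k, v in json.loads(json_obj).items()]

# Metadata from sample order 1003
sample = '{"source": "web", "promo_code": "SAVE10"}'
print(expand_metadata_local(sample))
# [{'key': 'source', 'value': 'web'}, {'key': 'promo_code', 'value': 'SAVE10'}]
```

Because every value is coerced to a string, heterogeneous metadata (numbers, booleans) flattens cleanly into the two-column `key`/`value` shape.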
Start the Python UDF Server
This registers all @udf-annotated functions as external functions in your database.
In [6]:
import singlestoredb.apps as apps
import logging

# Configure logging level
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

connection_info = await apps.run_udf_app()
logger.info("Python UDF server is running")
logger.info("Functions registered with '_test' suffix for notebook testing")
Run the following commands in a new SQL editor to demonstrate the production use case.
Verify Function Registration
In [7]:
%%sql
-- Check registered functions
SHOW FUNCTIONS LIKE '%expand%';
Out [7]:
In [8]:
%%sql
-- View function signature
SHOW CREATE FUNCTION expand_line_items_test;
Out [8]:
Why Not Use Native JSON Functions?
Native JSON functions work for simple extractions but become verbose and brittle for complex transformations.
%%sql
-- Native approach: requires knowing array size, returns strings, no computed fields
SELECT
order_id,
JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 0), 'product_name') as item1_name,
JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 0), 'quantity') as item1_qty,
JSON_EXTRACT_STRING(JSON_ARRAY_ELEMENT(line_items, 1), 'product_name') as item2_name
-- Breaks if order has different number of items!
FROM orders
LIMIT 3;
With Table-Valued Python UDFs:
Dynamic: Works regardless of array size
Clean: Single function call vs. nested extractions
Computed fields: Calculate line_total during expansion
Type safety: Returns proper int/float types
Reusable: Define once, use everywhere
Publish the above code as a Python UDF, then run the following SQL to verify that it works.
Use Case 1: Basic JSON Array Expansion
Instead of querying raw JSON, we normalize the nested array into relational rows that are easy to filter and aggregate.
-- Flatten nested line items into individual rows
SELECT
o.order_id,
o.customer_id,
o.order_date,
li.product_id,
li.product_name,
li.quantity,
li.unit_price,
li.line_total
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
ORDER BY o.order_id, li.product_id;
Use Case 2: Product Revenue Analysis
We can aggregate across all orders to understand product performance.
-- Calculate total revenue and units sold per product
SELECT
li.product_name,
SUM(li.quantity) as total_units_sold,
SUM(li.line_total) as total_revenue,
COUNT(DISTINCT o.order_id) as num_orders,
ROUND(AVG(li.unit_price), 2) as avg_unit_price
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
GROUP BY li.product_name
ORDER BY total_revenue DESC;
Use Case 3: Dynamic Metadata Extraction
We don't need to know metadata field names in advance. The TVF extracts whatever keys exist.
-- Find all orders that used promo codes (filtering on dynamically extracted fields)
SELECT
o.order_id,
o.customer_id,
o.order_date,
m.value as promo_code_used
FROM orders o
CROSS JOIN expand_metadata(o.metadata) AS m
WHERE m.key = 'promo_code'
ORDER BY o.order_id;
Use Case 4: Multi-TVF Analytics - Revenue by Channel
Join multiple TVF expansions to answer complex business questions: which marketing channels drive the most revenue?
-- Calculate revenue breakdown by traffic source (combines both TVFs)
SELECT
m.value as channel,
COUNT(DISTINCT o.order_id) as orders,
SUM(li.line_total) as total_revenue,
ROUND(AVG(li.line_total), 2) as avg_line_item_value,
SUM(li.quantity) as total_units
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
CROSS JOIN expand_metadata(o.metadata) AS m
WHERE m.key = 'source'
GROUP BY m.value
ORDER BY total_revenue DESC;
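Conceptually, the query above pairs each order's expanded line items with its expanded metadata and sums revenue per `source`. A minimal Python sketch of that pairing and aggregation (stdlib only, over two trimmed sample orders; the variable names are ours, purely illustrative):

```python
import json
from collections import defaultdict

# Two of the sample orders, trimmed to the fields the query needs
orders = [
    {'line_items': '[{"quantity": 1, "unit_price": 1299.99}]',
     'metadata': '{"source": "web"}'},
    {'line_items': '[{"quantity": 2, "unit_price": 399.99}]',
     'metadata': '{"source": "mobile_app"}'},
]

# CROSS JOIN + WHERE m.key = 'source' + GROUP BY m.value, in miniature
revenue_by_channel = defaultdict(float)
for o in orders:
    source = json.loads(o['metadata'])['source']
    for item in json.loads(o['line_items']):
        revenue_by_channel[source] += item['quantity'] * item['unit_price']

print({k: round(v, 2) for k, v in revenue_by_channel.items()})
# {'web': 1299.99, 'mobile_app': 799.98}
```

The SQL version distributes this same work across the database, but the shape of the computation (expand, pair, group, sum) is identical.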
Use Case 5: Full Order Denormalization with Pivot
Complete denormalization: transform nested JSON into a flat, analytics-ready table usable by BI tools, dashboards, and reporting pipelines.
-- Create fully denormalized view combining line items with pivoted metadata
SELECT
o.order_id,
o.customer_id,
o.order_date,
li.product_name,
li.quantity,
li.line_total,
MAX(CASE WHEN m.key = 'source' THEN m.value END) as order_source,
MAX(CASE WHEN m.key = 'promo_code' THEN m.value END) as promo_code,
MAX(CASE WHEN m.key = 'shipping_method' THEN m.value END) as shipping_method
FROM orders o
CROSS JOIN expand_line_items(o.line_items) AS li
CROSS JOIN expand_metadata(o.metadata) AS m
GROUP BY o.order_id, o.customer_id, o.order_date, li.product_name, li.quantity, li.line_total
ORDER BY o.order_id;
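The `MAX(CASE WHEN m.key = ... THEN m.value END)` pattern pivots the key-value rows back into named columns. In Python terms, that pivot is just a per-order lookup over the expanded rows; a small illustrative sketch (the row data mirrors order 1001's metadata, variable names are ours):

```python
# Key-value rows as expand_metadata would emit them for one order
kv_rows = [{'key': 'source', 'value': 'web'},
           {'key': 'promo_code', 'value': 'WINTER25'},
           {'key': 'shipping_method', 'value': 'express'}]

# Pivot: one column per expected key; None (SQL NULL) when the key is absent,
# which is exactly what MAX(CASE WHEN ...) yields for missing keys
expected_cols = ('source', 'promo_code', 'shipping_method')
pivoted = {col: next((r['value'] for r in kv_rows if r['key'] == col), None)
           for col in expected_cols}

print(pivoted)
# {'source': 'web', 'promo_code': 'WINTER25', 'shipping_method': 'express'}
```

Order 1002 has no `promo_code` key, so its pivoted row would carry `None`/NULL in that column rather than failing, mirroring the SQL behavior.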
When to Use Table-Valued Python UDFs
Best Use Cases
✓ Complex transformations: logic that is difficult with native SQL/JSON functions
✓ Dynamic schemas: field names/structure varies by record
✓ Computed fields: calculations during expansion (like line_total)
✓ Reusable logic: define once, use across queries and applications
Optimization Tips
Materialize frequent queries in views
Apply WHERE clauses before TVF when possible
Index supporting columns (order_id, customer_id, etc.)
Use EXPLAIN to verify query plans
When NOT to Use
✗ Simple single-field extractions (use JSON_EXTRACT_STRING)
✗ Very high-frequency point queries (consider denormalization)

Details
About this Template
Learn how to use table-valued Python UDFs to expand nested JSON data into normalized relational rows.
This Notebook can be run in Standard and Enterprise deployments.
Tags
See Notebook in action
Launch this notebook in SingleStore and start executing queries instantly.
License
This Notebook has been released under the Apache 2.0 open source license.