Accelerating JSON Array Queries with Multi-Value Hash Indexes in SingleStore

Modern applications increasingly rely on JSON to model complex, semi-structured data — especially when records naturally contain arrays of values (e.g. tags, event logs, product SKUs).

Accelerating JSON Array Queries with Multi-Value Hash Indexes in SingleStore

Yet array-centric queries (“find all users with role ‘admin’,” “filter orders containing item X”) can quickly become performance bottlenecks in databases without good index support on semi-structured columns. 

SingleStore’s Multi-Value Hash Index on JSON changes the game, delivering sub-millisecond lookups over deeply nested arrays. Lookup queries leverage the index and avoid a full table scan, and by combining it with the power of columnar storage and vectorized distributed queries you achieve blazing fast analytics on JSON data.

Here are a few common real-world scenarios involving arrays:

  • eCommerce. Orders with an array of product IDs, where you need to count all orders containing a specific SKU.
  • Event tracking. Each user record has an array of “interests” or “actions”— filtering millions of users by interest used to mean expensive full-table scans.
  • Content management. Articles tagged with multiple topics, requiring fast faceted search across tag arrays.

In each case, you’re asking “does this JSON array include X?” or “which rows have any element in this array matching my filter?” Without specialized indexing, every query forces the engine to unpack the JSON, scan arrays and evaluate predicates one each array item which is costly at scale.

introducing-the-multi-value-hash-indexIntroducing the Multi-Value Hash Index

Multi-Value Hash Index is built upon the regular columnar hash index of SingleStore. Instead of building a hash index on the entire column contents, Multi-Value Hash Index indexes individual array elements of a row, resulting in each array element pointing back to its parent document.

JSON_MATCH_ANY with equality and IN-based queries currently use this index automatically. As a result we get O(1) execution for such queries.

Let's take the example of a users table that stores the various interests of each user in a JSON array.

1SELECT user_id2FROM users3WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() ='"hiking"', interests);

This query runs super fast with indexed lookup. Behind the scenes, each distinct interest is stored in the index, so the final SELECT runs in microseconds regardless of table size.

singlestore-jso-ns-competitive-advantageSingleStore JSON’s competitive advantage

SingleStore’s JSON data type has special optimizations for the columnstore storage architecture, where each of the fields in the JSON are natively exploded into their own columns, this gives immense benefits for analytical queries on specific fields as the entire JSON does not need to be parsed or loaded into memory — and it also gets other benefits of SingleStore like with columnar compression, and vectorized and distributed query execution under the hood. Combining this with a multi-value index results in fast analytics even with lookups.

This is unlike PostgreSQL’s GIN indexes, which only speed up individual lookups but fall back to costly per-row JSON or document fetches when you add grouping, or sorting, so aggregation latencies grow super-linearly with data size. MongoDB multikey index similarly accelerates simple matches, but its aggregation pipeline still fetches and decodes full documents for grouping or sorting.

To witness the performance benefits of Multi-Value Hash Index, here is a simple example:

1-- Create orders with JSON array of product_ids2CREATE TABLE `orders` (3  `id` bigint(20) NOT NULL,4  `created` datetime(6) DEFAULT NULL,5  `product_ids` JSON,6  PRIMARY KEY (`id`),7  SHARD KEY `__SHARDKEY` (`id`),8  SORT KEY `__UNORDERED` ()9)
1-- Create 1M orders each with 5 products sampled from 100000 product ids2DELIMITER //3DO DECLARE4    arr ARRAY(RECORD(id BIGINT, created DATETIME(6), product_ids JSON)) = CREATE_ARRAY(1000000);5    product_ids JSON;6    n BIGINT;7BEGIN8    FOR i IN 0..999999 LOOP9        product_ids = JSON_BUILD_ARRAY(10            (RAND()*100000):>INT,11            (RAND()*100000):>INT,12            (RAND()*100000):>INT,13            (RAND()*100000):>INT,14            (RAND()*100000):>INT15        );16        arr[i] = ROW(i, NOW(), product_ids);17    END LOOP;18    n = INSERT_ALL('orders', arr);19END //20
21DELIMITER ;
1-- Unindexed scan: ~3 seconds2SELECT * FROM orders WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 11221, product_ids)3
4-- Add Multi-Value Hash Index5ALTER TABLE orders ADD MULTI VALUE INDEX (product_ids) 6INDEX_OPTIONS='{"TOKENIZER":"MATCH_ANY", "PATH":[]}';7
8-- Indexed lookup: ~300 milliseconds for Equality and In based queries9SELECT * FROM orders WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 11221, product_ids)10
11SELECT * FROM orders WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() IN (11221,23232), product_ids)

We expose the same multi-value index capability on BSON — ideal for MongoDB® applications using SingleStore Kai™. You can create these indexes either with familiar MongoDB commands or directly in SQL, giving developers the flexibility to choose their favorite toolchain.

In summary, with Multi-Value Hash Indexes on JSON in SingleStore, you get the flexibility of schemaless arrays plus the speed of native SQL indexing — all in a single, unified database. Whether you’re building a real-time analytics dashboard or powering an app with millions of users, SingleStore delivers the performance you need for modern JSON-centric applications.

Learn more:

Start building now

Get started with SingleStore Helios today and receive $600 in credits.

Start free

Start building with SingleStore