Order by rand() does not consider LIMIT

Hi,

my Singlestore version: 8.1.26 under Ubuntu 22.04.3 LTS.

I’m a bit confused: on a simple query of my Articles table (with 275K entries) with:

SELECT id,title FROM articles ORDER BY RAND() LIMIT 4

I rarely really get 4 results back.

Sometimes there are 2, sometimes 3, sometimes 4, sometimes 6 entries. This is just an example. Of course I know that the SQL is not very performant or optimal with 275K entries. But that is not the point.

The LIMIT is not completely ignored, I never get more than 7 to 8 entries back, but the specification of “LIMIT 4” is simply ignored.

A bug, or did I not understand something correctly?

Examples (the contents are in german language):

singlestore> select id, title from articles order by rand() limit 4;
+--------+--------------------------------------------------------------------+
| id     | title                                                              |
+--------+--------------------------------------------------------------------+
| 291401 | Optisches Laufwerk im Netzwerk                                     |
|  68710 | Freigabe Risiko                                                    |
|  59518 | 2 Rechner 1 VPN verbindung über DSL Leitung an Server              |
|  13781 | bei automatischer Anmeldung wird kein logon.bat ausgeführt         |
| 398181 | Drucker schaltet sich auf Windows 2012 R2 Server immer auf offline |
+--------+--------------------------------------------------------------------+
5 rows in set (0,25 sec)

singlestore>
singlestore> select id, title from articles order by rand() limit 4;
+--------+-------------------------------------------------------------------------+
| id     | title                                                                   |
+--------+-------------------------------------------------------------------------+
| 242523 | Kein Routerzwang mehr ab Herbst 2014                                    |
| 589705 | Software oder Möglichkeit um Notebooks nach Ablaufdatum sperren lassen  |
| 167489 | In VBS Date-Funktionen nutzen...                                        |
|  11064 | Toshiba HDD Tool                                                        |
| 575246 | HP LaserJet 2300 schmiert bzw hat einen Streifen                        |
| 206950 | Win7 Pro 64b und Raid 0                                                 |
+--------+-------------------------------------------------------------------------+
6 rows in set (0,01 sec)

singlestore> select id, title from articles order by rand() limit 4;
+------------+-----------------------------------------------------------+
| id         | title                                                     |
+------------+-----------------------------------------------------------+
|     175649 | Server zum einrichten Von Standartiesierten PC - Notebook |
|     192574 | Druckerverbindung mit Batchdatei löschen                  |
| 1637061622 | WinServer2019 Netzwerkadapter streikt nach Win-Update     |
+------------+-----------------------------------------------------------+
3 rows in set (0,00 sec)
singlestore> select count(id) FROM articles;
+-----------+
| count(id) |
+-----------+
|    275417 |
+-----------+
1 row in set (0,12 sec)

thx

It sounds like a bug. Can you give the EXPLAIN plan and the SHOW CREATE TABLE for the articles table?

I tried it on a similar query in our master branch for the next release and did not have a problem – always got 4 rows.

To work around it, consider doing something like

select id, title, rand() * (id - id + 1) as r from articles order by r limit 4;

I’ll open a bug to have the developers look at it. But they will need your repro information.

Here the Explain (db name is “core”):

 Explain select id,title from articles ORDER BY RAND() LIMIT 3;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project [remote_0.id, remote_0.title] est_rows:3                                                                                                                              |
| Sort [RAND()]                                                                                                                                                                 |
| Gather partitions:all est_rows:3 alias:remote_0 parallelism_level:segment                                                                                                     |
| Project [raw_proj.id, raw_proj.title] est_rows:3 est_select_cost:48                                                                                                           |
| Sort [RAND()]                                                                                                                                                                 |
| HashJoin                                                                                                                                                                      |
| |---HashTableProbe [raw_proj.id = r1.OSJ_IND_id_1]                                                                                                                            |
| |   HashTableBuild alias:r1                                                                                                                                                   |
| |   Project [r0.OSJ_IND_id_1] alias:r1 hash_key:[OSJ_IND_id_1] parallelism_level:partition est_rows:3                                                                         |
| |   TopSort limit:3 [RAND()]                                                                                                                                                  |
| |   TableScan r0 storage:list stream:yes table_type:reference est_table_rows:3 est_filtered:3                                                                                 |
| |   Broadcast [articles.id AS OSJ_IND_id_1] AS r0 distribution:tree parallelism_level:segment est_rows:3 est_select_cost:2,203,336                                            |
| |   TopSort limit:3 [RAND()]                                                                                                                                                  |
| |   ColumnStoreScan core.articles, SORT KEY stamp_create (stamp_create DESC) table_type:sharded_columnstore est_table_rows:275,417 est_filtered:275,417         |
| ColumnStoreFilter [raw_proj.id = r1.OSJ_IND_id_1 join index]                                                                                                                  |
| ColumnStoreScan core.articles AS raw_proj, SORT KEY stamp_create (stamp_create DESC) table_type:sharded_columnstore est_table_rows:275,417 est_filtered:275,417 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
16 rows in set (0,01 sec)

Unfortunately, the workaround does not work either:

singlestore> select id, title, rand() * (id - id + 1) as r from articles order by r limit 4;
+------------+-----------------------------------------------------------------+--------------------+
| id         | title                                                           | r                  |
+------------+-----------------------------------------------------------------+--------------------+
|       5960 | Freigabeberechtigungen - Suchpfad                               | 0.4798912845003436 |
|     106133 | DECTanbindung für Computer                                      | 0.8400707273869789 |
| 1409727645 | Powershell: Aus Zertifikat ausgelesenes Datum anders darstellen | 0.9403310346544506 |
+------------+-----------------------------------------------------------------+--------------------+
3 rows in set (0,00 sec)

singlestore> select id, title, rand() * (id - id + 1) as r from articles order by r limit 4;
+------------+---------------------------------------------------------------------------------+---------------------+
| id         | title                                                                           | r                   |
+------------+---------------------------------------------------------------------------------+---------------------+
|      38892 | Probleme nach Umstellung auf MS-Exchange                                        |  0.0822300273360751 |
|     578901 | Script zum erstellen von Scripte                                                | 0.10935901832430162 |
|     665630 | Umwandeln einer Dezimalzahl in eine Binärzahl                                   |  0.4484697938507265 |
|     158700 | Wie ist die Sicherheit von mod mono? Also dot Net für einem Apache und Debian?  |  0.5382782625225225 |
| 2020469764 | Entwicklertagebuch: Internationale Version Teil 2                               |   0.634140302182887 |
|      34438 | Word 2003 Symbolleisten Darstellung erzwingen                                   |  0.8525010707826901 |
|     257255 | Läuft eine GTK-PLUS und GNOME -basierte Anwendung auf einen Server?             |  0.9263848813881428 |
+------------+---------------------------------------------------------------------------------+---------------------+
7 rows in set (0,01 sec)

singlestore> select id, title, rand() * (id - id + 1) as r from articles order by r limit 4;
+------------+-----------------------------------------------------------+---------------------+
| id         | title                                                     | r                   |
+------------+-----------------------------------------------------------+---------------------+
|      26449 | Freigabe von XPHome aus nicht sichtbar                    | 0.11239358111376974 |
|     276726 | VBScript - Kopiere lokale Datei auf mehrere Netzwerkpfade |  0.3419217919467452 |
| 2020469772 | Entwicklertagebuch: Internationale Version Teil 2         | 0.37059039661203097 |
| 2043298539 | Powershell Sample 3                                       | 0.37514383693551734 |
|     143016 | Wer kennt diesen Fehler im Ereignisprotokoll?             |  0.8284508513184066 |
+------------+-----------------------------------------------------------+---------------------+
5 rows in set (0,01 sec)

It works correctly at all times for our Rowstore tables. It seems to affect some of the columnstore tables. But only the ones with high entries. Our “article” (275.42K rows) and the “comments” tables (1.47M rows). The smaller ones (under 15K rows) don’t seem to be affected by the error. But this could just be a coincidence.

All tables are quite similar in structure. Have nothing special.

Here is the CREATE TABLE for articles (somewhat shortened, a few fields are missing, but everything is present from the structure):

CREATE TABLE `articles` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `status` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT 'new',
  `title` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `language` varchar(5) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT 'de',
  `images` JSON COLLATE utf8_bin DEFAULT NULL,
  `meta` JSON COLLATE utf8_bin DEFAULT NULL,
  `count_tags` int(11) DEFAULT 0,
  `count_images` int(11) DEFAULT 0,
  `stamp_create` datetime DEFAULT NULL,
  `stamp_last_activity` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY `PRIMARY` (`id`) USING HASH,
  SORT KEY `stamp_create` (`stamp_create` DESC),
  KEY `filter` (`status`,`language`) USING HASH,
  FULLTEXT KEY `search` (`title`),
  SHARD KEY `__SHARDKEY` (`id`)
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES'

If I add a WHERE statement, it seems to work again. Without it it does not. But even that can only be a coincidence.

singlestore> SELECT id,title FROM articles WHERE status='ok' ORDER BY RAND() LIMIT 3;
+--------+------------------------+
| id     | title                  |
+--------+------------------------+
|  21348 | RAID abschalten        |
| 540702 | Dateiablage aufräumen  |
|  57917 | Virus Steam Keygen2.3  |
+--------+------------------------+
3 rows in set (0,03 sec)

singlestore> SELECT id,title FROM articles WHERE status='ok' ORDER BY RAND() LIMIT 3;
+--------+------------------------------------------------------------------------------+
| id     | title                                                                        |
+--------+------------------------------------------------------------------------------+
|  54807 | Windows Server 2003 - POP3 Dienst - mehrere Domänen aber gleiche Benutzer!?  |
|  70863 | Word 2003 autotext USERNAME bsp. mm Michael Maier !?                         |
| 250550 | Datev und Server                                                             |
+--------+------------------------------------------------------------------------------+
3 rows in set (0,03 sec)

singlestore> SELECT id,title FROM articles ORDER BY RAND() LIMIT 3;
+------------+---------------------------------------------------+
| id         | title                                             |
+------------+---------------------------------------------------+
| 2020469764 | Entwicklertagebuch: Internationale Version Teil 2 |
+------------+---------------------------------------------------+
1 row in set (0,07 sec)

singlestore> SELECT id,title FROM articles ORDER BY RAND() LIMIT 3;
+--------+------------------------------------------------------------------+
| id     | title                                                            |
+--------+------------------------------------------------------------------+
| 438937 | Probleme beim Öffnen von Mails und Links in outlook 2010 win 10  |
|  28891 | Programme des Servers (W2k3) auf dem Client (XP) starten         |
|  68467 | Probleme mit Geschwindigkeit bei Netzwerkzugriff                 |
| 299134 | Beim zusätzlichen DC fehlen die Freigaben SYSVOL und NETLOGON    |
+--------+------------------------------------------------------------------+
4 rows in set (0,00 sec)

Thanks for these additional details. I have passed this to our dev team to investigate.

The dev team is fixing this. I expect a patch to correct it within a few weeks.

1 Like