Is it recommended to schedule ANALYZE queries on all tables?

We just saw a huge speed improvement (roughly 16x) by running ANALYZE on a few tables, after debugging some queries with unusual plans using the VISUAL EXPLAIN feature in MemSQL Studio.

The huge impact was probably caused by a large data movement/regeneration yesterday, but it made us wonder whether we should schedule ANALYZE queries to keep the statistics up to date at all times.

The documentation mentions some automatic statistics collection, but strongly recommends executing ANALYZE queries manually.

We mostly have rowstores, but also a single columnstore with histograms.

I guess we would benefit from a daily/weekly statistics collection, but I wonder why MemSQL doesn't do this automatically. The only real drawback appears to be plan invalidation, which probably isn't too bad since you gave us the INTERPRET_FIRST option.

Is this the right direction to follow, or am I missing something?

For MemSQL 6.8 and earlier, I would recommend running ANALYZE after large loads or updates, and also nightly or weekly to account for changes after continuous updates.
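For reference, the manual command is just ANALYZE TABLE (database and table names below are made up for illustration):

```sql
-- Refresh statistics for one table after a large load or update
ANALYZE TABLE mydb.orders;
```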

MemSQL 7.0 will have fully-automatic statistics gathering on all table types, so the large majority of users will never have to run ANALYZE again.

Great news regarding 7.0 - thanks for sharing.

We’ll follow your recommendations with a combination of scheduled nightly and weekly executions until 7.0 is released.

On the same point, is there a way to run ANALYZE on all tables in a database instead of going table by table?

We just wrote a quick procedure a few days ago that accepts the name of the database as a parameter.

  1. It queries information_schema.TABLES
  2. grabs the list of tables: "SELECT table_name FROM information_schema.tables WHERE table_schema = 'ABC'"
  3. then loops through an EXECUTE IMMEDIATE of "ANALYZE TABLE XYZ"

It doesn't handle updating the individual column histograms, but it accomplishes a quick ANALYZE on a schedule.
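If you do want histograms as well, ANALYZE accepts a COLUMNS clause to enable and collect them; a sketch, with made-up database, table, and column names:

```sql
-- Enable and collect histograms on specific columns of a table
ANALYZE TABLE mydb.events COLUMNS event_date, user_id ENABLE;
```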


Would you mind sharing your procedure with the community? :)

Here's one that simply rolls through the tables in a given schema. If you execute it in MySQL Workbench, it may not finish because of the maximum number of result grids allowed in the UI; you may want to comment out the "print completed" portion.

CREATE OR REPLACE PROCEDURE analyze_table_by_schema(_table_schema TEXT) RETURNS void AS
DECLARE
  /* Use static parameterized SQL when declaring `qry` because
     _table_schema is a placeholder for a constant value. */
  qry QUERY(table_name VARCHAR(64)) =
    SELECT TABLE_NAME FROM information_schema.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = _table_schema;
  arr ARRAY(RECORD(table_name VARCHAR(64)));
  _table_name VARCHAR(64);
  exec_string VARCHAR(1000);
  complete_string VARCHAR(1000);
BEGIN
  arr = COLLECT(qry);
  FOR x IN arr LOOP
    _table_name = x.table_name;
    exec_string = 'ANALYZE TABLE ' || _table_schema || '.' || _table_name;
    EXECUTE IMMEDIATE exec_string;
    /* Progress message; comment this out if your client limits result grids. */
    complete_string = 'ECHO SELECT ''' || _table_name || ''' AS completed';
    EXECUTE IMMEDIATE complete_string;
  END LOOP;
END



I was thinking about this procedure a bit more. If you have a table with the tables you want to analyze, maybe by groups, you could drive this procedure from that list instead of all the tables. No reason to analyze code tables that don’t change.
