Is it recommended to schedule ANALYZE queries on all tables?

We just saw a huge speed improvement (roughly 16x) by running ANALYZE on a few tables, after debugging some queries with unusual plans using the VISUAL EXPLAIN feature in MemSQL Studio.

The huge impact was probably caused by a large data movement/regeneration yesterday, but it made us wonder whether we should schedule ANALYZE queries to keep the statistics up to date at all times.

The documentation mentions some automatic statistics collection, but strongly recommends executing ANALYZE queries manually.

We mostly have rowstores, but also a single columnstore with histograms.

I guess we would benefit from a daily/weekly statistics collection, but I wonder why MemSQL doesn't do this automatically. The only real drawback appears to be plan invalidation, which probably isn't too bad since you gave us the INTERPRET_FIRST option.

Is this the right direction to follow, or am I missing something?

For MemSQL 6.8 and earlier, I would recommend running ANALYZE after large loads or updates, and also nightly or weekly to account for changes after continuous updates.
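For reference, the manual command is just ANALYZE TABLE (database and table names below are made up for illustration):

```sql
-- Refresh statistics for one table after a large load or update
ANALYZE TABLE mydb.orders;
```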

MemSQL 7.0 will have fully-automatic statistics gathering on all table types, so the large majority of users will never have to run ANALYZE again.

Great news regarding 7.0 - thanks for sharing.

We’ll follow your recommendations with a combination of scheduled nightly and weekly executions until 7.0 is released.

On the same point, is there a way to run ANALYZE on all tables in a database instead of going table by table?

We just wrote a quick procedure a few days ago that accepts the name of the database as a parameter.

  1. It queries information_schema.TABLES
  2. grabs the list of tables: "SELECT table_name FROM information_schema.tables WHERE table_schema = 'ABC'"
  3. then loops through an EXECUTE IMMEDIATE of "ANALYZE TABLE XYZ"

It doesn't handle updating the individual column histograms, but it accomplishes a quick ANALYZE on a schedule.
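If you do want histograms as well, ANALYZE accepts a COLUMNS clause to enable and collect them; a sketch, with made-up database, table, and column names:

```sql
-- Enable and collect histograms on specific columns of a table
ANALYZE TABLE mydb.events COLUMNS event_date, user_id ENABLE;
```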


Would you mind sharing your procedure with the community? :)

Here's one that simply rolls through the tables in a given schema. If you execute it in MySQL Workbench, it may not finish because of the maximum number of result grids allowed in the UI; you may want to comment out the "print completed" portion.

CREATE OR REPLACE PROCEDURE analyze_table_by_schema(_table_schema TEXT) RETURNS void AS
DECLARE
  /* Use static parameterized SQL when declaring `qry` because
     _table_schema is a placeholder for a constant value. */
  qry QUERY(table_name VARCHAR(64)) =
    SELECT TABLE_NAME FROM information_schema.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = _table_schema;
  arr ARRAY(RECORD(table_name VARCHAR(64)));
  _table_name VARCHAR(64);
  exec_string VARCHAR(1000);
  complete_string VARCHAR(1000);
BEGIN
  arr = COLLECT(qry);
  FOR x IN arr LOOP
    _table_name = x.table_name;
    exec_string = 'ANALYZE TABLE ' || _table_schema || '.' || _table_name;
    EXECUTE IMMEDIATE exec_string;
    /* Progress message; comment this out if your client limits result grids. */
    complete_string = 'ECHO SELECT ''' || _table_name || ''' AS completed';
    EXECUTE IMMEDIATE complete_string;
  END LOOP;
END



I was thinking about this procedure a bit more. If you have a table with the tables you want to analyze, maybe by groups, you could drive this procedure from that list instead of all the tables. No reason to analyze code tables that don’t change.
