Checksum for the data contents of a table

Is there a way to get a quick checksum based on the data row contents in an entire table.

I’m trying to find a quick way to validate that 2 databases are identical; or at least identify which table(s) have different row content… is there any information_schema table I could query for a checksum of the row contents for all rows in a table ?

I’ve experimented with using crc32( json_agg( to_json( table.* ) ) ) but unfortunately many of our tables are just too large for json_agg .

Would summing the crc32 of the to_json(table.*) of all rows be meaningful ?

Summing crc32 of the to_json(table.*) would be meaningful, yes. Consider something like this:

singlestore> select count(*), sum(crc32(to_json(r.*)) :> bigint) from r;
+----------+------------------------------------+
| count(*) | sum(crc32(to_json(r.*)) :> bigint) |
+----------+------------------------------------+
|        2 |                         4358384770 |
+----------+------------------------------------+
1 row in set (0.05 sec)

singlestore> select * from r;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    3 |    4 |
+------+------+
2 rows in set (0.06 sec)

singlestore> insert r values(100,200);
Query OK, 1 row affected (0.06 sec)

singlestore> select count(*), sum(crc32(to_json(r.*)) :> bigint) from r;
+----------+------------------------------------+
| count(*) | sum(crc32(to_json(r.*)) :> bigint) |
+----------+------------------------------------+
|        3 |                         5782152428 |
+----------+------------------------------------+
1 row in set (0.00 sec)

Then if the count or the crc32 changes, the tables are different. You could do the count first before calculating the CRC32 to save time & only do the CRC32 if the count is equal.

I’m not sure the cast to bigint is necessary. I just did that to be safe to make sure the sum would be a bigint. That might already be happening by default.