TABLE_STATISTICS calculates the same memory_use of MEDIUMINT and INT

Based on the Data Types documentation section, I was expecting MEDIUMINT (8) UNSIGNED to consume less memory than INT (10) UNSIGNED.

From the MemSQL Studio this seems true while exploring the Databases report:

CREATE TABLE `keywords-int` (
  `id` binary(16) NOT NULL,
  `location` int(10) unsigned DEFAULT null,
  PRIMARY KEY (`id`),
  KEY `location` (`location`)
)
+----------+------------------+--------------+
| Name     | Data Type        | Memory Usage |
+----------+------------------+--------------+
| id       | binary(16)       | 331 MB       |
| location | int(10) unsigned | 165 MB       |
+----------+------------------+--------------+
CREATE TABLE `keywords-mediumint` (
  `id` binary(16) NOT NULL,
  `location` mediumint(8) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `location` (`location`)
)
+----------+-----------------------+--------------+
| Name     | Data Type             | Memory Usage |
+----------+-----------------------+--------------+
| id       | binary(16)            | 331 MB       |
| location | mediumint(8) unsigned | 83 MB        | <--- 50% lower
+----------+-----------------------+--------------+

Index usage of both tables remains the same (PRIMARY = 1 GB and location = 827 MB).

But the memory_use provided by INFORMATION_SCHEMA.TABLE_STATISTICS shows otherwise:

+--------------------+----------+------------+
| table_name         | rows     | memory_use |
+--------------------+----------+------------+
| keywords-int       | 21670444 | 2799602432 |
| keywords-mediumint | 21670444 | 2799602432 |
+--------------------+----------+------------+

I did execute ANALYZE TABLE ... on both tables.

What could be the explanation? I guess the reported usage from MemSQL Studio per column is the most accurate?

The reason of chosing MEDIUMINT over INT is because of the expectation of reduced memory usage. With 50M rows and counting, this should save 200 MB per column.

Am I wrong in my understanding of storage calculation?

Note: In all 21M rows the location is exactly a four digit number. Perhaps MemSQL doesn’t store the integers as a fixed width?

Thanks!

INT and MEDIUMINT do indeed take different amounts of memory. But with a narrow table like that, there could be issues of storage alignment or fixed allocation sizes that could mask that.

E.g.

create table t(c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int);
insert t values (1,1,1,1,1,1,1,1);

/* do a bunch of times to get 1/2 million rows total */
insert into t
select * from t;

create table s(c1 mediumint, c2 mediumint, c3 mediumint, c4 mediumint, c5 mediumint, c6 mediumint, c7 mediumint, c8 mediumint);

insert into s
select * from t;

select table_name, sum(memory_use) m 
from information_schema.table_statistics 
group by table_name;

/* 
Result:

table_name    m
t             71958528
s             55312384
*/