Disk usage of database logs

I know the question about disk space usage has been asked a few times, but I believe I'm seeing something that may be wrong. Our use case is a bit weird: we have two databases (trading and trading_hist), where trading is used during the day, and at the end of the day we copy the data from trading to trading_hist. Once the data is copied, we drop all tables in trading and start over again the next day with new tables (of the same names).
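For context, the end-of-day job does roughly this per table (the table name and columns here are placeholders for our real schema):

INSERT INTO trading_hist.positions SELECT * FROM trading.positions;
DROP TABLE trading.positions;
-- next morning the table is recreated with the same name
CREATE TABLE trading.positions (id BIGINT PRIMARY KEY, qty INT, px DECIMAL(18,6));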

The log files for trading are now taking 23GB of space on a single node, while the data we store in it is approximately 100MB each day. I've checked that snapshots_to_keep is set to 2, and I've run snapshot trading several times successfully. The initial log file size is the default 256MB, there are 16 partitions, and we're running with redundancy mode set to 2. I would still only expect 16 x 2 x 256MB = 8GB or so of disk space to be used once snapshots are taken. As of right now, the trading db is empty because we've dropped the tables, and I've run snapshot trading several times, but it never frees any disk space.

I know our usage pattern is unusual, but any idea on how to truncate the log files? I can probably drop the database and recreate it, but then I assume I would have to re-grant privileges, and of course, I’d have to recreate all the stored procedures and functions.
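If it came to that, I assume the recreate would look roughly like this (the partition count matches our setup, but the user and grants are just placeholders):

DROP DATABASE trading;
CREATE DATABASE trading PARTITIONS 16;
GRANT ALL ON trading.* TO 'trading_app'@'%';
-- plus re-running all of our CREATE PROCEDURE / CREATE FUNCTION scripts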

Looking for advice. Thanks.

Hi prerak,

The 256 MB for the log file size isn't the maximum size of the log files on disk; that is controlled by snapshot_trigger_size (which is 2 GB by default). The log file size mostly affects empty databases (we pre-create a few log files even when the database is empty).
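If it helps, you can confirm the current values on your aggregator with plain SHOW VARIABLES, something like:

SHOW VARIABLES LIKE 'snapshot_trigger_size';
SHOW VARIABLES LIKE 'snapshots_to_keep';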

So the expected disk space usage on a leaf for a database in active use (where num_partition_dbs_per_leaf is the total number of partitions, master or replica, on that leaf) is:

num_partition_dbs_per_leaf * snapshot_trigger_size * snapshots_to_keep + num_partition_dbs_per_leaf * snapshot_disk_size * snapshots_to_keep

So if you're using the default config, it's:

num_partition_dbs_per_leaf * 2GB * 2 + num_partition_dbs_per_leaf * snapshot_disk_size * 2

-Adam

I see. In my case, num_partition_dbs_per_leaf = 8 and snapshot_trigger_size = 2GB, so even when it is empty, the DB can take 64GB of disk space on each leaf. (I'm not sure what snapshot_disk_size is, as it doesn't seem to be a documented variable, but I'm assuming snapshot_disk_size = 2GB as well.)

Even so, I was expecting it to truncate the logs once I run the “snapshot <db>” command (twice perhaps since snapshots_to_keep=2).
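Concretely, after dropping the tables I ran it repeatedly, along these lines:

SNAPSHOT trading;
SNAPSHOT trading;
-- repeated a few more times; the space used by the log files never went down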

Just as feedback for you, and even though disk space is relatively cheap, it would be helpful to be able to set these variables per DB, as there are use cases where the DB is just not likely to be big (reference DBs or dev/uat DBs or per-user DBs).

Thanks,
Prerak

Hi Prerak,

snapshot_disk_size in my calculation above is how much disk space your snapshot files are using on disk. That's dependent on how much data you have stored in the database (it's not a knob value).

Running 2 snapshots should reduce the disk space (if no new data has been written since the two snapshots). At that point I would expect the calculation you had above:

num_partition_dbs_per_leaf * 2 * 256MB

The “*2” part here is the number of pre-allocated log files. It can be as high as 5 (the current open log file and up to 4 pre-allocated log files for future use). It depends on how aggressive the write workload was in the past.

I will open a task internally to track setting some of these values per database. I agree with you, the product today doesn’t make it easy to reduce disk space usage at this scale (say trying to trim 10s of GBs of disk usage off mostly empty dbs).

-Adam

Thanks for that info - the write workload is certainly aggressive during the day for us (100K+ records/sec at times), which would explain why we have 5-6 pre-allocated log files even for an empty db (with 8x2 partition databases per leaf, because of redundancy 2). At 16 partition databases x roughly 6 log files x 256MB each, that works out to about the 23GB we were seeing.

[ec2-user@ip-172-31-16-188 logs]$ du trading_prod_* | cut -f1-3 -d_ | sort | uniq -c
  6 262144  trading_prod_0
  6 262144  trading_prod_1
  6 262144  trading_prod_10
  5 262144  trading_prod_11
  5 262144  trading_prod_12
  6 262144  trading_prod_13
  5 262144  trading_prod_14
  6 262144  trading_prod_15
  6 262144  trading_prod_2
  6 262144  trading_prod_3
  6 262144  trading_prod_4
  5 262144  trading_prod_5
  6 262144  trading_prod_6
  5 262144  trading_prod_7
  5 262144  trading_prod_8
  6 262144  trading_prod_9
  3 65536   trading_prod_log

Just to add to this, my main concern in reporting this was that there may be a leak in reclaiming disk space once the tables are dropped. I suspected that because, even after dropping all tables and taking multiple snapshots, the 6 allocated log files per partition were never deleted.

I will report back in a few days if the disk usage continues to grow indefinitely.