Disk space management issue

Hello guys,

I have a cluster with 6 databases.

The nodes use more than 42 GB of disk.

I listed the top 5 disk-usage directories with du.

So far I have (see the commands below):

  • reduced snapshots_to_keep to 1
  • lowered log_file_size_partitions from 256 MB to 64 MB
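For reference, this is roughly what I ran on the master aggregator. I'm assuming these engine variables accept SET GLOBAL on my deployment (they may need to be changed through the cluster management tools instead), and that log_file_size_partitions takes a value in bytes:

  -- Keep only the most recent snapshot per database
  SET GLOBAL snapshots_to_keep = 1;

  -- Log file size for newly created partition databases (assumed to be in bytes: 64 MB)
  SET GLOBAL log_file_size_partitions = 67108864;

  -- Double-check the values now in effect
  SHOW VARIABLES LIKE 'snapshots_to_keep';
  SHOW VARIABLES LIKE 'log_file_size_partitions';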

However, the existing log files remained at 256 MB.

Most of these empty databases have more than 30 log files with names like:

  • db_name_1_log_v1_0
  • db_name_1_log_v1_65536
  • and so on…

Could you please tell us how to clean/manage those log files automatically to avoid running out of disk space?

Looks like it’s a recurring pain point and I couldn’t find more resources about it…

Thanks for your help!

Hi,

Changing log_file_size_partitions only affects newly created databases, as you noted.

Another knob you can turn to reduce disk usage is snapshot_trigger_size, which controls how much log we allow to accumulate on disk before a snapshot is taken. It's set to 1 GB by default.
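As a rough sketch (assuming the variable takes a value in bytes and can be changed with SET GLOBAL on your deployment), lowering it to 256 MB would look something like:

  -- Current trigger size (how much log accumulates before a snapshot is taken)
  SHOW VARIABLES LIKE 'snapshot_trigger_size';

  -- Example: snapshot after roughly 256 MB of log instead of the 1 GB default
  SET GLOBAL snapshot_trigger_size = 268435456;

The trade-off is that snapshots will be taken more often, so expect a bit more background I/O.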

-Adam

See Adam’s earlier post for some context. Since 7.0 we pre-allocate large log files, and they fill up with user data if you use the DB significantly, so the issue mostly affects empty DBs. And disk is cheap these days.

Thanks @adam, @hanson for your answers,

In my use case I’m creating a DB per “project”, and as a SaaS it might have hundreds of “projects”… it won’t be cost-effective to reserve 6 GB on every node for each database that might remain empty.

I expect the active databases to grow at 5–10 GB per year. Could you please give me some insight into the right log_file_size_partitions value to use?

What is the impact of setting log_file_size_partitions to 8 MB? 16 MB? 64 MB?

Thanks

A few things about log_file_size_partitions:

  • It mainly impacts write throughput when there is a sustained burst of rowstore writes. With a smaller setting, new log files have to be created more often, so you could see dips in throughput; how much depends mostly on your disk throughput. If your workload is mostly columnstore, it’s unlikely you would notice anything. Setting it to 64 MB should be fine (see the quick check after this list).
  • This only saves you space for mostly empty databases. It doesn’t save space for databases that have data in them.
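If you want to double-check what newly created databases will pick up (existing databases keep the log file size they were created with), a quick look at the variables should be enough; this is just a sketch using standard SHOW VARIABLES patterns:

  -- Log file size settings currently in effect (apply to newly created databases)
  SHOW VARIABLES LIKE 'log_file_size%';

  -- Snapshot-related settings mentioned earlier in the thread
  SHOW VARIABLES LIKE 'snapshot%';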

Probably the best way to save disk space is to use fewer partitions for some of your smaller databases (I see you are already doing that a bit; some of them have 8 partitions vs. 16). You can increase the partition count later, but right now that is an offline operation.
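As a sketch, creating a small per-project database with fewer partitions and checking the layout would look something like this (project_small is just a hypothetical name; pick the partition count that fits your nodes):

  -- Hypothetical small per-project database created with 4 partitions instead of the default
  CREATE DATABASE project_small PARTITIONS 4;

  -- Check how the partitions are distributed across the leaves
  SHOW PARTITIONS ON project_small;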

Alright, thank you for the details!

Can you confirm that reducing the number of partitions will also reduce query parallelism and increase the response time of CPU-intensive queries?

Thanks

That’s correct. Reducing the partition count reduces parallelism.

-Adam

@pierre where are you going to run your databases, and what will you have to pay for storage (say, per GB per month)?

The current rate for Google Compute Engine SSD disks in my region is $8.50/month for 50 GB, i.e. about $0.17 per GB per month (without snapshots).

If the cluster doesn’t take more space per empty DB, that’s fine, but coming from other engines (MySQL, Mongo…) it’s shocking to see 42 GB used for 200 MB of data!

Can you confirm that a 4-vCPU node will process a query against a 4-partition database as fast as against an 8-partition one?

Thanks

For most query shapes that’s correct, Pierre. Four partitions on a 4-vCPU node is optimal.

And yeah, we don’t really optimize disk usage for the many-small-databases case. The system is optimized more for fewer, larger databases, which is why it does more upfront disk allocation for each database.

-Adam


You could use a disk space analyzer. TreeSize is a good one, but there are many other disk space analyzers you can try as well.