Multi-tenancy is hard. It is a very early decision for any company or piece of software that aims to serve multiple customers at the same time, each with their own data. Multi-tenancy is common in SaaS companies, and if you don't stop and really think through this decision at the beginning of your project, you will have to live with the consequences for some time. Migrating from one strategy to another is rarely as simple as you'd expect.
Let's dive into some of the models commonly used for multi-tenancy. Keep in mind that the ideas and examples here are geared toward relational databases, but the same principles apply to most kinds of datastores.
Taking a closer look at multi-tenancy models
One schema: multi-tenancy by tenant column
In this model, you have one master schema and one version of each of your tables. All customer data is stored in the same set of tables, regardless of the customer. To distinguish data between customers, every table has a tenant ID column (or whatever you prefer to call it).
This means that every query your application issues must filter on the tenant ID of the customer making the request. There are many ways to enforce this, but they depend heavily on the language and frameworks you use to access the data. As a rule of thumb, write thorough tests around that logic so that even when a developer makes a mistake, one customer's data never shows up while another customer is using the app (a leak like that can be really harmful to your company's reputation).
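As a minimal sketch, assuming a hypothetical `orders` table, this is what the tenant-column model looks like: the tenant ID is part of every table and every query.

```sql
-- Hypothetical orders table: every row carries the owning tenant.
CREATE TABLE orders (
    tenant_id BIGINT NOT NULL,
    order_id  BIGINT NOT NULL,
    total     DECIMAL(10, 2),
    PRIMARY KEY (tenant_id, order_id)
);

-- Every application query must scope to the current tenant.
SELECT order_id, total
FROM orders
WHERE tenant_id = 42;  -- tenant ID of the customer issuing the request
```

In practice you would centralize the `WHERE tenant_id = ?` predicate in one place (an ORM scope, a query builder, a view) rather than trusting every developer to remember it.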
Apart from the app logic to make sure you are querying the correct data, there are performance and maintenance considerations as well. If all data lives in one set of tables served by one database server (in most cases), you can run into a few problems: tables grow very large as your customer base grows, which slows queries and schema changes; one busy tenant can degrade performance for everyone (the "noisy neighbor" problem); and operations such as backing up or migrating a single tenant's data become harder, since it is interleaved with everyone else's.
When using SingleStoreDB, multi-tenancy by tenant column is a very good approach. Keep in mind that because the database is distributed, you should choose a good shard key so data is evenly spread across the cluster nodes. This can be achieved by combining the tenant ID with one or more extra columns. Using the tenant ID alone is not a good idea, since each tenant's queries would then be concentrated on a small number of nodes.
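A sketch of such a shard key, reusing the hypothetical `orders` table (column names are illustrative):

```sql
-- Shard on tenant ID plus another column so one tenant's rows
-- spread across all partitions instead of landing on a single node.
CREATE TABLE orders (
    tenant_id BIGINT NOT NULL,
    order_id  BIGINT NOT NULL,
    total     DECIMAL(10, 2),
    SHARD KEY (tenant_id, order_id),
    SORT KEY (tenant_id)
);
```

With `(tenant_id, order_id)` as the shard key, a single tenant's data fans out across the cluster, while the sort key still keeps per-tenant scans efficient.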
Tables can be very big, but columnstore tables allow you to query that data very fast. Thanks to clever techniques like compression, queries over tables with billions of rows can still be quick. This storage model has evolved steadily over the years and is moving toward the Universal Storage concept: a single table type capable of serving both analytical (OLAP) and high-throughput transactional (OLTP) workloads.
Changing a table's schema is also very easy in SingleStoreDB, since those operations are lock-free. You can create new indexes, alter columns and more without worrying about your application becoming unavailable. Of course, keep in mind the processing and memory required to perform those operations.
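For example, typical schema-evolution statements like these run online (the table and column names are again illustrative):

```sql
-- Lock-free DDL in SingleStoreDB: the table stays readable and
-- writable while these run.
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'new';
CREATE INDEX idx_orders_status ON orders (status);
```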
One schema per tenant
In this model, each tenant has its own schema with its own set of tables. It provides a higher level of isolation between tenants, but you will end up with many more tables. Consider both the number of tables your application needs and the number of customers you expect to use your product.
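A sketch of the pattern, with hypothetical per-tenant database names:

```sql
-- One database (schema) per tenant; the table definitions are
-- identical across tenants, only the schema name differs.
CREATE DATABASE tenant_1001;
CREATE TABLE tenant_1001.orders (
    order_id BIGINT NOT NULL,
    total    DECIMAL(10, 2),
    PRIMARY KEY (order_id)
);
-- Repeat for tenant_1002, tenant_1003, ... as customers sign up.
```

Note that the `tenant_id` column disappears: the schema name itself identifies the tenant, and the application routes connections (or sets the current database) per request.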
You can serve each tenant's schema from a completely different database server, or serve them all from one server and distribute the load as your user base grows. See the operations section for more on that.
Another downside of this approach is that you will end up with a very large number of tables on your database server. That number can easily reach the millions: consider an app with 100 tables and 10,000 tenants. Most databases do not handle that many tables well. MySQL, for instance, suffers because it keeps multiple in-memory caches of table metadata and must load the definition of every table used on each connection. If you find yourself in that scenario, tune the balance between the table cache and buffer memory so the server stays stable.
SingleStoreDB takes a very different approach to schemas. Because it is designed for data-intensive applications, it assumes there won't be a large number of schemas in the cluster. Each schema is expected to hold a huge amount of data (even more now with unlimited storage support), and the goal is to support transactional and analytical queries on tables with literally billions of rows.
For each schema, memory is allocated for various buffers, caches and transaction logs, along with pre-allocated disk space for each partition. This can easily and quickly fill your cluster's disk if you create hundreds of databases with default settings. Transaction logs alone take 256MB per partition, per database, and the recommendation is one partition per CPU for best performance. For a 32-CPU cluster, that means roughly 8GB per database before you store any data.
That becomes impractical if you need many databases. Unless you know you will have very few tenants and can live with the overhead, this is not a good approach for SingleStoreDB. Some system variables control the size of those files, so you can reduce the overhead somewhat if needed.
One schema per tenant (via tables)
An alternative to the previous approach: instead of multiple schemas, keep one schema (or a few) and separate customer data into different tables. This leaves you with a few schemas containing a large number of tables. You get almost the same data isolation without many of the downsides, such as locks on DDL operations. Keep in mind, though, that customers in the same schema still share the same resources, so the "noisy neighbor" problem remains.
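The pattern looks like this, with a hypothetical per-tenant naming convention:

```sql
-- One shared database, with a copy of each table per tenant.
CREATE TABLE orders_tenant_1001 (
    order_id BIGINT NOT NULL,
    total    DECIMAL(10, 2),
    PRIMARY KEY (order_id)
);

-- The application routes queries by composing the table name
-- from the tenant ID (never from raw user input).
SELECT order_id, total FROM orders_tenant_1001;
```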
This approach works better with SingleStoreDB, since tables have much smaller overhead than databases, but don't expect to have millions of tables. Tables offer isolation very similar to what you would get with separate databases. And because each table is distinct, queries against different tenants must be compiled separately, even when the query text is identical except for the table names. This affects other databases too: the engine needs to resolve each table's schema to run the query, and a large number of tables can put pressure on those caches.
All of the models above can be mixed to help you achieve your goals. For instance, you can have several servers, each holding a set of per-tenant tables; multiple servers each using the tenant-column model; and so on. You get the idea. A hybrid can be the solution to a problem in your current setup, such as adding extra servers. In some cases you can adapt your application to a hybrid model very easily; in others it is difficult, depending on your stack and the number of services that need to change.
With SingleStoreDB, running separate clusters for the same application is not very cost-effective, since each cluster needs at least one master aggregator and a few child aggregators. To scale a cluster, you instead add more leaf nodes, bringing more CPU and RAM to bear. This simplifies operations and application code, since you don't need logic to figure out where a customer's data lives before accessing it. You won't escape running multiple clusters if you need to serve customers globally, though, so be prepared for that if global reach is your goal.
How SingleStoreDB features support a multi-tenancy approach
There are a few key SingleStoreDB features that can help you make a decision on your multi-tenant strategy.
SingleStoreDB's bottomless architecture asynchronously copies data to an S3-compatible object store. Data loaded on the cluster nodes can be fully offloaded to the object store when a database is no longer in active use, and lazily loaded again when you need it. The ability to attach and detach a database is helpful in many scenarios, depending on your use case. Of course, these commands have some latency, since data must be fetched from the object store and loaded into memory; how long that takes depends on the tables involved and the data size.
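Combined with a schema-per-tenant layout, this enables a simple lifecycle for idle tenants. A sketch, assuming a hypothetical `tenant_1001` database on a cluster with unlimited storage enabled:

```sql
-- Offload an idle tenant's database: local resources are released,
-- and the data remains durable in the object store.
DETACH DATABASE tenant_1001;

-- When the tenant comes back, reattach; data is lazy-loaded
-- from the object store on demand.
ATTACH DATABASE tenant_1001;
```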
Other noteworthy SingleStoreDB features that help with the "noisy neighbor" problem are workload management and resource pools. Workload management automatically helps the cluster ride out spikes in concurrency by prioritizing cheap queries. Resource pools let you cap the CPU, memory and query concurrency available to a group of queries, limiting the effect they can have on the rest of the cluster.
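A sketch of a resource pool for a tenant's heavy batch jobs (the pool name and limit values are illustrative; the available parameters vary by version, so check your docs):

```sql
-- Cap what one class of queries can consume so it cannot
-- starve the rest of the cluster.
CREATE RESOURCE POOL tenant_batch WITH
    MEMORY_PERCENTAGE = 30,
    SOFT_CPU_LIMIT_PERCENTAGE = 25,
    MAX_CONCURRENCY = 5;

-- Route the current session's queries through the pool.
SET resource_pool = tenant_batch;
```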
Finally, row-level security at the database level can help mitigate any security concerns with the tenant column approach.
Some of these multi-tenancy models make day-to-day operations as simple as running one query; others require a lot of tooling to manage your tenants. Let's dive into four real-life scenarios where the choice you make on day one can impact your operations down the road.
With robust functionality and a distributed, relational model, SingleStoreDB delivers what most multi-tenancy approaches require: reliable, elastic scalability; high ingest and query concurrency; and the ability to deliver real-time analytics on constantly changing data.
Try it out for free