Multi-tenancy — a very early decision for any company or piece of software that aims to serve multiple customers at the same time, each with their own data — is hard. Multi-tenancy is common in SaaS companies, and if you don’t have time to stop and really think about this decision at the beginning of your project, you will have to live with that for some time — and migrating from one strategy to another might not be as simple as you’d expect.
First, let’s dive into some of the models that commonly use multi-tenancy. Keep in mind that the ideas and examples here are more focused on relational databases, but the same applies to most kinds of datastores available.
In this model, you have one master schema and one version of each of your tables. All customer data is stored in the same set of tables, independent of the customer. To distinguish data between customers, every table has a tenant ID column (or whatever you prefer to call it).
This means that for every query you develop in your application, you better make sure it has that tenant ID column filtered to the correct customer issuing the request. There are a lot of ways to ensure this accuracy, but it’s very dependent on the language and frameworks you are using to access the data. As a rule of thumb, write a lot of tests around that logic with test cases ensuring that even when a developer makes a mistake, data from one customer will not show up when another customer is using the app (since this can be really harmful to your company’s reputation).
Apart from the app logic to make sure that you are querying the correct data, there are performance and maintenance considerations as well. If all data is within one table and that table will be served by one database server (in most cases), there can be a few problems:
There can be a “noisy neighbor”— one enterprise customer that imported a lot of data and has a lot of users, where the database resources are not properly divided and customers can be compromised.
Depending on the database you’re using, you may experience “locking.” When running transactions or DDL queries, you risk locking the table for minutes. It’s also not trivial to dedicate database resources to a certain customer in that scenario.
When using SingleStore, multi-tenancy by tenant column is a very good approach. Keep in mind that as the database is distributed, you want to think about a good shard key so data is evenly distributed across the cluster nodes. This can be achieved by using the tenant ID with some extra columns. Using only the tenant ID is not a good idea, since queries from one tenant will be concentrated on a small number of nodes.
Tables can be very big, but ColumnStore allows you to query that data super fast. By using clever techniques, like compression, queries executed in tables with billions of rows can be very fast. That store model has been evolving steadily throughout the years and is moving toward the Universal Storage concept, where you have a table type that is capable of running analytical queries and high transactional workloads.
Changing the table schema is also something that is very easy with SingleStore since all operations are lock-free. You can easily create new indexes, alter columns, and more without worrying too much about your application being unavailable. Of course, you need to keep in mind the processing and memory requirements necessary to perform those operations.
In this model, each tenant has a schema with its own set of tables. It provides a higher level of isolation between tenants, but you will end up with a lot more tables. Consider the number of tables your application needs, and also the number of customers you expect to use your product.
You can serve each schema from a completely different database server for each tenant, or you can serve them from just one and distribute the load as you grow the user base. See the operations section below for more on that.
Another downside of this approach is that you will end up with a high number of tables in your database server. That number can easily reach millions of tables — consider an app with 100 tables and 10,000 tenants. Most databases do not work well with such a high number of tables. For instance, MySQL will suffer because it has multiple caches that keep information about tables in memory as it has to load the table definitions used on each connection. If you find yourself in that scenario, try optimizing the table cache vs. buffer memory so you have a good balance, and the server is stable.
SingleStore has a very different approach to support schemas. Since the main use cases it’s trying to solve involve data-intensive applications, there exists a notion that there won’t be a large number of schemas in the database cluster. Each schema will have a huge amount of data (much more now with unlimited storage support), and the goal is to support transactional and analytical queries on tables with literally billions of rows.
For each schema, there is memory allocated to support various buffers, caches, and transaction logs, as well as pre-allocated disk space for each partition. This can easily and quickly fill your cluster disk if you create hundreds of databases in a cluster with default settings. You get 256MB per partition, per database only for transaction logs, and it is recommended to have one partition per CPU for better performance. That means for a 32 CPU cluster you need roughly 8GB per database, just to sit there without any data. That becomes impractical if you need several databases — so unless you know you will have very few tenants and can live with the overhead, this is not a good approach for SingleStore. There are some system variables that control the size of those files, so you can reduce them in any case.
As another alternative to the previous approach, instead of having multiple schemas, you can have one (or fewer) schema and separate customer data into different tables. This will leave you with a few schemas that have a large number of tables. It also means you have almost the same data isolation, without many of the same downsides you would get with locks on DDL operations. But it’s also important to keep in mind that customers in the same schema are also sharing the same resources, so the “noisy neighbor” problem still exists.
This approach works better with SingleStore since tables have a much smaller overhead than databases — but don’t expect to have millions of tables. The table implementation offers extremely similar isolation to what you would get using databases. And since each table is unique, queries to multiple tenants will have to be compiled for each one, even when the query is exactly the same except by using different tables. This also has some effect on other databases, as the engine needs to somehow figure out the table schema to run the query — and having multiple tables can mess up the caches.
All the models mentioned above can be mixed together to help you achieve your goals. For instance, you can have any number of servers, with each server having a set of tables for tenants in the “one table per tenant” scheme, multiple servers with the tenant column in all tables, and so on. You get the idea. That can be a solution for a problem you have in your current setup — for instance, adding extra servers. In some cases, you can adapt your application to the hybrid model very easily, and in others, it can be difficult depending on your stack and the number of services that need to be changed.
With SingleStore, it is not very cost-effective to have separate clusters for the same application since each cluster has at least one master and a few aggregators. To scale the cluster, you can add more and more leaf nodes adding CPU and RAM to the problem, simplifying the operation and application code as you do not need to handle the logic of figuring out where your customer data is located before accessing it. You won’t be able to escape that if you need to serve customers globally, so you can be prepared if you want to reach that goal.
I will talk about a few SingleStore specific features that can help you make a decision on your strategy — but it’s worth keeping in mind that other databases may have similar features that can help.
SingleStore’s bottomless architecture enables you to copy data to an object store (S3 compatible) asynchronously. This means that the data loaded in the cluster nodes can be totally copied to S3 when you don’t need to use that database anymore, and lazy-loaded when you need it again. The feature to attach/detach a database can be helpful in a lot of scenarios, depending on your specific use case. Of course, there is a latency on those commands because you need time to get data from the object store and load it in memory, and that will depend on the tables used and the data size.
Other noteworthy features of SingleStore that help with the “noisy neighbor” problem are the workload management resource pool features. The workload management automatically helps your cluster perform better when there are spikes in concurrency by prioritizing cheap queries (it is a lot more complex than that, but that is the idea). The resource pool feature allows you to create a pool of resources by specifying CPU, memory, and concurrency of queries in that pool, limiting the effect those queries can have on the entire cluster.
Finally, row-level security at the database level can help mitigate any security concerns with the tenant column approach. Each database implements this differently — and in SingleStore, it is not straightforward to maintain since you will probably need to change the application code to keep the row mappings to roles that can access them.
Some of those models make day-to-day operations as simple as running one query, or as complicated as lots of tooling around to manage your tenants. Let’s dive into a few real-life scenarios where the choice you make at day one can catch up with you later down the road.
What are the steps you need to perform to get a new customer or company to use your service? If you want to provide a free tier and allow new users to subscribe after that, you should prepare for that — and the database model you choose will impact the time it takes you to do this. The tenant ID column approach is the easiest here; you don’t have to do anything special, it is only a new customer with a new tenant ID, and life goes on. If you have multiple servers and tenants distributed among them, you need a catalog to map tenants to database servers, and some sort of logic to decide where a new customer should be placed.
The other approaches all involve some sort of replication logic as well. If you have one schema per customer, when a new customer wants to use your product, you need to create that schema and all associated tables (using the best tools available in your database). This can take some time depending on the database and number of tables, so you may need a buffer of unused schemas on standby. You also need a base schema to copy the structure from, usually without any data on it. Again, you need a catalog to determine which schema belongs to which customer.
Usually developers will have database migrations to control database schema changes like adding columns, tables, indexes, etc. This is easy to execute if you have one schema, but remember that depending on the database engine you are using, some DDL operations might lock your tables for some time depending on the data size. Luckily for SingleStore, this is not the case as operations are lock-free.
If you have a lot of database servers and schemas to run your migration script, this can start to become a complicated process. You need to make sure that all the schemas are in sync with each other, or your application will eventually fail for some of your customers. You also need to make sure that the application code can handle the schema change gracefully while you are running the migration because some schemas will be migrated while some will not. And finally, if you are onboarding new customers while the schema is changing, make sure you are running the script first in the base schema (the one that is copied for each new customer).
As a SaaS company, you need to analyze your customers’ behavior and product usage. This drives a lot of decisions on how to improve the product for customer engagement and retention. As you can imagine, this will vary a lot based on the multi-tenancy model being used.
If all data is in the same location, simple queries will allow you to analyze all customers at the same time, grouping by the tenant ID. In cases where data is separated in multiple servers, schemas and even tables, you basically need to analyze each customer separately and consolidate that somewhere — something that requires a lot more work. It also takes more time to run 10,000 queries than one query (independent of its complexity), so it is a lot harder to get a snapshot of your customer base at any one point in time.
If the application is used by customers around the world, you probably will end up with multiple servers located close to them. As you grow and add more locations, data migration can be necessary to minimize latency. And in some models, you may need to do migrations based on the performance needs of the customers — like migrating a large enterprise customer consuming all resources on the current server to a new one with more resources, or fewer customers.
Migrating tenant data from one location to another usually incurs some downtime for that particular customer while you transfer the data. If you allow transactions to be applied in the origin server while the migration is in progress, you will likely end up with an inconsistent copy of the data on the other side. I would say that this is one of the biggest pain points companies with a large, distributed customer base will have in their data tier as getting all details of that orchestration is hard. The solution? Try to automate as much of the process as possible with as many fail-safes in place to prevent data loss.
Aside from this post being way longer than I intended in the first place, this is a very small overview of the subject. Check out the references to read more about the strategies, and get more details about some of the stuff we talked about.
I hope this was helpful, and good luck with choosing the model that best suits your use case!