Materialized views in the future?

christoph · May 31, 2020, 11:10am

Hi everyone,

We are currently working on a BI use case with near-real-time requirements.
Essentially, the main entities (e.g. employee) will be pushed into MemSQL in real time, and we then need to run analytics on those entities.
This will, however, require us to join multiple tables.
A view would make this easier for the BI folks, but we’d still pay the join costs at run time.
Are there any plans to support materialized views?

Thanks!
Christoph

nikita · May 31, 2020, 3:04pm

Christoph, thank you for suggesting a feature. We have it on the roadmap, but can’t commit to a date yet. I’m wondering if something can be done with the secondary indexes and vectorized joins we already support. Can you share your schema, queries, table sizes, and how selective your predicates are?

christoph · June 5, 2020, 7:12am

Thanks for reaching out, Nikita!
Right now I cannot share the exact schemas but here are some more details:

There is going to be an “Employee” table with roughly 350k entries
There is also going to be an “EmployeeHistory” table with several million rows
Employee salaries are going to be increased in “seasons”
There will be many seasons, as they will also be used for simulations

Once we have the exact requirements, we will have a tool with 350k+ employees, millions of historic entries, thousands of seasons, and in each season different salary increases per employee (e.g. your fixed monthly salary may change, your bonus may change, etc.).

Does this help you get a very rough understanding of what we are trying to implement?
Thanks,
Christoph

nikita · June 5, 2020, 4:12pm

Yes. How fast do you need the query to be? Is this a three-way join?

christoph · June 11, 2020, 8:11pm

Our goal is for every request to be answered in under 250 ms, since 250 ms is perceived as instant by humans. Because network latency and backend processing also take time, we are trying to keep all queries below 100 ms.
Does this help and make sense?

nikita · June 11, 2020, 8:29pm

And how many joins? BTW, I think we can meet your SLA without materialized views (MVs).

christoph · June 11, 2020, 9:44pm

I don’t know for sure yet, but I assume between 3 and 6 joins.
MVs would probably have the added benefit of lower CPU usage, since we wouldn’t pay for the join every time the statement is executed. We are looking at a self-service use case with a bunch of Tableau users, so it is difficult to forecast the concurrency.
Thank you for your fast responses Nikita!

chrisgrasp · April 20, 2021, 8:47pm

@nikita Any update, or a public roadmap with a timeline for materialized view support?

hanson · April 20, 2021, 9:32pm

Hi @chrisgrasp – we don’t have a committed timeline for materialized views but we’re considering it for a future release. Thanks for checking back!

As you might expect, some of our customers are using user-defined summary tables instead of materialized views, which can work when maintaining them from your app is not too much of a burden. “Mastering Data Warehouse Aggregates” by Adamson is a good book about the technique.
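To make the summary-table idea concrete, here is a minimal sketch in Python. It uses the standard library’s SQLite purely as a stand-in for SingleStore so it runs anywhere, and the table and column names (`salary_change`, `season_summary`) are hypothetical. The application writes each base row and updates the matching aggregate row in the same transaction, so BI queries read a precomputed total instead of aggregating at query time.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary_change (          -- hypothetical base table
    season_id   INTEGER NOT NULL,
    employee_id INTEGER NOT NULL,
    increase    REAL    NOT NULL
);
CREATE TABLE season_summary (         -- hypothetical user-defined summary table
    season_id      INTEGER PRIMARY KEY,
    total_increase REAL    NOT NULL,
    change_count   INTEGER NOT NULL
);
""")

def record_salary_change(season_id, employee_id, increase):
    """Insert a base row and keep the summary row in sync in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO salary_change VALUES (?, ?, ?)",
            (season_id, employee_id, increase),
        )
        cur = conn.execute(
            "UPDATE season_summary "
            "SET total_increase = total_increase + ?, change_count = change_count + 1 "
            "WHERE season_id = ?",
            (increase, season_id),
        )
        if cur.rowcount == 0:  # first change in this season: create its summary row
            conn.execute(
                "INSERT INTO season_summary VALUES (?, ?, 1)",
                (season_id, increase),
            )

record_salary_change(1, 100, 250.0)
record_salary_change(1, 101, 150.0)
record_salary_change(2, 100, 300.0)

# BI queries read the precomputed aggregate instead of scanning salary_change.
for row in conn.execute("SELECT * FROM season_summary ORDER BY season_id"):
    print(row)
```

The cost moves to write time: every insert now does two writes, and any update or delete of a base row needs a matching correction to the summary, which is the maintenance burden mentioned above.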

Prajwal.Baliga · May 6, 2022, 2:16pm

Hello SingleStore,
Could you please share the latest on materialized views? Thanks

hanson · May 6, 2022, 5:25pm

We’re planning to do it. No dates I can share.

nlello · February 24, 2023, 12:50pm

Hi Hanson,

Just wondering if there is any news on this?

Is there a page where we can view in-development or planned changes that have a tentative timeline?

hanson · February 24, 2023, 11:41pm

Hi @nlello! Materialized views are in our future, but I’m not ready to share dates. In general we don’t have a public roadmap. We do share it as needed with paying customers and sales prospects.

There’s a lot of breadth to materialized views. Is there something specific you’re looking for?

fred.rabelo · December 7, 2023, 6:57pm

Are materialized views available in SingleStore?
I am looking for a way to speed up OLAP queries. All my tables are normalized, as they are used by my web application, but I need to run OLAP queries on them. What is the best approach, given that materialized views are not available? Should I run OLAP on the normalized tables?
Or should I create some sort of pipeline to duplicate the data into a more denormalized design, such as a star schema or a flat table, and lose data freshness? What is the correct way to execute OLAP queries in an HTAP database such as SingleStore?

hanson · December 7, 2023, 10:31pm

Hi Fred! You ask a bunch of great questions. For your app on SingleStore, I’d first see whether you can make analytics perform well enough on the normalized schema. Use columnstores for all the tables, or at least the very large ones.

If you can’t, or if you want a flat or star schema that makes things easier for BI tools to work with, then consider transforming it to a flat or star schema within SingleStore. Use columnstore fact tables for sure (the default). Probably all the tables can be columnstores.
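A rough sketch of such an in-database transform follows, again with Python’s built-in SQLite standing in for SingleStore. SQLite has no columnstore, so only the join-flattening part carries over, and the schema (`department`, `employee`, `salary`, `salary_flat`) is hypothetical. The join cost is paid once when the flat table is built; BI queries then scan a single wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SingleStore (assumption)
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee   (employee_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
CREATE TABLE salary     (employee_id INTEGER, season_id INTEGER, amount REAL);

INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO employee   VALUES (100, 'Ana', 1), (101, 'Ben', 2);
INSERT INTO salary     VALUES (100, 1, 5000), (101, 1, 6000), (100, 2, 5200);

-- Periodic transform: pay the join cost once, at load time.
CREATE TABLE salary_flat AS
SELECT s.season_id, e.employee_id, e.name, d.dept_name, s.amount
FROM salary s
JOIN employee   e ON e.employee_id = s.employee_id
JOIN department d ON d.dept_id     = e.dept_id;
""")

# BI queries now hit one wide table with no joins.
rows = conn.execute(
    "SELECT dept_name, SUM(amount) FROM salary_flat "
    "WHERE season_id = 1 GROUP BY dept_name ORDER BY dept_name"
).fetchall()
print(rows)
```

The trade-off is the freshness concern raised in the question: `salary_flat` only reflects base-table changes after it is rebuilt or incrementally updated.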

In general, it’s a lot easier for the developer to use a fast columnstore schema with enough hardware to make everything interactive than to use materialized views. MVs have downsides: they can be expensive to maintain and hard to design, and if you change your query even slightly, you sometimes can’t use the MV anymore. But for high-frequency queries with aggregates, materialized views can work well.
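One reason a small change to a query can make an MV unusable is the matching step: the precomputed result only helps if the query can be answered from what was stored. The Python sketch below illustrates a deliberately simplified version of that rule for a hypothetical summary holding SUM(salary) and COUNT(*) by (`season_id`, `dept`); the data and names are made up, and real optimizers use more elaborate rules. Grouping at a coarser grain works by re-aggregating, but asking for a column the summary dropped, or an aggregate that can’t be rebuilt from SUM and COUNT, forces a fallback to the base tables.

```python
from collections import defaultdict

# Hypothetical summary: (SUM(salary), COUNT(*)) grouped by (season_id, dept).
SUMMARY_KEYS = ("season_id", "dept")
summary = {
    (1, "Sales"): (11000.0, 2),
    (1, "Eng"):   (6500.0, 1),
    (2, "Sales"): (5200.0, 1),
}

def can_use_summary(group_by, agg):
    """Simplified matching rule: the query may only group by columns the summary
    kept, and only use aggregates that can be rebuilt from stored SUM/COUNT."""
    return set(group_by) <= set(SUMMARY_KEYS) and agg in {"sum", "count", "avg"}

def rollup_avg(group_by):
    """Answer AVG(salary) at a coarser grain by re-aggregating SUM and COUNT."""
    idx = [SUMMARY_KEYS.index(c) for c in group_by]
    acc = defaultdict(lambda: [0.0, 0])
    for key, (total, n) in summary.items():
        k = tuple(key[i] for i in idx)
        acc[k][0] += total
        acc[k][1] += n
    return {k: total / n for k, (total, n) in acc.items()}

print(can_use_summary(["season_id"], "avg"))               # True: coarser grain
print(can_use_summary(["season_id", "job_title"], "sum"))  # False: column not kept
print(can_use_summary(["season_id"], "median"))            # False: not rebuildable
print(rollup_avg(["season_id"]))
```

Storing SUM and COUNT rather than AVG is what makes the roll-up possible; averaging stored averages would weight the groups incorrectly.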

If you really need materialized aggregates for some reason, “Mastering Data Warehouse Aggregates” is a great book on how to create user-defined summary tables.

Materialized views are in our future. Projections will come first; they are a second copy of the table, sorted and sharded differently from the primary copy, but they depend on the table, like an index. Projections can have all or a subset of the columns of the table. Projections are a kind of materialized view and are a foundation for more general materialized views. You won’t have to wait long for projections.
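The following is a conceptual Python model of the projection idea described above, not SingleStore’s implementation or syntax. Each physical copy has its own shard key, sort key, and column subset. The projection, hypothetically sharded by `season_id` and sorted by `salary`, is updated synchronously on every insert, like an index, so a per-season query touches one shard whose rows are already in salary order. `NUM_SHARDS`, the column names, and the data are made up for illustration.

```python
from bisect import insort

NUM_SHARDS = 4  # hypothetical partition count

class Copy:
    """One physical copy of the table: rows hashed to shards by shard_key,
    kept sorted by sort_key within each shard, storing only `columns`."""
    def __init__(self, columns, shard_key, sort_key):
        self.columns, self.shard_key, self.sort_key = columns, shard_key, sort_key
        self.shards = [[] for _ in range(NUM_SHARDS)]

    def shard_for(self, key_value):
        return hash(key_value) % NUM_SHARDS

    def add(self, row):
        values = tuple(row[c] for c in self.columns)
        shard = self.shards[self.shard_for(row[self.shard_key])]
        insort(shard, (row[self.sort_key], values))  # keep shard ordered by sort key

class Table:
    """Primary copy plus dependent projections, maintained on every write."""
    def __init__(self, primary, projections=()):
        self.primary, self.projections = primary, list(projections)

    def insert(self, row):
        self.primary.add(row)
        for p in self.projections:  # like an index, projections update synchronously
            p.add(row)

history = Table(
    primary=Copy(("employee_id", "season_id", "salary"), "employee_id", "employee_id"),
    projections=[Copy(("season_id", "salary"), "season_id", "salary")],
)
for emp, season, pay in [(1, 7, 5000), (2, 7, 6500), (3, 8, 4800), (4, 7, 5900)]:
    history.insert({"employee_id": emp, "season_id": season, "salary": pay})

# "Salaries in season 7, lowest first" reads one projection shard, already sorted.
proj = history.projections[0]
shard = proj.shards[proj.shard_for(7)]
season7 = [vals[1] for _, vals in shard if vals[0] == 7]  # other seasons may share a shard
print(season7)
```

The primary copy stays organized by `employee_id` for the application’s lookups, while the projection serves the season-oriented analytic access path; the price is an extra write per insert.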