Author

Rick Negrin
VP, Product Management

Data Intensity · 1 min Read
How to Evaluate Your Application's Data Intensity with SingleStore's Data Intensity Assessment Calculator
Modern applications live in the cloud, and access and generate large amounts of data. This data needs to be aggregated, summarized and…
Read Post

Product · 2 min Read
Data Is Your Most Important Asset
What is the one thing that if it was taken away, it would irrevocably cripple your business? Your data. Your data is all the information you…
Read Post

Product · 5 min Read
Eliminating Database Sprawl, Part 2: How Three Companies Beat the Odds
What pain points does SingleStore solve to help eliminate database sprawl? Rick Negrin, Field CTO at SingleStore, explained how a Tier…
Read Post

Product · 4 min Read
Eliminating Database Sprawl, Part 1: How to Escape a Slow-moving Car Crash
How do companies veer dangerously close to the perils of database sprawl? Even more importantly, how do they correct their course? Find out…
Read Post

Product · 5 min Read
Limitless Point-in-Time Recovery: What's in It for DBAs, Execs and Developers?
Limitless point-in-time recovery (PITR) is one of the hottest features in SingleStore 7.5, offering new, mission-critical capabilities for…
Read Post

Product · 4 min Read
SingleStoreDB Self-Managed 7.5 Now Available
Today’s data-intensive applications require a completely new approach to data management. Data-intensive applications share four qualities…
Read Post

Data Intensity · 9 min Read
Data-Intensive Applications Need A Modern Data Infrastructure
Gone are the days when applications were installed locally, had a handful of users at any one time, and only focused on basic data entry and…
Read Post

Product · 4 min Read
SingleStoreDB Self-Managed 7.3 is Now Generally Available
SingleStore is proud to announce the general availability of SingleStoreDB Self-Managed 7.3 for immediate download. SingleStoreDB Self…
Read Post

Product · 2 min Read
SingleStore Now Available on Red Hat Marketplace
We have exciting news: SingleStore is now certified and available through Red Hat Marketplace. This open cloud marketplace makes it easier…
Read Post

Data Intensity · 13 min Read
Eliminate Data Infrastructure Sprawl, Stopping the Insanity
There is a trend in industry which says that modern applications need to be built on top of one or more special-purpose databases. That…
Read Post

Data Intensity · 12 min Read
NoSQL Databases: Why You Don’t Need Them
It’s time for us to admit what we have all known is true for a long time: NoSQL is the wrong tool for many of the modern application use…
Read Post

Product · 11 min Read
Why SingleStore: SingleStoreDB Self-Managed 7.0 In Depth – Webinar Recap 3 of 3
SingleStore VP of Product Rick Negrin describes the upcoming SingleStoreDB Self-Managed 7.0 in depth. He describes crucial features such as…
Read Post

Product · 9 min Read
Why SingleStore: SingleStoreDB Cloud In-Depth – Webinar Recap 2 of 3
This blog post shares the second part of our recent webinar, SingleStoreDB Cloud and SingleStoreDB Self-Managed 7.0 Overview. (Part 1 is…
Read Post

Product · 10 min Read
Why SingleStore: SingleStore Database Overview – Webinar Recap 1 of 3
This blog post shares the initial section of our recent webinar, SingleStoreDB Cloud and SingleStoreDB Self-Managed 7.0 Overview. In this…
Read Post

Data Intensity · 10 min Read
The Need for Operational Analytics
The proliferation of streaming analytics and instant decisions to power dynamic applications, as well as the rise of predictive analytics…
Read Post

Data Intensity · 11 min Read
Pre-Modern Databases: OLTP, OLAP, and NoSQL
In this blog post, the first in a two-part series, I’m going to describe pre-modern databases: traditional relational databases, which…
Read Post

Data Intensity
Selecting the Right Database for Your Scale Demands
Scaling distributed systems is hard. Scaling a distributed database is really hard. Databases are particularly hard to scale because there are so many different aspects to consider. For example, which dimension is growing, and how fast, dictates which resources you need to increase in order to handle the workload and maintain your service-level agreements (SLAs).
Some problems require more CPU, some more RAM, and some more storage. Often you need a combination, and how much you need is tough to determine up front. In addition, requirements change over time. When demand increases, you need to be able to add hardware resources dynamically, without compromising availability. Most legacy databases only support a single-box model, making it difficult to scale while maintaining availability. SingleStore is a scale-out relational database with ANSI SQL support and, as such, is a much better solution for workloads where scaling up on a single machine is not viable.
Let’s look at the different dimensions.
Read Post

Product
Announcing SingleStoreDB Self-Managed 6.7: The No-Limits Database Gets Even Faster, Easier, and Now Free to Use
Today, we announced the general availability of SingleStoreDB Self-Managed 6.7. This latest release introduces features that provide faster query performance and improved usability with more robust monitoring, simplified deployment tooling, and a new free tier that anyone can use.
Data and the ability to analyze and act on it are the most critical capabilities of modern businesses. Leveraging data quickly and effectively is key to delivering new customer experiences, enabling competitive advantage, and optimizing operations.
Query performance, concurrency, and scalability are critical requirements for modern applications and analytical systems, yet legacy databases and big data systems struggle to keep up. SingleStore addresses all these challenges, and SingleStoreDB Self-Managed 6.7 makes processing and analyzing data easier and faster for both streaming data and big data, while continuing to support familiar, standard SQL, and the broad ecosystem that uses it.
New advances include a free production license for SingleStoreDB Self-Managed 6.7, within limits; dramatically faster queries; new tooling for monitoring and management; new native connections; and a new developer forum for tips and troubleshooting.
SingleStoreDB Self-Managed 6.7 Now Free to Use, Making Maximum Performance Available to Everyone
SingleStoreDB Self-Managed 6.7 can be used for free in production, up to a limit of 128GB of RAM capacity, with unlimited disk/solid state drive (SSD) usage.
Unlike many of our competitors, the free tier of SingleStore offers all the capabilities found in the Enterprise licensing tier of the database, including the full performance and security features.
The free tier offers tremendous value. Customers of legacy databases are paying $100,000 or more per year for similar capacity for operational systems, data warehouses, and data marts. Yet these existing systems have fundamental scalability limits that are not present with SingleStore.
The free tier is backed by extensive documentation and a community of users, but does not include professional SingleStore support. Companies looking for interactive, ticket-based support and guidance, or who want more than 128GB of RAM capacity (with no limits on disk/SSD usage), will need to upgrade to an Enterprise license. Contact SingleStore for more details.
Faster Queries Out of the Box
Getting excellent database performance often requires experimentation. Developers and analysts try various query hints and tuning steps as stop-gaps when query optimization, execution, or indexing don’t work fast enough. The result is more time spent tuning, and less time discovering and acting on new insights.
Star Joins are a type of query commonly used in data analytics, in which a large fact table is joined to one or more smaller tables. SingleStoreDB Self-Managed 6.7 has made Star Joins faster – up to 100x faster, in some cases – by leveraging query vectorization and single instruction, multiple data (SIMD) technology, a kind of parallel processing. Queries also execute faster on their first run thanks to improved sampling, eliminating the cumbersome hints and tuning often associated with new query development. You can read more about SingleStoreDB Self-Managed 6.7 performance results.
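As a sketch, a star join has this shape (the table and column names here are hypothetical, not from SingleStore's documentation):

```sql
-- One large fact table (sales) joined to smaller dimension tables.
SELECT d.region, p.category, SUM(s.amount) AS total_sales
FROM sales s
JOIN dim_store d ON s.store_id = d.store_id
JOIN dim_product p ON s.product_id = p.product_id
WHERE s.sale_date >= '2018-01-01'
GROUP BY d.region, p.category;
```

Queries of this shape – a scan of a big table filtered and grouped through joins to small tables – are the ones that benefit most from vectorized, SIMD-accelerated execution.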
New Monitoring and Management Tooling Simplifies Tuning and Deployment
SingleStore Studio gives users a simple way to visualize the health of SingleStore clusters across resources, events, and queries. The new query inspector, for example, enables immediate discovery and debugging of query bottlenecks across CPU, memory, disk, and IO, which speeds tuning and improves integration with other management and DevOps tools.
Read Post

Product
A New Toolset for Managing and Monitoring SingleStore
Today, we introduced a new toolset for SingleStore that makes it easier to deploy, manage, troubleshoot, and monitor SingleStore. The new tools are easier to use, integrate more broadly, and contain new capabilities.
SingleStore already has management tools for these operations, so why create new ones? First, the existing toolset is a closed loop. The tools only work with SingleStore and it is not easy to integrate with them. We’ve had many customers request integration with the broader ecosystem of tools they already use for configuration, alerting, etc.
The new tools have been designed from the ground up to be easily integrated. In addition, the new toolset takes advantage of the role-based access control already built into the SingleStore engine. This gives you one place to manage which capabilities are available to each user.
Lastly, the new tools are completely separate from the SingleStore engine, which allows us to iterate on them faster. So you will see new releases of the tools on a faster cadence going forward.
The tools are broken down into two packages: command-line tools for managing SingleStore, provided in the toolbox package, and the monitoring UI provided in SingleStore Studio. All of the tools can be used together.
Management Tools
The three command-line tools for managing SingleStore are memsql-deploy, memsql-admin, and memsql-report:
memsql-deploy is used to deploy the SingleStore software (the engine) to the various host machines that make up a SingleStore cluster. It can also uninstall and upgrade the software. memsql-deploy is also used for simple cluster installs, where it wraps a number of calls to the other tools.
memsql-admin is used for creating and managing a cluster of nodes that have SingleStore running on them. You use it to create and delete nodes, assign each node a role, and stitch nodes into a cluster. You also use it for miscellaneous operations, such as setting a license for a cluster and setting configuration variables.
memsql-report ensures that you have an optimal environment for running SingleStore. It runs checks to evaluate the health of the hardware, operating system, and cluster. The output of each check is a simple Pass/Fail, so you can easily spot potential trouble before it affects your running cluster. In fact, we recommend running memsql-report on your hosts before you install SingleStore, to ensure that your hardware and operating system are optimally configured.
Read Post

Product
Implementing a Scalable Multi-tenant Service
Many organizations find they need to support a database-as-a-service workload. This is a model where multiple organizations' data lives side by side to reduce management costs, but each organization requires a strong namespace and security boundary around its data. Organizations also often want to leverage the existing ecosystem of tooling available on top of the data layer. This is challenging with legacy databases because they do not scale out, and challenging with NoSQL systems because they lack the tooling ecosystem and structure their users want.
This pattern appears in two different scenarios.
The “Enterprise Database-as-a-Service”
This is where a large enterprise has an IT team that wants to enable self-service for departments that build their own applications or manage their own data for analytics. Those departments need a database to do it, but don't want to be responsible for managing the hardware or system software. The data owners handle the logical management (defining their schema, tuning their queries, creating indexes, etc.) while the IT department manages the physical aspects (hardware, system software, and capacity management). The number of databases is usually a multiple of the number of departments that need this functionality, which means there could be hundreds or thousands of databases in a large organization. Some databases are small (tens or hundreds of gigabytes), with a few that are large (multiple terabytes). The activity on each database (i.e., how many users or applications are querying it at the same time) will also vary wildly depending on the use case. Data owners will also have varying SLAs on availability and durability of the data. This makes resource consumption highly variable. Given these requirements, it is a challenge for IT to operate and manage the databases and maintain the SLAs required by the data owners.
The “Multi-tenant SaaS Service”
This is where a company is building a multi-tenant service that is sold to organizations, where each organization owns its data. An example would be a marketing analytics service that takes in data about how a marketing campaign did (with data coming from many sources), then offers canned and/or ad-hoc analytics over the campaign results. In this case, the service owner wants the ability to easily separate the data for each of its customers for security, namespace, and performance reasons, while still retaining a single control point for management (i.e., a single cluster to manage). Each database likely has the same schema, perhaps with small customizations, and the schemas need to be updated together to stay in sync. This amortizes the cost of management so that the service owner can maintain profitability as it acquires more customers. In addition, the service owner wants to support both very large and very small customer databases without worrying about over-provisioning, under-provisioning, or hitting the scale limits of a single node.
SingleStore is a great platform for building such a system. A database in a SingleStore cluster is a natural namespace and security boundary, and a cluster is a perfect single control point for managing the system. Additionally, SingleStore is a distributed database, so you don't have to worry about one of your customers hitting a scalability limit, as you would with single-box databases such as Azure SQL DB or Aurora. In legacy database systems, the largest databases have to be manually sharded in the application layer because they outgrow a single node. Manual sharding like this is very difficult and error prone. SingleStore handles it naturally and transparently: customers can grow their usage as needed and simply use more resources in the cluster. These operations are online and therefore transparent to the data owners.
Customers, especially smaller ones, share resources, which keeps costs low. Because of the sharding model of SingleStore, the workload is naturally spread evenly across the cluster. When aggregate usage grows beyond what the cluster can handle, the cluster can be expanded by simply adding more nodes. SingleStore also has a resource governance mechanism to prevent one database from unfairly consuming more than its fair share of resources. Lastly, SingleStore supports both transactional and analytical workloads, making it appealing regardless of the workload type you need to support.
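A minimal sketch of the database-per-tenant pattern looks like this (the tenant, user, and password names are hypothetical placeholders; SingleStore uses MySQL-style user and grant syntax):

```sql
-- Each tenant gets its own database as a namespace and security boundary.
CREATE DATABASE tenant_acme;

-- A dedicated user is confined to that database via standard grants.
CREATE USER 'acme_app'@'%' IDENTIFIED BY 'changeme';
GRANT SELECT, INSERT, UPDATE, DELETE ON tenant_acme.* TO 'acme_app'@'%';
```

The grant confines acme_app to its own database; other tenants' databases in the same cluster remain invisible to it, while the operator still manages everything from a single cluster.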
So if you are an enterprise architect tasked with building a database-as-a-service model or a company building a new software-as-a-service offering, then SingleStore is a great option for your data layer.
Read Post

Product
Announcing SingleStoreDB Self-Managed 6.5
The World’s Fastest Database Just Got Faster
At SingleStore, we’re on a mission to create the world’s best database. On our path to that goal, today we announced that our latest product, SingleStoreDB Self-Managed 6.5, is generally available.
As the no-limits database™, SingleStore provides maximum performance, scalability, and concurrency for your most important applications and analytical systems. This latest release further improves on what customers love about SingleStore, advancing performance and adding capabilities to accelerate time to insight and simplify operations.
Today, the pace of innovation and competition means that businesses are asking more and more of their technology teams. You must provide flawless customer experiences. The business relies on you for consistently fast insights. And you must create and scale these systems as quickly as possible. To meet these demands, SingleStore allows customers to run both transactional and analytical workloads at web scale with a single database — all without having to abandon familiar, standard SQL.
This means that whether you are building a new application, modernizing existing applications or infrastructure, or trying to improve database performance, SingleStore can help everyone quickly achieve the results business demands without having to learn a host of new tools or abandon existing application logic.
New advancements in SingleStoreDB Self-Managed 6.5 take these abilities even further.
Experience Latency-Free Queries
The world’s fastest SQL database just got faster. Customers – whether end users or internal users – have no tolerance for latency. Faster database response times mean more data can be analyzed more frequently, improving the accuracy of insights.
SingleStoreDB Self-Managed 6.5 has made queries 2-4x faster than the previous version (which was already 10x faster than legacy database providers) and improves data ingestion speeds for high data volume applications. A series of query improvements extends breakthrough performance of group-by, joins, and filter queries to deliver insights in milliseconds across billions of rows.
Read Post

Product
Full-Text Search in SingleStore
Today, we are sharing that SingleStore now has Full-Text Search, a highly requested feature, built into the product. Thanks to customer feedback, we are delighted to make it available for all companies building real-time applications.
What is Full-Text Search?
You might be thinking, “SingleStore is pretty fast at searching things and already supports large strings, so why add anything?” So let’s start with a description of Full-Text Search (FTS). Full-Text Search differs from querying for a string value in two ways.
The first is a performance difference. Sometimes when you search for text you know exactly what you are looking for. For example, if you want a list of mystery books, you ask the database to find all the books with a subject of “Mystery” – easy! But what if you have many documents, each containing many words, and you don’t know what information the documents hold? Specifically, suppose you wanted to find all the documents that talk about “Canada”. You could use a LIKE query or a regular expression, but that is going to be pretty slow with a regular database index, no matter how fast your database is. What you need is a different kind of index that keeps track of all the words in each document and how often they appear, so that when you search for “Canada”, the system can quickly determine whether the word appears in a document and how many times. This is called an inverted index.
Secondly, your search might have a lot of matches. FTS includes the notion of relevance, which tells you not just whether there is a match but how good the match is. This allows you to sort the output, showing the most relevant results first.
The combination of the inverted index data structure, stemming capabilities, and relevance scoring is why Full-Text Search is advantageous over searching strings with regular database queries.
How is Full-Text Search integrated with SingleStore?
SingleStore is a system that understands the notion of an index, so integrating FTS into SingleStore was relatively straightforward. We just had to teach SingleStore to understand this new kind of index. Like most technologies that support full-text indexing, we took advantage of the open-source Lucene technology. Lucene is a robust existing technology that is essentially the standard for full-text searching in the industry. It has been around a long time, so we didn’t have to reinvent the wheel.
To create an index, a user simply has to specify which columns they want to be full-text indexed when creating the table. Under the covers this causes a full-text index to be created. The index is partitioned using the same sharding pattern as the table, allowing for a distributed index that can also participate efficiently in local joins. The lifecycle of the index is governed by the lifecycle of the table, so you don’t have to do any extra work. All the management is handled by the system (failover, rebalance, node addition and removal, backup/restore, etc.). In addition, the index is bound to the same security mechanisms as the table, so the data is protected by the role-based access control features in SingleStore. This ensures the data in your index is only available to the users who have been granted permission to see the data. So all the management of the full-text capability is transparent to the user.
To make use of the index, you simply include the MATCH statement directly in your SQL query. This is the great part about the integration with SingleStore. You can mix and match your full-text search along with structured search in your filter. This makes the process very easy for someone familiar with SQL to take advantage of the full-text index. Because the index is automatically sharded, users get to take advantage of all the machines in the cluster to execute the lookup in the index resulting in blazing fast results. The full-text query itself is the standard Lucene syntax, which will be familiar to those who have used other full-text systems.
How would I use it?
Utilizing FTS in conjunction with a regular database application is helpful across a number of different use cases. Here are a few examples.
AutoComplete
The simplest case is something such as autocomplete functionality in a search box for your application. In this case there are no documents; the application simply wants to quickly identify which values of a string column match the characters typed so far. A simple query against the table using the MATCH statement, returning the full text of the column, accomplishes the task.
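For instance, against the books table used in the examples below, an autocomplete lookup for the prefix “typ” could be sketched like this (assuming the Lucene wildcard syntax, since the full-text query string follows standard Lucene syntax):

```sql
-- Return titles matching the prefix the user has typed so far.
SELECT title
FROM books
WHERE MATCH (title) AGAINST ('typ*');
```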
Mixing Structured and Unstructured Query
A more complex example shows up in industries such as insurance or real estate, which are inherently document-centric. In these use cases, the documents are a core part of the data set. Let’s take a real estate transaction as an example. There is a lot of data that is well understood (and therefore easy to structure), such as location, number of bedrooms and bathrooms, year built, etc. So you can imagine someone wanting to limit a query to a specific location and age of house, but also look for agreements that contain a particular phrase in the document. In this case, the user constructs a query with a structured search filter for location and age, and adds a MATCH statement in the filter to look for the phrase.
Some Examples
First let’s create a table. This looks like a regular CREATE TABLE statement; the only addition is the FULLTEXT clause listing the columns “title” and “body”. This means we want a full-text index on the title column and the body column. Note that full-text indexes can only be created on string types. That’s it – your full-text index is ready to use. Just insert some data and you can start writing queries.
CREATE TABLE books(
id INT UNSIGNED,
title VARCHAR(100),
publish_year INT UNSIGNED,
body TEXT,
KEY (id) USING CLUSTERED COLUMNSTORE,
FULLTEXT (title, body));
Now let’s put some data into the table.
INSERT INTO books VALUES (1, 'Typing Test', 1981, 'The quick brown fox jumped over the lazy dog');
INSERT INTO books VALUES(2, 'Ylvis', 2014, 'What does the fox say');
INSERT INTO books VALUES(3, 'Gone With the Wind', 1936, 'In her face were too sharply blended the delicate features of her mother, a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was an arresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel, starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black brows slanted upward, cutting a startling oblique line in her magnolia-white skin--that skin so prized by Southern women and so carefully guarded with bonnets, veils and mittens against hot Georgia suns.');
Now we run OPTIMIZE TABLE to ensure the index is updated (inserts are flushed asynchronously to the index). You don’t need to do this in practice, as the flush happens reasonably quickly, but for the purposes of this example we want to make sure the inserts are in the index.
OPTIMIZE TABLE books FLUSH;
Here is an example of a basic full-text query. The query looks for all rows that have a title or body with the string “fox” in it.
SELECT *
FROM books
WHERE MATCH (title, body) AGAINST ('fox');
This example uses a structured predicate (matching only books published in 2014) with a full-text predicate (books with the word fox in it).
SELECT count(*)
FROM books
WHERE publish_year = 2014 AND MATCH (body) AGAINST ('fox');
We also support returning a relevance score, which is a score for how successful the match is. You can put the relevance as an output column in your query.
SELECT id, title, MATCH (body) AGAINST ('Fox') relevance
FROM books
WHERE MATCH (body) AGAINST ('Fox');
You can also use the relevance score in the WHERE clause to filter the results.
SELECT id, title, MATCH (body) AGAINST ('fox')
FROM books
WHERE MATCH (body) AGAINST ('fox') >= 0.12;
Doing FTS with SingleStore is that simple. But don’t take my word for it. Download the beta, and start playing with the Full-Text Index feature right now.
Read Post

Data Intensity
Machine Learning and SingleStore
What is Machine Learning?
Machine learning (ML) is a method of analyzing data using an analytical model that is built automatically, or “learned”, from training data. The idea is that the model gets better as you feed it more data points, enabling your algorithm to automatically get better over time.
Machine learning has two distinct steps: training and operationalization. Training takes a data set you know a lot about (known as a training set), explores it to find patterns, and develops your model. Once you have developed your model, you move on to operationalization: deploying it to a production system where it runs to score new data, with the system returning the results to the user.
How to Get Started with Machine Learning
To accomplish these steps, you will commonly make use of several tools. You need a tool to bring the data in, a tool to cleanse the data, libraries to develop the calculations, and a platform for testing the algorithm. Once you are ready to operationalize the model, you need a compatible platform to run your model and an application to process and/or display the results.
Using SingleStore for Machine Learning Operationalization
SingleStore is a distributed database platform that excels at the kinds of calculations typically found in a machine learning model. SingleStore is a great environment for storing training data, as the user can run it in a small configuration, such as a single-node deployment on a laptop. Because SingleStore is compatible with MySQL, a data scientist could also use a MySQL instance for the algorithmic development.
Where SingleStore really shines is the operationalization of the model. The key requirements for effectively operationalizing an algorithm are the following:
Ingest the data quickly
Fast calculations
Scale out to handle growth
Compatibility with existing libraries
A powerful programming language to express the algorithm
Operational management capabilities to ensure data durability, availability, and reliability
SingleStore is a perfect fit for these requirements and can be used in an ML solution in a couple of different ways.
Three Ways to Operationalize ML with SingleStore
Calculations Outside the Database
SingleStore can be a fast service layer that both stores the raw data and serves the results to the customer. This is useful when the creation of the model is done with existing infrastructure, such as a Spark cluster. A real-world example is a large energy company using SingleStore for upstream production processing. The company has a set of oil drills all around the world. The drills are expensive to fix, because of the cost of parts and the cost of labor (the drills are often in remote locations), so keeping a drill from breaking results in dramatic cost savings. The drills are equipped with numerous sensors (collecting heat, vibration, direction, etc.) that continuously send data back to a Kafka queue. Data is pulled from this queue into a Spark cluster, where a PMML (Predictive Model Markup Language) model calculates the health of the drill. The scored data then lands in SingleStore, and is served to the drill operators in real time. This allows the operators to slow down or reposition a drill if it is in danger of damage. Having a data platform that can continuously ingest the scored data at high throughput, while still allowing the models to be run, is critical to delivering this scenario. Because SingleStore has a modern scale-out architecture and a sophisticated query processor, it can handle this data processing better than any other database in the industry.
Calculations on Ingest
Some customers don’t want to maintain a separate calculation cluster, but still want to make use of existing statistical or ML libraries. In this case, they can use the SingleStore Pipelines feature to easily ingest data into the database. Customers can then execute the ML scoring algorithm as data arrives, using the transform capability of Pipelines. Transforms are a feature that allow customers to execute any code on the data prior to its insertion in the database. This code can easily integrate or call out to existing libraries, such as TensorFlow. The results of the calculations are then inserted in the database. Because SingleStore is a distributed system and SingleStore Pipelines run in parallel, the workload is evenly distributed over the resources of the cluster.
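Sketched out, such a pipeline might look like the following (the Kafka endpoint, transform URL, and table name are hypothetical):

```sql
-- The transform script is fetched from the given URL. It reads each
-- batch of records on stdin, scores them (for example, by calling out
-- to an ML library), and writes the scored rows to stdout for insertion.
CREATE PIPELINE sensor_scores AS
LOAD DATA KAFKA 'kafka-host/sensor-readings'
WITH TRANSFORM ('http://example.com/score_transform.py', '', '')
INTO TABLE scored_readings;

START PIPELINE sensor_scores;
```

Because the pipeline runs partitioned across the cluster, the scoring work in the transform is spread over all the nodes rather than funneled through a single machine.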
Calculations in the Database
Sometimes it is more efficient to do the scoring calculations as close to the data as possible, especially when the new data needs to be compared with a larger historical set of data. In this case, you need a language to encode the algorithm in the database. It is important that the language is expressive enough to encode the algorithm, makes core operations fast, allows efficient querying over the existing data, and can be composed with other functionality.
One example of an organization that has successfully used this approach is Thorn, a non-profit that uses image recognition to find missing and exploited children. The application keeps a set of pictures of exploited children in its system, and matches the faces of those children to new pictures that are continuously culled from websites around the country. The new pictures are reduced to vectors using a deep learning-based approach, and are matched against vectors representing the base pictures.
Prior to using SingleStore, the matching process would take hours or days. By using SingleStore’s high-performance vector DOT_PRODUCT built-in function, processing the incoming pictures can be done in minutes or seconds. Another image recognition example is Nyris.io, which uses a similar technique to match product photos using deep learning coupled with fast in-database DOT_PRODUCT calculations. The application quickly matches user-provided images with reference product images to enable ecommerce transactions.
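A minimal sketch of this in-database matching (the schema and vector length are illustrative; JSON_ARRAY_PACK converts a JSON array of floats into the packed binary format that DOT_PRODUCT operates on):

```sql
-- Each row stores an image's feature vector as packed 32-bit floats.
CREATE TABLE images (
  id INT UNSIGNED,
  features BINARY(16)  -- 4 floats x 4 bytes each
);

-- Rank stored images by similarity to an incoming picture's vector.
SELECT id,
       DOT_PRODUCT(features, JSON_ARRAY_PACK('[0.1, 0.7, 0.2, 0.4]')) AS score
FROM images
ORDER BY score DESC
LIMIT 10;
```

Because the table is sharded, the dot-product scoring runs in parallel across every node in the cluster, which is what turns an hours-long matching job into seconds.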
To build an operational ML application with SingleStore, please visit singlestore.com.
Read Post