Using SingleStoreDB as a Vector Database for Q&A Chatbots
Engineering

SingleStoreDB has long supported vector functions like dot_product, which make it a good fit for AI applications that require text similarity matching. An example of this type of AI application is a chatbot that answers questions from a corpus of information.

In this blog post, we'll demonstrate how we use SingleStoreDB — along with AI models like Whisper and ChatGPT — to create a chatbot that uses the Y Combinator YouTube channel to answer questions about startups, and give startup-related advice. We initially built this as a side project using another vector database, but recently converted it to SingleStoreDB. The bot is accessible here: https://transcribe.param.codes/ask/yc-s2.

How We Built It

Step 1. Transcribing the videos

We first used OpenAI's Whisper model to transcribe all the videos on the YC YouTube channel. Instead of running the model ourselves, we used Replicate to run the model and give us the transcriptions, which are stored in a simple SQLite database.

Step 2. Creating embeddings from the transcriptions

Because models like ChatGPT have a limited context length of 4096 tokens, we cannot just give ChatGPT all the transcriptions and ask it questions based on the entire corpus. So, when we get a question, we need to find the parts of the transcriptions that are most relevant to the question and only give those to ChatGPT in the prompt. To do this, we need to first create embeddings for the text in the transcriptions.

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness, and large distances suggest low relatedness. Here's a nice video explaining how this works.
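To make the retrieval step concrete, here is a minimal sketch of the overall flow, not the project's actual code. It assumes a hypothetical transcript_chunks table whose embedding column was populated with JSON_ARRAY_PACK, and it uses the pre-1.0 openai Python library together with mysql.connector; host, credentials and table names are placeholders.

```python
# Minimal retrieval sketch under assumed schema and credentials (illustration only).
import json
import openai
import mysql.connector

def answer(question: str) -> str:
    # 1. Embed the question with the same model used for the transcript chunks.
    emb = openai.Embedding.create(input=question, model="text-embedding-ada-002")
    q_vec = json.dumps(emb["data"][0]["embedding"])

    # 2. Find the most relevant chunks with DOT_PRODUCT in SingleStoreDB.
    conn = mysql.connector.connect(host="HOST", user="admin",
                                   password="PASSWORD", database="yc_bot")
    cur = conn.cursor()
    cur.execute(
        """SELECT content, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score
           FROM transcript_chunks ORDER BY score DESC LIMIT 5""",
        (q_vec,),
    )
    context = "\n\n".join(row[0] for row in cur.fetchall())

    # 3. Give ChatGPT only the relevant context so the prompt stays within its limit.
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided transcripts."},
            {"role": "user", "content": f"Transcripts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat["choices"][0]["message"]["content"]
```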
Getting Started with OpenAI Embeddings Search & SingleStoreDB
Engineering

In this article, we will look at how to use SingleStoreDB to store and query the OpenAI Wikipedia vector database dataset.

SingleStoreDB has supported a range of vector functions for some time, and these functions are ideally suited for storing embeddings, doing semantic search and using the data to provide context to OpenAI as part of the prompt. With this mechanism, we will be able to add "short-term" memory to ChatGPT. The notebook file used in this article is available on GitHub.

In several previous articles, we have used some of the vector capabilities built into SingleStoreDB:

Quick Tip: SingleStoreDB's EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK Functions
Using SingleStore, Spark and Alternating Least Squares (ALS) to Build a Movie Recommender System

In this article, we'll test the `JSON_ARRAY_PACK` and `DOT_PRODUCT` vector functions with the OpenAI Wikipedia Vector Database dataset. There is an OpenAI notebook available on GitHub under an MIT License that tests several vector database systems. The tests can be run using local clients or in the cloud. In this article, we'll use SingleStoreDB Cloud.

Create a SingleStoreDB Cloud Account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use the following settings:

Workspace Group Name: OpenAI Demo Group
Cloud Provider: AWS
Region: US East 1 (N. Virginia)
Workspace Name: openai-demo
Size: S-00
Advanced Settings: MarTech Application deselected

From the left-navigation pane, we'll select DEVELOP 〉 SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS openai_demo;

Import Notebook

From the left-navigation pane, we'll select DEVELOP 〉 Notebooks. In the top right of the web page we'll select New Notebook 〉 Import From File, as shown in Figure 1.
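As a rough sketch of how the two functions fit together (the wikipedia table definition below is an assumption for illustration, not the notebook's exact schema), embeddings are packed into a binary column with JSON_ARRAY_PACK and scored with DOT_PRODUCT:

```python
# Illustration only: assumed table layout, placeholder credentials and vectors.
import json
import mysql.connector

conn = mysql.connector.connect(host="HOST", user="admin",
                               password="PASSWORD", database="openai_demo")
cur = conn.cursor()

# A 1536-dimensional ada-002 embedding packs into 1536 * 4 bytes of binary data.
cur.execute("""CREATE TABLE IF NOT EXISTS wikipedia (
    id BIGINT,
    title TEXT,
    content TEXT,
    content_vector BLOB
)""")

# Store an embedding: JSON_ARRAY_PACK converts a JSON array of floats to binary.
row = {"id": 1, "title": "London", "content": "London is the capital...",
       "vector": [0.01] * 1536}  # placeholder vector
cur.execute(
    "INSERT INTO wikipedia VALUES (%s, %s, %s, JSON_ARRAY_PACK(%s))",
    (row["id"], row["title"], row["content"], json.dumps(row["vector"])),
)
conn.commit()

# Search: DOT_PRODUCT scores every stored vector against the query vector.
query_vector = json.dumps([0.01] * 1536)  # normally an OpenAI embedding of the query
cur.execute(
    """SELECT title, DOT_PRODUCT(content_vector, JSON_ARRAY_PACK(%s)) AS score
       FROM wikipedia ORDER BY score DESC LIMIT 5""",
    (query_vector,),
)
print(cur.fetchall())
```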

Spark-SingleStoreDB Integration
Engineering

Integrating Spark with SingleStoreDB enables Spark to leverage the high-performance, real-time data processing capabilities of SingleStoreDB — making it well-suited for analytical use cases that require fast, accurate insights from large volumes of data.

The Hadoop ecosystem has been in existence for well over a decade. It features various tools and technologies including HDFS (Hadoop Distributed File System), MapReduce, Hive, Pig, Spark and many more. These tools are designed to work together seamlessly and provide a comprehensive solution for big data processing and analysis.

However, there are some major issues with existing Hadoop environments, one of which is the complexity of the Hadoop ecosystem, making it challenging for users to set up and manage. Another issue is the high cost of maintaining and scaling Hadoop clusters, which can be a significant barrier to adoption for smaller organizations. In addition, Hadoop has faced challenges in keeping up with the rapid pace of technological change and evolving user requirements — leading to some criticism of the platform's ability to remain relevant in the face of newer technologies. The good news? Apache Spark can be used with a modern database like SingleStoreDB to overcome these challenges.

Apache Spark

Apache Spark is a popular tool for analytical use cases due to its ability to handle large-scale data processing with ease. It offers a variety of libraries and tools for data analysis, including Spark SQL, which allows users to run SQL queries on large datasets, as well as MLlib, a library for machine learning algorithms. Spark's distributed nature makes it highly scalable, allowing it to process large volumes of data quickly and efficiently. Additionally, Spark Streaming enables real-time processing of data streams, making it well-suited for applications in areas like fraud detection, real-time analytics and monitoring. Overall, Apache Spark's flexibility and powerful tools make it an excellent choice for analytical use cases, and it has been widely adopted in various industries including finance, healthcare, retail and more.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that stores and processes large volumes of data. It is capable of performing both OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) workloads on a unified engine, making it a versatile tool for a wide range of use cases. Overall, SingleStoreDB's high-performance, distributed architecture — combined with its advanced analytical capabilities — makes it an excellent choice for analytical use cases including real-time analytics, business intelligence and data warehousing. It has been widely adopted by companies across finance, healthcare, retail, transportation, eCommerce, gaming and more. And, SingleStoreDB can be integrated with Apache Spark to enhance its analytical capabilities.

Using Apache Spark with SingleStoreDB

SingleStoreDB and Spark can be used together to accelerate analytics workloads by taking advantage of the computational power of Spark, together with the fast ingest and persistent storage of SingleStoreDB. The SingleStore-Spark Connector allows you to connect your Spark and SingleStoreDB environments. The connector supports both data loading and extraction from database tables and Spark DataFrames.

The connector is implemented as a native Spark SQL plugin, and supports Spark's DataSource API. Spark SQL supports operating on a variety of data sources through the DataFrame interface, and the DataFrame API is the widely used framework for how Spark interacts with other systems. In addition, the connector is a true Spark data source; it integrates with the Catalyst query optimizer, supports robust SQL pushdown and leverages SingleStoreDB LOAD DATA to accelerate ingest from Spark via compression.

Spark and SingleStoreDB can work together to accelerate parallel read and write operations. Spark can be used to perform data processing and analysis on large volumes of data, writing the results back to SingleStoreDB in parallel. This can be done using Spark's distributed computing capabilities, which allow it to divide data processing tasks into smaller chunks that can be processed in parallel across multiple nodes. By distributing the workload in this way, Spark can significantly reduce the time it takes to process large volumes of data and write the results back to SingleStoreDB.

Overall, by combining Spark's distributed computing capabilities with SingleStore's distributed architecture, it is possible to accelerate parallel read and write operations on large volumes of data, enabling real-time processing and analysis. The parallel read operation creates multiple Spark tasks, which can drastically improve performance. The Spark-SingleStore connector also provides parallel read repartitioning features to ensure that each task reads approximately the same amount of data. In queries with top-level limit clauses, this option helps distribute the read task across multiple partitions so that all rows do not belong to a single partition.

Spark-SingleStoreDB Integration Architecture
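For a taste of what the integration looks like in practice, here is a minimal PySpark sketch. The endpoint, credentials and table names are placeholders, and the option names and package coordinates should be checked against the connector version you install.

```python
# A sketch of reading from and writing to SingleStoreDB through the
# SingleStore Spark Connector; configuration values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("singlestore-demo")
    # The connector is published as a Spark package; coordinates vary by version.
    .config("spark.datasource.singlestore.ddlEndpoint", "svchost:3306")
    .config("spark.datasource.singlestore.user", "admin")
    .config("spark.datasource.singlestore.password", "PASSWORD")
    .getOrCreate()
)

# Parallel read: filters and projections are pushed down to SingleStoreDB.
orders = (
    spark.read.format("singlestore")
    .load("demo.orders")            # hypothetical database.table
    .where("status = 'SHIPPED'")
)

daily = orders.groupBy("order_date").count()

# Parallel write back into SingleStoreDB (the connector uses LOAD DATA for ingest).
daily.write.format("singlestore").mode("overwrite").save("demo.daily_shipped")
```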
Utilization Monitoring in SingleStoreDB Cloud
Product

SingleStoreDB is the database of choice for developers building immersive applications that require real-time analytics. To fully optimize applications, developers, administrators and users alike need to understand the current system performance — as well as how to tune their queries. SingleStoreDB now has enhanced native monitoring capabilities that allow users to easily visualize performance, identify potential bottlenecks, and tune and optimize queries to maximize performance as workloads scale.

Let's get into the specifics of the monitoring capabilities we offer for our cloud customers by Workspaces (a collection of compute resources).

vCPU Utilization

vCPU utilization can help identify performance bottlenecks, optimize resource usage and proactively address issues before they cause any disruption. To see performance across many vCPUs, we show the overall compute load, as well as the max and min, to identify when workloads are unevenly distributed across the workspace.
AI-Powered Semantic Search in SingleStoreDB
Engineering

SingleStoreDB can supercharge your apps with AI. In this blog, we demonstrate how semantic search can be performed on your data in SingleStoreDB — including code examples and a motivating case study from Siemens, a SingleStore customer.

What Is Semantic Search?

At its core, semantic search relies on natural language processing (NLP) to accurately interpret the context and intent behind a user's search query. Unlike traditional keyword-based search methods, semantic search algorithms take into account the relationship between words and their meanings, enabling them to deliver more accurate and relevant results — even when search terms are vague or ambiguous. Semantic search relies heavily on machine learning algorithms to identify language patterns and understand concept relationships. Embeddings are a key tool in semantic search, creating vector representations of words that capture their semantic meaning. These embeddings essentially create a "meaning space," where words with similar meanings are represented by nearby vectors.

What Is SingleStoreDB?

SingleStoreDB is a real-time, distributed SQL database designed to handle both transactional (OLTP) and analytical (OLAP) workloads within a unified engine. With support for fast writes and efficient querying, SingleStoreDB excels at managing large-scale transactional workloads and delivering real-time analytics. SingleStoreDB is available as a cloud service (SingleStoreDB Cloud) or for self-hosted installation.

SingleStoreDB is also a multi-model database and provides vector database extensions, in addition to support for relational, semistructured, full-text, spatial and time-series data. Its vector capabilities include built-in functions for vector similarity calculations such as cosine similarity and Euclidean distance. These functions can be leveraged in SQL queries to perform similarity calculations efficiently on large volumes of vector data. Moreover, filters on metadata (other descriptive data about objects for which you've created vector embeddings) can be easily intermixed with vector similarity search, by simply using standard SQL WHERE clause filters. An easy way to get started is to sign up for a SingleStoreDB Cloud trial — and get $500 in credits.

Is SingleStoreDB the Optimal Foundation for Semantic Search in Your Applications?

SingleStoreDB's patented Universal Storage supports both OLTP and OLAP workloads, making it ideal for semantic search use cases. Adding embeddings to your data is simple — just place the vector data in a binary or blob column, using the json_array_pack() or unhex() functions.

Efficient retrieval of high-dimensional vectors and handling of large-scale vector similarity matching workloads are made possible by SingleStoreDB's distributed architecture and efficient low-level execution. You can also rely on SingleStoreDB's built-in parallelization and Intel SIMD-based vector processing to take care of the heavy lifting involved in processing vector data. This enables you to achieve fast and efficient vector similarity matching without the need for parallelizing your application or moving lots of data from your database into your application. We previously benchmarked the performance of our vector matching functions in our blog, "Image Matching in SQL with SingleStoreDB." We ran the dot_product function as a measure of cosine similarity on 16 million records in just 5 milliseconds.

With its support for SQL, SingleStoreDB provides developers with a familiar and powerful interface for building semantic search applications. SQL can be used to create complex queries and perform advanced analytics on the text data stored in SingleStoreDB. In fact, with just one line of SQL, developers can run a semantic search algorithm on their vector embeddings, as demonstrated in the following example.

SingleStoreDB's ability to update and query vector data in real time enables us to power applications that continuously learn and adapt to new inputs, providing users with increasingly precise and tailored responses over time. By eliminating the need for periodic retraining of machine-learning models or other time-consuming processes, SingleStoreDB allows for seamless and efficient provision of real-time insights.

See Semantic Search with SingleStoreDB in Action!

The following tutorial will guide you through an example of adding embeddings to each row in your SingleStoreDB database using OpenAI APIs, enabling you to run semantic search queries in mere milliseconds using Python. Follow along to add embeddings to your dataset in your desired Python development environment.

Our goal in this example is to extract meaningful insights from a hypothetical company's employee review dataset by leveraging the power of semantic search. By using OpenAI's Embeddings API and vector matching algorithms on SingleStoreDB, we can conduct sophisticated queries on the reviews left by employees about their company. This approach allows us to delve deeper into the true sentiments of employees, without being constrained by exact keyword matches.

Step 1: Install and import dependencies in your environment

Install the following dependencies in your development environment using pip3.

pip3 install mysql.connector openai matplotlib plotly pandas scipy scikit-learn requests

Then start python3 — and at the python3 command prompt, import the following dependencies.

import os
import openai
import json
from openai.embeddings_utils import get_embedding
import mysql.connector
import requests

Step 2: Create an OpenAI account and get API connection details

To vectorize and embed the employee reviews and query strings, we leverage OpenAI's embeddings API. To use this API you will need an API key, which you can get here. You'll need to add a payment method to actually get vector embeddings using the API, though the charges are minimal for a small example like we present here. Once you have your key, you can add it to your environment variables as OPENAI_API_KEY.

os.environ["OPENAI_API_KEY"] = 'youropenAIAPIKey'
openai.api_key = os.getenv("OPENAI_API_KEY")

Step 3: Sign up for your free SingleStoreDB trial and add your connection details to your Python environment

We'll go through the example using SingleStoreDB Cloud, but of course you can self-host it and run the example in a similar way. If you're going to use our cloud, sign up for your SingleStoreDB Cloud trial and get $500 in credits. First, create your workspace using the + icon next to your desired workspace group. S-00 is sufficient for this use case.
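For a flavor of where the tutorial is headed, here is a sketch of the kind of semantic search query it builds up to. The reviews table, its columns and the connection details are assumptions for illustration, not the tutorial's exact schema, and the engine/model keyword for get_embedding depends on your openai library version.

```python
# Illustration only: assumed schema (reviews(review TEXT, embedding BLOB)) and credentials.
import json
import mysql.connector
from openai.embeddings_utils import get_embedding

query = "How do employees feel about work-life balance?"
# Embed the query string with the same model used for the stored reviews.
q_vec = json.dumps(get_embedding(query, engine="text-embedding-ada-002"))

conn = mysql.connector.connect(host="YOUR_WORKSPACE_HOST", user="admin",
                               password="PASSWORD", database="semantic_demo")
cur = conn.cursor()
# DOT_PRODUCT against the packed query vector scores every review; the top
# results are the closest in meaning, not just keyword matches.
cur.execute(
    """SELECT review, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score
       FROM reviews ORDER BY score DESC LIMIT 5""",
    (q_vec,),
)
for review, score in cur.fetchall():
    print(round(score, 3), review[:80])
```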
How to Build a Charismatic Twitter Chatbot in a Few Hours
Product

Discover the secret to building an interactive conversation chatbot on Twitter with state-of-the-art natural language processing in this technical tutorial — and create a chatbot that can respond to users with the appropriate context and personality.

But why stop there? This tutorial also dives into advanced capabilities using the powerful combination of SingleStoreDB and MindsDB, instead of direct API integration with the GPT-4 model. Take your chatbot game to the next level — and learn how to create a more personalized, engaging user experience.

In this technical tutorial, we'll show you how to create a chatbot that can interact with users on Twitter, responding with the appropriate context and personality using state-of-the-art natural language processing. To help you get started, we'll use the example of @Snoop_Stein, a Twitter bot that combines the unique personalities of Snoop Dogg and Albert Einstein. By tweeting @Snoop_Stein, users can engage with a rapping physicist who will respond with witty and intelligent remarks, all thanks to the advanced capabilities of the latest OpenAI GPT-4 model.
Looking for a Solution to NoSQL Analytics Limitations? Here’s How to Map NoSQL JSON to SingleStoreDB Tables
Engineering

NoSQL has attracted a lot of interest in database trends over the past 10 years, commonly referenced for scalable databases and pure OLTP speed for lookup queries. But when it comes to running analytics queries — or more complex OLTP queries — NoSQL starts to fail.

Why? Because NoSQL queries are limited to key-value queries, which are very fast but sometimes require you to add an additional layer of computation on the application side to achieve the expected results — whereas with SQL, you can simply query the result you want. Today, with the rise of distributed SQL databases like SingleStore, it's easier to handle the scalability issues you might encounter with legacy NoSQL databases.

In this blog, we will go through best practices to move from a NoSQL database to SingleStoreDB — including how to quickly import JSON data into SQL tables. But first, let's look more closely at SingleStoreDB.

What Is SingleStoreDB?

SingleStore is a distributed SQL database that handles both analytical (OLAP) and transactional (OLTP) workloads in the same table type. SingleStoreDB provides fast ingestion and high query speed for complex OLTP and OLAP queries. It provides a robust, scalable solution that is levels above what other legacy single-node databases can do. There is also a managed service that can be deployed on AWS, GCP or Microsoft Azure.

Moving Past NoSQL Limitations

NoSQL databases are more scalable than legacy SQL databases, handling hundreds of millions of transactions in a high-concurrency environment for pure OLTP queries. But today, as data rules the world, you need the best insights from your database via analytics dashboards, or complex OLTP queries. Unfortunately, these insights cannot be obtained properly by a NoSQL database, so users often have to add a new layer to handle analytics with a data warehouse — then also add another OLTP SQL database to handle some more complex SQL queries. All of that will result in data movement, ETL processes and database sprawl, which leads to high latency and poor user experiences.

What we've seen is a progression from a SQL to NoSQL era, leading us to where SingleStore is presently — the NewSQL era.

Mapping NoSQL JSON to SingleStore Tables

SingleStoreDB provides excellent support for JSON, especially since our latest product improvements in version 8.0. Even better, you can ingest data directly from JSON files stored in cold storage — with high ingest speeds being one of the strongest capabilities in SingleStoreDB. To ingest data, we'll use SingleStore Pipelines.

Let's move some data from a NoSQL database to an AWS S3 bucket, ingesting it into SingleStoreDB. Here is basic .json data stored in a bucket called nosqlmigration:

{
  "id": "1",
  "name": "John Smith",
  "job": "Director",
  "Address": {
    "Street": "Anger street",
    "StreetNum": "32 bis",
    "City": "London",
    "Country": "United Kingdom"
  }
}

And here is the corresponding table in SingleStoreDB:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  SHARD KEY (id)
);

We can also provide a better table definition to have all address information directly in that table, and ingest this information later:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  SHARD KEY (id)
);

Now, as we have defined the equivalent table in SingleStoreDB, let's ingest the data via a Pipeline:

CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id",
              "aws_secret_access_key": "your_aws_secret_access_key",
              "aws_session_token": "your_aws_session_token"}'
INTO TABLE employee
(id <- id,
 name <- name,
 job <- job,
 gender <- gender,
 address <- address,
 street <- address::street,
 streetnum <- address::streetnum,
 city <- address::city,
 country <- address::country)
FORMAT JSON;

We want to extract the JSON object address and ingest it directly into table fields. That way, we can easily use these fields to run more advanced queries. Now, let's see what we can do if we have a more nested JSON with an array. It's pretty common for some NoSQL databases to have a collection of items with one array as a field type.

Handling a JSON Array

An array in JSON is a list of keys and values. There are multiple options to import it properly into SingleStoreDB tables. The best option depends on which type of operation (aggregation, lookup select, etc.) you need, and how often you want to access these items in the array. Let's use this nested JSON as an example:

{
  "id": "1",
  "name": "John Smith",
  "job": "Director",
  "address": {
    "street": "Anger street",
    "streetnum": "32 bis",
    "city": "London",
    "country": "United Kingdom"
  },
  "experience": [
    { "role": "Lead Engineer", "company": "Json", "yoe": 3 },
    { "role": "Senior Engineer", "company": "Avro", "yoe": 3 },
    { "role": "Junior Engineer", "company": "Parquet", "yoe": 4 }
  ]
}

Option 1: Import it as a JSON field

This option is very performant (and even better now with our 8.0 release!) if you want to complete simpler operations, like lookups or aggregations. The idea is to store the array field into a JSON type, directly using the JSON field in your query. The drawback of this method is that you lose the interesting table structure of a SQL database. Here is an example of the table definition and the queries to access the specific values you are looking for:

CREATE TABLE employee (
  id int,
  name varchar(32),
  job varchar(32),
  gender varchar(10),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  experience JSON,
  SHARD KEY (id)
);

Query 1: Find previous experiences of employee 1

SELECT emp.name, emp.job, exp.table_col::$role AS 'role', exp.table_col::$company AS 'company'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;

Query 2: Aggregate the total years of experience for employee 1

SELECT emp.name, SUM(exp.table_col::yoe) AS 'Total YoE'
FROM employee emp, TABLE(JSON_TO_ARRAY(emp.experience)) exp
WHERE emp.id = 1;

Option 2: Create a table to represent a JSON Array

The second option is to fully use the relational SQL database to build tables that represent these arrays. Here is the second table that will represent the array (employee experience):

CREATE TABLE experience (
  id_employee int,
  role varchar(32),
  company varchar(32),
  yoe int,
  SHARD KEY (id_employee)
);

This table structure makes your data more readable and usable for your application or tool. You will need to rewrite the pipeline that ingests the JSON file to interpret this array, and insert into the corresponding tables. To make it work, we will write a stored procedure that inserts into the two tables, redirecting the ingestion from the pipeline into this procedure. Here is an example of a stored procedure that will insert into these two tables:

-- Stored procedure for ingesting a JSON array into multiple tables
DELIMITER //
CREATE OR REPLACE PROCEDURE employee_proc(batch QUERY(
  id varchar(32),
  name varchar(32),
  job varchar(32),
  gender varchar(32),
  address JSON,
  street varchar(32),
  streetnum varchar(32),
  city varchar(32),
  country varchar(32),
  experience JSON))
AS
DECLARE
  json_array ARRAY(json);
  e json;
BEGIN
  FOR batch_record IN COLLECT(batch) LOOP
    BEGIN
      INSERT INTO employee(id, name, job, gender, address, street, streetnum, city, country)
      VALUES(batch_record.id, batch_record.name, batch_record.job, batch_record.gender,
             batch_record.address, batch_record.address::$street, batch_record.address::$streetnum,
             batch_record.address::$city, batch_record.address::$country);
      json_array = JSON_TO_ARRAY(batch_record.experience);
      FOR i IN 0 .. LENGTH(json_array) - 1 LOOP
        e = json_array[i];
        INSERT INTO experience(id_employee, role, company, yoe)
        VALUES(batch_record.id, e::$role, e::$company, e::yoe);
      END LOOP;
    END;
  END LOOP;
END //
DELIMITER ;

And here is the new pipeline SQL statement:

-- Pipeline for ingesting JSON into a stored procedure
CREATE PIPELINE pipeline_migration_nosql AS
LOAD DATA S3 'nosqlmigration'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id",
              "aws_secret_access_key": "your_aws_secret_access_key",
              "aws_session_token": "your_aws_session_token"}'
INTO PROCEDURE employee_proc
FORMAT JSON;

Query 1: Find experiences of employee 1

SELECT emp.name, emp.job, exp.role, exp.company
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;

Query 2: Aggregate the total number of years of experience for employee 1

SELECT emp.name, SUM(exp.yoe) AS 'Total YoE'
FROM employee emp JOIN experience exp ON emp.id = exp.id_employee
WHERE emp.id = 1;

Performance Comparison Between Both Options

Both options can achieve very fast performance in SingleStore — but they come with some drawbacks. Option one is good if the query is light on the JSON array. Option two makes your data schema more readable, and offers more possibilities in terms of query shapes and computation. But option two will have an impact on ingest speed, and making joins between two big tables can be costly in terms of CPU. Here is a benchmark chart that shows performance for both options for the two queries described above. The experience table has an average of three times the number of rows of the employee table.
Announcing the SingleStoreDB Driver for the SQLTools VSCode Extension
Engineering

For those of you who haven't heard of it, SQLTools is a VSCode extension for connecting to different databases. This extension is extremely popular, recording more than 2 million downloads. SQLTools works using drivers — which are installed separately from the main extension — to connect to a variety of different databases. During a recent internal hackathon, we worked on a driver for SingleStoreDB, so that SQLTools users can connect to SingleStore databases. This extension has been published on GitHub, where you can learn all about it!

Installing this driver is as easy as:

code --install-extension mtxr.sqltools
code --install-extension singlestore.sqltools-singlestore-driver

Here's a tease of what you'll get:
Full-Text Search in SingleStoreDB
Engineering

We're demonstrating the power of full-text search capabilities in SingleStoreDB — complete with code and examples for you to give it a try.

What Is SingleStoreDB and Full-Text Search?

SingleStoreDB is a real-time, distributed SQL database that handles both transactional (OLTP) and analytical (OLAP) workloads in a single, unified database. SingleStoreDB is a multi-model database that natively allows you to store and retrieve several different data types including relational, document, time-series and geospatial, along with providing full-text search capabilities. SingleStoreDB has supported full-text search (FTS) since 2018.

Full-text search refers to the ability to search text inside text data stored in a database table, returning results that contain some or all of the words from the query that was executed. This differs from a traditional query that returns only an exact match of the data. For traditional databases, full-text searches can be challenging — which often leads to the development of specialty databases being used strictly for this purpose. SingleStoreDB helps customers simplify their data landscape by integrating full-text search capabilities into the core functionality of the database engine.

How Does Full-Text Search Work?

So, how do full-text searches in SingleStoreDB work? As previously described, full-text search allows for the searching of words or phrases in a large body of text. For full-text searches to work, an index must be created on a text data type. In SingleStoreDB, text data types include char, varchar, text and long text. Full-text indexes are only supported on columnstore tables, and they must be created as part of the CREATE TABLE process using the FULLTEXT index type.

How Do We Do It?

Full-text search takes advantage of SingleStoreDB's patented Universal Storage, a unified table type that supports both OLTP and OLAP workloads. The columnstore table is a perfect table type for full-text indexes because a columnstore table lends itself to very fast scans of individual — or indexed — columns. Columnstore tables are also ideal for great compression. In fact, it's not uncommon to see SingleStoreDB compression of up to 90% on a columnstore table.

What Are Some Types of Full-Text Search?

What will full-text search allow you to do in SingleStoreDB? You can search for an exact phrase or word, or you can complete "fuzzy" searches. A fuzzy search is similar to a query with a "LIKE" filter in the `where` clause, and allows you to find words or phrases that are similar to the text you are searching for.

Examples

The following tutorial will walk you through an example of creating a full-text index and doing several searches.

Step 1 — Create a database, table and use SingleStore Pipelines to load some example data.

CREATE DATABASE fulltextdb;
USE fulltextdb;

CREATE TABLE jeopardy_questions (
  show_number text,
  air_date date,
  round text,
  category text,
  value text,
  question text,
  answer text,
  KEY (category) USING CLUSTERED COLUMNSTORE,
  FULLTEXT (question, answer)
);

CREATE PIPELINE load_questions
AS LOAD DATA S3 's3://testdata.memsql.com/full-text/*'
CONFIG '{"region": "us-east-1"}'
SKIP PARSER ERRORS
INTO TABLE jeopardy_questions
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\r\n"
IGNORE 1 LINES
(
  show_number,
  @air_date,
  round,
  category,
  value,
  question,
  answer
)
SET air_date = to_date(@air_date,'YYYY-MM-DD');

START PIPELINE load_questions;

OPTIMIZE TABLE jeopardy_questions FLUSH;

Step 2 — Perform a full-text search using the function MATCH AGAINST.

SELECT question
FROM jeopardy_questions
WHERE MATCH (question) AGAINST ('baseball') AND category = 'SPORTS';
Your First Five Minutes with SingleStoreDB: What to Expect
Engineering

In this blog post, we guide you through what to expect when you sign up for a new SingleStoreDB account. Explore our Quick Start Checklist, find out how to invite your team members, where to go when you need expert support and more.

Table of Contents

SingleStoreDB Cloud
The onboarding checklist
Create a Workspace
Marketing tech ("martech") app tutorial
Inviting a team member
Chat with an expert
Upgrade your plan
The help menu
Query Playground — No Sign Up Required!

SingleStoreDB Cloud

Have you tried SingleStore before? If not, we're very excited to have you! By signing up to use SingleStoreDB Cloud, you have a lot of options to explore — and we're here to guide you through this journey.

The onboarding checklist

On the "Start" page, you will find a list of items we consider important for your experience. We encourage you to follow this list, and get your tasks completed along the way.
Using Cypress Logs Instead of Videos and Screenshots to Debug Test Failures
Engineering

In this blog post, we'll show how we used cypress-terminal-report to facilitate debugging test failures by fetching all the Cypress logs.

For developing the frontend of our product (Customer Portal), we use Cypress as the primary testing framework. We have two types of Cypress tests: end-to-end and component tests. We have relatively few end-to-end tests, yet these test all layers of the product — including the SingleStoreDB database engine, backend and frontend. Since some of these tests create database clusters, they can take some time to finish. Thus, the Cypress Component Tests (CCT) are the main tests for every facet of our product's interface.

However, these tests come with a caveat: they can be very flaky if not written properly. Our CI/CD job for CCT fails most frequently due to flaky tests. The only artifacts our pipeline collects to inspect the failed tests are the screenshots and videos that Cypress creates, which are not always enough to easily assess what happened to the failed test. For this reason, we tried to find a way to get the Cypress logs for each test, including the console logs printed while the tests are run (which are not visible in the Cypress videos).

At the moment, Cypress doesn't support this kind of logging by default. In an open issue on Cypress's GitHub, someone provided a list of current workarounds for this problem. Of all the plugins mentioned, we decided to go with `cypress-terminal-report`.

cypress-terminal-report

This npm package is still in active development, and collects all the necessary logs for very simple debugging of Cypress tests. Before we go on, let's see how we set up this plugin.

How to set up cypress-terminal-report

First, we need to install the plugin:

npm i --save-dev cypress-terminal-report

Afterward, we need to import two things:

`installLogsCollector`
`installLogsPrinter`

The collector needs to be imported in the support file for Cypress tests. Since we want to use the plugin for both e2e and CCT, we'll add the import in the file `cypress/support/logging.ts`:

import installLogsCollector from "cypress-terminal-report/src/installLogsCollector";

installLogsCollector({});

And then both support files, `cypress/support/e2e.ts` and `cypress/support/component.ts`, should import `logging.ts`. As for the printer, it needs to be imported in `cypress.config.ts`:

import { defineConfig } from "cypress";
import installLogsPrinter from "cypress-terminal-report/src/installLogsPrinter";

export default defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      installLogsPrinter(on, {});
    },
  },
  component: {
    setupNodeEvents(on, config) {
      installLogsPrinter(on, {});
    },
  },
});

With this basic setup, the plugin will capture Cypress command logs and network requests, and output them to the terminal when a test fails. It will also capture console logs, but only for e2e tests (not for CCT). These tests run in a different iframe, so the plugin cannot redirect the logs directly to the terminal. However, it is very important to us to get the CCT console logs, since these are usually flakier. We were able to accomplish this with a workaround.

How to get console logs for Cypress Component Tests

First, we replace the `window.console` functions with our custom one, which will use Cypress's internal logging function — Cypress.log. We do this for every component test, so we write this inside the `before` hook. To prevent the extra logging when running Cypress in headed mode ("isInteractive"), we add the code inside an if condition. The function `processArg` will stringify the objects to be readable.

before(() => {
  if (!Cypress.config("isInteractive")) {
    const consoleFuncs = ["log", "error", "info", "warn"];
    cy.window().then((win) => {
      consoleFuncs.forEach((method) => {
        // @ts-ignore
        win.console[method] = (...args: Array<any>) => {
          Cypress.log({
            name: `${method}-1`,
            message: args.map((arg) => processArg(arg)).join(",\n"),
          });
        };
      });
    });
  }
});

With this block of code, the plugin will output the logs in the following format:

cy:command    log-1      console log output
cy:command    error-1    console error output
cy:command    info-1     console info output
cy:command    warn-1     console warn output

This could be enough, but we want to correctly specify the types of logs, not just `cy:command`, just like the plugin does for end-to-end tests. One of the config options available in `installLogsCollector` is `processLog`, which we can set to a custom function to process the logs the way we want. The function accepts a list of three arguments: `[type, message, severity]`. It then returns the same type of list, after the necessary processing. Here, we check for the `cy:command` type of logs, and if the message matches the regex expression for any of the console functions, we substitute the type of log with the new one (cons:log, cons:error, …) and the message without the first part (log-1, error-1, …).

import installLogsCollector, {
  LogType,
} from "cypress-terminal-report/src/installLogsCollector";

const CONSOLE_FUNCS_LOG_TYPES_MAP: Record<string, LogType> = {
  log: "cons:log",
  error: "cons:error",
  info: "cons:info",
  warn: "cons:warn",
};

installLogsCollector({
  processLog: (args) => {
    const [argType, message, severity] = args;
    let newArgType = argType;
    let newMessage = message;
    if (newArgType === "cy:command") {
      for (const [func, logType] of Object.entries(
        CONSOLE_FUNCS_LOG_TYPES_MAP
      )) {
        const regex = new RegExp(`${func}-1\t(.+)`, "s");
        const match = newMessage.match(regex);
        if (match) {
          newArgType = logType;
          newMessage = match[1];
        }
      }
    }
    return [newArgType, newMessage, severity];
  },
});

Now, the output will have the following format:

cons:log     console log output
cons:error   console error output
cons:info    console info output
cons:warn    console warn output

With these new changes, we can get all the output for each Cypress test in the terminal. Now, there's another catch: we have several logs for each test, and with the increase of lines, the terminal won't go back far enough to show all the logs for one test. So, the best way to analyze the logs is to collect them in a file.

How to print logs to files

The plugin already has the option to write the logs to a file. We can define a single file to output the logs to, which will be overwritten each time we run a new test. Or, we can write to a file for each spec test, using the name of the test as the name of the log file. However, we were having an issue with the naming of the files for CCT, where they would be something like `https:localhost:XXXX/..../cypress_runner.js`, which was not very useful. To circumvent this issue, we chose to create and write the log files with our own code.

With the printer option function `collectTestLogs`, we can, as the name suggests, collect the test logs for the current test and save them in a variable.

// cypress.config.ts
import { defineConfig } from "cypress";
import installLogsPrinter from "cypress-terminal-report/src/installLogsPrinter";

let testLogs = "";

const logsPrinterConfig: installLogsPrinter.PluginOptions = {
  collectTestLogs: (testInfo, logs) => {
    // testInfo.state: "passed" | "failed"
    const statusSymbol = testInfo.state === "passed" ? "(+)" : "(-)";
    let testContent = `${statusSymbol} ${testInfo.test}\n`;
    logs.forEach(([logType, message, severity]) => {
      testContent += `\t\t${logType} (${severity}):\n`;
      testContent += `${message
        .split("\n")
        .map((line) => `\t\t\t\t${line}`)
        .join("\n")}\n`;
    });
    testLogs += testContent;
  },
};

export default defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      installLogsPrinter(on, logsPrinterConfig);
    },
  },
  component: {
    setupNodeEvents(on, config) {
      installLogsPrinter(on, logsPrinterConfig);
    },
  },
});

With these changes, when the tests run in headless mode we will have a new folder `logs` next to the folders `videos` and `screenshots` created by Cypress. In the definition of the e2e and CCT jobs on GitLab, we can add:

artifacts:
  when: always
  paths:
    - frontend/cypress/screenshots
    - frontend/cypress/videos
    - frontend/cypress/logs

And all the artifacts will be available for us to look through when trying to debug a test failure.
Integrating SingleStoreDB with Presto
Engineering

We're guiding you through how to integrate Presto, a distributed query engine for SQL users, with SingleStoreDB — complete with a deep dive into architecture, installation, queries and more.

What Is Presto?

Presto is a distributed query engine for big data that uses the SQL query language. Its architecture enables users to query data sources like Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata — and allows the use of multiple data sources. Simply put, Presto offers compute that runs on top of storage.

Presto architecture
Image Matching in SQL With SingleStoreDB
Engineering

Vector functions in SingleStoreDB make it possible to solve AI problems, including face matching, product photo matching, object recognition, text similarity matching and sentiment analysis. In this article, we'll demonstrate how we use the dot_product function (for cosine similarity) to find a matching image of a celebrity from among 16 million records in just 5 milliseconds! And it's easy – SingleStoreDB does the heavy lifting of parallelization and SIMD-based vector processing for you, so you can worry about your application, not your data infrastructure.

Other vector functions supported in SingleStoreDB include Euclidean distance calculation, transforming JSON arrays to binary vectors, vector math and vector manipulation. Want to see our YouTube video on this topic instead of reading about it? Check it out here.
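As a rough illustration of the idea (not the article's exact code): if the stored feature vectors are L2-normalized before they are written, dot_product returns cosine similarity directly. The table and column names below are assumptions, and the query vector is a placeholder.

```python
# Illustration only: assumed people(id, vector BLOB) table and placeholder values.
import json
import numpy as np
import mysql.connector

def normalize(vec):
    # L2-normalize so that the dot product of two vectors equals their cosine similarity.
    arr = np.asarray(vec, dtype=np.float32)
    return (arr / np.linalg.norm(arr)).tolist()

query_features = normalize([0.12, -0.45, 0.08])  # placeholder image feature vector

conn = mysql.connector.connect(host="HOST", user="admin",
                               password="PASSWORD", database="faces")
cur = conn.cursor()
cur.execute(
    """SELECT id, DOT_PRODUCT(vector, JSON_ARRAY_PACK(%s)) AS similarity
       FROM people ORDER BY similarity DESC LIMIT 1""",
    (json.dumps(query_features),),
)
print(cur.fetchone())
```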
SingleStoreDB Is Now a Designated AWS PrivateLink Partner
Engineering

At SingleStore, nothing is more important than the security and reliability of your data — which is why we work diligently to ensure security is considered, designed, reviewed and implemented into every corner of your database technology. As part of our ongoing commitment to security — not only through us, but with our partners — we're proud to announce that SingleStoreDB is now a designated AWS PrivateLink Partner.

What is AWS PrivateLink?

If you're unfamiliar with it, here's an overview of what exactly AWS PrivateLink facilitates:

"AWS PrivateLink enables you to connect to some AWS Services, services hosted by other AWS accounts (referred to as endpoint services) and supported AWS Marketplace partner services, via private IP address in your VPC."

Our designation as an AWS PrivateLink Ready Partner indicates that we've built SingleStoreDB to enhance the security and privacy of your data workloads by connecting your VPC through the AWS Private Network — ensuring you can continue to protect and manage sensitive data including personal information, while also adhering to PCI, HIPAA and other government regulations. Keep reading to find out how you can take advantage of AWS PrivateLink in SingleStoreDB Cloud.

Setting Up AWS PrivateLink in SingleStoreDB

You can connect from a SingleStoreDB Cloud cluster to private services and networks. Setting up an AWS PrivateLink involves actions from your end, as well as the SingleStoreDB infrastructure end. There are two types of requests you can complete — inbound and outbound. To make the process as seamless as possible, we recommend gathering all the necessary details for your type of request before sharing them with SingleStore support (you'll eventually need to contact our team to complete the connection).

For inbound requests

An inbound request means your applications are querying SingleStoreDB. The following pieces of information are required when opening a support ticket for an inbound PrivateLink connection:

Cluster URL (Cluster ID)
Be sure to mention your request is for inbound
AWS account ID
Region details

After you've shared these details with our support team, we'll create and share the service name for your cluster. Next, create an Endpoint Connection, mapping to the service name generated by SingleStore support. Our team will then accept the service request to complete the connection. Check out the full documentation on how to create an inbound connection to SingleStoreDB Cloud Workspaces via AWS PrivateLink.

For outbound requests

An outbound request means SingleStoreDB makes the request (usually via Pipelines, but it can also be via SELECT … INTO …), so the configuration starts at the customer end. The following pieces of information are required when opening a support ticket for an outbound PrivateLink connection. As with an inbound request, you'll need to set up an endpoint service and share the details with our support team:

Cluster URL (Cluster ID) or Workspace ID (for Workspace clusters)
Be sure to mention your request is for outbound
AWS account ID (request from SingleStore)
Region details
Specify if the request is outbound to Kafka if you're using SingleStoreDB Self-Managed and your own Kafka cluster — or if it's a third-party-managed Kafka (like Amazon MSK or Confluent Cloud). Depending on the Kafka instance, you'll need to supply an extra configuration to the CREATE PIPELINE clause.

Note: Your Workspace and endpoint service must be in the same region.

Check out the full documentation on how to create an outbound connection from SingleStoreDB Cloud Workspaces via AWS PrivateLink.

Get Started With SingleStoreDB & AWS PrivateLink

By tapping into the power of SingleStoreDB Cloud and AWS PrivateLink, you maintain up to 99.99% uptime SLA, zero downtime and high availability. Ready to get started? Head to SingleStore Docs for everything you need to connect with PrivateLink.
Ready for 2023? Up Your Game With These Updates to Our Laravel and Python Connectors
Engineering

We're bringing you all-new Laravel and Python connectors in SingleStoreDB. Here's what's new — and how you can try them out.

Laravel Connector Ready for Production!

We shipped the 1.0 release of our native Laravel connector back in June, but haven't stopped improving it since then! In total we have had 12 contributions from six contributors since 1.0. For the uninitiated, Laravel is a PHP framework for building modern web applications. It's used by some amazing companies you may have heard of, including Twitch, Disney, The New York Times and Fathom Analytics. Laravel is focused on developer productivity and generally getting out of your way to let you build applications quickly.

The official SingleStoreDB Laravel connector extends Laravel's built-in ORM (Eloquent) to support SingleStoreDB-specific features including shard keys and JSON. This allows you to do something like this, which wouldn't work out of the box in Laravel:

Schema::create('events', function (Blueprint $table) {
    $table->string('name')->unique()->shardKey();
    $table->json('properties');
    $table->datetime('created_at')->sortKey()->seriesTimestamp();
});

These functions are all extensions our connector adds to Eloquent to support the similarly named features in SingleStoreDB. You can install our Laravel connector via Composer:

composer require singlestoredb/singlestoredb-laravel

You can also find the driver and its documentation on GitHub.

See more:

Laravel and SingleStoreDB Quickstart Guide
SingleStoreDB for Laravel Masterclass With Jack Ellis
The JSON Playground in SingleStoreDB
Engineering

SingleStoreDB's new playground allows users to run queries over TPC-H, TPC-DS and a semi-structured game dataset sourced from a Confluent Cluster provided by DataGen. In this post, you will learn how to work with JSON data in SingleStoreDB utilizing the json_game_data dataset in the playground, and get several tips and tricks on schema and query optimizations.

Explore the SingleStoreDB playground.

Navigate to the Playground