Recent Articles

Accelerating Data Insights with Tableau & SingleStore
Product

Accelerating Data Insights with Tableau & SingleStore

Together, Tableau and SingleStoreDB provide a powerful solution for businesses looking to gain instant insights from their data. SingleStore's high-performance relational database makes it possible to perform real-time analytics, while Tableau's data visualization capabilities make it easy to understand and act on the insights generated by the data. The two products are easily integrated, allowing businesses to quickly start using the solution and improve their performance and decision-making capabilities.

We're pleased to announce that SingleStore has released our own JDBC connector on Tableau Exchange, driving low-latency, high-fidelity data transfer between SingleStoreDB and Tableau.

Enterprises use business intelligence (BI) tools to gain insights into their data and make better-informed business decisions. BI tools allow businesses to analyze and visualize large amounts of data quickly and easily, enabling them to identify trends, patterns and insights that might otherwise be difficult to uncover. The main reasons enterprises use business intelligence tools like Tableau include data analysis, reporting and visualization, improved decision-making and increased efficiency. Overall, business intelligence tools are essential for enterprises that want to gain a competitive advantage by leveraging their data to make better-informed decisions.

Tableau

Tableau is a leader in enterprise BI software, helping people see and understand data. The visual analytics platform is transforming the way people use data to solve problems — organizations of all sizes trust Tableau to help them be more data-driven. Customers like Verizon, Charles Schwab, Nissan and Chipotle trust Tableau to boost their bottom line with actionable insights. Tableau drives better business outcomes and intelligent customer experiences with insights everywhere, for everyone.

SingleStoreDB

SingleStoreDB is a multi-model, multi-cloud, scalable, distributed SQL database for both transactions and real-time analytics. Customers like Uber, Hulu, Armis and Comcast trust SingleStore to accelerate their time to insights while decreasing the size, complexity and cost of their data architecture. SingleStoreDB unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. Built for developers and architects, SingleStoreDB is based on a distributed SQL architecture, delivering millisecond performance on complex queries — all while ensuring your business can effortlessly scale.

There are a few main areas where SingleStore helps customers modernize, listed here. These areas enable use cases like operational analytics, AI/ML, IoT, fraud detection, customer 360, dashboard acceleration and many more:
- Eliminate database sprawl
- Accelerate real-time applications at scale
- Scale databases and power modern apps with cloud flexibility
- Deliver real-time analytics

Tableau + SingleStoreDB via new JDBC connector on Tableau Exchange
Read Post
Oracle GoldenGate Announces Native Integration for SingleStoreDB
Company

Oracle GoldenGate Announces Native Integration for SingleStoreDB

Oracle GoldenGate, an industry-leading data replication tool, has released support for SingleStoreDB and SingleStoreDB Cloud as a target database. This blog explores the details of this new integration.

Oracle GoldenGate, an industry-leading data replication tool, has just released support for SingleStoreDB as a target database. This exciting development allows SingleStoreDB users to easily integrate their data with other systems in real time, without the need for complicated and time-consuming data migration processes. With this new connector, users can enjoy the benefits of Oracle GoldenGate's powerful data replication capabilities while leveraging SingleStoreDB's high-performance, distributed SQL database. In this blog post, we'll explore the details of the new integration and what it means for users looking to improve their data integration workflows.

What Is Oracle GoldenGate?

Oracle GoldenGate is a real-time data replication platform that enables businesses to capture, transform and move transactional data across heterogeneous systems. Thousands of global banks, retailers, telecoms, healthcare companies and others run their operational data platforms on the foundation of GoldenGate, which can detect data events and route them across networks at very low latencies. GoldenGate's CDC capabilities are also used for detecting and transmitting data events, including database DML and DDL.

Key features of Oracle GoldenGate include:
- Real-time data movement, minimizing latency
- Only committed transactions are moved, enabling consistency and improving performance
- Support for a wide range of heterogeneous databases running on a variety of operating systems; data can be replicated from Oracle to SingleStoreDB
- Simple architecture and easy configuration
- High performance with minimal overhead on the underlying infrastructure
Read Post
SingleStore Recognized in The Forrester Wave™: Translytical Data Platforms, Q4 2022
The Power of SQL for Vector Database Operations, Part 1
Engineering

The Power of SQL for Vector Database Operations, Part 1

Here's how to use one SQL query to get the top K vector matches in each category.

At SingleStore, our position on vector database processing is that you should have all the benefits of a modern, full-featured DBMS available to you when working with vectors [Han23]. This includes full SQL support.

We're working with a prospective user of SingleStoreDB for vector database operations related to an application with a semantic search component. He asked how we could do the following easily: find the top K items in each category. This is something he didn't have an easy time with in Milvus, a specialty vector database. With Milvus, he was finding the categories in one query, looping through them and finding the top K elements for one category at a time with a separate query. This is not easily parallelizable, and requires more work from the application side than many would prefer.

Here's how you can do this in a single SQL query in SingleStoreDB:

```sql
/* Make some items in multiple categories, with associated
   vector embeddings. */
create table items(id int, category varchar(50), vector blob);

insert into items values
  (1, "food", json_array_pack('[0,0,0,1]')),
  (2, "food", json_array_pack('[0,0.5,0.3,0.05]')),
  (3, "food", json_array_pack('[0,0.5,0.2,0]')),
  (4, "facilities", json_array_pack('[0,0,1,0]')),
  (5, "facilities", json_array_pack('[0,0.6,0.1,0.05]')),
  (6, "facilities", json_array_pack('[0,0.4,0.3,0]'));

-- query vector
set @qv = json_array_pack('[0,0.4,0.3,0]');

-- get top 2 in each category using ranking
with scored as
(
  select id, category, dot_product(vector, @qv) as score
  from items
),
ranked as (
  select
    row_number() over(partition by category order by score desc)
      as rank, *
  from scored
)
select *
from ranked
where rank <= 2
order by category, rank;
```

These are the results:

```
+------+------+------------+---------------------+
| rank | id   | category   | score               |
+------+------+------------+---------------------+
|    1 |    4 | facilities | 0.30000001192092896 |
|    2 |    5 | facilities | 0.27000001072883606 |
|    1 |    2 | food       |  0.2900000214576721 |
|    2 |    3 | food       | 0.25999999046325684 |
+------+------+------------+---------------------+
```

It's important to note we're just focusing on ease of expression here, not performance. For this particular application, the scope is usually a few million vectors at most — so a full-scan, exact-nearest-neighbor approach is plenty fast.

When you choose a tool for vector processing for nearest-neighbor search applications like semantic search, chatbots, other LLM applications, face matching, object matching and more, we think it's a good idea to consider the power of the query language — and having full SQL available just makes things easier.

References

[Han23] E. Hanson and A. Comet, Why Your Vector Database Should Not be a Vector Database, SingleStoreDB blog, April 24, 2023.

Explore more vector database-related resources
- SingleStoreDB Vectors
- eBook: Selecting the Optimal Database for Generative AI
Read Post
Using SingleStoreDB as a Vector Database for Q&A Chatbots
Engineering

Using SingleStoreDB as a Vector Database for Q&A Chatbots

SingleStoreDB has long supported vector functions like dot_product, which make it a good fit for AI applications that require text similarity matching. An example of this type of AI application is a chatbot that answers questions from a corpus of information.

In this blog post, we'll demonstrate how we use SingleStoreDB — along with AI models like Whisper and ChatGPT — to create a chatbot that uses the Y Combinator YouTube channel to answer questions about startups, and give startup-related advice. We initially built this as a side project using another vector database, but recently converted it to SingleStoreDB. The bot is accessible here: https://transcribe.param.codes/ask/yc-s2.

How We Built It

Step 1. Transcribing the videos

We first used OpenAI's Whisper model to transcribe all the videos on the YC YouTube channel. Instead of running the model ourselves, we used Replicate to run the model and give us the transcriptions, which are stored in a simple SQLite database.

Step 2. Creating embeddings from the transcriptions

Because models like ChatGPT have a limited context length of 4,096 tokens, we cannot just give ChatGPT all the transcriptions and ask it questions based on the entire corpus. So, when we get a question, we need to find the parts of the transcriptions that are most relevant to the question, and only give those to ChatGPT in the prompt. To do this, we need to first create embeddings for the text in the transcriptions. An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness, and large distances suggest low relatedness. Here's a nice video explaining how this works.
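As a rough sketch of what this retrieval step can look like in SingleStoreDB, the example below stores transcript chunks with packed embeddings and fetches the most relevant chunks for a question using dot_product. The table name, columns and the tiny 4-dimensional vectors are hypothetical placeholders, invented only for illustration (real embeddings are much larger):

```sql
-- Hypothetical table of transcript chunks and their embeddings
CREATE TABLE transcript_chunks (
  video_id VARCHAR(32),
  chunk_text TEXT,
  embedding BLOB  -- packed float vector, e.g. produced by JSON_ARRAY_PACK
);

-- Store a chunk (toy 4-dimensional vector for illustration only)
INSERT INTO transcript_chunks VALUES
  ('abc123', 'Talk to your users early and often.',
   JSON_ARRAY_PACK('[0.12, 0.80, 0.05, 0.33]'));

-- At question time: pack the question's embedding and fetch the
-- 5 most similar chunks to include in the ChatGPT prompt
SET @question_embedding = JSON_ARRAY_PACK('[0.10, 0.75, 0.10, 0.30]');

SELECT chunk_text,
       DOT_PRODUCT(embedding, @question_embedding) AS score
FROM transcript_chunks
ORDER BY score DESC
LIMIT 5;
```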
Read Post
Introducing SingleStore Kai™ for MongoDB
Product

Introducing SingleStore Kai™ for MongoDB

An API to boost your MongoDB analytics by 100x, without compromising transactional capabilities.

In this blog:
- Why We Built an API for MongoDB
- Introducing SingleStore Kai™ for MongoDB
- Best of Both Worlds: Bringing the MongoDB Developer Experience Together With the Benefits of SingleStoreDB
- How We Built SingleStore Kai™ for MongoDB
- Adoption Models for SingleStore Kai™
- Augmenting MongoDB
- Replacing MongoDB
- Build Your Next Intelligent Application on MongoDB
- How to Get Started with SingleStore Kai™

Today, we are thrilled to introduce SingleStore Kai™ for MongoDB, an API that lets you run up to 100x faster analytics on JSON without having to change queries or refactor your application code written for MongoDB. This feature is now available in public preview and is a game-changer for app developers, enabling you to take advantage of both SQL and MongoDB APIs in a single database engine to power fast, real-time applications.

Why We Built an API for MongoDB

MongoDB is one of the most popular and widely adopted databases for storing and processing JSON data. It is widely utilized for its document-oriented data model, thanks to its simplicity and efficiency in handling JSON/BSON data formats. However, being a document database, MongoDB is not performant enough or designed to power real-time analytics on JSON, which modern applications demand. It has both functional and architectural limitations when it comes to analytics — especially around performing complex analytics or fast analytics to power modern interactive applications. So today, customers invariably have to "flatten" complex JSON data and arrays, transform and ETL (or move) data from MongoDB to other databases like Elastic, Rockset or Snowflake to perform analytics on the data. This usually means normalizing data and rewriting MongoDB queries for new analytical applications — a process that is cumbersome, error-prone and takes massive effort. And even then, with all the effort, it fails to deliver fast analytics, and results in reduced data fidelity, increased latencies, increased data movement and rising costs.
Read Post
SingleStore Kai™ for MongoDB®: Real-Time Analytics Benchmarks
Product

SingleStore Kai™ for MongoDB®: Real-Time Analytics Benchmarks

Based on extensive internal and external benchmarks and performance testing, SingleStore Kai™ for MongoDB® (our recently released MongoDB API) delivers 100x faster analytics over MongoDB — and returns results more than 1,000x faster for some queries. SingleStore Kai™ also exhibits consistently better performance and resilience, with the performance gap widening as data volumes grow, or as operations scale. Overall, SingleStore Kai™ offers an easy and effective API to turbocharge analytics on MongoDB, while also delivering significantly better price-performance on analytics.

SingleStore Kai™ for MongoDB provides a fast, easy and powerful API to significantly accelerate MongoDB applications — without any code changes or data conversions. You can achieve real-time analytics and low-latency aggregations using familiar MongoDB client drivers, tools and APIs, and can quickly connect any application written for MongoDB to SingleStore Kai™ for MongoDB simply by changing the connection string.

MongoDB is a document database that isn't designed to power fast analytics or large-scale aggregations on JSON. In contrast, SingleStoreDB supports a wide variety of workloads, and is the ideal solution to these growing pains — either as a drop-in full replacement, or a turnkey augmentation for your primary database. Now, with SingleStore Kai™ for MongoDB, those options are available to applications and workloads built around MongoDB and other MongoDB-compatible databases.

For more details, check out our SingleStore Kai™ documentation.

How Does SingleStore Kai™ Work?

SingleStore Kai™ for MongoDB implements the MongoDB wire protocol and provides a mapping from MongoDB's non-relational organization of JSON-like documents to SingleStoreDB's distributed organization of relational SQL tables. To the application, SingleStore Kai for MongoDB appears to be an ordinary MongoDB-compatible database. Behind the scenes, SingleStore Kai for MongoDB accelerates MongoDB operations using SingleStoreDB's highly optimized storage and execution engines.
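As a rough, hypothetical illustration of the document-to-table mapping described above (the column layout below is an assumption, not Kai's actual generated schema), a MongoDB collection can be pictured as a SingleStoreDB table that keeps the document body in a JSON column, which plain SQL can then aggregate:

```sql
-- Hypothetical relational shape for a MongoDB 'orders' collection:
-- one row per document, with the document body kept as JSON
CREATE TABLE orders (
  _id VARCHAR(24) NOT NULL,  -- ObjectId stored as a string (assumption)
  doc JSON NOT NULL
);

-- An analytics query an application might issue through the MongoDB API,
-- expressed directly in SQL against the JSON column
SELECT doc::$status               AS status,
       SUM(doc::%total)           AS revenue,
       COUNT(*)                   AS num_orders
FROM orders
GROUP BY doc::$status
ORDER BY revenue DESC;
```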
Read Post
Universal Storage, Part 6: Column Group
Product

Universal Storage, Part 6: Column Group

Universal Storage in SingleStoreDB is a single table type that supports both analytical and transactional workloads. It stores data in columnar format, optimized for seekability. There have been multiple enhancements introduced over the years to better handle real-time, hybrid transactional and analytical workloads, including hash indexes, sub-segment access [1] and seekable encodings. But our journey doesn't stop there. In the latest version of SingleStoreDB, 8.1, we are introducing a new feature called Column Group, which can be used to significantly improve transactional workload performance.

But first, let's revisit columnstores. The idea of a columnstore is to store table data in a column-oriented format to achieve better compression efficiency, and support column-wise query execution. There are substantial benefits to using columnstores in analytics scenarios, but there are also limitations and drawbacks in certain cases — one of which is transactional operations involving a wide table. Our seek performance for columnstores — with support for tens of columns — is actually quite good [3], and can return results in low, single-digit milliseconds. But the number of table columns doesn't stop there. In fact, we have customers that use over 300 columns in one table!

To access a particular row in super-wide tables, the query engine needs to open and decode a potentially large number of columnstore blobs, as each column is stored independently within its own blob. The overhead of initializing and decoding the blob metadata is non-trivial, especially when dealing with a large number of blobs. Also, since data is stored in columnar format, there is a significant amount of random seeking on disk to reassemble a row. In other words, it's IO-expensive.

Column Group is a new type of index we introduced in SingleStoreDB 8.1 to speed up row retrieval, especially from wide tables (see the graph later in this blog). It creates an on-disk, row-oriented representation of the table data living side by side with the column blobs. The query engine makes sure the data in the Column Group is always in sync and consistent with the columnstore data. During query execution, the engine combines the Column Group and the columnstore to deliver an optimal execution plan.

Usage

A Column Group can be defined when you create a table. I will use the following example to show how to define a Column Group, and how it can help speed up queries. Let's say that we have an employee table definition like this:

```sql
CREATE TABLE emp(
  id INT PRIMARY KEY,
  name VARCHAR(256),
  dept VARCHAR(16),
  salary FLOAT,
  manager INT,
  ...
  COLUMN GROUP cg_full (*)
);
```

Assuming this table has many columns (the '...' indicates there are many more column definitions to follow) and is considered to be a wide table, we create a Column Group named 'cg_full' using the clause 'COLUMN GROUP cg_full (*)'. The * here means the Column Group covers all the columns in the table. We currently only support Column Group over the entire table.

Note that a Column Group can also be added to an existing table through an 'ALTER TABLE' statement, such as:

```sql
ALTER TABLE emp ADD COLUMN GROUP cg_full (*);
```

Similarly, if the Column Group is no longer needed, users can drop it through an 'ALTER TABLE' statement:

```sql
ALTER TABLE emp DROP COLUMN GROUP cg_full;
```

Now consider the following simple query:

```sql
SELECT * FROM emp WHERE id = 56231;
```

This is a single-point lookup using the primary key id. The query engine will look up the primary key index and locate the row through the index. Since the result only contains one row, and we project all the columns, the query engine will retrieve the entire row from the Column Group using a single IO. Here is another example:

```sql
SELECT * FROM emp WHERE dept = 'IT';
```

This query uses a filter to list all employees in the IT department. During execution, the query engine still uses the columnstore for filtering, as this only requires loading the 'dept' column and it's the most efficient way to evaluate the predicate. After filtering (assuming IT is a small department) there are only a few rows passing the filter. In this case, the query engine uses the Column Group to read the full rows from the 'emp' table to maximize IO efficiency.

Here's another example regarding updates:

```sql
UPDATE emp SET dept = 'IT' WHERE id = 56231;
```

This query updates the department name to 'IT' for the employee with ID 56231. Although the query only affects the 'dept' column, the SingleStoreDB query engine still needs to read the entire row to implement row-level locking [2]. In this case, the Column Group will be used to retrieve the row and load it into the in-memory segment. Similarly, a Column Group can greatly improve delete query performance, since a delete also requires reading and locking the entire row.

To summarize, a typical workflow is illustrated in the following diagram:
- The user creates a columnstore table with a Column Group
- When the in-memory segment is flushed to disk, in addition to creating the columnstore blobs, we also create a Column Group blob (marked as green in the diagram)
- During query execution, columnstore hash indexes and columnstore blobs can be used for effective filtering
- The Column Group blob is used to materialize the rows post-filtering, when only a small fraction of rows are retrieved
Read Post
Getting Started with OpenAI Embeddings Search & SingleStoreDB
Engineering

Getting Started with OpenAI Embeddings Search & SingleStoreDB

In this article, we will look at how to use SingleStoreDB to store and query the OpenAI Wikipedia vector database dataset.

SingleStoreDB has supported a range of vector functions for some time, and these functions are ideally suited for storing embeddings, doing semantic search and using the data to provide context to OpenAI as part of the prompt. With this mechanism, we will be able to add "short-term" memory to ChatGPT. The notebook file used in this article is available on GitHub.

In several previous articles, we have used some of the vector capabilities built into SingleStoreDB:
- Quick Tip: SingleStoreDB's EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK Functions
- Using SingleStore, Spark and Alternating Least Squares (ALS) to Build a Movie Recommender System

In this article, we'll test the `JSON_ARRAY_PACK` and `DOT_PRODUCT` vector functions with the OpenAI Wikipedia Vector Database dataset. There is an OpenAI notebook available on GitHub under an MIT License that tests several vector database systems. The tests can be run using local clients or in the cloud. In this article, we'll use SingleStoreDB Cloud.

Create a SingleStoreDB Cloud Account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use the following settings:
- Workspace Group Name: OpenAI Demo Group
- Cloud Provider: AWS
- Region: US East 1 (N. Virginia)
- Workspace Name: openai-demo
- Size: S-00
- Advanced Settings: MarTech Application deselected

From the left-navigation pane, we'll select DEVELOP 〉SQL Editor to create a new database, as follows:

```sql
CREATE DATABASE IF NOT EXISTS openai_demo;
```

Import Notebook

From the left-navigation pane, we'll select DEVELOP 〉Notebooks. In the top right of the web page we'll select New Notebook 〉Import From File, as shown in Figure 1.
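For a sense of what those two vector functions look like in practice, here is a small sketch of similarity queries against a table of packed embeddings; the table and column names below are assumptions rather than the dataset's actual schema, and the short query vector is a toy value:

```sql
-- Hypothetical table of Wikipedia articles with packed embedding vectors
-- (assumed names; not the notebook's actual schema)
SET @query_vec = JSON_ARRAY_PACK('[0.02, 0.11, 0.07, 0.31]');  -- toy vector

-- Nearest articles by DOT_PRODUCT (higher is more similar) ...
SELECT title, DOT_PRODUCT(content_vector, @query_vec) AS similarity
FROM wikipedia_articles
ORDER BY similarity DESC
LIMIT 10;

-- ... or by EUCLIDEAN_DISTANCE, where smaller means closer
SELECT title, EUCLIDEAN_DISTANCE(content_vector, @query_vec) AS distance
FROM wikipedia_articles
ORDER BY distance ASC
LIMIT 10;
```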
Read Post
Mental Health Awareness Month and the Responsible Role of Technology
Company

Mental Health Awareness Month and the Responsible Role of Technology

In February, I read a sobering article. Headlined "Teen Girls Report Record Levels of Sadness, C.D.C. Finds," the piece described the high rates of persistent sadness, depression and suicidal thoughts reported by girls, as well as lesbian, gay and bisexual teenagers. The article also pointed to the connection between social media use and negative mental health. This made me pause. I reflected on the socioeconomic and political difficulties of the past few years, and the role technology plays in mental health. As a father of four with two teenage daughters, the protector in me wondered how these complex issues could be solved.

In May, we observe Mental Health Awareness Month. I believe this is a key opportunity to encourage understanding and reduce stigma around mental health. In 2021, 57.8 million U.S. adults experienced mental illness. That's 1 in 5 adults! What can be done to reverse this trend? I have dedicated my life to technological innovation, and believe it is extremely important to explore how technology can damage mental health, and how it can instead be used to benefit mental health. We, myself included, must examine ways our industry can make stronger efforts to mitigate the harms and increase the benefits.

Technology has made an immeasurable contribution to our modern lives, transforming everything from transportation to communication to entertainment, and much more. I believe it has also done a lot of good for mental health treatment. Digital health services, including telehealth services, have increased access to mental health care, with studies finding similar outcomes for in-person and virtual therapy. This is a boon for people who live in communities where there is a dearth of local mental health professionals, and for people whose otherwise hectic schedules may not allow them to visit an in-person care center.

Access to telehealth was especially important during the COVID-19 pandemic. A scientific brief released by the World Health Organization found that in the first year of the pandemic, "global prevalence of anxiety and depression increased by a massive 25%". Similarly, the NYT report I previously referenced also noted the negative impact long periods of social isolation had on young people's mental health during the pandemic. Lockdowns and social distancing, which reduced the spread of the virus, also induced feelings of loneliness. Technology and social media served as tools to combat that. Who could forget the viral "dance challenges" that kept us active, the television shows that kept us entertained and the many, many hours spent talking with family, friends and colleagues on video chat platforms. In short, it was our tech that kept us connected during one of the darkest times in recent memory – and it still keeps us connected to this day.

However, one study noted that while routine social media use can have positive outcomes on social well-being, positive mental health and self-rated health, having an emotional connection to social media is negatively associated with these areas. An emotional connection includes "checking apps excessively out of fear of missing out" and "being disappointed about or feeling disconnected from friends when not logged into social media." This is only exacerbated by the fact that social media platforms are designed to be addictive for users. Other phenomena, such as cyberbullying and repeated exposure to unrealistic and unhealthy beauty standards, are other ways social media can be toxic. The pandemic also underscored how digitally spread misinformation can take a deadly toll on mental and physical health.

What can be done? I don't believe the solution is to ban social media or tech innovation. However, we must hold industry leaders accountable for developing solutions to these rampant problems. We cannot allow them to focus solely on the number of users engaging with their platforms while ignoring the social ills that are also developing. One key solution should be to encourage tech firms to implement features to stop and/or contextualize the spread of misinformation, hateful content and bullying. Companies should partner with mental health professionals to identify the specific ways their platforms can harm mental health, and determine ways to mitigate and reverse it. Furthermore, tech firms should be aware of bias in their algorithms, especially on the basis of sex and race.

Similarly, tech firms should focus on improving diversity in their employee base. As I've previously discussed, there is a significant gender gap in STEM fields, as well as a lack of racial diversity. A 2019 report noted, "The AI field, which is overwhelmingly white and male, is at risk of replicating or perpetuating historical biases and power imbalances." If tech is to be used by everyone, tech firms need to ensure that people from all backgrounds are represented in its development.

I am glad to report that we at SingleStore have taken some steps toward accessibility of mental healthcare resources, like offering our employees both free therapy and coaching sessions through Modern Health. We also offer mindfulness and meditation resources focused on both work and non-work related topics.

Solving complex problems surrounding mental health will not happen overnight. Yet I do believe that with determination and prioritization, we can achieve great improvements from where we are today, to ultimately save lives and ensure positive outcomes for all.

If you or a loved one would like to access mental health resources, visit https://mhanational.org/get-involved/contact-us.
Read Post
Supercharged Data Management: Exploring the Power of Flyway & SingleStore
Product

Supercharged Data Management: Exploring the Power of Flyway & SingleStore

In today's data-driven world, effective database management is crucial for businesses to thrive. Managing database schema changes can be challenging, especially in large-scale and complex applications. With the recent integration of Flyway, a popular open-source database migration tool with more than 40 million downloads, and SingleStoreDB, a high-performance distributed SQL database, we are witnessing a groundbreaking partnership that offers exceptional benefits for developers and organizations. In this blog, we will delve into the technical aspects of the exciting partnership between Flyway and SingleStoreDB, and how their combined capabilities can help you effortlessly manage and scale your database.

What's Flyway and Why Is It Important?

Flyway is a widely used open-source database migration tool that efficiently manages and versions database schema changes. It simplifies collaboration and ensures consistency across development environments. Flyway is especially valuable for developers and organizations looking for a reliable, streamlined way to handle complex database migrations.

Flyway & SingleStore Partnership: What It Means for Customers

The integration of Flyway and SingleStoreDB opens new doors for developers and organizations alike. Combining Flyway's seamless schema management with SingleStoreDB's high-performance distributed SQL database, this partnership enables customers to enjoy faster development cycles, improved database performance and reduced infrastructure costs.

How the Integration Works

To connect SingleStoreDB with Flyway, follow these steps for both SingleStoreDB Cloud and SingleStoreDB Self-Managed:
- Make sure you have the Teams or Enterprise Edition of Flyway, as it is required for native SingleStoreDB support
- Download and install Flyway following the official installation guide
- Locate the Flyway configuration file (flyway.conf) in the //conf/ directory
- Update the connection parameters and Flyway license key in the flyway.conf file

For SingleStoreDB Cloud:
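To give a feel for what Flyway then runs against SingleStoreDB, here is a minimal sketch of versioned migration scripts; the file names and the table are hypothetical examples following Flyway's standard V<version>__<description>.sql naming convention, not taken from the article:

```sql
-- V1__create_orders_table.sql (hypothetical first migration)
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  total DECIMAL(12, 2),
  created_at DATETIME
);

-- V2__add_status_column.sql (hypothetical follow-up migration)
ALTER TABLE orders ADD COLUMN status VARCHAR(32) DEFAULT 'new';
```

Running `flyway migrate` applies any pending versioned scripts in order and records them in Flyway's schema history table, so every environment converges on the same schema.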
Read Post
Developer Quick Tip: Replicating a MongoDB Atlas Database to SingleStoreDB Cloud
Product

Developer Quick Tip: Replicating a MongoDB Atlas Database to SingleStoreDB Cloud

A range of open-source and commercial tools can provide Extract, Load and Transform (ELT) capabilities. These tools provide source and destination connectors and can use automatic data format conversion. This article uses a commercial ELT tool, Hevo Data, to replicate a MongoDB Atlas database to SingleStoreDB Cloud.

In a previous article, we used open-source Airbyte to create an ELT pipeline between SingleStoreDB and Apache Pulsar. We have also seen in another article several methods to ingest JSON data into SingleStoreDB. In this article, we'll evaluate a commercial ELT tool called Hevo Data to create a pipeline between MongoDB Atlas and SingleStoreDB Cloud. Switching to SingleStoreDB has many benefits, as shared by Rimdian (formerly Captain Metrics) in a webinar detailing why they moved away from MongoDB.

SingleStoreDB Configuration

We've previously covered the steps required to create a free SingleStoreDB Cloud account. We'll use Hevo Demo Group as our Workspace Group Name, and hevo-demo as our Workspace Name. We'll make a note of our password and host name. Finally, we'll create a new database using the SQL Editor:

```sql
CREATE DATABASE hevo_demo;
```

MongoDB Configuration

We'll use the MongoDB Atlas shared cluster deployment on AWS. This will give us a three-node cluster (one primary and two secondary). Once the cluster is deployed, we'll load the sample dataset. We'll also create a user called `hevo` and assign `readAnyDatabase` privileges to this user for our initial tests.

Hevo Data Configuration

1. Configure Source

We'll search and choose MongoDB Atlas as the source. We'll fill in the Configure your MongoDB Atlas Source form as follows:
- Pipeline Name: MongoDB Atlas Source
- General Connection Settings: Select Paste Connection String
- Connection URI: mongodb+srv://hevo:<password>@<cluster>
- Select an Ingestion Mode: Change Streams
- Advanced Settings: Disable Load All Databases and select sample_restaurants from the list

We'll replace the `<password>` and `<cluster>` with the values from MongoDB Atlas. Several Hevo Data IP addresses are also listed, and should be added to the IP Access List in MongoDB Atlas. We'll use the TEST CONNECTION button (the connection should be successful). Next, we'll click TEST & CONTINUE.

2. Select Objects

On the next page, we'll check (✔) All objects selected and click CONTINUE.

3. Configure Destination

We'll search and choose MySQL as the destination. We'll fill in the Configure your MySQL Destination form with the following information:
- Destination Name: SingleStoreDB Destination
- Database Host: <host>
- Database Port: 3306
- Database User: admin
- Database Password: <password>
- Database Name: hevo_demo

We'll replace the `<host>` and `<password>` with the values from our SingleStoreDB Cloud account. Several Hevo Data IP addresses are also listed, and these should be added to the Inbound IP Allowlist in the SingleStoreDB Cloud Firewall. We'll use the TEST CONNECTION button, and the connection should be successful. Next, we'll click SAVE & CONTINUE.

4. Final Settings

We'll use Auto Mapping and Replicate JSON fields to JSON columns. Next, we'll click CONTINUE. The pipeline should start running shortly afterwards, as shown in Figure 1.
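Once the pipeline has run, a quick sanity check in the SQL Editor confirms the replicated rows arrived; the table name below assumes Hevo maps the restaurants collection from the sample dataset to a table of the same name, which may differ in practice:

```sql
USE hevo_demo;

-- Row count of the replicated collection (table name is an assumption)
SELECT COUNT(*) FROM restaurants;
```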
Read Post
Spark-SingleStoreDB Integration
Engineering

Spark-SingleStoreDB Integration

Integrating Spark with SingleStoreDB enables Spark to leverage the high-performance, real-time data processing capabilities of SingleStoreDB — making it well-suited for analytical use cases that require fast, accurate insights from large volumes of data.

The Hadoop ecosystem has been in existence for well over a decade. It features various tools and technologies including HDFS (Hadoop Distributed File System), MapReduce, Hive, Pig, Spark and many more. These tools are designed to work together seamlessly and provide a comprehensive solution for big data processing and analysis. However, there are some major issues with existing Hadoop environments, one of which is the complexity of the Hadoop ecosystem, making it challenging for users to set up and manage. Another issue is the high cost of maintaining and scaling Hadoop clusters, which can be a significant barrier to adoption for smaller organizations. In addition, Hadoop has faced challenges in keeping up with the rapid pace of technological change and evolving user requirements — leading to some criticism of the platform's ability to remain relevant in the face of newer technologies. The good news? Apache Spark can be used with a modern database like SingleStoreDB to overcome these challenges.

Apache Spark

Apache Spark is a popular tool for analytical use cases due to its ability to handle large-scale data processing with ease. It offers a variety of libraries and tools for data analysis, including Spark SQL, which allows users to run SQL queries on large datasets, as well as MLlib, a library for machine learning algorithms. Spark's distributed nature makes it highly scalable, allowing it to process large volumes of data quickly and efficiently. Additionally, Spark Streaming enables real-time processing of data streams, making it well-suited for applications in areas like fraud detection, real-time analytics and monitoring. Overall, Apache Spark's flexibility and powerful tools make it an excellent choice for analytical use cases, and it has been widely adopted in various industries including finance, healthcare, retail and more.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that stores and processes large volumes of data. It is capable of performing both OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) workloads on a unified engine, making it a versatile tool for a wide range of use cases. Overall, SingleStoreDB's high-performance, distributed architecture — combined with its advanced analytical capabilities — makes it an excellent choice for analytical use cases including real-time analytics, business intelligence and data warehousing. It has been widely adopted by companies across finance, healthcare, retail, transportation, eCommerce, gaming and more. And, SingleStoreDB can be integrated with Apache Spark to enhance its analytical capabilities.

Using Apache Spark with SingleStoreDB

SingleStoreDB and Spark can be used together to accelerate analytics workloads by taking advantage of the computational power of Spark, together with the fast ingest and persistent storage of SingleStoreDB. The SingleStore-Spark Connector allows you to connect your Spark and SingleStoreDB environments. The connector supports both data loading and extraction from database tables and Spark DataFrames. The connector is implemented as a native Spark SQL plugin, and supports Spark's DataSource API. Spark SQL supports operating on a variety of data sources through the DataFrame interface, and the DataFrame API is the widely used framework for how Spark interacts with other systems. In addition, the connector is a true Spark data source; it integrates with the Catalyst query optimizer, supports robust SQL pushdown and leverages SingleStoreDB LOAD DATA to accelerate ingest from Spark via compression.

Spark and SingleStoreDB can work together to accelerate parallel read and write operations. Spark can be used to perform data processing and analysis on large volumes of data, writing the results back to SingleStoreDB in parallel. This can be done using Spark's distributed computing capabilities, which allow it to divide data processing tasks into smaller chunks that can be processed in parallel across multiple nodes. By distributing the workload in this way, Spark can significantly reduce the time it takes to process large volumes of data and write the results back to SingleStoreDB. Overall, by combining Spark's distributed computing capabilities with SingleStore's distributed architecture, it is possible to accelerate parallel read and write operations on large volumes of data, enabling real-time processing and analysis. The parallel read operation creates multiple Spark tasks, which can drastically improve performance. The Spark-SingleStore connector also provides parallel read repartitioning features to ensure that each task reads approximately the same amount of data. In queries with top-level limit clauses, this option helps distribute the read task across multiple partitions so that all rows do not belong to a single partition.

Spark-SingleStoreDB Integration Architecture
Read Post
Utilization Monitoring in SingleStoreDB Cloud
Product

Utilization Monitoring in SingleStoreDB Cloud

SingleStoreDB is the database of choice for developers building immersive applications that require real-time analytics. To fully optimize applications, developers, administrators and users alike need to understand current system performance, as well as how to tune their queries. SingleStoreDB now has enhanced native monitoring capabilities that allow users to easily visualize performance, identify potential bottlenecks, and tune and optimize queries to maximize performance as workloads scale.

Let's get into the specifics of the monitoring capabilities we offer for our cloud customers by Workspaces (a collection of compute resources).

vCPU Utilization

vCPU utilization can help identify performance bottlenecks, optimize resource usage and proactively address issues before they cause any disruption. To see performance across many vCPUs, we show the overall compute load, as well as the max and min, to identify when workloads are unevenly distributed across the workspace.
Read Post
SingleStoreDB: The Best Database for AI and Machine Learning Models
Product

SingleStoreDB: The Best Database for AI and Machine Learning Models

Artificial intelligence (AI) and machine learning models rely on large volumes of data to train and improve their accuracy. As a result, choosing the right database is crucial for the success of AI and machine learning projects.

In this article:
- Real-Time Data Processing
- Vector Database
- Semantic Search
- Scalability and Integration
- Built-in Analytics Capabilities

SingleStoreDB is a modern, highly performant distributed SQL database that offers a wide range of benefits for organizations working with AI and machine learning models.

Real-Time Data Processing

Real-time processing is essential when dealing with streaming data or building applications that require quick response times, like fraud detection or recommendation systems. SingleStoreDB's distributed architecture and in-memory data storage enable it to process data at lightning-fast speeds, providing organizations with the ability to make decisions in real time. With SingleStoreDB, organizations can handle large volumes of data, gain valuable insights into their data quickly and build successful AI and machine learning models with ease — including custom GPT models.

Read more: How two novice developers built a movie recommendation application

Vector Database

Vector databases are designed to handle high-dimensional data, and provide advanced search and similarity capabilities. They are ideal for use cases including natural language processing, image recognition and recommendation systems, which require the ability to search for similar data points quickly. In a vector database, each row in a table represents a vector and each column represents a feature of that vector, where the vector is a multi-dimensional array of numerical values that represent the features or attributes of an object. For example, imagine a database containing information about different animals. Each row in the database might represent a different animal, and each column might represent a different feature of that animal, such as its weight, height and number of legs.

SingleStoreDB's vector database functionality not only allows organizations to store and search high-dimensional vectors efficiently — enabling them to build more powerful and accurate machine learning models — but it also provides a single platform for handling both vector databases and traditional relational databases. This eliminates the need to run multiple database types and simplifies the data management process, making it easier and more efficient for organizations to work with diverse datasets and workflows. With SingleStoreDB, organizations can easily leverage the power of vector databases without sacrificing the benefits of a comprehensive and flexible database system.

Semantic Search

Semantic search allows organizations to search for information based on the meaning and context of the query, rather than just matching keywords. It is an important feature for applications like chatbots, virtual assistants and question-answering systems, where users often use natural language to search for information — including SingleStore's brand new chatbot, SQrL. SingleStoreDB's semantic search capabilities enable organizations to build applications that can understand the meaning behind queries and return relevant results, providing a more personalized and accurate user experience.

Here is an example of how SingleStoreDB would convert a semantic query into SQL:

> Find all articles that mention climate change and renewable energy

Using natural language processing (NLP), SingleStoreDB would parse the query and identify the relevant keywords and concepts. The query might be parsed something like this:

> Find all articles that mention "climate change" and "renewable energy"

The parsed query would then be converted to SQL, which can be executed directly against the database:

```sql
SELECT *
FROM articles
WHERE body LIKE '%climate change%'
AND body LIKE '%renewable energy%'
```

This query would search the `articles` table in the database for any rows where the `body` column contains the phrases "climate change" and "renewable energy." SingleStoreDB's semantic search capabilities allow users to enter natural language queries like this, automatically converting them into SQL that can be executed against the database — making it easier and more intuitive for users to search and retrieve data.

Read more: How to perform AI-powered semantic search on your data in SingleStoreDB

Scalability and Integration

SingleStoreDB is also highly scalable, which means it can grow as your data needs grow. It is compatible with popular machine learning frameworks including TensorFlow, PyTorch and Apache Spark, making it easy to integrate with existing workflows and tools. Using these powerful integrations makes SingleStoreDB a valuable tool for companies looking to leverage their data to create custom-trained GPT models. By using SingleStoreDB as an ingest point for their data, you can extract and transform data to generate powerful prompts that can be used to train GPT-4. With SingleStoreDB's vector database and semantic search functionalities, engineers can easily create and test different prompts, making the GPT training process more efficient and effective. By leveraging the power of SingleStoreDB as an ingest point, you can build more accurate and powerful GPT models to gain a competitive edge in your industry.

SingleStoreDB's distributed architecture also makes it an ideal choice for large-scale analytics workloads. The platform's ability to scale out horizontally across multiple nodes enables it to handle massive amounts of data and perform complex analytics queries at scale.

Built-in Analytics Capabilities

SingleStoreDB is widely regarded as one of the best databases for analytics, and for good reason. The platform's built-in analytics capabilities provide organizations with a powerful tool for gaining insights into their data. With the ability to run complex analytics queries in real time, you can quickly identify trends and patterns in your data, leading to more on-the-fly and informed decision making.

One key advantage of SingleStoreDB is its support for hybrid transactional/analytical processing (HTAP). HTAP allows organizations to perform both transactional and analytical queries on the same database system, eliminating the need for separate systems for each task. This can greatly simplify your data architecture and reduce infrastructure costs. In addition, SingleStoreDB's advanced indexing and query optimization capabilities make it a reliable and effective tool for analytics. Columnstore indexing technology allows for faster data retrieval and compression, while the query optimization engine automatically optimizes queries for faster execution.

Overall, SingleStoreDB's combination of HTAP support, distributed architecture and advanced indexing and query optimization capabilities makes it the best database out there for analytics. Its ability to handle complex analytics workloads in real time makes it an invaluable tool for organizations looking to gain insights and make data-driven decisions.

Conclusion

SingleStoreDB is an ideal choice for organizations that are working with AI and custom-trained GPT models. Its powerful vector database and semantic search functionalities provide advanced search and similarity capabilities for high-dimensional data. What's more, these functionalities are integrated into a comprehensive and flexible database system that can handle diverse datasets and workloads like large-scale, real-time analytics. While other vector databases may offer specialized capabilities, SingleStoreDB's broad feature set and flexible architecture make it a versatile tool for managing data and analytics needs. Whether you're working with natural language processing or image recognition, SingleStoreDB provides the speed and accuracy you need to keep pace with the real-time demands of AI and your business.

Interested in learning more? Join our webinar on May 3, "ChatGPT for Developers: How to Choose Your Data Stack and Strategy," where we'll dive deeper into creating a real-time data architecture to build an AI-based application, the aspects of vector databases you need to account for early in your development process and more. Start building on SingleStoreDB today.

More Resources for AI & Machine Learning Models From SingleStore
- Webinar: Build a ChatGPT App on Your Own Data
- Blog: How to Build a Charismatic Twitter Chatbot in a Few Hours
- Blog: Using SingleStoreDB & ChatGPT for Custom Data Sets
- Blog: Why Your Vector Database Should Not be a Vector Database
- Blog: Introducing SQrL: Your SingleStore Co-Pilot
Read Post
Understanding PostgreSQL’s Data Fragmentation Problem, and How SingleStoreDB Is Better
Product

Understanding PostgreSQL’s Data Fragmentation Problem, and How SingleStoreDB Is Better

PostgreSQL is a relational database management system with a client-server architecture. On the server side, PostgreSQL's processes and shared memory work together to form an instance, which handles data access. Client programs connect to the instance and request read and write operations.

Let's talk about PostgreSQL first, and look at the fragmentation problem it has:
Read Post
Using SQrL for a Movie Recommendation App
Product

Using SQrL for a Movie Recommendation App

If you haven't already, check out SQrL, the new SingleStore chatbot dedicated to assisting you in learning, developing and building your real-time applications on SingleStoreDB.

In this blog post, we'll dive into the process of how two novice developers, Arnaud and Pranav, built a fun little AI movie recommender on SingleStoreDB and OpenAI. Unlike some of the 10x engineers at SingleStore, we are relatively new to building generative AI apps. Here's how we were able to spin up this app in just a weekend with the help of SQrL, SingleStore's AI bot that is hyper-trained on its docs and resources.

Getting the Right Data and Shape

Data Ingestion

The first step in creating our movie recommender app was finding the right dataset and bringing it into SingleStoreDB. We used the full MovieLens 25M Dataset. SQrL and SingleStore Notebooks made this process incredibly easy.
Read Post
Partnering for Success: Our April Partner Update for SingleStore
Company

Partnering for Success: Our April Partner Update for SingleStore

The SingleStore partner ecosystem helps businesses unlock the full potential of their data. In this blog, we will be sharing updates on our growing partner ecosystem, including new technology partnerships, program updates and upcoming events. We will also highlight the exciting work our partners are doing with SingleStore and our customers to deliver innovative solutions.

As we usher in the spring season, we are excited to share the latest SingleStore technology partnership updates from the past month. From informative webinars on building a GPT chatbot to new SingleStore Connect listings, it has been a busy month of growth and collaboration within our partner community. In this post, we will take a closer look at April highlights, showcasing the valuable contributions of our partners and the ways they are driving innovation with SingleStore. Let's dive in and catch up on all the exciting happenings in the SingleStore Partner Program!

New Partners and SingleStore Connect Listings

This month, we added nine new partner product listings to SingleStore Connect. These partners provide seamless integrations, enabling our customers to leverage the full potential of SingleStore for their real-time applications and analytics. Learn more about these partners:
Read Post
Introducing SQrL: Your SingleStore Co-Pilot
Product

Introducing SQrL: Your SingleStore Co-Pilot

Today we are thrilled to introduce SQrL, the SingleStore chatbot, dedicated to assisting you in learning and developing on SingleStoreDB.
Read Post
Why Your Vector Database Should Not be a Vector Database
Product

Why Your Vector Database Should Not be a Vector Database

The database market is seeing a proliferation of specialty vector databases. People who buy these products and plumb them into their data architectures may find initial excitement with what they can do to query for vector similarity. But eventually, they will regret bringing yet another component into their application environment.

Vectors and vector search are a data type and query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) will lead to the usual problems we see (and solve) again and again with our customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true DBMS.

Instead of using an SVDB, we believe that application developers using vector similarity search will be better served by building their applications on a general, modern data platform that meets all their database requirements, not just one. SingleStoreDB is such a platform.

SingleStoreDB

SingleStoreDB is a high-performance, scalable, modern SQL DBMS and cloud service that supports multiple data models including structured data, semi-structured data based on JSON, time-series, full text, spatial, key-value and vector data. Our vector database subsystem, first made available in 2017 and subsequently enhanced, allows extremely fast nearest-neighbor search to find objects that are semantically similar, easily using SQL. Moreover, so-called "metadata filtering" (which is billed as a virtue by SVDB providers) is available in SingleStoreDB in far more powerful and general form than they provide — simply by using SQL filters, joins and all other SQL capabilities.

The beauty of SingleStoreDB for vector database management is that it excels at vector-based operations and it is truly a modern database management system. It has all the benefits one expects from a DBMS, including ANSI SQL, ACID transactions, high availability, disaster recovery, point-in-time recovery, programmability, extensibility and more. Plus, it is fast and scalable, supporting both high-performance transaction processing and analytics in one distributed system.

SingleStoreDB Support for Vectors

SingleStoreDB supports vectors and vector similarity search using the dot_product (for cosine similarity) and euclidean_distance functions. These functions are used by our customers for applications including face recognition, visual product photo search and text-based semantic search [Aur23]. With the explosion of generative AI technology, these capabilities form a firm foundation for text-based AI chatbots. The SingleStore vector database engine implements vector similarity matching extremely efficiently using Intel SIMD instructions.
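To make the "metadata filtering" point concrete, here is a small sketch of combining vector similarity with ordinary SQL predicates and a join in SingleStoreDB; the products and reviews tables, the filters and the tiny 3-dimensional embedding are hypothetical, invented only for illustration:

```sql
-- Hypothetical catalog with packed image-embedding vectors
CREATE TABLE products (
  id INT PRIMARY KEY,
  category VARCHAR(50),
  price DECIMAL(10, 2),
  img_vector BLOB
);

CREATE TABLE reviews (
  product_id INT,
  rating INT
);

SET @query_vec = JSON_ARRAY_PACK('[0.9, 0.1, 0.4]');  -- toy query embedding

-- "Metadata filtering" is just SQL: restrict by category and price,
-- join in review data, then rank the survivors by vector similarity
SELECT p.id,
       p.price,
       AVG(r.rating) AS avg_rating,
       DOT_PRODUCT(p.img_vector, @query_vec) AS similarity
FROM products p
JOIN reviews r ON r.product_id = p.id
WHERE p.category = 'shoes'
  AND p.price < 100
GROUP BY p.id, p.price, p.img_vector
ORDER BY similarity DESC
LIMIT 10;
```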
Read Post
Using SingleStoreDB & ChatGPT for Custom Data Sets
Product

Using SingleStoreDB & ChatGPT for Custom Data Sets

Since its launch in November 2022, OpenAI's ChatGPT has taken the world by storm. It's a powerful language tool that taps into the unique capabilities of Artificial Intelligence (AI), helping users with tasks from writing emails to generating catchy names for a new podcast. ChatGPT's ability to predict and generate text comes from its ability to learn from high volumes of data. It continually iterates based on what users input, allowing it to deliver more accurate outputs with smaller margins of error.

While ChatGPT has an impressive array of capabilities, it also has its limitations. Mainly, it can only generate text that is similar to the text it was trained on. For everyday users tapping into publicly available data and information, this isn't an issue — but what if you want to use ChatGPT to generate responses dependent on your own data sets? Imagine you want to sort customer reviews based on sentiment, or you'd like to make it easy for employees to search internal documentation to find answers to their questions. Using your own data, you can create a repository of information to sort through, empowering the tool to generate responses based entirely on your proprietary information.

We'll walk you through what you need from a database — like SingleStoreDB — to store your relevant company data, creating a centralized source of truth for ChatGPT to reference when generating responses to your questions.

Why SingleStoreDB for ChatGPT

The next iteration of ChatGPT for businesses includes using it against custom company data — and that starts with the right database. To be efficient with search results and query speed, you want a database that:
- Stores and queries large amounts of data in various formats
- Features real-time functionalities like low latency and high concurrency
- Stores data as vectors, and includes semantic search capabilities to find relevant data in milliseconds
Read Post
What Is JSON?
Product

What Is JSON?

In this article, you'll learn all about JSON and its benefits, especially when it comes to performing data analytics.

Table of Contents
- What Is JSON?
- Why JSON Is Popular
- JSON's Structure
- How JSON Differs from Other Data Structures
- JSON vs. Binary Data
- JSON vs. XML
- JSON vs. CSV
- JSON vs. YAML
- Benefits of JSON
- JSON Use Cases
- JSON Document Databases and How They Work
- Benefits of JSON Document Databases
- Performing Analytics on JSON Document Databases
- Conclusion

What Is JSON?

JavaScript Object Notation, more commonly referred to as JSON, is a universal data format for sharing data between different software applications. JSON has revolutionized the database industry over time, as evidenced by the use of JSON as the underlying data storage format in NoSQL databases, like SingleStore. In this article, you'll learn more about JSON and why it's such a popular data format. You'll take a look at some of its benefits and learn how analytics can be performed on JSON data structures.

Why JSON Is Popular

As mentioned previously, JSON is considered a universal data exchange format, which means it's used everywhere for everything. However, this was not always the case. It was originally designed as a communication channel between JavaScript-based frontend clients and backend servers. However, because JSON is concise, readable, lightweight and highly flexible, the software community began to see its practical applications in other areas. Because it's a text-based format, it's easy for both humans and machines to read. And with the explosion in popularity of JavaScript-based frameworks, such as Express, React and Node.js — and because JSON is natively supported within these frameworks — it quickly became the most popular data format choice.

JSON's Structure

The structure of a JSON record is very straightforward: there is an object denoted with `{}` and properties/attributes represented by key-value pairs, `{ key : value }`. For instance, in the following code, a `"person"` object is defined in JSON:

```json
{
    "name": "John",
    "age": 25
}
```

The `{}` denotes the boundaries of the `"person"` object. Within the confines of the curly braces are the properties or attributes of the object. Key-value pairs represent one property of the object, separated by a colon; in this case, `name` is the key, and `John` is the value. Similarly, `age` is another key, for which `25` is the value. You can represent any real-life object having any number of attributes with this simple structure. Some other rules for the JSON data format include the following:
- Different properties are separated by commas: `,` (i.e. `{key1 : value1, key2: value2}`)
- Strings are enclosed within quotes: `""` (i.e. `{ key : "abcd" }`)
- Arrays are represented with `[]` (i.e. `{ key: [1,2,3,4, "a", "s", "d", "f"] }`)

JSON supports all major data types, which include strings, numbers, arrays, Booleans, objects and null, and nested objects are represented like this: `{ key : { k : v } }`.

How JSON Differs from Other Data Structures

For communication and data exchange between frontends and backends, different data formats have been created over the years. These include CSV, XML, YAML, binary files, Apache Parquet and Apache Avro. In the following sections, you'll compare JSON with other data formats to see how they differ.

JSON vs. Binary Data

As previously stated, JSON is a text-based format that is both human- and machine-readable. In contrast, binary data is only machine-readable. This makes it almost impossible for a human to work with it while programming software.

JSON vs. XML

Before JSON became widely popular, the XML format was the de facto format used to exchange data between systems. If you use the same person object in XML, you'll have the following structure:

```xml
<Person>
  <name>John</name>
  <age>25</age>
</Person>
```

If you compare the JSON and XML formats, you'll find that JSON is more readable than XML due to the arrangement of elements within both data formats. In addition, as you define complex objects with hundreds of attributes, it becomes a challenge not only to understand the structure, but also to design a system for parsing the data from XML.

JSON vs. CSV

The CSV format is a great option for storing large sets of data in tabular formats. However, unlike JSON, CSV records have little to no structure, which is not ideal for defining complex objects. For instance, the same person object in CSV would look like this:

```csv
name, age
John, 25
```

The first row, also called the header row, represents the name for each attribute (commonly referred to as columns). Subsequent rows represent the data for each attribute. The CSV format fails to cater to a large number of complex objects, and the structure starts breaking when objects have a variable number of attributes and data types. Say, for example, you have a JSON record like this:

```json
{
    "key1" : "val1",
    "key2" : "val2",
    "key3" : [1,2,3,4,5],
    "key4" : [
                 {
                    "k1": "v1",
                    "k2" : "v2"
                 },
                 {
                    "k3": "v3",
                    "k4" : "v4"
                 }
             ]
}
```

Storing the preceding record in CSV format would look like this:

```csv
key1, key2, key3, key4
"val1", "val2",[1,2,3,4,5], [{"k1":"v1", "k2":"v2" },{"k3":"v3","k4":"v4"}]
```

As you can see, this isn't the best way to store complex objects.

JSON vs. YAML

In terms of formatting options, YAML is the most similar to JSON, since they have very similar structures. In fact, YAML is a superset of JSON, making it possible to parse JSON data with a YAML parser. YAML records are more concise compared to the same JSON record, and have a hierarchical structure based on indentation instead of curly braces. This makes it a suitable choice for a data exchange format. For example, the person object in YAML would look like this:

```yaml
Person:
    name: John
    age: 25
```

The most popular application of YAML is for defining configuration files. Although the YAML format is gaining traction with applications currently, JSON remains more popular since it's already an established data exchange format.

Benefits of JSON

Working with JSON is easy. Some of its most striking features include the following:
- Compact and easy to read. JSON is text-based, so it's easy for humans to read — and with a proper structure, it's easily parsed by machines.
- Key-value pair approach. The key-value approach in JSON is much easier to work with because each value corresponds to one key in the object. This also makes it much more efficient programmatically because, with a known key, data access is in constant time (big O(1)).
- Support for a wide range of data types. One of JSON's primary features is its flexibility in supporting different data types within the same object. A JSON object can have text, numbers, floats, arrays and nested JSON objects within the same structure, which can all be easily accessed, updated and removed. JSON can also provide levels of nesting records, and even supports object graphs and circular references.
- Widely supported. The developer community for JSON is very large. It's a de facto data format for all new systems, for all types of applications, processes and communication among systems.

JSON Use Cases

As previously discussed, JSON is used for all types of use cases. The following are some of the most common:
- Programming languages. Even though JSON stands for JavaScript Object Notation, it's actually language-independent. All popular languages provide JSON parsers to work with JSON data.
- Web services and REST APIs. If you've ever worked with a REST API or web service, you know that the request body and the response returned by the API are in JSON format.
- System configuration files. JSON can define configuration files in certain instances, including when providing credentials for a user or initial startup parameters for an application.
- Infrastructure as a Service (IaaS). In the case of IaaS, the entire infrastructure can be concisely defined as a JSON document and submitted for deployment. JSON also enables version control over the infrastructure, and any changes to the infrastructure can be easily tracked and managed using a version control system such as Git.
- JSON Web Tokens (JWT). JSON is used for JWT secure data transmission between two parties.
- Document data stores. Document data stores or databases are among the more recent applications of the JSON format, advocating the shift toward NoSQL databases, many of which are JSON based. MongoDB is a popular document database and an integral part of the MERN or MEAN stack technologies.

JSON Document Databases and How They Work

A JSON document database is a NoSQL database that has been designed to store and retrieve data as JSON records (i.e. JSON documents). With a JSON document database, you can store and retrieve data in JSON; however, like traditional SQL or relational databases, you get support for all standard SQL data types and indexing, connectivity and querying features. JSON document databases are widely popular because of the flexibility they offer to store and retrieve data in a semi-structured manner (e.g. JSON). In comparison, traditional SQL databases have to adhere to a fixed schema for storing data.

In addition, with JSON document databases, developers can work with data in the same format their application code is in. This means they're ideal for rapidly evolving products or systems that don't adhere to a fixed schema. Ultimately, they help increase development speed as compared to SQL or relational databases, for which careful schema design is required. JSON document databases also have wide applications; in other words, they can be used as a data store for IoT systems (which have millions of random inserts and retrievals), clickstream data, monitoring and logging data, and sensory data.

Benefits of JSON Document Databases

JSON document databases have numerous benefits, including the following:
- Flexibility. One of the most striking features of a JSON document database is its flexibility. Since these databases are based on JSON, a semi-structured data format, they can work with a wide variety of data types and require no upfront schema design. This means they can be used for various types of applications, such as content management applications and data cataloging services.
- Easy migrations. Because there's no schema to manage and everything adheres to the JSON structure, moving data between systems is relatively easy and typically only requires a JSON parser.
- Low-code and rapid development.
Overall, the developer community is fairly familiar with JSON data, and since there's no schema to design or manage, developers can focus on implementing the core business logic rather than spending time managing databases or writing and optimizing SQL queries. As a result, products can be developed rapidly.No schema evolution challenges. Since there's no rigid schema involved that data has to adhere to, changes that occur in data over time can be easily incorporated within the system.Diverse data storage. All JSON data types are natively supported, which means you can store any type of data.Highly scalable. In a cloud environment, a document database can be configured in a completely serverless manner. This enables exceptional scalability and delivers elevated levels of availability.Performing Analytics on JSON Document DatabasesAs with any form of data, the ability to perform analytics and derive insights is important. With JSON data, it's common to use traditional SQL databases to store and perform analytics. However, these databases aren't designed for analytics and especially don't work well with semi-structured data, such as JSON, because they are required to have a fixed and well-defined schema.For instance, if you try to perform analytics on JSON data using an SQL database, you'll most likely experience the following issues:Complex infrastructure requirements. Because SQL databases aren't designed for JSON data, you have to design a complex infrastructure that can appropriately scale up or down to handle the analytical workloads. This is particularly important when working with operational data due to its rapidly changing nature or when performing analytics in real time.Development of complex extract, transform, load (ETL). Since JSON is not natively supported, data needs to be subjected to ETL processes to transform it into a more suitable format (*ie* transforming JSON to SQL tables and then loading it into the SQL database, which can then be used in analytical jobs).Problems storing a variety of data. Data conforming to a specific schema restricts what can and cannot be stored in a database. For big data analytics, the data from heterogeneous sources vary widely and, in most cases, cannot conform to one format or schema. This limits the spectrum of different types of data that can be stored.Inability to perform real-time analytics. Most of the time, data is analyzed using batch processes, resulting in delayed rather than real-time analytics.Schema evolution challenges. The structure of data changes over time, and with rapidly changing data, the structure becomes an even greater concern. The structure then needs to be managed so that you can use the data for practical purposes. This becomes a big challenge for SQL-based analytical systems that conform to a fixed schema.Due to these aforementioned issues, a better approach would be to use JSON document databases for performing analytics. While they may not be the best data store, JSON document databases have proven to perform exceptionally well for analytics, especially for real-time and rapidly changing data.For instance, JSON document databases reduce infrastructure complexity and can easily scale both vertically and horizontally as the workload increases or decreases. Moreover, with a serverless configuration, there is hardly any infrastructure or governance required.In addition, the data from real-time systems is delivered in JSON, which can be naturally processed by a JSON document database. 
JSON document databases are also flexible and support JSON natively, which means you can easily change the schema over time and store all types of data.Because JSON is native to the database, complex ETL processes aren't needed and insights can be generated in real time.As you can see, there are a lot of benefits to using JSON-based document databases. JSON has proven to be great for a wide variety of data, different types of platforms and all kinds of use cases — whether it be for storing data or performing analytics on the data.ConclusionIn this article, you learned all about JSON and why it's one of the most popular data exchange formats. In addition, you learned about the benefits of JSON document databases, especially when it comes to performing data analytics.If you want to power your modern data-intensive applications with a solution that can handle both operational workloads and data analytics, you should check out SingleStoreDB. It's based on a distributed SQL architecture that delivers response times in the tens of milliseconds and supports both operational and analytical workloads within the same engine. It works exceptionally well with JSON-intensive applications and provides performance at scale.Interested in powering real-time JSON workloads? Get started with SingleStore today.
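To make the analytics discussion above concrete, here is a minimal sketch of how a JSON record like the person example could be stored and queried with plain SQL in a database with native JSON support such as SingleStoreDB. The table, columns and values are hypothetical, and the `::` / `::$` extraction operators follow SingleStoreDB's JSON syntax; exact syntax may vary by version.

-- Hypothetical table holding person documents as native JSON
CREATE TABLE people (
  id INT,
  doc JSON,
  SHARD KEY (id)
);

INSERT INTO people VALUES
  (1, '{"name": "John", "age": 25, "address": {"city": "London"}}'),
  (2, '{"name": "Jane", "age": 31, "address": {"city": "Paris"}}');

-- ::$ extracts a value as text; chaining :: drills into nested objects
SELECT doc::$name AS name,
       doc::address::$city AS city
FROM people
WHERE doc::age > 20;

Because the documents live in an ordinary table, the same rows can feed aggregations, joins and dashboards without a separate ETL step.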
Read Post
Ensuring Stability in Uncertain Times: Why MariaDB Users Should Consider Migrating to SingleStoreDB
Product

Ensuring Stability in Uncertain Times: Why MariaDB Users Should Consider Migrating to SingleStoreDB

MariaDB has long been a popular choice for an open-source relational database due to its simplicity and flexibility.However, with the exponential growth of data and the increasing demand for real-time processing, many users are searching for new solutions to optimize their database performance. Recent news regarding MariaDB's plummeting valuation and the company's struggles to raise funds has left some customers concerned about the future stability of their database infrastructure. In this blog post, we will explore the benefits of migrating from MariaDB to SingleStoreDB, emphasizing our commitment to providing a reliable, high-performance alternative and straightforward migration. 6 Reasons to Migrate From MariaDB to SingleStoreDB 1. Scalability and Performance. One of the most compelling reasons to consider SingleStoreDB over MariaDB is its ability to scale horizontally across multiple nodes, providing better performance and resource utilization. SingleStoreDB's distributed architecture enables you to scale horizontally and add more nodes to your cluster as your needs grow, ensuring a seamless and effortless scaling process. MariaDB is primarily designed for single-node deployment, and scaling out requires more complex configurations and increased effort.Customer Story: Emmy-Winning SSIMWAVE Chooses SingleStore for Scalability, Performance and More2. Hybrid Data Processing. SingleStoreDB offers a unique hybrid data processing model, combining the best of both row-based and columnar storage. This approach allows SingleStoreDB to deliver excellent performance for both transactional (OLTP) and analytical (OLAP) workloads. In contrast, MariaDB relies on a traditional row-based storage model, which may not provide the same level of performance for analytical queries.3. Real-time Analytics. As businesses increasingly rely on real-time data analytics, having a database that can handle real-time processing becomes essential. SingleStoreDB is designed to enable real-time analytics on rapidly changing data, thanks to its in-memory row store, which ensures low-latency data processing. MariaDB does not have native support for real-time analytics, which may result in slower query performance and increased latency.Webinar: Anuvu Speed Tests MariaDB, Snowflake & SingleStoreDB4. Advanced Compression. SingleStoreDB leverages advanced data compression techniques to minimize storage and I/O overhead, making your database more efficient and cost effective. With its columnar storage and advanced encoding schemes, SingleStoreDB can compress data up to 80% more effectively than MariaDB. This compression not only saves storage space, but also improves query performance by reducing the amount of data read from disk.5. Built-in Machine Learning and AI Capabilities. SingleStoreDB has built-in support for popular machine learning and AI libraries like TensorFlow, PyTorch and scikit-learn. This integration allows you to perform advanced analytics and machine learning tasks directly within the database, eliminating the need for complex ETL pipelines and external processing. While MariaDB does support some machine learning functionality through plugins, it lacks the same level of native integration and support found in SingleStoreDB.6. Ease of Migration and Support. Migrating from MariaDB to SingleStoreDB is relatively straightforward, as both databases use SQL as their query language. 
Additionally, SingleStore provides a variety of tools and resources to help you migrate your data and applications, including migration guides, tutorials and dedicated support. Our goal is to ensure a smooth transition for businesses seeking stability amidst uncertainty.Customer Story: Nucleus Replaces MariaDB with SingleStore and Improves Query Speeds Up to 20x, Fueling Market ExpansionConclusionIn these challenging times, it's essential to make decisions that promote stability and performance for your business. Migrating from MariaDB to SingleStoreDB can bring significant benefits in terms of performance, scalability and efficiency. With its hybrid data processing, advanced compression, real-time analytics capabilities and seamless migration process, SingleStoreDB is a powerful and versatile solution for modern database requirements. We are committed to helping businesses maintain continuity and thrive in the face of change, offering a smooth transition to a reliable and high-performance alternative to MariaDB.Our team is here to guide you step-by-step. To learn more and get started migrating from MariaDB to SingleStoreDB, request a demo today.
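As a rough illustration of point 6, here is a minimal, hypothetical sketch of re-creating a MariaDB table in SingleStoreDB. Since both systems speak standard SQL, the DDL carries over largely unchanged; the main addition is choosing a shard key (and optionally a sort key) so data is distributed across nodes. Table and column names are illustrative, not taken from a real migration.

-- Existing MariaDB table (illustrative)
CREATE TABLE orders (
  order_id    BIGINT PRIMARY KEY,
  customer_id BIGINT,
  total       DECIMAL(10,2),
  created_at  DATETIME
);

-- Equivalent SingleStoreDB table: distributed by customer_id,
-- sorted by time to speed up analytical range scans
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  total       DECIMAL(10,2),
  created_at  DATETIME,
  SHARD KEY (customer_id),
  SORT KEY (created_at)
);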
Read Post
How to Write & Tune Queries in SingleStoreDB
Product

How to Write & Tune Queries in SingleStoreDB

SingleStoreDB is a high-performance, real-time distributed SQL database that processes queries at lightning-fast speeds.It is designed to handle large volumes of data with ease, and also handles complex queries that require real-time analysis. Typically, when a customer transitions to SingleStoreDB, they are able to achieve performance improvements of 10-100x on their queries.However, writing and tuning efficient queries is key to getting the most out of SingleStoreDB— something we’ll cover in this blog post.Writing Queries in SingleStoreDBWriting queries in SingleStoreDB is similar to writing queries in any other SQL-based database system. SingleStoreDB is MySQL wire protocol compatible, meaning you can use the standard ANSI SQL syntax to write queries that retrieve data from the database.Here are some tips on how to write efficient queries in SingleStoreDB:Use Shard Keys. The shard key is a table column (or multiple columns) used to control how the rows of a table are distributed. You should create shard keys on columns that are high cardinality values, frequently used in WHERE clauses or JOIN conditions. Here is a link to our documentation on understanding shard key selection.Use Sort Keys. The sort key is an index that groups rows of columnstore tables into logical segments, where each segment contains data for many rows. You should define a sort key to help with segment elimination for optimal query performance. Check out this link to our documentation that goes into more details on sort key functionality.Use Indexes. Indexes are critical in SingleStoreDB because they allow the database to quickly locate data based on specific criteria. SingleStoreDB supports a wide range of indexes — the full list of which can be found here on our docs page.Limit the Number of Columns Returned. Only retrieve the columns that you need. This will reduce the amount of data that needs to be retrieved from the database, which improves query performance.Use Aggregate Functions. Use aggregate functions such as SUM, COUNT, AVG and MAX to perform calculations on large datasets. This can be much faster than retrieving all the data, and performing the calculation in your application code.Avoid SELECT *. Retrieving all columns using SELECT * can be inefficient, especially if you only need a few columns. Specify the columns you need instead of using SELECT *.Tuning Queries in SingleStoreDBTuning queries in SingleStoreDB involves optimizing the queries to improve their performance. Here are some tips on how to do it:Analyze the Visual Explain. SingleStoreDB provides a visual guide that outlines the query execution plan, showing users how the database engine executes a query. Analyzing this plan can help you identify inefficiencies in the query and optimize it accordingly.Use EXPLAIN Statement. The EXPLAIN statement in SingleStoreDB can be used to obtain the query execution plan for a specific query. This can help you identify potential performance bottlenecks, and optimize your query accordingly.Leverage Query Plancache. In SingleStoreDB, when our users execute a query, it activates code generation, gets optimized and translated into lower level machine language.  After code generation, the compiled query plans are saved for later use in a plancache. Saving the query in the plancache improves query performance by reducing the number of times the query engine needs to run on a commonly executed query.Partition Configuration. 
SingleStoreDB uses a partitioning scheme called sharding, which splits data across multiple nodes in a cluster. Each shard contains a subset of the data, and each node can contain multiple shards. Depending on your query shapes, adjusting the partition-to-vCPU ratio can help improve the performance of your queries. For example, for workloads that require higher ingest performance, it is optimal to increase the number of partitions. For workloads where concurrency is the primary factor, reducing the number of partitions can help with query performance.More details on query tuning can be found in our SingleStore docs.ConclusionSingleStoreDB is a high-performance, distributed SQL database that can process queries at blazing-fast speeds — and writing efficient queries and tuning them is critical to getting the most out of SingleStoreDB. By following the tips outlined in this blog post, you can write queries that are optimized for performance — and tune them even further when needed.With these tips, you can harness the full power of SingleStoreDB and get the most out of your data. Get started for free today.
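Putting several of these tips together, here is a minimal sketch built around a hypothetical events table: a shard key and sort key in the table definition, a query that returns only the columns it needs and aggregates inside the database, and an EXPLAIN to inspect the plan.

-- Hypothetical events table tuned for SingleStoreDB
CREATE TABLE events (
  event_id   BIGINT,
  user_id    BIGINT,        -- high-cardinality column used in WHERE/JOIN
  event_type VARCHAR(32),
  created_at DATETIME,
  SHARD KEY (user_id),      -- controls how rows are distributed across nodes
  SORT KEY (created_at)     -- enables segment elimination on time filters
);

-- Return only the needed columns and aggregate inside the database
SELECT user_id, COUNT(*) AS events_last_day
FROM events
WHERE created_at >= NOW() - INTERVAL 1 DAY
GROUP BY user_id;

-- Inspect the execution plan for potential bottlenecks
EXPLAIN SELECT user_id, COUNT(*)
FROM events
WHERE created_at >= NOW() - INTERVAL 1 DAY
GROUP BY user_id;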
Read Post
High Availability in SingleStoreDB Cloud
Product

High Availability in SingleStoreDB Cloud

SingleStoreDB Cloud enables and manages high availability for you, and SingleStoreDB Premium extends this to multi-AZ high availability. This blog post is intended to help customers understand the high availability features of SingleStoreDB Cloud, and to demonstrate failover when a problem occurs.Availability vs High AvailabilityI started my career as an Avionics Engineer. I had great exposure to aircraft systems that can safely tolerate component malfunctions. Since then, I’ve maintained what one could arguably describe as an unhealthy interest in systems failure. This has manifested itself in various ways: from reading NTSB reports and bingeing episodes of Air Crash Investigation, to reading Post Event Summaries and Post Incident Reviews published by public cloud providers. I also worked for a time with a company who specialised in providing highly available IT systems. But before discussing high availability in depth, we need a definition of “availability”.The Oxford English Dictionary defines availability as the ability of a system “to be used or obtained”. From a database perspective, it means the database system can serve its clients. Availability is typically measured in terms of uptime, normally expressed as a percentage. For example, a 99.99% availability Service Level Agreement (SLA) means a system can be down for just under 53 minutes in an entire year — without breaching the availability SLA.High availability (commonly abbreviated to HA)  is the ability of a system to provide higher uptime than normal. For example, in a dual server active passive cluster, if the active server fails the passive server takes over the active role. Service continues, and as a result, the uptime of the service is higher than if the service had been reliant on the single failed server. The clustering technology provides the high availability functionality.Any infrastructure component that forms part of a database management system can fail. Failure modes vary, but generally fall into these categories:Failure of a single machine. My own experience of these failures includes hardware failures like RAID controller, NIC and RAM failure, to software failures like a memory leak in endpoint protection software causing a server to cease all network communications. High availability technologies at the single machine level include dual power supplies, memory mirroring, dual port HBAs, RAID and NIC teaming. Note that localised failures can have a broad effect, including the NIC failure that brought down an airport’s radar system.A failure of more than one machine. This can include the failure of a shared device or service, like a storage array,  top of rack switch, power or cooling to one or more racks in a data hall or an infrastructure software bug — like the one that affected VMWare virtual machines. High availability can be attained by implementing “N” level redundancy, e.g. 2N or N+1 servers, switches, powering racks with dual electrical feeds, etc.Loss of an entire data centre. While loss of a data centre is less common than other failure modes, it can and does happen (examples include Northgate, OVHcloud and Twitter).Loss of an entire region. While it’s unlikely that all data centres in a broad geographical region would be lost, it is possible due to natural disasters, including large earthquakes, tsunamis or  heat waves. A regional outage can also be caused by civil unrest or war. Appropriate mitigations depend on availability requirements. Constraints may limit some mitigations, e.g. 
if data must reside in a particular geographical region. SingleStoreDB’s high availability architecture is limited to failure modes within regions (out-of-region disaster recovery is a topic for another day).High Availability in SingleStoreDB CloudOne of the design principles of SingleStoreDB is “no single point of failure,” and SingleStoreDB Cloud is highly available by default. It is important to know that there are two types of nodes in SingleStoreDB: aggregator nodes and leaf nodes.Aggregators route queries to leaf nodes, aggregate results, and send results to clients. Aggregator high availability in SingleStoreDB Cloud is achieved using cloud provider load balancers. They distribute traffic to aggregators, and detect and recover from aggregator failure. This method of attaining high availability is common, is not unique to SingleStoreDB Cloud and is not quite the focus of this blog post.A leaf node is responsible for a subset of a cluster’s data. There is always more than one leaf node in a SingleStoreDB Cloud Workspace. SingleStoreDB automatically distributes data across leaf nodes. High availability for leaf nodes is attained by replicating data across two availability groups. An availability group is a group of leaf nodes. Each availability group has a copy of every data partition, some as masters and some as replicas. If a single node should fail, there is still a copy of the data in the system.Master partitions are evenly distributed across workspace leaf nodes, and replicas are spread evenly across the opposite availability group. This ensures that when a failover occurs, the additional load is evenly balanced across the workspace, rather than having a single node take the entire additional load of the failed node. Should a node fail, SingleStoreDB Cloud keeps the databases online by automatically promoting the relevant replica partitions to master partitions. The following diagrams show partition distribution before and after a failover event. Master partitions are evenly dispersed across nodes in the first diagram. Replicas of master partitions in an availability group are spread evenly across leaf nodes in the other availability group. For example, “db_0 master” on Node 1 has a replica “db_0 replica” on Node 2. Similarly, db_1 has a replica on Node 4.
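To see this placement on a running deployment, SingleStoreDB exposes node and partition metadata through SQL. The following is a sketch — command availability and exact output columns depend on your deployment type, permissions and version — that lists leaf nodes with their availability groups, and the partitions of a database `db` together with the node hosting each copy and its role (master or replica).

-- List leaf nodes and the availability group each belongs to
SHOW LEAVES;

-- List the partitions of database `db`, including host node and role
SHOW PARTITIONS ON db;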
Read Post
SingleStore Notebook: New Features for Analytics, Machine Learning & Data Exploration
Product

SingleStore Notebook: New Features for Analytics, Machine Learning & Data Exploration

We are excited to announce the launch of our new Notebook feature, which allows you to perform complex analytics, machine learning and data exploration tasks with ease and flexibility.What Are Notebooks?Notebooks are interactive web-based tools that allow users to create and share documents containing live code, visualizations and narrative text. They have become increasingly popular in the data science community as they provide an efficient way to explore, analyze and visualize data, making it easier to communicate insights and results.SingleStore's Notebook feature is based on the popular Jupyter Notebook, which is widely used in data science and machine learning communities. The SingleStore Notebook extends the capabilities of Jupyter Notebook to enable data professionals to easily work with SingleStore's distributed SQL database while providing great extensibility in language and data sources.Key Features of SingleStore NotebookThe new SingleStore Notebook feature offers several key capabilities that make it a powerful tool for data exploration and analysis:Native SingleStore SQL Support. The SingleStore Notebook allows users to query SingleStore's distributed SQL database directly from within the notebook interface — without having to define any connection string, enabling a faster and more efficient way to do data exploration and analysis. Overall, it's more secure and simpler.For example, in a regular notebook, you would have to use the following code:

UserName='admin'
Password='MyPassword'
DatabaseName='MyDatabase'
URL='svc-11138a4....singlestore.com:3306'
db_connection_str = "mysql+pymysql://"+UserName+":"+Password+"@"+URL+"/"+DatabaseName
db_connection = create_engine(db_connection_str)

In SingleStoreDB, you only need to select a default database and the following:

db_connection = create_engine(connection_url)

SQL/Python Interoperability. Using SQL and Python interoperability is important so that you can query a database, using the output in a Python data frame and vice versa.When you query SingleStoreDB, you can use the following in a cell:

%%sql result1 <<
USE test;
SELECT * from MyTable

And use the result1 output in a Pandas dataframe:

df = pd.DataFrame(result1)

Collaborative Workflows. Users can save and share Notebooks with other team members, allowing for collaborative data analysis and exploration. You can save your Notebook in your personal folder, or share it with other members in your organization.Interactive Data Visualization. The Notebook feature includes support for popular data visualization libraries, like Matplotlib and Plotly, enabling users to create interactive charts and graphs directly within the notebook.Using SingleStore NotebookFor more details on how to use Notebooks, check our documentation page. You can create your notebook from our portal experience with two entry points as highlighted in the following image:
Read Post
AI-Powered Semantic Search in SingleStoreDB
Engineering

AI-Powered Semantic Search in SingleStoreDB

SingleStoreDB can supercharge your apps with AI. In this blog, we demonstrate how semantic search can be performed on your data in SingleStoreDB — including code examples and a motivating case study from Siemens, a SingleStore customer.What Is Semantic Search?At its core, semantic search relies on natural language processing (NLP) to accurately interpret the context and intent behind a user's search query. Unlike traditional keyword-based search methods, semantic search algorithms take into account the relationship between words and their meanings, enabling them to deliver more accurate and relevant results — even when search terms are vague or ambiguous. Semantic search relies heavily on machine learning algorithms to identify language patterns and understand concept relationships. Embeddings are a key tool in semantic search, creating vector representations of words that capture their semantic meaning. These embeddings essentially create a "meaning space," where words with similar meanings are represented by nearby vectors.What Is SingleStoreDB?SingleStoreDB is a real-time, distributed SQL database designed to handle both transactional (OLTP) and analytical (OLAP) within a unified engine. With support for fast writes and efficient querying, SingleStoreDB excels at managing large-scale transactional workloads and delivering real-time analytics.SingleStoreDB is available as a cloud service (SingleStoreDB Cloud) or for self-hosted installation.SingleStoreDB is also a multi-model database and provides vector database extensions, in addition to support for relational, semistructured, full-text, spatial and time-series data. Its vector capabilities include built-in functions for vector similarity calculations such as cosine similarity and Euclidean distance. These functions can be leveraged in SQL queries to perform similarity calculations efficiently on large volumes of vector data. Moreover, filters on metadata (other descriptive data about objects for which you've created vector embeddings) can be easily intermixed with vector similarity search, by simply using standard SQL WHERE clause filters. An easy way to get started is to sign up for a SingleStoreDB Cloud trial — and get $500 in credits.Is SingleStoreDB the Optimal Foundation for Semantic Search in Your Applications?SingleStoreDB's patented Universal Storage supports both OLTP and OLAP workloads, making it ideal for semantic search use cases. Adding embeddings to your data is simple — just place the vector data in a binary or blob column, using json_array_pack() or unhex() functions.Efficient retrieval of high-dimensional vectors and handling of large-scale vector similarity matching workloads are made possible by SingleStoreDB’s distributed architecture and efficient low-level execution. You can also rely on SingleStoreDB’s built-in parallelization and Intel SIMD-based vector processing to take care of the heavy lifting involved in processing vector data. This enables you to achieve fast and efficient vector similarity matching without the need for parallelizing your application or moving lots of data from your database into your application. We previously benchmarked the performance of our vector matching functions in our blog, “Image Matching in SQL with SingleStoreDB.” We ran the dot_product function as a measure of cosine similarity on 16 million records in just 5 milliseconds.With its support for SQL, SingleStoreDB provides developers with a familiar and powerful interface for building semantic search applications. 
SQL can be used to create complex queries and perform advanced analytics on the text data stored in SingleStoreDB. In fact, with just one line of SQL, developers can run a semantic search algorithm on their vector embeddings, as demonstrated in the following example.SingleStoreDB's ability to update and query vector data in real time enables us to power applications that continuously learn and adapt to new inputs, providing users with increasingly precise and tailored responses over time. By eliminating the need for periodic retraining of machine-learning models or other time-consuming processes, SingleStoreDB allows for seamless and efficient provision of real-time insights.See Semantic Search with SingleStoreDB in Action!The following tutorial will guide you through an example of adding embeddings to each row in your SingleStoreDB database using OpenAI APIs, enabling you to run semantic search queries in mere milliseconds using Python. Follow along to add embeddings to your dataset in your desired Python development environment.Our goal in this example is to extract meaningful insights from a hypothetical company's employee review dataset by leveraging the power of semantic search. By using OpenAI's Embeddings API and vector matching algorithms on SingleStoreDB, we can conduct sophisticated queries on the reviews left by employees about their company. This approach allows us to delve deeper into the true sentiments of employees, without being constrained by exact keyword matches.Step 1: Install and import dependencies in your environmentInstall the following dependencies in your development environment using pip3.

pip3 install mysql.connector openai matplotlib plotly pandas scipy scikit-learn requests

Then start python3 — and at the python3 command prompt, import the following dependencies.

import os
import openai
import json
from openai.embeddings_utils import get_embedding
import mysql.connector
import requests

Step 2: Create an OpenAI account and get API connection detailsTo vectorize and embed the employee reviews and query strings, we leverage OpenAI's embeddings API. To use this API you will need an API key, which you can get here. You'll need to add a payment method to actually get vector embeddings using the API, though the charges are minimal for a small example like we present here. Once you have your key, you can add it to your environment variables as OPENAI_API_KEY.

os.environ["OPENAI_API_KEY"] = 'youropenAIAPIKey'
openai.api_key = os.getenv("OPENAI_API_KEY")

Step 3: Sign up for your free SingleStoreDB trial and add your connection details to your Python environmentWe'll go through the example using SingleStoreDB Cloud, but of course you can self-host it and run the example in a similar way. If you're going to use our cloud, sign up for your SingleStoreDB Cloud trial and get $500 in credits.First, create your workspace using the + icon next to your desired workspace group. S-00 is sufficient for this use case.
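To show where the tutorial is headed, here is a minimal SQL sketch of the pattern described above: packed vector embeddings stored in a blob column and rows ranked with dot_product. The table, columns and vector values are purely illustrative (real OpenAI embeddings have 1,536 dimensions, not 3).

-- Hypothetical table of employee reviews with vector embeddings
CREATE TABLE reviews (
  id        BIGINT,
  review    TEXT,
  embedding BLOB,   -- packed float vector, e.g. produced by JSON_ARRAY_PACK()
  SHARD KEY (id)
);

-- Store a toy 3-dimensional embedding
INSERT INTO reviews VALUES
  (1, 'Great work-life balance', JSON_ARRAY_PACK('[0.12, -0.03, 0.98]'));

-- Semantic search: rank reviews by dot-product similarity to a query embedding
SELECT review,
       DOT_PRODUCT(embedding, JSON_ARRAY_PACK('[0.10, 0.05, 0.90]')) AS score
FROM reviews
ORDER BY score DESC
LIMIT 5;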
Read Post
How to Build a Charismatic Twitter Chatbot in a Few Hours
Product

How to Build a Charismatic Twitter Chatbot in a Few Hours

Discover the secret to building an interactive conversation chatbot on Twitter with state-of-the-art natural language processing in this technical tutorial — and create a chatbot that can respond to users with the appropriate context and personality.But why stop there? This tutorial also dives into advanced capabilities using the powerful combination of SingleStoreDB and MindsDB, instead of direct API integration with the GPT-4 model. Take your chatbot game to the next level — and learn how to create a more personalized, engaging user experience.In this technical tutorial, we'll show you how to create a chatbot that can interact with users on Twitter, responding with the appropriate context and personality using state-of-the-art natural language processing.To help you get started, we'll use the example of @Snoop_Stein, a Twitter bot that combines the unique personalities of Snoop Dogg and Albert Einstein. By tweeting @Snoop_Stein, users can engage with a rapping physicist who will respond with witty and intelligent remarks, all thanks to the advanced capabilities of the latest OpenAI GPT-4 model.
Read Post
SingleStoreDB & Google Cloud: The Future of Real-Time Applications
Product

SingleStoreDB & Google Cloud: The Future of Real-Time Applications

SingleStore is partnering with Google Cloud Platform to power the next generation of real-time, cloud native applications.Cloud computing has become increasingly prevalent in recent years, as more businesses and individuals move their data and applications to the cloud. This trend is driven by the many benefits of cloud computing including cost savings, simplicity, scalability and flexibility.According to Gartner, more than half of enterprise IT spend will shift to cloud computing by 2025. The reason is simple — outsourcing computing infrastructure allows organizations to focus their limited resources on differentiating their business within their given market segment, rather than maintaining computing infrastructure (i.e. digital plumbing).SingleStore is partnering with Google Cloud Platform to power the next generation of real-time, cloud native applications.SingleStoreDB CloudSingleStoreDB is a cloud-native, distributed SQL database — ideal for real-time applications that require low latency queries and fast ingestion at scale. SingleStoreDB also provides a  unified, performant database for modern workloads.SingleStore’s patented Universal Storage technology leverages three tiers for data storage — RAM, SSD and object storage. Universal Storage intelligently moves data between the three storage tiers based on data access patterns. This helps SingleStoreDB serve both transactional (OLTP) and analytical (OLAP) workloads from the same database, mitigating bottlenecks and improving performance — without sacrificing durability. While most databases are traditionally either OLAP or OLTP, SingleStoreDB is the world leader in HTAP (hybrid transactional & analytical processing) – enabling organizations to read, write and reason with data in real time, in one place.
Read Post
Looking for a Solution to NoSQL Analytics Limitations? Here’s How to Map NoSQL JSON to SingleStoreDB Tables
Engineering

Looking for a Solution to NoSQL Analytics Limitations? Here’s How to Map NoSQL JSON to SingleStoreDB Tables

NoSQL has drawn significant interest in the database world over the past 10 years, commonly referenced for scalable databases and pure OLTP speed for lookup queries. But when it comes to running analytics queries — or more complex OLTP queries — NoSQL starts to fail.Why? Because NoSQL queries are limited to key-value queries, which are very fast but sometimes require you to add an additional layer of computation on the application side to achieve expected results — where with SQL, you can simply query the result you want. Today, with the rise of distributed SQL databases like SingleStore, it's easier to handle scalability issues you might encounter with legacy NoSQL databases.In this blog, we will go through best practices to move from a NoSQL database to SingleStoreDB — including how to quickly import JSON data into SQL tables. But first, let's take a closer look at SingleStoreDB. What Is SingleStoreDB?SingleStore is a distributed SQL database that handles both analytical (OLAP) and transactional (OLTP) workloads in the same table type. SingleStoreDB provides fast ingestion and high query speed for complex OLTP and OLAP queries. It provides a robust, scalable solution that is levels above what other legacy single-node databases can do. There is also a managed service that can be deployed on AWS, GCP or Microsoft Azure.Moving Past NoSQL LimitationsNoSQL databases are more scalable than legacy SQL databases, handling hundreds of millions of transactions in a high-concurrency environment for pure OLTP queries. But today, as data rules the world, you need the best insights from your database via analytics dashboards, or complex OLTP queries. Unfortunately, these insights cannot be obtained properly by a NoSQL database, so users often have to add a new layer to handle analytics with a data warehouse — then also add another OLTP SQL database to handle some more complex SQL queries.All of that results in data movement, ETL processes and database sprawl, which leads to high latency and poor user experiences. 
What we’ve seen is a progression from a SQL to NoSQL era, leading us to where SingleStore is presently — the NewSQL era.Mapping NoSQL JSON to SingleStore TablesSingleStoreDB provides excellent support for JSON, especially since our latest product improvements in version 8.0  Even better, you can ingest data directly from JSON files stored in cold storage — with high ingest speeds being one of the strongest capabilities in SingleStoreDB.To ingest data, we’ll use SingleStore Pipelines.Let’s move some data from a NoSQL database to a AWS S3 bucket, ingesting it into SingleStoreDB Here is  basic .json data stored on a bucket called nosql migration :{ "id": "1",  "name": "John Smith",  "job": "Director" ,  "Address": { "Street": "Anger street",    "StreetNum": "32 bis",    "City": "London",    "Country": "United Kingdom"  }}And here is the corresponding table in SingleStoreDB :CREATE TABLE employee (id int,name varchar(32),job varchar(32),gender varchar(10),address JSON,Shard key (id));We can also provide a better table definition to have all address information directly in that table, and ingest this information later :CREATE TABLE employee (id int,name varchar(32),job varchar(32),gender varchar(10),address JSON,street varchar(32),streetnum varchar(32),city varchar(32),country varchar(32),Shard key (id));Now, as we define the equivalent table in SingleStoreDB, let’s ingest it via a Pipeline:CREATE PIPELINE pipeline_migration_nosql ASLOAD DATA S3 'nosqlmigration'CONFIG '{"region": "eu-west-1"}'CREDENTIALS '{"aws_access_key_id": "aws_access_key_id",             "aws_secret_access_key": "your_aws_secret_access_key",              "aws_session_token": "your_aws_session_token"}'INTO TABLE employee(id <- id,name <-name,job <- job,gender <- gender,address <- address,street <- address::street,streetnum <- address::streetnum,city <- address::city,country <- address::country)FORMAT JSON;We want to extract the JSON object address and ingest it directly into table fields. That way, we can easily use these fields to run more advanced queries. Now, let’s see what we can do if we have a more nested JSON with an array. It’s pretty common for some NoSQL databases to have a collection of items with one array as field type.Handling a JSON ArrayAn array in JSON is a list of keys and values. There are multiple options to import it properly into SingleStoreDB tables. The best options depend on which type of operation (aggregation, lookup select, etc.) and how often you want to access these items in an array.Let’s use this nested JSON as an example :{ "id": "1",  "name": "John Smith",  "job": "Director" ,  "address": { "street": "Anger street",    "streetnum": "32 bis",    "city": "London",    "country": "United Kingdom"  }, "experience": [ { "role": "Lead Engineer",    "company": "Json",    "yoe": 3  },{ "role": "Senior Engineer",    "company": "Avro",    "yoe": 3  },{ "role": "Junior Engineer",    "company": "Parquet",    "yoe": 4  }  ]}Option 1: Import it as a field JSONThis option is very performant (and even better now with our 8.0 release!) if you want to complete simpler operations, like lookups or aggregations.The idea is to store the array field into a JSON type, directly using the JSON field in your query. 
The drawback of this method is that you lose the interesting table structure of a SQL database.Here is an example of table definition and  query to access the specific value you are looking for:CREATE TABLE employee (id int,name varchar(32),job varchar(32),gender varchar(10),address JSON,street varchar(32),streetnum varchar(32),city varchar(32),country varchar(32),experience JSON,shard key (id));Query 1 : Find previous experiences of employee 1SELECT emp.name,emp.job, exp.table_col::$role as 'role',exp.table_col::$company as 'company'FROM employee emp , TABLE(JSON_TO_ARRAY(emp.experience)) expWHERE emp.id=1;Query 2 : Aggregate the total years of experience for employee 1SELECT emp.name, SUM(exp.table_col::yoe) as 'Total YoE'FROM employee emp , TABLE(JSON_TO_ARRAY(emp.experience)) expWHERE emp.id=1;Option 2: Create a table to represent a JSON ArrayThe second option is to fully use the relational database SQL to build tables that represent these arrays. Here is the second table that will represent the array (employee experience):CREATE TABLE experience (id_employee int,role varchar(32),company varchar(32),yoe int,Shard key (id_employee));This table structure makes your data more readable and usable for your application or tool. You will need to re-write the pipeline that ingests the JSON file to interpret this array, and insert into the corresponding tables. To make it work, we will write a stored procedure that will insert into two tables, redirecting the ingestion from the pipeline into this procedure.Here is an example of a stored procedure that will insert into these two tables:-- Stored Procedure for ingesting json array in multiple tableDELIMITER //CREATE OR REPLACE PROCEDURE employee_proc(batch QUERY(idvarchar(32),name varchar(32),job varchar(32),gender varchar(32),address JSON,street varchar(32),streetnum varchar(32),cityvarchar(32),country varchar(32),experience JSON))ASDECLARE json_array ARRAY(json); e json;BEGINFOR batch_record IN COLLECT(batch) LOOP   BEGIN INSERT INTO employee(id, name,job,gender,address,street,streetnum,city,country) VALUES(batch_record.id,batch_record.name,batch_record.job,batch_record.gender,batch_record.address,batch_record.address::$street,batch_record.address::$streetnum,batch_record.address::$city,batch_record.address::$country); json_array = JSON_TO_ARRAY(batch_record.experience); FOR i IN 0 .. LENGTH(json_array) - 1 LOOP   e = json_array[i];   INSERT INTO experience(id_employee,role,company,yoe) VALUES(batch_record.id,e::$role,e::$company,e::yoe); END LOOP;   END; END LOOP; END //DELIMITER ;And here is the new pipeline SQL statement:-- Pipeline for ingesting json into a Stored procedureCREATE PIPELINE pipeline_migration_nosql ASLOAD DATA S3 'nosqlmigration'CONFIG '{"region": "eu-west-1"}'CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id",            "aws_secret_access_key": "your_aws_secret_access_key",             "aws_session_token": "your_aws_session_token"}'INTO PROCEDURE employee_procFORMAT JSON;Query 1: Find experiences of employee 1SELECT emp.name,emp.job, exp.role , exp.companyFROM employee emp JOIN experience exp ON emp.id = exp.id_employeeWHERE emp.id=1;Query 2: Aggregate the total number of years experience for employee 1SELECT emp.name,SUM(exp.yoe) as 'Total YoE'FROM employee emp JOIN experience exp ON emp.id = exp.id_employeeWHERE emp.id=1;Performance Comparison Between Both OptionsBoth options can achieve very fast performance in SingleStore — but they came with some drawbacks. 
Option one works well when queries only lightly touch the JSON array. Option two makes your data schema more readable and offers more possibilities in terms of query shapes and computation. However, option two has an impact on ingest speed, and joins between two large tables can be costly in terms of CPU. Here is a benchmark chart that shows performance for both options for the two queries described above. The experience table has an average of three times the number of rows of the employee table.
Read Post