O’Reilly Radar Podcast: The 2017 Machine Learning Outlook
Data Intensity

O’Reilly Radar Podcast: The 2017 Machine Learning Outlook

O’Reilly Media Editor Jon Bruner recently sat down with SingleStore VP of Engineering Drew Paroski and SingleStore CMO Gary Orenstein to discuss the rapid growth and impact machine learning will have in 2017. In this podcast, Paroski and Orenstein share examples of companies using real-time technologies to power machine learning applications. They also identify the key trends driving adoption of machine learning and predictive analytics. Listen Here
Read Post
Protecting Against the “Insider Threat” with SingleStore
Engineering

Protecting Against the “Insider Threat” with SingleStore

Today, a number of cyber attacks are carried out by malicious insiders or inadvertent actors. Whether you are a large government agency or a commercial company, protecting your data is critical to successful operations. A data breach can cost significant lost revenue, a tarnished brand, and lost customer loyalty. For government agencies, the consequences can be even more severe. SingleStore takes a comprehensive approach to security, including the ability to protect sensitive data against the “Insider Threat”. Specifically, this post outlines best practices for security at the database tier.

The first step to securing data infrastructure is a database platform, such as SingleStore, that provides enterprise-level security. Today, large government agencies and commercial companies count on SingleStore to secure their most sensitive data. There are three critical pillars to securing your data in a database:

1. Separation of Administrative Duties
2. Data Confidentiality, Integrity, and Availability
3. 360° View of All Database Activity

In the rest of this post, we will focus on the Separation of Administrative Duties. The primary goal here is to disintermediate the Database Administrator (DBA) from the data. Central to this is not allowing a DBA to grant themselves privileges without approval by a second administrator. There should also be application-specific DBAs separate from Operations and Maintenance Administrators, and developers and users should not be able to execute DBA tasks. This can all be done by setting up the following recommended roles.

At the organization level:

1. Compliance Officer
   - Manages all role permissions
   - Most activity occurs at the beginning project stages
   - Shared resource across the organization
2. Security Officer
   - Manages groups, users, and passwords in SingleStore
   - Most activity occurs at the beginning project stages
   - Shared resource across the organization

At the project level:

1. SingleStore Maintenance and Operations DBA
   - Minimal privileges to operate, maintain, and run SingleStore
   - Can be a shared resource across projects
2. DBA per Database Application (Application DBA)
   - Database and schema owner
   - Does not have permissions to view the data
3. Developer/User per Database Application
   - Reads and writes data as permitted by the Application DBA
   - Does not have permission to modify the database

Once the roles and groups are established, you assign users to these groups, as sketched below. You can then set up row-level table access filtered by the user’s identity to restrict content access at the row level. For example, an agency may want to restrict user access to data marked at higher classification levels (e.g. Top Secret) than the user’s clearance allows. See the RBAC User Guide and Row Level Security Guide in the SingleStore documentation for details.

Lastly, and most importantly, your security environment can be permanently locked down once deployed. This is called “Strict Mode”, a configuration setting that is irreversible once enabled, which guards against a rogue administrator modifying the configuration of a deployed production system. See the Strict Mode Guide in the SingleStore documentation for details.

Too frequently, data architects have had to compromise on security for select applications. With SingleStore, you can achieve real-time performance and distributed SQL operations while maintaining the utmost in security controls. Visit www.singlestore.com/free to try SingleStore today!
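As a rough illustration of the separation of duties described above, the sketch below uses generic SQL-style role, grant, and group statements. The role names, database name, and privilege lists are hypothetical, and the exact SingleStore commands may differ, so consult the RBAC User Guide for the precise syntax.

```sql
-- Hypothetical sketch: roles mirroring the recommended separation of duties.
CREATE ROLE ops_dba;           -- Maintenance and Operations DBA: operates the cluster only
CREATE ROLE app_dba_claims;    -- Application DBA: owns the schema, cannot read row data
CREATE ROLE app_developer;     -- Developer/User: reads and writes data, cannot alter schema

-- Privileges are granted to roles, not directly to people.
GRANT CREATE, ALTER, DROP ON claims_db.* TO ROLE app_dba_claims;
GRANT SELECT, INSERT, UPDATE ON claims_db.* TO ROLE app_developer;

-- The Security Officer manages groups and membership; the Compliance Officer
-- approves which roles each group carries.
CREATE GROUP claims_developers;
GRANT ROLE app_developer TO GROUP claims_developers;
```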
Read Post
451 Research Webcast: In-Memory Computing Market Predictions 2017
Trending

451 Research Webcast: In-Memory Computing Market Predictions 2017

Adoption of in-memory technology solutions is happening faster than ever. This stems from a three-pronged demand: first, a greater number of users, analysts, and businesses need access to data. Second, the number of transactions is increasing globally, so companies need faster ingest and analytics engines. Finally, performance inconsistencies are the nail in the coffin for companies competing in the on-demand economy; these enterprises need the responsiveness in-memory technology provides. In addition to these rising demands for real-time data access and analytics, several other factors contribute to in-memory technology adoption, as outlined in the following graphic:
Read Post
The State of Real Time for Property and Casualty Insurers
Data Intensity

The State of Real Time for Property and Casualty Insurers

In some industries, a hesitance remains in recognizing the commodifying forces of real-time solutions. These industries often rely on orthodox tenets as barriers to marketplace entry, such as regulatory compliance, traditional value propositions, brand recognition, and market penetration. The term “ripe for disruption” often characterizes these industries and their respective leaders. Arguably, an illustrative industry in the midst of responding to commodification, adapting to real-time technology, and fearing disruption is the Property and Casualty Insurance industry. An examination of this industry’s most commodified products and services serves as a litmus test for our common understanding of the state of real time.

Let’s consider the most commodified line of business: personal auto insurance. Here, we find that many traditional insurers capture little to no real-time data about driver behavior and vehicle performance. Instead of real-time systems that capture, analyze, learn, and predict, these insurers rely on expensive national advertising campaigns, extensive commission networks, quarter-hour policy quotes, lengthy claim processes, long call center queues, and monthly billing cycles.

Bringing IoT to Property and Casualty

Metromile exemplifies a Property and Casualty insurer with a modern, transformative model for personal auto insurance. Using a smartphone app and an Internet of Things (IoT) telematic device called Pulse that plugs into a car’s diagnostic port, Metromile owns the customer journey. Data-driven insights and services embody the digital experience for customers beyond common coverages: gas usage optimization, engine light demystification, parked vehicle location, and street sweeping parking ticket avoidance.

An obvious technological challenge for system-of-record businesses like Property and Casualty insurers is real-time processing at scale. Even with hybrid (on-premises and cloud) datacenter infrastructures, many enterprise messaging and database technologies struggle to maintain linear scale at commodity costs when attempting to process, analyze, learn, and predict from streaming data in real time.

Why Enterprise Systems Struggle to Adapt to Real Time

The reasons enterprise systems struggle to adapt to real time include:

- Event-based messaging and service-oriented architectures remain overly verbose and complex for internal and external integrations.
- Batch jobs that extract, transform, and load data require additional computing resources to schedule, coordinate, and monitor the jobs.
- Disk-based databases read and write only as fast as the best non-volatile solid state disks and IO caches perform.

Light at the End of the Tunnel

In examining the state of real time through the lens of the Property and Casualty insurance industry, there is good news! Competitors are taking notice of the technology behind usage-based insurance. As of 2016, several US insurers (Nationwide, Progressive, and State Farm) underwrite auto insurance policies that require a telematic device. To better segment risk profiles and enhance claim processing, 36% of auto insurers expect to implement usage-based insurance products by 2020. This trend is representative of enterprise businesses looking to benefit from IoT devices and from operating in real time. For enterprises looking to compete today, real-time technology is available on commodity hardware.

With distributed, log-based messaging systems like Apache Kafka paired with a real-time database like SingleStore, today’s traditional enterprises can eliminate batch jobs, reduce integration complexity, improve network operations, and replace disk-based I/O with in-memory operations that are orders of magnitude faster. Such systems produce, consume, and ingest millions of events per second while simultaneously analyzing, learning, predicting, and responding to real-time data. Most importantly, they do it at linear scale, meaning the cost of scaling grows in proportion to data and services. The only question now is how enterprises in the Property and Casualty insurance industry, and in many other industries, will harness the power of massively parallel, distributed, in-memory SingleStore database technology to deliver real-time products and solutions to their customers.
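To make this concrete, here is a minimal sketch of what a streaming telematics table and a real-time risk query over it might look like in SQL. The table name, columns, and thresholds are hypothetical illustrations, not Metromile’s or any insurer’s actual schema.

```sql
-- Hypothetical telematics table: one row per reading from an in-car device,
-- fed continuously by a streaming source such as Kafka.
CREATE TABLE telematics_readings (
    vehicle_id BIGINT,
    policy_id BIGINT,
    reading_ts DATETIME,
    speed_kph DOUBLE,
    hard_brake TINYINT,     -- 1 if the reading recorded a hard-braking event
    odometer_km DOUBLE,
    KEY (vehicle_id, reading_ts)
);

-- A dashboard or pricing engine can query the freshest data the moment it lands:
-- hard-braking events per policy over the last 15 minutes.
SELECT policy_id,
       COUNT(*) AS hard_brake_events
FROM telematics_readings
WHERE hard_brake = 1
  AND reading_ts >= NOW() - INTERVAL 15 MINUTE
GROUP BY policy_id
ORDER BY hard_brake_events DESC;
```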
Read Post
SQL: The Technology That Never Left Is Back!
Trending

SQL: The Technology That Never Left Is Back!

The Prelude

The history of SQL, or Structured Query Language, dates back to 1970, when E.F. Codd, then of IBM Research, published a seminal paper titled “A Relational Model of Data for Large Shared Data Banks.” Since then, SQL has remained the lingua franca of data processing, helping build the relational database market into a $36 billion behemoth.

The Rise and Fall of NoSQL

Starting in 2010, many companies developing datastores tossed SQL out with the bathwater after seeing the challenges in scaling traditional relational databases. A new category of datastores emerged, claiming a new level of scalability and performance. But without SQL, they found themselves at a loss for enabling easy analytics. Before long, it was clear that NoSQL carried many hidden costs.

The Comeback That Never Left

More recently, many point to a SQL comeback, although the irony is that it never left. In a piece last week on 9 enterprise tech trends for 2017 and beyond, InfoWorld Editor in Chief Eric Knorr notes on trend number 3, “The incredible SQL comeback”:

“For a few years it seemed like all we did was talk about NoSQL databases like MongoDB or Cassandra. The flexible data modeling and scale-out advantages of these sizzling new solutions were stunning. But guess what? SQL has learned to scale out, too – that is, with products such as ClustrixDB, DeepSQL, SingleStore, and VoltDB, you can simply add commodity nodes rather than bulking up a database server. Plus, such cloud database-as-a-service offerings as Amazon Aurora and Google Cloud SQL make the scale-out problem moot. At the same time, NoSQL databases are bending over backward to offer SQL interoperability. The fact is, if you have a lot of data then you want to be able to analyze it, and the popular analytics tools (not to mention their users) still demand SQL. NoSQL in its crazy varieties still offers tremendous potential, but SQL shows no sign of fading. Everyone predicts some grand unification of SQL and NoSQL. No one knows what practical form that will take.”

Taking the ‘no’ out of NoSQL

In the article “Who took the ‘no’ out of NoSQL?”, Matt Asay writes, “In the wake of the NoSQL boom, we’re seeing a great database convergence between old and new. Everybody wants to speak SQL because that’s where the primary body of skills reside, given decades of enterprise build-up around SQL queries.” The article, which interviews a host of NoSQL specialists, reminds us of the false conventional wisdom that SQL doesn’t scale. Quoting a former MongoDB executive, Asay notes, “But the biggest benefit of NoSQL, and the one that RDBMSes have failed to master, is its distributed architecture.” The reality is that legacy vendors have had trouble applying scale to their relational databases. However, new companies using modern techniques have shown it is very possible to build scalable SQL systems with distributed architectures.

SQL Reigns Supreme in Amazon Web Services

There is no better bellwether for technology directions these days than Amazon Web Services, and the statistics shared by AWS tell the story. In 2015, Andy Jassy, CEO of Amazon Web Services, noted that the fastest growing service in AWS was the data warehouse offering Redshift, based on SQL. In 2016, he noted that the fastest growing service in AWS was the database offering Aurora, also based on SQL. And one of the newest services, AWS Athena, delivers SQL on S3.

This offering is conceptually similar to the wave of ‘SQL as a layer’ solutions developed by Hadoop purveyors so customers could have easy access to unstructured data in HDFS. Lo and behold, there were simply not enough MapReduce experts to make sense of the data. AWS has recognized a similar analytics conundrum with S3, whose growth has been so strong that object stores appear to be becoming the new data lakes. And what do you do when you have lots of data to examine and want to do so easily? You add SQL.

SQL Not Fading Away

Nick Heudecker, Research Director in the Data and Analytics group at Gartner, put his finger on it recently: “Each week brings more SQL into the NoSQL market subsegment. The NoSQL term is less and less useful as a categorization.” — Nick Heudecker (@nheudecker) November 8, 2016

Without a doubt, the data management industry will continue to debate the wide array of approaches possible with today’s tools. But if we’ve learned one thing over the last five years, it’s that SQL never left, and it remains as entrenched and important as ever.
Read Post
Everything We’ve Known About Data Movement Has Been Wrong
Engineering

Everything We’ve Known About Data Movement Has Been Wrong

Data movement remains a perennial obstacle in systems design. Many talented architects and engineers spend significant amounts of time working on data movement, often in the form of batch Extract, Transform, and Load (ETL). In general, batch ETL is the process everyone loves to hate; put another way, I’ve never met an engineer happy with their batch ETL setup. In this post, we’ll look at the shift from batch to real time, the new topologies required to keep up with data flows, and the messaging semantics required to be successful in the enterprise.

The Trouble with Data Movement

There is an adage in computing that the best operations are the ones you do not have to do. Such is the thinking with data movement: less is more. Today, a large portion of time spent on data movement still revolves around batch processes, with data transferred on a periodic basis between one system and the next. However, Gartner states, “Familiar data integration patterns centered on physical data movement (bulk/batch data movement, for example) are no longer a sufficient solution for enabling a digital business.” And this comical Twitter message reflects the growing disdain for a batch-oriented approach: “I hate batch processing so much that I won’t even use the dishwasher. I just wash, dry, and put away real time.” — Ed Weissman (@edw519) November 6, 2015

There are a couple of ways to make batch processing go away. One involves moving to robust database systems that can process both transactions and analytics simultaneously, essentially eliminating the ETL process for applications. Another way to reduce the time spent on batch processing is to shift to real-time workflows. While this does not change the amount of data going through the system, moving from batch to real time helps normalize compute cycles, mitigates traffic surges, and provides timely, fresh data to drive business value. Executed well, this initiative can also reduce the time data engineers spend moving data.

The Enterprise Streaming Opportunity

Most of the discussion regarding streaming today is happening in areas like the Internet of Things, sensors, web logs, and mobile applications. But that is really just the tip of the iceberg.
Read Post
How Kellogg Reduced 24-Hour ETL to Minutes and Boosted BI Speed by 20x
Case Studies

How Kellogg Reduced 24-Hour ETL to Minutes and Boosted BI Speed by 20x

Background About Kellogg

Kellogg Company is the world’s leading cereal company, the second largest producer of cookies, crackers, and savory snacks, and a leading North American frozen foods company. With 2015 sales of $13.5 billion, Kellogg produces more than 1,600 foods across 21 countries and markets its many brands in 180 countries.

Driving Revenue with Customer Logistics Data

Kellogg relies on customer logistics data to make informed decisions and improve efficiencies around shopping experiences. The accuracy and speed of this data are directly tied to the profitability of Kellogg’s business.

Leveraging In-Memory for Faster Access to Data

Making data readily available to business users is top-of-mind for Kellogg, which is why the company sought an in-memory solution to improve data latency and concurrency. Starting with an initiative to speed access to customer logistics data, Kellogg turned to SingleStore to make its 24-hour ETL process faster. In an interview at Strata+Hadoop World, JR Cahill, Principal Architect for Global Analytics at Kellogg, said: “We wanted to see how we could transform processes to make ourselves more efficient and start looking at things more intraday rather than weekly to make faster decisions.”

Results: Reducing Latency from 24 Hours to Minutes

JR and team scaled up their SingleStore instance in AWS and, within two weeks, reduced the ETL process to an average of 43 minutes. On top of that, the team added three years of archiving into SingleStore, a feat not possible with their previous system, while maintaining an average ETL time of 43 minutes.
Read Post
SingleStore Manages Smart Meter Data with Leading Gas and Electric Utility Enterprise
Case Studies

SingleStore Manages Smart Meter Data with Leading Gas and Electric Utility Enterprise

Smart gas and electric meters produce huge volumes of data. A small SingleStore cluster of five nodes easily handles workloads of this scale, such as those from leading gas and electric utility enterprises.
Read Post
SingleStore, Tableau, and the Democratization of Data
Trending

SingleStore, Tableau, and the Democratization of Data

“We love fast databases. It makes the experience of interacting with your database that much more enjoyable.” – Tableau

Today’s business decisions are about seconds, not minutes. To accommodate this trend, businesses have moved to evidence-backed decision making and widespread data access. Modern business intelligence tools abound, making it easier for the average analyst to create compelling visualizations. In this post, I’ll address how this new mode of thinking about data, the Democratization of Data, comes with two challenges: making data easily available and making it actionable in real time.

Making Data Available

Companies are migrating to a new model of data distribution: shared access to a centralized database with both historical and real-time data. This is a far cry from the traditional approach of many small database instances with stale data, isolated silos, and limited user access. Now, raw data is available to everyone. Employees are empowered to dive into the data, discover new opportunities, and close efficiency gaps in a way that has never been possible before. The need for immediate data, coupled with scalability, has attracted many developers to in-memory, clustered databases.

Making Data Actionable in Real Time

Innovations in data visualization have produced powerful, usable tools that give companies the opportunity to be data-driven. One tool we see embedded across different industries is Tableau. With its mature ecosystem and rich feature set, the business intelligence platform makes it easy for individuals to create compelling, interactive data visualizations. It is an attractive package across business levels because it does not require expertise or a degree in visual design or information systems. Any user can create meaningful, actionable dashboards providing views of the business from thirty thousand feet as well as at ground level. But even with a Tableau license in hand, users still face issues: the dashboards are slow or the data is stale. The problem often lies in the database layer. Data must be up to date to be relevant to today’s fast-moving business operations. Common issues include:
Read Post
Five Talks for Building Faster Dashboards at Tableau Conference
Trending

Five Talks for Building Faster Dashboards at Tableau Conference

Tableau Conference 2016 kicks off in Austin, Texas on November 7-11, offering data engineers and business intelligence pros a place to gather and learn how to use data to tell a story through analytics and visualizations. At TC16, SingleStore will be exhibiting its native, high-performance connectivity with Tableau through the Tableau SingleStore connector. Additionally, SingleStore will present a new showcase application, SingleStore Springs: Real-Time Resort Demographic Analysis. This demonstration showcases live customer behavior by demographic across resort properties, visualized with a Tableau dashboard. Attendees can visit the SingleStore booth to learn how to natively connect Tableau to SingleStore for enhanced dashboard performance and scalability.
Read Post
Getting to Exactly-Once Semantics with Apache Kafka and SingleStore Pipelines (Webcast On-Demand)
Product

Getting to Exactly-Once Semantics with Apache Kafka and SingleStore Pipelines (Webcast On-Demand)

The urgency for IT leaders to bring real-time analytics to their organizations is stronger than ever. For these organizations, the ability to start with fresh data and combine streaming, transactional, and analytical workloads in a single system can revolutionize their operations. When moving from batch to real time, data architects should carefully consider what type of streaming semantics will optimize their workload. The table below highlights the nuances among the different types of streaming semantics.

Understanding Streaming Semantics
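The comparison table itself is not reproduced in this excerpt. As a rough stand-in, and assuming it covered the standard categories, the distinctions are:

- At-most-once: each message is delivered zero or one times; data can be dropped but is never duplicated.
- At-least-once: each message is delivered one or more times; nothing is dropped, but consumers must tolerate or deduplicate repeats.
- Exactly-once: each message is reflected in the destination exactly once, typically achieved by committing the message offset and the data write together in a single transaction.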
Read Post
Real-Time Roadshow Rolls into Phoenix, Arizona
Trending

Real-Time Roadshow Rolls into Phoenix, Arizona

We’re packing our bags and heading to the Southwest to kick off the first-ever SingleStore Real-Time Roadshow! Healthcare, education, aerospace, finance, technology, and other industries play a vital role in Phoenix, home to leading corporations like Honeywell, JP Morgan, AIG, American Express, Avnet, and UnitedHealth Group. Businesses in these industries face the constant challenge of keeping up with the high expectations of users and consumers who demand personalized and immediate services. To meet these challenges and elevate their businesses above the competition, industry leaders and data engineers in the Phoenix area are embracing real-time applications. We’re bringing the Real-Time Roadshow to the capital of Arizona to connect directly with this vibrant community of businesses and developers looking to pursue and learn more about real-time initiatives. Through a series of in-depth technical sessions and demonstrations, this event gives data professionals and data leaders an opportunity to investigate the power of real-time solutions.

Here’s what you will learn:

- Forces driving the need for real-time workloads
- How to process and translate millions of data points into actionable insights
- How to drive new revenue and cut operating costs with real-time data
- How predictive analytics gives companies a competitive advantage in anticipating outcomes
- Top data architectures for real-time analytics and streaming applications
- Use cases and examples from companies building real-time applications

Speaking Sessions

Driving the On-Demand Economy with Predictive Analytics: SingleStore CTO and co-founder Nikita Shamgunov demonstrates how a real-time trinity of technologies (Apache Kafka, Apache Spark, and SingleStore) enables companies to power their businesses with predictive analytics and real-time applications.

Real-Time Analytics with SingleStore and Apache Spark: SingleStore Engineer Neil Dahlke dives deep into how Pinterest measures real-time user engagement in this technical demonstration, which leverages Spark to enrich streaming data with geolocation.
Read Post
Election 2016: Analyzing Real-Time Twitter Sentiment with SingleStore Pipelines
Data Intensity

Election 2016: Analyzing Real-Time Twitter Sentiment with SingleStore Pipelines

November is nearly upon us, with the spotlight on Election 2016. This election has been amplified by millions of digital touchpoints. In particular, Twitter has risen in popularity as a forum for voicing individual opinions as well as for tracking statements directly from the candidates. Pew Research Center states that “In January 2016, 44% of U.S. adults reported having learned about the 2016 presidential election in the past week from social media, outpacing both local and national print newspapers.” The first 2016 presidential debate between Donald Trump and Hillary Clinton “was the most-tweeted debate ever. All told, there were 17.1 million interactions on Twitter about the event.” By now, most people have probably seen both encouraging and disparaging tweets about the two candidates, Hillary Clinton and Donald Trump. Twitter has become a real-time voice for the public watching along with debates and campaign announcements. We wanted to home in on the sentiments expressed in real time. Using Apache Kafka, SingleStore, machine learning, and our Pipelines Twitter Demo as a base, we are bringing real-time analytics to Election 2016.
Read Post
SingleStore Opens New Office in Second Tech Hub: Seattle, WA
Company

SingleStore Opens New Office in Second Tech Hub: Seattle, WA

Behind the scenes of the world’s leading companies in finance, retail, media, and energy sits SingleStore, the operational data warehouse powering real-time data ingest and analytics. At SingleStore, hiring exceptional talent drives innovation in real-time technology and enables us to advance the state of the art in databases. We hire top engineers from universities such as MIT, Stanford, and Carnegie Mellon, as well as from companies like Facebook, Microsoft, Oracle, and Google. These academic and career accomplishments distinguish the SingleStore workforce and shape our hiring criteria for new talent. Today, we’re announcing the opening of a brand-new SingleStore office in Seattle, Washington, part of a concerted effort to attract the best talent. The SingleStore team occupies the 26th floor of Smith Tower, a historic architectural landmark at the heart of Pioneer Square, a vibrant neighborhood with a rich history. The office offers a breathtaking 360-degree view of Puget Sound, Mt. Rainier, the Space Needle, and downtown Seattle.
Read Post
The Case for Advanced Analytics
Data Intensity

The Case for Advanced Analytics

Analytics is trending like never before, and with good reason. Organizations spanning multiple markets, from finance and energy to retail, ecommerce, media, and communications, recognize the ability to quickly apply actionable insights using real-time analytics as a game-changer. Simply put, anticipating and adapting to rapidly evolving market dynamics as they occur is now a competitive imperative.

How Advanced Analytics Drives Critical Success

According to a recent Gartner report titled How Data Science Teams Leverage Advanced Analytics, “More than 50% of the surveyed users reported that senior executives see advanced analytics as critical to the organization’s success.” The report goes on to add, “Growing pains aside, more organizations are investigating potential applications for advanced analytics than ever before. Gartner’s inquiries on the topic nearly doubled from January 2014 to January 2015, and are projected to grow 26% year over year in 2016.”

Choosing the Right Advanced Analytics Vendor
Read Post
A Flying Pig and the Zen of Database Proof of Concepts
Trending

A Flying Pig and the Zen of Database Proof of Concepts

A customer asks potential vendors: I need a pig that can fly. Whoever can get me one wins the deal.

Vendor 1, the Engineer, says, “There is no such thing as a flying pig. Do not waste our time. We are not interested.”

Vendor 2, the Geneticist, says, “I am going to create a new species of pig – one with wings.” He goes to work on a flying pig. He never comes back.

Vendor 3, the Practical One, says, “Flying pig indeed! Yes, we can get you one.” Vendor 3 takes a drone, makes it look like a pig on the outside, and flies it.

The approach that Vendor 3 takes is a classic example of redefining the problem or finding a suitable workaround to solve an issue. Executing a database proof of concept has similar themes:

- There are no perfect databases, and no perfect workloads either.
- Real-world scenarios are for the most part models that have been built over many years. You would be hard-pressed to find a good data model and well-written queries.
- Ideally, you tweak the database to suit the workload. The alternative is attractive, but time-consuming and requires customer buy-in.
- Time is always short. Innovating workarounds to known limitations and focusing on strengths is important.
- Solutions that are realistic, simple, and effective work well for the majority. Do not let perfection become the enemy of the good.

Winning a database proof of concept requires the following steps:

1. Understand the data and the workload. By peeking into the contents, you gain insight into the actual business use case. By knowing the relations and the basic thought process that went into building the model, you are in a better position to make changes as needed. This is the hardest step and takes the most effort and time. However, the payoff is well worth the hard work: winning the customer’s confidence.
2. Load the data. This is by far the easiest part. As you load data, you handle prerequisites such as gathering stats, choosing the right partition strategy, and indexing.
3. Execute the workload. It gets more interesting here. At this point, you know whether your database engine can deliver out of the box or needs tweaks. If you followed step 1, you have the in-depth knowledge to solve problems or make alterations.

Unfortunately, most of us who have been in the industry long enough, including myself, have biases and preconceived notions. These biases can hinder your ability to find creative solutions to problems. To quote Bruce Lee, “Empty your cup so that it may be filled; become devoid to gain totality.” An open mind makes us more willing to consider alternatives. By locking ourselves up, we limit our capabilities. Our preconceived limitations define us and box us in. Once you have executed the workload and identified how to meet your customer’s requirements, the next step is to package up the results and present them.

Converting a Successful Proof of Concept to a Deal Is the Next Challenge

I have done enough proofs of concept to realize that the winner is rarely the best-engineered solution. Economics trumps everything, which means cost-effective solutions that meet most of the customer’s requirements tend to win the deal. To summarize, the ability to innovate, adapt, and be flexible more or less wins the deal.

On a closing note: being a Star Trek fan, every time I run into a pickle with a proof of concept, I think back to the Kobayashi Maru training exercise. From Wikipedia (edited): “The Kobayashi Maru is a training exercise in the Star Trek universe designed to test the character of Starfleet Academy cadets in a no-win scenario. The test’s name has come to describe a no-win scenario, a test of one’s character, or a solution that involves redefining the problem.”
Read Post
SingleStore Pipelines: Real-Time Data Ingestion with Exactly-Once Semantics
Engineering

SingleStore Pipelines: Real-Time Data Ingestion with Exactly-Once Semantics

Today we launched SingleStoreDB Self-Managed 5.5, featuring SingleStore Pipelines, a new way to achieve maximum performance for real-time data ingestion at scale. This implementation enables exactly-once semantics when streaming from message brokers such as Apache Kafka. An end-to-end real-time analytics data platform requires both real-time analytical queries and real-time ingestion, yet it is rare to find a data platform that satisfies both requirements. With the launch of SingleStore Pipelines as a native feature of our database, we now deliver an end-to-end solution from real-time ingest to analytics.

Real-Time Analytical Queries and Data Ingestion

Let’s define real-time analytical queries and real-time data ingestion separately. A data platform that supports real-time analytical queries quickly returns results for sophisticated analytical queries, which are usually written in SQL with many complex JOINs. Execution of real-time analytical queries differentiates SingleStore from competitors. In the past year, Gartner Research recognized SingleStore as the number one operational data warehouse and awarded the company Visionary placements in the Operational Database and Data Warehouse Magic Quadrants.

A data platform that supports real-time ingestion can instantly store streaming data from sources like web traffic, sensors on machines, or edge devices. SingleStore Pipelines ingests data at scale in three steps: first, performantly pulling from data sources (Extract); second, mapping and enriching the data (Transform); and finally, loading the data into SingleStore (Load). All of this occurs within a single pipeline inside the database. The transactional nature of Pipelines sets it apart from other solutions: streaming data is atomically committed in SingleStore, and exactly-once semantics are ensured by storing metadata about each pipeline in the database.
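As a rough sketch of how the Extract, Transform, and Load steps map onto a pipeline definition: the broker address, topic, and table below are hypothetical, and clause details may differ from the shipped syntax, so treat this as an outline and check the Pipelines documentation for specifics.

```sql
-- Extract: continuously pull messages from a Kafka topic (hypothetical broker and topic).
-- Transform: an optional user-supplied script can map or enrich each batch.
-- Load: rows are committed atomically together with the pipeline's offset metadata,
--       which is what provides exactly-once semantics.
CREATE PIPELINE clicks_pipeline AS
    LOAD DATA KAFKA 'kafka-broker.example.com:9092/web_clicks'
    INTO TABLE web_clicks
    FIELDS TERMINATED BY ',';

START PIPELINE clicks_pipeline;
```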
Read Post
SingleStore and Oracle: Better Together
Trending

SingleStore and Oracle: Better Together

Oracle OpenWorld 2016 kicks off on September 18th in San Francisco with ten tracks, including a Data Center track highlighting database innovation from both SingleStore and Oracle. We built SingleStore to be a flexible ecosystem technology, as exemplified by several features. First, we offer users flexible deployments, whether hybrid cloud, on-premises, VMs, or containers. Second, our connector tools, such as the SingleStore Spark Connector and Streamliner, are open source and let you build real-time pipelines and import from popular datastores like HDFS, S3, and MySQL. And SingleStore is a memory-first engine, designed for concurrent data ingest and analytics. These ingredients make SingleStore a perfect real-time addition to any stack. Several of our customers combine SingleStore with traditional systems, in particular Oracle databases. SingleStore and Oracle can be deployed side by side to enhance scalability, distributed processing, and real-time analytics.

Three Ways SingleStore Complements Oracle

1. SingleStore as the Real-Time Analytics Engine: data can be copied from Oracle to SingleStore using a data capture tool, and analytical queries can be performed in real time.
Read Post
MemEx: Predictive Analytics for Global Supply Chain Management
Data Intensity

MemEx: Predictive Analytics for Global Supply Chain Management

The Internet of Things (IoT) produces staggering amounts of data daily. Real-time analysis of this data helps businesses address the demands of consumers in today’s always-on economy. Supply chain management exemplifies IoT impact on manufacturing industries. With a multitude of moving parts like vehicles, shipping containers, and packages functioning as sources of data, companies need more advanced methods for ingesting and analyzing IoT data.
Read Post
The Path to Predictive Analytics and Machine Learning – Free O’REILLY Book
Data Intensity

The Path to Predictive Analytics and Machine Learning – Free O’REILLY Book

Organizations once waited hours, days, or even weeks to get a handle on their data. In an earlier era, that sufficed. But with today’s endless stream of zeros and ones, data must be usable right away. It’s the crux of decision making for enterprises competing in the modern era. Recognizing cross-industry interest in massive data ingest and analytics, we teamed up with O’Reilly Media on a new book: The Path to Predictive Analytics and Machine Learning. In this book, we share the latest step in the real-time analytics journey: predictive analytics, and a playbook for building applications that take advantage of machine learning. Free Download Here: The Path to Predictive Analytics and Machine Learning What’s Inside?
Read Post
Real-Time Analytics with Kafka and SingleStore
Data Intensity

Real-Time Analytics with Kafka and SingleStore

Connected devices, IoT, and on-demand user expectations push enterprises to deliver instant answers at scale. Applications that anticipate customer needs and fulfill expectations for fast, personalized services win the attention of consumers. Perceptive companies have taken note of these trends and are turning to memory-optimized technologies like Apache Kafka and SingleStore to power real-time analytics.

High-Speed Ingest

Building real-time systems begins with capturing data at its source using a high-throughput messaging system like Kafka. Taking advantage of a distributed architecture, Kafka scales producers and consumers by simply adding servers to a given cluster. Its effective use of memory, combined with a commit log on disk, provides ideal performance for real-time pipelines and durability in the event of server failure. From there, data can be transformed and persisted to a database like SingleStore.

Fast, Performant Data Storage

SingleStore persists data from real-time streams coming from Kafka. By combining transactions and analytics in a memory-optimized system, data is rapidly ingested from Kafka and persisted to SingleStore. Users can then build applications on top of SingleStore, which supplies those applications with the most recent data available.

We teamed up with the folks at Confluent, the creators of Apache Kafka, to share best practices for architecting real-time systems at our latest meetup. The video recording and slides from that session are now available below.

Meetup Video Recording: Real-Time Analytics with Confluent and SingleStore

Watch now to:

- See a live demo of our new showcase application for modeling predictive analytics for global supply chain management
- Learn how to architect systems for IoT streaming data ingestion and real-time analytics
- Learn how to combine Kafka, Spark, and SingleStore for monitoring and optimizing global supply chain processes with real-time analytics
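To illustrate the “most recent data available” point, here is a minimal sketch of the kind of always-fresh dashboard query an application might run against a table fed from Kafka. The table and columns are hypothetical.

```sql
-- Hypothetical table receiving engagement events streamed in from Kafka.
CREATE TABLE user_events (
    event_ts DATETIME,
    user_id BIGINT,
    event_type VARCHAR(32),
    KEY (event_ts)
);

-- Per-minute engagement counts over the last hour; because ingest and query
-- share one system, the newest rows show up in the result immediately.
SELECT DATE_FORMAT(event_ts, '%Y-%m-%d %H:%i:00') AS minute_bucket,
       event_type,
       COUNT(*) AS events
FROM user_events
WHERE event_ts >= NOW() - INTERVAL 1 HOUR
GROUP BY minute_bucket, event_type
ORDER BY minute_bucket DESC;
```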
Read Post
Why Role-Based Access Control Is Essential to Database Security
Trending

Why Role-Based Access Control Is Essential to Database Security

As repositories of highly sensitive, confidential, and valuable business data, databases are the crown jewels of every organization. Successful businesses must not only supply accurate and timely data, they must protect it as well. Security provides a critical competitive edge for any high-functioning database, so database providers must prioritize protecting data in order to earn customers who can trust the systems guarding their valuable information.

In our latest enterprise release, SingleStoreDB Self-Managed 5.1, we added Role-Based Access Control (RBAC) as a powerful tool to protect customer data. With this new security feature, SingleStore customers can easily scale to tens of thousands of users and roles without compromising performance. RBAC provides high scalability and enhanced control over user access to data, which suits intensive workloads like those generated by the Internet of Things. SingleStoreDB Self-Managed 5.1 brings enterprise-level security and performance at scale to real-time analytics, proving that customers should never have to sacrifice security for speed.

Findings in the 2016 Verizon Data Breach Investigations Report underscore the case for RBAC as a robust shield against unauthorized access to secure data. Of the ten incident classification patterns cited in the report, privilege misuse ranks among the most common sources of data breaches, along with web app attacks, denial-of-service, crimeware, and cyber-espionage.

“Incident Classification Patterns”
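The statements below sketch the role-to-group pattern that lets access control scale to thousands of users. The names are hypothetical and the statement forms are generic SQL-style rather than verified SingleStore syntax, so check the RBAC documentation before use.

```sql
-- Define a privilege set once, as a role.
CREATE ROLE sensor_readonly;
GRANT SELECT ON sensors_db.* TO ROLE sensor_readonly;

-- Attach the role to a group, then manage membership instead of per-user grants;
-- adding the ten-thousandth user is the same one-line operation as adding the first.
CREATE GROUP field_analysts;
GRANT ROLE sensor_readonly TO GROUP field_analysts;
GRANT GROUP field_analysts TO USER 'jsmith';
```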
Read Post
SingleStore Geospatial Operations Perform 2x – 24x Faster Than Alternatives
Product

SingleStore Geospatial Operations Perform 2x – 24x Faster Than Alternatives

The mobile revolution, epitomized by the rise of GPS-enabled smartphones, is transforming our lives: the way we travel, connect with like-minded people, track our goods and shipments, manage traffic congestion, get threat alerts, and hunt for Pokemon. Many estimates place the number of smartphone subscribers at 2 billion, a number expected to grow to 6.1 billion by 2020. Each of these devices serves as a GPS-data-emitting mobile sensor, providing geospatial data that can be used to track the movement of people, vehicles, and merchandise. Though this data presents rich opportunities for businesses to extract valuable insights and operational efficiencies, the sheer volume requires a modern scale-out analytic solution that can process it without delay for timely, actionable intelligence.

SingleStore is designed from the ground up as a massively parallel scale-out solution for real-time analytics with built-in support for geospatial queries. While geospatial capabilities are also available through other solutions, only SingleStore offers them in the context of an ACID-compliant, in-memory, scale-out operational data warehousing solution. SingleStore offers strong transactional semantics while ElasticSearch as a datastore offers no support for transactions at all, but both offer powerful geospatial analytics, so it seems appropriate to benchmark SingleStore against ElasticSearch’s geolocation capabilities.

In this analysis, we compare SingleStore geospatial query performance with ElasticSearch geolocation for tracking vehicles and creating alerts when the vehicles enter certain geofences. The alerts can be used to improve logistics and security, reduce congestion, and monitor vehicle fleets. We simulate an urban scenario with 10M-100M vehicles generating geospatial data points and 2,000 geofences. The system is required to ingest updated geospatial data points from vehicles and identify vehicles that show up within specified geofences.

SingleStore queries for creating the schema for geofences and vehicles:

```sql
CREATE REFERENCE TABLE IF NOT EXISTS locations (
    id integer primary key,
    name varchar(128),
    polygon geography DEFAULT NULL
);

CREATE TABLE IF NOT EXISTS records (
    id integer primary key,
    location geographypoint,
    key(location)
);
```

SingleStore query for updating vehicle geolocations:

```sql
INSERT INTO records (id, location)
VALUES (id, location pairs)
ON DUPLICATE KEY UPDATE location = VALUES (location);
```

SingleStore query for checking the intersection of vehicles with geofences:

```sql
SELECT r.id, r.location, l.id
FROM records r, locations l
WHERE l.id = shape_id
  AND geography_intersects(r.location, l.polygon);
```

ElasticSearch geolocation mapping (schema) for geofences and vehicles:

```json
{
  "locations": {
    "properties": {
      "name": { "type": "string" },
      "polygon": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "1m"
      }
    }
  }
}
{
  "driver": {
    "dynamic": "strict",
    "properties": {
      "location": {
        "type": "geo_shape",
        "tree": "quadtree",
        "points_only": "true",
        "precision": "1m"
      }
    }
  }
}
```

ElasticSearch query for inserting a vehicle geolocation:

```json
{ "location": { "type": "point", "coordinates": [latitude, longitude] } }
```

ElasticSearch query for checking the intersection of vehicles with geofences:

```json
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "id": shape_id,
              "type": "locations",
              "index": "locations",
              "path": "polygon"
            }
          }
        }
      }
    }
  }
}
```

We ran SingleStoreDB Self-Managed 5.1 and ElasticSearch 2.3.5 on 1, 5, and 10 node clusters of m4.10xl instances on AWS. Our benchmarking results show that on 10 nodes, SingleStore is 24x faster than ElasticSearch, in rows per second, for queries that update vehicle geolocations.
Read Post
The Changing Face of the Modern CIO
Trending

The Changing Face of the Modern CIO

The role of Chief Information Officer (CIO) first broke onto the scene in 1981. Today, thirty-five years later, the responsibilities of the CIO have radically changed. The original CIO served as a senior executive in an enterprise, responsible for the information technology and computer systems that supported enterprise goals. However, as today’s business needs rapidly change, so too does the role of the modern CIO. CIOs must adapt or they will get left behind with legacy systems.

The modern CIO is expected to take on multiple responsibilities, including:

- Managing platforms and systems such as data governance, mobility, and cloud infrastructure
- Investing in security, improved database speed and access, big data analytics, and integration
- Identifying trends, threats, and partners that align with business goals
- Making sure a company’s data is clean, accessible, easy to understand, and secure
- Hiring and developing talent

CIOs now face many challenges, since IT plays an even more important role in core business strategy than in previous years. Managing systems with methods like data analysis and cloud infrastructure allows CIOs to operate with an agile development mentality and be more fluid in identifying and implementing new business operations. While much of the CIO’s responsibility has shifted away from managing server farms in a closet to managing a cloud, hardware is just as important today with the emergence of the Internet of Things (IoT). CIOs can now use IoT to gather valuable data across an entire logistics operation. For example, sensors can be placed on shipping containers and vehicles to analyze trip data, which can lead to more efficient shipping routes and greater cost savings. Instead of being a back-office executive, CIOs must use their influence over new technologies to identify cost-saving opportunities or create additional revenue streams. CIOs, with their knowledge of modern technological trends, become responsible for maintaining their company’s competitive edge. The responsibilities of modern CIOs working at the forefront of technology and information systems are changing rapidly, and those who do not adapt will quickly fall behind.

To learn more about the changing landscape of IT and to network with over 100 IT executives, join us at the HMG 2016 CIO Executive Leadership Summit in Denver, CO on September 1, 2016. The speakers for this event include CIOs pushing the boundaries of what’s possible: Rob Dravenstott from DISH Network, Stephen Gold from CVS Health, and Renee Arrington from Pearson Partners International, Inc. At the Summit, the SingleStore team will be available to talk about analyzing real-time data to optimize business processes and create new revenue streams. See you there!
Read Post
The Emergence of Operational Data Warehouses
Data Intensity

The Emergence of Operational Data Warehouses

Last month, we announced that SingleStore received the highest score in the Gartner Critical Capabilities Report for the “Operational Data Warehouse Use Case”. While the findings Gartner shared are gratifying, this deserves a deeper dive. For starters, let’s examine how Gartner defines an Operational Data Warehouse: “This use case manages structured data that is loaded continuously in support of embedded analytics in applications, real-time data warehousing, and operational data stores. This use case primarily supports reporting and automated queries to support operational needs, and will require high-availability and disaster recovery capabilities to meet operational needs. Managing different types of users or workloads, such as ad hoc querying and mining, will be of less importance as the major driver is to meet operational excellence.”

In light of these observations, we expect adoption rates for Operational Data Warehouses to climb rapidly. Several megatrends point directly to a growing need for Operational Data Warehouses that take full advantage of real-time transaction processing and big data analytics in an in-memory-optimized architecture. Take the explosive global sensor market, which BCC Research predicts will reach $154.4 billion by 2020. Image, flow, level, biosensor, and chemical sensors will generate a deluge of information across a wide range of parameters. Organizations that can efficiently analyze these data flows in real time to adapt to constantly changing market conditions will have a distinct advantage over competitors who cannot.
Read Post
What is BPF and why is it taking over Linux Performance Analysis?
Engineering

What is BPF and why is it taking over Linux Performance Analysis?

Performance analysis often gets bottlenecked by lack of visibility. At SingleStore, we architected our database so its inner workings are easy to observe. Observability allows our engineers to quickly identify components that need to be faster, and faster components mean the database’s performance skyrockets. These tools also enable support engineers to react quickly and precisely to customer needs. In the spirit of using the best available tools, the performance team is currently evaluating next-generation tooling only recently available in Linux.

The newest tool for observing the Linux operating system is the “Berkeley Packet Filter” (BPF). BPF allows users to run a small piece of code quickly and safely inside the operating system. Originally used for packet filtering, it has since been enhanced beyond its eponymous use case to support dynamic tracing of the Linux operating system. For example, it is possible to write a small BPF program that prints every time a particular file is accessed by a user.

The power of BPF, when used with Userland Statically Defined Tracepoints (USDT), extends beyond the operating system to the database. USDT probes are well-defined locations in the database where BPF programs run, allowing engineers to ask questions that were previously unanswerable. For example, engineers can now examine the interactions between the database and the operating system by running BPF programs in each at the same time.

Adding a USDT static tracepoint is as easy as a single macro call, which declares the probe and its arguments. This probe fires when each query is executed and records the query string:

```c
DTRACE_PROBE1(memsqld, querystart, query);
```

To use this USDT probe, we need to attach a BPF program to it. We write our program in C and use the BPF Compiler Collection (BCC) to compile it to BPF and attach it to our probes. The following BPF script traces queries and records their latencies:

```c
BPF_HASH(pid_to_start_hash, u32);
BPF_HISTOGRAM(latency);

// This function runs each time a query begins. It records the current time stamp
// (`start_ts`) and saves it in the `pid_to_start_hash` hash table.
int querystart(struct pt_regs *ctx)
{
    u64 start_ts = bpf_ktime_get_ns();
    u32 pid = bpf_get_current_pid_tgid();
    pid_to_start_hash.update(&pid, &start_ts);
    return 0;
}

// This function runs at the end of each query. Look up the saved start timestamp
// (`start_ts`) for the current thread's id (pid) using the hash table
// (`pid_to_start_hash`) and record the elapsed time (`delta_ms`) in the latency
// histogram.
int queryend(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 *start_tsp = pid_to_start_hash.lookup(&pid);
    // Edge case: this query began before we started tracing.
    if (!start_tsp)
        return 0;
    u64 delta_ms = (bpf_ktime_get_ns() - *start_tsp) / 1000 / 1000;
    // Take the log of the elapsed time to put into the logarithmic histogram.
    latency.increment(bpf_log2l(delta_ms));
    // Make sure to delete values from the hash table when they are no longer needed.
    pid_to_start_hash.delete(&pid);
    return 0;
}
```

We run `query_latency.py`, a script that wraps the above BPF program using the BCC toolchain, and get a nice histogram of query latencies:

```
$ sudo ./query_latency.py /var/lib/memsql/master-3306/memsqld --histogram
Tracing queries. ^C to exit.
latency (ms):
    value      : count    distribution
    0 -> 1     : 9        |****************************************|
    2 -> 3     : 1        |****                                    |
    4 -> 7     : 1        |****                                    |
    8 -> 15    : 2        |*******                                 |
    16 -> 31   : 1        |****                                    |
    32 -> 63   : 0        |                                        |
    64 -> 127  : 0        |                                        |
    128 -> 255 : 1        |****                                    |
```

Once engineers have the ability to trace when a thread is executing a SingleStore query, they can ask more interesting questions about how the database interacts with Linux. For example, engineers can investigate how long queries spend acquiring locks. With BPF, engineers can instrument the start and end of queries as above, and additionally instrument the futex system call itself (used in Linux to acquire and release locks) to trace how long it takes to acquire locks while executing a query:

```
futex latencies (ms) for 'select count(distinct sid_1g) where...'
    value      : count    distribution
    0 -> 1     : 0        |                                        |
    2 -> 3     : 2        |****                                    |
    4 -> 7     : 2        |****                                    |
    8 -> 15    : 1        |**                                      |
    16 -> 31   : 5        |***********                             |
    32 -> 63   : 17       |****************************************|
```

What about how a query spends its time? On- and off-CPU flamegraphs are helpful, but they are too coarse for query investigations. We instrumented the kernel scheduler tracepoints to conditionally collect information for the threads that queries run on. This tracing tells us how long a query’s thread spends in various states (waiting, running, blocked, I/O, and sleeping).

The power of BPF allows us to inspect our database at runtime and ask precise questions. The increased observability that BPF provides improves the pace of performance work and optimizes customers’ interactions with the SingleStore database. Overall, BPF provides the observability necessary to build a transparent and easily accessible modern in-memory database. Access scripts, documentation, and additional reference information on BCC and BPF [here](https://github.com/memsql/memsql-perf-tools).

After joining the SingleStore performance team this summer, Kyle Laracey will be returning to Brown University in the fall. At Brown, he studies computer science, is a teaching assistant for CS167: Operating Systems, and is expected to graduate in May 2017.
Read Post
Connect Directly to SingleStore with Tableau 10
Product

Connect Directly to SingleStore with Tableau 10

This post originally appeared in Tableau News by Arthur Gyldenege.
Read Post
New Performance Benchmark for Live Dashboards and Fast Updates
Engineering

New Performance Benchmark for Live Dashboards and Fast Updates

- The newest Upsert Benchmark showcases a critical use case for internet billing with telcos, ISPs, and CDNs
- SingleStore achieves 7.9 million upserts per second, 6x faster than Cassandra
- Benchmark details and scripts are now available on GitHub

The business need for fast updates and live dashboards

Businesses want insights from their data, and they want them sooner rather than later. For fast-changing data, companies must rapidly glean insights in order to make the right decisions. Industry applications like IoT telemetry monitoring, mobile network usage, internet service provider (ISP) billing, and content delivery network (CDN) usage tracking depend upon real-time analytics over fast-changing data. Web traffic merits special attention since it continues to grow at an astounding rate. According to Cisco, “Global IP traffic will increase nearly threefold over the next 5 years, and will have increased nearly a hundredfold from 2005 to 2020. Overall, IP traffic will grow at a compound annual growth rate (CAGR) of 22 percent from 2015 to 2020.” Many businesses face the challenge of monitoring, analyzing, and monetizing large-scale web traffic, so we will explore this use case.

Use case example

In particular, we dive into the example of a content delivery or distribution network (CDN). A CDN is a globally distributed network of web servers deployed in multiple data centers across different geographic regions, relied upon by content providers such as media companies and e-commerce vendors to deliver content to end users. CDNs have a business need to monitor their systems in real time. In addition to logging customer usage for billing, they want to be alerted to sudden increases and decreases in their workloads, both for load balancing and for detecting network events like denial-of-service attacks. The sheer volume of web traffic mandates a massively parallel processing (MPP) system that can scale out to support the load. The concurrent need for real-time analytics points in the direction of hybrid transaction/analytical processing, or HTAP. HTAP systems enable high-speed ingest and sophisticated analytics simultaneously, without data movement or ETL.

Background on the Upsert Benchmark

This benchmark demonstrates the raw horsepower of a database system capturing high-volume updates. Update, or upsert, is the operative word here. With a conventional `insert`, a new row is created for each new database entry. With an upsert, individual rows can be updated in place. This upsert capability allows for a more efficient database table and faster aggregations, and it is particularly useful in areas such as internet billing. For more detail on this workload in use, take a look at the blog post Turn Up the Volume With High-Speed Counters. SingleStore delivers efficient upsert performance, achieving up to 8 million upserts per second on a 10-node cluster, using the following parameterized query:

Upsert query for SingleStore

```sql
insert into records (customer_code, subcustomer_id, geographic_region, billing_flag, bytes, hits)
values
on duplicate key update bytes=bytes+VALUES(bytes), hits=hits+VALUES(hits);
```

Comparing upsert performance

Legacy databases and data warehousing solutions are optimized for batch loading of data and are subsequently unable to handle fast data insertions along with ad hoc analysis of freshly generated data. NoSQL databases like Cassandra can handle fast data insertions but have more trouble with upserts, which are critical for monitoring web traffic across end-customer behavior and tracking web requests. More importantly, Cassandra does not provide native support for analytics and requires users to bring in additional components like SparkSQL to support meaningful querying of data. We created the following query for Cassandra:

Upsert query for Cassandra

```
update perfdb.records set hits = hits + 1
where timestamp_of_data=1470169743185
  and customer_code=25208
  and subcustomer_id='ESKWUEYXUKRB'
  and geographic_region=10
  and billing_flag=1
  and ip_address='116.215.6.236';
```

The upsert benchmark is based on a simulated workload that logs web traffic across ten different geographic regions. SingleStoreDB Self-Managed 5.1 runs on a 10-node m4.10xlarge cluster on AWS, at $2.394 per hour (effective pricing with 1-year reserved instances), and is able to execute up to 8 million upserts per second while simultaneously running live queries on the latest data to provide a real-time window on the changing shape of traffic. Cassandra running on an identical cluster achieves 1.5 million upserts per second. We tested the most recent 3.0.8 version of Apache Cassandra. In the Cassandra query, update means upsert. As noted in the following chart, SingleStore scales linearly as we increase the number of machines with a batch size of 500. Cassandra, however, does not appear to support large batch sizes well. The Cassandra configuration warns:

```
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
```

So we set `batch_size_fail_threshold_in_kb: 5000` to support a 10,000-row batch size, but we encountered numerous errors that prevented the benchmark from running on Cassandra with these settings.
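For context, a table along the following lines would support the upsert above. The actual benchmark schema lives in the GitHub scripts mentioned at the top of the post, so treat this as a hypothetical reconstruction based only on the columns the query references.

```sql
-- Hypothetical reconstruction of the benchmark's records table.
-- The primary key is what turns INSERT ... ON DUPLICATE KEY UPDATE into an upsert:
-- repeated traffic for the same customer/region/flag accumulates bytes and hits in place.
CREATE TABLE records (
    customer_code INT,
    subcustomer_id VARCHAR(16),
    geographic_region INT,
    billing_flag INT,
    bytes BIGINT,
    hits BIGINT,
    PRIMARY KEY (customer_code, subcustomer_id, geographic_region, billing_flag)
);
```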
Read Post
Massive Data Ingest and Concurrent Analytics with SingleStore
Engineering

Massive Data Ingest and Concurrent Analytics with SingleStore

The amount of data created in the past two years surpasses all of the data previously produced in human history. Even more shocking, of all that data, only 0.5% is being analyzed and used. To capitalize on the data that exists today, businesses need the right tools to ingest and analyze it. At SingleStore, our mission is to do exactly that: we help enterprises operate in today’s real-time world by unlocking value from data instantaneously. The first step is ingesting large volumes of data at incredible speed. The distributed nature of the SingleStore environment makes it easy to scale up to petabytes of data. Some customers use SingleStore to process 72TB of data a day, or over 6 million transactions per second, while others use it as a replacement for legacy data warehouse environments. SingleStore offers several key features for optimizing data ingest, as well as supporting concurrent analytics.

High Throughput

SingleStore enables high throughput on concurrent workloads. A distributed query optimizer evenly divides the processing workload to maximize the efficiency of CPU usage. Queries are compiled to machine code and cached to expedite subsequent executions. Rather than caching the results of a query, SingleStore caches a compiled query plan to provide the most efficient execution path. The compiled plan does not pre-specify values for the parameters, which allows SingleStore to substitute the values upon request, enabling subsequent queries of the same structure to run quickly even with different parameter values. Moreover, due to the use of Multi-Version Concurrency Control (MVCC) and lock-free data structures, data in SingleStore remains highly accessible, even amidst a high volume of concurrent reads and writes.

Query Execution Architecture

SingleStore has a two-tiered architecture consisting of aggregators and leaves. Aggregators act as load balancers or network proxies through which SQL clients interact with the cluster; they store metadata about the machines in the cluster and the partitioning of the data. Leaves, in contrast, function as storage and compute nodes.
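As a small illustration of the parameterized plan cache described above (the orders table and its columns are hypothetical): the two statements below share one query shape, so the second runs against the already-compiled plan with only the literal swapped in.

```sql
-- First execution compiles machine code for this query shape and caches the plan,
-- with the date literal treated as a parameter.
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= '2016-01-01'
GROUP BY customer_id;

-- Same shape, different literal: the cached plan is reused, skipping compilation.
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= '2016-06-01'
GROUP BY customer_id;
```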
Read Post