Forrester Finds Millions in Savings and New Opportunities in Digital Transformation with SingleStore
Data Intensity

A new wave of digital transformation is in progress, powered by the exponential growth in the volume and complexity of data. To be valuable, data must be collected, stored, analyzed, and operationalized. Forrester has conducted a Total Economic Impact (TEI) analysis showing the savings and opportunities made possible for organizations moving to SingleStore.

To put the savings and benefits into context, Forrester conducted case studies with four SingleStore customers. These customers face many of the data infrastructure problems that prevent organizations from using their data effectively, including: data trapped in multiple silos; stale data; complex data architectures; limited scalability, maintained only through expensive and fragile workarounds; and poor performance, both for specific functions and across the board. The result? Brittle, overly complex data processing systems which, in many cases, are starting to come crashing down.

The Forrester TEI Methodology

The four customers Forrester studied all upgraded to SingleStore to solve specific problems, as described below. They operate in online services, professional services, utilities, and online security services. Each customer had from one to several use cases for SingleStore running during the study period. Forrester then applied its TEI methodology, scaling the results to a representative composite organization with 15,000 employees and $3B in revenues. The results were impressive: $15M in cost savings and new benefits across several initiatives, plus new opportunities generated by improved flexibility, all within a three-year period.
Read Post
Video: Modernizing Data Infrastructure for AI and Machine Learning
Trending

The AI Data Science Summit 2019 featured a keynote by SingleStore CEO Nikita Shamgunov, hosted by SingleStore partner Twingo. Nikita, a co-founder of SingleStore and its technical lead from the beginning, has shepherded SingleStore's development toward a world where cloud, AI, and machine learning are leading trends in information technology. Now that these trends are becoming predominant, SingleStore is playing an increasing role, as Nikita discussed in his keynote. What follows is an abbreviated version of his presentation, which you can view in full here. – Ed. Today I want to talk about the demands of AI and machine learning data infrastructure. Certainly the promise is very big, right? I couldn't be more excited about all the innovation that's coming in retail, in health care, in transport and logistics.
Read Post
SingleStore Recognized in The Forrester Wave™: Translytical Data Platforms, Q4 2022

“SingleStore’s Columnstore Blows All of the Free and Open Source Solutions Out of the Water” — Actual User
Data Intensity

A columnstore database takes all the values in a given column – the ZIP code column in a customer database, for instance – and stores all the ZIP code values in a single row, with the column number as the first entry. So the start of a columnstore database's ZIP code record might look like this: 5, 94063, 20474, 38654… The "5" at the beginning means that the ZIP code data is stored in the fifth column of the rowstore database of customer names and addresses that the original data comes from. Columnstore databases make it fast and easy to execute reporting and querying functions. For instance, you can easily count how many customers you have living in each US ZIP code – or combine your customer data with a ZIP code marketing database.

SingleStore combines rowstore and columnstore data tables in a single, scalable, powerful database that features native SQL support. (See our blog post comparing rowstore and columnstore.) And, in addition to its fast-growing, paid enterprise offering, SingleStore also has a highly capable free option. You can use SingleStore for free, with community support from our busy message board. SingleStore is free up to four nodes, or four separate server instances, with up to 32GB of RAM each – 128GB of RAM total. This large free capacity is particularly useful for columnstore tables, where 128GB of RAM is likely to be enough to support a terabyte or so of data on disk, with excellent performance. We have several existing customers doing important work using SingleStore for free. And when you need more nodes, or paid support, simply contact SingleStore to move to an enterprise license.

Why Use Columnstore?

The columnstore is used primarily for analytical applications where queries mainly involve aggregations over datasets that are too large to fit in memory. In these cases, the columnstore performs much better than the rowstore. A column-oriented store, or "columnstore," treats each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. The first is the ability to scan each column individually – in essence, to scan only the columns needed for the query, with good cache locality during the scan. This gets you excellent performance and low resource utilization – an important factor in the cloud, particularly, where every additional operational step adds to your cloud services bill. The other capability is that columnstores lend themselves well to compression: repeating and similar values can easily be compressed together. SingleStore compresses data by up to about 90% in many cases, with very fast compression and decompression as needed. As with the data design of columnstore tables, compression delivers cache locality, excellent performance, low resource utilization, and cost savings.

In summary, you should use a columnstore database if you need great analytics performance. It also helps that SingleStore, as a scalable SQL database with built-in support for the MySQL wire protocol, natively supports popular analytic tools like Tableau, Looker, and Zoomdata.
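To make this concrete, here is a minimal sketch. The table and column names are hypothetical, and the clustered columnstore key shown is the syntax SingleStore used in releases of this era to declare a column-oriented table; newer releases may differ.

```sql
-- Hypothetical customer table stored as a columnstore.
CREATE TABLE customers (
    customer_id BIGINT NOT NULL,
    name        VARCHAR(100),
    zip         CHAR(5),
    KEY (zip) USING CLUSTERED COLUMNSTORE
);

-- Counting customers per ZIP code only has to scan the zip column,
-- which is exactly the access pattern columnstores are built for.
SELECT zip, COUNT(*) AS customer_count
FROM customers
GROUP BY zip
ORDER BY customer_count DESC;
```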
Read Post
Case Study: Kurtosys – Why Would I Store My Data In More Than One Database?
Case Studies

One of SingleStore's strengths is speeding up analytics, often replacing NoSQL databases to provide faster performance. Kurtosys, a market-leading digital experience platform for the financial services industry, uses SingleStore exclusively, gaining far faster performance and easier management across transactions and analytics.

Kurtosys is a leader in the digital experience category, with the first truly SaaS platform for the financial services industry. In pursuing its goals, Kurtosys became an early adopter of SingleStore, and today SingleStore is helping to power Kurtosys' growth. Stephen Perry, head of data at Kurtosys, summed up the first round of efforts in a blog post several years ago, titled "Why Would I Store My Data In More Than One Database?" (Among his accomplishments, Steve is one of the first SingleStore-certified developers.) In the following blog post, we describe how usage of SingleStore has progressed at Kurtosys.

In the first round, Kurtosys had difficulties with its original platform, which used Couchbase. The company moved to SingleStore, achieving numerous benefits. Further customer requests, and the emergence of new features in SingleStore, opened the door for Kurtosys to create a new platform, which its customers use to deliver outstanding digital and document experiences to their sales teams and to their external communities of clients and prospects. In this new platform, SingleStore is the database of record.

At Kurtosys, Infrastructure Powers Growth

Kurtosys has taken on a challenging task: hosting complex financial data, documents, websites, and content for the financial services industry. Kurtosys' customers use the Kurtosys platform for their own customer data, as well as for their sales and marketing efforts. The customer list for Kurtosys features many top-tier firms, including Bank of America, the Bank of Montreal, Generali Investments, and T. Rowe Price.

Kurtosys' customers require high performance and high levels of security. Customer focus on security is greater in financial services than in most other business segments. A single breach – even a potential breach that is reported, but never actually exploited – can cause severe financial and reputational damage to a company. So customers hold technology suppliers such as Kurtosys to very high standards. Alongside security, performance is another critical element. Financial services companies claim performance advantages to gain new customers, so suppliers have to deliver reliably and at top speed. Since financial services companies also differentiate themselves on customer service, they require suppliers to provide excellent customer service in turn. (Like Kurtosys, SingleStore is well-versed in these challenges. Financial services is one of our leading market segments, with half of the top 10 North American financial services firms being SingleStore customers.)

Given all of these strict requirements, it is a major step for a financial services company to trust an external provider to host its content – including content as crucial as customer financial data. Yet Kurtosys has met the challenge and is growing quickly. "Our unique selling proposition is based around the creative use of new and unique technology," says Steve. "We've progressed so far that our original internal platform with SingleStore, which we launched four years ago, is now a legacy product. Our current platform employs a very modern approach to storing data. We are using SingleStore as the primary database for the Kurtosys platform."

Kurtosys Chooses Infrastructure for Growth

Kurtosys is adept at innovating its infrastructure to power services for demanding customers. For instance, several years ago, Kurtosys used SQL Server to execute transactions and Couchbase as a high-performance, scalable, read-only cache for analytics. Initially, the combination made sense. Customers of Kurtosys wanted to see the company executing transactions on a database that's among the handful of well-established transactional databases, and SQL Server fit the bill. However, like other traditional relational databases, SQL Server is, at its core, limited by its dependence on a single core update process. This dependency prevents SQL Server, and other traditional relational databases, from scaling out across multiple, affordable servers. As a result, the single machine running SQL Server is usually fully occupied with transaction processing and would struggle to meet Kurtosys' requirements, such as the need for ad hoc queries against both structured and semi-structured data. That left Kurtosys needing to copy data to another system (initially Couchbase) and run analytics off that – the usual logic for purchasing a data warehouse or an operational analytics database.

Couchbase seemed to be a logical choice. It's considered a leading NoSQL database, and is often compared to other well-known NoSQL offerings, such as Apache Cassandra, Apache HBase, CouchDB, MongoDB, and Redis. Couchbase tells its target audience that it offers developers the opportunity to "build brilliant customer experiences." NoSQL databases have the ability to scale out that traditional relational databases lack. However, NoSQL databases face fundamental limitations in delivering on promises such as those made by Couchbase. NoSQL databases favor unstructured or less-structured data and, as the name implies, they don't support SQL. Users of these databases don't benefit from decades of research and experience in performing complex operations on structured and, increasingly, semi-structured data using SQL. With no SQL support, Couchbase can be difficult to work with, and requires people to learn new skills. Running against unstructured data and semi-structured JSON data, and without the benefit of SQL, Kurtosys found it challenging to come up with an efficient query pattern that worked across different data sets.

Kurtosys Moves to SingleStore to Power Fast Analytics

As a big data database, Couchbase works well for data scientists running analytics projects. However, for day-in, day-out analytics use, Kurtosys had difficulty writing queries, and query performance was subpar. Couchbase was not well suited to the workloads and high degree of concurrency – that is, large numbers of simultaneous users – required for Kurtosys' internal user and customer analytics support, including ad hoc SQL queries, business intelligence tools, and app support. At the same time, Kurtosys needed to stay on SQL Server for transactions. Kurtosys had invested a lot in SQL Server-specific stored procedures, and its customers liked the fact that Kurtosys uses one of the best-known relational databases for transactions. So, after much research, Kurtosys selected a fully distributed database which, at the time, ran in-memory: SingleStore.
Because SingleStore is also a true relational database, and supports the MySQL wire protocol, Kurtosys was able to use the change data capture (CDC) process built into SQL Server to keep SingleStore’s copy of the data up to date. SingleStore received updates a few seconds after each transaction completed in SQL Server. Queries then ran against SingleStore, allowing both updates and queries to run fast against the respective databases.
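For readers unfamiliar with SQL Server's CDC feature, the sketch below shows only the SQL Server side, with hypothetical schema and table names; the downstream process that reads the change tables and applies the changes to SingleStore over the MySQL wire protocol is not shown.

```sql
-- Enable change data capture at the database level (SQL Server side only).
EXEC sys.sp_cdc_enable_db;

-- Track changes on a hypothetical table; CDC writes them to change tables
-- that a replication job can read and apply to SingleStore a few seconds
-- after each transaction commits.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Documents',
    @role_name     = NULL;
```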
Read Post
Webinar: The Benchmark Breakthrough Using SingleStore
Data Intensity

SingleStore has reached a benchmarking breakthrough: the ability to run three very different database benchmarks, fast, on a single, scalable database. The leading transactions benchmark, TPC-C, and the leading analytics benchmarks, TPC-H and TPC-DS, don't usually run on the same scale-out database at all. But SingleStore runs transactional and analytical workloads simultaneously, on the same data, and with excellent performance. As we describe in this webinar write-up, our benchmarking breakthrough demonstrates this unusual, and valuable, set of capabilities. You can also read a detailed description of the benchmarks and view the recorded webinar. SingleStore stands out because it is a relational database with native SQL support – like legacy relational databases – but also fully distributed, horizontally scalable simply by adding servers, like NoSQL databases. This kind of capability – called NewSQL, translytical, HTAP, or HOAP – is increasingly valued for its power and flexibility. It's especially useful for a new category of workloads called operational analytics, where live, up-to-date data is streamed into a data store to drive real-time decision-making.
Read Post
Managing SingleStore with Kubernetes
Product

With the arrival of the cloud, organizations face new opportunities – and new challenges. Chief among them is how to take the greatest advantage of public and private cloud resources without being locked into a specific cloud or being barred from access to existing infrastructure. Container solutions such as Docker offer part of the solution, making it much easier to develop, deploy, and manage software. In our webinar, product manager Micah Bahti describes how to take advantage of the next step: using Kubernetes, and the beta SingleStore Kubernetes Operator, to manage containers across public clouds and existing infrastructure.

Until recently, however, Kubernetes didn't manage stateful services. Now that this support has been added, SingleStore has stepped into a leading position. Ahead of other widely used database platforms, SingleStore has developed and made available a beta Kubernetes Operator, announced at Red Hat Summit in early May. You can easily get the Operator for either OpenShift or open source Kubernetes.

Note: The SingleStore Kubernetes Operator is currently experimental, and in beta. It will reach general availability in the coming months.

You can use the beta Operator with SingleStore on small deployments, including free instances of SingleStore, and it scales smoothly to large deployments as well; SingleStore scales to databases in the petabytes.

Deploying and Installing Kubernetes for SingleStore

Deploying and installing Kubernetes for SingleStore is very similar to using Kubernetes with other, stateless software. First, find the needed components. They're available in the OpenShift container catalog and on Docker Hub.
Read Post
Query Processing Improvements in SingleStoreDB Self-Managed 6.8
Data Intensity

In this blog post, I'll focus on new query processing capabilities in SingleStoreDB Self-Managed 6.8. The marquee query feature is just-in-time (JIT) compilation, which speeds up runtimes on the first run of a query – and is now turned on by default. We have also improved the performance of certain right and left outer joins and related operations, and of the Rollup and Cube operators. In addition, we have added convenience features, including sub-select without an alias and extended Oracle compatibility for date and time handling functions. Finally, we have added new array functions for splitting strings and converting JSON data.

Other improvements in 6.8 are covered elsewhere. These include: secured HDFS pipelines; improved pipelines performance; LOAD DATA null column handling extensions; and information schema and management views enhancements. Now, let's examine how just-in-time compilation of queries can work in a database.

Speeding up First Query Runtimes

SingleStore compiles queries to machine code, which allows us to get amazing performance, particularly when querying our in-memory rowstore tables. By spending a bit more time compiling than most databases – which interpret all queries rather than compiling them – we get high performance during execution. This works great for repetitive query workloads, such as real-time dashboards with a fixed set of queries, and for transactional applications. But our customers have been asking for better performance the first time a query is run, which is especially applicable for ad hoc queries – where slower performance can be especially noticeable.

In SingleStoreDB Self-Managed 6.7, we first documented a JIT feature for SQL queries, enabled by running 'set interpreter_mode = interpret_first'. Under this setting, SingleStore starts out interpreting a query, compiles its operators in the background, then dynamically switches from interpretation to execution of compiled code as the query runs for the first time. The interpret_first setting was classified as experimental in 6.7, and was off by default. In 6.8, we're pleased to say that interpret_first is now fully supported and is on by default. This setting can greatly improve the user's experience when running ad hoc queries, or when using any application that causes a lot of new SQL statements to be run, as when a user explores data through a graphical interface. The interpret_first setting can speed up the first run of a large and complex query – say, a query with more than seven joins – several times over by reducing compile overhead, with no loss of performance on longer-running queries the first time they run.

Rollup and Cube Performance Improvements

Cube and Rollup operator performance has been improved in SingleStoreDB Self-Managed 6.8 by pushing more work to the leaf nodes. In prior releases, Cube and Rollup were computed on the aggregator, requiring more data to be gathered from the leaves to the aggregator, which can take more time. For example, consider the following query from the Cube and Rollup documentation:

SELECT state, product_id, SUM(quantity) FROM sales GROUP BY CUBE(state, product_id) ORDER BY state, product_id;

The graphical query plan for this query in 6.8, obtained using SingleStore Studio, is shown in the full post.
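Putting the two improvements above together, a quick, illustrative session might look like the sketch below. The variable name and setting follow the post's description; exact scope and defaults may vary by version.

```sql
-- Enable interpret-first JIT execution (on by default in 6.8, per the post).
SET interpreter_mode = interpret_first;

-- The Cube query from the documentation example; in 6.8 most of the Cube
-- work is pushed down to the leaf nodes rather than the aggregator.
SELECT state, product_id, SUM(quantity)
FROM sales
GROUP BY CUBE (state, product_id)
ORDER BY state, product_id;
```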
Read Post
HDFS Pipelines Supporting Kerberos and Wire Encryption in SingleStoreDB Self-Managed 6.8
Product

Many companies, including our customers, have invested heavily in Hadoop infrastructure. In a recent blog post, we explored the hype around enterprise Hadoop deployments, and where Hadoop ultimately falls short for certain use cases. Using SingleStore, many of our customers have been able to augment Hadoop with our HDFS pipelines feature, allowing them to ingest data quickly from the Hadoop Distributed File System (HDFS) and analyze it in real time. With SingleStoreDB Self-Managed 6.8, we are happy to announce support for Kerberos and wire encryption for HDFS pipelines. Kerberos is a widely used method for authenticating users, including users of Hadoop clusters. Similarly, wire encryption protects data as it moves through Hadoop. Combining Kerberos and wire encryption in Hadoop is the standard in enterprises demanding the highest level of security. In SingleStoreDB Self-Managed 6.8, with the release of Kerberos and wire encryption support for HDFS pipelines, customers now get comprehensive security through standards-based authentication and protected over-the-wire data delivery.
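As a rough sketch of what an HDFS pipeline looks like: the path, table, and pipeline names below are hypothetical, and the Kerberos and wire-encryption settings (omitted here) are supplied through the pipeline's configuration rather than in the statement itself.

```sql
-- Hypothetical pipeline that continuously loads CSV files from HDFS.
-- Kerberos and wire-encryption settings are configured separately and
-- are not shown in this sketch.
CREATE PIPELINE hdfs_events
AS LOAD DATA HDFS 'hdfs://namenode:8020/data/events/'
INTO TABLE events
FIELDS TERMINATED BY ',';

-- Begin continuous ingestion.
START PIPELINE hdfs_events;
```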
Read Post
SingleStore Delivers Top Performance on Industry Benchmarks with Latest Releases
Product

SingleStore has spent the last six years on a single mission: developing the fastest, most capable database on the market for a new generation of workloads. Today's businesses are beginning to win or lose on their ability to use data to create competitive advantage, and technology teams at these companies need data infrastructure that can meet an increasingly broad set of requirements, perform well at scale, and fit easily with existing processes and tools. SingleStoreDB Self-Managed 6.8 is the latest release of the database that meets this challenge. Whether the goal is accelerating and simplifying the use of data to improve customer experiences, or speeding up analytics to drive better decision-making and optimize operations, both legacy databases and the current sprawl of specialty tools are failing many data professionals. Today, we are proud to announce that we have reached an amazing milestone. SingleStore is the first database with true shared-nothing scalability – enabling essentially unlimited scale – to provide outstanding results on performance tests derived from three leading industry benchmarks: TPC-C, TPC-H, and TPC-DS. These results are now possible with our latest software releases. We are announcing the general availability of SingleStoreDB Self-Managed 6.8 and the first beta release of SingleStoreDB Self-Managed 7.0. SingleStoreDB Self-Managed 6.8 offers even faster analytics performance and advanced security for Hadoop environments. The SingleStoreDB Self-Managed 7.0 beta previews new, mission-critical transactional capabilities for system-of-record applications and even more dramatic query performance improvements.
Read Post
We Spent a Bunch of Money on AWS And All We Got Was a Bunch of Experience and Some Great Benchmark Results
Product

Editor's note: Running these benchmarks and documenting the results is truly a team effort. In addition to John Sherwood, Eric Hanson and Nick Kline authored this blog post.

If you're anything like us, you probably roll your eyes at "company benchmarks its own software, writes a post about it." This of course raises a question: if we're cynical industry veterans, numb from the constant deluge of benchmarketing … why are we writing this? Simple: we wanted to prove to ourselves that we're the only modern, scalable database that can do a great job on the three well-known database benchmarks – TPC-C, TPC-H, and TPC-DS – which cover both OLTP and data warehousing. Now, we hope to get you to suspend your skepticism long enough for us to prove our capabilities to you.

SingleStore's primary focus is on what we call operational analytics (you might have heard this referred to as HTAP, translytical, or HOAP): running analytical queries across large, constantly changing datasets with consistently high performance. This performance is provided through the use of scale-out, compilation of queries to machine code, vectorized query execution, and use of single instruction, multiple data (SIMD) instructions. Our operational analytics capability blurs OLTP and data warehouse (DW) functionality, and different use cases take advantage of different slices of our spectrum of capabilities. To test some of the new features in our upcoming 7.0 release, we used multiple TPC benchmarks to push SingleStore beyond what our normal everyday stress tests can reach. Our results show that we can do both transaction processing and data warehousing well, and that we scale well as workload size increases. No other scale-out database running on industry-standard hardware can do this.

Playing Fair

TPC benchmarks have a certification process, and we did our best to follow the specifications, but we did not work with the TPC to certify the benchmarks, so these are informal results. But we hope it goes without saying that we didn't cheat. In fact, SingleStore does not have any "benchmark specials" (features designed just to make a benchmark result better, but that nobody would ever use for a real application) built into the code.

The TL;DR

This is a long post, because we wanted to write an in-depth overview of how we ran the benchmarks so you can see how we achieved all these results. But if you want just a quick summary, here's how we did on the benchmarks:

TPC-C: SingleStore scaled performance nearly linearly over a scale factor of 50x.

TPC-H: SingleStore's performance against other databases varied on this benchmark, but was faster than multiple modern scale-out database products that only support data warehousing.

TPC-DS: SingleStore's performance ranged from similar to as much as two times faster than other databases. Expressed as a geometric mean, as is often done for benchmarking results, our performance was excellent.

What we accomplished was solid performance across three distinct benchmarks, establishing SingleStore as a top-choice database for operational analytics and cloud-native applications. Now let's move on to the details of the benchmarks.

TPC-C: Scale-out OLTP

On the massively transactional side of the spectrum, we have the TPC-C benchmark. To quote the 1997 SIGMOD presentation announcing it, a benchmark is a "distillation of the essential attributes of a workload," and TPC-C distills the absolute hell out of a sharded transactional workflow.
As the scale (defined by the number of warehouse facilities being emulated) increases, the potential parallelism increases as well. From our perspective, this is ideal for discovering any bottlenecks in our own code. Our TPC-C database design used single tables for the large data sets, allowing us to use an existing driver targeting MySQL. Unlike some other official results published by major database vendors, we did not artificially split the data into multiple separate tables to reduce locking and contention. The way we ran the benchmark is much simpler, and shows how our scale-out architecture can make application development easier.

While experimenting with smaller datasets, we quickly discovered that driving significant load required us to nearly match the CPU of the aggregators with that of the leaves in the cluster. This configuration – strong all over ("very Schwarzenegger," as one of our engineers put it) – is quite unusual; it's rare that customers require such a ratio of aggregators to leaves. In part, our internal benchmarking service colocates drivers on aggregators, which required us to size up the boxes to have the extra CPU. Additionally, under the pure OLTP TPC-C workload, aggregators are primarily coordinating transaction state and are effectively concurrency bound.

As we would recommend for a real workload of this nature, we had redundancy and synchronous replication enabled. This means that there are two copies of every data partition on two different nodes for high availability (HA), and transaction commits are not acknowledged to the client until both the primary and secondary replicas are updated. This of course requires twice the memory compared to running without HA, and adds to transaction overhead, since row updates must be applied on the replica node before the commit. This cost is to be expected if HA with fast failover is required: if you want HA, you need to pay.

After experimenting with smaller datasets, we drove just over 800,000 transactions per minute (tpmC) against 10,000 warehouses, using a cluster with 18 leaves and 18 aggregators, all r3.8xlarge instances. This led us to scale up, moving on to TPC-C at the 50,000-warehouse scale. This required what we would term a formidable cluster; storing a single replica of the data (including indexes) required more than 9 TB of RAM. For HA, we stored two copies of the data, which, after some fairly straightforward math, took us to almost 20 TB. We decided to use r5.metal instances, using 2x the leaf instances, for a total of 6x the cores compared to the 10,000-warehouse cluster. Notably, the only change we made in running the benchmark itself was tuning the partition count and hash index buckets; from the perspective of the open source benchmark driver we were using, originally written for MySQL, everything just worked.

The last notable thing about our 50,000-warehouse run was that we ran out of hardware from AWS while trying to reach the limit of the cluster; as we had taken 36 hosts for leaves and 21 for aggregators, we suppose that's pretty reasonable. With 1,728 physical cores on the leaf nodes of the megacluster (to use the scientific term), it's hard to argue with calling this "throwing hardware at the problem." However, as discussed above, we were able to easily meet the same per-core performance as at smaller scales. We feel that our overall results are a strong validation of the performance improvements in our new intra-cluster replication implementation in SingleStoreDB Self-Managed 7.0.
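For readers curious what "tuning the partition count" looks like in practice, here is a minimal, hypothetical sketch; the database name and partition count are made up, and redundancy is a cluster-level setting (the redundancy_level engine variable) configured separately rather than per statement.

```sql
-- Hypothetical sizing tweak: create the benchmark database with an explicit
-- partition count so work is spread evenly across all leaf nodes.
-- (Two copies of each partition are kept when cluster-level redundancy is
--  enabled; that setting is not part of this statement.)
CREATE DATABASE tpcc PARTITIONS 64;
```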
Read Post
Mapping and Reducing the State of Hadoop
Data Intensity

In this blog post, part one of a two-part series, we look at the state of Hadoop from a macro perspective. In the second part of this series, we will look at how Hadoop and SingleStore can work together to solve many of the problems described here.

2008 was a big year for Apache Hadoop. With the rise of the mobile and desktop web, it appeared organizations had finally found the panacea for working with exploding quantities of data. Yahoo launched the world's largest Apache Hadoop production application, and also won the "terabyte sort" benchmark, sorting a terabyte of data in 209 seconds. Apache Pig – a language that makes it easier to query Hadoop clusters – and Apache Hive – a SQL-ish language for Hadoop – were actively being developed, by Yahoo and Facebook respectively. Cloudera, now the biggest software and services company for Apache Hadoop, was also founded. Data sizes were exploding with the continued rise of web and mobile traffic, pushing existing data infrastructure to its absolute limits; the term "big data" was coined around this time too.

Then came Hadoop, promising organizations answers to any question they had about their data. The promise: you simply need to collect all your data in one location and run it on free Apache Hadoop software, using cheap, scalable commodity hardware. Hadoop also introduced the Hadoop Distributed File System (HDFS), allowing data to be spanned over many disks and servers. Not only is the data stored, it's also replicated two to three times across servers, ensuring no data loss even when a server goes down. Another benefit of using Hadoop is that there is no limit to the size of files stored in HDFS, so you can continuously append data to them, as in the case of server logs.

Facebook claimed to have the largest Hadoop cluster in the world, at 21 petabytes of storage, in 2010. By 2017, more than half of the Fortune 50 companies were using Hadoop. Cloudera and Hortonworks became multi-billion dollar public companies. For an open source project that had only begun in 2006, Hadoop became a household name in the tech industry in the span of under a decade. The only direction is up for Hadoop, right?

However, many industry veterans and experts are saying Hadoop perhaps isn't the panacea for big data problems that it's been hyped up to be. Just last year, in 2018, Cloudera and Hortonworks announced their merger. The CEO of Cloudera announced an optimistic message about the future of Hadoop, but many in the industry disagree. "I can't find any innovation benefits to customers in this merger," said John Schroder, CEO and Chairman of the Board at MapR. "It is entirely about cost cutting and rationalization. This means their customers will suffer." Ashish Thusoo, the CEO of Qubole, also has a grim outlook on Hadoop in general: "the market is evolving away from the Hadoop vendors – who haven't been able to fulfill their promise to customers."

While Hadoop promised the world a single data store for all of your data, on cheap and scalable commodity hardware, the reality of operationalizing that data was not so easy. Speaking with a number of data experts at SingleStore, reading articles by industry experts, and reviewing surveys from Gartner, we noticed a number of things that are slowing Hadoop growth and deployment within existing enterprises.
The data shows that the rocketship growth of Hadoop has been partly driven by fear of being left behind, especially among technology executives – the ones who overwhelmingly initiate Hadoop adoption, with 68% of adoptions initiated within the C-suite, according to Gartner. We will also explore the limitations of Hadoop in various use cases, especially in this ever-changing enterprise data industry. Let's dive in.

Hype

In a Gartner survey released in 2015, the research firm says that an important point to look at with Hadoop adoption is the low number of Hadoop users per organization, which gives an indication that "Hadoop is failing to live up to its promise." Gartner says that hype and market pressure were among the main reasons for interest in Hadoop. This is not a surprise to many, as it's hard to avoid hearing Hadoop and big data in the same sentence. Gartner offers the following piece of advice for the C-suite interested in deploying Hadoop: "CEOs, CIOs and CTOs (either singularly or due to pressure from their boards) may feel they are being left behind by their competitors, based on press and hype about Hadoop or big data in general. Being fresh to their roles, the new chief of innovation and data may feel pressured into taking some form of action. Adopting Hadoop, arguably 'the tallest tree in the big data forest', provides the opportunity." The survey warns against adopting Hadoop out of fear of being left behind — Hadoop adoption remains at an early-adopter phase, with skills and successes still rare. A concrete piece of advice from Gartner is to start with small projects backed by business stakeholders to see if Hadoop is helpful in addressing core business problems. Starting with small deployments allows an organization to develop skills and a record of success before tackling larger projects.

Skills Shortage

When using Hadoop for analytics, you lose the familiar benefits of SQL. According to the same survey cited above, around 70% of organizations have relatively few Hadoop developers and users. The low number of Hadoop users per organization is attributed to Hadoop being innately unsuitable for large numbers of simultaneous users. It also indicates difficulty in hiring Hadoop developers, given the skills shortage. Which leads to our next point — cost.

Cost

Two facts about Apache Hadoop: it's free to use, forever, and you can run it on cheap commodity hardware. But Hadoop is still very expensive. Why? While Hadoop may have a cheap upfront cost in software use and hosting, everything after that is anything but cheap. As explained before, to make Hadoop work for more than just engineers, there need to be multiple abstraction layers on top. Having additional copies of the data for Hive, Presto, Spark, Impala, etc., means additional cost in hardware, maintenance, and operations. Adding layers on top also means additional operations and engineering work to maintain the infrastructure. While Hadoop may seem cheap in terms of upfront cost, the costs for maintenance, hosting, storage, operations, and analysis make it anything but.

Easy to Get Data In, but Very Tough to Get Data Out

Getting data into Hadoop is very easy, but it turns out that getting data out and deriving meaningful insight from your data is very tough. A person working on data stored in Hadoop – usually an engineer, not an analyst – is expected to have at least some knowledge of HDFS, MapReduce, and Java.
One also needs to learn how to set up the Hadoop infrastructure, which is another major project in itself. Speaking with industry people who have deployed Hadoop, or who work closely with organizations that use it, we found this to be the biggest pain point of the technology — how hard it is to run analytics on Hadoop data. Many technologies have been built to tackle the complexities of Hadoop, such as Spark (a data processing engine), Pig (a data flow language), and Hive (a SQL-like extension on top of Hadoop). These extra layers add more complexity to an already-complex data infrastructure, which usually means more potential points of failure.

Hiring Software Engineers is Expensive

An assortment of software skills is needed to make Hadoop work. If Hadoop is used with no abstraction layer, such as Hive or Impala, on top, querying needs to be done in MapReduce, which is written in Java. Working in Java means hiring software engineers rather than analysts who are proficient in SQL. Software engineers with Hadoop skills are expensive, with an average salary in the U.S. of $84,000 a year (not including bonuses, benefits, etc.). In a survey by Gartner, it's stated that "obtaining the necessary skills and capabilities [is] the largest challenge for Hadoop (57%)." Your engineering team is likely the most expensive, constrained, and tough-to-hire-for resource in your organization. When you adopt Hadoop, you then require engineers for a job that an analyst proficient in SQL could otherwise do. On top of the Hadoop infrastructure and the abstraction layers you need to get data out more easily, you now have to account for the engineering resources required. This is not cheap at all.

Businesses Want Answers NOW

As businesses go international, and customers demand instant responsiveness around the clock, companies are pushed to become real-time enterprises. Whether the goal is real-time insight into product usage, live identification of financial fraud, customer dashboards that show investment returns in milliseconds rather than hours, or understanding ad spend results on an up-to-the-second basis, waiting for queries to Map and Reduce simply no longer serves the immediate business need. It remains true that Hadoop is incredible for crunching through large sets of data; that is its core strength — batch processing. There are ways to augment Hadoop's real-time decision abilities, such as using Kafka streams, but in that case what's meant to be real-time processing slows down to micro-batching. Spark Streaming is another way to speed up Hadoop, but it has its own limitations. Finally, Apache projects like Storm are also micro-batching, so they are nowhere near real time. Another point to consider is that the technologies mentioned above are yet another piece of complexity added to an already-complex data infrastructure. Adding multiple layers between Hadoop and SQL-based analytic tools also means slow response, multiplied cost, and additional maintenance. In short, Hadoop is not optimized for real-time decision making, which means it may not be well-suited to the evolving information demands of businesses in the 21st century.

In this first part of our two-part series on Hadoop, we talked about the rise of Hadoop, why it looked like the solution to organizations' big data problems, and where it fell short. In the next part of the series, we will explore why combining Hadoop with SingleStore may help businesses that are already invested in Hadoop.
Read Post
Introducing the SingleStore Kubernetes Operator
Product

Kubernetes has taken the world by storm, transforming how applications are developed, deployed, and maintained. For a time, managing stateful services with Kubernetes was difficult, but that has changed dramatically with recent innovations in the community. Building on that work, SingleStore is pleased to announce the availability of our SingleStore Kubernetes Operator, and our certification by Red Hat to run on the popular OpenShift container management platform. Kubernetes has quickly become one of the top three most-loved platforms by developers. Now, with the SingleStore Kubernetes Operator, technology professionals have an easy way to deploy and manage an enterprise-grade operational database with just a few commands. Note: The SingleStore Kubernetes operator is currently experimental, and in beta. It will reach general availability in the coming months. The new beta Operator is certified by Red Hat to run SingleStore software on Red Hat OpenShift, or you can run it with any Kubernetes distribution you choose. Running SingleStore on Kubernetes gives data professionals the highest level of deployment flexibility across hybrid, multi-cloud, or on-premises environments. As Julio Tapia, director of the Cloud Platforms Partners Ecosystem for Red Hat, put it in our press release, services in a Kubernetes-native infrastructure “‘just work’ across any cloud where Kubernetes runs.” As a cloud-native database, SingleStore is a natural fit for Kubernetes. SingleStore is a fully distributed database, deploys and scales instantly, and is configured quickly and easily using the native SingleStore API. SingleStore customers have requested the Kubernetes Operator, and several participated in testing prior to this release. The majority of SingleStore customers today deploy SingleStore on one or more public cloud providers. Now, with the Kubernetes Operator, they can deploy on any public or private infrastructure more easily.
Read Post
Case Study: Customer Saves $60K per Month on Move from AWS RDS and Druid.io to SingleStore
Data Intensity

A SingleStore customer previously ran their business on two databases: the Amazon Web Services Relational Database Service (AWS RDS) for transactions and Druid.io for analytics, with a total bill that reached over $93,000 a month. They moved both functions to SingleStore, and their savings have been dramatic – about two-thirds of the total cost. The customer's new monthly cost is about $31,000, for a savings of $62,000 a month, or 66%. In addition, the customer gains increased performance, greater concurrency, and easier database management. Future projects can also claim these benefits, giving the customer lower costs for adding new features to their services and greater strategic flexibility.
Read Post
How to Use SingleStore with Intel’s Optane Persistent Memory
Product

Intel's new Optane DC persistent memory adds a performance option for SingleStore users. After careful analysis, we've identified one area in which SingleStore customers and others can solve a potentially urgent problem using Optane today, and we describe that opportunity in this blog post. We also point out other areas where SingleStore customers and others should keep an eye on Optane-related developments for the future. If you need more than what's offered here, there are many sources of comprehensive information about Optane and what it can do for you, beginning with Intel itself.
Read Post
SingleStore Offers Streaming Systems Download from O’Reilly
Data Intensity

More and more, SingleStore is used to help add streaming characteristics to existing systems, and to build new systems that feature streaming data from end to end. Our new ebook excerpt from O'Reilly introduces the basics of streaming systems. You can then read on – in the full ebook and here on the SingleStore blog – to learn how you can make streaming part of all your projects, existing and new. Streaming has been largely defined by three technologies – one that's old, one that's newer, and one that's out-and-out new. Streaming Systems covers the waterfront thoroughly. Originally, Tyler Akidau, one of the book's authors, wrote two very popular blog posts, Streaming 101: The World Beyond Batch and Streaming 102, both on the O'Reilly site. The popularity of those posts led to the O'Reilly book, Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing. In the excerpt that we offer here, you will see a solid definition of streaming and how it works with different kinds of data. The authors address the role of streaming in the entire data processing lifecycle with admirable thoroughness. They also describe the major concerns you'll face when working with streaming data. One of these is the difference between the order in which data is received and the order in which processing on it is completed. Reducing such disparities as much as possible is a major topic in streaming systems.
Read Post
The Need for Operational Analytics
Data Intensity

The proliferation of streaming analytics and instant decisions to power dynamic applications, along with the rise of predictive analytics, machine learning, and operationalized artificial intelligence, has introduced a requirement for a new type of database workload: operational analytics. The two worlds of transactions and analytics, set apart from each other, are a relic of a time before data became an organization's most valuable asset. Operational analytics is a new set of database requirements and system demands that are integral to achieving competitive advantage for the modern enterprise. Gartner called out this new approach as a Top 10 technology for 2019, under the name "continuous analytics." Delivering operational analytics at scale is the key to the real-time dashboards, predictive analytics, machine learning, and enhanced customer experiences that differentiate digital transformation leaders from followers.

However, companies are struggling to build these new solutions because existing legacy database architectures cannot meet the demands placed on them. The existing data infrastructure cannot scale to the load put on it, and it doesn't natively handle all the new sources of data. The separation between transactional and analytic technologies results in hard tradeoffs that leave solutions lacking in operational capability, analytics performance, or both. There have been many attempts in the NoSQL space to bridge the gap, but all have fallen short of meeting the needs of this new workload. Operational analytics enables businesses to leverage data to enhance productivity, expand customer and partner engagement, and support orders of magnitude more simultaneous users. But these requirements demand a new breed of database software that goes beyond the legacy architecture. The industry calls these systems by several names: hybrid transaction and analytics processing (HTAP) from Gartner; hybrid operational/analytics processing (HOAP) from 451 Research; and translytical from Forrester. Consulting firms typically use the term we have chosen here, operational analytics, and CapGemini has even established a full operational analytics consultancy practice around it.

The Emergence of Operational Analytics

Operational analytics has emerged alongside the existing workloads of Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). I outlined the requirements of those workloads in this previous blog entry. To summarize, OLTP requires data lookups, transactionality, availability, reliability, and scalability, whereas OLAP requires support for running complex queries very fast, for large data sets, and for batch ingest of large amounts of data. OLTP- and OLAP-based systems served us well for a long time. But over the last few years, things have changed.

Decisions should not have to wait for the data

It is no longer acceptable to wait for the next quarter, or week, or even day to get the data needed to make a business decision. Companies are increasingly online all the time; "down for maintenance" and "business hours" are quickly becoming a thing of the past. Companies that have a streaming, real-time data flow have a significant edge over their competitors. Existing legacy analytics systems were simply not designed to work like this.
Companies must become insight-driven

This means that, instead of a handful of analysts querying the data, you have hundreds or thousands of employees hammering your analytics systems every day in order to make informed decisions about the business. In addition, there will be automated systems – ML/AI and others – also running queries to get the current state of the world to feed their algorithms. The existing legacy analytics systems were simply not designed for this kind of usage.

Companies must act on insights to improve customer experience

Companies want to expose their data to their customers and partners. This improves the customer experience and potentially adds net new capabilities. For example, a cable company tracks users as they try to set up their cable modems so it can proactively reach out if it sees there is a problem. This requires a system that can analyze and react in real time. Another example is an electronics company that sells smart TVs and wants to expose which shows customers are watching to its advertisers. This dramatically increases the number of users trying to access its analytics systems. In addition, the expectations of availability and reliability are much higher for customers and partners, so you need a system that can deliver an operational service level agreement (SLA). And since your partners don't work in your company, you are exposing the content outside the corporate firewall, so strong security is a must. The existing legacy analytics systems were simply not designed for this kind of usage.

Data is coming from many new sources and in many types and formats

The amount of data being collected is growing tremendously. Not only is it being collected from operational systems within the company; data is also coming from edge devices. The explosion of IoT devices, such as oil drills, smart meters, household appliances, factory machinery, and so on, is a key contributor to the growth. All this data needs to be fed into the analytics system. This leads to increased complexity in the types of data sources (such as Kafka, Spark, etc.), in data types and formats (geospatial, JSON, Avro, Parquet, raw text, etc.), and in the throughput requirements for ingesting the data. Again, the existing legacy analytics systems were simply not designed for this kind of usage.

The Rise of Operational Analytics

These changes have given rise to a new database workload, operational analytics. The short description of operational analytics is an analytical workload that needs an operational SLA. Now let's unpack what that looks like.

Operational Analytics as a Database Workload

Operational analytics primarily describes analytical workloads, so the query shapes and complexity are similar to OLAP queries. In addition, the data sets for operational analytics are just as large as OLAP data sets, although often it is the most recent data that is most important. (This is usually a fraction of the total data set.) Data loading is similar to OLAP workloads, in that data comes from an external source and is loaded independently of the applications or dashboards that are running the queries. But this is where the similarities end. Operational analytics has several characteristics that set it apart from pure OLAP workloads: specifically, the speed of data ingestion, scaling for concurrency, availability and reliability, and speed of query response. Operational analytics workloads require an SLA on how fast the data needs to be available.
Sometimes this is measured in seconds or minutes, which means the data infrastructure must allow data to stream in constantly while still allowing queries to be run. Sometimes there is a window of time (usually a single-digit number of hours) during which all the data must be ingested. As data sets grow, the existing data warehouse (DW) technologies have had trouble loading the data within the time window (and certainly don't allow streaming). Data engineers often have to resort to complex tricks to continue meeting data loading SLAs with existing DW technologies. Data also has to be loaded from a larger set of data sources than in the past. It used to be that data was batch-loaded from an operational system during non-business hours. Now data comes in from many different systems. In addition, data can flow from various IoT devices far afield from the company data center. The data gets routed through various types of technologies (in-memory queues like Kafka, processing engines like Spark, and so on). Operational analytics workloads need to easily handle ingesting from these disparate data sources.

Operational analytics workloads also need to scale to large numbers of concurrent queries. With the drive towards being data-driven and exposing data to customers and partners, the number of concurrent users (and queries) in the system has increased dramatically. In an OLAP workload, five to ten queries at a time was the norm. Operational analytics workloads often must be able to handle high tens, hundreds, or even thousands of concurrent queries.

As in an OLTP workload, availability and reliability are also key requirements. Because these systems are now exposed to customers or partners, the SLA required is a lot stricter than for internal employees. Customers expect 99.9% or better uptime, and they expect the system to behave reliably. They are also less tolerant of planned maintenance windows. So the data infrastructure backing these systems needs to support high availability, with the ability to handle hardware and other types of failure. Maintenance operations (such as upgrading the system software or rebalancing data) need to become transparent, online operations that are not noticeable to the users of the system. In addition, the system should self-heal when a problem occurs, rather than waiting for an operator to be alerted to an issue and respond. Strong durability is important as well, because even though lost data could be reloaded, the reloading may cause the system to break the availability SLA.

The ability to retrieve the data you are looking for very quickly is the hallmark feature of database systems, and getting access to the right data quickly is a huge competitive advantage. Whether it is internal users trying to get insights into the business or analytics results being presented to a customer, the expectation is that the data they need is available instantly. The speed of the query needs to be maintained regardless of the load on the system. It doesn't matter if a peak number of users is online, the data size has expanded, or there are failures in the system; customers expect you to meet their expectations on every query, with no excuses. This requires a solid distributed query processor that can pick the right plan to answer any query and get it right every time. It means the algorithms used must scale smoothly with the system as it grows in every dimension.
Supporting Operational Analytics Use Cases with SingleStore

SingleStore was built to address these requirements in a single converged system. SingleStore is a distributed relational database that supports ANSI SQL. It has a shared-nothing, scale-out architecture that runs well on industry-standard hardware. This allows SingleStore to scale in a linear fashion simply by adding machines to a cluster. SingleStore supports all the analytical SQL language features you would find in a standard OLAP system, such as joins, group by, aggregates, etc. It has its own extensibility mechanism, so you can add stored procedures and functions to meet your application requirements. SingleStore also supports the key features of an OLTP system: transactions, high availability, self-healing, online operations, and robust security. It has two storage subsystems: an on-disk columnstore that gives you the advantage of compression and extremely fast aggregate queries, and an in-memory rowstore that supports fast point queries, aggregates, indices, and more. The two table types can be mixed in one database to get the optimal design for your workload. Finally, SingleStore has a native data ingestion feature, called Pipelines, that allows you to easily and very quickly ingest data from a variety of data sources (such as Kafka, AWS S3, Azure Blob, and HDFS). All these capabilities, offered in a single integrated system, add up to making it the best data infrastructure for an operational analytics workload, bar none.

Describing the workload in general terms is a bit abstract, so let's dig into some of the specific use cases where operational analytics is the most useful.

Portfolio Analytics

One of the most common use cases we see in financial services is portfolio analytics. Multiple SingleStore customers have written financial portfolio management and analysis systems that are designed to provide premium services to elite users. These elite users can be private banking customers with high net worth or fund managers who control a large number of assets. They will have large portfolios with hundreds or thousands of positions. They want to be able to analyze their portfolios in real time, with graphical displays that are refreshed instantaneously as they filter, sort, or change views in the application. The superb performance of SingleStore allows sub-second refresh of the entire screen with real-time data, including multiple tables and charts, even for large portfolios. These systems also need to scale to hundreds or thousands of users concurrently hitting the system, especially when the market is volatile. Lastly, they need to bring in the freshest market data without compromising the strict latency SLAs for their query response times. They need to do all of this securely, without violating relevant compliance requirements or the trust of their users. High availability and reliability are key requirements, because the market won't wait. SingleStore is the ideal data infrastructure for this operational analytics use case, as it meets the key requirements of fast data ingest, highly concurrent user access, and fast query response.

Predictive Maintenance

Another common use case we see is predictive maintenance. Customers who have services or devices that run continuously want to know as quickly as possible if there is a problem. This is a common scenario for media companies that do streaming video.
They want to know if there is a problem with the quality of the streaming so they can fix it, ideally before the user notices the degradation. This use case also comes up in the energy industry. Energy companies have devices (such as oil drills, wind turbines, etc.) in remote locations. Tracking the health of those devices and making adjustments can extend their lifetime and save millions of dollars in labor and replacement equipment. The key requirements are the ability to stream the data about the device or service, analyze the data – often using a form of ML that leverages complex queries – and then send an alert if the results show any issues that need to be addressed. The data infrastructure needs to be online 24/7 to ensure there is no delay in identifying these issues.

Personalization

A third use case is personalization. Personalization is about customizing the experience for a customer. This use case pops up in a number of different verticals, such as a user visiting a retail web site, playing a game in an online arcade, or even visiting a brick-and-mortar store. The ability to see a user's activity and, more importantly, learn what is attractive to them gives you the information to meet their needs more effectively and efficiently. One of SingleStore's customers is a gaming company. They stream information about the user's activity in the games, process the results against a model in SingleStore, and use the results to offer the user discounts for new games and other in-app purchases. Another example is a popular music delivery service that uses SingleStore to analyze usage of the service to optimize ad spend. The size of the data and the number of employees using the system made it challenging to deliver the data to the organization in a timely way and allow them to query it interactively. SingleStore significantly improved their ability to ingest and process the data, and gave their users a dramatic speedup in query response times.

Summary

Operational analytics is a new workload that encompasses the operational requirements of an OLTP workload – data lookups, transactionality, availability, reliability, and scalability – as well as the analytical requirements of an OLAP workload – large data sets and fast queries. Coupled with the new requirements of high user concurrency and fast ingestion, the operational analytics workload is tough to support with a legacy database architecture or by cobbling together a series of disparate tools. As businesses continue along their digital transformation journey, they are finding that more and more of their workloads fit this pattern, and they are searching for new, modern data infrastructure, like SingleStore, that has the performance and scale to handle them.
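To make the rowstore/columnstore combination described above a bit more concrete, here is a minimal sketch of what a mixed schema might look like. The table and column names are hypothetical, invented purely for illustration, and exact DDL options can vary by SingleStore version, so treat this as an outline rather than a definitive schema.

```sql
-- Hypothetical in-memory rowstore table for fast point lookups and updates.
CREATE TABLE current_positions (
    account_id BIGINT NOT NULL,
    symbol VARCHAR(16) NOT NULL,
    quantity DECIMAL(18,4) NOT NULL,
    last_updated DATETIME(6) NOT NULL,
    PRIMARY KEY (account_id, symbol)
);

-- Hypothetical on-disk columnstore table for large history and fast aggregates.
CREATE TABLE trade_history (
    trade_ts DATETIME(6) NOT NULL,
    account_id BIGINT NOT NULL,
    symbol VARCHAR(16) NOT NULL,
    quantity DECIMAL(18,4) NOT NULL,
    price DECIMAL(18,4) NOT NULL,
    KEY (trade_ts) USING CLUSTERED COLUMNSTORE,
    SHARD KEY (account_id)
);

-- An operational analytics query can join the two table types freely.
SELECT h.symbol, SUM(h.quantity * h.price) AS notional
FROM trade_history h
JOIN current_positions p
  ON p.account_id = h.account_id AND p.symbol = h.symbol
WHERE h.trade_ts >= NOW() - INTERVAL 1 DAY
GROUP BY h.symbol
ORDER BY notional DESC;
```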
Read Post
Webinar: How Kafka and SingleStore Deliver Intelligent Real-Time Applications
Data Intensity

Webinar: How Kafka and SingleStore Deliver Intelligent Real-Time Applications

Using Apache Kafka and SingleStore together makes it much easier to create and deliver intelligent, real-time applications. In a live webinar, which you can view here, SingleStore's Alec Powell discusses the value that Kafka and SingleStore each bring to the table, shows reference architectures for solving common data management problems, and demonstrates how to implement real-time data pipelines with Kafka and SingleStore. Kafka is an open source message queue that works on a publish-subscribe model. It's distributed (like SingleStore) and durable. Kafka can serve as a source of truth for data across your organization.

What Kafka Does for Enterprise IT

Today, enterprise IT is held back by a few easy-to-identify, but seemingly hard-to-remedy, factors:

Slow data loading
Lengthy query execution
Limited user access
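As a rough sketch of the Kafka-to-SingleStore pattern the webinar walks through, the SQL below creates a table and a SingleStore Pipeline that continuously loads a Kafka topic into it. The broker address, topic, table, and CSV layout are all placeholders we invented for illustration; real pipelines will differ in format options and credentials.

```sql
-- Hypothetical target table for a stream of page view events.
CREATE TABLE page_views (
    view_ts DATETIME NOT NULL,
    user_id BIGINT NOT NULL,
    url VARCHAR(2048) NOT NULL
);

-- Continuously ingest CSV messages from the (placeholder) Kafka topic.
CREATE PIPELINE page_views_pipeline AS
LOAD DATA KAFKA 'kafka-broker:9092/page-views'
INTO TABLE page_views
FIELDS TERMINATED BY ',';

-- Once started, the pipeline runs in the background and the table is
-- immediately queryable while data streams in.
START PIPELINE page_views_pipeline;
```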
Read Post
Case Study: How Customers are Using SingleStore for Free
Product

Case Study: How Customers are Using SingleStore for Free

On November 6, 2018, we made our then-new production product, SingleStoreDB Self-Managed 6.7, free to use, up to certain limits, described below. To date, we've had more than 4,500 sign-ups for this version, with numerous individuals and companies using SingleStore to do some amazing things. To quickly recap, with SingleStore any customer can sign up and start using our full-featured product, with all enterprise capabilities, including high availability and security, for free. In talking to customers about their experience and what they've built, we've heard astounding feedback, with folks telling us they can't believe what we're giving away for free. We originally stated that legacy database companies would charge you up to $100,000 a year for what you can do with SingleStore for free. Now, we want to tell you what companies are actually doing with SingleStore.

Culture Amp

Culture Amp is an employee feedback platform and was looking for a way to improve data-driven decision making with internal reporting tools. The company's initial solution had low adoption because of poor performance, mostly attributable to a slow database engine. Scott Arbeitman, analytics and data engineering lead at Culture Amp, started investigating a better database to power its reports. "Trying SingleStore for free was a no-brainer," according to Scott. The results were outstanding. Tests running the company's new data model on SingleStore versus the previous database showed a speed improvement of more than 28x. This speed-up increased reporting usage, made everyone more productive, and helped Culture Amp incorporate more data into the decision-making process.

Nikkei

Nikkei is a large media corporation based in Japan that shares news about Asia in English. Nikkei needed a better way to get real-time analytics on the readers coming to its website and its widely used mobile app. Having better reader data allows Nikkei to understand what articles are resonating with readers, and what type of ads to show them. With its previous database, Nikkei was only able to get reader analytics 45 minutes after someone was on the site. Now, with SingleStore, Nikkei is able to get analytics on readers in about 200 milliseconds. That's an improvement of 13,500 times – four orders of magnitude – in the time until data is available. This allows Nikkei to respond to its site visitors' activities in real time. Because SingleStore is compatible with MySQL, Nikkei is easily able to integrate the data collected into SingleStore with its other databases. The company is getting all of this performance improvement and flexibility for free.

How to Use SingleStore for Free

If you missed the original announcement, here is a quick recap of what you get when using SingleStore for free:

You can create four leaf nodes, with up to eight virtual CPUs and up to 32GB of RAM per node.
You can also create aggregator nodes freely; we recommend at least two, for redundancy.
You can create as many rowstore tables, stored entirely in memory, as you like. If your database is entirely made up of rowstore tables, and you have 128GB of RAM in your leaf nodes, that's the total database size limit.
You can also create as many columnstore tables, stored on disk, as you like. A mostly-columnstore database might comfortably reach 1TB in size.
Free use of the product includes community support. For dedicated professional support, or more than four leaf nodes, you'll need a paid license.

Response to the free tier has been positive.
Most experimental, proof of concept, and in-house-only projects run easily within a four-leaf-node configuration and don't require professional support. For projects that move to production, both dedicated professional support and the ability to add more nodes – in particular, for paired development and production environments, and for robust high availability implementations – make sense, and are enabled by an easy switch to our enterprise offering.

The case study snapshots for Culture Amp and Nikkei are good examples of what our customers have accomplished while using SingleStore for free. It's always fun sharing the benefits our customers achieve with SingleStore versus other databases, but we get even more excited showing these performance increases when they're achieved using the free option. We consider all users of SingleStore to be our customers, whether you're starting with us for the first time for free or you've been with SingleStore from the early days. These are just some examples of the cool work happening with free use of SingleStore. To get performance improvements similar to those of Culture Amp and Nikkei, all you have to do is download SingleStore.
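If you are curious whether an existing cluster fits within the free limits described above, the following commands, run on an aggregator, list the current nodes. This is a minimal sketch; the exact output columns depend on your SingleStore version.

```sql
-- Lists the leaf nodes in the cluster; the free tier allows up to four.
SHOW LEAVES;

-- Lists the aggregator nodes, which are not counted against the leaf limit.
SHOW AGGREGATORS;
```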
Read Post
Pre-Modern Databases: OLTP, OLAP, and NoSQL
Data Intensity

Pre-Modern Databases: OLTP, OLAP, and NoSQL

In this blog post, the first in a two-part series, I'm going to describe pre-modern databases: traditional relational databases, which support SQL but don't scale out, and NoSQL databases, which scale out but don't support SQL. In the next part, I'm going to talk about modern databases – which scale out, and which do support SQL – and how they are well suited for an important new workload: operational analytics.

In the Beginning: OLTP

Online transaction processing (OLTP) emerged several decades ago as a way to enable database customers to create an interactive experience for users, powered by back-end systems. Prior to the existence of OLTP, a customer would perform an activity, and only at some point, relatively far off in the future, would the back-end system be updated to reflect the activity. If the activity caused a problem, that problem would not be known for several hours, or even a day or more, after the problematic activity or activities.

The classic example (and one of the main drivers for the emergence of the OLTP pattern) was the ATM. Prior to the arrival of ATMs, a customer would go to the bank counter to withdraw or deposit money. Back-end systems, either paper or computer, would be updated at the end of the day. This was slow, inefficient, error prone, and did not allow for a real-time understanding of the current state. For instance, a customer might withdraw more money than they had in their account. With the arrival of ATMs, around 1970, a customer could self-serve cash withdrawals, deposits, or other transactions. The customer moved from nine-to-five access to 24/7 access. ATMs also allowed a customer to understand in real time what the state of their account was. With these new features, the requirements for the back-end systems became a lot more complex: specifically, data lookups, transactionality, availability, reliability, and scalability – the latter becoming more and more important as customers demanded access to their information and money from any point on the globe.

The data access pattern for OLTP is to retrieve a small set of data, usually by doing a lookup on an ID – for example, the account information for a given customer ID. The system also must be able to write back a small amount of information based on the given ID. So the system needs the ability to do fast lookups, fast point inserts, and fast updates or deletes.

Transaction support is arguably the most important characteristic that OLTP offers, as reflected in the name itself. A database transaction means a set of actions that are either all completed, or none of them are completed; there is no middle ground. For example, an ATM has to guarantee that it either gave the customer the money and debited their account, or did not give the customer money and did not debit their account. Only giving the customer money, but not debiting their account, harms the bank; only debiting the account, but not giving the customer money, harms the customer. Note that doing neither of the actions – not giving the money, and not debiting the account – is an unhappy customer experience, but still leaves the system in a consistent state. This is why the notion of a database transaction is so powerful. It guarantees the atomicity of a set of actions, where atomicity means that related actions happen, or don't happen, as a unit.

Reliability is another key requirement. ATMs need to be always available, so customers can use one at any time.
Uptime for the ATM is critical, even in the face of hardware or software failures, without human intervention. The system needs to be reliable because it faces the end customer, and banks win on how well they deliver a positive customer experience. If the ATM fails every few times a customer tries it, the customer will get annoyed and switch to another bank.

Scalability is also a key requirement. Banks have millions of customers, and they will have tens of thousands of people hitting the back-end system at any given time. But the usage is not uniform. There are peak times when a lot more people hit the system. For example, Friday is a common payday for companies, which means many customers will all be using the system around the same time to check their balance and withdraw money. They will be seriously inconvenienced – and very unimpressed – if one, or some, or all of the ATMs go down at that point. So banks need to scale to hundreds of thousands of users hitting the system concurrently on Friday afternoons. Hard-to-predict, one-off events, such as a hurricane or an earthquake, can also cause peaks. The worst case is often the one you didn't see coming, so you need a very high level of resiliency even without having planned for the specific event that ends up occurring.

These requirements for the OLTP workload show up in many other use cases, such as retail transactions, billing, enterprise resource planning (widely known as ERP), customer relationship management (CRM), and just about any application where an end user is reviewing and manipulating data they have access to and expects to see the results of those changes immediately. The existing legacy database systems were built to solve these use cases over the last few decades, and they do a very good job of it, for the most part. The market for OLTP-oriented database software is in the tens of billions of dollars a year. However, with the rise of the Internet, and more and more transactional systems being built for orders of magnitude more people, legacy database systems have fallen behind in scaling to the level needed by modern applications. The lack of scale-out also makes it difficult for OLTP databases to handle analytical queries while successfully, reliably, and quickly running transactions. In addition, they lack the key technologies to perform the analytical queries efficiently. This has contributed to the need for separate, analytics-oriented databases, as described in the next section.

A key limitation is that OLTP databases have typically run on a single computing node. This means that the transactions that are the core of an OLTP database can only happen at the speed and volume dictated by the single system at the center of operations. In an IT world that is increasingly about scaling out – spreading operations across arbitrary numbers of servers – this has proven to be a very serious flaw indeed.

OLAP Emerges to Complement OLTP

After the evolution of OLTP, the other major pattern that has emerged is OLAP. OLAP emerged a bit after OLTP, as enterprises realized they needed fast and flexible access to the data stored in their OLTP systems. OLTP system owners could, of course, directly query the OLTP system itself. However, OLTP systems were busy with transactions – any analytics use beyond the occasional query threatened to bog the OLTP systems down, limited to a single node as they were.
And the OLAP queries quickly became important enough to have their own performance demands. Analytics use would tax the resources of the OLTP system. Since the availability and reliability of the OLTP system were so important, it wasn't safe to have just anyone running queries that might use up enough resources to jeopardize the availability and reliability of the OLTP system. In addition, people found that the kinds of analytics they wanted to do worked better with a different schema for the data than was optimal for the OLTP system. So they started copying the data over into another system, often called a data warehouse or a data mart. As part of the copying process, they would change the database schema to be optimal for the analytics queries they needed to do.

At first, OLTP databases worked reasonably well for analytics needs (as long as they ran analytics on a different server than the main OLTP workload). The legacy OLTP vendors included features such as grouping and aggregation in the SQL language to enable more complex analytics. However, the requirements of the analytics systems were different enough that a new breed of technology emerged that could satisfy analytics needs better, with features such as column storage and read-only scale-out. Thus, the modern data warehouse was born. The requirements for a data warehouse were the ability to run complex queries very fast; the ability to scale to handle large data sets (orders of magnitude larger than the original data from the OLTP system); and the ability to ingest large amounts of data in batches, from OLTP systems and other sources.

Query Patterns

Unlike the relatively simple OLTP data access patterns, the query patterns for analytics are a lot more complicated. Trying to answer a question such as, "Show me the sales of product X, grouped by region and sales team, over the last two quarters," requires a query that uses more complex functions and joins between multiple data sets. These kinds of operations tend to work on aggregates of data records, grouping them across a large amount of data. Even though the result might be a small amount of data, the query has to scan a large amount of data to get to it. Picking the right query plan to optimally fetch the data from disk requires a query optimizer. Query optimization has evolved into a specialty niche within the realm of computer science; there are only a small number of people in the world with deep expertise in it. This specialization is key to the performance of database queries, especially in the face of large data sets. Building a really good query optimizer and query execution system in a distributed database system is hard. It requires a number of sophisticated components, including statistics, cardinality estimation, plan space search, the right storage structures, fast query execution operators, intelligent shuffle, both broadcast and point-to-point data transmission, and more. Each of these components can take months or years of skilled developer effort to create, and more months and years to fine-tune.

Scaling

Datasets for data warehouses can get quite big. This is because you are not just storing a copy of the current transactional data, but taking a snapshot of the state periodically and storing each snapshot going back in time. Businesses often have a requirement to go back months, or even years, to understand how the business was doing previously and to look for trends.
So while operational data sets range from a few gigabytes (GBs) to a few terabytes (TBs), a data warehouse ranges from hundreds of GBs to hundreds of TBs. For the raw data in the biggest systems, data sets can reach petabytes (PBs). For example, imagine a bank that is storing the transactions for every customer account. The operational system just has to store the current balance for the account. But the analytics system needs to record every transaction in that account, going back for years. As the systems grew into the multiple TBs, and into the PB range, it was a struggle to get enough computing and storage power into a single box to handle the load required. As a result, a modern data warehouse needs to be able to scale out to store and manage the data.

Scaling out a data warehouse is easier than scaling an OLTP system. This is because scaling queries is easier than scaling changes – inserts, updates, and deletes. You don't need as much sophistication in your distributed transaction manager to maintain consistency. But the query processing needs to be aware of the fact that data is distributed over many machines, and it needs to have access to specific information about how the data is stored. Because building a distributed query processor is not easy, only a few companies have succeeded at doing this well.

Getting the Data In

Another big difference is how data is put into a data warehouse. In an OLTP system, data is entered by a user through interaction with the application. With a data warehouse, by contrast, data comes from other systems programmatically. Often, it arrives in batches and at off-peak times. The timing is chosen so that the work of sending data does not interfere with the availability of the OLTP system where the data is coming from. Because the data is moved programmatically by data engineers, you don't need the database platform to enforce constraints on the data to keep it consistent. Because it comes in batches, you want an API that can load large amounts of data quickly. (Many data warehouses have specialized APIs for this purpose.) Lastly, the data warehouse is not typically available for queries during data loading.

Historically, this process worked well for most businesses. For example, in a bank, customers would carry out transactions against the OLTP system, and the results could be batched and periodically pushed into the analytics system. Since statements were only sent out once a month, it didn't matter if it took a couple of days before the data made it over to the analytics system. So the result is a data warehouse that is queryable by a small number of data analysts. The analysts run a small number of complex queries during the day, and the system is offline for queries while loading data during the night. The availability and reliability requirements are lower than for an OLTP system, because it is not as big a deal if your analysts are offline. You don't need transactions of the type supported by the OLTP system, because the data loading is controlled by your internal process.

The NoSQL Work Around

For more information on this topic, read our previous blog post: Thank You for Your Help, NoSQL, But We Got It from Here.

As the world "goes digital," the amount of information available increases exponentially. In addition, the number of OLTP systems has increased dramatically, as has the number of users consuming them.
The growth in data size, and in the number of people who want to take advantage of the data, has outstripped what legacy databases can manage. As scale-out patterns have permeated more and more areas within the application tier, developers have started looking for scale-out alternatives for their data infrastructure. In addition, the separation of OLTP and OLAP has meant that a lot of time, energy, and money go into extracting, transforming, and loading data – widely known as the ETL process – between the OLTP and OLAP sides of the house.

ETL is a huge problem. Companies spend billions of dollars on people and technology to keep the data moving. In addition to the cost, the consequence of ETL is that users are guaranteed to be working on stale data, with the newest data up to a day old. With the crazy growth in the amount of data – and in demand for different ways of looking at the data – the OLAP systems fall further and further behind. One of my favorite quotes, from a data engineer at a large tech company facing this problem, is: "We deliver yesterday's insights, tomorrow!"

NoSQL came along promising an end to all this. NoSQL offered:

Scalability. NoSQL systems offered a scale-out model that broke through the limits of the legacy database systems.
No schema. NoSQL abandoned schema in favor of unstructured and semi-structured formats, dropping the rigid data typing and input checking that make database management challenging.
Big data support. Massive processing power for large data sets.

All of this, though, came at several costs:

No schema, no SQL. The lack of schema meant that SQL support was not only lacking from the get-go, but hard to achieve. Moreover, NoSQL application code is so intertwined with the organization of the data that application evolution becomes difficult. In other words, NoSQL systems lack the data independence found in SQL systems.
No transactions. It's very hard to run traditional transactions on unstructured or semi-structured data. So data was left unreconciled, but discoverable by applications, which would then have to sort things out.
Slow analytics. Many of the NoSQL systems made it very easy to scale and to get data into the system (i.e., the data lake). While these systems did make it possible to process larger amounts of data than ever before, they are pretty slow. Queries could take hours, or even tens of hours. It was still better than not being able to ask the question at all, but it meant you had to wait a long while for the answer.

NoSQL was needed as a complement to OLTP and OLAP systems, to work around the lack of scaling. While it had great promise and solved some key problems, it did not live up to all its expectations.

The Emergence of Modern Databases

With the emergence of NewSQL systems such as SingleStore, much of the rationale for using NoSQL in production has dissipated. We have seen many of the NoSQL systems try to add back important, missing features – such as transaction support and SQL language support – but the underlying NoSQL databases are simply not architected to handle them well. NoSQL is most useful for niche use cases, such as a data lake for storing large amounts of data, or as a kind of data storage scratchpad for application data in a large web application.

The core problems still remain. How do you keep up with all the data flowing in and still make it available instantly to the people who need it? How can you reduce the cost of moving and transforming the data?
How can you scale to meet the demands of all the users who want access to the data, while maintaining an interactive query response time? These are the challenges giving rise to a new workload, operational analytics. Read our upcoming blog post to learn about the operational analytics workload, and how NewSQL systems like SingleStore allow you to handle these modern workloads.
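To make the contrast between the two access patterns in this post concrete, here is a small sketch in generic SQL. The schemas (accounts, withdrawals, sales, and so on) are hypothetical, invented purely for illustration; the point is the shape of each workload, not any particular product's syntax.

```sql
-- OLTP shape: a small, targeted write wrapped in a transaction, so the
-- withdrawal record and the balance update succeed or fail together.
BEGIN;
UPDATE accounts
   SET balance = balance - 100
 WHERE account_id = 42;
INSERT INTO withdrawals (account_id, amount, created_at)
VALUES (42, 100, NOW());
COMMIT;

-- OLAP shape: an aggregate over a large fact table joined to dimensions,
-- answering "sales of product X by region and sales team over two quarters."
SELECT r.region_name,
       t.team_name,
       SUM(s.amount) AS total_sales
FROM sales s
JOIN regions r ON r.region_id = s.region_id
JOIN sales_teams t ON t.team_id = s.team_id
WHERE s.product_id = 123
  AND s.sale_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
GROUP BY r.region_name, t.team_name;
```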
Read Post
What Makes a Database Cloud-Native?
Product

What Makes a Database Cloud-Native?

SingleStore has been designed and developed as a distributed relational database, bringing the effectiveness of the relational database model into the new world of the cloud, containers, and other software-defined infrastructure – as described in a new report from 451 Research. Today, most of our customers run our software using some combination of the cloud and containers, with many also running it on-premises. We are purveyors of the leading platform-independent NewSQL database. Having recently joined the Cloud Native Computing Foundation, we'd like to take this opportunity to answer the question: "What makes a database cloud-native?"

Cloud-Native Software Definition

There are many definitions of "cloud-native software" available. 451 Research states that cloud-native software is "designed from the ground up to take advantage of cloud computing architectures and automated environments, and to leverage API-driven provisioning, auto-scaling and other operational functions." The company continues: "Cloud-native architecture and software include applications that have been redesigned to take advantage of cloud computing architectures, but are not limited to cloud applications – we see cloud-native technologies and practices present in on-premises environments in the enterprise." The point is repeated in one of the major headings in the report: "Cloud-native isn't only in the cloud." 451 Research commonly finds cloud-native technologies and practices being used in on-premises environments.

What Cloud-Native Means for SingleStore

Let's break down the 451 Research definition of cloud-native and see how it applies to SingleStore.

Takes Advantage of Cloud Features

The first point from the 451 Research report states that cloud-native software is "designed from the ground up to take advantage of cloud computing architectures and automated environments". SingleStore has been available on the major public cloud platforms for years, and deployments are balanced across cloud and on-premises environments. More importantly, SingleStore's unique internal architecture gives it both the scalability that is inherent to the cloud and the ability to support SQL for transactions and analytics. An important step has been qualifying SingleStore for use in containers. SingleStore has been running in containers for a long time, and we use a containerized environment for testing our software.
Read Post
How CEOs Can Stay Relevant in the Age of AI
Trending

How CEOs Can Stay Relevant in the Age of AI

The most important new skills for business leaders are not what you might think. You've read the headlines. Data is the new oil; it's the new currency; data capital can create competitive advantage. We also hear, over and over again, that machine learning (ML) and artificial intelligence (AI) are the future. Few will dispute that these things are true, but the trite language masks a deeper challenge. Data must be collected, analyzed, and acted upon in the right way and at the right time for a business to create value from it. ML and AI are only as powerful as the data that drive them. In this world, big companies – which may throw away as much data in a day as a startup will generate in a year – should have a significant competitive advantage. However, new approaches are needed to move forward effectively in the age of ML and AI. And the new approaches all start with the data itself. To build and maintain a successful business in today's insight-driven economy, business leaders need to develop a new set of skills. We outline what we believe those skills are below.

Skill #1: A Drive to Find Key Data and Make it Useful

Business leaders need to be on a mission to collect and (more importantly) expose to their organizations all of the data that might create a competitive advantage. We don't always know exactly what data or insights might be the ones that will allow us to break away from the pack until after we have analyzed and acted on that data, then measured the results and repeated the cycle, over and over. Business leaders need to encourage collecting as much data as possible in the day-to-day operations of the business, with a particular eye towards where your organization has advantages or challenges. Make sure that the data is not simply collected, but stored in such a way that your teams can easily access, understand, and analyze it. "Big data" was a great start to enabling the future of our businesses, but what we need today instead is "fast data" – data made available to everyone, to drive fast insight.

Skill #2: The Ability to Create a Culture of Constant Analysis and Action

As the French writer Antoine de Saint-Exupéry stated, "If you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the vast and endless sea." This adage applies to becoming an insight-driven business. Data is not insight, and insights are not outcomes. What we seek in collecting and analyzing data is to identify and carry out the actions that will accelerate and transform our business. The best way to leverage data for creating competitive advantage is to encourage a culture of inquisitiveness, of always asking "the 5 Whys" – a series of "why" questions that take us to the root of what's important, and why. Compel your teams not just to gather and share insights, but to look for ways to turn insights into immediate actions that add value to the business. Innovations such as ecommerce product recommendations, dynamic pricing based on demand, or sensor-based maintenance are all insight-driven innovations that have arisen in the last decade or so and that have generated dramatic competitive advantage. ML and deep learning – the most practical form of AI currently available to business – accelerate this process.
You can use them together to test multivariate alternatives, to vary assumptions and audiences around your current performance, to help you maximize the value of the insights that you find and implement today, and then to help you take your insights to an even higher level.

Skill #3: The Insight to Choose the Right Tools and Technologies

The agile movement does not get nearly enough credit for the transformative effect it's had, and continues to have, on business. But a business can only be agile with the right tools and technologies, and the ability to use them to drive action and change. It's no surprise that, up to this point, most of the companies and leaders that are making the best use of data to drive their businesses are digital natives – think Google, Facebook, Uber, Airbnb, et al. They have done this by applying the agile mindset of software development to data architecture, data engineering, and data-driven decisioning. While the large digital players may have leapt to the forefront in the last 10 years, the traditional enterprise can use its long operational history, its existing volumes of data, and its ability to generate fresh, useful data to level the playing field and compete effectively in the modern economy.

To make the most of these resources, business leaders need to lead the decision making around data infrastructure. The insight-driven enterprise needs the best possible tools and technology to enable fast, flexible, and efficient use of the company's data. This means shifting the traditional IT mindset from maintaining legacy data infrastructure, overly strict controls, and inflexibility, to one that puts agility first. Analysts, data scientists, and application developers need access to real-time or near-real-time data sources. And they, and the businesspeople who work with them most closely, need to be empowered to act on that data – be it for rapid decision making or to create insight-driven, dynamic experiences for customers and employees. This shift requires a new set of tools, processes, and culture that is so critical to the future of the business that business leaders – all the way up to the CEO – need to ensure that agility is the primary order of the day.

Peter Guagenti is CMO at SingleStore, and is an advisor and a board member for several AI-focused companies. Peter spent more than a decade helping Fortune 500 companies to embrace digital transformation and to use real-time and predictive decisions to improve their businesses.
Read Post
Webinar: Data Trends for Predictive Analytics and ML
Trending

Webinar: Data Trends for Predictive Analytics and ML

In this webinar, Mike Boyarski and Eric Hanson of SingleStore describe the promise of machine learning and AI. They show how businesses need to upgrade their data infrastructure for predictive analytics, machine learning, and AI. They then dive deep into using SingleStore to power operational machine learning and AI. In this blog post, we will first describe how SingleStore helps you master the data challenges associated with machine learning (ML) and artificial intelligence (AI). We'll then show how to implement ML/AI functions in SingleStore. At any point, feel free to view the (excellent) webinar.

Challenges to Machine Learning and AI

Predictive analytics is helping to transform how companies do business, and machine learning and AI are a huge part of that. The McKinsey Global Institute analysis shows ML/AI having trillions of dollars of impact in industry sectors ranging from telecommunications to banking to retail. AI investments are focused in automation, analytics, and fraud, among other areas. However, McKinsey goes on to report that only 15% of organizations have the right technology infrastructure, and only 8% of the needed data is available to AI systems across an organization. The vast majority of AI projects have serious challenges in moving from concept to production, and half the time needed to deploy an AI project is spent in preparation and aggregation of large datasets.
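As one small example of the kind of in-database ML scoring the webinar covers, the sketch below uses SingleStore's DOT_PRODUCT and JSON_ARRAY_PACK built-ins to rank rows against a query vector. The table, columns, and vector values are hypothetical, and exact function behavior may vary by version, so treat this as an assumption-laden illustration rather than a recipe from the webinar itself.

```sql
-- Hypothetical table holding a pre-computed feature vector per product,
-- stored as a packed binary blob produced offline by an ML model.
CREATE TABLE product_embeddings (
    product_id BIGINT NOT NULL PRIMARY KEY,
    features BLOB NOT NULL
);

-- Score every product against a query vector and return the best matches.
-- JSON_ARRAY_PACK converts a JSON array of floats into the packed binary
-- format that DOT_PRODUCT expects.
SELECT product_id,
       DOT_PRODUCT(features, JSON_ARRAY_PACK('[0.12, 0.85, 0.33, 0.07]')) AS score
FROM product_embeddings
ORDER BY score DESC
LIMIT 10;
```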
Read Post
How We Use Exactly-Once Semantics with Apache Kafka
Product

How We Use Exactly-Once Semantics with Apache Kafka

A version of this blog post first appeared on the developer-oriented website The New Stack. It describes how SingleStore works with Apache Kafka to guarantee exactly-once semantics within a data stream. Apache Kafka usage is becoming more and more widespread. As the amount of data that companies deal with explodes, and as demands on data continue to grow, Kafka serves a valuable purpose, including its use as a standardized messaging bus, thanks to several key attributes. One of the most important attributes of Kafka is its ability to support exactly-once semantics. With exactly-once semantics, you avoid losing data in transit, and you also avoid receiving the same data multiple times. This avoids problems such as a resend of an old database update overwriting a newer update that was processed successfully the first time. However, because Kafka is used for messaging, it can't keep the exactly-once promise on its own. Other components in the data stream have to cooperate – if a data store, for example, were to make the same update multiple times, it would violate the exactly-once promise of the Kafka stream as a whole. Kafka and SingleStore are a very powerful combination. Our resources on the topic include instructions on quickly creating an IoT Kafka pipeline; how to do real-time analytics with Kafka and SingleStore; a webinar on using Kafka with SingleStore; and an overview of using SingleStore pipelines with Kafka in SingleStore's documentation.

How SingleStore Works with Kafka

SingleStore is fast, scalable, relational database software, with SQL support. SingleStore works in containers, virtual machines, and in multiple clouds – anywhere you can run Linux. This is a novel combination of attributes: the scalability formerly available only with NoSQL, along with the power, compatibility, and usability of a relational, SQL database. This makes SingleStore a leading light in the NewSQL movement – along with Amazon Aurora, Google Spanner, and others. The ability to combine scalable performance, ACID guarantees, and SQL access to data is relevant anywhere that people want to store, update, and analyze data, from a venerable on-premises transactional database to ephemeral workloads running in a microservices architecture. NewSQL allows database users to gain both the main benefit of NoSQL – scalability across industry-standard servers – and the many benefits of traditional relational databases, which can be summarized as schema (structure) and SQL support. In our role as NewSQL stalwarts, Apache Kafka is one of our favorite things. One of the main reasons is that Kafka, like SingleStore, supports exactly-once semantics. In fact, Kafka is somewhat famous for this, as shown in my favorite headline from The New Stack: Apache Kafka 1.0 Released Exactly Once.

What Is Exactly-Once?

To briefly describe exactly-once, it's one of three alternatives for processing a stream event – or a database update:

At-most-once. This is the "fire and forget" of event processing. The initiator puts an event on the wire, or sends an update to a database, and doesn't check whether it's received or not. Some lower-value Internet of Things streams work this way, because updates are so voluminous, or may be of a type that won't be missed much. (Though you'll want an alert if updates stop completely.)
At-least-once. This is checking whether an event landed, but not making sure that it hasn't landed multiple times. The initiator sends an event, waits for an acknowledgement, and resends if none is received. Sending is repeated until the sender gets an acknowledgement. However, the initiator doesn't bother to check whether one or more of the non-acknowledged events got processed, along with the final, acknowledged one that terminated the send attempts. (Think of adding the same record to a database multiple times; in some cases, this will cause problems, and in others, it won't.)
Exactly-once. This is checking whether an event landed, and freezing and rolling back the system if it didn't. The sender then resends, and this repeats until the event is accepted and acknowledged. When an event doesn't make it (doesn't get acknowledged), all the operators on the stream stop and roll back to a "known good" state. Then, processing is restarted. This cycle is repeated until the errant event is processed successfully.
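The difference between these delivery guarantees shows up most clearly on the write path. As a generic illustration (not taken from the post itself), the sketch below shows the common pattern of pairing at-least-once delivery with idempotent writes: each event carries a unique ID, so a redelivered event overwrites itself instead of creating a duplicate row. SingleStore Pipelines go further, in essence tracking Kafka offsets transactionally alongside the ingested data, which is how true exactly-once behavior is achieved end to end. The table and values here are hypothetical.

```sql
-- Hypothetical events table keyed on a producer-supplied unique event ID.
CREATE TABLE device_readings (
    event_id BIGINT NOT NULL PRIMARY KEY,
    device_id BIGINT NOT NULL,
    reading DOUBLE NOT NULL,
    received_at DATETIME(6) NOT NULL
);

-- If the same event is delivered twice (at-least-once), the second write
-- simply overwrites the first, so the net effect is the same as receiving
-- the event exactly once.
INSERT INTO device_readings (event_id, device_id, reading, received_at)
VALUES (1001, 7, 21.5, NOW())
ON DUPLICATE KEY UPDATE
    reading = VALUES(reading),
    received_at = VALUES(received_at);
```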
Read Post
Webinar: Data Innovation in Financial Services
Case Studies

Webinar: Data Innovation in Financial Services

In this webinar, SingleStore Product Marketing Manager Mike Boyarski describes trends in data initiatives for banks and other financial services companies. Data is the lifeblood of modern financial services companies, and SingleStore, as the world's fastest database, is rapidly growing its footprint in financial services. Financial services companies are leaders in digital transformation initiatives, working diligently to wring business value from these initiatives. According to Gartner's 2019 CIO Agenda survey, digital transformation is the top priority for banks, followed by growth in revenue; operational excellence; customer experience; cost optimization/reduction; and data and analytics initiatives (which are also likely to show up in the "digital transformation" bucket). As Gartner puts it, "… the digital transformation of banks creates new sources of revenue, supports new enterprise operating models and delivers digital products and services."
Read Post
Case Study: Improving Risk Management Performance with SingleStore and Kafka
Data Intensity

Case Study: Improving Risk Management Performance with SingleStore and Kafka

Risk management is a critical task throughout the world of finance (and increasingly in other disciplines as well). It is a significant area of investment for IT teams across banks, investors, insurers, and other financial institutions. SingleStore has proven to be very well suited to support risk management and decisioning applications and analytics, as well as related areas such as fraud detection and wealth management. In this case study we'll show how one major financial services provider improved the performance and ease of development of its risk management decisioning by replacing Oracle with SingleStore and Kafka. We'll also include some lessons learned from other, similar SingleStore implementations.

Starting with an Oracle-based Data Warehouse

At many of the financial services institutions we work with, Oracle is used as a database for transaction processing and, separately, as a data warehouse. In this architecture, an extract, transform, and load (ETL) process moves data between the operational database and the analytics data warehouse. Other ETL processes are also typically used to load additional data sources into the data warehouse.
Read Post
Case Study: Wealth Management Dashboards Powered by SingleStore
Data Intensity

Case Study: Wealth Management Dashboards Powered by SingleStore

In this case study, we describe how SingleStore powers wealth management dashboards – one of the most demanding financial services applications. SingleStore's scalability and support for fast, SQL-based analytics, with high concurrency, mean that it's well-suited to serve as the database behind these highly interactive tools. Dashboards have become a hugely popular technique for monitoring and interacting with a range of disparate data. Like the dashboard in a car or an airplane, an effective dashboard consolidates data from many inputs into a single, easy-to-understand display that responds instantly to both external conditions and user actions. SingleStore is widely used to power dashboards of many different kinds, including one of the most demanding: wealth management dashboards for families, individuals, and institutions. Banks and other financial services companies work hard to meet the needs of these highly valuable customers, who have high expectations and hold those who provide them services to a very high standard. Data is the lifeblood of financial services companies. More than one bank has described itself as "a technology company that happens to deal with money," and many now employ more technology professionals than some large software companies. These financial institutions differentiate themselves on the basis of the breadth, depth, and speed of their information and trading support. So wealth management dashboards offer an important opportunity for these companies to provide the highest possible level of service and stand out from the competition.
Read Post
DZone Webinar – SingleStore for Time Series, Real Time, and Beyond
Data Intensity

DZone Webinar – SingleStore for Time Series, Real Time, and Beyond

Eric Hanson, Principal Product Manager at SingleStore, is an accomplished data professional with decades of relevant experience. This is an edited transcript of a webinar on time series data that he recently delivered for the developer website DZone. Eric provides an architect's view on how the legacy database limits of the past can be solved with scalable SQL. He shows how challenging workloads like time series and big data analytics are addressed by SingleStore, without sacrificing the familiarity of ANSI SQL. You can view the webinar on DZone.

Time series data is getting more and more interest as companies seek to get more value out of the data they have – and the data they can get in the future. SingleStore is the world's fastest database – typically 10 times faster, and three times more cost effective, than competing databases. SingleStore is a fully scalable relational database that supports structured and semi-structured data, with schema and ANSI SQL compatibility. SingleStore has features that support time series use cases. For time series data, key strengths of SingleStore include a very high rate of data ingest, with processing on ingest as needed; very fast queries; and high concurrency on queries. Key industries with intensive time series requirements that are using SingleStore today include energy and utilities; financial services; media and telecommunications; and high technology. These are not the only industries using SingleStore, but we have a lot of customers in these four in particular, and they make heavy use of time series data.

Editor's Note: Also see our blog posts on time series data, choosing a time series database, and implementing time series functions with SingleStore. We also have an additional recorded webinar (from us here at SingleStore) and an O'Reilly ebook download covering these topics.

Introduction to SingleStore

SingleStore has a very wide range of attributes that make it a strong candidate for time series workloads. You can see from the chart that SingleStore connects to a very wide range of other data technologies; supports applications, business intelligence (BI) tools, and ad hoc queries; and runs everywhere – on bare metal or in the cloud, in virtual machines or containers, or as a service. No matter where you run it, SingleStore is highly scalable. It has a scale-out, shared-nothing architecture.
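Since the transcript talks about fast ingest and fast queries over time series data, here is a small sketch of the kind of time-bucketed aggregation this usually involves. The ticks table and its columns are hypothetical, and the bucketing below uses plain MySQL-compatible date arithmetic rather than any dedicated time series extension.

```sql
-- Hypothetical tick table; the timestamp column is the series key.
CREATE TABLE ticks (
    ts DATETIME(6) NOT NULL,
    symbol VARCHAR(16) NOT NULL,
    price DOUBLE NOT NULL,
    KEY (ts) USING CLUSTERED COLUMNSTORE,
    SHARD KEY (symbol)
);

-- One-minute average price per symbol over the last hour.
-- UNIX_TIMESTAMP(ts) DIV 60 assigns each row to a one-minute bucket.
SELECT symbol,
       FROM_UNIXTIME((UNIX_TIMESTAMP(ts) DIV 60) * 60) AS minute_bucket,
       AVG(price) AS avg_price,
       COUNT(*) AS tick_count
FROM ticks
WHERE ts >= NOW() - INTERVAL 1 HOUR
GROUP BY symbol, FROM_UNIXTIME((UNIX_TIMESTAMP(ts) DIV 60) * 60)
ORDER BY symbol, minute_bucket;
```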
Read Post
Webinar: Choosing the Right Database for Time Series Data
Data Intensity

Webinar: Choosing the Right Database for Time Series Data

In this webinar, SingleStore Product Marketing Manager Mike Boyarski describes the growth in popularity of time series data and talks about the best options for a time series database, including a live Q&A. You can view the webinar and download the slides here. Here at SingleStore, we’ve had a lot of interest in our blog posts on time series data and choosing a time series database, as well as our O’Reilly time series ebook download. Additionally, we have a webinar on architecture for time series databases from DZone. This webinar, by contrast, does a particularly good job of explaining what you would want in a time series database, and how that fits with SingleStore. We encourage you to read this blog post, then view the webinar. Time series data is growing in use because it’s getting easier and cheaper to generate time series data (more and cheaper sensors), transmit it (faster online and wireless connections), store it (better databases), act on it (more responsive websites and other online systems), and report on it (better analytics tools).
Read Post
Using SingleStore and Looker for Real-Time Data Analytics
Data Intensity

Using SingleStore and Looker for Real-Time Data Analytics

SingleStore is a fast, scalable SQL database. Looker is a fast, scalable analytics platform. You can use SingleStore and Looker to create a fast, scalable – yes, those words again – analytics solution that works well across a wide range of data ingest, transaction processing, and analytics needs. Both SingleStore and Looker are flexible and powerful tools. With full ANSI SQL support, SingleStore can work with a wide range of analytics tools. Looker, for its part, can connect to any SQL data source, which allows it to work well with a vast number of databases. Looker also optimizes its database interface to take advantage of specific database features, as you will see below. When paired, SingleStore and Looker combine these areas of strength to deliver consistent and concrete results. For instance, one of the most popular applications for real-time analytics is to create a real-time dashboard. There may not be an easier or more effective way to create such dashboards than to implement SingleStore and Looker together atop your existing architecture. Use Looker to make creating your dashboard easy, and use SingleStore to make it fast.

Speeding Up Analytics with SingleStore and Looker

You can use the combination of Looker and SingleStore atop an existing data architecture to make data much easier to access and greatly speed up performance. SingleStore is faster than competing solutions; often twice as fast, at half the cost. You can also use SingleStore to take over some or all of the work currently done by an existing SQL or NoSQL database, further improving performance. A solid example of an organization using SingleStore to speed up analytics performance is the online retail company Fanatics. Fanatics sources and sells branded merchandise for some of the world's leading sports teams, including the NBA and the NFL, along with global brands such as Manchester United. Fanatics uses SingleStore to create a fast and reliable data architecture for all its analytics needs – including apps, business intelligence (BI) tools, and ad hoc SQL queries.
Read Post