Recent Articles

Product
Launching Our Community Edition
We started SingleStore with a vision to enable every company to be a real-time enterprise, and to do that by delivering the leading real-time database for transactions and analytics. Since then, the forces shaping our digital economy have only added wind to our sails. The world is more mobile, interconnected, interactive, and on the cusp of several industry transformations such as the Internet of Things.
Real-time processing is the secret to keeping up, and in-memory solutions are the foundation. Yet existing options have been too expensive or too complex for companies to adopt.
That changes today with the release of SingleStoreDB Self-Managed 4 and our new Community Edition, a free, unlimited-capacity, unlimited-scale offering of SingleStore that includes all transactional and analytical features.
By sharing SingleStore capabilities with the world, for free, we expect many developers and companies will have a chance to explore what is possible with in-memory computing. As the pace of business advances, SingleStore will be there.
Start Using SingleStore Community Edition Now
Unlimited scale and capacity. Free forever.
Download Community Edition →
We hope you enjoy working with our Community Edition. Please feel free to share feedback at our Community Edition Help page.
Eric Frenkiel, CEO and co-founder, SingleStore
Nikita Shamgunov, CTO and co-founder, SingleStore
Community FAQ
Read Post

Trending
Join SingleStore at ad:tech San Francisco
Read Post

Trending
Tech Field Day Reception May 13th
Tech Field Day is coming to San Francisco next week with a focus on data, and that means time for a party! On Wednesday, May 13th, at…
Read Post

Data Intensity
Driving Relevance with Real-Time and Historical Data
As technology weaves into our daily lives, our expectations of it continue to increase. Consider mobile devices and location information. Recently, 451 Research released data showing that 47% of consumers would like to receive personalized information based on their immediate location.
Read Post

Product
Celebrating SingleStore Availability Two Years In
Today, I couldn’t be more excited to mark the two year anniversary of SingleStore general availability! SingleStore began with a simple idea to build a new database that would give any company the ability to operate in real-time and make their business more data-driven, responsive, and scalable. Since releasing SingleStore, it’s been an amazingly fun journey as the company has grown by leaps and bounds every quarter.
To celebrate our second birthday, I wanted to take a brief moment to reflect on what we’ve been able to accomplish in the two years since releasing SingleStore.
People
SingleStore started in the Y Combinator winter class of 2011 with just two people – co-founder and CTO Nikita Shamgunov, and myself. Since then, we’ve grown the company to more than 50 people who bring great experience, energy, and passion to the company. We’ve also added database visionaries like Jerry Held and Chris Fry to our executive team to help us see our vision come to fruition.
Customers
We’ve added 40+ enterprise customers over the past 2 years, including top brands like Comcast, Samsung and Shutterstock. It’s been incredibly rewarding to see our customers use SingleStore in ways we never imagined, truly pushing the boundaries of what is possible in Big Data.
Product
Since launching with general availability in 2013, we’ve expanded the SingleStore platform to scale with growing market demand. Major additions to the platform include:
Going beyond memory with a flash-optimized column store that is closely integrated with the in-memory row store, providing a single database for real-time and historical analytics
Working with Apache Spark by shipping a bi-directional connector to operationalize Spark models and results
Incorporating real-time geospatial intelligence to help customers build location-aware applications and analytics
What’s Next?
The most exciting times are still ahead!
Big data has been traditionally thought of as a mechanism for extracting insights from yesterday’s data. We seek to change that way of thinking, empowering businesses to be more responsive by operating with real-time data in the here and now. As demand for real-time and in-memory databases increases, we plan to be there helping customers achieve phenomenal results.
Read Post

Product
Harnessing the Enterprise Capabilities of Spark
As more developers and data scientists try Apache Spark, they ask questions about persistence, transactions and mutable data, and how to deploy statistical models in production. To address some of these questions, our CEO Eric Frenkiel recently wrote an article for Data Informed explaining key use cases integrating SingleStore and Spark together to drive concrete business value.
The article explains how you can combine SingleStore and Spark for applications like stream processing, advanced analytics, and feeding the results of analytics back into operational systems to increase efficiency and revenue. As distributed systems with speedy in-memory processing, SingleStore and Spark naturally complement one another and form the backbone of a flexible, versatile real-time data pipeline.
Read the full article here.
Get The SingleStore Spark Connector Guide
The 79 page guide covers how to design, build, and deploy Spark applications using the SingleStore Spark Connector. Inside, you will find code samples to help you get started and performance recommendations for your production-ready Apache Spark and SingleStore implementations.
Download Here
Read Post

Data Intensity
Real-Time Stream Processing Architecture with Hadoop and SingleStore
While SingleStore and Hadoop are both data stores, they fill different roles in the data processing and analytics stack. The Hadoop Distributed File System (HDFS) enables businesses to store large volumes of immutable data, but by design it is used almost exclusively for batch processing. Moreover, newer execution frameworks that are faster and storage agnostic are challenging MapReduce as businesses’ batch-processing interface of choice.
Lambda Architecture
A number of SingleStore customers have implemented systems using the Lambda Architecture (LA). LA is a common design pattern for stream-based workloads where the hot, recent data requires fast updates and analytics, while also maintaining long-term history on cheaper storage. Using SingleStore as the real-time path and HDFS as the historical path has been a winning combination for many companies. SingleStore serves as a real-time analytics serving layer, ingesting and processing millions of streaming data points a second. SingleStore gives analysts immediate access to operational data via SQL. Long-term analytics and longer running, batch-oriented workflows are pushed to Hadoop.
Use Case: Real-Time Analytics at Comcast
As an example, SingleStore customer Comcast focuses on real-time operational analytics. By using SingleStore and Hadoop together, Comcast can proactively diagnose potential issues from real-time intelligence and deliver the best possible video experience. Their Lambda architecture writes one copy of data to a SingleStore instance and another one to Hadoop.
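The dual-write pattern described above can be sketched in a few lines of Python. This is a toy illustration of the Lambda Architecture's two paths, not Comcast's actual pipeline; the dict and list simply stand in for SingleStore and HDFS.

```python
# Toy illustration of the Lambda Architecture dual-write pattern:
# every event fans out to a fast real-time store (mutable, keyed)
# and an append-only historical store (immutable log).

class LambdaWriter:
    def __init__(self):
        self.realtime = {}    # stands in for SingleStore: latest state per key
        self.historical = []  # stands in for HDFS: raw append-only history

    def write(self, event):
        # Real-time path: upsert the latest state for fast serving queries.
        self.realtime[event["id"]] = event
        # Historical path: append the raw event for long-term batch analytics.
        self.historical.append(event)

writer = LambdaWriter()
writer.write({"id": "dev-1", "status": "ok"})
writer.write({"id": "dev-1", "status": "degraded"})

print(writer.realtime["dev-1"]["status"])  # latest state: "degraded"
print(len(writer.historical))              # full history: 2 events
```

The real-time store answers "what is happening now" with a single lookup, while the historical log preserves everything for batch reprocessing.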
Read Post

Engineering
Boost Conversions with Overlap Ad Targeting
Digital advertising is a numbers game played out over billions of interactions. Advertisers and publishers build predictive models for buying and selling traffic, then apply those models over and over again. Even small changes to a model, changes that alter conversion rates by fractions of a percent, can have a profound impact on revenue over the course of a billion transactions.
Serving targeted ads requires a database of users segmented by interests and demographic information. Granular segmentation allows for more effective targeting. For example, you can choose more relevant ads if you have a list of users who like rock and roll, jazz, and classical music than if you just have a generic list of music fans.
Knowing the overlap between multiple user segments opens up new opportunities for targeting. For example, knowing that a user is both a fan of classical music and lives in the San Francisco Bay Area allows you to display an ad for tickets to the San Francisco Symphony. This ad will not be relevant to the vast majority of your audience, but may convert at a high rate for this particular “composite” segment. Similarly, you can offer LA Philharmonic tickets to classical fans in Southern California, Outside Lands tickets to rock and roll fans in the Bay Area, and so on.
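At its core, a composite segment is just the intersection of its member segments. The sketch below illustrates the idea with plain Python sets; the segment names and user IDs are made up for the example.

```python
# Toy illustration of composite segment targeting: the overlap of
# user segments is a set intersection. Segment contents are made up.

segments = {
    "classical_fans": {"u1", "u2", "u3", "u7"},
    "bay_area":       {"u2", "u7", "u9"},
    "socal":          {"u1", "u4"},
}

def composite(*names):
    """Users belonging to every named segment."""
    result = segments[names[0]].copy()
    for name in names[1:]:
        result &= segments[name]  # keep only users in both segments
    return result

# Candidates for a San Francisco Symphony ad:
sf_symphony = composite("classical_fans", "bay_area")
print(sorted(sf_symphony))  # ['u2', 'u7']
```

A small audience, but one likely to convert at a far higher rate than the generic music-fans list.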
Read Post

Trending
Filling the Gap Between HANA and Hadoop
Takeaways from the Gartner Business Intelligence and Analytics Summit
Last week, SingleStore had the opportunity to participate in the Gartner Business Intelligence and Analytics Summit in Las Vegas. It was a fun chance to talk to hundreds of analytics users about their current challenges and future plans.
As an in-memory database company, we fielded questions on both sides of the analytics spectrum. Some attendees were curious about how we compared with SAP HANA, an in-memory offering at the high-end of the solution spectrum. Others wanted to know how we integrated with Hadoop, the scale-out approach to storing and batch processing large data sets.
And in the span of a few days and many conversations, the gap between these offerings became clear. What also became clear is the market appetite for a solution.
Hardly Accessible, Not Affordable
While HANA does offer a set of in-memory analytical capabilities, primarily optimized for the emerging SAP S/4HANA suite, it remains at such upper echelons of the enterprise IT pyramid that it is rarely accessible across an organization. Part of this stems from the length and complexity of HANA implementations and deployments. Its top-of-the-line price and mandated hardware configurations also mean that in-memory capabilities via HANA are simply not affordable for a broader set of needs in a company.
Hanging with Hadoop
On the other side of the spectrum lies Hadoop, a foundational big data engine, but often akin to a large repository of log and event data. Part of Hadoop’s rise has been the Hadoop Distributed File System (HDFS) which allowed for cheap and deep storage on commodity hardware. MapReduce, the processing framework atop HDFS, powered the first wave of big data, but as the world moves towards real-time, batch processing remains helpful but rarely sufficient for a modern enterprise.
In-Memory Speeds and Distributed Scale
Between these ends of the spectrum lies an opportunity to deliver in-memory capabilities with an architecture on distributed, commodity hardware accessible to all.
The computing theme of this century is piles of smaller servers or cloud instances, directed by clever new software, relentlessly overtaking use cases that were previously the domain of big iron. Hadoop proved that “big data” doesn’t mean “big iron.” The trend now continues with in-memory.
Moving To Converged Transactions and Analytics
At the heart of the in-memory shift is the convergence of both transactions and analytics into a single system, something Gartner refers to as Hybrid transactional/analytical processing (HTAP).
In-memory capabilities make HTAP possible. But data growth means the need to scale. Easily adding servers or cloud instances to a distributed solution lets companies meet capacity increases and store their highest value, most active data in memory.
But an all-memory, all-the-time solution might not be right for everyone. That is where combining in-memory and disk-based stores within a single system fits. A tiered architecture provides infrastructure consolidation and low-cost expansion for high-value but less active data.
Finally, ecosystem integration makes data pipelines simple, whether that includes loading directly from HDFS or Amazon S3, running a high-performance connector to Apache Spark, or just building upon a foundational programming language like SQL.
SQL-based solutions can provide immediate utility across large parts of enterprise organizations. The familiarity and ubiquity of the programming language means access to real-time data via SQL becomes a fast path to real-time dashboards, real-time applications, and an immediate impact.
Related Links:
Learn more about how HTAP remedies the four drawbacks of traditional systems here.
Want to learn more about in-memory databases and opportunities with HTAP? Take a look at the recent Gartner report here.
If you’re interested in test driving an in-memory database that offers the full benefits of HTAP, give SingleStore a try for free, or give us a ring at (855) 463-6775.
Read Post

Trending
Real-Time Geospatial Intelligence with Supercar
Today, SingleStore is showcasing a brand new demonstration of real-time geospatial location intelligence at the Gartner Business Intelligence and Analytics Summit in Las Vegas.
The demonstration, titled Supercar, makes use of a dataset containing the details of 170 million real world taxi rides. By sampling this dataset and creating real-time records while simultaneously querying the data, Supercar simulates the ability to monitor and derive insights across hundreds of thousands of objects on the go.
Read Post

Trending
In The End We Seek Structure
In Short:
A range of assumptions led to a boom in NoSQL solutions, but in the end, SQL and relational models find their way back as a critical part of data management.
Background
By the mid 2000s, 10 years into the Netscape-inspired mainstream Internet, webscale workloads were pushing the limits of conventional databases. Traditional solutions could not keep up with a myriad of Internet users simultaneously accessing the same application and database.
At the time, many websites used relational databases like MySQL, SQL Server from Microsoft, or Oracle. Each of these databases relied on a relational model using SQL, the Structured Query Language, which emerged nearly 40 years ago and remains the lingua franca of data management.
Genesis of NoSQL
Scaling solutions is hard, and scaling a relational SQL database proved particularly challenging, in part leading to the emergence of the NoSQL movement.
FIGURE 1: Interest in NoSQL 2009 – 2015 Source: Google Trends
Read Post

Trending
SingleStore at Gartner Business Analytics and Intelligence Summit
We are thrilled to be in Las Vegas this week for the Gartner Business Analytics and Intelligence Summit. We will be at booth #119 and we have a ton in store for the event, including games and giveaways, happy hour for attendees, and a featured session from SingleStore CEO, Eric Frenkiel.
We will also be showcasing our new geospatial capabilities, and a demo of how Pinterest is using SingleStore and Spark for real-time analytics.
Free Gartner Report: Market Guide for In-Memory DBMS
See the latest developments and use cases for in-memory databases.
Download the Report Here →
From the report…
“The growing number of high performance, response-time critical and low-latency use cases (such as real-time repricing, power grid rerouting, logistics optimization), which are fast becoming vital for better business insight, require faster database querying, concurrency of access and faster transactional and analytical processing. IMDBMSs provide a potential solution to all these challenging use cases, thereby accelerating its adoption.”
Don’t Miss the SingleStore Featured Session
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
SingleStore CEO and Founder, Eric Frenkiel, will discuss how moving from batch-oriented data silos to real-time pipelines means replacing batch processes with online datasets that can be modified and queried concurrently. This session will cover use cases and customer deployments of Hybrid Transaction/Analytic Processing (HTAP) using SingleStore and Spark.
Session Details
Speaker: Eric Frenkiel, SingleStore CEO and Founder
Date and Time: 12:30pm–12:50pm Monday, 3/30/2015
Location: Theater A, Forum Ballroom
Join SingleStore on Monday Night for Happy Hour
We will be hosting a happy hour at Carmine’s in the Forum Shops at Caesars on Monday night at 8:00PM. ALTER TABLE TINIs and heavy hors d’oeuvres will be served. Stop by and meet with SingleStore CEO Eric Frenkiel and CMO Gary Orenstein. More details here.
Suggested Sessions
We have handpicked a few sessions that you don’t want to miss.
Do We Still Need a Data Warehouse?
Speaker: Donald Feinberg VP Distinguished Analyst
30 March 2015 2:00 PM to 2:45 PM
For more than a decade, the data warehouse has been the architectural foundation of most BI and analytic activity. However, various trends (in-memory, Hadoop, big data and the Internet of Things) have compelled many to ask whether the data warehouse is still needed. This session provides guidance on how to craft a more modern strategy for data warehousing.
Will Hadoop Jump the Spark?
Speaker: Merv Adrian Research VP
31 March 2015 2:00 PM to 2:45 PM
The Hadoop stack continues its dramatic transformation. The emergence of Apache Spark, suitable for many parts of your analytic portfolio, will rewrite the rules, but its readiness and maturity are in question.
The DBMS Dilemma: Choosing the Right DBMS For The Digital Business
Speaker: Donald Feinberg VP Distinguished Analyst
31 March 2015 2:00 PM to 2:45 PM
As your organization moves into the digital business era, the DBMS needs to support not only new information types but also the new transactions and analytics required for the future. The DBMS as we know it is changing. This session will explore the new information types, new transaction types and the technology necessary to support this.
Games and Giveaways
Read Post

Engineering
Turn Up the Volume With High-Speed Counters
Scaling tends to make even simple things, like counting, seem difficult. In the past, businesses used specialized databases for particular tasks, including high-speed, high-throughput event counters. Due to the constraints of legacy systems, some people still assume that relational databases cannot handle high-throughput tasks at scale. However, due to advances like in-memory storage, high-throughput counting no longer requires a specialized, single-purpose database.
Why do we even need counters?
Before we get into the implementation, you might be asking why we need counters at all. Why not just collect event logs and compute counts as needed?
In short, querying a counter is much faster than counting log records, and many applications require instant access to this kind of data. Counting logs requires a large table scan and an aggregation to produce each count; with an updatable counter, it is a single-record lookup. The hard part of high-throughput counting is building a stateful, fault-tolerant distributed system. Fortunately, SingleStore solves those hard problems for you, so you can focus on building your application.
In the rest of this article, we’ll design a simple, robust counter database running on a modest SingleStore cluster and benchmark how it performs.
Counters are records
Let’s start by creating the following schema:
create database test;
use test;
create table counters_60 (
time_bucket int unsigned not null,
event_type int unsigned not null,
counter int unsigned not null,
primary key (time_bucket, event_type)
);
create table event_types (
event_type int unsigned not null primary key,
event_name varchar(128),
owner varchar(64),
status enum ('active', 'inactive')
);
The column time_bucket is the timestamp of the event rounded to the minute. Making time_bucket and event_type the primary key allows us to easily index events by time and type. To record an event, we increment its counter:
insert into counters_60 select unix_timestamp() / 60, 1234, 1
on duplicate key update counter = counter + 1;
If a primary key value does not exist, this query will insert a new record into SingleStore. If the primary key value exists, the counter will be incremented. This is informally called an “upsert.” The management of event_types is outside the scope of this article, but it’s trivial (and fast) to join the counter table to a table containing event metadata such as its human-friendly name.
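The upsert semantics can be sketched in plain Python. This is a toy model of the insert-or-increment behavior, not SingleStore internals; the timestamp value is arbitrary.

```python
# Toy model of the counter upsert: counters are keyed by
# (time_bucket, event_type); an increment either inserts the key
# with count 1 or bumps the existing count, mirroring
# "on duplicate key update counter = counter + 1".

counters = {}  # (time_bucket, event_type) -> count

def increment(event_type, ts):
    time_bucket = int(ts) // 60               # round down to the minute
    key = (time_bucket, event_type)
    counters[key] = counters.get(key, 0) + 1  # the "upsert"

now = 1_000_000_000  # arbitrary example timestamp
for _ in range(3):
    increment(1234, ts=now)
increment(4567, ts=now)

print(counters[(now // 60, 1234)])  # 3
print(counters[(now // 60, 4567)])  # 1
```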
Let’s also insert some data into the event_types table:
insert into event_types values (1234, 'party', 'memsql', 'active');
Querying Counters
Now you have the counts of each event type bucketed by minute. This counter data can easily be aggregated and summarized with simple SQL queries:
-- all-time historical counts of various event types
select e.event_type, e.event_name, sum(c.counter)
from counters_60 c, event_types e
where c.event_type=e.event_type
and e.event_type in (1234, 4567, 7890)
group by 1, 2;
-- total number of events in the last hour
select sum(counter), sum(counter)/60 as 'avg per min' from counters_60
where event_type = 1234
and time_bucket >= unix_timestamp() / 60 - 60;
-- total number of events in time series, bucketed in 10-minute intervals
select floor((unix_timestamp()/60 - time_bucket)/10) as interval, sum(counter)
from counters_60
where event_type = 1234
and time_bucket >= unix_timestamp() / 60 - 60
group by 1;
1.6 Million increments per second
Inserting naively into the counters table, one record at a time, actually gets you pretty far. In our testing this resulted in a throughput of 200,000 increments per second. It’s nice to get impressive performance by default. Then we tried to see how much farther we could go.
In this simulation we processed 1,000 different event types. We created a threaded Python script to push as many increments per second as possible. We made three changes to the naive version: multi-insert batches, disabling cluster-wide transactions, and sorting the records in each batch to avoid deadlocks.
insert into counters_60 values
(23768675, 1234, 1),
(23768675, 4567, 1),
(23768675, 7890, 1),
...
on duplicate key update counter = counter + 1;
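Client-side, building one of these sorted multi-row statements looks roughly like the sketch below. This is illustrative code under the assumptions described above (batching plus in-batch sorting), not our actual benchmark script.

```python
# Hedged sketch of building a sorted multi-row counter upsert.
# Sorting each batch by primary key means concurrent batches touch
# rows in the same order, which avoids deadlocks between them.

def build_upsert(table, rows):
    """rows: list of (time_bucket, event_type) tuples."""
    rows = sorted(rows)  # sort by (time_bucket, event_type)
    values = ",\n".join(f"({tb}, {et}, 1)" for tb, et in rows)
    return (
        f"insert into {table} values\n{values}\n"
        "on duplicate key update counter = counter + 1;"
    )

sql = build_upsert("counters_60", [(23768675, 7890), (23768675, 1234)])
print(sql)
```

A real driver would send the statement through a MySQL-compatible client in a loop, one batch per round trip.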
We used a six-node AWS cluster with 2 aggregators and 4 leaves to simulate the workload. Each node was an m3.2xlarge with 8 cores and 15 GB of RAM, for an hourly cost of $2.61 for the entire cluster. When running this script on both aggregator nodes, we achieved a throughput of 1.6 million upserts per second.
Data Collection
In this simulation we use a Python script to simulate the data ingest. In the real world, we see our customers use technologies like Storm, Kafka and Spark Streaming to collect events in a distributed system for higher throughput. For more information on SingleStore integration with stream processing engines, see this blog post on how Pinterest uses SingleStore and Spark streaming to track real-time event data.
Want to build your own high throughput counter? Download SingleStore today!
Read Post

Trending
SingleStore at Spark Summit East
We are happy to be in New York City this week for Spark Summit East. We will be sharing more about our new geospatial capabilities, as well as the work with Esri to showcase the power of SingleStore geospatial features in conjunction with Apache Spark.
Last week we shared the preliminary release of SingleStore geospatial features introduced at the Esri Developer Summit in Palm Springs. You can read more about the live demonstration showcased at the summit here.
The demonstration uses the “Taxistats” dataset: a compilation of 170 million real-world NYC taxi rides. It includes GPS coordinates of the pickup and dropoff, distance, and travel time. SingleStore is coupled with the new version of Esri’s ArcGIS Server, which has a new feature to translate ArcGIS queries into external database queries. From there we generate heatmaps from the raw data in sub-second time.
This week we launched the official news release of SingleStore geospatial capabilities.
By integrating geospatial functions, SingleStore enables enterprises to achieve greater database efficiency with a single database that is in-memory, linearly scalable, and supports the full range of relational SQL and geospatial functions. With SingleStore, geospatial data no longer remains separate; it becomes just another data type with lock-free capabilities and powerful manipulation functions.
Read Post

Engineering
Geospatial Intelligence Coming to SingleStore
This week at the Esri Developers Summit in Palm Springs, our friends at Esri are previewing upcoming features for the next release of SingleStore, using a huge real-world geospatial dataset.
Esri develops geographic information systems (GIS) that function as an integral component in nearly every type of organization. In a recent report by the ARC Advisory Group, the Geographic Information System Global Market Research Study, the authors stated, “Esri is, without a doubt, the dominant player in the GIS market.”
Everything happens somewhere. But, traditionally, spatial data has been locked away in specialized software that either lacked general database features, or didn’t scale out. With SingleStore we are making geospatial data a first-class citizen: just as easy to use, at scale, at great speed and high throughput, as any other kind of data.
The demonstration uses the “Taxistats” dataset: a compilation of 170 million real-world NYC taxi rides. It includes GPS coordinates of the pickup and dropoff, distance, and travel time. SingleStore is coupled with the new version of Esri’s ArcGIS Server, which has a new feature to translate ArcGIS queries into external database queries. From there we generate heatmaps from the raw data in sub-second time.
Heatmaps are a great way to visualize aggregate geospatial data. The X and Y are the longitude and latitude of “cells” or “pixels” on the map, and the color shows the intensity of the values. From there you can explore the dataset across any number of dimensions: zoom in on an area, filter by time, length of ride, and more.
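The binning behind a heatmap can be sketched as follows. This is an illustrative aggregation over a handful of made-up coordinates, not the ArcGIS/SingleStore pipeline itself; a real heatmap would aggregate millions of pickups.

```python
# Sketch of heatmap binning: snap each (longitude, latitude) point
# to its grid cell, then count points per cell. Cell color on the
# map would be driven by these counts.

from collections import Counter

def cell(lon, lat, size=0.01):
    """Snap a coordinate to the corner of its grid cell."""
    return (round(lon // size * size, 6), round(lat // size * size, 6))

pickups = [
    (-73.9857, 40.7484),  # Midtown Manhattan
    (-73.9851, 40.7480),  # same cell as above
    (-73.7781, 40.6413),  # JFK airport
]

heat = Counter(cell(lon, lat) for lon, lat in pickups)
hottest_cell, count = heat.most_common(1)[0]
print(hottest_cell, count)
```

Filtering the input points by time or ride length before binning gives the interactive dimensions described above.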
Read Post

Trending
SingleStore at the AMP Lab
Please join us next week as two members of the SingleStore engineering team present at the AMPLab at Berkeley on Wednesday March 11th from 12:00pm to 1:00pm.
AMP SEMINAR
Ankur Goyal and Anders Papitto, SingleStore,
A Distributed In-Memory SQL Database
Wednesday 3/11, Noon, 405 Soda Hall, Berkeley
Talk Abstract
This talk will cover the major architectural design decisions, with discussion of specific technical details as well as the motivation behind the big decisions. We will cover lock-free data structures, code generation, durability and replication, distributed query execution, and clustering in SingleStore. We will then discuss some new directions for the product, including ideas on leveraging Spark.
Speakers
Ankur Goyal is the Director of Engineering at SingleStore. At SingleStore he has focused on distributed query execution and clustering, but has touched most of the engine. His areas of interest are distributed systems, compilers, and operating systems. Ankur studied computer science at Carnegie Mellon University and worked on distributed data processing at Microsoft before SingleStore.
Anders Papitto is an engineer at SingleStore, where he has worked on distributed query execution, column store storage and query execution, and various other components. He joined SingleStore shortly before completing his undergraduate studies at UC Berkeley.
About the AMPLab
AMP: ALGORITHMS MACHINES PEOPLE
TURNING UP THE VOLUME ON BIG DATA
Working at the intersection of three massive trends: powerful machine learning, cloud computing, and crowdsourcing, the AMPLab is integrating Algorithms, Machines, and People to make sense of Big Data. We are creating a new generation of analytics tools to answer deep questions over dirty and heterogeneous data by extending and fusing machine learning, warehouse-scale computing and human computation. We validate these ideas on real-world problems including participatory sensing, urban planning, and personalized medicine with our application and industrial partners.
Read Post

Trending
Video: The State of In-Memory and Apache Spark
Strata+Hadoop World was full of activity for SingleStore. Our keynote explained why real-time is the next phase for big data. We showcased a live application with Pinterest where they combine Spark and SingleStore to ingest and analyze real-time data. And we gave away dozens of prizes to Strata+Hadoop attendees who proved their latency crushing skills in our Query Kong game.
During the event, Mike Hendrickson of O’Reilly Media sat down with SingleStore CEO Eric Frenkiel to discuss:
The state of in-memory computing and where it will be in a year
What Spark brings to in-memory computing
Industries and use cases that are best suited for Spark
Get The SingleStore Spark Connector Guide
The 79 page guide covers how to design, build, and deploy Spark applications using the SingleStore Spark Connector. Inside, you will find code samples to help you get started and performance recommendations for your production-ready Apache Spark and SingleStore implementations.
Download Here
Watch the video in full here:
Read Post

Trending
Data Stores for the Internet of Things
Like the world wide web, the Internet of Things is personal. It represents a near-complete connectedness, including the industrial world, and never-ending possibilities for new applications and services. The Internet of Things also requires us to examine conventional assumptions about the databases and data stores that support real-time data pipelines. In an article on SiliconANGLE, Designing data stores for the Internet of Things, SingleStore CEO and co-founder Eric Frenkiel shares his insight on the critical requirements to support new interconnected devices, interactive applications, and the analytics to understand their use.
Principles of data store design for the Internet of Things:
Capture Everything
Save Data While Serving Data
Fit the Ecosystem
Online All the Time
Be sure to read the entire article at Designing data stores for the Internet of Things.
Read Post

Trending
Big Data, Big Fun! Visit SingleStore at Strata Booth 1015
The fun begins at Strata + Hadoop World this week in San Jose. Be sure to check out SingleStore at Booth 1015 for the latest product details, demonstrations, games, and prizes. Here is a quick rundown of our activity at the show.
SingleStore Introduces Seamless Spark Connectivity for Enterprise Deployments
Last week, SingleStore announced a high-performance parallel connector for SingleStore and Spark. You can read all the details here and see it live at the show.
SingleStore and Pinterest Showcase Operationalizing Spark at Strata + Hadoop World 2015
We partnered with our friends at Pinterest to share the latest and greatest with Spark and SingleStore. Read all of the details here.
Keynote: Close Encounters with the Third Kind of Database
9:10am-9:15am Thursday, February 19th, Grand Ballroom 220
Join us for this engaging presentation by our CEO and co-founder Eric Frenkiel.
Tutorial Session: Bringing OLAP Fully Online: Analyze Changing Datasets in SingleStore and Spark with Pinterest Demo
10:40am-11:20am Thursday, February 19th, Room LL20D
This session includes appearances from Robert Stepeck, CTO of Novus, and Yu Yang, Software Engineer at Pinterest.
Test Your Skills with Query Kong
Win an Estes Proto X Drone after proving your low-latency skills with Query Kong, the breakout game sensation of Strata + Hadoop World!
Visit the SingleStore Booth 1015
We have cool t-shirts for all visitors during the show expo hours:
Wednesday, February 18, 5:00pm – 6:30pm
Thursday, February 19, 10:00am – 4:30pm and 5:30pm – 7:00pm
Friday, February 20, 10:00am- 4:00pm
See you there!
We look forward to sharing great technical insights and fun times at booth 1015!
http://www.singlestore.com/events
Read Post

Case Studies
How Pinterest Measures Real-Time User Engagement with Spark
Setting the Stage for Spark
With Spark on track to replace MapReduce, enterprises are flocking to the open source framework in an effort to take advantage of its superior distributed data processing power.
IT leads who manage the infrastructure and data pipelines of high-traffic websites are running Spark, in particular Spark Streaming, which is ideal for structuring real-time data on the fly, to reliably capture and process event data and write it in a format that can immediately be queried by analysts.
As the world’s premier visual bookmarking tool, Pinterest is one of the innovative organizations taking advantage of Spark. Pinterest found a natural fit between SingleStore’s in-memory database and Spark Streaming, and is using these tools to find patterns in high-value user engagement data.
Pinterest’s Spark Streaming Setup
Here’s how it works:
1. Pinterest pushes event data, such as pins and repins, to Apache Kafka.
2. Spark Streaming ingests event data from Apache Kafka, then filters by event type and enriches each event with full pin and geo-location data.
3. Using the SingleStore Spark Connector, data is then written to SingleStore, with each event type flowing into a separate table. SingleStore handles record deduplication (Kafka’s “at least once” semantics guarantee fault tolerance but not uniqueness).
4. As data streams in, Pinterest runs queries in SingleStore to generate engagement metrics and report on event data such as pins, repins, comments, and logins.
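The deduplication step deserves a closer look. Because Kafka’s “at least once” delivery can redeliver the same event, the sink must key each record by a unique event id, the way a primary key does in SingleStore, and keep only one copy. A minimal sketch of that idea in plain Python (the event schema here is hypothetical):

```python
# Sketch of at-least-once deduplication: the same event may arrive
# more than once, so we key each record by a unique event id and keep
# only the first occurrence -- analogous to a primary-key upsert.
def dedupe(events):
    """Return events with duplicate ids removed, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            unique.append(event)
    return unique

# An at-least-once stream can redeliver an event (id 2 below):
stream = [
    {"id": 1, "type": "pin"},
    {"id": 2, "type": "repin"},
    {"id": 2, "type": "repin"},  # duplicate delivery
    {"id": 3, "type": "comment"},
]
print([e["id"] for e in dedupe(stream)])  # → [1, 2, 3]
```

In the real pipeline this logic lives in the database rather than application code: a primary key on the event id makes duplicate inserts a no-op.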
Visualizing the Data
We built a demo with Pinterest to showcase the locations of repins as they happen. When an image is repinned, circles on the globe expand, providing a visual representation of the concentration of repins by location.
Read Post

Product
Operationalizing Spark with SingleStore
Combining the data processing prowess of Spark with a real-time database for transactions and analytics, where both are memory-optimized and distributed, leads to powerful new business use cases. Links to the SingleStore Spark Connector appear at the end of this post.
Data Appetite and Evolution
Our generation of, and appetite for, data continue unabated. This drives a critical need for tools that can quickly process and transform data. Apache Spark, the new memory-optimized data processing framework, fills this gap by combining performance, a concise programming interface, and easy Hadoop integration, all of which have led to its rapid rise in popularity.
However, Spark itself does not store data outside of processing operations. That explains why, in a recent survey of over 2,000 developers who chose Spark to replace MapReduce, 62% still load data into Spark from the Hadoop Distributed File System; it also explains the forthcoming Tachyon memory-centric distributed file system, which can serve as storage for Spark.
But what if we could tie Spark’s intuitive, concise, expressive programming capabilities closer to the databases that power our businesses? That opportunity lies in operationalizing Spark deployments, combining the rich advanced analytics of Spark with transactional systems-of-record.
Introducing the SingleStore Spark Connector
Meeting enterprise needs to deploy and make use of Spark, SingleStore introduced the SingleStore Spark Connector for high-throughput, bi-directional data transfer between a Spark cluster and a SingleStore cluster. Since Spark and SingleStore are both memory-optimized, distributed systems, the SingleStore Spark Connector benefits from cluster-wide parallelization for maximum performance and minimal transfer time. The SingleStore Spark Connector is available as open source on Github.
SingleStore Spark Connector Architecture
There are two main components of the SingleStore Spark Connector that allow Spark to query from and write to SingleStore.
- A `SingleStoreRDD` class for loading data from a SingleStore query
- A `saveToSingleStore` function for persisting results to a SingleStore table
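The two entry points above might be used together roughly as follows. This is a hypothetical sketch only: the constructor arguments, package names, and method signatures shown are illustrative assumptions, not the connector’s exact released API.

```scala
// Illustrative only: SingleStoreRDD and saveToSingleStore are the two
// connector entry points described above, but the signatures here are
// assumed, not copied from the released API.
import org.apache.spark.{SparkConf, SparkContext}

object ConnectorSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SingleStoreSketch"))

    // Read: run a SQL query against SingleStore and expose the result
    // as a Spark RDD, partitioned across the cluster.
    val events = new SingleStoreRDD(sc, "SELECT user_id, event_type FROM events")

    // Write: persist transformed rows back to a SingleStore table,
    // in parallel from every Spark partition.
    events
      .map { case (userId, eventType) => (userId, eventType.toUpperCase) }
      .saveToSingleStore("analytics", "events_by_type")
  }
}
```

Because both systems are distributed, each Spark partition can read from and write to a SingleStore partition directly, which is where the cluster-wide parallelization mentioned above comes from.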
Read Post

Product
Run Real-Time Applications with Spark and the SingleStore Spark Connector
Apache Spark is one of the most powerful distributed computing frameworks available today. Its combination of fast, in-memory computing with an architecture that’s easy to understand has made it popular for users working with huge amounts of data.
While Spark shines at operating on large datasets, it still requires a solution for data persistence. HDFS is a common choice, but while it integrates well with Spark, its disk-based nature can impact performance in real-time applications (e.g. applications built with the Spark Streaming libraries). Also, Spark does not have a native capability to commit transactions.
Making Spark Even Better
That’s why SingleStore is releasing the SingleStore Spark connector, which gives users the ability to read and write data between SingleStore and Spark. SingleStore is a natural fit for Spark because it can easily handle the high rate of inserts and reads that Spark often requires, while also having enough space for all of the data that Spark can create.
Read Post

Trending
Closing the Batch Gap
Read Post

Trending
Four Ways Your DBMS is Holding You Back – And One Simple Fix
Our data is changing faster than our data centers, making it harder and harder to keep up with the influx of incoming information, let alone make use of it. IT teams still tolerate overnight batch processing. Scaling legacy solutions remains cost prohibitive. And many promised solutions force a complete departure from the past.
If this sounds familiar, you are not alone. Far too many innovative companies struggle to build applications for the future on infrastructure of the past. It’s time for a new approach.
In its report, “Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation,” Gartner identifies four major drawbacks of traditional database management systems and explains how a hybrid transactional/analytical processing approach can solve them.
A Brief Overview of HTAP
Hybrid transactional/analytical processing (HTAP) merges two formerly distinct categories of data management: operational databases that processed transactions, and data warehouses that processed analytics. Combining these functions into a single system inherently eliminates many challenges faced by database administrators today.
How HTAP Remedies the Four Drawbacks of Traditional Systems
ETL
In HTAP, data doesn’t need to move from operational databases to separate data warehouses or data marts to support analytics. Rather, data is processed in a single system of record, effectively eliminating the need to extract, transform, and load (ETL) data. This benefit provides welcome relief to data analysts and administrators, as ETL often takes hours (sometimes days) to complete.
Analytic Latency
In HTAP, transactional data of applications is readily available for analytics when created. As a result, HTAP provides an accurate representation of data as it’s being created, allowing businesses to power applications and monitor infrastructure in real-time.
Synchronization
In HTAP, drill-down from analytic aggregates always points to the “fresh” HTAP application data. Contrast that with a traditional architecture, where analytical and transactional data is stored in silos, and building a system to synchronize data stores quickly and accurately is cumbersome. On top of that, it’s likely that the “analytics copy” of data will be stale and provide a false representation of data.
Copies of Data
In HTAP, the need to create multiple copies of the same data is eliminated (or at least reduced). Compared with a traditional architecture, where copies of data must be managed and monitored for consistency, HTAP reduces the inaccuracies and timing differences associated with duplicating data. The result is a simplified system architecture that mitigates the complexity of managing data and reduces hardware costs.
Why HTAP and Why Now?
One of the reasons we segmented workloads in the past was to optimize for specific hardware, especially disk drives. In order to meet performance needs, systems designed for transactions were best optimized one way, and systems designed for queries another. Merging systems on top of the same set of disk drives would have been impossible from a performance perspective.
With the advent of low cost, memory-rich servers, in your data center or in the cloud, new in-memory databases can transcend prior restrictions and foster simplified deployments for existing use cases while simultaneously opening doors to new data centric applications.
Want to learn more about in-memory databases and opportunities with HTAP? Take a look at the recent Gartner report here.
If you’re interested in test driving an in-memory database that offers the full benefits of HTAP, give SingleStore a try for 30 days, or give us a ring.
Read Post

Trending
The Rise of the Cloud Memory Price Drop
Last week, Data Center Knowledge published a piece on Microsoft’s ‘Monster’ Azure instances, with RAM capacities approaching half a terabyte at 448 GB.
Microsoft Azure has launched its most powerful cloud instances to date. The new G-series instances go up to 32 cores, 448 GiB of RAM, and 6,596 GB of local SSD storage.
Microsoft Azure Launches Monster Cloud Instances, Data Center Knowledge, 8 January 2015
The article goes on to note:
The highest-memory instance available on Google Compute Engine is 104 GB
and
The Azure announcement comes before the expected roll-out of new high-octane cloud instances by AWS. …the upcoming C4 instances, which will go up to…60 GB of RAM.
However, the R3 instances from Amazon, optimized for memory, reach capacities of up to 244 GB.
This week, Business Insider published a post outlining “The Vicious Price War Going On In Cloud Computing,” which charts in finer detail the precipitous drop in the average monthly cost per GB of RAM. The chart comes from RBC Capital’s Mark Mahaney.
Read Post

Company
Welcome Jerry Held as SingleStore Executive Chairman
At SingleStore, we have always respected the challenges the computing industry has tackled in the past. It is no easy feat to build formidable technology companies, and when it happens, we look on with admiration.
It is in that spirit that we are thrilled to welcome Jerry Held as Executive Chairman of SingleStore. Jerry’s experience across computing and database technology is unparalleled. A consummate technology innovator and entrepreneur, he has helped create new database technologies and new companies, beginning with pioneering work to build the INGRES database during his time at U.C. Berkeley, and continuing at Tandem, which he helped build from a startup into a company with over $2 billion in annual revenues. He ran Oracle’s server products division during a growth period from $1.5 billion to $6 billion in annual revenues. He served as “CEO-in-residence” at Kleiner Perkins Caufield & Byers, was lead director of Business Objects until its sale to SAP, and was executive chairman of Vertica until its sale to HP. Today he serves as chairman of Tamr and on the boards of Informatica, NetApp, Kalio and Copia Global.
In short, Jerry is an expert in high-growth technology companies and knows the opportunities they create. He has seen the full picture and enjoys sharing his expertise.
SingleStore began with a simple idea to build a new database that would change the way people think about business: every company should be a real-time company, using SingleStore to make their businesses more data-driven, responsive, and scalable.
To achieve our objectives, we have surrounded ourselves with the best and brightest in the industry, in turn delivering our customers easy access to database innovation. Jerry epitomizes the talent and experience we continue to seek at SingleStore.
As we continue the expansion of our business and world-class product, we look forward to Jerry’s participation in reshaping the real-time database landscape. Welcome, Jerry!
Read Post

Trending
Market Guide for In-Memory DBMS
From the inception of SingleStore, we’ve seen the ability to merge transactions and analytics into a single database system as a core function for any data centric organization. Gartner’s recent report, “Market Guide for In-Memory DBMS” mirrors that belief, and is chock full of key findings and recommendations for businesses looking to take advantage of in-memory computing.
Skip this article and download a complimentary copy of Gartner’s Market Guide for In-Memory DBMS
In the report, Gartner found that “rapid technological advances in in-memory computing (IMC) have led to the emergence of hybrid transactional/analytical processing (HTAP) architectures that allow concurrent analytical and transactional processing on the same IMDBMS or data store.”
HTAP Solves for Real-Time Data Processing
HTAP promises to open a green field of opportunities for businesses that are not possible with legacy database management systems. Gartner highlights that with HTAP, “large volumes of complex business data can be analyzed in real time using intuitive data exploration and analysis without the latency of offloading the data to a data mart or data warehouse. This will allow business users to make more informed operational and tactical decisions.”
HTAP Use Cases
We are in the early days of HTAP, and it is not always clear how it can be applied in the real world. As a rule of thumb, any organization that handles large volumes of data will benefit from HTAP. To provide a bit more context, we’ve compiled the following applications of HTAP in use today.
Application Monitoring
When millions of users reach mobile or web-based applications simultaneously, it’s critical that systems run without any hiccups. HTAP allows teams of system administrators and analysts to monitor the health of applications in real-time to spot anomalies and save on costs incurred from poor performance.
Internet of Things
Applications built for the internet of things (IoT) run on huge amounts of sensor data. HTAP easily processes IoT scale data workloads, as it is designed to handle extreme data ingestion while concurrently making analytics available in real-time.
Real-Time Bidding
Ad tech companies struggle to implement complex real-time bidding features due to the sheer volume of data processing required. HTAP delivers the processing power necessary to serve display, social, mobile, and video advertising at scale.
Market Conditions
Financial organizations must be able to respond to market volatility in an instant. Any delay is money out of their pocket. HTAP makes it possible for financial institutions to respond to fluctuating market conditions as they happen.
In each of these use cases, the ability to react to large data sets in a short amount of time provides incredible value and, with HTAP, is entirely possible.
Finding the Right In-Memory DBMS
Before diving into a proof of concept, we highly suggest reading Gartner’s “Market Guide for In-Memory DBMS.” By giving it a quick read, you’ll come away with a better understanding of the in-memory computing landscape, new business opportunities, applicable use cases for your organization, and an action plan for getting started.
For a limited time, we’re offering a complimentary download of the report. Download it now to learn:
- Why in-memory computing is growing in popularity and adoption
- How IMDBMSs are categorized and the three major use cases they support
- New business opportunities emerging from hybrid transactional and analytical processing (HTAP)
- How to jump ahead of the competition with recommendations for effective use of IMDBMS
Required Disclaimer: Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Read Post

Data Intensity
A Sensor In It, A Database Behind It
On opening day, Molly Wood at The New York Times recounted the unofficial CES theme as “Put a sensor in it.”
The unofficial theme seemed to be: Put a sensor in it.
Later that day Samsung declared its entire product portfolio will be connected within 5 years.
“In five years,” every single one of Samsung’s products will be a connected “Internet of Things” device, Samsung chief executive Boo-Keun Yoon said today during the opening keynote at the 2015 Consumer Electronics Show in Las Vegas.
–Venturebeat
The Consumer Electronics Show reigns as the tech industry’s annual device bonanza, and a large part of this year’s euphoria relates to connected devices that form the Internet of Things.
The theme and message remain clear: sensors and interconnected devices will stampede ahead.
Less frequently discussed is what happens behind those devices, and in particular, the expectations users have about their daily device interactions and demand for data.
Fulfilling Data for The Internet of Things
Both companies and users have a lot at stake.
Device and application providers aim to serve users with a rich engaging set of functionality. They also seek to instrument service delivery to monitor and react to different situations.
Users invest time and money in connected devices and applications to fill needs and desires. But along with that comes a set of expectations.
Examples include:
- Fitness device users want their information up to the last step
- Video watchers expect streams to be delivered efficiently
- Drivers expect hassle-free connected services
Ultimately, all of the edge data drives demand and application requirements at the other end of the network, in the data center. Specifically, that infrastructure must include a database that can:
Support massive data ingest across millions of devices and connections
Database systems must keep up with the incoming flood of data to ensure no data is lost and that every user or device has a complete picture of its history.
Serve as the system of record while simultaneously providing real-time analytics
In a real-time world, there is no room for the delay, or the pain, of transferring data between systems, commonly referred to as extract, transform, and load (ETL). Systems of record for the Internet of Things need to mix transactions and analytics seamlessly and simultaneously.
Respond to and integrate well with familiar ecosystems
With sensor data touching everything from business intelligence to ad networks, connecting to multiple systems must be painless and simple.
Allow for online scaling and online operations
The world stops for no one, and successful services will be judged by their ability to grow and provide enterprise level service quality.
It will be fun to see the possibilities of devices, drones, automated equipment and the emerging services they will power as part of the Internet of Things. As that happens, mountains of data will need to be captured and applied quickly to provide the richest user experience.
We built SingleStore to give organizations behind the Internet of Things a head start. Every day, we work with development and operations teams to make sure they can ingest large amounts of sensor data with ease, make sound decisions in real-time, and manage complex, real-world data models with the familiarity of SQL. If that sounds like something that might help, give us a ring, or download SingleStore and try it for 30 days.
Read Post

Engineering
Load Files from Amazon S3 and HDFS with the SingleStore Loader
One of the most common tasks with any database is loading large amounts of data into it from an external data store. Both SingleStore and MySQL provide the LOAD DATA command for this task; this command is very powerful, but by itself, it has a number of restrictions:
It can only read from the local filesystem, so loading data from a remote store like Amazon S3 requires first downloading the files you need.
Since it can only read from a single file at a time, loading from multiple files requires multiple LOAD DATA commands. If you want to perform this work in parallel, you have to write your own scripts.
If you are loading multiple files, it’s up to you to make sure that you’ve deduplicated the files and their contents.
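To make the single-file restriction concrete, here is what a baseline LOAD DATA invocation looks like in standard MySQL-compatible syntax (the table name and file path are illustrative). Loading N files means repeating a command like this N times, or scripting it yourself:

```sql
-- One command loads exactly one local file; loading many files means
-- repeating this statement (or scripting it) once per file.
LOAD DATA INFILE '/data/events_2015_01.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```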
Why We Built the SingleStore Loader
At SingleStore, we’ve acutely felt all of these limitations. That’s why we developed SingleStore Loader, which solves all of the above problems and more. SingleStore Loader lets you load files from Amazon S3, the Hadoop Distributed File System (HDFS), and the local filesystem. You can specify all of the files you want to load with one command, and SingleStore Loader will take care of deduplicating files, parallelizing the workload, retrying files if they fail to load, and more.
Use a load command to load a set of files
Read Post