Faster Data Warehouse Track at Gartner Data and Analytics Summit
Trending

Faster Data Warehouse Track at Gartner Data and Analytics Summit

Gartner Data and Analytics Summit kicks off next week in Grapevine, TX where SingleStore CTO Nikita Shamgunov will showcase the power of real-time for Chief Data Officers who are faced with transitioning their data warehouse applications to the cloud.
Read Post
Guest Post: Real-Time Big Data Ingestion with Meterial
Case Studies

Guest Post: Real-Time Big Data Ingestion with Meterial

This post originally appeared on the Myntra Engineering Blog. Learn how Myntra gained real-time insights on rapidly growing data using their new processing and reporting framework.

Background

I got an opportunity to work extensively with big data and analytics at Myntra. Data Driven Intelligence is one of the core values at Myntra, so crunching and processing data and reporting meaningful insights for the company is of utmost importance. Every day, millions of users visit Myntra on our app or website, generating billions of clickstream events. This makes it very important for the data platform team to scale to such a huge number of incoming events, ingest them in real time with minimal or no loss, and process the unstructured/semi-structured data to generate insights. We use a varied set of technologies and in-house products to achieve this, including Go, Kafka, Secor, Spark, Scala, Java, S3, Presto, and Redshift.
Read Post

SingleStore at Spark Summit East 2017
Trending

SingleStore at Spark Summit East 2017

Last week we announced the release of the SingleStore Spark 2 Connector with support for both Apache Spark 2.0 and 2.1. At Spark Summit East 2017 in Boston next week, we will showcase our new connector that operationalizes powerful advanced analytics.

February 7-9
John B. Hynes Convention Center
900 Boylston Street, Boston, MA 02115
https://spark-summit.org/east-2017/

SingleStore CTO and co-founder Nikita Shamgunov and product manager Steven Camiña will also deliver the following talks at the conference.

Spark Summit East 2017 SingleStore Speaker Sessions
Read Post
SingleStore Meetups 2016: Year in Review
Product

SingleStore Meetups 2016: Year in Review

We love hosting meetups at SingleStore headquarters. Meetups give us a hands-on way to share what we have been working on, connect with the community, and get in-person feedback from people like you. Last year we hosted meetups spanning topics from building real-time data pipelines to powering geospatial dashboards. We wanted to share a few highlights from our most memorable meetups in 2016.

SingleStore Meetups 2016: Year in Review

January 21, 2016 – Building Real-Time Digital Insight at Macys.com
Featuring Chandan Joarder, Principal Engineer of Macys.com
Shared lessons learned building a real-time insight application
Macy’s innovative two-pronged approach to data exploration and visualization incorporates packaged tools like Tableau and custom BI dashboards
Read Post
SingleStore and Oracle: Better Together
Trending

SingleStore and Oracle: Better Together

Oracle OpenWorld 2016 kicks off on September 18th in San Francisco with ten tracks, including a Data Center track highlighting innovation in databases, including SingleStore and Oracle. We built SingleStore to be a flexible ecosystem technology, as exemplified by several features. First, we offer users flexible deployments – whether it’s hybrid cloud, on-premises, VMs, or containers. Second, our connector tools, such as the SingleStore Spark Connector and Streamliner, are open source and let you build real-time pipelines and import from popular datastores like HDFS, S3, and MySQL. Third, SingleStore is a memory-first engine, designed for concurrent data ingest and analytics. These ingredients make SingleStore a perfect real-time addition to any stack.

Several of our customers combine SingleStore with traditional systems, in particular Oracle databases. SingleStore and Oracle can be deployed side by side to enhance scalability, distributed processing, and real-time analytics.

Three Ways SingleStore Complements Oracle

SingleStore as the Real-Time Analytics Engine
Data can be copied from Oracle to SingleStore using a change data capture tool, and analytical queries can be performed in real time.
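The copy-then-query pattern described here can be sketched in a few lines of Python. This is purely illustrative: the table shape, column names, and polling approach are assumptions for the sketch, not the API of any specific capture tool.

```python
# Illustrative change-data-capture sketch (not a real connector): rows past
# the last-seen id are copied from the operational store to the analytics
# store, which then serves real-time analytical queries.

source = [
    {"id": 1, "amount": 40},
    {"id": 2, "amount": 25},
]
replica = []
high_water_mark = 0   # id of the last row already replicated

def poll_changes():
    """Copy any rows newer than the high-water mark to the replica."""
    global high_water_mark
    new_rows = [r for r in source if r["id"] > high_water_mark]
    replica.extend(new_rows)
    if new_rows:
        high_water_mark = max(r["id"] for r in new_rows)

poll_changes()
source.append({"id": 3, "amount": 10})   # a new transaction lands in Oracle
poll_changes()

# The analytical query runs against the replica, off the operational system.
total = sum(r["amount"] for r in replica)
assert total == 75
```

The high-water mark keeps each poll incremental, which is the property that makes continuous replication cheap enough to run in near real time.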
Read Post
Seven Talks You Can’t Miss at Gartner Catalyst 2016
Trending

Seven Talks You Can’t Miss at Gartner Catalyst 2016

The 2016 Gartner Catalyst Conference kicks off August 15-18 in San Diego and will feature over 150 sessions for technical professionals. The conference offers eight in-depth tracks on topics including data centers, data and analytics, security, software, mobility, cloud, digital workplaces, and the Internet of Things.

Book an in-person SingleStore demo at Gartner Catalyst ⇒ singlestore.com

This year, Gartner has chosen the following theme to guide the experience at Catalyst: Architecting the On-Demand Digital Business. Each track at the show reinforces the importance of embracing modern architecture in order to sense, adapt, and scale businesses for long-lasting impact. There will be valuable opportunities to hear directly from leading analysts, who have spent months researching and analyzing industry trends. Here are six sessions and a SingleStore speaking session that we recommend to stay on top of real-time data trends:

From Data to Insight to Action: Building a Modern End-to-End Data Architecture
Monday, 15 August 2016, 9:30 AM – 10:15 AM
Carlie J. Idoine, Research Director, Gartner @CarlieIdoine

For years, IT organizations have been dealing with a steady rise in the volume, velocity, and variety of data. But now, unprecedented new data sources, such as IoT, are pushing infrastructures to the limit. This session defines a bold strategy and highly scalable data management architecture built upon technologies such as cloud computing, predictive analytics, and machine learning that scale, respond automatically, and unlock enormous business value.
Read Post
Should You Use a Rowstore or a Columnstore?
Engineering

Should You Use a Rowstore or a Columnstore?

The terms rowstore and columnstore have become household names for database users. The general consensus is that rowstores are superior for online transaction processing (OLTP) workloads and columnstores are superior for online analytical processing (OLAP) workloads. This is close but not quite right — we’ll dig into why in this article and provide a more fundamental way to reason about when to use each type of storage.

Background

One of the nice things about SQL-based databases is the separation of logical and physical concepts. You can express logical decisions as schemas (for example, the use of 3NF) and SQL code (for example, to implement business logic), and for the most part avoid thinking about physical implementation details or runtime considerations. Then, based on what your workload is trying to do, you can make a series of physical decisions which optimize performance and cost for your workload. These physical decisions include where to put indexes, what kind of indexes to use, how to distribute data, how to tune the database, and even which database product to use (if you use ANSI SQL). Importantly, making physical decisions does not require changing SQL code.

Until recently, indexes were almost always backed by B-trees and occasionally hash tables. This started to change when Sybase IQ, and then more popularly C-Store/Vertica, hit the market and provided incredible cost savings and performance for data-warehouse workloads with columnstore indexes. Columnstores have since hit the mainstream: they are the primary storage mechanism in modern data-warehouse technology (e.g. Redshift, Vertica, HANA) and are present in mainstream databases (Oracle, SQL Server, DB2, SingleStore). Nowadays, one of the key physical decisions for a database workload is whether to use a rowstore or columnstore index.

Performance Considerations

Let us frame the discussion about when to use a rowstore or columnstore by boiling down the fundamental difference in performance.
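The layout difference underneath that tradeoff can be sketched in plain Python (an illustrative model, not SingleStore code): the same logical table is stored once as a list of records and once as a dict of per-attribute lists, and a single-column aggregate only has to touch one list in the columnar layout.

```python
# Illustrative sketch: the same logical table stored row-wise and
# column-wise, showing why a single-column scan favors columns.

rows = [(i, f"user{i}", i % 100) for i in range(10_000)]  # (id, name, amount)

# Row-oriented layout: one record per entry; a scan walks every record
# even though it only needs one field.
def total_row_layout(table):
    return sum(record[2] for record in table)

# Column-oriented layout: one list per attribute; a scan reads only the
# "amount" list sequentially and never touches ids or names.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_column_layout(table):
    return sum(table["amount"])

assert total_row_layout(rows) == total_column_layout(columns)
```

Real columnstores add compression and vectorized execution on top of this, but the locality argument is the same: a scan-heavy query reads only the bytes it needs, in order.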
It’s actually quite simple:

Rowstores are better at random reads and random writes.
Columnstores are better at sequential reads and sequential writes.

Feeling déjà vu? This is a fairly familiar concept in computer science, and it’s pretty similar to the standard tradeoff between RAM and disk. This reasoning also dispels several myths around rowstores and columnstores:

Is a RAM-based rowstore faster than a disk-based columnstore? Not necessarily — if the workload has sequential reads (e.g. an analytical workload with lots of scans), a columnstore can be significantly faster.

Are writes slow in a columnstore? Not necessarily — if the writes are mostly ordered and you don’t need to run updates, then a columnstore can be as fast as, or even faster, to write into than a rowstore, even for relatively small batches.

Are columnstores bad at concurrent writes? It depends on the type of disk. Both rotational and solid-state disks are good at sequential writes, but solid-state disks tend to be significantly faster to write into concurrently; therefore, columnstores running on SSDs can be very fast at concurrent writes.

Rowstores for Analytics, Columnstores for Transactions

Let’s look at a few use cases which violate the common belief that rowstores are superior for transactions and columnstores are superior for analytics. These are based on workloads that we’ve seen at SingleStore, but these observations are not specific to SingleStore. In analytical workloads, a common design choice is whether to append (aka log-oriented insertion) or update/upsert. For operational analytics, the upsert pattern is especially common because by collapsing overlapping rows together with an update, you partially aggregate the result set as you write each row, making reads significantly faster. These workloads tend to require single-row or small-batch random writes, so a rowstore is a significantly better choice, as columnstores can’t handle this pattern of writes at any reasonable volume.
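The upsert pattern just described can be sketched as follows; the event schema and key names are hypothetical, chosen only to show why collapsing rows on write leaves reads pre-aggregated.

```python
# Sketch of the upsert pattern: overlapping rows are collapsed at write
# time, so the stored table is already a partial aggregate.

def apply_event(store, key, clicks):
    """Upsert: update the row if the key exists, insert otherwise.
    Each call is a single-row random write, which favors a rowstore."""
    if key in store:
        store[key]["clicks"] += clicks
    else:
        store[key] = {"clicks": clicks}

store = {}
events = [("ad1", 3), ("ad2", 1), ("ad1", 2)]   # hypothetical click events
for key, clicks in events:
    apply_event(store, key, clicks)

# A read needs no GROUP BY over raw events; the aggregate already exists.
assert store["ad1"]["clicks"] == 5
```

An append-only design would instead store all three raw events and aggregate at query time; the upsert trades random-write pressure at ingest for much cheaper reads.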
As an aside, the read pattern is often still scan-oriented, so if the data were magically in a columnstore, reads could be a lot faster. It still makes sense to use a rowstore, however, because of the overwhelming difference in write performance.

Another example of rowstores in analytical workloads is as dimension tables in a traditional star schema analytics workload. Dimension tables often end up on the inside of a join and are seeked into while scanning an outer fact table. We’ve seen a number of customer workloads where we beat columnstore-only database systems simply because SingleStore can back dimension tables with a rowstore and benefit from very fast seeks (SingleStore even lets you use a lock-free hash table as a type of rowstore index, so you have a pre-built hash join). In this case, rowstores are the superior choice because dimension tables need to be randomly, not sequentially, read.

Finally, columnstores can be used for transactional workloads as well, in particular workloads that are computationally analytic but have operational constraints. A common use case in ad-tech is to leverage a dataset of users and groups (which users belong to) to compute overlap on demand, i.e. the number of users who are in both Group A and Group B. Doing so requires scanning every row for all users in both Group A and Group B, which can be millions of rows. This computation is significantly faster in a columnstore than a rowstore because the cost of executing the query is dominated by the sequential scan of user ids. Furthermore, sorting by group id makes it easy not only to find the matching user ids but also to scan them with high locality (since all user ids in a group end up stored together). With SingleStore, we were able to get this query to consistently return within 100 ms over 500 billion rows stored in the columnstore. When your workload is operational and fundamentally boils down to a sequential scan, then it can run significantly better in a columnstore.
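The overlap query above can be sketched in miniature. The membership data is hypothetical; the point is that sorting by group id clusters each group’s user ids, so reading one group is a contiguous slice rather than scattered random reads.

```python
import bisect

# Hypothetical membership table sorted by group_id: all user ids for a
# group are stored together, so reading a group is one sequential slice.
membership = sorted([
    ("A", 1), ("A", 2), ("A", 3),
    ("B", 2), ("B", 3), ("B", 4),
    ("C", 5),
])

def users_in(group):
    """Locate the group's contiguous run with binary search, then scan it."""
    lo = bisect.bisect_left(membership, (group, float("-inf")))
    hi = bisect.bisect_right(membership, (group, float("inf")))
    return {user for _, user in membership[lo:hi]}

# Overlap = users present in both Group A and Group B.
overlap = users_in("A") & users_in("B")
assert overlap == {2, 3}
```

At real scale the same idea holds: the query cost is dominated by two sequential scans over sorted, compressed user-id columns, which is exactly the access pattern columnstores are built for.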
As a side benefit, columnstores offer exceptional data compression, so this workload has a relatively small hardware footprint of fewer than ten nodes on Amazon and fits on SSDs. Because these workloads blur the traditional definitions of OLTP and OLAP, they are often referred to as Hybrid Transactional/Analytical Processing (HTAP) workloads.

Conclusions and Caveats

The main takeaway is pretty straightforward: rowstores excel at random reads/writes and columnstores excel at sequential reads/writes. You can look at almost any workload and determine which bucket it falls into. Of course, there are still a few caveats:

If you have both random and sequential operations, it’s usually better to go with a rowstore. Rowstores tend to be better at sequential operations than columnstores are at random operations. However, if storage space is a primary design constraint, you may still want to consider using a columnstore.

Some databases, namely SingleStore and Oracle, have hybrid tables with both rowstore and columnstore functionality. In SingleStore, a small rowstore sits alongside every columnstore table and absorbs small-batch writes (e.g. singleton inserts and updates). Oracle redundantly stores rowstore data in an in-memory columnstore to transparently speed up reads that would be faster against column-oriented data (see A Case for Fractured Mirrors). Of course, this comes at the expense of write performance, since you have to pay the cost of inserting into both stores every time rather than whichever is cheapest.

Sorting in columnstores is a big deal, especially for workloads with tight performance requirements, because it enables consistent performance with low latency. Generally, commercial columnstores (Redshift, Vertica, SingleStore) support sorting but open source and SQL-on-Hadoop databases do not (Kudu is a notable exception).

Thanks to Gary Orenstein, Andy Pavlo, and Nikita Shamgunov for reviewing drafts of this post.
This is a repost of an article by Ankur Goyal, VP of Engineering, published on Medium ⇒
Read Post
The Real-Time Track at Gartner BI
Trending

The Real-Time Track at Gartner BI

SingleStore is headed to the Gartner Business Intelligence and Analytics Summit next week. We’ll be focusing on real-time analytics, with our own VP of Engineering kicking off a live demo on day one of the show. Over the following days, we’ll be tracking the hot topics in real-time analytics: the new Magic Quadrant for Data Warehouses, IoT, Spark, relational databases, in-memory computing, and machine learning.
Read Post
SingleStore Meetups - Year in Review
Trending

SingleStore Meetups - Year in Review

It has been six months since we began hosting meetups regularly at SingleStore. Our office is located in the heart of SoMa, two blocks from the Caltrain station. At the new San Francisco epicenter of tech startups, we want to meet our neighbors and see what other cool technologies are out there! What better way than over cold brews, local pizza, and deep tech talks? In honor of the first official meetup of 2016, we decided to take a look back at the meetups of 2015 and share highlights from each one. Hope to see you at 534 4th St on January 21st for an intimate conversation with Chandan Joarder on Building Real-Time Digital Insight at Macys.com!

RSVP for our next meetup: Building Real-Time Digital Insight at Macys.com

Without further ado, we present Meetups: A Year in Review.
Read Post
The Benefits of an In-Memory Database
Data Intensity

The Benefits of an In-Memory Database

Our CTO and co-founder Nikita Shamgunov recently sat down with Software Engineering Daily. In the interview, Nikita focused on the ideal use cases for an in-memory database compared to a disk-based store, and clarified how SingleStore compares to MySQL. In this post, we will dig deeper into how we define the ‘in-memory database’ and summarize its benefits.

What is SingleStore?

SingleStore is a high-performance in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL.

How do you define an ‘in-memory database’?

An in-memory database, also known as a main memory database, can simply be defined as a database management system that depends on main memory (RAM) for computer data storage. This is in contrast to traditional database systems, which employ disk-based storage engines. The term ‘in-memory’ is popular now, but it does not tell the whole story of SingleStore. Our preferred description is ‘memory first’, which means RAM is used as a first-class storage layer – you can read and write directly to and from memory without touching the disk. This is opposed to ‘memory only’, which does not incorporate disk as a complementary storage mechanism.

What type of data lends itself well to an in-memory database?

If you need ultra-fast access to your data, store it in an in-memory database. Many real-time applications in the modern world need the power of in-memory database structures. There are several critical features that set in-memory databases apart. First, all data is stored in main memory, so you will not have to wait for disk I/O in order to update or query data; data is loaded into main memory with the help of specialized indexing data structures. Second, data is always available in memory but is also persisted to disk with logs and database snapshots. Finally, the ability to read and write data so quickly in an in-memory database enables mixed transactional/analytical and read/write workloads.
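The memory-first-but-durable model can be sketched in a few lines of Python. This is an illustrative toy, not SingleStore’s actual persistence format: every write lands in RAM and is appended to a log, and recovery replays the log to rebuild the in-memory state.

```python
import json
import os
import tempfile

# Toy memory-first store: reads and writes hit a RAM dict directly, while
# an append-only log on disk provides durability across restarts.

class MemoryFirstStore:
    def __init__(self, log_path):
        self.data = {}              # primary copy lives in RAM
        self.log_path = log_path

    def put(self, key, value):
        self.data[key] = value      # serve reads/writes from memory
        with open(self.log_path, "a") as log:   # durability: append to log
            log.write(json.dumps({"k": key, "v": value}) + "\n")

    @classmethod
    def recover(cls, log_path):
        """Rebuild in-memory state by replaying the log in order."""
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    store.data[entry["k"]] = entry["v"]
        return store

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
s = MemoryFirstStore(log_path)
s.put("a", 1)
s.put("a", 2)   # later write wins on replay, too

recovered = MemoryFirstStore.recover(log_path)
assert recovered.data == {"a": 2}
```

Real systems add periodic snapshots so recovery replays only the log tail rather than the full history, but the division of labor is the same: RAM serves queries, disk guarantees durability.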
As you can see, in-memory databases provide important advantages. They have the potential to save you a significant amount of time and money in the long run. For more information, listen to Nikita’s complete podcast: http://traffic.libsyn.com/sedaily/memsql_nikita_2.mp3 Test the magic of in-memory for yourself, by downloading a 30-day Enterprise Edition trial or free forever Community Edition of SingleStore at singlestore.com/free.
Read Post
What’s Hot and What’s Not in Working With Data
Trending

What’s Hot and What’s Not in Working With Data

Data is the foundation of many global businesses. Data fuels daily operations, from customer transactions to analytics, from operations to communications. So we decided to answer the question: what’s hot and what’s not in working with data today?

HOT: Letting your database be a database

Databases were constructed to store data. However, sometimes applications end up storing data themselves, a result of legacy database limitations. Storing data in an application makes it hard to update that application or to extract value from that data easily. By using a database for its intended function, developers can easily make changes to an application without affecting the data. This can save time and money in the long run.

NOT: Adding SQL to your NoSQL database, but not calling it SQL

SQL, or Structured Query Language, is the lingua franca for working with data and therefore a convenient tool for managing or analyzing data in a relational database. As a result, SQL is experiencing a renaissance. Many NoSQL databases now realize the value of SQL and SQL-like features such as JOINs. They are making hasty attempts to integrate SQL into their offerings without acknowledging the gaps.

HOT: Giving a dash of structure to your data

Rather than spending your days wrangling unstructured data, providing some structure to your data upfront improves your ability to put that data to use down the road. Time is of the essence when it comes to most applications, and a little structure goes a long way toward enabling real-time applications. Real-time stream processing frameworks like Apache Spark make it possible to add structure to data on the fly, so it is ready to be queried as soon as it lands in the database.

HOT: Putting your data in RAM

If data is made easily accessible, data locality will increase. Hoping your dataset fits in RAM is not a strategy – a strategic decision to ensure data is in RAM improves the efficiency of applications that sit on top of a database.

NOT: Calling your database representative for scaling support

Instead of calling your traditional database representative for scaling support, just add nodes with more flexible databases to achieve scale-out. Adding nodes increases the speed of data processing. For example, with SingleStore you can add nodes while the cluster remains online.

HOT/NOT: Knowing what is/knowing what was

People are interested in staying up to date with the latest data processing techniques. Knowing what works for the present reality is more important than sticking with trends of the past. Real-time analytics will pave the way forward for business and ensure data does not remain trapped in dark corners. If you work with databases or data, understanding the hot topics of the present will save you from having to do battle with your data as you build, scale, and innovate for your companies and yourself.
Read Post
Essential Resources for Apache Spark
Case Studies

Essential Resources for Apache Spark

There’s no doubt about it: Apache Spark is well on its way to becoming a ubiquitous technology. Over the past year, we’ve created resources to help our users understand the real-world use cases for Spark as well as showcase how our technologies complement one another. Now, we’ve organized and consolidated those materials into this very post.

Videos

Pinterest Measures Real-Time User Engagement with Spark
Demo of a real-time data pipeline processing and analyzing re-pins across the United States.
Read Post
Join SingleStore at ad:tech San Francisco
Trending

Join SingleStore at ad:tech San Francisco

Read Post
Real-Time Stream Processing Architecture with Hadoop and SingleStore
Data Intensity

Real-Time Stream Processing Architecture with Hadoop and SingleStore

While SingleStore and Hadoop are both data stores, they fill different roles in the data processing and analytics stack. The Hadoop Distributed File System (HDFS) enables businesses to store large volumes of immutable data, but by design it is used almost exclusively for batch processing. Moreover, newer execution frameworks that are faster and storage agnostic are challenging MapReduce as businesses’ batch processing interface of choice.

Lambda Architecture

A number of SingleStore customers have implemented systems using the Lambda Architecture (LA). LA is a common design pattern for stream-based workloads where the hot, recent data requires fast updates and analytics, while long-term history is maintained on cheaper storage. Using SingleStore as the real-time path and HDFS as the historical path has been a winning combination for many companies. SingleStore serves as a real-time analytics serving layer, ingesting and processing millions of streaming data points a second, and gives analysts immediate access to operational data via SQL. Long-term analytics and longer-running, batch-oriented workflows are pushed to Hadoop.

Use Case: Real-Time Analytics at Comcast

As an example, SingleStore customer Comcast focuses on real-time operational analytics. By using SingleStore and Hadoop together, Comcast can proactively diagnose potential issues from real-time intelligence and deliver the best possible video experience. Their Lambda architecture writes one copy of data to a SingleStore instance and another to Hadoop.
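The dual-write pattern behind this architecture can be sketched in a few lines of Python (an illustrative model; the window size and names are made up): each event goes both to a bounded hot store that serves real-time queries and to an append-only history that feeds batch jobs.

```python
from collections import deque

# Lambda-style dual write, in miniature: a bounded speed layer for recent
# data plus an immutable batch layer holding the full history.

HOT_WINDOW = 3   # how many recent events the speed layer retains

hot_store = deque(maxlen=HOT_WINDOW)   # real-time path (SingleStore's role)
history = []                           # historical path (HDFS's role)

def ingest(event):
    hot_store.append(event)   # oldest events fall out of the hot window
    history.append(event)     # nothing is ever dropped from history

for event in range(5):
    ingest(event)

assert list(hot_store) == [2, 3, 4]   # recent data, ready for fast queries
assert history == [0, 1, 2, 3, 4]     # complete record for batch workflows
```

Real deployments replace the deque with a database enforcing a retention policy and the list with immutable files on HDFS, but the contract is identical: the speed layer answers "what is happening now" while the batch layer preserves everything for long-running analysis.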
Read Post