Author

Mason Hooten
Digital Marketing Associate

Data Intensity
Video: Scoring Machine Learning Models at Scale
At Strata+Hadoop World, SingleStore Software Engineer John Bowler shared two ways of building production data pipelines in SingleStore:
**1) Using Spark for general purpose computation**
**2) Through a transform defined in a SingleStore pipeline**
In the video below, John runs a live demonstration of SingleStore and Apache Spark for entity resolution and fraud detection across a dataset of a hundred thousand employees and fifty million customers. John uses SingleStore together with a Spark job and an open source entity resolution library called Duke to sort through and score combinations of customer and employee data.
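Below is a minimal sketch of what such a pairwise scoring job could look like in PySpark. Duke itself is a Java library, so a simple string-similarity function stands in for its comparators here; the connection details, table names, and column names are assumptions for illustration only.

```python
# Illustrative sketch only: pair customer and employee records and score each pair.
# A simple name-similarity function stands in for Duke's field comparators.
from difflib import SequenceMatcher

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("entity-resolution-sketch").getOrCreate()

def read_table(name):
    # Plain JDBC keeps the sketch generic (requires a MySQL-compatible JDBC driver);
    # the SingleStore Spark Connector offers a more efficient, parallel read path.
    return (spark.read.format("jdbc")
            .option("url", "jdbc:mysql://singlestore-host:3306/demo")  # assumed host/database
            .option("dbtable", name)
            .option("user", "app").option("password", "secret")
            .load())

employees = read_table("employees")   # ~100,000 rows in the demo
customers = read_table("customers")   # ~50,000,000 rows in the demo

@udf(returnType=DoubleType())
def name_score(a, b):
    # Stand-in similarity score; a real entity-resolution engine combines many comparators.
    return float(SequenceMatcher(None, a or "", b or "").ratio())

# Cross join generates every customer/employee combination; Spark distributes the work,
# and the filter keeps only high-scoring (suspicious) pairs.
pairs = (customers.crossJoin(employees)
         .withColumn("score", name_score(col("customer_name"), col("employee_name")))
         .filter(col("score") > 0.9))

pairs.orderBy(col("score").desc()).limit(100).show()
```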
SingleStore makes this possible by reducing network overhead through the SingleStore Spark Connector along with native geospatial capabilities. John finds the top 10 million flagged customer and employee pairs across 5 trillion possible combinations in only three minutes. Finally, John uses SingleStore Pipelines and TensorFlow to write a machine learning Python script that accurately identifies thousands of handwritten numbers after training the model in seconds.
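As a rough illustration of the Pipelines half of the demo: a SingleStore pipeline can route each incoming batch through a transform, an executable that reads records on stdin and writes transformed records to stdout. The sketch below assumes each record carries an id plus 784 comma-separated pixel values and that a Keras model has already been trained and saved; the file name, record layout, and model shape are illustrative, not details taken from the talk.

```python
#!/usr/bin/env python3
# Illustrative sketch of a SingleStore pipeline transform: read records on stdin,
# append the digit predicted by a pre-trained TensorFlow/Keras model, write to stdout.
import sys

import numpy as np
import tensorflow as tf

# Assumed: a model trained elsewhere on flattened 28x28 grayscale digits.
model = tf.keras.models.load_model("digit_model.keras")

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    record_id, pixels = fields[0], fields[1:]
    x = np.array(pixels, dtype=np.float32).reshape(1, 784) / 255.0
    digit = int(np.argmax(model.predict(x, verbose=0)))
    # Emit the original id plus the predicted digit for SingleStore to load.
    sys.stdout.write(f"{record_id},{digit}\n")
```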

Case Studies
Video: Real-Time Analytics at UBER Scale
We’ve created an updated version of this blog post with much more detail. – Editor
At Strata+Hadoop World, James Burkhart, technical lead on real-time data infrastructure at Uber, shared how Uber supports millions of analytical queries daily across real-time data with Apollo, Uber’s internal analytics querying language.
James covers architectural decisions and lessons learned from building an exactly-once ingest pipeline that captures raw events across in-memory row storage and on-disk columnar storage. He also details Uber’s custom metalanguage and query layer, which lean on partial OLAP result set caching and query canonicalization. Putting all the pieces together gives thousands of Uber employees analytical queries with sub-second p95 latency spanning hundreds of millions of recent events.
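To make the caching idea concrete, here is a toy sketch of how partial result-set caching with query canonicalization can work: a rolling time window is split into fixed buckets, closed buckets are cached under a canonical key, and only the newest, still-open bucket is recomputed. This is a simplified model of the concept, not Uber’s Apollo implementation.

```python
# Toy model of query canonicalization + partial OLAP result caching.
import time

BUCKET_SECONDS = 60
cache = {}  # (canonical query shape, bucket start) -> partial aggregate

def canonicalize(query_shape, bucket_start):
    # Strip the concrete time range from the key so equivalent queries
    # issued at different moments share cache entries per bucket.
    return (query_shape, bucket_start)

def compute_bucket(query_shape, bucket_start):
    # Placeholder for scanning the store for one bucket's worth of events.
    return {"events": 0}

def run_query(query_shape, start_ts, end_ts):
    total = 0
    bucket = start_ts - (start_ts % BUCKET_SECONDS)
    now = time.time()
    while bucket < end_ts:
        key = canonicalize(query_shape, bucket)
        closed = bucket + BUCKET_SECONDS <= now  # only closed buckets are safe to cache
        if closed and key in cache:
            partial = cache[key]
        else:
            partial = compute_bucket(query_shape, bucket)
            if closed:
                cache[key] = partial
        total += partial["events"]
        bucket += BUCKET_SECONDS
    return total
```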

Data Intensity
Video: Building the Ideal Stack for Real-Time Analytics
Building a real-time application starts with connecting the pieces of your data pipeline.
To make fast and informed decisions, organizations need to rapidly ingest application data, transform it into a digestible format, store it, and make it easily accessible. All at sub-second speed.
A typical real-time data pipeline is architected as follows:
1) Application data is ingested through a distributed messaging system to capture and publish feeds.
2) A transformation tier is called to distill information, enrich data, and deliver the right formats.
3) Data is stored in an operational (real-time) data warehouse for persistence, easy application development, and analytics.
4) From there, data can be queried with SQL to power real-time dashboards.
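As a deliberately simplified example, the sketch below wires those four stages together in Python, assuming a Kafka topic named `events`, a SingleStore table of the same name, and the kafka-python and PyMySQL client libraries; the topic, schema, and credentials are placeholders rather than a prescribed setup.

```python
# Illustrative sketch of the pipeline: ingest from Kafka, transform, store in SingleStore, query with SQL.
import json

import pymysql
from kafka import KafkaConsumer

# 1) Ingest: subscribe to the distributed messaging system.
consumer = KafkaConsumer("events", bootstrap_servers="kafka:9092",
                         value_deserializer=lambda v: json.loads(v.decode("utf-8")))

# 3) Store: SingleStore speaks the MySQL wire protocol, so a standard driver works.
conn = pymysql.connect(host="singlestore-host", user="app", password="secret",
                       database="analytics", autocommit=True)

with conn.cursor() as cur:
    for message in consumer:
        event = message.value
        # 2) Transform: distill and reshape the raw event into the target format.
        row = (event["user_id"], event["action"], event["ts"])
        cur.execute("INSERT INTO events (user_id, action, ts) VALUES (%s, %s, %s)", row)

# 4) Query: dashboards read the same table with plain SQL, e.g.
#    SELECT action, COUNT(*) FROM events
#    WHERE ts > NOW() - INTERVAL 1 MINUTE GROUP BY action;
```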
As new applications generate increased data complexity and volume, it is important to build an infrastructure for fast data analysis that enables benefits like real-time dashboards, predictive analytics, and machine learning.
At this year’s Spark Summit East, SingleStore Product Manager Steven Camina shared how to build an ideal technology stack to enable real-time analytics.