This blog recaps some of the highlights of an information-packed webinar that feels more like an entertaining talk show. “Fast Analytics in Streaming Media” stars Mark Hashimoto, software engineering manager at WhatsApp, and SingleStore’s Domenic Ravita.
Taming the Wild West: Fast Analytics in Streaming Media
If you watched a lot of Netflix and other streaming content last year during lockdown, you’re not alone. A recent study by Conviva reported that streaming volumes in Q4 2020 were up 44% over Q4 2019! Explosive growth in viewership has fueled a similar frenzy on the back end, sending content service providers scrambling to satisfy their customers’ every viewing whim.
“Let's face it, when we're looking at TV, and we're trying to find a show to watch, we really want to find something within ten to 30 seconds,” says Mark Hashimoto, software engineering manager at WhatsApp and a guest on SingleStore’s latest industry webinar, “Fast Analytics in Streaming Media.” “Multiply that by billions of people across the globe, searching for entertainment,” he said, and you can begin to see why the unmitigated data analytics challenge really is “the Wild West.” This blog is a recap of some of the highlights of the information-packed webinar, which, when you watch it, feels like an entertaining talk show—without the guilt!
Defining data intensity
Tech TV presenter Lisa Martin kept the program moving at a brisk pace, volleying conversation gambits between Mark, also a former engineer at Comcast, and SingleStore field CTO Domenic Ravita as the group rolled through the fun Wild West graphic (above) that set the structure for the webinar. In setting up the analytics challenge that streaming media providers face, Domenic gave a quick definition of data intensity.
“In physics, power applied over a given surface area is a measure of intensity. If we take that analogy and look at data intensity, it's that concept of the processing power you need to exert in a tight timeframe on lots of different data surface areas. Data intensity adds a multiplier factor: what is the size of the data, and how fast is it changing? What's the level of concurrency needed? What's the complexity of the queries? And what are the consistent response times expected? All those things, together, are what we call data intensity.”
Mark then took the concept to the next level, adding in the consumer context.
“The consumer is faced with the challenge of finding great content that's hopefully personalized to them, and will resonate with them. The first part of the discovery portion is really data intensive. It’s based on what that individual has watched or what their interests are. From the streaming media company’s perspective, the challenge is how to provide those pure magical moments in the user journey to say, ‘Here's a great show that we think will really captivate, and that you should watch.’”
He went on to explain the latest angle on personalization, as vaccinated people start to gather together again. “It's right now starting on a very personal level, like you, Lisa, or Domenic,” he said. “Let’s say all three of us want to watch a show together,” Mark theorized with Lisa and Domenic. “What are some of the shows that all three of us would like that we haven't seen before? That's really where the industry is going; the data, the infrastructure and the user experience all have to come together to really resonate with the customer.”
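The group scenario Mark describes can be pictured as a small set operation: take the shows each viewer is predicted to like, keep only the ones everyone has in common, and drop anything any of them has already watched. The sketch below is only an illustration of that idea, not how any streaming service actually does it. The function name `group_candidates`, the viewer names used as keys, and the show titles are all invented for the example.

```python
def group_candidates(liked_by_user, watched_by_user, group):
    """Return shows every group member likes and no member has watched.

    liked_by_user and watched_by_user map a viewer name to a set of titles.
    All names and titles here are hypothetical sample data.
    """
    if not group:
        return []
    # Shows that appear in every member's "likes" set.
    liked_by_all = set.intersection(*(liked_by_user[u] for u in group))
    # Shows that at least one member has already seen.
    seen_by_any = set.union(*(watched_by_user[u] for u in group))
    return sorted(liked_by_all - seen_by_any)


# Invented sample data for three viewers.
liked = {
    "mark": {"Dust Trail", "Canyon Run", "High Noon Heist"},
    "lisa": {"Canyon Run", "High Noon Heist", "Prairie Nights"},
    "domenic": {"High Noon Heist", "Canyon Run", "Silver Spur"},
}
watched = {
    "mark": {"Dust Trail"},
    "lisa": {"High Noon Heist"},
    "domenic": set(),
}

picks = group_candidates(liked, watched, ["mark", "lisa", "domenic"])
print(picks)
```

In a real service the "likes" would come from recommendation models and the sets would be far too large to intersect in application memory, which is part of why the discussion turns to data infrastructure next.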
Legacy data infrastructure is inadequate
Not surprisingly, the streaming media industry’s infrastructure isn’t well suited for data-intensive analytics. “One thing you should know about traditional infrastructure,” Mark said, “is that it’s really good for more batch processing, not for on-demand. The infrastructure has to be very malleable to handle spikes of traffic if you have a hit show that goes viral.
“The East Coast comes online at about 7:30 PM. The wave then rolls across the country, and around the world,” he explained. “Traditional infrastructure doesn't handle bursts of load very well. So, in today's infrastructure, you really need elastic computing. That's why you really want to have a modern cloud infrastructure, to provide the magical experience you want customers to have.”
Domenic quickly picked up the thread from a database perspective.
“We are, by my measure, roughly 20 years into the cloud data era. Back in the late ’90s and early 2000s, we used existing single-node databases like Oracle and MySQL to build the first-generation Yahoos and eBays of the world. The problem with that is, to get the scale needed for reads and writes, it's like retrofitting a bicycle with the engine of a car. It just doesn't fit. Patching is difficult, and you get a lot of data duplication and inefficient use of hardware.”
He went on to explain that second-era cloud data systems ushered in the speed and scale that earlier systems couldn't deliver, and in a more efficient way. However, “you had to give up something—the power of relational semantics,” Domenic said. “Basically, you gave up on letting the database do a lot of the work for you. With these second-era cloud data systems, you've got to write that logic in your code; you've got to know how to join together the data between customers, product, price and so forth. As an industry we gained speed and scale, but we gave up on SQL.”
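Domenic's point about writing join logic in your own code can be made concrete with a small comparison. The sketch below is an illustration under stated assumptions: the customer, product, and order records are invented, and Python's built-in `sqlite3` module stands in for a relational database purely so the example is runnable. It is not SingleStore's API or any system discussed in the webinar.

```python
import sqlite3

# Hypothetical sample data; every name and value is invented for illustration.
customers = {1: "Ava", 2: "Ben"}
products = {10: ("Western Classics", 4.99), 11: ("Sci-Fi Pack", 6.99)}
orders = [(1, 10), (1, 11), (2, 10)]  # (customer_id, product_id)


def join_in_app(orders, customers, products):
    """Application-side join: the code must know how the records relate."""
    rows = []
    for customer_id, product_id in orders:
        title, price = products[product_id]
        rows.append((customers[customer_id], title, price))
    return sorted(rows)


def join_in_sql(orders, customers, products):
    """Relational join: the database resolves the relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL);
        CREATE TABLE orders (customer_id INTEGER, product_id INTEGER);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers.items())
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(pid, title, price) for pid, (title, price) in products.items()],
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    rows = conn.execute("""
        SELECT c.name, p.title, p.price
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN products p ON p.id = o.product_id
        ORDER BY c.name, p.title
    """).fetchall()
    conn.close()
    return rows


app_rows = join_in_app(orders, customers, products)
sql_rows = join_in_sql(orders, customers, products)
print(app_rows == sql_rows)
```

Both functions return the same rows. The difference is where the knowledge of how customers, products, and orders relate lives: in hand-written loops, or in a declarative query the database can plan and optimize.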
Domenic then brought the narrative up to the present.
“The third era of systems that have been around the last few years—and SingleStore is among this class of new, modern data infrastructure systems—offers the speed and scale of the preceding NoSQL era, but with relational SQL. The core foundation of modern systems is a scale-out relational data tier. When you think about use cases that Mark provided—such as matching machine learning outputs, recommendations, getting analytics on that, and serving customers shows—this approach both lowers latency and improves the customer experience.”
But wait, there’s more!
Clearly, there’s a lot to tell in the story of fast analytics in streaming media; this blog post captures just the first ten minutes of the webinar! Watch the remaining 35 minutes to learn about: