How can big data and machine learning be used for good?
Updated Dec. 12, 2019, including adding a transcript. This blog post discusses the use of SingleStore as a database for machine learning and artificial intelligence. See our recent Thorn case study for the latest on SingleStore’s work with Thorn, including Thorn’s use of SingleStore Managed Service.
In our keynote at Strata+Hadoop World, SingleStore CEO Eric Frenkiel shared how we are working with Thorn to provide a new approach to machine learning and real-time image recognition to combat child exploitation.
About Thorn
Thorn partners with technology companies and government organizations to combat predatory behavior, rescue victims, and protect vulnerable children.
Thorn has to sift through a massive amount of images daily. Images are processed using facial recognition, then classified and de-duplicated, and ultimately matched against millions of open web images. If an image match can be found faster, victims of trafficking and sexual exploitation can be helped faster.
With SingleStore, Thorn is able to accelerate its image recognition workflow. New vectors representing a face can be inserted and queried in real time. This allows analysts to find image matches faster and improves law enforcement response times.
For more detail on our work with Thorn, see our recent Thorn case study. You can also watch our recorded keynote and download the slides below. In addition, we have a Q&A blog post from the engineers behind the application.




Lightly edited transcript of the video above, which describes the use of SingleStore as a database for AI and machine learning, at Thorn in particular. Eric Frenkiel speaking…
Well, good morning. I’m very excited to be here to be talking about machines and the magic of fast learning. Let me go ahead and kick off the slide show in a moment, and we can talk about what we’re going to be focusing on today: using a large corpus of data, and really referencing applications as actors and data scientists as operators.
This is now possible because data scientists are actually operationalizing data models and creating a synergistic feedback loop with machine learning that lets applications move faster, lets data scientists get more precise data, and makes a lot of great things happen in a business. What we’ve seen is that there is a greater precision possible with this type of positive feedback loop. In addition, businesses can now tune what they’re doing in real time. And lastly, to get ahead of the game, it’s all about discovery, and the ability to use these same machine learning models to look ahead at what will happen next.
But I’m here this morning to talk about a very important issue for us: what can be done to help nonprofits working in big data. And in particular, we’re working with Thorn, which is a nonprofit dedicated to eradicating the exploitation of children on the internet today. I’m regretful to share some of these numbers, because they’re downright shocking and very, very serious. In the same way that we’ve seen big data growth, there’s been a more than 5,000% increase in images of sexually exploited children online since 2007. And the fact is, nearly half of all victims meet their perpetrators online first.
Thorn has a very, very important mission. It’s about creating big data capability that lets Thorn move faster and more effectively to find these children and return them to their families. Thorn has to sift through a vast amount of data daily. More than 100,000 ads are posted on the public web alone, each day, in the United States.
When you look at how this has to happen, they’re using machine learning, with facial recognition, to provide an ability to understand what is going on in any given image that is posted onto the public web.
This happens by taking a raster-to-vector conversion, creating a point map of a given face. Using more than 5,000 data points for a given face, they can deduplicate images, then they can classify a given image and correlate and match it. All to help law enforcement find that child and return them to their family.
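As an illustration of the deduplicate-and-match step described above, here is a minimal sketch, not Thorn's or SingleStore's actual code. It assumes each face has already been converted to a numeric vector, and it uses cosine similarity (a dot product of unit-length vectors) as the matching score. The function names, the 0.999 deduplication threshold, and the tiny 3-element vectors are all hypothetical; the talk mentions more than 5,000 data points per face.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def deduplicate(unit_vectors, threshold=0.999):
    """Keep the first vector of every group of near-identical unit vectors.

    The threshold is a hypothetical value chosen for this example.
    """
    kept = []
    for v in unit_vectors:
        if all(dot(v, k) < threshold for k in kept):
            kept.append(v)
    return kept


def best_match(probe, corpus):
    """Return (index, similarity) of the corpus vector closest to the probe."""
    scores = [dot(probe, c) for c in corpus]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Toy data: the second vector points the same way as the first, so it is a duplicate.
faces = [normalize(v) for v in ([1, 0, 0], [2, 0, 0], [0, 1, 0], [0.6, 0.8, 0])]
unique = deduplicate(faces)
index, score = best_match(normalize([0.1, 0.9, 0]), unique)
print(len(unique), index, round(score, 3))
```

The brute-force scan in `best_match` computes one dot product per stored image, which is why the speed of that single operation matters so much at a scale of millions of images.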
This is literally a needle in a haystack type of problem. When you convert this, you are literally sifting through multiple millions of images to identify this victim. And they have to do this every single time to make sure that they don’t miss any given child.
SingleStore worked with Thorn to add a new capability that lets them actually process this with a thousand-fold improvement in speed. By adding a vector dot product operation into the database, and going directly to Intel’s AVX2 SIMD instruction set, you can now fully saturate a processor’s pipeline and effectively do more floating point operations in a given cycle to make this operation faster. And in terms of what this means, this is the ability to take what would be a positive match from 20 minutes down to 200 milliseconds. (A 100-fold improvement – Ed.)
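To make the SIMD idea concrete, the sketch below models how a dot product can be computed several elements at a time. It is only a model of the access pattern: Python does not execute AVX2 instructions, and this is not SingleStore's implementation. It assumes 32-bit floats, in which case a 256-bit AVX2 register holds eight values, so a 5,000-element vector needs 625 eight-wide steps instead of 5,000 scalar ones. The names `dot_scalar` and `dot_lanes` are hypothetical.

```python
LANES = 8  # a 256-bit register holds eight 32-bit floats (assumed element type)


def dot_scalar(a, b):
    """Reference version: one multiply and one add per element."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_lanes(a, b, lanes=LANES):
    """Model of a SIMD dot product that keeps `lanes` partial sums in flight."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * lanes
    full = len(a) - len(a) % lanes  # largest multiple of `lanes` that fits
    for start in range(0, full, lanes):
        # On real hardware this inner loop is a single vector multiply and add.
        for lane in range(lanes):
            acc[lane] += a[start + lane] * b[start + lane]
    total = sum(acc)  # horizontal reduction of the partial sums
    for i in range(full, len(a)):  # scalar tail for leftover elements
        total += a[i] * b[i]
    return total


a = [float(i) for i in range(20)]
b = [1.0] * 20
print(dot_scalar(a, b), dot_lanes(a, b))
```

Both functions return the same value here; with general floating point inputs the lane-wise version may differ in the last bits because it adds the terms in a different order.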
So when we talk about real-time data, it’s important to remember that it can have very, very important real world impact. And as a matter of fact, we have members of the Thorn team in the audience today, and if you would join me in thanking them for what they’ve done. In 2016 alone, they found more than 2,000 children. So, if we can give these guys a round of applause.
It’s incredibly important to be able to help with big data challenges, both in commerce and in nonprofits. And, for me, this is one of the most important things I’ve been able to say: how amazing it is to help Thorn achieve this mission.
But let’s take a step back and learn how we can apply this in other areas. There are so many different datasets that are purely image-based, for which you can leverage machine learning to gain insights, whether it’s mapping, social imagery, or even forms that have handwriting. The ability to actually apply machine learning at scale accelerates a business’s ability to operate.
We’ve seen a number of our end users engaged with TensorFlow to provide a lot of this machine learning, and it is a phenomenal framework. You can use Hadoop as a phenomenal data store, to store lots and lots of this blob data, this image data.
But the real question becomes, how can we actually accelerate that? And it actually turns into a Lambda architecture type of use case, where you use a message queue like Kafka to fork the data. Data can flow to the permanent data lake, Hadoop, and also to a real-time, in-memory processing system with TensorFlow, in a data store like SingleStore, which has high speed vector algebra built-in. This is then able to be rendered out to that given model or application.
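The fork described above can be sketched with in-memory stand-ins. This is an illustrative model only: it does not use a real Kafka client, Hadoop, or SingleStore, and every name in it (`Topic`, `run_pipeline`, the consumer group names, the `image_id` and `vector` fields) is hypothetical. The one behavior it borrows from Kafka is that each consumer group reads the full stream independently, which is what lets the same data reach both the data lake and the real-time store.

```python
class Topic:
    """Stand-in for a message-queue topic with independent consumer groups."""

    def __init__(self):
        self._log = []      # append-only list of published messages
        self._offsets = {}  # next unread position for each consumer group

    def publish(self, message):
        self._log.append(message)

    def poll(self, group):
        """Return the messages this group has not seen yet and advance its offset."""
        start = self._offsets.get(group, 0)
        self._offsets[group] = len(self._log)
        return self._log[start:]


def run_pipeline(events):
    """Fork one event stream into a batch store and a real-time store."""
    topic = Topic()
    for event in events:
        topic.publish(event)

    data_lake = []  # batch layer: append-only copy of every raw event
    realtime = {}   # speed layer: latest feature vector per image id

    for event in topic.poll("lake-writer"):
        data_lake.append(event)
    for event in topic.poll("realtime-writer"):
        realtime[event["image_id"]] = event["vector"]
    return data_lake, realtime


events = [
    {"image_id": "a", "vector": [0.1, 0.9]},
    {"image_id": "b", "vector": [0.7, 0.3]},
    {"image_id": "a", "vector": [0.2, 0.8]},  # newer vector for the same image
]
lake, store = run_pipeline(events)
print(len(lake), sorted(store))
```

The batch side keeps every event for later reprocessing, while the real-time side keeps only what a query needs right now, here the most recent vector per image.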
Later today we actually have a few customers talking about how they’re using real time analytics at scale. Both Uber for business intelligence, and how Macy’s is actually using SingleStore for real-time dashboarding with click-stream analysis. You can also join us later today to pick up a book, which goes into detail about this type of learning, called The Path to Predictive Analytics and Machine Learning, published by O’Reilly.
You too can take advantage of the ease of use, ease of management, and reliability of SingleStore Managed Service. Use SingleStore for free or contact SingleStore today.