Thorn’s child sex trafficking investigations tool, Spotlight, gathers information from escort sites to help law enforcement find trafficked children, fast. (A Special Agent for the Wisconsin Human Trafficking Task Force describes Spotlight this way: “It is the greatest tool we have in the fight against human trafficking.”) Using SingleStore is one of the ways they do it. SingleStore meets Thorn’s requirements, including SQL support; fast query response time; support for machine learning and AI; a large and scalable number of simultaneous users; and horizontal scale-out. SingleStore also runs just about anywhere, including on-premises installations and all the major public clouds.
Still, Thorn had a business problem. As a tech non-profit, they are highly skilled at identifying and making tradeoffs that will allow their small team to deliver the biggest impact. In a constantly shifting digital environment, they know they need to focus on keeping Spotlight agile, to help find victims faster. So they need to keep the operation and maintenance of Spotlight as simple and easy to manage as possible.
In support of this strategy, Thorn is moving to SingleStoreDB Cloud, the fully managed, on-demand, and elastic cloud database from SingleStore. Where SingleStoreDB Self-Managed 7.0 meets Thorn’s database needs, SingleStoreDB Cloud meets Thorn’s operational needs – removing work from Thorn’s development and operations personnel, and leaving it in the hands of SingleStore.
Peter Parente, data engineer at Thorn, puts it well: “We want to focus our time on building the application for our mission, rather than managing every detail of exactly how the data is going to be stored.” Now, Thorn can focus on growing Spotlight to meet the needs of its users and fulfill its mission: to build technology to defend children from sexual abuse.
What Thorn Delivers
As Thorn describes it, new technologies can be used by abusers to facilitate abuse – and, thankfully, the same new technologies can be leveraged to stop this abuse. Thorn leverages data to find trafficked children faster, building technology to create a world where every child can be safe, curious, and happy.
More than 150,000 escort ads are posted daily across the US, adding up to tens of millions of ads a year – and, somewhere in that mountain of data, children are being sold for sex. Thorn’s research shows that 63% of child sex trafficking survivors were advertised online at some point. Harnessing that data, Spotlight is offered for free to users who are actively investigating child sex trafficking cases.
When Thorn started several years ago, they only focused on a few problematic sites and online sources. Now, the number of sites with child sex trafficking content is increasing, and the user base for Spotlight has grown. Thorn is a strong example of the need that so many organizations have for nearly limitless scalability and concurrent access.
“As time passes, we have greater data complexity. More data to store and more users that need to analyze that data,” says Parente. “There are more sites, and some of the sites have added features that increase the data flowing in from them as well.”
But, even as the demands increase, so does Thorn’s effectiveness. Spotlight has helped to identify over 10,000 trafficked children; on average, eight children a day are identified with it. Thorn is also proud of having cut law enforcement investigation time by as much as 63%, or nearly two-thirds. (Thorn also educates people on these topics; more than 3.5 million teens have learned to identify and prevent sextortion – extortion focused on nude images of the victim – through Thorn projects.)
To work effectively, such a system needs to meet a number of technical requirements:
In addition, Thorn identified two business-oriented requirements, to allow them to fulfill their specific mission most effectively:
How SingleStoreDB Cloud Helps Thorn Succeed
Thorn built a tool that meets all their requirements:
SingleStoreDB Cloud sits at the heart of the system. “It’s currently our primary data store,” according to Parente.
Using Machine Learning and AI to Facilitate Identifications
Thorn uses SingleStore’s Euclidean distance function for computing image similarity, resulting in very high throughput rates for image comparisons. The process is described in detail in this blog post from SingleStore co-CEO Nikita Shamgunov: SingleStore as a Data Backbone for Machine Learning and AI.
The slide below shows the use of this function. Thorn has previously worked with SingleStore on advances in machine learning for image recognition.
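To make the idea concrete, here is a minimal Python sketch of ranking images by the Euclidean distance between their feature vectors. It is an illustration under stated assumptions, not Thorn’s implementation: the vectors, the image IDs, and the `nearest_images` helper are hypothetical, and in Spotlight the comparison runs inside the database rather than in application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_images(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored images by distance to the query; smaller means more similar."""
    ranked = sorted(
        ((image_id, euclidean_distance(query, vector))
         for image_id, vector in stored.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:limit]


# Hypothetical feature vectors; real ones would come from an image model
# and would have many more dimensions.
STORED = {
    "img-a": [0.0, 0.0, 1.0],
    "img-b": [1.0, 0.0, 0.0],
    "img-c": [0.0, 0.1, 0.9],
}

matches = nearest_images([0.0, 0.0, 1.0], STORED, limit=2)
```

Pushing this calculation into the database, as the text describes, avoids moving every stored vector to the application just to compare it against the query.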
Using Amazon SQS as a Data Pipeline
Thorn uses Amazon S3 and SQS as the input source for their data pipeline. Many other SingleStore customers have used Kafka in similar situations. (We recently published a case study featuring the Kafka-plus-SingleStore architecture from a major technology services company.) But Thorn finds Amazon SQS easier to maintain and manage.
According to Parente, “Our data is not currently delivered to S3 in a streaming fashion. It’s more a set of micro batches. We don’t currently have a need for the streaming support that you typically see associated with Kafka.”
“We rely on SQS to provide us with the notifications we need, as data is delivered into our S3 buckets,” continues Parente. “When we receive a notification, our data pipeline runs a set of machine learning models and natural language processing annotators before storing the results in SingleStore for use by our application.”
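The notification step Parente describes might look roughly like the following Python sketch. It assumes that S3 event notifications are delivered directly to the SQS queue, so that each message body is the standard S3 event JSON. The function names, the bucket, and the key are hypothetical, and `annotate` and `store` are placeholders for the machine learning models and the database write, which the source does not detail.

```python
import json
from typing import Any, Callable, List, Tuple
from urllib.parse import unquote_plus


def objects_from_notification(message_body: str) -> List[Tuple[str, str]]:
    """Return the (bucket, key) pairs named in an S3 event notification body."""
    event = json.loads(message_body)
    found = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that do not describe an S3 object
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        found.append((bucket, key))
    return found


def handle_message(
    message_body: str,
    annotate: Callable[[str, str], Any],
    store: Callable[[str, str, Any], None],
) -> None:
    """Annotate each newly delivered object, then hand the result to storage."""
    for bucket, key in objects_from_notification(message_body):
        store(bucket, key, annotate(bucket, key))


# Hypothetical message body, shaped like an S3 "ObjectCreated" notification.
SAMPLE_BODY = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "batches/ads+batch+1.json"},
        },
    }]
})

delivered = objects_from_notification(SAMPLE_BODY)
```

In a real worker, the message body would come from polling the queue with an AWS SDK, and the message would be deleted only after the results are stored.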
SingleStoreDB Cloud Helps Thorn Achieve Statelessness
Why has Thorn chosen SingleStoreDB Cloud, rather than SingleStoreDB Self-Managed, which they could install and run on AWS themselves? The main reason is to focus their technical resources on other areas. Every hour saved in database administration is an hour freed up for work that will speed up an investigator’s process, providing timely insights and aggregating information across time and space to find child victims faster.
The features of SingleStoreDB Cloud lend themselves to Thorn’s needs. Thorn has designed their system in such a way as to offload software maintenance and management to the greatest degree possible, using Kubernetes as their management tool for most of the system, and SingleStoreDB Cloud – which is built on Kubernetes, and managed using it – as their core database.
Kubernetes was originally developed for stateless services, and Thorn built their data pipeline (above) to be as stateless as possible. Parente says, “The pipeline workers are all stateless. If we fail processing some input data, the pipeline simply retries the input from S3 at some point in the future. Our processing is idempotent.”
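One common way to get the retry-safe behavior Parente describes is to derive each record’s ID from its input location and write results as an upsert. The Python sketch below shows that pattern under stated assumptions: the `process_input` helper, the in-memory `table` standing in for the database, and the flaky annotator are all hypothetical, and the source does not say how Thorn achieves idempotency.

```python
import hashlib
from typing import Any, Callable, Dict


def record_id(bucket: str, key: str) -> str:
    """Derive a stable ID from the input location, so a retry reuses the same row."""
    return hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()


def process_input(
    bucket: str,
    key: str,
    annotate: Callable[[str, str], Any],
    table: Dict[str, Any],
) -> bool:
    """Process one input and upsert the result; return True on success."""
    try:
        result = annotate(bucket, key)
    except Exception:
        # The worker keeps no local state: the input is still in S3,
        # so a later retry can simply start again from scratch.
        return False
    table[record_id(bucket, key)] = result  # upsert: overwrite, never append
    return True


# Hypothetical annotator that fails once, then succeeds.
calls = {"count": 0}


def flaky_annotate(bucket: str, key: str) -> Dict[str, Any]:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return {"source": f"s3://{bucket}/{key}", "labels": ["example"]}


table: Dict[str, Any] = {}
first = process_input("example-bucket", "batch-1.json", flaky_annotate, table)
second = process_input("example-bucket", "batch-1.json", flaky_annotate, table)
third = process_input("example-bucket", "batch-1.json", flaky_annotate, table)
```

Because the ID depends only on the input, the failed first attempt leaves nothing behind, and the duplicate third attempt overwrites the same row instead of creating a second one.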
More recently, Kubernetes has added features for managing stateful software. To make these features work, stateful software such as SingleStore (or any database) requires a Kubernetes Operator, which serves as an interface between the database and Kubernetes. SingleStore has created a Kubernetes Operator and uses it for managing SingleStoreDB Cloud. SingleStore customers are also using this Operator in their own development efforts.
Thorn could have used the SingleStore Operator to integrate SingleStoreDB Self-Managed into their Kubernetes management framework. Instead, they chose SingleStoreDB Cloud. “So in some way,” Parente continues, “the indirect answer to the question, ‘Are we depending on the stateful features of Kubernetes?,’ is ‘Yes – but indirectly, through SingleStoreDB Cloud.’” Thorn maintains their stateless management framework by leaving the management of stateful software – their SingleStore database – to SingleStore, the company, through SingleStoreDB Cloud.
“One of the reasons we’re using SingleStoreDB Cloud is to offload having to manage that stateful data store,” continues Parente. “If we weren’t using SingleStoreDB Cloud, and instead hosting our own database, we would be responsible for scaling it on Kubernetes, making sure data is retained after nodes restart, repartitioning data to take advantage of new nodes, and so on.”
Thorn defers to other industry-leading experts for its other data store. “For S3, Amazon is managing the complexity,” says Parente. “The files are written, and then we assume that S3 works as advertised.”
The same questions arise for both technologies: “Are we sure it’s backed up? Is it going to scale? We want to offload that onto other vendors, including AWS and SingleStore. That’s time better spent for our mission-oriented work. We focus more on how we build out our system, or surface the processed information to our users in the best available fashion.”
This approach allows Thorn to work more closely with their users, improve the system to meet user needs, and deliver data in the formats and with the timeliness users need to prioritize the identification of child sex trafficking victims.
As Julie Cordua, CEO of Thorn, has said: “SingleStore is delivering a real impact for our organization by making real-time decisions and predictive analytics easier. And, because it easily scales to support our machine learning and AI needs, SingleStore helps us continually build better tools to find victims of trafficking and sexual abuse, faster. It is a true case of technology being applied in a way that will make a real difference in people’s lives.”