Over the past year, the pandemic has drastically altered the educational landscape. 55 million K-12 students in the United States suddenly found themselves in online or hybrid learning environments. Educators needed new ways of managing their classrooms, keeping students engaged, and tracking their performance.
Some school districts took an ad hoc approach to the software needed to support digital learning. Educators had to learn and log in to multiple applications, adding complexity and inefficiency to an already harrowing time. These solutions also often fell under general-purpose categories and were not always well-suited for the education market.
GoGuardian Empowers Educators in Digital Learning Environments
GoGuardian is a unified education technology platform specializing in moderating student web activities by using machine learning to facilitate a better learning environment. The software suite combines dynamic data inputs to calibrate student engagement and help educators draw conclusions about their class sessions. They can access all of their filtering, classroom management, student safety, and device inventory needs from a single platform.
The individual products included in the GoGuardian suite include:
Giving School Districts and Educators Everything They Need for Online Learning Environments
GoGuardian covers a lot of ground when it comes to managing digital education, and with that comes complex requirements to make everything function as intended. The platform processes events from more than five million students daily.
Handling this event volume is challenging, and the GoGuardian team had a number of priorities they kept front and center when developing the platform.
To meet these priorities, GoGuardian needed to overcome the following technical challenges.
Cyclical Data Generation
GoGuardian’s data generation was cyclical. Since most users are students, schools had little reason to keep the products enabled when classes were not in session. The data generation rate outside of school hours is drastically lower than when school is in session. The difference between the peak and trough traffic is quite large.
High Data Throughput
High data throughput was needed to keep up with the real-time traffic of more than five million students. Every event translated into multiple writes across different tables, as they’re a collection of web clicks or navigation.
Handling data duplication: A piece of data at time T0 may be updated and reappear at T10 + t aggregated. This data is not identical, but they share the same key with expanded or updated information. The updated event has old and new data. GoGuardian could save on row count and storage for many tables by storing only the most up-to-date version of each event. This approach would also reduce overall compute time.
Mutable Event Data
Event data is mutable and may need to update rather than insert. Databases that are not mutation friendly have problems with this. While workarounds are possible by redesigning data generation to support immutable inserts only, that requires retaining the entire payload of generated data. This approach increases row count, which leads to expensive reads.
Read Query Pattern
Reads are dynamic over many dimensions. The data is grouped, aggregated, and filtered by time, classrooms, student, school, URL, and other factors. 94 percent of queries in GoGuardian require ranking or aggregation over a dimension, while only 6 percent are at the row level.
Pre-calculating requests was not a feasible approach, as the team would need to reduce the degree of dimensions and how dynamic the queries could be. Ultimately, this would have required removing features from the products.
Exploring Database Options Capable of Supporting GoGuardian’s Software Suite
The GoGuardian team tested many types of databases to try to find a solution that met their development priorities and addressed the many technical challenges they faced.
GoGuardian started on a single SQL database, but once they reached a certain scale, they couldn’t rely on a single instance. They moved to an SQL sharding solution to split writes across multiple databases based on a key. Each database held a subset of GoGuardian’s data.
These sets of relational databases handled high-throughput browser events, with large quantities of rows per table on each SQL shard. Queries without indexes have unreasonable latency on a table of this high-scale. Each query needed to be carefully crafted, particularly when joining with other tables. Without this step, the databases would lock up and impact users.
The relational, row-based SQL databases handled writes relatively well, but reads were problematic. While adding additional SQL databases and resharding would help, it wouldn’t keep up with GoGuardian’s growth and introduced maintainability issues.
JK Kim, a GoGuardian Senior Infrastructure Engineer, said, “We are often too focused on latency, performance, and cost, but rarely ever talk about maintainability.”
The GoGuardian team found that shards were not very maintainable, due to the resharding process and the need for a shard router. The process is not as simple as adding a database cluster. With resharding, they had to:
“The dull, mundane, and time-consuming nature of that particular work was something we were not thrilled about having to do,” explained Kim.
The shard router itself presented another problem. Operating SQL shards was dependent on the shard router service knowing which shard is responsible for which key. GoGuardian used this stateful mapping because not all keys were equal in the platform’s traffic load, and the degree of variance was high.
The team allocated shards based on the expected traffic. Database uptime and performance was dependent on the shard router service, which was not ideal. While the SQL shard database offered fast writes and a fast, simple index-based fetch, it fell short on aggregate query performance and was resource-intensive to manage.
GoGuardian’s next database was Druid. This columnar database was adopted to complement the sharded SQL databases, as it offered aggregations across vast amounts of data and dimensions at high speeds and was distributed by nature for easy scaling.
Because of the way GoGuardian generates data, it requires updates of individual rows.
However, Druid doesn’t offer mutation at the row level. The only way to have data mutation was to run an index replacement job, which replaces an entire data block in a batch process.
To work around this limitation, GoGuardian’s team needed to create non-trivial logic to eliminate the duplicated data during the time threshold when the newer, more correct data could show up. Once these data are finalized, it would trigger a batch job to replace the duplicate data with the finalized, condensed data.
By implementing Druid, GoGuardian removed the requirement for a resource-intensive shard router, but added another outside component to their workload: an index replacement job. It also had many moving parts, including Hadoop, SQL databases, Zookeeper, and various node types that contributed to the complexity. Other roadblocks with this database technology included a lack of joins and, at the time, no basic support for SQL.
The team had to drastically change the way they thought about data and designed tables and queries to make Druid work for the platform. They considered redesigning how the data was generated so updates would not be required, and changing their Druid schema and services to adapt to such data.
Ultimately, Druid didn’t work for GoGuardian’s upsert use case. The time, cost, and engineering effort to make this happen would have been significant. To achieve this without causing downtime for GoGuardian’s users, they would have to replicate data ingestion and storage for an extended period. The team also wasn’t convinced that insert-only data generation was a better choice than their original approach.
Phoenix is an Apache project that adds a layer on top of HBase to allow SQL queries and joins. The GoGuardian team was excited about the potential of this database technology, but the testing process didn’t go as planned. The database bugged out and no amount of restarts or configuration changes could bring it back into a functional state.
GoGuardian needed a production database that is so resilient and versatile that no operation should be able to bring the database into an unintentional, unrecoverable, and inoperable state.
BigQuery, Presto, and Amazon Athena
GoGuardian explored these three products due to their excellent parallel wide queries, although they had non-ideal write throughput. They also had query engines that were decoupled from storage.
Of these databases, BigQuery offered the most optimal write throughput when writing to native BigQuery storage rather than using a schema on top of files. However, the team would have still needed to redesign their data generation to deal with the limitations in write throughput.
Spanner is Google’s proprietary, geographically distributed database offered through Google Cloud Platform. This relational database offered strong consistency across multiple regions. The GoGuardian team found Spanner fast and exciting to work with, and it was one of the most impressive candidates during the testing phase. However, the cost projection for the platform’s usage was higher than that of the existing legacy systems.
How SingleStoreDB Solved GoGuardian’s Challenges Without Compromise
After testing so many database solutions, GoGuardian’s team was still struggling to find the right fit for the platform’s use case.
Kim explained, “When we sat down and looked at our incoming data and query patterns, we were at an all-too-familiar place for an engineering team: we needed the best of both worlds with fast writes of row-based databases and the fast aggregate reads of columnar databases. Normally this is where we engineers begin to use our “trade-off” and “expectation management” skills.”
Despite trying many types of databases, partition strategies, and schemas, GoGuardian’s team wasn’t able to come up with a confident solution that didn’t result in transition difficulties or compromises in business requirements. That changed when they discovered SingleStoreDB.
“We first learned about SingleStore in the crowded vendor hall of AWS re:Invent 2017. Another engineer and I started sharing our problems with one of their representatives and ended up talking about data throughput, consistency, high availability, transaction isolation, and databases in general. It was one of the most exciting and enlightening conversations I’ve had, and it changed how we served data at GoGuardian,” said Kim.
SingleStoreDB Self-Managed is a cloud-native unified database designed for speed, scale, and agility. Multiple aggregator nodes serve as the brains of the operation and multiple leaf nodes serve as data storage. The database is coordinated by a single master aggregator. Through the simplicity of its design, SingleStore is able to achieve complex operations at low latency. Several benefits drew GoGuardian to this database technology.
Row and Column Data Stores in the Same Database
GoGuardian’s team was excited about the possibilities offered by a unified database architecture.
“SingleStore supports both types of storage, defined at table creation time. Perhaps most importantly, it allows unions and joins across row-based and columnar tables. I cannot stress enough how important this feature is to us, as it fundamentally changed how we served data by giving us the best of both worlds: the fast writes of a row-store and the fast aggregate reads of a column-store.”
“I don’t know of many database solutions that can support both row and columnar storage types. These features allowed us a degree of flexibility we never had previously.”
GoGuardian’s engineers put many preventative measures in place to prepare for inevitable machine failure and plan for incident mitigation. SingleStoreDB helps the team maintain high availability by replicating every write into both a master and secondary partition. If the original master fails over, the secondary partition becomes the master.
“It’s fast. There are some databases that, by design, cannot get better than 500ms latency, regardless of how small or simple the data being queried is. With SingleStore, we are able to see some queries under 30ms when using proper indexes and partition keys.”
“Ever since our conversation on the crowded floor of AWS re:Invent, all the way through the time when we dropped the multiple legacy databases that were replaced by SingleStore, we have always enjoyed their assistance and friendliness. It has definitely been a pleasant working experience.”
“For example, when we told them that we needed the ability to backup our data to S3, one of their engineers sent us an unreleased version with that feature to start testing with. We were able to establish a line of communication with their engineering team to round out the feature and even iron out a bug on their end. This line of communication increased our confidence in adopting SingleStore.”
Supporting GoGuardian’s Upsert Use Case
SingleStoreDB allowed GoGuardian to write to row-based tables and pay a read latency penalty for the short period of time while the data is volatile. It then unions that table with a columnar table that holds the final data past the volatile period. The unions and joins across row and columnar tables are seamless in this unified database. Support for SQL made it easy for the team to begin testing this database technology.
GoGuardian’s stream processor continuously writes data in real-time into a row table. Periodically, a batch process dumps stable data into a columnar table with the same schema. When the platform reads the data, queries are run against a view that is the union of the row and columnar tables.
After configuring the appropriate partition keys to minimize data skew, GoGuardian’s team found that SingleStoreDB offered stellar speed and efficiency for their use case.
“Previously, we couldn’t do joins at all in our columnar storage, and heavily frowned upon it in our row storage due to inefficiency. However, because SingleStore allows us to do both, we can now have the properly organized and normalized data we data engineers dream about.”
What GoGuardian Achieved With SingleStore's Help
GoGuardian’s education technology platform now serves over 10,000 schools, 500,000 educators, and 18,000,000 students. The performance delivered by SingleStoreDB during proof-of-concept testing and production deployment includes:
“At the end of the day, all databases have their own place in this world. The question always comes down to what it takes to make it work for your company’s specific use cases. At GoGuardian, we found SingleStore to be the path of least resistance. Now that everything is up and running smoothly, I’m both proud of what we have accomplished and thankful for the product and the support we have received from the SingleStore team. I have more confidence than ever before in our infrastructure and its ability to handle strenuous loads during peak hours.”
What’s Next for GoGuardian?
GoGuardian continues to help educators adapt to online and hybrid learning environments through a comprehensive suite of powerful tools and continually updated features.