Modeling the City of the Future with Kafka and Spark


Emily Friedman

Marketing Communications for SingleStore.

Today at Spark Summit in San Francisco, we are showcasing MemCity, a simulation that measures and maps energy consumption across 1.4 million households in a futuristic metropolitan city approximately the size of Chicago.

MemCity tracks, processes, and analyzes data from the energy-consuming devices found in homes, measured minute by minute in real time. We define real time as “up to the last click”: all data processed up until the moment you hit enter on your query is included in the result. Each device reading records a timestamp, a device identifier, and the watts consumed. Appliances measured include laundry and water devices, kitchen appliances, entertainment, lighting, HVAC, and small devices. The MemCity dashboard showcases the total megawatts per hour by device, hour of the day, and zip code.
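To make the shape of a reading concrete, here is a minimal Python sketch of a record with the three fields described above, plus a roll-up by hour of day. It is not MemCity code: the `Reading` class, the device ID strings, and the sample date are made up for illustration, and the conversion to watt-hours assumes each reading reports a steady draw over exactly one minute.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One device reading (hypothetical layout, not the MemCity schema)."""
    timestamp: datetime  # when the reading was taken
    device_id: str       # made-up identifier format
    watts: float         # power draw reported for that minute


def watt_hours_by_hour(readings):
    """Sum energy per hour of day, assuming each reading covers one minute."""
    totals = defaultdict(float)
    for r in readings:
        # A steady draw of `watts` for one minute is watts / 60 watt-hours.
        totals[r.timestamp.hour] += r.watts / 60.0
    return dict(totals)


# Arbitrary sample data for illustration only.
readings = [
    Reading(datetime(2015, 1, 1, 18, 0), "hvac-001", 3000.0),
    Reading(datetime(2015, 1, 1, 18, 1), "hvac-001", 3000.0),
    Reading(datetime(2015, 1, 1, 19, 0), "lamp-007", 60.0),
]
print(watt_hours_by_hour(readings))
```

Grouping by device or zip code would follow the same pattern with a different key.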

MemCity is built with the “real-time trinity” of Apache Kafka, Apache Spark, and SingleStore. This is the same architecture that companies like Pinterest have implemented to unlock new business opportunities with real-time data processing and analytics. Kafka, Spark, and SingleStore pair well because all three technologies are distributed and memory-optimized. Together, they solve the problem of how to ingest, process, and serve real-time data across an organization.

Kafka, a message queue, condenses the data from individual houses and devices into consumable streams. Next, Spark reads from Kafka, then transforms and enriches the data with geolocation and device type information. Finally, SingleStore provides persistence and real-time SQL queries, ingesting data from Spark while simultaneously powering live power consumption dashboards.
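The three stages can be pictured with a small, pure-Python stand-in. This sketch does not use the Kafka, Spark, or SingleStore APIs; the function names, the `DEVICE_REGISTRY` lookup table, the device IDs, and the zip codes are all hypothetical, chosen only to show how readings might flow from a stream, through enrichment, into a grouped query.

```python
from collections import defaultdict

# Hypothetical lookup table: device ID -> (device type, zip code).
DEVICE_REGISTRY = {
    "dev-1": ("HVAC", "zip-A"),
    "dev-2": ("lighting", "zip-A"),
    "dev-3": ("HVAC", "zip-B"),
}


def stream(events):
    """Stand-in for the Kafka stage: yield raw readings one at a time."""
    yield from events


def enrich(readings, registry):
    """Stand-in for the Spark stage: attach device type and zip code."""
    for ts, device_id, watts in readings:
        info = registry.get(device_id)
        if info is None:
            continue  # this sketch simply drops unknown devices
        device_type, zip_code = info
        yield {
            "ts": ts,
            "device_id": device_id,
            "watts": watts,
            "device_type": device_type,
            "zip": zip_code,
        }


def total_watts_by(rows, *keys):
    """Stand-in for a SQL GROUP BY with SUM(watts) over the stored rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["watts"]
    return dict(totals)


# Arbitrary sample readings: (timestamp, device ID, watts).
raw = [
    ("2015-01-01T18:00", "dev-1", 3000.0),
    ("2015-01-01T18:00", "dev-2", 60.0),
    ("2015-01-01T18:00", "dev-3", 2500.0),
    ("2015-01-01T18:00", "dev-9", 10.0),  # not in the registry
]
stored = list(enrich(stream(raw), DEVICE_REGISTRY))
print(total_watts_by(stored, "zip"))
print(total_watts_by(stored, "device_type"))
```

In a real deployment each stage runs distributed and concurrently; here the generators run in a single process purely to show the order of operations.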

Now for some of the technical specifications.

  • Runs on four Amazon EC2 nodes, which cost roughly $2.35 an hour, equating to under $21,000 per year.
  • Processes 186.67k transactions/sec, each carrying timestamp, device ID, watts consumed, and geolocation info
  • Runs simultaneous real-time SQL queries
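
For readers who want to check the numbers, the short Python sketch below reproduces the arithmetic behind the figures above. The constant and function names are ours, a 365-day year is assumed, and the per-household rate is a derived estimate rather than a figure stated in this post.

```python
HOURLY_COST_USD = 2.35          # from the specification list
HOURS_PER_YEAR = 24 * 365       # assumes a non-leap year
TRANSACTIONS_PER_SEC = 186_670  # 186.67k, from the specification list
HOUSEHOLDS = 1_400_000          # from the introduction


def annual_cost(hourly_cost, hours=HOURS_PER_YEAR):
    """Yearly cost of running continuously at the given hourly rate."""
    return hourly_cost * hours


def readings_per_household_per_minute(tx_per_sec, households):
    """Average number of transactions each household produces per minute."""
    return tx_per_sec * 60 / households


print(f"Annual cost: ${annual_cost(HOURLY_COST_USD):,.2f}")
print(f"Transactions per minute: {TRANSACTIONS_PER_SEC * 60:,}")
rate = readings_per_household_per_minute(TRANSACTIONS_PER_SEC, HOUSEHOLDS)
print(f"Per household per minute: {rate:.2f}")
```

At $2.35 an hour the yearly total comes to about $20,586, which is where the "under $21,000" figure comes from. The throughput works out to roughly eight transactions per household per minute, which would fit about eight devices reporting once a minute in each home, though that device count is our inference.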

MemCity highlights an application of data that will guide our future: smart cities are no longer a nice-to-have; they are becoming a necessity. By capturing energy readings by device in real time, across geographical distances and over periods of time, we can enable urban planners, energy companies, and similar organizations to identify trends in consumption and figure out the right solutions for smarter energy systems and devices. We can gather deep and meaningful insights from sensors, and we can do so with low-cost, commodity hardware. Putting data to good use makes long-term planning viable.

For more information on MemCity, check out the official news release:

Come see us showcasing MemCity at Booth #K6 at Spark Summit West, today through Wednesday. More information about our presence at the Summit: