Carbon, Cloud, and the Modern Data Estate

Domenic Ravita

VP, Product Marketing & Developer Relations

On this National Cut Your Energy Costs Day, it’s a good time for us as data professionals to think about our carbon footprints, both at home and at work.

Since the first home was electrified in Manhattan in the 1880s, our home electricity usage has grown dramatically. According to Energy.gov, residential homes now account for 22% of all electricity consumption in the U.S. According to the Energy Information Administration, roughly 63% of U.S. electricity is still generated from nonrenewable fossil fuel sources, though this varies a lot depending on where you live. In Georgia, where I live, fossil fuels account for about 71% of electricity generation. No matter where you live, saving energy brings immediate benefits to you and helps reduce our carbon footprint. Changing a few habits will save money on your monthly electricity bill, and the larger, collective impact of cutting energy use helps the environment by reducing carbon emissions.

Here are three energy-saving tips that can make a difference. First, install a programmable thermostat. These thermostats can learn household behaviors and set temperatures at optimal levels for comfort, and they may save as much as 15 percent of electricity consumption. Second, finish replacing those energy-hungry incandescent bulbs with LEDs. Finally, unplug idle devices such as laptops, televisions, and even the coffee pot. The power bricks and wall warts for these electronics and appliances are energy vampires: they draw power even when the device is off and can account for as much as 20% of your monthly bill.

But as important as our personal energy habits are, perhaps we should be even more conscious of the environmental impact of our choices as IT and data professionals. The pandemic has blurred our home and work lives, and our new home-bound behaviors are driving the largest, fastest adoption of digital services the world has ever seen. By day, we’re video conferencing for work from the kitchen counter. By night, we’re watching The Queen’s Gambit on Netflix and The Mandalorian on Disney+. (I highly recommend both.) But you may be wondering how the use of these digital services is affecting electricity usage.

Electricity and the Cloud

In the early months of the pandemic, after air travel dropped precipitously, the carbon footprint of video streaming services received a lot of attention. With the energy consumption of information and communication technologies (ICT) increasing 9% every year, The Shift Project’s Lean ICT Report found that the ICT sector’s share of global carbon emissions increased from 2.5% to 3.7%, compared with air travel’s 2.4% share. Of that 3.7%, 19% comes from data centers and 16% from network operations. Of course, video streaming is just one of many digital services, alongside countless SaaS applications delivered from both public cloud data centers and enterprise-owned data centers, and these data centers require massive amounts of electricity to operate. The Natural Resources Defense Council (NRDC) estimated that U.S. data center electricity consumption would increase to roughly 140 billion kilowatt-hours annually by 2020, the equivalent annual output of 50 power plants, costing American businesses $13 billion in electricity bills and emitting nearly 100 million metric tons of carbon pollution per year. This is roughly 2-3% of all U.S. electricity usage per year. Although it’s invisible to us, our collective use of digital services has a big impact on electricity usage and the environment.
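The NRDC totals are easier to compare with a home electricity bill once they are broken down per kilowatt-hour. The sketch below divides the headline figures quoted above; the resulting per-unit averages are simple arithmetic derived here, not numbers reported by the NRDC, and it assumes "carbon pollution" can be read as metric tons of CO2.

```python
# Headline NRDC estimates for U.S. data centers, as quoted in the text above.
ANNUAL_KWH = 140e9          # ~140 billion kilowatt-hours per year
ANNUAL_COST_USD = 13e9      # ~$13 billion in electricity bills per year
ANNUAL_CO2_TONNES = 100e6   # ~100 million metric tons (assumed to mean CO2)
POWER_PLANTS = 50           # equivalent annual output of ~50 power plants


def implied_averages(kwh, cost_usd, co2_tonnes, plants):
    """Derive rough per-unit averages implied by the headline totals."""
    return {
        "usd_per_kwh": cost_usd / kwh,             # average price paid per kWh
        "kg_co2_per_kwh": co2_tonnes * 1000 / kwh, # tonnes -> kg, per kWh
        "twh_per_plant": kwh / plants / 1e9,       # kWh -> TWh, per plant
    }


averages = implied_averages(ANNUAL_KWH, ANNUAL_COST_USD, ANNUAL_CO2_TONNES, POWER_PLANTS)
for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

Run as-is, it prints about $0.093 per kWh, 0.714 kg of CO2 per kWh, and 2.8 TWh per power plant, which puts data center electricity in the same price range as a typical utility bill.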

There is some good news on data centers: efficiency improvements have slowed the growth of their electricity consumption over the last 20 years. A study commissioned by the Department of Energy and conducted by Lawrence Berkeley National Laboratory used decades of data to observe trends in data center electricity usage. It found that after a 90% increase from 2000 to 2005 and a 24% increase from 2005 to 2010, growth was estimated to stabilize at close to 4% from 2010 to 2020. Part of the efficiency gain is attributed to reduced growth in the number of servers operating in public cloud data centers, where servers run at higher utilization rates than in enterprise-managed data centers. Amazon Web Services commissioned a study from 451 Research showing that its infrastructure-as-a-service was more than 3.6 times as energy efficient as the median surveyed U.S. enterprise-owned data center. The study attributes that advantage to much higher server utilization and more efficient hardware. Google and Microsoft Azure data centers are achieving similar efficiency gains over corporate-owned data centers.
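To see why utilization matters so much, consider how many servers it takes to carry the same workload at different utilization rates. This is a minimal sketch: the workload size and both utilization percentages are hypothetical values chosen for illustration, not figures from the 451 Research or Lawrence Berkeley studies.

```python
def servers_needed(peak_demand: int, utilization_pct: int) -> int:
    """Servers required to carry `peak_demand` (measured in fully busy
    server-equivalents) if each server runs at `utilization_pct` percent."""
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-peak_demand * 100 // utilization_pct)


# Hypothetical workload: 60 servers' worth of fully busy compute.
DEMAND = 60
for label, pct in [("enterprise (assumed 15%)", 15), ("shared cloud (assumed 60%)", 60)]:
    print(f"{label}: {servers_needed(DEMAND, pct)} servers")
```

Under these assumptions the enterprise fleet needs 400 servers and the shared cloud fleet needs 100. The sketch ignores per-server power draw, but because idle and lightly loaded servers still consume a substantial share of their peak power, a smaller, busier fleet generally uses less electricity for the same work.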

Managing the Data Estate

But just as our personal energy habits have a large effect on energy use at home, so do our IT decisions. How we manage the data powering these SaaS applications may be the next big carbon challenge, because where data is stored, where it’s copied, how it’s processed, and where it’s transmitted all add up. You or your Cloud Operations team see the effect of those decisions for your company’s SaaS product in your cloud utility bill every month. Some line items may stand out, like a large cluster of m5.12xlarge instances in a test environment that’s been left running for 30 days with no activity. In this case, the cloud-saving habit is no different from your home energy-saving habit: turn off the lights when you leave the room!
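Finding those forgotten clusters can be automated with a simple rule: flag any instance that ran for the whole billing window without ever doing meaningful work. The sketch below assumes you have already exported per-instance usage (for example, peak values of the CPUUtilization metric from Amazon CloudWatch) into plain records; the `InstanceUsage` record, the instance IDs, and the 2% CPU threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceUsage:
    """Hypothetical per-instance usage record exported from cloud monitoring."""
    instance_id: str
    instance_type: str
    environment: str
    hours_running: int
    max_cpu_pct: float  # peak CPU utilization observed over the window


def find_idle(instances: List[InstanceUsage],
              window_hours: int = 30 * 24,
              cpu_threshold_pct: float = 2.0) -> List[InstanceUsage]:
    """Return instances that ran for the entire window but never exceeded
    the CPU threshold, i.e. the "lights left on" candidates."""
    return [
        inst for inst in instances
        if inst.hours_running >= window_hours and inst.max_cpu_pct < cpu_threshold_pct
    ]


# Hypothetical fleet: two idle test instances and one busy production instance.
fleet = [
    InstanceUsage("i-test-01", "m5.12xlarge", "test", 720, 0.8),
    InstanceUsage("i-test-02", "m5.12xlarge", "test", 720, 1.1),
    InstanceUsage("i-prod-01", "m5.12xlarge", "prod", 720, 64.0),
]

for inst in find_idle(fleet):
    print(f"consider stopping {inst.instance_id} ({inst.instance_type}, {inst.environment})")
```

Only the two test instances are reported. A real policy would likely also look at network and disk activity before stopping anything, since low CPU alone does not prove an instance is unused.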

The carbon impact of other cloud decisions may be less obvious. Modern customer and business experiences delivered by SaaS applications depend on a diverse data backend. Microsoft refers to this as the “modern data estate”: data stored in many locations and across many types of datastores, from operational databases to data warehouses to data lakes. Into this data estate flows a deluge of data from an ever-growing number of sources. Within the estate, we ingest, manage, and analyze this data using the types of storage appropriate to each kind of processing and freshness requirement. The data estate must also retain a long history of data so that you can access past and present data and predict the future.

I think the analogy of an estate is a useful one for thinking about the carbon impact of our data management decisions. Within the estate we have assets and liabilities, in the form of data assets and the workloads that process them. The liabilities include the cost of copying and moving data. Lately, it has been conventional wisdom to pick a datastore per workload. There are complex decision trees available for choosing among almost 20 different specialty datastores for time-series, key-value, document, wide-column, unstructured text, graph, or relational data. There’s also the choice of processing type: transactions or analytics.

Consider the real-time data assets and workloads your SaaS application or business needs. Think about how many different types of datastores are involved in creating, storing, and processing those data assets and workloads. Also consider the machine learning models that operate on that data. The tally may be three, four, or more. Because spinning up a new datastore is as easy and convenient as flipping on a light, your data estate may be large and sprawling, requiring an estate staff with specialized skill sets to manage each of those assets. At SingleStore, we’ve encountered scenarios where as many as 14 different types of datastores were involved in producing real-time assets and serving real-time analytics.

Serving these diverse workloads on diverse data structures for real-time use cases is inherently inefficient. Big data becomes even bigger when it’s copied and moved rather than operated on in place as it arrives from streaming sources. In terms of the “data estate”, we can reduce the liability and cost of creating, processing, and maintaining real-time assets by consolidating these workloads. There’s no need to give up the convenience of instant availability in the cloud, or the data access styles and structures you’ve grown accustomed to when designing your application. Many have already moved off single-node and special-purpose databases to achieve greater efficiency by combining real-time operational and analytical workloads on diversely structured data, from binary and raw text to key-value, document, and relational. Just as sharing hardware at the cloud infrastructure level results in higher server utilization and greater energy efficiency for data centers, building applications on a unified database that supports diverse workloads on diversely structured data reduces your data estate’s liabilities. I argue that it also increases the value of your real-time data assets, since, compared with other datastores, SingleStore reduces latency and stores data more efficiently through a combination of innovations.
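The “big data becomes even bigger” effect can be made concrete by counting copies. This minimal sketch models a pipeline as a list of hops between stores and assumes every hop lands a full copy of the dataset in its destination; that is a deliberate simplification (real stores may keep subsets or compress differently), and the store names and 500 GB dataset size are hypothetical.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (source store, destination store)


def estate_footprint(dataset_gb: int, pipeline: List[Hop]) -> Tuple[int, int]:
    """Return (gb_stored, gb_moved_per_refresh), assuming every hop lands a
    full copy of the dataset in its destination store. The source stream
    itself is not counted as storage."""
    destinations = {dst for _, dst in pipeline}
    gb_stored = dataset_gb * len(destinations)
    gb_moved = dataset_gb * len(pipeline)
    return gb_stored, gb_moved


# Hypothetical pipelines; store names are illustrative only.
fragmented = [
    ("stream", "operational_db"),
    ("operational_db", "warehouse"),
    ("operational_db", "search_index"),
    ("warehouse", "feature_store"),
]
consolidated = [("stream", "unified_db")]

for name, pipeline in [("fragmented", fragmented), ("consolidated", consolidated)]:
    stored, moved = estate_footprint(500, pipeline)
    print(f"{name}: {stored} GB stored, {moved} GB moved per refresh")
```

Under these assumptions the four-store pipeline stores and moves 2,000 GB for every 500 GB of source data, while the consolidated pipeline stores and moves 500 GB. Every extra copy is storage that must stay powered and network traffic that must be carried.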

Takeaway

So, unplug those energy vampires in your home and across your data estate. Take a modern approach to cutting your energy consumption. Consider the advantages of combining real-time workloads into fewer datastores: you not only simplify and accelerate your data, but also conserve electricity and reduce your carbon footprint. By renewing and modernizing your data estate with fewer special-purpose datastores, you’re directly following the environmental ethos of reduce, reuse, and recycle. I’ve said before that you must simplify to accelerate. Consider that by doing so, you may also “simplify to save”.
