This article defines data-intensive applications in more detail, including key requirements, use cases and how you can evaluate your application's data intensity.

What Are Data-Intensive Applications?

Data, not logic, is at the forefront of application development. In the software industry, it was previously thought that an application’s functionality was contained in its logic and dispersed throughout the code, but it’s since become clear that application logic is largely determined by the state of business data. The state of an order, for example, determines what can and cannot happen in the business process associated with that order; only after an order has been dispatched can it be shipped. These state transitions in business data can be chained together to build a long-running transaction connected with that business object.
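The order example can be sketched as a small state machine in which the current state decides which transitions are legal. This is a minimal Python illustration, not a prescribed design. The `OrderState` values, the `ALLOWED` table and the extra `CANCELLED` state are hypothetical.

```python
from enum import Enum


class OrderState(Enum):
    PLACED = "placed"
    DISPATCHED = "dispatched"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"  # hypothetical extra state for illustration


# The business data (current state) determines which actions are allowed next.
ALLOWED = {
    OrderState.PLACED: {OrderState.DISPATCHED, OrderState.CANCELLED},
    OrderState.DISPATCHED: {OrderState.SHIPPED},
    OrderState.SHIPPED: set(),
    OrderState.CANCELLED: set(),
}


class Order:
    def __init__(self, order_id):
        self.order_id = order_id
        self.state = OrderState.PLACED
        # The chain of transitions forms the long-running history of this order.
        self.history = [OrderState.PLACED]

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"order {self.order_id}: cannot go from "
                f"{self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


order = Order("A-1")
order.transition(OrderState.DISPATCHED)
order.transition(OrderState.SHIPPED)
print([s.value for s in order.history])  # ['placed', 'dispatched', 'shipped']
```

Calling `transition(OrderState.SHIPPED)` on a newly placed order raises `ValueError`, because shipping is only legal after dispatch.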
As a result of this shift in perspective, data has evolved from being something static that aids logic to something that defines an application’s business logic. Such logic is based on both historical and current data. Today’s data-intensive applications are designed to manage terabytes of data from millions of customers, and thorough analysis of data related to user behavior and business performance can be used to determine future business strategies.
This article will define data-intensive applications in more detail so you can see how they benefit users.

Why Do You Need Data-Intensive Applications?

Organizations can use data-intensive applications in multiple ways. Before the cloud computing era, previous orders in an order management system could be used to forecast order volume for the coming months. The much larger volumes of data available in a modern cloud-based system can greatly expand an organization's analytical capabilities.
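As a rough illustration of forecasting from past orders, the sketch below predicts upcoming monthly order volume with a simple moving average. The function name, window size and sample figures are hypothetical. A real forecast would typically also account for trend and seasonality.

```python
def moving_average_forecast(monthly_orders, window=3, horizon=3):
    """Forecast the next `horizon` months as the mean of the last `window` months.

    Each forecast is appended to the history so later months build on it.
    """
    if window <= 0 or len(monthly_orders) < window:
        raise ValueError("need a positive window and at least `window` months of history")
    history = list(monthly_orders)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


# Hypothetical order counts for the last three months.
print(moving_average_forecast([90, 120, 150], window=3, horizon=2))  # [120.0, 130.0]
```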
For example, ridesharing apps like Uber rely on real-time data to find available cabs near a user’s location and calculate estimated fares. Other examples of data-intensive applications include social media platforms like Facebook and Twitter; payment service providers like PayU and PayPal; mobile banking applications; video streaming services like Netflix and Hulu; and eCommerce applications like Flipkart and eBay.

What Are the Key Requirements for Data-Intensive Applications?

There are many factors to consider when architecting and designing a data-intensive application. The following are the most important.

High Concurrency in Data Access

Because data-intensive applications have a large number of consumers generating huge amounts of data, they must support highly concurrent data access.
For example, when a user opens the Uber app to request and book a ride, the app finds available cabs near the user’s location and displays the computed fares for the ride.
In 2021, Uber had approximately 3.5 million drivers and completed approximately 18 million trips per day. As of 2020, Uber was present in more than 10,000 cities. If you distribute those numbers equally across the cities, each city has roughly 350 drivers (3.5 million ÷ 10,000) serving about 1,800 bookings per day (18 million ÷ 10,000). With that many bookings competing for a small pool of cabs, many requests contend for the same cabs at the same time.
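To see why contention matters, the following sketch has many riders race for the same cab. A lock makes the check-and-claim step atomic, so exactly one rider wins. `CabPool` and `simulate_contention` are hypothetical names. A production system would rely on database transactions or conditional updates rather than an in-process lock.

```python
import threading


class CabPool:
    """Hypothetical in-memory pool of free cabs in one city."""

    def __init__(self, cab_ids):
        self._available = set(cab_ids)
        self._lock = threading.Lock()

    def try_claim(self, cab_id):
        # Check and remove under one lock so the operation is atomic:
        # two riders can never both be assigned the same cab.
        with self._lock:
            if cab_id in self._available:
                self._available.remove(cab_id)
                return True
            return False


def simulate_contention(riders=50, cab_id="cab-7"):
    """Let `riders` threads race for the same cab; return how many succeed."""
    pool = CabPool([cab_id])
    results = [False] * riders

    def rider(i):
        results[i] = pool.try_claim(cab_id)  # each thread writes only its own slot

    threads = [threading.Thread(target=rider, args=(i,)) for i in range(riders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


print(simulate_contention())  # 1
```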

Fast-Changing Data Streams

Data-intensive applications need to manage ultra-fast-changing data streams effectively. For example, say a user is making a reservation on the IRCTC (Indian Railways) website: they search for a train, check seat availability and book a seat. The site is heavily used. Indian Railways carries over 24 million passengers a day on about 13,452 trains, and many of those passengers book in advance.
At such high volumes, seat availability data changes almost instantly. For popular trains, all seats can be booked within seconds of online booking opening. Applications handling data streams this fast must be able to keep up with them.
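A minimal way to picture a fast-changing availability stream is to apply booking events one by one to a running seat count. The sketch below is illustrative only. The event format, the train ID `train-A` and the seat numbers are hypothetical, and a real booking system would perform this check inside a database transaction.

```python
def apply_booking_stream(initial_seats, events):
    """Apply (train_id, seats_requested) booking events in arrival order.

    A request is confirmed only if enough seats remain, so availability
    never goes negative even when requests arrive in a rapid burst.
    Returns the remaining seats per train and the outcome of each event.
    """
    remaining = dict(initial_seats)
    outcomes = []
    for train_id, seats in events:
        if seats <= 0:
            raise ValueError("seats_requested must be positive")
        if remaining.get(train_id, 0) >= seats:
            remaining[train_id] -= seats
            outcomes.append((train_id, seats, "confirmed"))
        else:
            outcomes.append((train_id, seats, "rejected"))
    return remaining, outcomes


# Hypothetical burst of requests for a train with only 3 seats left.
left, results = apply_booking_stream(
    {"train-A": 3}, [("train-A", 2), ("train-A", 2), ("train-A", 1)]
)
print(left)                     # {'train-A': 0}
print([r[2] for r in results])  # ['confirmed', 'rejected', 'confirmed']
```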

Super Low Latency

Low latency is important to enable instant data access and updates. Say a customer uses their debit card at an ATM to check their balance, withdraw cash or deposit money. Their account needs to be updated immediately, and the customer should get an instant notification.
HDFC Bank, one of India's largest banks, owns over 18,000 ATMs. If half of those ATMs were in use simultaneously, the system would handle around 9,000 concurrent sessions from ATMs alone. As of 2022, HDFC Bank had over 68 million customers. Its online banking application and ATMs access the same account information, which further increases the number of concurrent users. While user concurrency is high, data concurrency is low, because each customer is accessing their own account.
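The distinction between user concurrency and data concurrency can be sketched with per-account locking. Operations on different accounts proceed independently, and only operations on the same account are serialized. `AccountStore` and its methods are hypothetical. A real bank would enforce this with database transactions and row-level locking rather than Python locks.

```python
import threading


class AccountStore:
    """Hypothetical in-memory ledger with one lock per account.

    Many customers can transact at once (high user concurrency), but each
    operation locks only the account it touches, so customers using their
    own accounts rarely wait on each other (low data concurrency).
    """

    def __init__(self):
        self._balances = {}
        self._locks = {}
        self._registry_lock = threading.Lock()  # guards creation of accounts/locks

    def _lock_for(self, account_id):
        with self._registry_lock:
            if account_id not in self._locks:
                self._locks[account_id] = threading.Lock()
                self._balances[account_id] = 0
            return self._locks[account_id]

    def deposit(self, account_id, amount):
        with self._lock_for(account_id):
            self._balances[account_id] += amount
            return self._balances[account_id]

    def withdraw(self, account_id, amount):
        with self._lock_for(account_id):
            if self._balances[account_id] < amount:
                raise ValueError("insufficient funds")
            self._balances[account_id] -= amount
            return self._balances[account_id]

    def balance(self, account_id):
        with self._lock_for(account_id):
            return self._balances[account_id]


store = AccountStore()
workers = [
    threading.Thread(target=store.deposit, args=(f"acct-{i % 4}", 10))
    for i in range(40)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.balance("acct-0"))  # 100
```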

Large Data Sets with Quick Ingestion of Data

For applications like Uber and IRCTC, the rate of data ingestion is a smaller concern, but it is critical for streaming apps. For example, the video streaming service Netflix has 222 million subscribers in more than 190 countries. Users generally watch videos in high definition, and the data needs to be streamed continuously without gaps so that playback never stalls to buffer.
According to Android Authority, Netflix's data consumption for 4K resolution ranges from 6.5 GB to 11.5 GB per hour.
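To relate those figures to network bandwidth, the sketch below converts GB per hour into a sustained bitrate. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second). The function name is hypothetical.

```python
def gb_per_hour_to_mbps(gb_per_hour):
    """Convert a data-consumption rate in GB/hour to a sustained bitrate in Mbps.

    Assumes decimal units: 1 GB = 10**9 bytes and 1 Mbps = 10**6 bits/s.
    """
    bits_per_hour = gb_per_hour * 10**9 * 8
    return bits_per_hour / 3600 / 10**6


for rate in (6.5, 11.5):
    print(f"{rate} GB/h = {gb_per_hour_to_mbps(rate):.1f} Mbps")
# 6.5 GB/h = 14.4 Mbps
# 11.5 GB/h = 25.6 Mbps
```

At the low end, a hypothetical one million concurrent 4K streams would need roughly 14.4 Tbps of sustained delivery capacity.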

Fast Analytics

You also need to architect for fast, real-time analytics on large data sets. Netflix, for example, supports streaming on more than 2,000 devices, and each device differs in the video quality, audio quality, resolutions and formats it supports. Netflix needs to transcode and encode the original video streams appropriately for all of these devices.
Meanwhile, to offer cabs to a user, Uber needs to weigh factors that minimize wait time, reduce extra driving and improve the overall ETA. This requires real-time analytics over a large driver database that includes current location, driver behavior and other parameters.
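One simple way to express this kind of trade-off is to compute a weighted cost for each candidate driver and choose the lowest. This is a hypothetical illustration, not Uber's actual dispatch algorithm. The `Driver` fields, the weights and the use of a rating as the behavior signal are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    driver_id: str
    eta_to_pickup_min: float  # hypothetical: estimated minutes until pickup
    detour_km: float          # hypothetical: extra driving needed to reach the rider
    rating: float             # hypothetical behavior signal, 1.0 to 5.0


def cost(d, w_eta=1.0, w_detour=0.5, w_rating=2.0):
    # Lower is better: penalize rider wait time and extra driving,
    # and reward a good behavior signal.
    return w_eta * d.eta_to_pickup_min + w_detour * d.detour_km - w_rating * d.rating


def best_driver(candidates, **weights):
    """Return the candidate with the lowest weighted cost, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda d: cost(d, **weights))


nearby = [Driver("a", 4.0, 1.0, 4.8), Driver("b", 2.0, 3.0, 4.5)]
print(best_driver(nearby).driver_id)  # b
```

With `best_driver(nearby, w_eta=0.0)`, driver `a` wins instead. The choice of weights directly shapes which driver is offered.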

Scaling and Analytics in Data-Intensive Applications

Vertical scaling works well for a client/server application with a few users, but a single server can only handle a limited number of concurrent users. Data-intensive applications use horizontal scaling instead. They're written as stateless applications, so requests can be load balanced across many servers.
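Because stateless servers are interchangeable, even a simple round-robin scheme can spread requests across them, and adding a server adds capacity. The sketch below is a minimal, single-threaded illustration. `RoundRobinBalancer` is a hypothetical class, and production load balancers also handle health checks and weighting.

```python
class RoundRobinBalancer:
    """Send each request to the next server in turn.

    This only works because the servers are stateless: no user session lives
    on a particular server, so any server can handle any request.
    """

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._next = 0

    def add_server(self, server):
        # Horizontal scaling: capacity grows by adding another identical server.
        self._servers.append(server)

    def route(self):
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server


lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.route() for _ in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
lb.add_server("app-3")
print([lb.route() for _ in range(3)])  # ['app-2', 'app-3', 'app-1']
```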
Some functions in an application are used far less than others. For example, updating a train's status or adding a timetable to the IRCTC site happens much less often than booking a seat or checking a train's status. If all of these functions were hosted on the same cluster, they would have to be scaled together, and the load from the heavily used functions would affect the performance of the rarely used ones. The microservices architecture avoids this issue by breaking functions into smaller services that can be deployed and scaled independently.
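Independent scaling can be expressed as sizing each service from its own load. In this sketch the service names, request rates and per-replica capacities are hypothetical. Real autoscalers use richer signals, such as CPU utilization and latency.

```python
import math


def replicas_needed(requests_per_sec, capacity_per_replica, min_replicas=1):
    """Size one service from its own traffic, independently of other services."""
    return max(min_replicas, math.ceil(requests_per_sec / capacity_per_replica))


def plan(services):
    """services maps a service name to (requests_per_sec, capacity_per_replica)."""
    return {name: replicas_needed(rps, cap) for name, (rps, cap) in services.items()}


print(plan({
    "seat-booking": (12_000, 500),   # hypothetical: heavily used
    "train-status": (3_000, 1_000),  # hypothetical
    "timetable-admin": (5, 200),     # hypothetical: rarely used
}))
# {'seat-booking': 24, 'train-status': 3, 'timetable-admin': 1}
```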
Microservices, clustering and similar scaling strategies focus on the application's processing side. On the data side, microservices architectures typically either connect every service to one shared database or give each microservice its own database. To keep the database from becoming a bottleneck, it needs to scale intrinsically, in the same way as the microservices, following their regional and functional deployments.
A database that can partition the data in a single table across many nodes can be accessed easily from a server in any given location, without manually deploying a new database for each region. Separate regional deployments, by contrast, work against data consolidation. Typical systems deploy separate regional databases with background jobs that merge data into a central database. This significantly delays data availability for data-intensive applications and makes database administration more difficult. A better solution is a database that scales horizontally by distributing data across servers.
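Distributing a table's rows across servers is commonly done by hashing a shard key. The sketch below shows the idea with hypothetical node names and a simple hash-modulo scheme. It is not meant to describe any particular database's placement logic. Plain modulo placement moves most rows when the node count changes, which is why real systems typically use more partitions than nodes or use consistent hashing.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a shard key to a partition with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used to keep placement identical across runs and machines.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(rows, key_column, nodes):
    """Assign each row to a node based on the hash of its shard key."""
    placement = {node: [] for node in nodes}
    for row in rows:
        node = nodes[partition_for(row[key_column], len(nodes))]
        placement[node].append(row)
    return placement


orders = [{"order_id": i, "amount": 100 + i} for i in range(1_000)]
placed = distribute(orders, "order_id", ["node-1", "node-2", "node-3", "node-4"])
print({node: len(rows) for node, rows in placed.items()})  # a roughly even split
```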
Analytics is another major use case for such applications, and analyzing the entire data set is preferable to analyzing it piecemeal. Real-time analytics usually works on a localized data set, while offline analytics needs the entire data set. However, real-time analytics confined to one region cannot benefit from what is learned in another. Uber, for example, must analyze cab statistics in real time to distribute assignments evenly among cab drivers. Otherwise, it will be perceived as favoring a few drivers. For this use case, running analytics on the regional database alone is sufficient. These situations call for a distributed database that works well for both regional and consolidated use cases.
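Per-region trip counters can illustrate both the fairness goal and the regional-versus-consolidated split. Assignments use only the regional counts, while offline analysis combines every region. The function names and the least-assigned-driver rule are hypothetical simplifications, not Uber's actual method.

```python
from collections import Counter


def assign_fairly(available_drivers, regional_counts):
    """Pick the available driver with the fewest trips so far in this region.

    Ties are broken by driver ID so the choice is deterministic.
    regional_counts is a Counter of trips per driver and is updated in place.
    """
    if not available_drivers:
        return None
    chosen = min(available_drivers, key=lambda d: (regional_counts[d], d))
    regional_counts[chosen] += 1
    return chosen


def consolidated_counts(counts_by_region):
    """Merge per-region Counters into one global view for offline analysis."""
    total = Counter()
    for counts in counts_by_region.values():
        total.update(counts)
    return total


north = Counter({"d1": 5, "d2": 2, "d3": 2})
print(assign_fairly(["d1", "d2", "d3"], north))  # d2
print(assign_fairly(["d1", "d2", "d3"], north))  # d3
south = Counter({"d1": 1, "d9": 4})
print(consolidated_counts({"north": north, "south": south}))
```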

Conclusion

Data-intensive applications are those in which the amount of data that needs to be managed grows rapidly with the number of users. Such applications can handle increasingly complex tasks to serve users and offer deeper analysis for organizations in multiple industries.
The needs of data-intensive applications, such as super low latency of data access, ultra-fast-changing data streams and high concurrency, can best be met by distributed databases. For example, SingleStoreDB offers a real-time, distributed SQL database for multi-cloud, on-premises and hybrid systems. It provides fast ingestion, nearly unlimited scalability and sub-second latency. And with a unified data engine for transactional and analytical workloads, SingleStoreDB powers fast, real-time analytics for data-intensive applications.
Wondering how data-intensive your applications are? Find out with our Data-Intensity Calculator.