AWS interview question
Define scaling and its types:
Scaling changes the size of a system. In the scaling process, we either shrink or expand the system to meet the expected demand. Scaling can be achieved by adding resources to the current system, by adding new systems alongside the existing one, or both.
Horizontal scaling is a natural fit for distributed NoSQL databases (Cassandra, MongoDB) and modern distributed SQL systems (Google Spanner, CockroachDB), which use sharding to distribute data across multiple servers. Vertical scaling is better suited to traditional Relational Database Management Systems (RDBMS), such as MySQL, PostgreSQL, and Oracle, where capacity is increased by adding resources like CPU and RAM to a single server.
Vertical Scaling: When new resources are added to the existing system to meet demand, it is known as vertical scaling.
Consider a rack of servers and resources that makes up the existing system (as shown in the figure). When the existing system fails to meet demand, and that demand can be met just by adding resources, this is vertical scaling. Vertical scaling is based on the idea of adding more power (CPU, RAM) to the machines we already have.
Vertical scaling is generally easier and, to start with, often cheaper than horizontal scaling, and it takes less time to implement. Its ceiling is the largest single machine available.
Horizontal Scaling: When new server racks are added to the existing system to meet higher demand, it is known as horizontal scaling.
Consider a rack of servers and resources that makes up the existing system (as shown in the figure). When the existing system fails to meet demand, and that demand cannot be met just by adding resources, we need to add completely new servers. This is horizontal scaling. Horizontal scaling is based on the idea of adding more machines to our pool of resources. It is generally harder and costlier to set up than vertical scaling, and it takes more time to implement.
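The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not a real provisioning API: the Server class, the function names, and the CPU/RAM figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical server with a fixed amount of CPU and RAM."""
    cpus: int
    ram_gb: int


def scale_vertically(server, extra_cpus, extra_ram_gb):
    """Vertical scaling: the same machine gets more resources."""
    return Server(server.cpus + extra_cpus, server.ram_gb + extra_ram_gb)


def scale_horizontally(pool, new_server):
    """Horizontal scaling: another machine joins the pool."""
    return pool + [new_server]


def total_cpus(pool):
    """Sum the CPUs across every server in the pool."""
    return sum(server.cpus for server in pool)


existing = [Server(cpus=8, ram_gb=32)]

# Vertical: still one machine, but a bigger one.
scaled_up = [scale_vertically(existing[0], extra_cpus=8, extra_ram_gb=32)]

# Horizontal: two machines of the original size.
scaled_out = scale_horizontally(existing, Server(cpus=8, ram_gb=32))

print(len(scaled_up), total_cpus(scaled_up))    # 1 16
print(len(scaled_out), total_cpus(scaled_out))  # 2 16
```

Both paths end up with 16 CPUs in this example. The scaled-up system is still a single machine, while the scaled-out system has to spread work across two machines, which is where the extra setup effort comes from.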
Traditional RDBMS databases like Oracle are generally considered poor candidates for easy horizontal scaling (adding more nodes) compared to NoSQL systems because they are designed for strong consistency (ACID) on a single, shared storage layer. While Oracle can be scaled horizontally through complex, expensive, or specialized tools like RAC or Sharding, it does not achieve the seamless, near-linear scaling characteristic of NoSQL, primarily for the following reasons:
1. The Bottleneck of ACID Consistency
RDBMS systems guarantee ACID (Atomicity, Consistency, Isolation, Durability) properties. In a multi-node cluster, preserving these guarantees requires the nodes to agree on the final state of the data before a transaction is finalized.
High Latency: As you add more nodes, the coordination needed to ensure data remains consistent across those nodes increases network traffic, resulting in reduced performance.
"Shared Everything" Architecture: Technologies like Oracle RAC (Real Application Clusters) still rely on nodes sharing the same underlying data storage. This means that while compute power increases, the shared storage eventually becomes the bottleneck, limiting true horizontal scalability.
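To make the coordination cost concrete, here is a minimal sketch of a textbook two-phase commit, one common way for several nodes to agree before a transaction is finalized. It is an illustration only and does not claim to show how Oracle RAC coordinates its nodes; the Participant class and the message counting are assumptions made up for this example.

```python
class Participant:
    """Hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.committed = False

    def prepare(self):
        # Phase 1: the node votes on whether it is able to commit.
        return self.can_commit

    def commit(self):
        self.committed = True

    def rollback(self):
        self.committed = False


def two_phase_commit(participants):
    """Return (committed, messages) for one transaction."""
    messages = 0

    # Phase 1: ask every node to prepare and collect the votes.
    votes = []
    for node in participants:
        messages += 2  # prepare request + vote
        votes.append(node.prepare())

    # Phase 2: commit only if every node voted yes, otherwise roll back.
    decision = all(votes)
    for node in participants:
        messages += 2  # decision + acknowledgement
        if decision:
            node.commit()
        else:
            node.rollback()

    return decision, messages


two_nodes = [Participant("a"), Participant("b")]
four_nodes = [Participant(name) for name in "abcd"]
print(two_phase_commit(two_nodes))   # (True, 8)
print(two_phase_commit(four_nodes))  # (True, 16)
```

In this sketch every extra node adds four messages to each transaction, and a single node voting no forces all the others to roll back. That is the kind of coordination overhead the points above describe.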
2. Inefficiency of Distributed JOINs
A key feature of relational databases is joining data across multiple tables.
Network Overhead: In a sharded (partitioned) database environment, tables are spread across different nodes. A query requiring a JOIN between tables located on different servers must fetch data over the network, which is significantly slower than local disk reads.
Complexity: Performing complex joins across hundreds of shards is difficult to optimize, making it impractical for massive-scale, high-transaction systems.
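A small sketch shows why the choice of shard key matters for JOINs. It counts how many rows of a join between orders and customers need a lookup on a different shard. The data, the modulo placement rule, and the function name are assumptions for illustration, not any database's actual behavior.

```python
NUM_SHARDS = 3

# Hypothetical orders; customer rows are assumed to be placed by customer_id.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 10},
    {"order_id": 3, "customer_id": 11},
    {"order_id": 4, "customer_id": 12},
]


def count_remote_lookups(orders, num_shards, order_shard_key):
    """Count orders whose customer row lives on a different shard."""
    remote = 0
    for order in orders:
        order_shard = order[order_shard_key] % num_shards
        customer_shard = order["customer_id"] % num_shards
        if order_shard != customer_shard:
            remote += 1  # this row of the JOIN needs a network round trip
    return remote


# Orders placed by order_id: most customer rows are on another shard.
print(count_remote_lookups(orders, NUM_SHARDS, "order_id"))     # 3

# Orders placed by customer_id: each order sits next to its customer.
print(count_remote_lookups(orders, NUM_SHARDS, "customer_id"))  # 0
```

Co-locating related rows on the same shard removes the network hops for this particular join, but a table can only be placed by one key, so joins on other columns still cross shards.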
3. High Operational Complexity (Sharding)
To truly scale Oracle horizontally, you must use sharding, which involves splitting data across independent database instances.
Manual Management: Historically, sharding was not natively supported, requiring complex, manual application-level logic to route queries to the correct database instance.
Re-sharding Challenges: As data grows unevenly, re-balancing data (re-sharding) across new nodes is highly complex, often requiring downtime.
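Both points can be illustrated with a naive application-level router. The sketch below assumes integer user IDs and a simple modulo rule, which is a simplification chosen for this example and not how Oracle Sharding routes queries. It then counts how many keys change shard when a fifth shard is added.

```python
def route(user_id, num_shards):
    """Naive application-level routing: pick a shard by modulo."""
    return user_id % num_shards


def keys_moved(user_ids, old_shards, new_shards):
    """Count keys whose shard changes when the shard count changes."""
    return sum(
        1 for user_id in user_ids
        if route(user_id, old_shards) != route(user_id, new_shards)
    )


user_ids = range(1000)

print(route(42, 4))                 # 2
print(keys_moved(user_ids, 4, 5))   # 800
```

With this rule, going from 4 to 5 shards relocates 800 of the 1000 keys, which is why re-sharding is so disruptive. Schemes such as consistent hashing exist to reduce the amount of data that has to move.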
4. Designed for Vertical Scaling
Traditional RDBMS architecture was built in an era when vertical scaling (adding more CPU, RAM, or faster disk to a single machine) was more common than horizontal scaling across distributed systems.
B+-Tree Optimization: Oracle relies on B+-trees for indexing, which are designed for fast access on a single machine's local disk. Distributing a B+-tree across nodes requires costly network calls.
Summary of Alternatives
While Oracle offers solutions like Oracle Sharding to enable horizontal scaling, it is often more expensive and complex to implement than NoSQL alternatives (like Cassandra, MongoDB, or ScyllaDB) that are built from the ground up for massive, distributed horizontal scale, often at the cost of immediate consistency.
Why is database sharding important?
Sharding is an example of horizontal scaling. As an application grows, the number of application users and the amount of data it stores increase over time. The database becomes a bottleneck if the data volume becomes too large and too many users attempt to read or save information at the same time. The application slows down, which affects the customer experience. Database sharding is one way to solve this problem because it enables parallel processing of smaller datasets across shards.
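The parallel processing mentioned above is often done as a scatter-gather query: each shard computes a partial result on its own data, and the partial results are combined at the end. The sketch below stands in for that idea with in-memory lists and a thread pool. The shard contents and function names are made up for illustration, and real shards would be separate database servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order amounts, already split across three shards.
shards = [
    [120, 80, 45],
    [300, 15],
    [60, 60, 60, 20],
]


def shard_total(rows):
    """Work done locally on one shard: sum only that shard's rows."""
    return sum(rows)


def scatter_gather_total(shards):
    """Query every shard in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_totals = list(pool.map(shard_total, shards))
    return sum(partial_totals)


print(scatter_gather_total(shards))  # 760
```

Each worker only touches its own smaller dataset, so no single node has to scan all the rows.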