AWS interview question
Define scaling and its types:
Scaling changes the size of a system: we shrink or expand it to meet expected demand. A system can be scaled by adding resources to the existing machines, by adding new machines alongside the existing ones, or both.
Horizontal scaling is best achieved by distributed NoSQL databases (Cassandra, MongoDB) and modern distributed SQL systems (Google Spanner, CockroachDB) that use sharding to distribute data across multiple servers. Vertical scaling is better suited to traditional Relational Database Management Systems (RDBMS), such as MySQL, PostgreSQL, and Oracle, which scale by adding resources like CPU and RAM to a single server.
Vertical Scaling: When new resources are added to the existing system to meet the expectation, it is known as vertical scaling.
Consider a rack of servers and resources that makes up the existing system (as shown in the figure). When the existing system fails to meet the expected needs, and those needs can be met just by adding resources, this is vertical scaling. Vertical scaling is based on the idea of adding more power (CPU, RAM) to existing systems.
Vertical scaling is usually easier and cheaper than horizontal scaling, and it takes less time to put in place.
Horizontal Scaling: When new server racks are added to the existing system to meet the higher expectation, it is known as horizontal scaling.
Consider a rack of servers and resources that makes up the existing system (as shown in the figure). When the existing system fails to meet the expected needs, and those needs cannot be met just by adding resources, we need to add completely new servers. This is horizontal scaling. Horizontal scaling is based on the idea of adding more machines to our pool of resources. It is more difficult and costlier than vertical scaling, and it takes more time to put in place.
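As a rough illustration of the two approaches, the sketch below sizes a system for a target load. The instance names and capacities are made-up numbers for the example, not AWS figures.

```python
import math

# Hypothetical capacities in requests/second, for illustration only.
INSTANCE_SIZES = {"small": 500, "medium": 1000, "large": 2000, "xlarge": 4000}


def scale_vertically(target_rps):
    """Return the smallest single instance size that can carry the load, or None."""
    for name, capacity in sorted(INSTANCE_SIZES.items(), key=lambda kv: kv[1]):
        if capacity >= target_rps:
            return name
    return None  # no single machine is big enough: the vertical ceiling


def scale_horizontally(target_rps, per_node_rps=500):
    """Return how many identical nodes are needed to carry the load."""
    return math.ceil(target_rps / per_node_rps)
```

With these numbers, a load of 10,000 requests/second has no vertical answer at all, while the horizontal answer is simply more nodes.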
Traditional RDBMS databases like Oracle are generally considered poor candidates for easy horizontal scaling (adding more nodes) compared to NoSQL systems because they are designed for strong consistency (ACID) on a single, shared storage. While Oracle can be scaled horizontally through complex, expensive, or specific tools like RAC or Sharding, it does not achieve the seamless, linear scaling characteristic of NoSQL, primarily due to the following reasons:
1. The Bottleneck of ACID Consistency
RDBMS systems guarantee ACID (Atomicity, Consistency, Isolation, Durability) properties, which require all nodes in a cluster to agree on the final state of data before a transaction is finalized.
High Latency: As you add more nodes, the coordination needed to ensure data remains consistent across those nodes increases network traffic, resulting in reduced performance.
"Shared Everything" Architecture: Technologies like Oracle RAC (Real Application Clusters) still rely on nodes sharing the same underlying data storage. This means that while compute power increases, the shared storage eventually becomes the bottleneck, limiting true horizontal scalability.
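To make the coordination cost concrete, here is a toy model that counts messages in a two-phase commit. Two-phase commit is only a familiar stand-in chosen for this sketch; the text above does not name a protocol, and Oracle RAC's own mechanisms differ.

```python
def two_phase_commit(votes):
    """Simulate one transaction across participants.

    votes: list of booleans, one per participant (True = ready to commit).
    Returns (committed, messages_sent).
    """
    messages = 0
    # Phase 1: coordinator sends PREPARE, each participant replies with a vote.
    messages += 2 * len(votes)
    committed = all(votes)
    # Phase 2: coordinator sends COMMIT or ROLLBACK, each participant acknowledges.
    messages += 2 * len(votes)
    return committed, messages
```

In this model every added node adds four messages to every transaction, and a single "no" vote rolls back the whole transaction.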
2. Inefficiency of Distributed JOINS
A key feature of relational databases is joining data across multiple tables.
Network Overhead: In a sharded (partitioned) database environment, tables are spread across different nodes. A query requiring a JOIN between tables located on different servers must fetch data over the network, which is significantly slower than local disk reads.
Complexity: Performing complex joins across hundreds of shards is difficult to optimize, making it impractical for massive-scale, high-transaction systems.
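A small in-memory sketch of the join problem, assuming a hypothetical layout in which customers are sharded by ID and orders live elsewhere. Each dictionary lookup stands in for a network round trip.

```python
# Two hypothetical shards; a customer lives on shard (customer_id % 2).
CUSTOMER_SHARDS = [
    {2: "Bela", 4: "Dmitri"},  # shard 0: even IDs
    {1: "Asha", 3: "Chen"},    # shard 1: odd IDs
]
# Orders stored on another node as (order_id, customer_id).
ORDERS = [(100, 1), (101, 2), (102, 3), (103, 2)]


def join_orders_with_customers(orders, customer_shards):
    """Join orders to customer names, counting simulated remote lookups."""
    remote_lookups = 0
    result = []
    for order_id, customer_id in orders:
        shard = customer_shards[customer_id % len(customer_shards)]
        remote_lookups += 1  # stands in for one network round trip
        result.append((order_id, shard[customer_id]))
    return result, remote_lookups


rows, lookups = join_orders_with_customers(ORDERS, CUSTOMER_SHARDS)
```

On a single server the same join would be one local operation; here it costs one remote lookup per order row unless the query planner batches or co-locates the data.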
3. High Operational Complexity (Sharding)
To truly scale Oracle horizontally, you must use sharding, which involves splitting data across independent database instances.
Manual Management: Historically, sharding was not natively supported, requiring complex, manual application-level logic to route queries to the correct database instance.
Re-sharding Challenges: As data grows unevenly, re-balancing data (re-sharding) across new nodes is highly complex, often requiring downtime.
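The re-sharding cost is easy to see with naive hash-modulo placement, which is assumed here purely for illustration (real systems often use consistent hashing or range-based schemes to reduce it). The key names are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard using a stable hash and modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(keys, 4, 5)  # grow the cluster from 4 shards to 5
```

With this placement scheme, adding just one shard relocates most of the keys, which is why re-balancing is expensive.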
4. Designed for Vertical Scaling
Traditional RDBMS architecture was built in an era where vertical scaling (adding more CPU, RAM, or faster disk to a single machine) was more common than horizontal, distributed systems.
B+-Trees Optimization: Oracle relies on B+-Trees for indexing, which are designed for fast access on a single machine's local disk. Distributing B+-Trees across nodes requires costly network calls.
Summary of Alternatives
While Oracle offers solutions like Oracle Sharding to enable horizontal scaling, they are often more expensive and complex to implement than NoSQL alternatives (like Cassandra, MongoDB, or ScyllaDB) that are built from the ground up for massive, distributed horizontal scale, often at the cost of immediate consistency.
Why is database sharding important?
Sharding is an example of horizontal scaling. As an application grows, the number of users and the amount of data it stores increase over time. The database becomes a bottleneck when the data volume grows too large and too many users try to read or write at the same time. The application slows down, which hurts the customer experience. Database sharding is one way to solve this problem because it enables parallel processing of smaller datasets across shards.
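A minimal sketch of that idea, with in-memory dictionaries standing in for four independent database instances. The shard count, the CRC32 routing rule, and the record shape are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

SHARD_COUNT = 4


def shard_for(user_id):
    """Stable routing: CRC32 of the key, modulo the shard count."""
    return zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT


# Each dict stands in for one independent database instance.
shards = [dict() for _ in range(SHARD_COUNT)]


def save(user_id, record):
    shards[shard_for(user_id)][user_id] = record


def get(user_id):
    return shards[shard_for(user_id)].get(user_id)


def count_active():
    """Scatter-gather: each shard scans only its own slice, in parallel."""
    def scan(shard):
        return sum(1 for r in shard.values() if r["active"])

    with ThreadPoolExecutor(max_workers=SHARD_COUNT) as pool:
        return sum(pool.map(scan, shards))


for i in range(1000):
    save(f"user-{i}", {"active": i % 2 == 0})
```

Single-key reads and writes touch exactly one shard, while a query over all users fans out to every shard and combines the partial results.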
Database Migration Service
1. Database Migration Service (DMS): AWS’s DMS allows you to migrate databases to AWS quickly and securely without needing to manually copy over gigabytes of data or manually set up replication.
2. Replication: The key to zero downtime migrations. AWS DMS uses replication to copy data from the source database to the target database while keeping both in sync.
3. AWS Schema Conversion Tool (SCT): This tool helps you convert your source database schema to the format required by your target database.
The AWS Schema Conversion Tool (AWS SCT) is a free, standalone application used to automate heterogeneous database migrations. It converts your existing database schema—including tables, views, stored procedures, and functions—from a source engine to a format compatible with AWS target databases.
4. Minimal Downtime Cutover: Once you’ve replicated your data and everything is in sync, you can switch to the new database with minimal downtime by redirecting traffic.
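As a sketch of how such a task might be described in code, the functions below build the arguments for boto3's `create_replication_task` call on the DMS client without actually calling AWS. The task name, ARNs, and schema name passed in are placeholders you would supply.

```python
import json


def build_table_mappings(schema_name, table_pattern="%"):
    """Return a DMS table-mapping document that includes matching tables."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-" + schema_name.lower(),
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": table_pattern,
                },
                "rule-action": "include",
            }
        ]
    }


def build_task_request(task_id, source_arn, target_arn, instance_arn, schema_name):
    """Keyword arguments for dms.create_replication_task (not called here)."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # full-load-and-cdc copies existing rows, then keeps replicating changes
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(build_table_mappings(schema_name)),
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("dms").create_replication_task(**request)`. The `full-load-and-cdc` migration type is what keeps source and target in sync until cutover.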
Which applications are good candidates for a NoSQL database?
NoSQL databases are ideal for applications that require high scalability, flexible data models, and rapid development, particularly when dealing with large volumes of unstructured or semi-structured data. They are best suited for projects where data structures evolve frequently or where horizontal scaling across commodity servers is necessary.
What’s the best way to migrate an on-premises Oracle database to AWS?
A: Use AWS Database Migration Service (DMS) for minimal downtime. For large databases, combine DMS with AWS Snowball for physical data transfer. Pre-migration steps:
- Test compatibility with AWS using the AWS Schema Conversion Tool (SCT).
- Choose between EC2 (full control) or RDS (managed service).
- Optimize storage (e.g., GP3/IO2 volumes for performance).
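Whether a database counts as "large" enough for Snowball is mostly arithmetic on size and bandwidth. The helper below is a back-of-the-envelope estimate; the 80% link utilization is an assumption, not a measured figure.

```python
def transfer_days(data_tb, link_mbps, utilization=0.8):
    """Days needed to push data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilization  # usable bits per second
    return bits / effective_bps / 86_400           # 86,400 seconds per day
```

For example, `transfer_days(50, 1000)` comes to about 5.8 days for 50 TB over a 1 Gbps link. If the estimate runs to weeks, shipping the initial copy and using DMS only for ongoing changes becomes attractive.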
Should I use Amazon RDS for Oracle or deploy Oracle on EC2?
A: It depends on your needs:
- RDS: Automated backups, patching, scaling, and Multi-AZ for HA. Less control: standard RDS gives no OS-level access, and you can run only the Oracle versions and options that RDS supports.
- EC2: Full control (e.g., custom scripts, OS-level tweaks, and any Oracle option you are licensed for). Requires manual HA setup (e.g., Data Guard).
Note: You can run Oracle Database Enterprise Edition (EE) on Amazon RDS. However, unlike Standard Edition 2 (SE2), Enterprise Edition is only available under the Bring Your Own License (BYOL) model.
What are the top high-availability (HA) options for Oracle on AWS?
- RDS Multi-AZ: Automatic failover, sync replication.
- EC2 with Data Guard: Deploy standby instances in another AZ/region.
- Oracle Real Application Clusters (RAC): Complex but offers active-active clusters (requires EC2 and EBS/FSx for shared storage).
How does licensing work for Oracle on AWS?
- BYOL (Bring Your Own License): Use existing Oracle licenses. Ideal for long-term deployments.
- License Included: Pay hourly via AWS (simpler but costlier long-term).
⚠️ Audit compliance: Track usage with AWS License Manager.
How can I optimize Oracle performance on AWS?
- Instance Type: Use memory-optimized (R5) instances for production databases; burstable (T3) instances suit only light or dev/test workloads.
- Storage: IO2 Block Express for high IOPS (up to 256k IOPS/volume).
- AWR Reports: Analyze waits and tune SQL.
- CloudWatch: Monitor CPU, memory, and disk metrics.
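For the CloudWatch item, here is a sketch of what a CPU query for an RDS instance could look like. It only builds the arguments for boto3's `get_metric_statistics` and filters a response; the instance identifier, the 80% threshold, and the helper names are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_query(db_instance_id, hours=1, now=None):
    """Keyword arguments for cloudwatch.get_metric_statistics (not called here)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Average", "Maximum"],
    }


def busy_periods(datapoints, threshold=80.0):
    """Timestamps of datapoints whose average CPU exceeds the threshold."""
    return sorted(p["Timestamp"] for p in datapoints if p["Average"] > threshold)
```

With boto3 available, the query would be sent as `boto3.client("cloudwatch").get_metric_statistics(**params)`, and the `Datapoints` list in the response passed to `busy_periods`. The busy windows are the ones worth lining up against AWR reports.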
How do I secure Oracle databases on AWS?
- Encryption: Use Oracle TDE + AWS KMS for data-at-rest. SSL/TLS for data-in-transit.
- IAM: Restrict access via roles/policies.
- Security Groups: Limit traffic to trusted IPs/ports.
- Audit: Enable Oracle Audit Vault + AWS CloudTrail.
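The security group item can be sketched as follows. The function builds the arguments for boto3's `authorize_security_group_ingress` for Oracle's default listener port (1521) and refuses a wide-open range; the group ID and CIDR are placeholders, and the refusal rule is a choice made for this example.

```python
import ipaddress

ORACLE_LISTENER_PORT = 1521  # Oracle's default listener port


def oracle_ingress_rule(group_id, cidr, description="app tier"):
    """Keyword arguments for ec2.authorize_security_group_ingress (not called here)."""
    # Raises ValueError on an invalid CIDR, including one with host bits set.
    network = ipaddress.ip_network(cidr)
    if network.prefixlen == 0:
        raise ValueError("refusing to open the database port to every address")
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": ORACLE_LISTENER_PORT,
                "ToPort": ORACLE_LISTENER_PORT,
                "IpRanges": [{"CidrIp": str(network), "Description": description}],
            }
        ],
    }
```

Validating the range before calling AWS keeps a typo such as `0.0.0.0/0` from exposing the listener to the internet.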
Can I use Oracle Data Guard with AWS?
A: Yes. Deploy a standby instance in another AZ or region.
- For EC2: Use ASM or EBS replication.
- For RDS: Enable Multi-AZ or create a read replica.
What tools monitor Oracle on AWS?
- CloudWatch: Track OS/database metrics.
- Oracle Enterprise Manager (OEM): For deep SQL/DB diagnostics.
- AWS RDS Performance Insights: Analyze RDS workload.
How do I reduce costs for Oracle on AWS?
- Reserved Instances: Save up to 60% on EC2/RDS.
- Stop/Start Non-Prod Instances: Use EC2 scheduling.
- Storage Tiering: Archive logs to S3 Glacier.
- RDS vs. EC2: Compare TCO using the AWS Pricing Calculator.
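The stop/start saving is simple to estimate. The sketch below compares an always-on instance with one that runs only on a schedule; the hourly rate is whatever you plug in, and the model deliberately ignores storage, which is still billed while an instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_cost(hourly_rate, hours_running=HOURS_PER_WEEK):
    """Compute cost for one week at the given hourly rate."""
    return hourly_rate * hours_running


def scheduling_savings(hourly_rate, hours_per_day, days_per_week):
    """Fraction of weekly compute cost saved by running only on a schedule."""
    always_on = weekly_cost(hourly_rate)
    scheduled = weekly_cost(hourly_rate, hours_per_day * days_per_week)
    return (always_on - scheduled) / always_on
```

A non-production instance that runs 12 hours a day, five days a week, is up for 60 of 168 hours, so `scheduling_savings(rate, 12, 5)` is about 0.64 regardless of the rate.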
What do you know about the Amazon Database?
Answer: AWS offers a family of managed database services rather than a single product: a managed relational database service, a managed NoSQL service, a fully managed petabyte-scale data warehouse, and in-memory caching as a service. The four covered here are DynamoDB (NoSQL), RDS (relational), Redshift (data warehouse), and ElastiCache (in-memory cache). You can use one or several of them, depending on your requirements.
Explain Amazon Relational Database.
Answer: Amazon Relational Database Service (RDS) makes it easier to set up, operate, and scale a relational database in the cloud. It automates administrative tasks such as database setup, hardware provisioning, backups, and patching, and it provides resizable, cost-effective capacity. With these tasks automated, you can concentrate on your applications while RDS provides high availability, fast performance, compatibility, and security. RDS supports several engines, including:
MySQL
Oracle
PostgreSQL
SQL Server
MariaDB
Amazon Aurora
What is Amazon ElastiCache?
Answer: Amazon ElastiCache is a fully managed in-memory key-value store that supports two engines: Redis and Memcached. AWS takes care of administration, so there is very little for you to manage. You can use ElastiCache to build a new high-performance application or to speed up an existing one, and it is used in fields such as gaming and healthcare.
What happens to the DB snapshots and backups if a user deletes a DB instance?
Answer: When a DB instance is deleted, the user is given the option of creating a final DB snapshot, from which the data can later be restored. RDS keeps that final snapshot, along with all other manually created DB snapshots, after the instance is deleted. Automated backups, by contrast, are deleted with the instance unless you choose to retain them.
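The same choices appear as parameters when deleting an instance through the API. This sketch builds the arguments for boto3's `delete_db_instance` on the RDS client without calling AWS; the identifiers and the helper name are placeholders.

```python
def delete_request(db_instance_id, final_snapshot_id=None, keep_automated_backups=False):
    """Keyword arguments for rds.delete_db_instance (not called here)."""
    params = {
        "DBInstanceIdentifier": db_instance_id,
        "DeleteAutomatedBackups": not keep_automated_backups,
    }
    if final_snapshot_id:
        # Take a final snapshot that survives the instance.
        params["SkipFinalSnapshot"] = False
        params["FinalDBSnapshotIdentifier"] = final_snapshot_id
    else:
        # No final snapshot: only existing manual snapshots remain.
        params["SkipFinalSnapshot"] = True
    return params
```

Passing a final snapshot identifier is the safer default for anything that held real data, since a manual snapshot is kept until you delete it yourself.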
How to connect to EC2 from your machine
These steps use PuTTY's settings tree:
Under Connection > SSH > Auth, select the private key you downloaded.
Under Connection > Data, set the auto-login username to ec2-user.
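If you use OpenSSH instead of a graphical client, the same two settings (private key and login user) map onto a single command. The helper below just assembles that command; the host name and key file name are made-up placeholders.

```python
import shlex


def ssh_command(host, key_path, user="ec2-user"):
    """Build the OpenSSH command line for the key and login user given above."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


# Hypothetical public DNS name and key file, for illustration only.
cmd = ssh_command("ec2-203-0-113-10.compute-1.amazonaws.com", "my-key.pem")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly, while the printed string is what you would type in a terminal.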
References
https://www.linkedin.com/pulse/top-10-oracle-dba-questions-aws-expert-answers-best-oradba-ulfgf/
https://blog.devops.dev/database-migration-on-aws-without-downtime-how-i-saved-the-day-and-the-data-e3df4363e28d
https://www.whizlabs.com/blog/aws-database-interview-questions/