Understanding ASM Capacity and Reservation of Free Space in Exadata (Doc ID 1551288.1)

 

In this Document

Purpose
Scope
Details
 Overview
 ASM redundancy types
 Failure Coverage
 Calculating Reserve Space and Disk Group Capacity
 Calculating Reserve Space for Failure Coverage
 Calculating Disk Group Capacity
 Exadata Smart Rebalance for High Redundancy Disk Groups
 Considerations when Adding Space to an Existing Disk Group
 Oracle Cloud Considerations
 Additional Considerations
 List of Recommended Bug Fixes
References

APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database - Enterprise Edition - Version 11.2.0.3 to 19.3.0.0.0 [Release 11.2 to 19]
Information in this document applies to any platform.

PURPOSE

This note discusses the concepts of ASM disk group failure coverage, how to reserve free space to cover disk or cell failure in an Exadata environment, and how to determine the resulting capacity of a disk group (DG).

SCOPE

This applies primarily to Exadata, which uses ASM for storage management.  This information is intended for architects, database administrators, and system administrators who plan and manage storage in the Exadata environment.  The concepts are believed to apply to non-Exadata environments as well, but the assertions in this note have not been tested outside Exadata.

DETAILS

Overview

Exadata uses ASM for storage management and disk space allocation; the implementation of ASM is the same as in non-Exadata environments, but there are specific things to consider in an Exadata environment:

  • Cells form the basis of ASM failure groups
  • The probability of a single disk failing is much higher than the probability of an entire cell failing
  • A single disk failure is repaired rather quickly with on-hand replacements, and re-mirroring the data is typically accomplished in a short time.
  • A cell failure is very rare, but when it happens, it may take several hours to repair and require the actions of a Field Service Engineer (FSE) who will need to bring spare parts such as a motherboard. It is usually preferable in these cases to NOT rebalance the data from the lost cell but rather leave those disks offline and wait for the cell to be repaired and subsequently re-synchronized by an ASM RESYNC.  If using the best practice recommendation of high redundancy for all failure groups, there will still be redundancy in the disk group after losing a single cell.
  • In all current production releases as of the update date of this MOS note, there are 8 disk partners and, when possible, 4 failure group (cell) partners.  As an example, with a 5 cell configuration, a given disk will be partnered with two other disks on each of four cells.  Partnering can differ across disk groups depending on the disk group attribute "content.type" (typically DATA or RECOVERY).  For example, the partners for the DATA content.type are different from the ones for the RECOVERY content.type, meaning a full partner failure for one disk group will never forcibly dismount the other disk group.

This note discusses the following:

  • Understanding why high redundancy disk groups is a best practice
  • Reviewing the concepts of disk failure coverage and how much space you need to reserve
  • Calculating how much usable file capacity will be present in the disk group after reserving for failure coverage

ASM redundancy types

ASM disk groups in Exadata are defined as either normal or high redundancy (external redundancy is NOT supported with Exadata storage). Normal redundancy provides two copies of file extents (database files are stored as one or more "file extents" in ASM); high redundancy provides three copies. Each disk is partnered with a set of disks in other failure groups to ensure that file extent copies are stored in separate failure groups, so the disk group can tolerate the loss of one disk or one cell with normal redundancy, or two disks or two cells with high redundancy.

In all current production releases as of the update date of this MOS note, we have 8 disk partners and, when possible, 4 failure group (cell) partners.  As an example, with a 5 cell configuration a given disk will be partnered with two other disks on each of four cells.  When the CONTENT_TYPE attribute is used, ASM partners disks so that the loss of all of a disk's partners will not cause the loss of another disk group.  For example, we partner differently for DATA than we do for RECO, meaning a full partner failure will not cause the loss of both disk groups.

Oracle recommends high redundancy disk groups to maintain redundancy during storage cell updates (for example during monthly security updates); this means the disk group will tolerate a single disk failure on a partner cell while the cell is offline for the software update.  With database consolidation and with the growing sizes of most databases on Exadata, the impact of any downtime becomes significant, thus high redundancy provides the correct level of protection for business and mission critical applications.  Development and test databases can also benefit from high redundancy due to consolidation density or the sheer number of databases that can reside on the same Exadata cluster.  In these environments, the cost of downtime is very high due to the impact on many developers, so the use of high redundancy disk groups is imperative for these types of clusters as well.

Normal redundancy may be suitable for development/test databases that can tolerate downtime: 1) for storage maintenance, since it is safer to take storage offline for an update when configured with normal redundancy, or 2) to restore and recover the cluster and test/development databases in the case of losing a disk group.
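To see how your disk groups are configured, the redundancy type and content.type attribute can be queried directly from ASM. A minimal sketch (the views and attribute name are standard; output will vary by environment):

SELECT dg.name,
       dg.type  AS redundancy,         -- NORMAL or HIGH (EXTERN is not supported on Exadata)
       a.value  AS content_type        -- e.g., data or recovery
  FROM v$asm_diskgroup dg
  LEFT JOIN v$asm_attribute a
         ON a.group_number = dg.group_number
        AND a.name = 'content.type'
 ORDER BY dg.name;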

Failure Coverage

Failure coverage refers to the amount of space in a disk group that will be used to remirror data in the event of a storage failure - more space kept free means greater failure coverage.  In this note we will use the following terms for the various kinds of failure coverage:

Disk failure coverage (DFC) will refer to having enough free space to allow data to be re-mirrored (and rebalanced) after a single disk failure in a normal redundancy disk group, or single or dual disk failure in a high redundancy disk group. 

Cell failure coverage (CFC) will refer to having enough free space to allow data to be re-mirrored after the loss of one entire cell (i.e., an ASM failure group).  Double cell failures are extremely rare and will not be considered in this note.

Cell failures are very rare and rebalancing is typically not desired; it is better to simply repair the cell and bring it back online.  Therefore, CFC will not be discussed in the remainder of this note. If for some reason you must drop a single cell, each disk group should have FREE_MB greater than one cell's worth of the total disk group space plus an additional 5% of that space. For example, if a disk group has a total size of 2048 GB spread across 4 cells and one cell is to be dropped, then FREE_MB should be at least: (2048 GB / 4) * 1.05 = 538 GB
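As a minimal sketch, this check can be expressed in SQL against V$ASM_DISKGROUP and V$ASM_DISK, assuming each failure group is one cell of equal size:

SELECT dg.name,
       dg.free_mb,
       ROUND(dg.total_mb / fg.fg_count * 1.05) AS cell_drop_reserve_mb,
       CASE WHEN dg.free_mb >= dg.total_mb / fg.fg_count * 1.05
            THEN 'OK' ELSE 'INSUFFICIENT' END  AS cfc_status
  FROM v$asm_diskgroup dg
  JOIN (SELECT group_number, COUNT(DISTINCT failgroup) AS fg_count
          FROM v$asm_disk
         GROUP BY group_number) fg
    ON fg.group_number = dg.group_number;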

Oracle recommends the use of high redundancy disk groups using high capacity or flash disks with space reserved for disk failure coverage.  This gives excellent availability benefits and performance along with good capacity.  Greater availability can be achieved by having a standby database in addition to using high redundancy disk groups.  To learn more about availability, please see the Maximum Availability Architecture resources on OTN.

 

Calculating Reserve Space and Disk Group Capacity

Reserving space in the disk group means that you monitor the disk group to ensure that FREE_MB never goes below the minimum amount needed for disk failure coverage.  ASM disk groups do not set aside space that is considered reserved space (even the value of REQUIRED_MIRROR_FREE_MB is not set aside - it's merely calculated and used to derive USABLE_FILE_MB).  This section discusses how to go about calculating the reserve space and capacity of the disk group.

Calculating Reserve Space for Failure Coverage

For DFC, to enable rebalancing after the loss of a single disk, Oracle recommends having free space in the disk group equal to or greater than the percentage of the total disk group capacity as follows:

 

Table 1: Oracle Exadata X9 and Earlier High Capacity or Extreme Flash, or Exadata X10 High Capacity

                                              Required % Free of Disk Group Capacity to
                                              Successfully Rebalance after a Single Disk Failure
Grid Infrastructure    Number of
Version                Storage Cells          1/8th Rack or Base System    1/4 Rack or Larger
-------------------    -------------          -------------------------    ------------------
12.1.0                 Any                    20%                          15%
12.2, 18.1+            less than 5            20%                          15%
12.2, 18.1+            5 or more              20%                          9%
  • Applies to any disk group and any redundancy type (HIGH or NORMAL)

 

Table 2: Oracle Exadata X10 Extreme Flash

Number of Failure Groups                     Required % Free of Disk Group Capacity to
(8 disks / FG)              Redundancy       Successfully Rebalance after a Single Disk Failure
------------------------    ----------       --------------------------------------------------
less than 5                 NORMAL           15%
less than 5                 HIGH             29%
5 or more                   NORMAL           9%
5 or more                   HIGH             11%

  • X10 EF cells have four physical flash disks with two ASM disks per physical flash disk. Therefore, a flash card failure will cause two ASM disks to be dropped.
  • The percentages in the table above account for this configuration when patch 34281503 is installed (GI 19.x or higher is recommended)
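As a minimal sketch, the table values can be checked against the current free space with a query like the following, where &required_pct_free is an illustrative substitution variable holding the percentage you select from Table 1 or Table 2 for your configuration (it is not an Oracle-supplied setting):

SELECT name,
       total_mb,
       free_mb,
       ROUND(free_mb / total_mb * 100, 1) AS pct_free,
       CASE WHEN free_mb >= total_mb * &required_pct_free / 100
            THEN 'OK' ELSE 'BELOW REQUIRED MINIMUM' END AS dfc_status
  FROM v$asm_diskgroup;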

Prior to the fix for bug 32166950, REQUIRED_MIRROR_FREE_MB in V$ASM_DISKGROUP did not reflect the values shown in Table 1 above.  The fix for bug 32166950 changed REQUIRED_MIRROR_FREE_MB to directly show the values recommended in Table 1.  After the advent of Smart Rebalance (see the section "Exadata Smart Rebalance for High Redundancy Disk Groups" below for more details) and with the fix for bug 35177768, REQUIRED_MIRROR_FREE_MB is set to zero when a disk group is high redundancy in version 19.x and higher.

We recommend applying the collection of improvements available in BLR 35394920 to obtain the latest recommended values and other fixes related to ASM space management.

Please note that Oracle has supplied an Exachk check for many years to ensure systems have sufficient free space to rebalance disk groups after a disk failure. The check is called “Verify there is enough diskgroup freespace for a rebalance operation”.

To validate whether your disk groups have sufficient space, run the specific Exachk check after upgrading to AHF release 23.4 or higher:

#exachk -check B516E6BD64DC3012E0431EC0E50A83E8,655C2F41EBD4D8D2E053D398EB0A46B7 -excludeprofile switch

An example of the output for the check is:

PASS => There is enough disk group free space for a rebalance operation
DATA FROM EXADB01 - VERIFY THERE IS ENOUGH DISK GROUP FREE SPACE FOR A REBALANCE OPERATION 

SUCCESS: Disk group DATAC1 has 68.6% free space which is enough to complete a rebalance operation if a disk fails
	 Disk group DATAC1 HIGH Redundancy, Required Minimum Percent Free = 11%, Required Minimum Free MB = 41246064
	 Disk group DATAC1 has Total MB = 374964224, Free MB =  257196660 , Usable MB = 71983531
	 Number of failgroups = 5

SUCCESS: Disk group RECOC1 has 94.4% free space which is enough to complete a rebalance operation if a disk fails
	 Disk group RECOC1 HIGH Redundancy, Required Minimum Percent Free = 11%, Required Minimum Free MB = 10312248
	 Disk group RECOC1 has Total MB = 93747712, Free MB = 88452120, Usable MB = 26046623
	 Number of failgroups = 5

Model       : Oracle Corporation ORACLE SERVER X10-2L_EXTREME_FLASH
Eighth Rack : FALSE

Oracle recommends running the above Exachk report frequently to stay up-to-date with disk group capacity.

 

If your disk group does not have sufficient free space, Oracle recommends:

  1. Resizing the disk group to ensure sufficient free space is available (sometimes free space may be over-allocated in one disk group and can be resized down to give some free space to another disk group - see note 2176737.1)
  2. If #1 is not possible, add one or more storage cells to the infrastructure and retry #1

 

 

Calculating Disk Group Capacity

Usable DG capacity is calculated by taking into account the reserve space as well as mirroring for each kind of failure coverage:

Disk Usable File MB = (FREE_MB - Disk Required Mirror Free MB) / 2 (or divide by 3 for high redundancy)

Where,

FREE_MB is the raw free space in the disk group in MB.

Disk Required Mirror Free MB is the amount of space that should be reserved for disk failure coverage (as explained above in Calculating Reserve Space for Failure Coverage). Compute this as:

Disk Required Mirror Free MB = Required % Free space (from table 1 or 2 above) X size of the disk group (from V$ASM_DISKGROUP.TOTAL_MB)

 

Please note that you must monitor V$ASM_DISKGROUP.FREE_MB to ensure that it never goes below the required amount of free space (Disk Required Mirror Free MB).
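Putting these formulas together, a minimal monitoring sketch (again, &required_pct_free is an illustrative substitution variable for the percentage from Table 1 or 2):

SELECT name,
       free_mb,
       ROUND(total_mb * &required_pct_free / 100) AS disk_required_mirror_free_mb,
       ROUND((free_mb - total_mb * &required_pct_free / 100)
             / DECODE(type, 'HIGH', 3, 2))        AS disk_usable_file_mb  -- divide by mirrors
  FROM v$asm_diskgroup;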

If you need to change the space allocation of your disk groups, please see the Oracle Exadata documentation for instructions on adding or resizing your disk groups to increase space in one disk group while shrinking another disk group.

Exadata Smart Rebalance for High Redundancy Disk Groups

The Exadata Smart Rebalance feature adjusts how Exadata handles disk failure when there is insufficient space to rebalance the affected disk groups. This feature effectively increases the usable space in disk groups by eliminating the need to reserve free space to accommodate a rebalance after a disk failure. The Exadata Smart Rebalance feature works in conjunction with high redundancy disk groups only, and is available starting with Exadata System Software version 19.1 and Oracle Grid Infrastructure version 19.3.

Without the Smart Rebalance feature, Exadata responds to disk failure by forcibly dropping the disk and performing a rebalance operation. If a disk group contains insufficient free space, the rebalance terminates with an ORA-15041 error, and the disk group is left in a potentially unbalanced state and without full redundancy.  

With the Smart Rebalance feature, Exadata software first determines whether there is sufficient free space to complete the rebalance operation. If there is enough space, Exadata automatically drops the disk and proceeds with the rebalance. If there is insufficient free space, then Exadata instructs ASM to OFFLINE the disk, which avoids a failed rebalance operation. Then, after physical disk replacement, Exadata directs ASM to perform a REPLACE DISK operation, which efficiently reconstructs just the lost contents of the failed disk. Finally, ASM automatically brings the disk online after completion of the disk replacement operation. Smart Rebalance only applies to disk failures, not to cell failures. If a cell fails and failgroup_repair_timer expires, then a rebalance will be attempted regardless of whether Smart Rebalance can be used for disk failures.
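On Exadata the replace operation is initiated automatically by the storage software. Purely for illustration, the manual ASM equivalent is sketched below; the disk group name, disk name, and grid disk path are hypothetical:

-- Illustrative only: Exadata normally issues this automatically after the
-- failed physical disk is replaced. The names and path are hypothetical.
ALTER DISKGROUP datac1
  REPLACE DISK DATAC1_CD_03_CEL01
  WITH 'o/192.168.10.3/DATAC1_CD_03_cel01'
  POWER 4;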

Note that affected disk groups must run with reduced redundancy for the whole time that the disk is offline. During this time, failure of additional partner disks would further compromise redundancy, and in the worst-case scenario, data loss is inevitable if all partner disks fail. Consequently, as a best practice, Oracle recommends maintaining sufficient free space to conduct a rebalance operation to ensure that triple redundancy will be restored after a disk failure, giving you greater resiliency in the event of a disk failure or an online storage cell planned maintenance operation (which temporarily reduces redundancy to two copies while a cell is offline).  Smart Rebalance will still be used regardless of whether the following recommendations are heeded.

 

If you choose to rely on the Smart Rebalance feature to maximize storage capacity, then you should meet the following recommendations to mitigate the risk of additional partner disk failures and a possible disk group loss:

1. The disks must be less than five (5) years old and must be 8TB or greater in size (see MOS Document 2075007.1)

2. Oracle ASR must be fully configured for both ILOM and OS usage

3. Your operations staff is responsive to disk failures and maintains up-to-date operational procedures

4. You replace failed disks within 48 hours.  To determine how quickly you historically replace failed disks, you can analyze the cell alert history by using the following command:

dcli -g cell_group -l root "cellcli -e list alerthistory" | grep -i 'hard disk' | egrep -i 'fail|normal'

 

Note that the above command shows events retained in each cell alert history, which may only cover a brief period, depending on the frequency of alert generation and the size of the alert history. In this case, you may need to consult your Oracle Service Request history or other maintenance logs for the required information. You are also encouraged to maintain knowledge across your personnel regarding the location and usage of on-site disk “spares” kits.

 Requirements 1 and 2 are considered the basic recommendations for using Smart Rebalance. Requirements 3 and 4 must be assessed and maintained by you in addition to the basic recommendations. If you cannot meet these recommendations, Oracle recommends that you do not rely on Smart Rebalance.

 

In summary, here are the pros and cons of Exadata Smart Rebalance:

Smart Rebalance Tradeoffs

                            Pros                                           Cons
Using Smart Rebalance       Usable space is maximized                      Higher risk of disk group loss if additional
                                                                           partner disks fail before replacing them
Reserving Space for DFC*    Availability is maximized because failed       Usable space is less (per Table 1 or 2 above)
                            disks are automatically rebuilt/rebalanced

* MAA-recommendation

Considerations when Adding Space to an Existing Disk Group

Storage cells can be added when additional storage is needed. Only a small amount of free space is needed in each disk group to ensure new storage can be added; this is computed as: 

Minimum free space needed in each disk group when adding storage = (64 MB X rebalance power X total number of disks currently in a disk group)

"rebalance power" is the rebalance power used when adding the storage

"number of disks" is the total number of disks in a disk group (e.g., for a quarter rack there are 3 cells with 12 HC disks for a total of 36 disks)

For example, if we are adding storage to an existing rack with 3 HC storage servers (12 disks per storage server) for a total number of 36 disks for each disk group, and we plan to add the new disks using rebalance power 4, the free space needed would be:

Minimum free space needed in each disk group for adding storage = (64 MB X 4 X 36 ) = 9216 MB
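A minimal sketch of this computation for each disk group, with the planned rebalance power supplied as the illustrative substitution variable &rebal_power:

SELECT dg.name,
       COUNT(d.disk_number)                     AS current_disk_count,
       64 * &rebal_power * COUNT(d.disk_number) AS min_free_mb_for_add
  FROM v$asm_diskgroup dg
  JOIN v$asm_disk d
    ON d.group_number = dg.group_number
 GROUP BY dg.name;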
 

If your system does not have the fix for bug 33317279, then you should compute an additional amount of free space per disk group using this formula:

Additional free space needed in each disk group when adding storage = (Number of disks being added * disk size in TB / 1.73TB + 2) * Number of mirrors (either 2 or 3) * 4MB

This additional free space requirement is added to the "Minimum free space needed in each disk group when adding storage" computation above for a total amount of free space that is needed for each disk group.
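As a sketch, the same formula expressed in SQL with illustrative substitution variables (&disks_added, &disk_size_tb, and &mirrors, the last being 2 for normal or 3 for high redundancy):

SELECT ROUND((&disks_added * &disk_size_tb / 1.73 + 2) * &mirrors * 4)
         AS additional_free_mb
  FROM dual;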

 

After applying the fix for bug 33317279, Oracle recommends performing a rebalance on all disk groups and ensuring the rebalances complete successfully prior to attempting to add storage.

After the fix is applied and a rebalance operation completes without error on all disk groups, then you will not need to add this additional free space for future storage additions. 

 

For example, if we are adding a single storage server to an existing rack that has 3 storage servers, and the ASM disk size for one of the high redundancy disk groups is 16 TB, then the additional space for that disk group would be:

Additional free space needed in each disk group when adding storage = (12 X 16 TB / 1.73 TB + 2) X 3 mirrors X 4 MB = (12 X 9.25 + 2) X 12 MB = 113 X 12 MB = 1356 MB

This calculation would need to be repeated for each disk group and added to the result of the "Minimum free space needed in each disk group when adding storage" computation above.

 

For the examples above, the total minimum free space needed for the disk group in our calculations would be:  9216 + 1356 MB = 10572 MB

If you have just applied the fix for bug 33317279 but have not yet successfully rebalanced the disk groups, then you should add this additional space requirement.

  

Do not attempt to add storage unless your disk groups have the minimum amount of space needed as indicated above. 

Once additional storage is available, ensure that all disk groups are sized with at least enough free space to accommodate the DFC requirements mentioned above.

 

Oracle Cloud Considerations

When storage is added in the Oracle cloud, the storage is added in two phases:

  1. Add grid disks on the new cells to existing disk groups. The grid disks being added will be the same size as those on the existing cells. After the disks are added, the total size of the disk groups will be greater than before the cells were added, since more grid disks are present.  Therefore, to keep the disk group size the same, the grid disks should be shrunk such that the total size of the disk group is the same as it was originally. This provides free space to allocate to existing disk groups in various clusters.

  2. Shrink the disk group to keep the size of the disk groups the same as when the storage was added while ensuring space for DFC. 
    1. If there is insufficient free space to successfully resize the disk groups down to their original size (and ensure DFC), then the cloud automation will not attempt to shrink the disk groups. This means that in some cases after the storage is added, the size of the disk groups is larger than it was at the time the storage was added, and the additional space will not be available as free space to allocate to other clusters and/or disk groups.

Additional Considerations

  • It's important to have a monitoring strategy using EM or custom scripts (possibly utilizing the attached script) to ensure that you keep FREE_MB above Disk Required Mirror Free MB at all times.

  • On versions prior to 12.1.0.1, it is a good idea to extend the ASM disk group attribute disk_repair_time to a value that reflects the expected time to repair a cell (e.g., 24 hours); since Oracle 18c, the default value of disk_repair_time has increased from 3.6 hours to 12 hours. This will prevent a rebalance from starting and instead allow the much faster resync capability to be used when the cell is back online.  If a rebalance does start, it is expected that you will not have space to complete it, and at some point you may receive an ORA-15041 error.  When the cell is returned to service, the disks will be added back in and the ORA-15041 will be cleared when the rebalance completes (as long as the disk group has sufficient free space, of course). As of version 12.1.0.1, there is a new disk group attribute, failgroup_repair_time, that governs the time a cell is allowed to be offline before its disks are dropped.  This value defaults to 24 hours, obviating the need to alter the disk_repair_time attribute as in older versions. A sketch for viewing and setting these attributes appears at the end of this list.

  • Starting with GI version 19.16, you will see messages in the alert.log when there is insufficient space to successfully rebalance a disk group. These messages will look like this:

    WARNING: No free space to complete rebalance after disk failure: diskgroup XXXXX  

    If there is sufficient space available again, you will see a message like this:

     

    INFO: Free space available to complete rebalance: diskgroup XXXXX   

    These messages can be monitored in the alert.log so administrators can take corrective actions.
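As referenced above, a minimal sketch for viewing and, where appropriate, extending the repair timers (the attribute names are standard; the disk group name datac1 is hypothetical):

-- View the current repair timers for all mounted disk groups
SELECT g.name AS diskgroup, a.name AS attribute, a.value
  FROM v$asm_attribute a
  JOIN v$asm_diskgroup g
    ON g.group_number = a.group_number
 WHERE a.name IN ('disk_repair_time', 'failgroup_repair_time');

-- Example: extend disk_repair_time on the hypothetical disk group DATAC1
ALTER DISKGROUP datac1 SET ATTRIBUTE 'disk_repair_time' = '24h';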

     

List of Recommended Bug Fixes


Bug 35640761 - ADR ALERT MINING ENHANCEMENT FOR ASM SMART REBAL RELATED LOG MESSAGE CHANGES

Bug 35285795 - EXADATA: ISSUE INSUFFICIENT SPACE FOR RESTORING REDUNDANCY WITH SMART REBALANCE

Bug 33317279 - LNX-19C-GI: ASM DISKGROUP REBALANCE FAILED WITH "ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [KFDVAXTNTMIGRATE_BN]" FOLLOWING AN ADD DISK OPERATION 

Bug 32166950 - CORRECT VALUE OF REQUIRED_MIRROR_FREE_MB AND USABLE_FILE_MB FOR EXADATA SYSTEMS

Bug 35177768 - EXADATA: V$ASM_DISKGROUP.REQUIRED_MIRROR_FREE_MB DOES NOT FACTOR IN SMART REBALANCE

BUG 35394920 - BACKPORT OF 35285795 ON DATABASE RU 19.19.0.0.0 (BLR #10302485)

 

REFERENCES

NOTE:1467056.1 - Resizing Grid Disks in Exadata: Examples
NOTE:1070954.1 - Oracle Exadata Database Machine Exachk
NOTE:1465230.1 - Resizing Grid Disks in Exadata: Example of Recreating RECO Grid Disks in a Rolling Manner
NOTE:1464809.1 - Script to Calculate New Grid Disk and Disk Group Sizes in Exadata
 
