Best Practices and Recommendations for RAC databases with SGA size over 100GB (Doc ID 1619155.1)

APPLIES TO:

Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.

PURPOSE

The goal of this note is to provide best practices and recommendations to users of Oracle Real Application Clusters (RAC) databases using very large SGA (e.g. 100GB) per instance (note that RAC assumes homogeneously sized SGAs across the cluster). This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this note.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. Oracle Support has identified 100 GB as a baseline for large SGAs that would benefit from the recommendations provided in this note. However, this is just a baseline, and it is possible for similar (but smaller) SGAs to benefit from these recommendations. It is thus imperative that any recommendations from this note are thoroughly tested and validated in a testing environment that is a replica of the target production environment before being implemented in production, to ensure that the recommendations have no negative impact.

SCOPE

This article applies to all new and existing RAC implementations.

This note covers RAC databases only, as most of the parameters listed here are specific to RAC.

DETAILS

Note that the recommendations presented in this note are the result of experience working on databases with SGAs of 1 TB and 2.6 TB.
However, databases with SGAs of 100GB and 300GB also benefited from the recommendations.

Also, some recommendations have been removed for 18.1 and above, so check whether each recommendation is applicable to your database.

Note: ORAchk 18.2 and above can be used to validate the proper settings for Large SGA Databases (those documented in this MOS Document). Although the check is available within ORAchk 18.2, it is always recommended to use the latest version of ORAchk, which is available via Document 1268927.2, to ensure you are receiving the most up-to-date information.

Download the latest AHF. Refer to Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk, Document 2550798.1.

init.ora parameters:

a.      Set _lm_sync_timeout to 1200    (this recommendation is valid only for databases that are 12.2 and lower)
        Setting this will prevent some timeouts during reconfiguration and DRM. It's a static parameter and rolling restart is supported.

b.      Set shared_pool_size to 15% or more of the total SGA size.
        For example, if SGA size is 1 TB, the shared pool size should be at least 150 GB. It's a dynamic parameter.

c.      Set _gc_policy_minimum to 15000
        There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time = 0. _gc_policy_minimum is a dynamic parameter, whereas _gc_policy_time is a static parameter for which rolling restart is not supported. To disable DRM, use _lm_drm_disable instead of _gc_policy_time, as it is dynamic.

Note: 15000 is the new default in 23c, 19c DBRU JUL '23, and 19c ADB. Customers no longer need to tune this parameter in those releases or later.
   This is due to internal bug 34729755.

d.      Set _lm_tickets to 5000    (this recommendation is valid only for databases that are 12.2 and lower)
        Default is 1000.   Allocating more tickets (used for sending messages) avoids issues where the instance runs out of tickets during reconfiguration. It's a static parameter and rolling restart is supported. When increasing the parameter, a rolling restart is fine, but a cold restart may be necessary when decreasing it.

e.      Set gcs_server_processes to twice the default number of lms processes that are allocated.    (this recommendation is valid only for databases that are 12.2 and lower)
        The fix is also included in the 12.2.0.1 JUL 2018 database RU, so this does not apply to a database that is running on 12.2.0.1 JUL 2018 or higher.
        The default number of lms processes depends on the number of CPUs/cores that the server has,
        so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide
        for the default number of lms processes for your server. Please make sure that the total number of lms processes
        of all databases on the server is less than the total number of CPUs/cores on the server. Please refer to Document 558185.1.
        It's a static parameter and rolling restart is supported.

f.      Set TARGET_PDBS to the number of PDBs that are planned to be running in the CDB. Do not include the seed and root in this count. (This recommendation is valid for 12.2 databases and higher)
        The default value of TARGET_PDBS, especially for databases with a large sga_target setting, is known to cause performance and instance eviction issues.
        For a detailed description of issues related to target_pdbs, refer to Document 2644243.1.
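
The parameter recommendations in items a through f can be collected in a small helper. The following is only a sketch and is not part of the original note: the function names, the (major, minor) version tuple, and the sample inputs (4 default LMS processes, 10 planned PDBs) are assumptions made for illustration. It does not model RU-level exceptions such as the 12.2.0.1 JUL 2018 RU or the newer default of _gc_policy_minimum, and it renders every statement with SCOPE=SPFILE so that the same form is valid for static and dynamic parameters (the change then takes effect at the next restart).

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str]


def recommended_settings(sga_gb: int, default_lms: int, planned_pdbs: int,
                         version: Tuple[int, int]) -> Dict[str, Value]:
    """Collect the values suggested in items a-f for one instance.

    sga_gb       -- SGA size in GB
    default_lms  -- default LMS count, looked up in the Reference Guide
    planned_pdbs -- PDBs planned for the CDB, excluding seed and root
    version      -- (major, minor) database release, e.g. (12, 2)
    """
    settings: Dict[str, Value] = {
        # b. at least 15% of the SGA, rounded up to a whole GB
        "shared_pool_size": f"{(sga_gb * 15 + 99) // 100}G",
        # c. unnecessary if DRM is disabled or 15000 is already the default
        "_gc_policy_minimum": 15000,
    }
    if version <= (12, 2):  # items a, d, e: 12.2 and lower only
        settings["_lm_sync_timeout"] = 1200
        settings["_lm_tickets"] = 5000
        settings["gcs_server_processes"] = 2 * default_lms
    if version >= (12, 2) and planned_pdbs > 0:  # item f: 12.2 and higher
        settings["target_pdbs"] = planned_pdbs
    return settings


def lms_fits(lms_per_database: List[int], cpu_count: int) -> bool:
    """Item e: total LMS processes on the server must be below the CPU count."""
    return sum(lms_per_database) < cpu_count


def render_alter_system(settings: Dict[str, Value]) -> List[str]:
    """Render one ALTER SYSTEM statement per parameter (SCOPE=SPFILE)."""
    statements = []
    for name, value in settings.items():
        # Hidden (underscore) parameters must be double-quoted.
        quoted = f'"{name}"' if name.startswith("_") else name
        statements.append(
            f"ALTER SYSTEM SET {quoted}={value} SCOPE=SPFILE SID='*';")
    return statements


if __name__ == "__main__":
    # Hypothetical 12.2 instance: 1000 GB SGA, 4 default LMS, 10 planned PDBs.
    for line in render_alter_system(recommended_settings(1000, 4, 10, (12, 2))):
        print(line)
```

The generated statements are meant for review only; whether each one applies still depends on the release and on the testing described in the PURPOSE section.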

Setting up hugepages is a general recommendation for all Linux users, but it is particularly important for databases with a large SGA.
In other words, setting up hugepages is a critical recommendation when the SGA is large.

For other platforms, consider using large pages if possible.
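
For Linux, a rough lower bound on the number of hugepages needed to back an SGA can be estimated as below. This is a minimal sketch under stated assumptions, not part of the original note: it assumes a 2 MiB huge page size (the actual value is shown as Hugepagesize in /proc/meminfo), the function name is hypothetical, and it considers only a single SGA. Real sizing should account for every instance on the server and for the actual shared memory segments.

```python
HUGEPAGE_BYTES = 2 * 1024 * 1024  # assumed huge page size of 2 MiB


def hugepages_needed(sga_bytes: int, page_bytes: int = HUGEPAGE_BYTES) -> int:
    """Return the smallest number of huge pages that covers sga_bytes."""
    # Integer ceiling division avoids float rounding on very large SGAs.
    return (sga_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    sga_bytes = 100 * 1024 ** 3  # hypothetical 100 GiB SGA
    print(f"vm.nr_hugepages = {hugepages_needed(sga_bytes)}")
```

The result is the value one would start from for the vm.nr_hugepages kernel setting, before adding headroom for other instances.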

The following patches are recommended:

11.2.0.3.5 DB PSU or above is highly recommended to address known issues with large SGA sizes.

For 11.2.0.4 customers, applying the following bug fixes can help:
BUG 12747740 - RAC PERF: NODE JOIN RECONFIGURATION (PCMREPLAY) DOES NOT SCALE WITH MORE LMS'S
BUG 14193240 - LMS SIGNALED ORA-600[KGHLKREM1] DURING BEEHIVE LOAD
BUG 16392068 - MSGQ: LMS0 HITS ORA-600 [KJBMPOCR:DSB]
BUG 17232014 - INITIAL ALLOCATION FOR KJBR&KJBL ARE TOO LOW W/ LARGE CACHES DUE TO UB4 OVERFLOW
BUG 17257445 - RAC PERF: DRM OPTIMIZATION (BUG 14558880) SHOULD ALSO WORK FOR RECONFIGURATION
BUG 17314971 - RAC PERF: RM/PT LATCH REDUCTION FOR RCFG (17257445) SHOULD BE ENABLED FOR SYNC7

For SGAs larger than 4 TB on the Linux platform:
BUG 18780342 - LINUX SUPPORT FOR > 4TB SGA

REFERENCES

NOTE:558185.1 - LMS and Real Time Priority in Oracle RAC 10g and 11g
NOTE:1392248.1 - Auto-Adjustment of LMS Process Priority in Oracle RAC with 11.2.0.3 and later
NOTE:2550798.1 - Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk

NOTE:2644243.1 - Performance Issues when using PDBs with Oracle RAC 19c and 18c

Abdul Hafeez Kalsekar -- Oracle database tuning case studies