Friday, July 19, 2024

Oracle Database 23ai – Real-Time SQL Plan Management

One of the most frequent questions I get asked is “why did the Oracle Optimizer change the execution plan on a SQL statement?”  Generally, this arises when a statement goes from running well to suddenly performing poorly.  In short, something has changed in how the Optimizer thinks the data should be accessed.  Which leads to the next question: why doesn’t the Optimizer know that the new plan performs worse?

Well, not to solve all problems, but Oracle Database 23ai introduces a new feature called Real-Time SQL Plan Management.  This feature builds on top of two existing features: Automatic SQL Tuning Sets (Auto STS) and SQL Plan Management (SPM).  To summarize, the Oracle database uses these features together in the following manner:

  • Auto STS captures execution plans of “most” SQL statements and stores them in a system-managed SQL tuning set called “SYS_AUTO_STS” (see the query sketch below)
  • At SQL execution time, the Optimizer decides if a new plan is warranted for a SQL statement based on previous execution information and the current estimated execution plan.  The Optimizer will then execute the new adjusted plan and compare it to previous runs.
  • If the new plan is better, a SQL Plan Baseline will be automatically created for that new plan.  If performance regresses again during future executions of the SQL, the Optimizer will re-analyze and possibly disable the baseline or create another baseline as needed.
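To see what Auto STS has been capturing, you can query the standard tuning set views; here is a minimal sketch (the column choice and row limit are mine):

SELECT sql_id, plan_hash_value, executions, elapsed_time
  FROM dba_sqlset_statements
 WHERE sqlset_name = 'SYS_AUTO_STS'
 ORDER BY elapsed_time DESC
 FETCH FIRST 10 ROWS ONLY;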

All that work you may have been doing to identify a poorly running SQL statement, figure out how to run it better, and then put in a “fix” such as a baseline is now 100% automated.

Often, we shy away from new features for fear that they will cause more harm than benefit.  Personally, I have adopted many new features from Oracle over the past 30+ years.  My goal here is to shed some light on how this new feature works to help reduce the fear, uncertainty, and doubt that might arise. 

Let’s look at a quick example.  Note this is running in Oracle Cloud Infrastructure (OCI) with Oracle Database Enterprise Edition 23.4.1.24.06, as we patiently await the on-premises release.  I’m skipping over some of the mechanics to focus on the results of Real-Time SPM that you might see and want to understand.

First, for this to work, we need to enable the Real-Time SPM (RT SPM) task:

exec dbms_spm.configure('AUTO_SPM_EVOLVE_TASK', 'AUTO')

Note that in 19c the EVOLVE task was either on or off and did not do real-time evaluation; AUTO is a new setting in 23ai.

You can check your current settings with the following query:

SELECT PARAMETER_NAME, PARAMETER_VALUE
  FROM DBA_SQL_MANAGEMENT_CONFIG;

PARAMETER_NAME                           PARAMETER_VALUE
---------------------------------------- ------------------------------
AUTO_SPM_EVOLVE_TASK                     AUTO

Once the Optimizer detects that a better execution plan might exist, it may decide to try it.  If it does, you will see a note in the explain plan:

Note
-----
   - This is SQL Plan Management Test Plan

You will see this when the Optimizer runs the SQL with the possibly better execution plan.  Note that you may find it hard to catch this moment on a real system.
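If you do catch it while the statement is still in the cursor cache, DBMS_XPLAN will show the note; a minimal sketch, where <sql_id> is a placeholder for the statement you are watching:

SELECT *
  FROM table(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>', NULL, 'typical'));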

If the new execution plan performs better, a SQL Plan Baseline (or baseline) will be created for the new execution plan as well as the previous execution plan:
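The listing below can be produced with a query along these lines (a sketch; the aliases are mine, and COST is the OPTIMIZER_COST column):

SELECT sql_handle, plan_name, origin, enabled ena, accepted acc,
       optimizer_cost cost
  FROM dba_sql_plan_baselines
 WHERE sql_handle = 'SQL_2044b318726d1c30';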

SQL_HANDLE           PLAN_NAME                      ORIGIN             ENA ACC   COST
-------------------- ------------------------------ ------------------ --- --- ------
SQL_2044b318726d1c30 SQL_PLAN_20j5m31t6u71h169efe64 FOREGROUND-CAPTURE YES YES 151184
SQL_2044b318726d1c30 SQL_PLAN_20j5m31t6u71h8059011b FOREGROUND-CAPTURE YES NO  271739

Note the origin of the baseline, “FOREGROUND-CAPTURE”; this means that the Optimizer created this baseline during a SQL execution after analysis.  We can see a little more information from the new NOTES column of the DBA_SQL_PLAN_BASELINES view:

SELECT p.plan_name, p.foreground_last_verified,
       pfspm.status result, pfspm.ver verify_type,
       pfspm.test_phv, pftp.elps, pfspm.ref_phv, pfrp.elps
  FROM dba_sql_plan_baselines p,
    XMLTABLE(
         '/notes'
         passing xmltype(p.notes)
         columns
             plan_id         NUMBER    path 'plan_id',
             flags           NUMBER    path 'flags',
             fg_spm          XMLType   path 'fg_spm') pf,
     XMLTABLE(
         '/fg_spm'
         passing pf.fg_spm
         columns
             test_phv        NUMBER         path 'test_phv',
             ref_phv         NUMBER         path 'ref_phv',
             ver             VARCHAR2(8)    path 'ver',
             status          VARCHAR2(8)    path 'status',
             ref_perf        XMLType        path 'ref_perf',
             test_perf       XMLType        path 'test_perf' ) pfspm,
    XMLTABLE(
        '/ref_perf'
        passing pfspm.ref_perf
        columns
             bg              NUMBER         path 'bg',
             cpu             NUMBER         path 'cpu',
             elps            NUMBER         path 'elps' ) pfrp,
    XMLTABLE(
        '/test_perf'
        passing pfspm.test_perf
        columns
             bg              NUMBER         path 'bg',
             cpu             NUMBER         path 'cpu',
             elps            NUMBER         path 'elps' ) pftp
  WHERE notes IS NOT NULL
    AND p.sql_handle = 'SQL_2044b318726d1c30'
  ORDER BY p.foreground_last_verified;

PLAN_NAME                      FOREGROUND_LAST_VERIFIED     RESULT  VERIFY_T  TEST_PHV ELPS    REF_PHV ELPS
------------------------------ ---------------------------- ------- -------- --------- ---- ---------- ----
SQL_PLAN_20j5m31t6u71h169efe64 19-JUL-24 01.27.50.000000 PM better  normal   379518564 2775 2153316635 3674

This shows us that the elapsed time was lower for the tested execution plan than for the reference (existing) plan.  There is more data in the NOTES column, but this is a helpful way to see what the Optimizer did to verify the performance.

You may wonder, what happens if things change again?  What if this new plan is not the best plan in the future?  Real-Time SPM watches for that as well: if the SQL would degrade under a baseline plan, based on the Optimizer’s plan-generation estimates, that baseline will not get used.  Again, in the notes section of an explain plan you may see this:

Note
-----
   - Failed to use SQL plan baseline for this statement
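When that happens, the bookkeeping on the baseline reflects it; a simple way to keep an eye on the flags (a sketch):

SELECT plan_name, enabled, accepted, last_verified
  FROM dba_sql_plan_baselines
 WHERE sql_handle = 'SQL_2044b318726d1c30';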

To view the execution plan for a given baseline, use the DBMS_XPLAN package:

SELECT *
    FROM table(DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE(
             sql_handle=>'SQL_2044b318726d1c30',
             plan_name=>'SQL_PLAN_20j5m31t6u71h169efe64',
             format=>'typical'));

--------------------------------------------------------------------------------
SQL handle: SQL_2044b318726d1c30
SQL text: select /* SPM_TEST_QUERY_Q1 */ sum(t2.amount) from sales_area1 t1,
          sales_area2 t2 where t1.sale_code = t2.sale_code and
          t1.sale_type = 1
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_20j5m31t6u71h169efe64         Plan id: 2727377809
Enabled: YES     Fixed: NO      Accepted: YES     Origin: FOREGROUND-CAPTURE
Plan rows: From Auto SQL Tuning Set
--------------------------------------------------------------------------------

Plan hash value: 2727377809

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |       |       |   151K(100)|          |
|   1 |  SORT AGGREGATE                       |             |     1 |    33 |            |          |
|*  2 |   HASH JOIN                           |             |    62M|  1966M|   151K  (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SALES_AREA1 | 25000 |   170K| 15238   (1)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | SALES_TYP1I | 25000 |       |   235   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL                  | SALES_AREA2 |  2500K|    61M|   135K  (1)| 00:00:06 |
-----------------------------------------------------------------------------------------------------

Note that the explain plan output provides additional information about the baseline, and that the plan itself comes from the automatic SQL tuning set, which is where the baseline was created from.

I’m really excited about this new feature and how it can proactively help with SQL performance issues.  Oracle will now detect if SQL performance has degraded from a previously captured execution plan, test a possible fix, and implement a baseline if the fix is acceptable.  It seems the answer to “why did the Optimizer change the execution plan?” is: just wait a moment and it will fix itself.

Be sure to check out the Oracle blog by Nigel Bayliss, which includes a link to the example code he created that was used in this post.

Thursday, June 6, 2024

Oracle RAC install and Linux 8 - SSH connectivity issue INS-06006

Many systems are getting upgraded to Linux version 8 due to the impending end of life of Linux 7.  This includes both Red Hat and Oracle Enterprise Linux.  In general, most are choosing to do a new build-out and migrate their databases from the previous system to the new one.  This is the most supportable model, though not the only one.

One issue you may run into is installing either Oracle RAC Grid Infrastructure or Oracle Database in a clustered environment.  During the runInstaller or gridSetup.sh steps you may receive an error about "Passwordless SSH connectivity not setup" even though you have set it up and manually verified it.  This is due to a change in OpenSSH 8 that deals with a security vulnerability (CVE-2019-6111).  You will receive a message like this:

[FATAL] [INS-06006] Passwordless SSH connectivity not set up between the following node(s): [srvr02].

   CAUSE: Either passwordless SSH connectivity is not setup between specified node(s) or they are not reachable. Refer to the logs for more details.

   ACTION: Refer to the logs for more details or contact Oracle Support Services.

When you manually verify SSH connectivity everything works:

$ for i in srvr01 srvr02; do ssh $i hostname -s ; done
srvr01
srvr02
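You can also confirm the OpenSSH version in play; on Linux 8 this will report an 8.x release:

$ ssh -V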

The fix is to make sure you apply the RU / patch to the software during the install process, basically forcing the usage of --applyRU for runInstaller or gridSetup.sh.  This is also the recommended method for all patching / installs going forward (expect more of this in 23ai).  For 19c you need to use the 19.6 RU or above.

For GI install your command line would look something like:

/u01/app/19/grid_1/gridSetup.sh --applyRU /u01/app/grid/patch/36233126

There is a little caveat to this when it comes to the database software install.  Normally you would run the install from the first node of the cluster and the installer would replicate the software to each node using SSH.  The issue is that the RU for the database does not directly include the fix for SSH (Bug 30159782 - Remote Copy Fails if using openssh-7.2, 7.4, etc. (Doc ID 30159782.8)).  You have to include the OCW (Oracle ClusterWare) patch in the runInstaller step.  The OCW patch is included in the GI RU, not the database RU.

[Image 1: patch contents of the 19.23 GI RU as shown on the download screen]
The easiest way to deal with this is to use the Grid Infrastructure (GI) RU patch for both the GI home and the database home, though the syntax will look slightly different for the database home.  We will use the recent April 2024 RU as an example.  For the 19.23 GI RU we see the following patch contents on the download screen (see Image 1).  In the red boxes we can see that both the database patch and the OCW patch are included.

Once you have the patch downloaded and unzipped, we can verify that from the directory structure:

36233126/
├── 36233263  <- Database RU 19.23
├── 36233343
├── 36240578  <- OCW patch 19.23
├── 36383196
├── 36460248
├── automation
├── bundle.xml
├── README.html
└── README.txt
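If you want to double-check which subdirectory is which, rather than trusting the download page, the bundle README lists each component; a quick sketch (the README layout varies by RU, so treat this as a starting point):

$ grep -iE "36233263|36240578" /u01/app/oracle/patch/36233126/README.html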

So the steps for setting up a database home on RAC with Linux 8 would look like this:
  1. Stage the patch files, e.g. /u01/app/oracle/patch (along with any additional patches)

    unzip -qo p36233126_190000_Linux-x86-64.zip -d /u01/app/oracle/patch


  2. Unzip the install media (19.3) into the home, e.g. /u01/app/oracle/product/19/dbhome_2

    unzip -qo LINUX.X64_193000_db_home.zip -d /u01/app/oracle/product/19/dbhome_2


  3. Update the OPatch version in the new home (be sure to download the latest OPatch from patch 6880880 in MOS)

    rm -rf /u01/app/oracle/product/19/dbhome_2/OPatch

    unzip -qo p6880880_190000_Linux-x86-64.zip -d /u01/app/oracle/product/19/dbhome_2


  4. Install the software and apply the patches in one command (be sure to update the items shown based on your configuration).  Here we are installing the RU, OCW, OJVM, and Data Pump bundle patches.

    export CV_ASSUME_DISTID=OEL7.8

    /u01/app/oracle/product/19/dbhome_2/runInstaller \
    -silent -ignorePrereqFailure \
    -applyRU /u01/app/oracle/patch/36233126/36233263 \
    -applyOneOffs \
    /u01/app/oracle/patch/36233126/36240578,/u01/app/oracle/patch/36420641,/u01/app/oracle/patch/36199232 \
    oracle.install.db.InstallEdition=EE \
    oracle.install.option=INSTALL_DB_SWONLY \
    INVENTORY_LOCATION=/u01/app/OraInventory \
    ORACLE_BASE=/u01/app/oracle \
    UNIX_GROUP_NAME=oinstall \
    oracle.install.db.CLUSTER_NODES=srvr01,srvr02 \
    oracle.install.db.OSDBA_GROUP=dba \
    oracle.install.db.OSOPER_GROUP=oper \
    oracle.install.db.OSBACKUPDBA_GROUP=backupdba \
    oracle.install.db.OSDGDBA_GROUP=dgdba \
    oracle.install.db.OSKMDBA_GROUP=kmdba \
    oracle.install.db.OSRACDBA_GROUP=racdba 

    Note: backslashes (\) are used to continue the command across individual lines.  Also, the patch list is comma-separated with no spaces, and be sure to export CV_ASSUME_DISTID, which is required when installing 19c on Linux 8.

  5. After runInstaller finishes, run the root.sh script on every node you installed on (see the example below).  If you make a mistake or have issues during runInstaller, just remove the dbhome directory and start over.
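    As root on each node, using the home from this install, the call looks like:

    /u01/app/oracle/product/19/dbhome_2/root.sh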

Now you should have a new 19.23 home installed on all nodes of the cluster with the OCW, OJVM, and Data Pump patches all ready to go!
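You can verify what ended up in the new home with OPatch:

$ /u01/app/oracle/product/19/dbhome_2/OPatch/opatch lspatches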

Thursday, May 16, 2024

Serious issue - server crash - Oracle database 19c and Oracle Enterprise Linux 8 Update 8 and above


I'm a tad behind on this, but I think it's a critical issue.  This may be a repeat of something you have already heard or read about; I just want to make sure the information is getting out.

There is a serious issue with Oracle Database 19c running on Oracle Enterprise Linux 8 Update 8 and above.  The issue starts with database 19.21 and above (October 2023 RU) when running on OEL 8 with the UEK kernel UEK7U1 and above (the 5.15.0-10* kernel released in April 2023).  If you are running a system with HugePages memory configured and have the ASM Filter Driver (AFD) or ASM Cluster File System (ACFS) kernel modules installed, this can lead to a server crash (Linux kernel panic).
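A quick way to check whether a server matches this combination (a sketch; the module names are those typically loaded by GI):

$ uname -r                                # 5.15.0-10x or later means UEK7U1+
$ lsmod | grep -iE 'oracleacfs|oracleafd' # are the ACFS / AFD kernel modules loaded?
$ grep HugePages_Total /proc/meminfo      # are HugePages configured?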

We were able to re-create the issue consistently when running an RMAN restore.  To me the criticality is increased because this affects exactly the kind of large database systems that use HugePages and ASM together.

See the following MOS document:
ACFS Kernel Panic (RIP: ofs_teardown_diop_osd) After Updating OS Kernel to UEK7U1 or Later (Doc ID 2998947.1)

Or Mike Dietrich's blog:

Can you use the UEK7 Linux kernel, or may you get some trouble?

The fix for this problem? You have two basic solutions:

  • Stay on UEK7 (5.15.0-3.xxx) until you can upgrade your GI / ASM home and kernel modules to 19.23 (April 2024 RU)
  • Apply the April 2024 RU to your GI / ASM home and make sure the ACFS / AFD kernel modules are updated (this is done automatically)
There is a one-off patch, but it seems to be for a specific configuration / combination.  If you are on the specific kernel version 5.15.0-201.135.6 and running 19.22 GI / ASM (Jan 2024 RU), then there is a patch that can be applied:
     Patch 35983839 - UEK7U2 SUPPORT FOR ACFS, AFD (V5.15.0-201.135.6)
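After patching, you can confirm the driver state from the GI home; a sketch using the acfsdriverstate utility that ships with GI (adjust the GI home path for your system):

$ /u01/app/19/grid_1/bin/acfsdriverstate version
$ /u01/app/19/grid_1/bin/acfsdriverstate supported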

Since Linux kernels update frequently, and corporate security teams want frequent updates, I would lean towards taking up 19.23 (April 2024 RU), since that is the minimum fully supported version for UEK7U2 based on the support matrix:
[Image: UEK kernel / GI support matrix]

The issue seems to have first been acknowledged in November of 2023; I'm a little behind, as I always recommend enterprise customers use an N-1 patching methodology.  So here we are in Q2 of 2024 hitting an issue from Q1 2024, as we take up an October 2023 Linux kernel with a Jan 2024 database patch.  The good news is that the fix is readily available at this point, and our client can move forward instead of backwards.

Hopefully this gives you enough information to be preemptive on the fix and not have systems crashing.

One final reference, a good blog on the OEL kernel release names and dates:


Gary

Wednesday, April 10, 2024

Putting the You In User Groups: UTOUG Training Days 2024


Gary Gordhamer, Managing Principal Consultant, Viscosity NA

We both recently attended and spoke at the Utah Oracle User Group (UTOUG) Training Days event that took place in Salt Lake City, Utah on March 21st and 22nd.  This was an outstanding event, and the volunteers and conference committee of UTOUG did a superb job bringing together dozens of excellent presenters, many of them (like us) Oracle ACEs.

Finding time to learn, meet our fellow IT specialists, or even have a few moments to plan what the future holds is almost impossible.  Attending a well-planned in-person event is one of the best ways to meet all these needs.

In case you doubt that, here’s the list of most in-demand skills for 2024, according to LinkedIn. The top three “soft” skills are Communication, Customer Service, and Leadership.

Still not convinced? In 2023, LinkedIn listed the most in-demand “hard skills” – things that are not common across most jobs – and the top three on the list were Software Development, SQL, and Finance.

We both agree that UTOUG Training Days hit every one of these items, on both lists.  Whether you were utterly new to IT or a seasoned professional looking to learn new skills while strengthening others, this event compressed all of that into two short days.  If you’re feeling a bit of “Fear of Missing Out” (FOMO), well, that’s our point.  And if you’re a manager and you feel like your team missed out, that’s good too: that’s your call to action.

A couple of unique items that UTOUG included in their event: bringing students from nearby regional colleges into the conference, funding scholarships for students, and finishing with a ski day on Saturday for those interested, to keep the community atmosphere going.  These are critical components of building the next generation of IT professionals and helping employers build their teams.  Again, managers: if you are struggling to find the right talent, maybe you need to invest one or two days into looking where the talent actually is, as opposed to casting wide nets into the ocean of the internet.

Day One: Leaders Are Made … and They Are Magnetic

Our first day kicked off with an amazing keynote by Elena Marquetti-Ali about magnetic leadership.  A little tidbit from the session: 85% of managers have no leadership training!  So that means everyone who attended Training Days now has more training than 85% of the leaders out there.  This speaks directly to two of the most in-demand skills.  Her “magnetic” 30-minute discussion was covered in a single slide.  Impactful and direct; for those of us armchair speakers, this was a sight to behold beyond the great content.

Following the keynote were 25 deep technical sessions over five time slots that covered everything from SQL, Cloud, and Machine Learning to team building.  Breakfast and lunch were included, and we had 15 minutes between sessions to meet our fellow attendees and find out about their takeaways and key challenges.  Hmmm … problem-solving, communication, teamwork, and research … it seems we just hit a few more of those key skills the folks at LinkedIn identified as crucial to success.

You may ask yourself, how good could these sessions be?  Well, they were presented by 31 great people from three different continents.  How great?  At least 20 of them are recognized by Oracle Corporation as Oracle ACEs for their knowledge and outreach in the technical field.  Meanwhile, all the topics and speakers were selected through a volunteer review process.  That means users help drive the content, and the content is created by some of the best.  Have a topic that needs a deep technical dive and more outreach or training?  Let the user group know!

Day Two: Humans 25, AI 1

Day Two started with breakfast and another gathering time for all the attendees to learn from each other, followed by another 25 deep technical sessions covering performance tuning, mentoring, operations, and even a great discussion about how even in the face of ever-more-powerful Artificial Intelligence, humans are still irreplaceable. 

In our combined opinion, this was a wonderful conference that hit all the requirements of learning, community, and skill building while putting you, the IT professional, first.  Thanks to all the leaders in UTOUG, especially their conference committee members, for building an amazingly well-rounded event!

And for those of you who really hoped to attend but missed it for any number of reasons: be sure to watch for this UTOUG event in 2025, don’t forget to look for other regional events nearby, and be sure to tell your local user group what you want to see.

Jim - Gary