Friday, September 4, 2020

Oracle 19c Database Upgrade - Performance testing options

You may be able to related to this, upgrading Oracle databases can be a real drag.  

Databases tend to be at the heart of critical businesses process, and you may have spent a lot of time getting things to be in a perfect working order.  Upgrading your Oracle database from 11g to 19c might be a very frightening due to the risk of things changing and loss of all the work already done.  This is normally where testing comes into play.  What are your options for testing performance changes when moving from Oracle 11g to 19c? 

There are basically three options:

  • Oracle Real Application Testing or RAT
  • Simulated application workload either manually or with tools like LoadRunner, NeoLoad, or a number of web specific tools
  • Look at individual SQL either manually or through tools like SQL Tuning Sets in Oracle Tuning Pack

First let’s put a few rules in place when it comes to testing.  For a test to be valid you need to use an environment that is as close to the target production system you will be moving to.  That means if you are moving to new servers or changing hardware / storage during the upgrade; then make sure you're running tests on that new hardware or storage.  Oracle 19c has many new features that take advantage of new hardware.  Testing on an 8-year-old piece of hardware that was running 11g with 19c is not the same as testing 19c on a brand-new server.  As the optimizer in Oracle is very aware of underlying infrastructure it will not act the same, so you are not testing the upgrade really.

 

Let’s give a quick description of each of these options that might take some of the stress out of upgrading your Oracle database.

Oracle Real Application Testing or RAT

Oracle makes a very specific database tool for application workload testing.  Real Application Testing or RAT.  Acronym aside, the tool is pretty interesting.  RAT captures SQL statements from your production database, and then you can clone the database to another environment and re-run all the SQL.  This is one of the most complete ways to test just the database component of your application.  A couple of downsides to this tool, first it only tests the database component.  If you are just upgrading the database software, change the database hardware, or only adjusting database settings; then this tool is great.  If you are making application changes or updating the database schema at the same time, then this tool will not work, or at minimum will not be able to test those changes.  Also, RAT is an additional cost license.

 

RAT will run through all the application SQL’s in order but not necessarily in the same amount of time.  E.G. if you capture 8 hours of production workload it will probably take longer than 8 hours to run the replay of the workload on a similarly sized test system even if 19c is faster at individual SQL statements.  Also, RAT cannot simulate everything, such as DataGuard and GoldenGate.  These replication tools do cause some overhead, and RAT will not simulate that.  Though most of that overhead can be very small.

 

High-level steps for a RAT replay:

  1. Capture the workload on your production system, be sure to include AWR data
  2. Clone the production database to test area based on the start time of your capture
  3. Upgrade the cloned test system to 19c and apply and database configuration changes
  4. Transfer the capture files from production to test, and process the capture files
  5. Run the database replay on test
  6. Load your AWR data from production and run comparison reports between production and test

The great thing about RAT is you get a very detailed report of database effort and work.  You can see exactly what the 19c and possibly new hardware did differently at the database level. 

 

The downside to RAT is you cannot change the application schema other than adding indexes or updating statistics.  Overall this is a great way to test your database upgrade, and you can roll back your test database and re-run the RAT testing multiple times.  This allows for iterative changes to database configuration

Simulating Application Workload

Another option is to do application testing.  In this scenario you will have a full test system that include database and application tiers and a full setup of application software.  The basic is idea is to setup a test environment that can be logged into and then create transactions or usage on the application.

 

There can be a lot of complexity with setting up another full environment with all the application components.  It can also be difficult to replicate enough hardware to achieve production level performance or load.  A second challenge can be the actual testing.  It might be easy to have a user login to the test environment and run a few application transactions.  This might give you a feel for some individual performance, but really can’t compare to having many multiple users.  A system with a single user accessing it will react very differently than a system with 100’s or even 1,000’s of active users.  In order to simulate those levels of workload you will need to use tolls like LoadRunner or others that will simulate end users running transactions through the application GUI or API.  Creating the tests that simulate many active users can also be extremely complex and fall short of real workload patterns.

 

The high-level steps for application testing very dependent on your testing process and environment.  You may have a test system with full technology stack and redundant application servers, load balancers, web servers, and databases.  Or you may have a very simple setup just to perform basic smoke testing.

 

The plus side of this is that you can test both application changes and database upgrade at the same time.  Though you may have a hard time during testing teasing out what might be due to an application change versus a database upgrade change.  So that might extend your tuning effort work, though it can decrease the amount of testing work.

Manually running SQL

A third option is to just manually run some SQL statements in your test database.  This can be done by capturing the SQL from the production system using SQL history ASH or AWR or with a SQL Tuning Sets (STS) if you are licensed.  SQL can also be captured using SQL or session tracing.  These statements can then be re-run in a test environment to see how they perform after the database has been modified or upgraded. 

 

As you might already be guessing, this is very complex.  Finding the statements, you want to re-run, possibly updating them, and then making sure the environment is set correct for the SQL to work in test environment can be complex.  If you have no other avenue for testing, this can allow you to possible pick out specifically long running or high impact SQL’s.  This is really a very minimal test, but it may show how the software changes affected the SQL performance profile. 

 

The high-level steps for SQL tuning vary a little based on if you are using a tool like SQL Tuning Sets or manually capturing statements.  Either way it looks something like this:

  • Identify and capture the specific SQL in production (be sure to capture explain plans and run time statistics)
  • Have a test environment that is similar to production in data volume and configuration (or clone from production)
  • Upgrade the test system to 19c
  • Transfer the SQL statements to test (text files. STS, etc…)
  • Run the statements in test and compare explain plans and run time statistics

This can be a little bit time consuming without a tool like STS that will do most of this for you with a few steps or even through tools like Oracle Enterprise Manager (OEM).

 

The plus side of running individual SQL statements is the ability to look at them in-depth.  You will can get very detailed performance information by running traces or SQLT explain plans.  The negative side is that you won’t be able to get a multi-user view of the system.

Summary

All of this comes down to simulating the workload of the application in an upgraded database environment.  The goal of this is to reduce the anxiety or risk of doing your upgrade.  To be clear none of this will fully eliminate that risk, though all of this will make you more prepared for that go live date.  No matter which path you choose this will give more hands-on experience with the 19c upgrade process and seeing how your application perform differently or the same on a 19c database.

Thursday, July 30, 2020

Keeping your Oracle Database Healthy

I recently gave a virtual presentation about database patching and managing risk.  During the session one of the key takeaways is keeping your database healthy.  Although patching is part of keeping things healthy, there is a lot more.

One key is being proactive about health checks.  You should always do some basic database health checks during every patch or upgrade. In fact, you may also want to proactively perform some database health checks weekly or monthly, since many things can happen outside of DB patching.

I've personally been in many patching exercises where we planed a specific amount of time. Having tested the patch in a non production, feeling confident things will go well.  Only to find out that something else in the database is not healthy.  Something only in production.  Some code is invalid, or some system issue is going on.  The patching window is now two times longer as we are fixing the problem we didn't know and still trying to patch.

Oracle provides many great tools to help, ORAchk, EXAchk, and Autonomous Health Framework (AFH).

See note: Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAChk (Doc ID 2550798.1)

Running these tools proactively and reviewing the results on a regular basses is critical to a healthy system.  Always do some basic database health checks on your database
  • Every patch or upgrade
  • Proactively perform weekly or monthly
  • Before and after a patch (compare)
Notice this is not just during patching, many things can happen outside of DB patching to corrupt your system!

Knowing there is an issue will help you plan your work and make sure your outages have the time you need and everyone is clear on the change that is needed.  The less surprises there are in the work you do, the more the business and management will trust what you are doing in the future.

There are other tools that Oracle provides outside of AHF.  Here is a list of some more Oracle provided scripts and technical notes on things you can check in your database to show it is healthy.  This does not guarantee that the application that uses the database is healthy, this is more about the technical aspects fo the database itself.
  • FAQ: Database Upgrade and Migration (Doc ID 1352987.1)
  • Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAChk (Doc ID 2550798.1)
  • Script to Collect DB Upgrade / Migrate Diagnostic Information (dbupgdiag.sql) (Doc ID 556610.1)
  • hcheck.sql Script to Check for Known Problems in Oracle8i, Oracle9i, Oracle10g, Oracle 11g and Oracle 12c (Doc ID 136697.1)
  • Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), Bundle Patches, Patchsets and Base Releases (DOC ID 2118136.2)
  • Best Practices for running catalog, catproc and utlrpscript (Doc ID 863312.1)
  • Debug and Validate Invalid Objects (Doc ID 300056.1)
  • Where Can I Find the Parallel Version of Utlrp.sql? (Doc ID 230136.1)
  • Master Note: Troubleshooting Oracle Data Dictionary (Doc ID 1506140.1)
  • Overview of Refreshing Materialized Views (Doc ID 549874.1)
  • How to Monitor the Progress of a Materialized View Refresh (MVIEW) (Doc ID 258021.1)
  • Script to Check the Status or State of the JVM within the Database (Doc ID 456949.1)
Hopefully these notes give you a point in the right direction on how to help keep your database healthy during patching activities, and even when you are not patching.


Wednesday, June 17, 2020

Oracle ACFS / ASM Filter Driver and Oracle Enterprise Linux

In the world of Oracle databases, DBA’s and system administrators don’t always see eye to eye.  Mostly I have found that this is really just about coming at challenges from different angles.  Some of the major changes in Oracle since 10g version has been the introduction of Oracle Automatic Storage Management or ASM.  This has led to many features such as ASMLIB, ASM Clustered Filesystem (ACFS), and now the ASM Filter Driver (AFD or ASMFD).  These features require OS specific compatible libraries, and kernel modules.  Information on compatibility can be confusing.

Background References

Let’s start with some background.  For this blog, I’m going to focus on Oracle Enterprise Linux (OEL).  The information here could be assembled for any supported OS.  This information is from a few sources, the main ones being:
Before making any major patching or upgrade decisions please reference these notes and Oracle support in general. 

Oracle does a pretty good job of providing information on Linux and Oracle database / ASM compatibility.  Though, the notes are not always clear as to what the goals is.  For example, I have heard many people say that ASM has stopped working during a Linux upgrade.  Or that ASM drivers are not compatible with Linux changes, or that you can’t patch Linux without breaking ASM / AFD / ACFS.  Well, these are antidotal statements, and the truth is a little more complex.

Linux Kernel Version

Oracle Enterprise Linux (OEL) recommends using the Unbreakable Enterprise Kernel or UEK.  The main reason behind this is that Oracle controls the destiny of the UEK.  As noted above ACFS and AFD are kernel dependent.  So having the ability to set course for both the Linux Kernel and the ASM related kernel modules is important.  For the scope of this discussion, let’s look at OEL releases 5 through 8 and Oracle Database releases 11gR2 through 19c.  First here is a high level view of OEL and UEK releases.
Support DatesOEL Version
Jun 2007 - Jun 2017OEL5      
Feb 2011 - Mar 2021OEL6      
Jul 2014 - Jul 2024OEL7      
Jul 2019 - Jul 2029OEL8      
UEK YUM ChannelUEKR1UEKR2UEKR3UEKR4UEKR5UEKR6
Major UEK Version2.6.322.6.393.8.134.1.124.14.355.4.17
Released Mar, 2012Oct, 2013Jan, 2016Jun, 2018Mar, 2020

So this is a nice swath of about 10 years of versions.  Please note that the major version number of the kernels does not change if you don’t change UEK channels.  For example, if you move from OEL6 Update 5 to Update 8, but stick with UEKR3 your kernel major number will remain 3.8.13.  The minor numbers beyond the .13 will change, but not the major number.  This is also true for OEL7 and UEKR5 where the major release number is 4.14.35.  Within in this major version, Oracle release new security fixes and patch updates monthly or quarterly for each kernel version that is supported within the support dates.  So for example, OEL6 UEKR2 which was first release in Mar of 2012, just released kernel UEK 2.6.39-400.321.1 in April of 2020.  That is a long history of fixing the same set of code.

ACFS / AFD Linux Kernel Compatibility

Now let's move over to the database support for ACFS / AFD on OEL.  Here is another summary now listing Oracle database versions.
UEK YUM ChannelUEKR1UEKR2UEKR3UEKR4UEKR5UEKR6
Major UEK Version2.6.322.6.393.8.134.1.124.14.355.4.17
DB VersionRelease Date Mar, 2012Oct, 2013Jan, 2016Jun, 2018Mar, 2020
11.2.0.3OEL5 - OEL62.6.32-1002.6.39-100    
11.2.0.4OEL5 - OEL62.6.32-1002.6.39-1003.8.13-1184.1.12-112.16.4  
12.1.0.1OEL5 - OEL62.6.32-1002.6.39-100    
12.1.0.2OEL6 - OEL7 2.6.39-1003.8.13-118   
12.2.0.1OEL6 - OEL7 2.6.39-1003.8.13-1184.1.12-112.16.44.14.35-1902 
18cOEL6 - OEL7 2.6.39-1003.8.13-1184.1.12-112.16.44.14.35-1902TBD
19cOEL7   4.1.12-112.16.44.14.35-1902TBD

Here again I’ve covered about 10 years, and you can see clearly what high-level database versions are compatible with what OEL and UEK versions.  You might also see Oracle’s commitment to long term support for 11.2.0.4.  Similar to the expected support for 19c going until 2024.  Also note that UEKR6 is not yet supported.

Now the information above still high level.  You may assume that every 19c version of Oracle support all UEK4 and UEK5 kernels out of the box.  That is not true.  Database releases and kernel releases are not specifically synchronized.  So perhaps you have tried to test if the ACFS drivers can be installed:
$ acfsdriverstate supported -v
ACFS-9459: ADVM/ACFS is not supported on this OS version: '4.14.35-1902.10.8.el7uek.x86_64'
ACFS-9201: Not Supported

ACFS-9553: Operating System: linux
ACFS-9554: Machine Architecture: x86_64-linux-thread-multi
ACFS-9555: Operating system name and information: Linux srvr03 4.14.35-1902.10.8.el7uek.x86_64 #2 SMP Thu Feb 6 11:02:28 PST 2020 x86_64 x86_64 x86_64 GNU/Linux
ACFS-9556: Release package: oraclelinux-release-7.7-1.0.5.el7.x86_64
ACFS-9557: Version: ADVM/ACFS is not supported on 4.14.35-1902.10.8.el7uek.x86_64
ACFS-9558: Variable _ORA_USM_NOT_SUPPORTED is defined: no


ASM Required Patches

If your combination of grid software and Linux kernel are not supported, you will see the above message “ACFS-9201: Not Supported”.  How can you fix this?  Oracle provides patches at specific versions of GI to support UEK versions.
DB VersionOS VersionUEK ChannelUEK MinimumBase bug or Base version
11.2.0.3.6OEL5 - OEL6UEKR12.6.32-10015986571
11.2.0.3.7OEL5 - OEL6UEKR12.6.32-10012983005
11.2.0.4OEL5UEKR12.6.32-100Base
11.2.0.4OEL5 - OEL6UEKR22.6.39-100Base
11.2.0.4.4OEL6UEKR33.8.13-11816318126
11.2.0.4.6OEL7UEKR33.8.13-3518321597
11.2.0.4.180717 OEL6 - OEL7UEKR44.1.12-112.16.422810422, 28171094, 27463879
12.1.0.1OEL5 - OEL6UEKR12.6.32-100Base
12.1.0.1OEL5 - OEL6UEKR22.6.39-100Base
12.1.0.2.170718OEL5 - OEL6UEKR22.6.39-100Base
12.1.0.2.170718OEL6UEKR33.8.13-13Base
12.1.0.2.170718OEL7UEKR33.8.13-3518321597
12.1.0.2.181016OEL6 - OEL7UEKR44.1.12-112.16.422810422, 27942938, 28171094, 27942938
12.1.0.2.190716OEL7UEKR54.14.35-190227494830
12.2.0.1OEL6UEKR22.6.39-100Base
12.2.0.1OEL6UEKR33.8.13-13Base
12.2.0.1.180717OEL6UEKR44.1.12-112.16.427463879, 28171094
12.2.0.1OEL7UEKR33.8.13-35Base
12.2.0.1.180717OEL7UEKR44.1.12-112.16.427463879, 28171094, 27463879
12.2.0.181016OEL7UEKR54.14.35-190228069955
18.3.0.0OEL6UEKR22.6.39-100Base
18.3.0.0OEL6UEKR33.8.13-13Base
18.4.0.0.181016OEL6UEKR44.1.12-112.16.428069955
18.3.0.0OEL7UEKR33.8.13-35Base
18.4.0.0.181016OEL7UEKR44.1.12-112.16.427463879, 28171094, 27463879
18.4.0.0.181016OEL7UEKR54.14.35-190227494830
19.3.0.0OEL7UEKR44.1.12-112.16.4Base
19.4.190716OEL7UEKR54.14.35-190227494830

The example above on my test system is a fresh install of GI 19.3.0.0 on a OEL 7.8 system using UEKR5.  Based on the list above we can tell that for UEKR5 support I need either 19.4.190716 or I need patch 27494830.  Oracle provides many patches where the title contains “ACFS RU” that you can search for in the patches section of MOS.  I decided to use this patch “MERGE ON ACFS RU 19.6.0.0.0 OF 28531803 30685278 (Patch 31055785)” which includes the following merge of patches:
Patch  31055785     : applied on Fri May 22 14:15:34 CDT 2020
Unique Patch ID:  23448796
Patch description:  "ACFS Interim patch for 31055785"
   Created on 20 Mar 2020, 03:53:18 hrs PST8PDT
   Bugs fixed:
     28531803, 30685278, 27494830, 27917085, 28064731, 28293236, 28321248
     28375150, 28553487, 28611527, 28687713, 28701011, 28740425, 28818513
     28825772, 28844788, 28855761, 28860451, 28900212, 28951588, 28960047
     28995524, 29001307, 29030106, 29031452, 29039918, 29115917, 29116692
     29127489, 29127491, 29167352, 29173957, 29198743, 29229120, 29234059
     29250565, 29264772, 29302070, 29313039, 29318169, 29338628, 29339084
     29350729, 29363565, 29379932, 29391373, 29411007, 29417321, 29428155
     29437701, 29482354, 29484738, 29520544, 29524859, 29527221, 29551699
     29560075, 29586338, 29590813, 29604082, 29643818, 29704429, 29705711
     29721120, 29760083, 29779338, 29791186, 29848987, 29851205, 29862693
     29872187, 29893148, 29929003, 29929061, 29937236, 29941227, 29963428
     30003321, 30032562, 30046061, 30051637, 30057972, 30076951, 30093449
     30140896, 30179531, 30239575, 30251503, 30264950, 30269395, 30275174
     30324590, 30363621, 30655657
     
Which you can see includes the required patch 27494830 (first row, 3rd column). Depending on your version of Oracle, and quarterly bundle patch there are many options of patches available. The quarterly bundle patches for GI generally contain an updated ACFS patch as well.

 

Details of the ASM Kernel Modules 

In 19c, under the GRID_HOME is a directory called usm. On my test system under here I can find this:
$ ls $GRID_HOME/usm/install/Oracle/EL7UEK/x86_64
4.1.12/  4.1.12-112.16.4/  4.14.35-1902/
This is the location of the libraries used to compile the kernel modules for ACFS / AFD.  As expected, there is a directory for each major kernel version that pretty much matches our list above.  The command “amcmd afd_configure -e” is used to install the AFD kernel modules.  The AFD module use the Kernel Application Binary Interface and are installed under the /lib/modules location on your Linux system.  As they are considered weak modules, they are placed in one location and then symbolically linked into each kernel specific directory when the kernel is updated.
[root@srvr03 sbin]# ls -l /lib/modules/4.14.35-1902*/{extra,weak-updates}/oracle
/lib/modules/4.14.35-1902.0.9.el7uek.x86_64/extra/oracle:
total 9824
-rw-r--r-- 1 root root 10055682 Jun  5 13:16 oracleafd.ko

/lib/modules/4.14.35-1902.10.8.el7uek.x86_64/weak-updates/oracle:
total 0
lrwxrwxrwx 1 root root 69 Jun  5 13:17 oracleafd.ko -> /lib/modules/4.14.35-1902.0.9.el7uek.x86_64/extra/oracle/oracleafd.ko

/lib/modules/4.14.35-1902.11.3.1.el7uek.x86_64/weak-updates/oracle:
total 0
lrwxrwxrwx 1 root root 69 Jun  5 13:44 oracleafd.ko -> /lib/modules/4.14.35-1902.0.9.el7uek.x86_64/extra/oracle/oracleafd.ko

Here we can see that the primary module was installed under 4.14.35-1902.0.9.  Then when the kernel 4.14.35-1902.11.3.1 was installed, a symbolic link was created to the original module.

The AFD module is normally loaded by a Linux RC script called afd (/etc/init.d/afd).  You can verify it will be started at boot with the following command as the root user:
[root@srvr03 init.d]# chkconfig --list afd

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.

afd            	0:off	1:off	2:off	3:on	4:off	5:on	6:off

Here you can see that the AFD module will attempt to be loaded at run levels 3 and 5.  You can also check what version of the AFD module you have loaded with the following Linux command:

[oracle@srvr03 ~]$ modinfo oracleafd
filename:       /lib/modules/4.14.35-1902.11.3.1.el7uek.x86_64/weak-updates/oracle/oracleafd.ko
license:        Oracle Corporation
description:    ASM Filter Driver
author:         Oracle Corporation
srcversion:     533BB7E5866E52F63B9ACCB
depends:        
retpoline:      Y
name:           oracleafd
vermagic:       4.14.35-1902.0.9.el7uek.x86_64 SMP mod_unload modversions 
signat:         PKCS#7
signer:         
sig_key:        
sig_hashalgo:   md4
parm:           oracleafd_use_logical_block_size:Non-zero value implies using logical sectorsize (int)
parm:           oracleafd_use_maxio_softlimits:Non-zero value implies using conservative maximum block io limits (int)

Using the lsmod command to see if it is loaded.  If this command returns no lines, then it is not loaded. 

[oracle@srvr03 ~]$ lsmod |grep oracleafd
oracleafd             229376  0 


Conclusion

Ok, so hopefully this has given you a clear picture as to what you need to plan for when using ASM Filter Driver (AFD) or ACFS on your OEL system with UEK.  Once you have a base version of kernel and GI you should have little issues unless you jump UEK channels or do a GI major version change.  As many people are eyeing up 19c upgrades, hopefully the above information is helpful to prepare your Linux OS for the change.  For myself what started as a somewhat simple post about kernel drivers turned into a pretty long delve into the workings of the ASM filter driver.  Please leave your comments and questions.