Friday, July 19, 2024

Oracle Database 23ai – Real-Time SQL Plan Management

One of the most frequent questions I get asked is “why did the Oracle Optimizer change the execution plan on a SQL statement?”  Generally, this arises when a statement goes from running well to suddenly performing poorly.  In short, something has changed in how the Optimizer thinks the data should be accessed.  Which then leads to the question: why doesn’t it know that the new performance is worse?

Well, not to solve all problems, but Oracle Database 23ai introduces a new feature called Real-Time SQL Plan Management.  This feature builds on top of two existing features: Automatic SQL Tuning Sets (Auto STS) and SQL Plan Management (SPM).  To summarize, the Oracle database uses these features together in the following manner:

  • Auto STS captures execution plans of “most” SQL statements and stores them in a system-managed SQL tuning set called “SYS_AUTO_STS” (see the query sketch after this list)
  • At SQL execution time, the Optimizer decides if a new plan is warranted for a SQL statement based on previous execution information and the current estimated execution plan.  The Optimizer will then execute the new adjusted plan and compare it to previous runs.
  • If the new plan is better, a SQL Plan Baseline will be automatically created for that new plan.  If during future executions of the SQL the performance regresses again, the Optimizer will re-analyze and possibly disable the baseline or create another baseline as needed.
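
To get a feel for what Auto STS is capturing, you can query the SQL tuning set views directly.  Here is a quick sketch of my own (not part of the feature itself); the column list is trimmed and you may want to add your own filters:

SELECT sql_id, plan_hash_value, executions, elapsed_time
  FROM dba_sqlset_statements
 WHERE sqlset_name = 'SYS_AUTO_STS'
 ORDER BY elapsed_time DESC
 FETCH FIRST 20 ROWS ONLY;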

All that work you may have been doing to identify a poorly running SQL statement, figure out how to run it better, and then put in a “fix” such as a baseline is now 100% automated.

Often, we shy away from new features for fear that they will cause more harm than benefit.  Personally, I have adopted many new features from Oracle over the past 30+ years.  My goal here is to shed some light on how this new feature works to help reduce the fear, uncertainty, and doubt that might arise. 

Let’s look at a quick example.  Note this is running in Oracle Cloud Infrastructure (OCI) with Oracle Database Enterprise Edition 23.4.1.24.06 as we patiently await the on-premises release.  I’m skipping over some of the mechanics to focus on the results of real-time SPM that you might see and want to understand.

First for this to work we need to enable the Real-Time SPM (RT SPM) task:

exec dbms_spm.configure('AUTO_SPM_EVOLVE_TASK', 'AUTO')

Note in 19c the EVOLVE task was either on or off and did not do real-time evaluation.  AUTO is a new setting for 23ai.
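
If you ever need to step back to the pre-23ai behavior, the same call accepts the older values (shown here as a hedged example; ON and OFF were the 19c settings, with ON running the scheduled evolve task only and OFF disabling it):

exec dbms_spm.configure('AUTO_SPM_EVOLVE_TASK', 'ON')
exec dbms_spm.configure('AUTO_SPM_EVOLVE_TASK', 'OFF')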

You can check your current settings with the following query:

SELECT PARAMETER_NAME, PARAMETER_VALUE
  FROM DBA_SQL_MANAGEMENT_CONFIG;

PARAMETER_NAME                           PARAMETER_VALUE
---------------------------------------- ------------------------------
AUTO_SPM_EVOLVE_TASK                     AUTO

Once the Optimizer detects that a better execution plan exists, it may decide to try it.  If it does, you will see a note in the explain plan:

Note
-----
   - This is SQL Plan Management Test Plan

You will see this when the Optimizer runs the SQL with the possibly better execution plan.  Note, you may find it hard to capture this moment in a real system.  

If the new execution plan performs better, a SQL Plan Baseline (or just “baseline”) will be created for the new execution plan as well as for the previous execution plan.
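
The listing below can be reproduced with a simple query against DBA_SQL_PLAN_BASELINES.  This is my own version of the query (the column formatting used for the listing is omitted):

SELECT sql_handle, plan_name, origin, enabled, accepted, optimizer_cost cost
  FROM dba_sql_plan_baselines
 WHERE sql_handle = 'SQL_2044b318726d1c30';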

SQL_HANDLE           PLAN_NAME                      ORIGIN             ENA ACC   COST
-------------------- ------------------------------ ------------------ --- --- ------
SQL_2044b318726d1c30 SQL_PLAN_20j5m31t6u71h169efe64 FOREGROUND-CAPTURE YES YES 151184
SQL_2044b318726d1c30 SQL_PLAN_20j5m31t6u71h8059011b FOREGROUND-CAPTURE YES NO  271739

Note the origin of the baseline, “FOREGROUND-CAPTURE”: this means the Optimizer created the baseline during a SQL execution, after analysis.  We can see a little bit more information from the new NOTES column of the DBA_SQL_PLAN_BASELINES view:

SELECT p.plan_name, p.foreground_last_verified,
       pfspm.status result, pfspm.ver verify_type,
       pfspm.test_phv, pftp.elps, pfspm.ref_phv, pfrp.elps
  FROM dba_sql_plan_baselines p,
    XMLTABLE(
         '/notes'
         passing xmltype(p.notes)
         columns
             plan_id         NUMBER    path 'plan_id',
             flags           NUMBER    path 'flags',
             fg_spm          XMLType   path 'fg_spm') pf,
     XMLTABLE(
         '/fg_spm'
         passing pf.fg_spm
         columns
             test_phv        NUMBER         path 'test_phv',
             ref_phv         NUMBER         path 'ref_phv',
             ver             VARCHAR2(8)    path 'ver',
             status          VARCHAR2(8)    path 'status',
             ref_perf        XMLType        path 'ref_perf',
             test_perf       XMLType        path 'test_perf' ) pfspm,
    XMLTABLE(
        '/ref_perf'
        passing pfspm.ref_perf
        columns
             bg              NUMBER         path 'bg',
             cpu             NUMBER         path 'cpu',
             elps            NUMBER         path 'elps' ) pfrp,
    XMLTABLE(
        '/test_perf'
        passing pfspm.test_perf
        columns
             bg              NUMBER         path 'bg',
             cpu             NUMBER         path 'cpu',
             elps            NUMBER         path 'elps' ) pftp
  WHERE notes IS NOT NULL
    AND p.sql_handle = 'SQL_2044b318726d1c30'
  ORDER BY p.foreground_last_verified;

PLAN_NAME                      FOREGROUND_LAST_VERIFIED     RESULT  VERIFY_T  TEST_PHV ELPS    REF_PHV ELPS
------------------------------ ---------------------------- ------- -------- --------- ---- ---------- ----
SQL_PLAN_20j5m31t6u71h169efe64 19-JUL-24 01.27.50.000000 PM better  normal   379518564 2775 2153316635 3674

This shows us that the elapsed time was lower for the tested execution plan than for the reference (existing) plan.  There is more data in the NOTES column, but this is a helpful way to see what the Optimizer did to verify the performance.

You may wonder: what happens if things change again?  What if this new plan is not the best plan in the future?  Real-Time SPM looks for that as well; if the Optimizer estimates that the baseline plan would now cause the SQL to regress, the baseline will not get used.  Again, in the Note section of an explain plan you may see this:

Note
-----
   - Failed to use SQL plan baseline for this statement
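
All of this is automatic, but you can still step in by hand if needed.  As a hedged sketch (reusing the handle and plan name from the example above), a regressed baseline can be disabled manually with DBMS_SPM:

DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Disable the plan; pass attribute_value => 'YES' later to re-enable it
  l_plans := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
               sql_handle      => 'SQL_2044b318726d1c30',
               plan_name       => 'SQL_PLAN_20j5m31t6u71h8059011b',
               attribute_name  => 'enabled',
               attribute_value => 'NO');
END;
/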

To view the execution plan for a given baseline, use the DBMS_XPLAN package:

SELECT *
    FROM table(DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE(
             sql_handle=>'SQL_2044b318726d1c30',
             plan_name=>'SQL_PLAN_20j5m31t6u71h169efe64',
             format=>'typical'));

--------------------------------------------------------------------------------
SQL handle: SQL_2044b318726d1c30
SQL text: select /* SPM_TEST_QUERY_Q1 */ sum(t2.amount) from   sales_area1 t1,
          sales_area2 t2 where  t1.sale_code = t2.sale_code and
          t1.sale_type  = 1
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_20j5m31t6u71h169efe64         Plan id: 2727377809
Enabled: YES     Fixed: NO      Accepted: YES     Origin: FOREGROUND-CAPTURE
Plan rows: From Auto SQL Tuning Set
--------------------------------------------------------------------------------

Plan hash value: 2727377809

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |       |       |   151K(100)|          |
|   1 |  SORT AGGREGATE                       |             |     1 |    33 |            |          |
|*  2 |   HASH JOIN                           |             |    62M|  1966M|   151K  (1)| 00:00:06 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SALES_AREA1 | 25000 |   170K| 15238   (1)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | SALES_TYP1I | 25000 |       |   235   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL                  | SALES_AREA2 |  2500K|    61M|   135K  (1)| 00:00:06 |
-----------------------------------------------------------------------------------------------------

Note that the explain plan provides additional information about the baseline, and that the plan itself comes from the Auto SQL Tuning Set, which is where the baseline was created from.

I’m really excited about this new feature and how it can help proactively with SQL performance issues.  Oracle will now detect that SQL performance has degraded from a previously captured execution plan, test a possible fix, and implement a baseline if the fix is acceptable.  It seems the answer to “why did the Optimizer change the execution plan?” is: just wait a moment and it will fix itself.

Be sure to check out the Oracle blog by Nigel Bayliss which includes a link to example code he created that was shown in this blog.

Thursday, June 6, 2024

Oracle RAC install and Linux 8 - SSH connectivity issue INS-06006

Many systems are getting upgraded to Linux version 8 due to the impending end of life for Linux 7.  This includes both RedHat and Oracle Enterprise Linux.  In general, most are choosing to do a new build-out and migrate their databases from the previous system to the new system.  This is the most supportable model, though not the only one.

One issue you may run into is installing either Oracle RAC Grid Infrastructure or Oracle Database in a clustered environment.  During the runInstaller or gridSetup.sh steps you may receive an error about "Passwordless SSH connectivity not setup" even though you have set it up and manually verified it.  This is due to a change in OpenSSH 8 to deal with a security vulnerability (CVE-2019-6111).  You will receive a message like this:

[FATAL] [INS-06006] Passwordless SSH connectivity not set up between the following node(s): [srvr02].

   CAUSE: Either passwordless SSH connectivity is not setup between specified node(s) or they are not reachable. Refer to the logs for more details.

   ACTION: Refer to the logs for more details or contact Oracle Support Services.

When you manually verify SSH connectivity everything works:

$ for i in srvr01 srvr02; do ssh $i hostname -s ; done
srvr01
srvr02

The fix is to make sure you are applying the RU / patch to the software during the install process, basically forcing the usage of --applyRU for runInstaller or gridSetup.sh.  This is also the recommended method for all patching / installs going forward (expect more of this in the 23ai release).  For 19c you need to use the 19.6 RU or above.

For GI install your command line would look something like:

/u01/app/19/grid_1/gridSetup.sh --applyRU /u01/app/grid/patch/36233126

There is a little caveat to this when it comes to the database software install.  Normally you would run the install from the first node of the cluster and the installer would replicate the software to each node, using SSH.  The issue is that the RU for the database does not directly include the fix for SSH (Bug 30159782 - Remote Copy Fails if using openssh-7.2, 7.4, etc. (Doc ID 30159782.8)).  You have to include the OCW (Oracle ClusterWare) patch in the runInstaller step.  The OCW patch is included in the GI RU, not the database RU.

Image 1: 19.23 GI RU patch contents from the download page
The easiest way to deal with this is to use the Grid Infrastructure (GI) RU patch for both the GI home and the database home, though the syntax will look slightly different for the database home.  We will use the recent April 2024 RU as an example.  For the 19.23 GI RU we see the following patch contents on the download screen (see Image 1).  In the red boxes we can see that both the database patch and the OCW patch are included.

Once you have the patch downloaded and unzipped, we can verify that from the directory structure:

36233126/
├── 36233263  <- Database RU 19.23
├── 36233343
├── 36240578  <- OCW patch 19.23
├── 36383196
├── 36460248
├── automation
├── bundle.xml
├── README.html
└── README.txt

So the steps for setting up a database home on RAC with Linux 8 would look like this:
  1. Stage the patch files, e.g. /u01/app/oracle/patch (along with any additional patches)

    unzip -qo p36233126_190000_Linux-x86-64.zip -d /u01/app/oracle/patch


  2. Unzip the install media (19.3) into the new home, e.g. /u01/app/oracle/product/19/dbhome_2

    unzip -qo LINUX.X64_193000_db_home.zip -d /u01/app/oracle/product/19/dbhome_2


  3. Update the OPatch version in the new home (be sure to download the latest OPatch from patch 6880880 in MOS)

    rm -rf /u01/app/oracle/product/19/dbhome_2/OPatch

    unzip -qo p6880880_190000_Linux-x86-64.zip -d /u01/app/oracle/product/19/dbhome_2


  4. Install the software and apply the patches in one command (be sure to update the paths, node names, and group names based on your configuration).  Here we are installing the RU, OCW, OJVM, and Datapump bundle patches.

    export CV_ASSUME_DISTID=OEL7.8

    /u01/app/oracle/product/19/dbhome_2/runInstaller \
    -silent -ignoreprereqfailure \
    -applyRU /u01/app/oracle/patch/36233126/36233263 \
    -applyOneOffs \
    /u01/app/oracle/patch/36233126/36240578,/u01/app/oracle/patch/36420641,/u01/app/oracle/patch/36199232 \
    oracle.install.db.InstallEdition=EE \
    oracle.install.option=INSTALL_DB_SWONLY \
    INVENTORY_LOCATION=/u01/app/OraInventory \
    ORACLE_BASE=/u01/app/oracle \
    UNIX_GROUP_NAME=oinstall \
    oracle.install.db.CLUSTER_NODES=srvr01,srvr02 \
    oracle.install.db.OSDBA_GROUP=dba \
    oracle.install.db.OSOPER_GROUP=oper \
    oracle.install.db.OSBACKUPDBA_GROUP=backupdba \
    oracle.install.db.OSDGDBA_GROUP=dgdba \
    oracle.install.db.OSKMDBA_GROUP=kmdba \
    oracle.install.db.OSRACDBA_GROUP=racdba 

    Note: back slashes (\) are used to continue the command onto separate lines.  Also, the patch list is comma separated with no spaces, and be sure to export CV_ASSUME_DISTID, which is required when installing 19c on Linux 8.

  5. After runInstaller finishes, run the root.sh script on every node you installed on.
If you make a mistake or have issues during the runInstaller, just remove the dbhome directory and start over.
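
Once runInstaller and root.sh complete, a quick sanity check (my own habit, reusing the home path from the example above) is to confirm the RU and one-off patches actually landed in the new home:

    export ORACLE_HOME=/u01/app/oracle/product/19/dbhome_2
    $ORACLE_HOME/OPatch/opatch lspatches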

Now you should have a new 19.23 home installed on all nodes of the cluster with the OCW, OJVM, and Datapump patches all ready to go!

Thursday, May 16, 2024

Serious issue - server crash - Oracle database 19c and Oracle Enterprise Linux 8 Update 8 and above

 

I'm a tad behind on this, but I think it's a critical issue.  This could be a repeat of something you may have already heard or read about.  I just want to make sure the information is getting out.

There is a serious issue with Oracle Database 19c running on Oracle Enterprise Linux 8 Update 8 and above.  The issue starts with database 19.21 and above (October 2023 RU) when running on OEL 8 with the UEK kernel UEK7U1 and above (5.15.0-10* kernels released in April 2023).  If you are running a system with HugePages memory configured and have the ASM Filter Driver (AFD) or ASM Clustered Filesystem (ACFS) kernel modules installed, this can lead to a server crash (Linux kernel panic).
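
A quick way to check whether a given server is in the affected combination (kernel release, ACFS / AFD kernel modules, and HugePages) is something like the following.  These are standard OS commands, not something from the MOS note:

uname -r                                  # UEK7U1 and later is 5.15.0-10x and above
lsmod | grep -E 'oracleacfs|oracleafd'    # are the ACFS / AFD kernel modules loaded?
grep -i hugepages /proc/meminfo           # is HugePages configured on this server?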

We were able to re-create the issue consistently when running an RMAN restore.  To me the criticality is increased because this affects any large database system (those using HugePages and ASM together).

See the following MOS document:
ACFS Kernel Panic (RIP: ofs_teardown_diop_osd) After Updating OS Kernel to UEK7U1 or Later (Doc ID 2998947.1)

Or Mike Dietrich's blog:

Can you use the UEK7 Linux kernel, or may you get some trouble?

The fix for this problem? You have two basic solutions:

  • Stay on UEK7 (5.15.0-3.xxx) until you can upgrade your GI / ASM home and kernel modules to 19.23 (April 2024 RU)
  • Apply the April 2024 RU to your GI / ASM home and make sure the ACFS / AFD kernel modules are updated (this is done automatically)
There is a one-off patch, but it seems to be for a specific configuration / combination.  If you are on the specific kernel version 5.15.0-201.135.6 and running 19.22 ASM / GI (Jan 2024 RU), then there is a patch that can be applied:
     Patch 35983839 - UEK7U2 SUPPORT FOR ACFS, AFD (V5.15.0-201.135.6)

Since Linux kernels update frequently, and corporate security teams want frequent updates, I would lean towards taking up 19.23 (April 2024 RU) since that is the minimum fully supported version for UEK7U2 based on the support matrix.


The issue seems to have first been acknowledged in November of 2023; I'm a little behind as I always recommend enterprise customers use an N-1 patching methodology.  So here we are in Q2 of 2024 hitting an issue from Q1 2024 as we take up an October 2023 Linux kernel with a January 2024 database patch.  The good news is the fix is readily available at this point, and our client can move forward instead of backwards.

Hopefully this gives you enough information to be preemptive on the fix and not have systems crashing.

One final reference, a good blog on the OEL kernel release names and dates:


Gary

Wednesday, April 10, 2024

Putting the You In User Groups: UTOUG Training Days 2024


Gary Gordhamer, Managing Principal Consultant, Viscosity NA

We both recently attended and spoke at the Utah Oracle User Group (UTOUG) Training Days event that took place in Salt Lake City, Utah on March 21st and 22nd.  This was an outstanding event, and the volunteers and conference committee of UTOUG did an excellent job bringing together dozens of great presenters, many of them (like us) Oracle ACEs.

Finding time to learn, meet our fellow IT specialists, or even have a few moments to plan what the future holds is almost impossible.  Attending a well-planned in-person event is one of the best ways to meet all these needs.

In case you doubt that, here’s the list of most in-demand skills for 2024, according to LinkedIn. The top three “soft” skills are Communication, Customer Service, and Leadership.

Still not convinced? In 2023, LinkedIn listed the most in-demand “hard skills” - things that are not common across most jobs - and the top three on the list were Software Development, SQL, and Finance.

We both agree that UTOUG Training Days hit every one of these items - on both lists. Whether you were utterly new to IT or a seasoned professional looking to learn new skills while strengthening others, this event compressed all of that into two short days.  If you’re feeling a bit of “Fear of Missing Out” (FOMO), well, that’s our point. And if you’re a manager and you feel like your team missed out, well, that’s good too - that’s your call to action.

A couple of unique items that UTOUG included in their event: bringing students from nearby regional colleges into the conference, funding scholarships for students, and finishing with a ski day on Saturday for those interested, to keep the community atmosphere going.  These are critical components of building the next generation of IT professionals and helping employers build their teams.  Again, managers: if you are struggling to find the right talent, maybe you need to invest one or two days into looking where the talent actually is, as opposed to casting wide nets into the ocean of the internet.

Day One: Leaders Are Made … and They Are Magnetic

Our first day kicked off with an amazing keynote by Elena Marquetti-Ali about magnetic leadership.  A little tidbit from the session: 85% of managers have no leadership training!  So that means everyone who attended Training Days now has more training than 85% of the leaders out there.  This speaks directly to two of the most in-demand skills.  Her “magnetic” 30-minute discussion was covered in a single slide.  Impactful and direct; for those of us armchair speakers, it was a sight to behold beyond the great content.

Following the keynote were 25 deep technical sessions over five time slots that covered everything from SQL, Cloud, and Machine Learning to team building.  Breakfast and lunch were included, and we had 15 minutes between each session to spend time meeting our fellow attendees and finding out about their takeaways or key challenges.  Hmmm … problem-solving, communication, teamwork, and research … it seems we just hit a few more of those key skills the folks at LinkedIn already identified as crucial to success.

You may ask yourself, how good could these sessions be?  Well, they were presented by 31 great people from three different continents.  How great?  At least 20 of them are recognized by Oracle Corporation as Oracle ACEs for their knowledge and outreach in the technical field.  Meanwhile, all the topics and speakers were selected through a volunteer review process.  That means users help drive the content, and the content is created by some of the best.  Have a topic that needs a deep technical dive and more outreach or training?  Let the user group know!

Day Two: Humans 25, AI 1

Day Two started with breakfast and another gathering time for all the attendees to learn from each other, followed by another 25 deep technical sessions covering performance tuning, mentoring, operations, and even a great discussion about how even in the face of ever-more-powerful Artificial Intelligence, humans are still irreplaceable. 

In our combined opinion, this was a wonderful conference that hit all the requirements of learning, community, skill building and putting you the IT professional first. Thanks to all the leaders in the UTOUG, especially their conference committee members, for building an amazingly well-rounded event!  

And for those of you who really hoped to attend, but missed it for any number of reasons: Be sure to remember to watch out for this UTOUG event in 2025, don’t forget to look for other regional events nearby, and be sure to tell your local user group what you want to see.

Jim - Gary

Tuesday, January 24, 2023

Running Oracle database containers on Apple M1 using PODMAN

INTRODUCTION


In this blog we will talk about running the Oracle database in a PODMAN container on Apple silicon (M1 / M2) under Mac OS X.  As you might know, Oracle does not provide native Oracle database binaries for Apple silicon, which is a different architecture than the Intel / AMD x86.  If you previously ran a Linux container under Docker on an x86 Intel-based Mac, you will find a very different situation on a newer Apple M1 or M2 processor.

Docker for Mac OS X is still available, and it should be able to emulate x86 automatically.  I personally had some issues getting Oracle’s pre-built database containers to work under Docker, and instead decided to look into using PODMAN.

I should mention that I still find PODMAN somewhat buggy.  It’s not clear to me if this is due to the QEMU emulation of x86 on Apple silicon, or Podman, or my specific setup, or some combination.  So be warned: I do see a number of times where I start the Oracle database container and it fails.  Then I shut down my PODMAN machine, restart it, and the container works again.  Generally, once the container is running it has been solid.

Also, since x86 is emulated on Apple silicon, it runs much slower than what I was used to on my previous Intel-based MacBook using Docker.  In my anecdotal tests it runs about 70% slower.  Note this is not a knock on Apple silicon; this is just the nature of the full processor emulation that is required.

REQUIREMENTS

PODMAN has a slightly different architecture than Docker and is an open source product.  This means that installing and using it will be based on setting up and using a number of open source items.  I won’t cover all of them in these directions; you can find many links on how to install these items.  You will need:

  • HomeBrew (https://brew.sh) or some method to install and run open source software
  • QEMU (https://www.qemu.org) Install this through HomeBrew; this is the open source emulation software that will emulate the x86_64 (AMD64) processor.  Note you will see just about everything referenced in the open source world as AMD64, as AMD built the original 64-bit instruction set that Intel uses today.
  • Podman (https://podman.io) the container platform, you can install this using HomeBrew or you can directly download the packages and install them
  • Podman desktop (optional) (https://podman-desktop.io) a useful GUI for working with Podman, though you will still need to do some of the work on command line.

AMD64 MACHINE

The next step is to create a machine (VM) for PODMAN based on the amd64 processor.  I borrowed these steps from a great IBM article.  I’ll paraphrase / reuse some of their commands, but the source article is here:

https://developer.ibm.com/tutorials/running-x86-64-containers-mac-silicon-m1/

Note, when you install PODMAN, you will probably already have an ARM-based M1 machine (VM) created.  This default machine cannot run the Oracle-provided containers for the database.

The machine is a VM that will host your containers in PODMAN.  This is the main difference from Docker, which runs an OS daemon, while PODMAN uses a VM that is technically outside of the OS it is running on.

For the purpose of running the Oracle database, we will have PODMAN run a Linux amd64 VM under QEMU emulation, and then our containers will run inside of that VM (machine).  Open a terminal prompt (without Rosetta) and execute the following commands:

  1. Verify PODMAN is version 4.1.1 or newer:
    podman --version
  2. Make sure you have stopped the current machine in PODMAN:
    podman machine stop
  3. Verify your QEMU version 7.0 or higher:
    qemu-system-x86_64 --version 
  4. Create a new PODMAN machine using the latest FEDORA CORE image:
    podman machine init --image-path https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/37.20221225.3.0/x86_64/fedora-coreos-37.20221225.3.0-qemu.x86_64.qcow2.xz intel
    Note: you can update the link by navigating to:
    https://getfedora.org/en/coreos/download?tab=metal_virtualized&stream=stable&arch=x86_64
    and grabbing the URL for DOWNLOAD from “Virtualized -> QEMU”
    Note: the name “intel” will be the name of your PODMAN machine, you can change this to be whatever is useful for you.

Before starting this machine, we need to reconfigure it for the proper settings.  This is critical, and I have made some suggestions to help with performance.  

Edit the configuration file in your favorite text editor: 

~/.config/containers/podman/machine/qemu/intel.json

Note: if you changed the machine name, then replace intel.json with the name your_name.json

Note: this is a JSON file, so watch your syntax closely.  A quick JSON primer: https://guide.couchdb.org/draft/json.html
we will be updating JSON arrays and objects.

First in the “CmdLine” section, change the command that will be run:

"/opt/homebrew/bin/qemu-system-aarch64",

Change to:

"/opt/homebrew/bin/qemu-system-x86_64",

Also in the “CmdLine” section you will have to remove the following options that are for the ARM (aarch64) machine:

"-accel",
"hvf",
"-accel",
"tgc",
"-cpu",
"host",
"-M",
"virt,highmem=on",
"-drive",
"file=/Users//.local/share/containers/podman/machine/qemu/intel_ovmf_vars.fd,if=pflash,format=raw"

Note: the -cpu may say something besides host, if it does remove that as well.

Add the following command line options in the spot you just removed:

  "-cpu",
  "qemu64",
  "-accel",
  "tcg,thread=multi",
  "-m",
  "2048",
  "-smp",
  "4",

This is basically saying these are the first command line options to the qemu-system-x86_64 command.

Note: I have set the memory (-m) to 2GB and the CPU core (-smp) count to 4.  You can adjust these settings for your needs.

Next change the line for the VM firmware:

"file=/opt/homebrew/share/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on",

to

"file=/opt/homebrew/share/qemu/edk2-x86_64-code.fd,if=pflash,format=raw,readonly=on",

This is setting the correct firmware for the machine (VM).

Finally, at the bottom of the configuration file is an object section that is used to display the machine information in the GUI.  Update the CPUs and Memory settings appropriately:

"CPUs": 4,

"Memory": 2048,

Your machine file should now be set up correctly.  Next we need to launch the machine and make sure PODMAN is healthy.  From the terminal run:

podman machine start intel

Note: this will take a few minutes to start.

Verify the machine is running and healthy:

podman machine list

NAME        VM TYPE     CREATED      LAST UP            CPUS        MEMORY      DISK SIZE

intel*      qemu        4 weeks ago  Currently running  4           2.147GB     107.4GB


If you are only going to use this machine for your containers, you can set it to be the default by running from the terminal:

podman system connection default intel

Note: you need to restart the terminal for this to take effect.

Also, you can change back to the original ARM machine with:

podman system connection default podman-machine-default

Setting up your Oracle Database Container

I had some additional issues here.  As I noted, the emulated amd64 machine is not very fast, and after I increased the SMP count to 4 I ran into another issue with the Java-based Oracle setup utilities that are executed during container creation.  If you try to load your Oracle Database container and it is hanging, you should adjust your SMP count down to 1 (one) and restart your machine.  Once the container is loaded and usable, you can shut down the machine and reset the SMP back to 4 (four) or whatever you had.  Starting the container after this is fine.

I could verify the hang by connecting to the OS of the container:

podman exec -it DB213 sh

I could see the java process running oracle.assistants.dbca.driver.DBConfigurator, and the log file showed no progress happening (under /opt/oracle/cfgtoollogs/dbca/ORCLCDB).

Steps to get an Oracle Database Container

  1. Login to the website (container-registry.oracle.com) and accept the license.  Sign in with the button in the upper left corner.  Click on the repository (Database) then accept the license.
  2. From the terminal, login to the same repository with PODMAN (use same login you accepted the license with):
    podman login container-registry.oracle.com
  3. Pull the container of your choice into PODMAN (use the links on the website to select the specific version you want):
    podman pull container-registry.oracle.com/database/enterprise:21.3.0.0
  4. List the installed images, you may want to note the image ID if you want a specific version:
    podman image list
  5. Now build a container from the image (you can reference it by image ID or by REPOSITORY name):
    podman run -d --name DB213 container-registry.oracle.com/database/enterprise:21.3.0.0
    Note: DB213 is my container name; use whatever name works for you.

Note: please see my preface on the SMP CPU count if you are having issues with your container not creating an Oracle database properly.

To get a shell on the container:

podman exec -it DB213 sh
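
You can also follow the database creation progress from outside the container with the standard Podman log command (DB213 being the container name from above):

podman logs -f DB213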

Additional tip

I was having some issues with the time zone for my demo database.  The containers will be UTC by default.  If you want to change this, you can set the parameter tz in the file:

$HOME/.config/containers/containers.conf


[containers]

# Set time zone in container. Takes IANA time zones as well as "local",
# which sets the time zone in the container to match the host machine.
#
tz = "local"

Friday, February 5, 2021

Finding Your SQL Trace File in Oracle Database

So this question was asked of me a few months back in one of my presentations: how do I find a trace file from SQL trace on my Oracle database?

Sorry for the delay, but here are my suggestions.  First you will need to know where your trace file will be located.  By default it will be in the trace directory under the diagnostic destination, so make sure you know where your diagnostic destination is located.

SQL> show parameter diag
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest                      string      /u01/app/oracle

There are basically four main ways to find your trace file.

Option 0 - Just ask

Well I learned something new, so I wanted to update this article.  There is a view you can directly query to see your current trace file name.  

SELECT value 
      FROM v$diag_info 
      WHERE name = 'Default Trace File';

This will show the correct trace file name when you set your TRACEFILE_IDENTIFIER, so I would still recommend setting that as outlined in Option 1.

Option 1 - Manually set the trace file name

For this option, we set a session level parameter to drive the name of the trace file.

ALTER SESSION SET tracefile_identifier = 'SQLID_fua0hb5hfst77';
Note: if you receive an error turning on tracing, make sure the user has been granted the alter session privilege.

Next we enable tracing:
EXEC DBMS_SESSION.SET_SQL_TRACE(sql_trace => true);
Run the SQL statement you want to trace:
SELECT sum(t1.c), sum(t2.c)
  FROM t1, t2
  WHERE t1.a = t2.a
   AND t1.d = 10;
Then disable the tracing:
EXEC DBMS_SESSION.SET_SQL_TRACE(sql_trace => false);
Here we can now see the trace file, which contains the identifier we gave above:
cd /u01/app/oracle/diag/rdbms/t1db/t1db/trace/

[oracle@srvr03 trace]$ ls -ltrh *fua0*
-rw-r----- 1 oracle oinstall 4.3K Feb  5 08:29 t1db_ora_4978_SQLID_fua0hb5hfst77.trc
-rw-r----- 1 oracle oinstall 2.0K Feb  5 08:29 t1db_ora_4978_SQLID_fua0hb5hfst77.trm
If you are trying to do extended SQL or CBO tracing, you can also use different options for the SQL trace.  If the trace is going to be large, you should also check the size of the maximum dump file.  This can also be adjusted at the session level.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1'; -- CBO tracing
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12'; -- Additional SQL trace information

SQL> show parameter max_dump_file_size;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
max_dump_file_size                   string      unlimited
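
And when you are done, the events can be turned back off at the session level as well (standard event syntax, shown here for completeness):

ALTER SESSION SET EVENTS '10053 trace name context off'; -- stop CBO tracing
ALTER SESSION SET EVENTS '10046 trace name context off'; -- stop extended SQL tracing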

Option 2 - get your session diagnostic information


You can query the name of the current diagnostic trace file for the session.
set linesize 100
column name format a20
column value format a70
select name, value from v$diag_info where name = 'Default Trace File';

NAME                 VALUE
-------------------- ----------------------------------------------------------------------
Default Trace File   /u01/app/oracle/diag/rdbms/t1db/t1db/trace/t1db_ora_31523.trc
Now repeat the process of turning on trace, running your SQL, and turning trace back off.

Option 3 - Searching for the trace file


You can search for the trace file at the OS level.  This will probably give you a list of files, so you will need to verify which one contains the trace in question.  If you get the SID from the session, you can include that in the file search.

SQL> select userenv('SID') from dual;
USERENV('SID')
--------------
            19
Then search the trace directory (based on the diagnostic destination setting of your database):
find /u01/app/oracle/diag/rdbms/t1db/t1db/trace -name t1db_ora\*.trc -mtime 0 -exec grep -l "*** SESSION ID:(19." {} \;
/u01/app/oracle/diag/rdbms/t1db/t1db/trace/t1db_ora_26655.trc
/u01/app/oracle/diag/rdbms/t1db/t1db/trace/t1db_ora_4978_SQLID_fua0hb5hfst77_.trc
/u01/app/oracle/diag/rdbms/t1db/t1db/trace/t1db_ora_31523.trc
/u01/app/oracle/diag/rdbms/t1db/t1db/trace/t1db_ora_14396.trc
Note the -mtime option for find is looking for files that were updated in the past 24 hours.  If you want to look for older files, adjust the number after -mtime; e.g. +1 will look for files modified more than 48 hours ago.

There are other ways, but these should be some of the easiest ways.

Gary

Friday, September 4, 2020

Oracle 19c Database Upgrade - Performance testing options

You may be able to relate to this: upgrading Oracle databases can be a real drag.

Databases tend to be at the heart of critical business processes, and you may have spent a lot of time getting things into perfect working order.  Upgrading your Oracle database from 11g to 19c might be very frightening due to the risk of things changing and losing all the work already done.  This is normally where testing comes into play.  What are your options for testing performance changes when moving from Oracle 11g to 19c?

There are basically three options:

  • Oracle Real Application Testing or RAT
  • Simulated application workload either manually or with tools like LoadRunner, NeoLoad, or a number of web specific tools
  • Look at individual SQL either manually or through tools like SQL Tuning Sets in Oracle Tuning Pack

First let’s put a few rules in place when it comes to testing.  For a test to be valid you need to use an environment that is as close as possible to the target production system you will be moving to.  That means if you are moving to new servers or changing hardware / storage during the upgrade, make sure you're running tests on that new hardware or storage.  Oracle 19c has many new features that take advantage of new hardware.  Testing 19c on an 8-year-old piece of hardware that was running 11g is not the same as testing 19c on a brand-new server.  Since the Oracle optimizer is very aware of the underlying infrastructure, it will not act the same, so you are not really testing the upgrade.

 

Let’s give a quick description of each of these options that might take some of the stress out of upgrading your Oracle database.

Oracle Real Application Testing or RAT

Oracle makes a very specific database tool for application workload testing: Real Application Testing, or RAT.  Acronym aside, the tool is pretty interesting.  RAT captures SQL statements from your production database, and then you can clone the database to another environment and re-run all the SQL.  This is one of the most complete ways to test just the database component of your application.  A couple of downsides to this tool: first, it only tests the database component.  If you are just upgrading the database software, changing the database hardware, or only adjusting database settings, then this tool is great.  If you are making application changes or updating the database schema at the same time, then this tool will not work, or at minimum will not be able to test those changes.  Also, RAT is an additional-cost license.

 

RAT will run through all the application SQLs in order, but not necessarily in the same amount of time.  E.g. if you capture 8 hours of production workload, it will probably take longer than 8 hours to run the replay of the workload on a similarly sized test system, even if 19c is faster at individual SQL statements.  Also, RAT cannot simulate everything, such as Data Guard and GoldenGate.  These replication tools do cause some overhead, and RAT will not simulate that.  Though most of that overhead can be very small.

 

High-level steps for a RAT replay:

  1. Capture the workload on your production system, and be sure to include AWR data (see the capture sketch after this list)
  2. Clone the production database to test area based on the start time of your capture
  3. Upgrade the cloned test system to 19c and apply any database configuration changes
  4. Transfer the capture files from production to test, and process the capture files
  5. Run the database replay on test
  6. Load your AWR data from production and run comparison reports between production and test
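
As a rough sketch of step 1 (the capture name, directory object, and duration are my own example values, and RAT is separately licensed), the capture side uses DBMS_WORKLOAD_CAPTURE:

BEGIN
  -- Start an 8-hour capture, writing to an existing directory object
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(name     => 'UPG19C_CAPTURE',
                                      dir      => 'RAT_CAPTURE_DIR',
                                      duration => 28800);
END;
/
-- Later, on the upgraded test system, the replay side uses DBMS_WORKLOAD_REPLAY
-- (PROCESS_CAPTURE, INITIALIZE_REPLAY, PREPARE_REPLAY, START_REPLAY).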

The great thing about RAT is you get a very detailed report of database effort and work.  You can see exactly what 19c, and possibly new hardware, did differently at the database level.

 

The downside to RAT is that you cannot change the application schema other than adding indexes or updating statistics.  Overall this is a great way to test your database upgrade, and you can roll back your test database and re-run the RAT testing multiple times.  This allows for iterative changes to the database configuration.

Simulating Application Workload

Another option is to do application testing.  In this scenario you will have a full test system that includes database and application tiers and a full setup of application software.  The basic idea is to set up a test environment that can be logged into and then create transactions or usage on the application.

 

There can be a lot of complexity with setting up another full environment with all the application components.  It can also be difficult to replicate enough hardware to achieve production-level performance or load.  A second challenge can be the actual testing.  It might be easy to have a user log in to the test environment and run a few application transactions.  This might give you a feel for some individual performance, but really can’t compare to having many simultaneous users.  A system with a single user accessing it will react very differently than a system with 100’s or even 1,000’s of active users.  In order to simulate those levels of workload you will need to use tools like LoadRunner or others that simulate end users running transactions through the application GUI or API.  Creating tests that simulate many active users can also be extremely complex and fall short of real workload patterns.

 

The high-level steps for application testing are very dependent on your testing process and environment.  You may have a test system with a full technology stack and redundant application servers, load balancers, web servers, and databases.  Or you may have a very simple setup just to perform basic smoke testing.

 

The plus side of this is that you can test both application changes and the database upgrade at the same time.  Though you may have a hard time during testing teasing out what might be due to an application change versus a database upgrade change.  So that might extend your tuning effort, though it can decrease the amount of testing work.

Manually running SQL

A third option is to just manually run some SQL statements in your test database.  This can be done by capturing the SQL from the production system using SQL history from ASH or AWR, or with a SQL Tuning Set (STS) if you are licensed.  SQL can also be captured using SQL or session tracing.  These statements can then be re-run in a test environment to see how they perform after the database has been modified or upgraded.

 

As you might already be guessing, this is very complex.  Finding the statements you want to re-run, possibly updating them, and then making sure the environment is set correctly for the SQL to work in the test environment can be a lot of effort.  If you have no other avenue for testing, this can allow you to pick out specific long-running or high-impact SQLs.  This is really a very minimal test, but it may show how the software changes affected the SQL performance profile.

 

The high-level steps for SQL tuning vary a little based on if you are using a tool like SQL Tuning Sets or manually capturing statements.  Either way it looks something like this:

  • Identify and capture the specific SQL in production (be sure to capture explain plans and run time statistics)
  • Have a test environment that is similar to production in data volume and configuration (or clone from production)
  • Upgrade the test system to 19c
  • Transfer the SQL statements to test (text files, STS, etc.; see the STS sketch after this list)
  • Run the statements in test and compare explain plans and run time statistics
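
If you are licensed for SQL Tuning Sets, a minimal sketch of the capture step might look like this (the STS name, snapshot range, and schema filter are all example values):

DECLARE
  l_cursor DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  DBMS_SQLTUNE.CREATE_SQLSET(sqlset_name => 'UPGRADE_TEST_STS');

  -- Load the STS from an AWR snapshot range, keeping only the application schema
  OPEN l_cursor FOR
    SELECT VALUE(p)
      FROM TABLE(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(
                   begin_snap   => 100,
                   end_snap     => 110,
                   basic_filter => 'parsing_schema_name = ''APPUSER''')) p;

  DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name     => 'UPGRADE_TEST_STS',
                           populate_cursor => l_cursor);
END;
/
-- The loaded STS can then be packed into a staging table with DBMS_SQLTUNE.PACK_STGTAB_SQLSET
-- and moved to the test system with Data Pump.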

This can be a bit time consuming without a tool like STS, which will do most of this for you in a few steps, or through a tool like Oracle Enterprise Manager (OEM).

 

The plus side of running individual SQL statements is the ability to look at them in depth.  You can get very detailed performance information by running traces or SQLT explain plans.  The negative side is that you won’t be able to get a multi-user view of the system.

Summary

All of this comes down to simulating the workload of the application in an upgraded database environment.  The goal is to reduce the anxiety or risk of doing your upgrade.  To be clear, none of this will fully eliminate that risk, though all of it will make you more prepared for that go-live date.  No matter which path you choose, this will give you more hands-on experience with the 19c upgrade process and show how your application performs differently, or the same, on a 19c database.