Thursday, May 16, 2024

Serious issue - server crash - Oracle database 19c and Oracle Enterprise Linux 8 Update 8 and above

 

I'm a tad behind on this, but I think it's a critical issue. This may repeat something you have already heard or read about; I just want to make sure the information gets out.

There is a serious issue with Oracle database 19c running on Oracle Enterprise Linux 8 Update 8 and above. The issue starts with database 19.21 (October 2023 RU) and later when running on OEL 8 with UEK7U1 or later (the 5.15.0-10* kernel, released in April 2023). If your system uses HUGEPAGES and has the ASM Filter Driver (AFD) or ASM Cluster File System (ACFS) kernel modules installed, this combination can lead to a server crash (Linux kernel panic).
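
If you want to sweep a fleet for the at-risk combination, here is a minimal Python sketch. It encodes only the conditions described above (UEK7U1 or later, a 19.21 or 19.22 GI/ASM home, HUGEPAGES configured, AFD/ACFS modules loaded). The function names, the kernel module names, and the rule that base UEK7 release numbers sit below 100 while UEK7U1+ use 100 and up (inferred from the 5.15.0-3.xxx, 5.15.0-10* and 5.15.0-201.x strings in this post) are all my assumptions. Treat it as a rough filter, not a substitute for the MOS note.

```python
from __future__ import annotations

import re

# Hypothetical sketch: encodes only the at-risk combination described in this
# post. Verify against MOS Doc ID 2998947.1 before acting on the result.

# Assumed names of the Oracle ACFS/ADVM/OKS and AFD kernel modules.
ORACLE_MODULES = {"oracleacfs", "oracleadvm", "oracleoks", "oracleafd"}


def uek7_update_level(kernel_release: str) -> int | None:
    """Return 0 for base UEK7, 1 or more for UEK7U1 and later, None otherwise.

    Assumption: UEK7 releases look like 5.15.0-<n>.<...>, with n below 100 on
    base UEK7 (5.15.0-3.xxx) and n of 100 or more on the updates (5.15.0-10x
    for U1, 5.15.0-201.x for U2).
    """
    m = re.match(r"5\.15\.0-(\d+)\.", kernel_release)
    if m is None:
        return None
    return int(m.group(1)) // 100


def hugepages_configured(meminfo_text: str) -> bool:
    # /proc/meminfo contains a line such as "HugePages_Total:    1024".
    m = re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.MULTILINE)
    return m is not None and int(m.group(1)) > 0


def oracle_modules_loaded(modules_text: str) -> set[str]:
    # /proc/modules lists one module per line; the first field is its name.
    names = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return names & ORACLE_MODULES


def at_risk(kernel_release: str, gi_ru: tuple[int, int],
            meminfo_text: str, modules_text: str) -> bool:
    # gi_ru is the GI/ASM home RU as a tuple, e.g. (19, 22). 19.21 and 19.22
    # on UEK7U1+ are the affected window described here; 19.23 has the fix.
    # One-off patches are not taken into account.
    level = uek7_update_level(kernel_release)
    return (
        level is not None
        and level >= 1
        and (19, 21) <= gi_ru < (19, 23)
        and hugepages_configured(meminfo_text)
        and bool(oracle_modules_loaded(modules_text))
    )
```

On a live host you could feed it `platform.release()` (the same string as `uname -r`), the contents of `/proc/meminfo` and `/proc/modules`, and the RU that `opatch lspatches` reports for the GI home.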

We were able to re-create the issue consistently when running an RMAN restore. To me this is especially critical because it would affect any large database system (one using HUGEPAGES and ASM together).

See the following MOS document:
ACFS Kernel Panic (RIP: ofs_teardown_diop_osd) After Updating OS Kernel to UEK7U1 or Later (Doc ID 2998947.1)

Or Mike Dietrich's blog:

Can you use the UEK7 Linux kernel, or may you get some trouble?

The fix for this problem? You have two basic solutions:

  • Stay on UEK7 (5.15.0-3.xxx) until you can upgrade your GI / ASM home and kernel modules to 19.23 (April 2024 RU)
  • Apply the April 2024 RU (19.23) to your GI / ASM home and make sure the ACFS / AFD kernel modules are updated (this is done automatically)

There is also a one-off patch, but it seems to be for a specific configuration / combination. If you are on kernel version 5.15.0-201.135.6 and running 19.22 ASM / GI (Jan 2024 RU), there is a patch that can be applied:
     Patch 35983839 - UEK7U2 SUPPORT FOR ACFS, AFD (V5.15.0-201.135.6)
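
To make the choice between these options explicit, here is a small Python sketch of the decision as I read it. The patch number and the 5.15.0-201.135.6 kernel come from this post; the function names, the returned wording, the assumption that kernel release strings end in ".el8uek.<arch>", and the rule that UEK7U1+ release numbers start at 100 are my own illustration. Always confirm the actual fix path with MOS and the certification matrix.

```python
from __future__ import annotations

import re

# Hypothetical helper; only the patch number and kernel version are from the post.
ONE_OFF_KERNEL = "5.15.0-201.135.6"
ONE_OFF_PATCH = "35983839"


def is_uek7u1_or_later(kernel_release: str) -> bool:
    # Assumption: base UEK7 is 5.15.0-<n>.x with n < 100; UEK7U1+ uses n >= 100.
    m = re.match(r"5\.15\.0-(\d+)\.", kernel_release)
    return m is not None and int(m.group(1)) >= 100


def remediation(kernel_release: str, gi_ru: tuple[int, int]) -> str:
    # gi_ru is the GI/ASM home RU as a tuple, e.g. (19, 22).
    if not is_uek7u1_or_later(kernel_release) or gi_ru >= (19, 23):
        return "ok: base UEK7 kernel, or GI/ASM home already on 19.23+"
    # Assumes release strings look like "5.15.0-201.135.6.el8uek.x86_64".
    base = kernel_release.split(".el")[0]
    if gi_ru == (19, 22) and base == ONE_OFF_KERNEL:
        return f"apply one-off patch {ONE_OFF_PATCH}, or move GI/ASM to 19.23+"
    return "apply the 19.23 (April 2024) RU or later to GI/ASM, or stay on base UEK7"
```

The exact-match comparison on the kernel base is deliberate: the one-off patch is tied to a single kernel build, so any other UEK7U1+ kernel falls through to the 19.23 recommendation.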

Because Linux kernels are updated frequently and corporate security teams push for frequent patching, I would lean towards taking the 19.23 (April 2024) RU, which the support matrix lists as the minimum fully supported version for UEK7U2.


The issue seems to have been first acknowledged in November 2023. I'm a little behind because I always recommend that enterprise customers use an N-1 patching methodology. So here we are in Q2 2024, hitting an issue from Q1 2024 as we take up an October 2023 Linux kernel with a January 2024 database patch. The good news is that the fix is readily available at this point, so our client can move forward instead of backwards.

Hopefully this gives you enough information to apply the fix preemptively rather than having systems crash.

One final reference, a good blog on the OEL kernel release names and dates:


Gary
