Monday, June 11, 2018

Oracle RAC + ASM + NFS home lab

Many people look for ways to emulate enterprise hardware in their home labs in order to get good hands on experience with Oracle products.  One of the harder ones to get working is Oracle RAC due to the need of multiple networks and shared storage.

With the plethora of virtualization software out now, this has gotten much easier.  For my job I needed a two node RAC setup to do some testing of DBCA (Database Creation Assistant) scripts.  I've done 100's of RAC installs, so other than the time, I didn't foresee any issues.  Well, as usual, what can go wrong will go wrong.

Basic RAC Setup

So lets jump to the basics.  For RAC I needed a public network, private network, and shared storage.  For the network pieces I also needed additional virtual IP's for each node as well as a set of three SCAN IP address.

I setup a new virtual private network for my servers.  I added all the new IP's to my DNS server.  Things were going well.  I then did the trick to create shared ASM disks on a shared NFS mount:

dd if=/dev/zero of=/u03/oradata/asm_dsk1 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk2 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk3 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk4 bs=1k count=10000000
dd if=/dev/zero of=/u03/oradata/asm_dsk5 bs=1k count=10000000

I then changed the ownership:
chown grid:asmdba /u03/oradata/asm-dsk?
chmod 660 /u03/oradata/asm-dsk?


That second command, to change the permissions is what I missed.  This was somewhat critical.  It also speaks to a problem I've seen many times over.  How to deal with Oracle access issue when you have role separation setup.  The is particularly common when running Oracle E-Business suite due to the number of assumptions made by the Oracle procedures for cloning an EBS instance.

Troubleshooting Role separation issues.


As mentioned I've done this many times, and the only way to effectively troubleshoot these issues is to methodically go step by step through the setup of Oracle and Grid and ASM.  There are a few notes for help on this in MOS, but not a lot:

  • UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA (Doc ID 67984.1)
  • ORA­15183 Unable to Create Database on Server using 11.2 ASM and Grid Infrastructure (Doc ID 1054033.1)
  • Database Creation on 11.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)
  • How To Recompile config.c / Relink Executables Of A Grid Infrastructure Home (Cluster) (Doc ID 1637766.1)
  • Connect as SYSDBA on 11.2 Cloned Home Gives "ORA-1031: Insufficient Privileges" Error (Doc ID 1061788.1)

These all sound misleading, but lets go through the issue I faced.

The Issue

I had setup all my hardware and ASM raw disks.  After completing the GRID INFRASTRUCTURE installation, things looked good.  ASM was up and running, and the ASM diskgroup (DATA) looked fine.

I then went on to install the database home, and try to create my RAC database with DBCA.  That is when the trouble started.  During the initial create database statement I would get:
ORA-00200: control file could not be created
ORA-00202: control file: '+DATA'
ORA-15045: ASM file name '+DATA' is not in reference form
ORA-17502: ksfdcre:5 Failed to create file +DATA
ORA-27091: unable to queue I/O
ORA-27041: unable to open file


I jumped into the database alert log and DBCA logs, and garnered the following additional error:

Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-1501 signalled during: CREATE DATABASE "mydb2"

Ok, so time to go back and check everything. For role separation to work there is a OS group that is used to allow for the oracle and grid users to have permissions on the ASM disks.  In my case that group was supposed to be asmadmin.  Somewhere during the install I must have picked the wrong item in a pulldown box, or didn't pay attention to a fixup script that ran.  Either way I ended up with a difference.  So here is a list of steps to check that everything is right for role separation.  You need to check every item.

Role Separation Checklist

You should shutdown CRS before making any changes / updating any of these items.  As the root user run on all nodes: 
$GRID_HOME/bin/crsctl stop crs
  1. Check the user id and group membership for the grid user:
    id grid
    (note the id number and group names, numbers the user is in)
  2. Do the same check for the oracle user:
    id oracle
  3. Make a list of the groups you are using for each role in oracle (ASM access, ASM super user, DB DBA, DB OPER, etc..)
  4. As the grid user, check the configuration for the Grid infrastructure ($GRID_HOME/rdbms/lib/conf.c)
    #define SS_DBA_GRP "asmdba"
    #define SS_OPER_GRP "asmoper"
    #define SS_ASM_GRP "asmadmin"

    If changes are needed, update the config.c and re-link the oracle binary
    cd $GRID_HOME/rdbms/lib
    make -f ins_rdbms.mk ioracle
  5. As the oracle user do the same check in the oracle database home:
    ($ORACLE_HOME/rdbms/lib/conf.c)
    #define SS_DBA_GRP "dba"
    #define SS_OPER_GRP "oper"
    #define SS_ASM_GRP "asmadmin"

    (note that the SS_ASM_GRP has to match the grid home setting)
    If you need to make changes, again re-link the oracle binary
    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk ioracle
  6. Check that the oracle binary in the grid home has the right permissions and ownership. The user, group ownership are critical.  Also the sitcky bit has to be set for execution:
    ls -l $GRID_HOME/bin/oracle
    -rwsr-s--x. 1 grid asmadmin 291225032 Jun  6 10:15 /u01/app/12.1.0/grid/bin/oracle

    If this is not correct then as the root user run:
    $GRID_HOME/bin/setasmgidwrap $GRID_HOME/bin/oracle
  7. Do the same check for the database home oracle binary (again pay close attention to the user and group ownership as well as the sticky bit on the execution flag):
    ls -l $ORACLE_HOME/bin/oracle
    -rwsr-s--x. 1 oracle asmadmin 323613264 Jun  7 16:08 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle

    If this is not correct, then as the root user run:
    $GRID_HOME/bin/setasmgidwrap $ORACLE_HOME/bin/oracle
  8. Check the ownership / permissions on the ASM source disks (in my case this was the raw files I had created on the shared NFS mount):
    ls -l /u03/oradata/asm*
    -rw-rw----. 1 grid asmadmin 10240000000 Jun 11  2018 /u03/oradata/asm_dsk1

    (Again note the user and group ownership, as well as file permissions.  I was missing the write option on the group)
  9. Check NFS mount settings.  Since I'm on a home lab using NFS, this is also important.  If your in a SAN or Exadata situation, you shouldn't have to check this. 
    I use the following settings, which are based on Oracle's recommended settings with the addition of "_netdev" which tells the startup scripts to wait until the network is started before trying to mount this specific mountpoint:
    rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0,_netdev
You can now reboot or restart CRS (as the root user run $GRID_HOME/bin/crsctl start crs).

That should take care of permission issues when working with role separation.  It's important to go through every step, and verify everything.  I spent a few hours repeating these steps over a few times, having missed one thing or another.  So even with years of experience it's easy to miss a step.  What should have taken me about 30 minutes to resolve, ended up taking about 3 hours due to this.

Not the end of the world, but really should have been caught sooner.
Gary

No comments:

Post a Comment