Tuesday, October 17, 2017

Oracle Exadata OneCommand - build virtual server April 2017

Many of you are probably very aware of Exadata at this point.  It's many years into its lifecycle (the seventh generation, X7, was just announced), and it's very prevalent in the Oracle database landscape.  I was recently asked to rebuild a few physical Exadata nodes as virtual nodes.  This is covered in MOS Note 2099488, "Migration of a Bare metal RAC cluster to an OVM RAC cluster on Exadata".

I wouldn't call this the best or worst written MOS note.  It contains four options for doing the rebuild.  No matter which option you're going to use, be sure to read all four.  Some options describe steps in more detail than others, and much of that background information is important.

Also, the Exadata build or deployment process is all based on OneCommand (or the Oracle Exadata Deployment Assistant / OEDA).  Make sure you go through all of the readme files and documentation for this tool as well.

Overview

Ok, so I'm not going to go through every step here, as there is a lot of background.  But in general, let’s get a quick outline of what we will be doing:

  1. Build a new configuration file in OEDA that represents the new build-out of the Exadata.
  2. Request any network / DNS changes needed for your system change (e.g., if you are adding more virtual servers or clusters).  Once those changes are complete, run the checkip script and verify the output is what you expect.
  3. Stage all the needed software, patches, and OneCommand tools for the build.  This list comes from the OEDA output.  Note: if you are rebuilding servers, be sure to keep copies of all these files off the Exadata's local storage, such as on an NFS mount or other shared storage that you can easily reach throughout the rebuild.
  4. Download the server build USB or PXE image files (these are listed in the additional readme for the QFSDP of the version you are installing).  Then stage these files on your PXE boot / NFS server, or create a USB thumb drive to boot from.
  5. Clean up the storage cells if needed.  This depends on what you are changing in your system configuration and whether you are keeping or destroying your current databases and data.
  6. Rebuild the database nodes using the images set up in step 4.  This sets up the DOM0 / Oracle VM host on the Exadata.  Be sure to use the serial console through the service processor (ILOM), not the GUI / Java-based console, as the latter will not work.
  7. Run the post-build steps: switch to the VM boot image, and reclaim free space by removing the physical Exadata OS image.
  8. Set up SSH equivalency between the DB nodes and storage nodes.
  9. Stage the OneCommand utility on the first node, along with the needed patches and software install media.  Be sure to unzip the KLONE gold images from the proper patch zip files.  This is outlined in the OneCommand readme file.
  10. Execute the needed OneCommand steps to build the virtual servers, create the OS users, and set up the cell connectivity.
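The checkip script in step 2 is the real verification tool.  As a rough supplement, here is a minimal Python 3 sketch that checks forward DNS lookups against the addresses you expect.  The host names and addresses in EXPECTED are hypothetical placeholders, so fill them in from your own OEDA configuration.  It only does forward lookups and does not do anything else checkip does.

```python
import socket

# Hypothetical host names and IPs; copy the real ones from your OEDA configuration.
EXPECTED = {
    "exadb01vm01.example.com": "10.10.10.21",
    "exadb02vm01.example.com": "10.10.10.22",
}


def check_dns(expected, resolve=socket.gethostbyname):
    """Return (host, expected_ip, actual) for every forward lookup that does not match."""
    problems = []
    for host, want in sorted(expected.items()):
        try:
            got = resolve(host)
        except OSError as exc:  # socket.gaierror is an OSError subclass in Python 3
            got = "lookup failed: {}".format(exc)
        if got != want:
            problems.append((host, want, got))
    return problems


# Usage (performs real DNS lookups):
#   for host, want, got in check_dns(EXPECTED):
#       print("MISMATCH", host, "expected", want, "got", got)
```

Note that socket.gethostbyname returns a single address, so a name with several A records is only checked against the first one returned.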

From there you can continue on to the cluster and database software install and a number of other post-build steps.  There are 17 OneCommand steps in all, and which ones you run will depend on your needs and what you are changing.

So, why write all this up?  Well, during my latest attempt at this work I ran into a few issues that I want to expand on here.  These are not all of the issues, but this one was particularly tricky, and I did not get any help from Oracle Support on it.

Issues

During step 10 above, I ran into at least three issues.  The OneCommand output was of no help.  While executing OneCommand step 2, "Create Virtual Machine", I received the following message:
"Error running oracle.onecommand.deploy.machines.VmUtils method createVMs"
There was slightly more information than that, but nothing of real value.

Digging through the log output, I found a reference to "Unable to locate file" for these two files:
db-klone-Linux-x86-64-12102170418.zip
grid-klone-Linux-x86-64-12102170418.zip

So, these two zip files are in the patches that OEDA / OneCommand asks you to download in the configuration file.  Buried in the OneCommand readme are the details on unzipping those two patches before running OneCommand.  In my case it was patches 25898234 and 25898235.  Simply unzipping these two patches let me move forward.  Or so I thought.
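Before each rerun, it can save a cycle to confirm that every file OneCommand complains about is actually in the work directory.  Here is a minimal Python 3 sketch.  It assumes you paste in the names from your own log, and the function name is hypothetical, not part of OneCommand.

```python
import os

# Names copied from the "Unable to locate file" messages in the OneCommand log.
EXPECTED_KLONE_FILES = [
    "db-klone-Linux-x86-64-12102170418.zip",
    "grid-klone-Linux-x86-64-12102170418.zip",
]


def missing_files(work_dir, names):
    """Return the names that are not regular files (or working symlinks) in work_dir."""
    return [name for name in names if not os.path.isfile(os.path.join(work_dir, name))]


# Usage: print(missing_files("/path/to/WorkDir", EXPECTED_KLONE_FILES))
```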

On the next run the log changed.  It still said "Unable to locate file", but the file names were different:
db-klone-Linux-x86-64-12102170814.zip
grid-klone-Linux-x86-64-12102170814.zip

See the issue?  The date stamps differ: 170418 versus 170814, with the 4 and the 8 transposed.  I couldn't find the second date stamp in any of the human-readable (XML or text) OneCommand configuration files.
So, I cheated and created a symbolic link from each existing file to the name OneCommand was asking for:
cd WorkDir
ln -s ./db-klone-Linux-x86-64-12102170418.zip ./db-klone-Linux-x86-64-12102170814.zip
ln -s ./grid-klone-Linux-x86-64-12102170418.zip ./grid-klone-Linux-x86-64-12102170814.zip
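If you have several mismatched names, the same workaround can be scripted.  This is a minimal Python 3 sketch of the two ln -s commands above.  It assumes the existing files and the expected aliases sit side by side in WorkDir, and it skips any alias that already exists rather than overwriting it.  The helper name is hypothetical.

```python
import os

# (file produced by unzipping the patch, name OneCommand asked for), from the log.
ALIASES = [
    ("db-klone-Linux-x86-64-12102170418.zip", "db-klone-Linux-x86-64-12102170814.zip"),
    ("grid-klone-Linux-x86-64-12102170418.zip", "grid-klone-Linux-x86-64-12102170814.zip"),
]


def link_aliases(work_dir, aliases):
    """Create alias -> existing symlinks inside work_dir and return the aliases created."""
    created = []
    for existing, alias in aliases:
        target = os.path.join(work_dir, existing)
        link = os.path.join(work_dir, alias)
        if not os.path.isfile(target):
            raise IOError("missing source file: " + target)
        if os.path.lexists(link):
            continue  # something is already there (even a broken link); leave it alone
        # Relative target, like "ln -s ./existing ./alias" run from inside WorkDir.
        os.symlink(existing, link)
        created.append(alias)
    return created


# Usage: link_aliases("/path/to/WorkDir", ALIASES)
```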

Now I was confident that run three would just work.  Unfortunately, it did not.
Same error message, still of no value:
"Error running oracle.onecommand.deploy.machines.VmUtils method createVMs"
Back to the log file I go.

In the log file, there is a section where the Java routine gets "Exception: null".  Just before this exception, the application is trying to get a list of system first boot images.  The last line referenced "System.first.boot.12.2.1.1.1.170419.img.bz2".  Hmm, that is the image file used to build the virtual machines.

Digging into this some more: for my build, the version of that image was April 2017, so the information above looked right.  I double-checked the patch for that image, patch number 25742355.  This information is in the additional readme for the April 2017 QFSDP for Exadata, and is also in the list from the OEDA Installation Template HTML output.

I also verified that the patch was in my WorkDir location and that the zip file was in good shape.  No issues there.
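Python's standard zipfile module can do both checks at once: it can confirm the zip is intact and show which first boot image it actually carries.  This is a minimal sketch.  It assumes the System.first.boot image is stored directly inside the patch zip, and it reads every member to verify CRCs, so expect it to take a while on a large patch.

```python
import zipfile


def inspect_patch_zip(path, prefix="System.first.boot"):
    """Return (intact, image_names) for a patch zip.

    intact is True when ZipFile.testzip() finds no member with a bad CRC;
    image_names lists members whose file name starts with prefix.
    """
    with zipfile.ZipFile(path) as zf:
        first_bad = zf.testzip()  # name of the first corrupt member, or None
        images = [n for n in zf.namelist()
                  if n.rsplit("/", 1)[-1].startswith(prefix)]
    return first_bad is None, images


# Usage: print(inspect_patch_zip("/path/to/WorkDir/p25742355_122111_Linux-x86-64.zip"))
```

You can then compare the date stamp in the image name it reports against what the OneCommand configuration expects.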

Next, I dug into the OneCommand configuration files.  In the properties directory there is an es.properties file that contains all these patch file names and versions.  There is a section that covers the VM first boot images.  Going through the list, I found this line:
12.2.1.1.1,System.first.boot.12.2.1.1.1.170323.img.bz2,12.2.1.1.1, \
  p25742355_122111_Linux-x86-64.zip,12.2.1.1.1.170323:\

Ah, now I see the issue.  Again, a data mismatch between two parts of OneCommand.  The properties file still lists the 170323 image, while the log shows the 170419 image from the patch.  Clearly Oracle updated the patch but didn't update OneCommand in all the right places.  I moved that line out of the list and commented it out.  Then I added the following line:
12.2.1.1.1,System.first.boot.12.2.1.1.1.170419.img.bz2,12.2.1.1.1, \
  p25742355_122111_Linux-x86-64.zip,12.2.1.1.1.170419:\

Be sure to watch where in the list you make edits, and watch the colons and backslashes so you do not corrupt the data array.
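If you would rather script the change than hand-edit the array, here is one way to do it.  It differs from what I did: instead of commenting out the old entry and adding a new one, it swaps the date stamp in place and keeps a .bak copy of the file.  It assumes the entry looks exactly like the lines above.  It replaces every occurrence of the old stamp for that version, so diff the result before running OneCommand again.  The function names are hypothetical.

```python
import shutil


def update_first_boot_stamp(text, old_stamp, new_stamp, version="12.2.1.1.1"):
    """Swap old_stamp for new_stamp in the first boot image fields for one version.

    Only the image file name and the trailing "<version>.<stamp>:" field change,
    so the commas, colons, and continuation backslashes stay exactly as they were.
    """
    old_img = "System.first.boot.{0}.{1}.img.bz2".format(version, old_stamp)
    new_img = "System.first.boot.{0}.{1}.img.bz2".format(version, new_stamp)
    if old_img not in text:
        raise ValueError(old_img + " not found; nothing changed")
    text = text.replace(old_img, new_img)
    return text.replace("{0}.{1}:".format(version, old_stamp),
                        "{0}.{1}:".format(version, new_stamp))


def patch_properties_file(path, old_stamp, new_stamp):
    """Copy path to path + ".bak", then write the updated text back to path."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(update_first_boot_stamp(text, old_stamp, new_stamp))


# Usage: patch_properties_file("/path/to/properties/es.properties", "170323", "170419")
```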

Conclusion 

Now the build of the VMs continued normally and I was able to proceed.
What a messy situation, with Oracle just not keeping everything in sync.  It's clear that OneCommand knew what it was looking for in one way (from the log file), but the reality in the configuration files was slightly different.  It seems like a bit of a house of cards, with too many moving parts.

I'll leave this story at this point.  Hopefully this helps someone out there that may be running into the same issues with Exadata OneCommand.

Friday, July 21, 2017

Python - cx_Oracle - Mac OS X Sierra - Oracle client - DPI-1047 libclntsh.dylib cannot be loaded

Python - cx_Oracle - Mac OS X Sierra - Oracle drivers

Modern development life seems so easy.  Just grab a few libraries or APIs, run a few quick installs, and everything works.  Well, in this case, everything didn't just work.  Most of my frustration with this issue came from old documentation or just plain lack of documentation.

I've been working in enterprises for 25 years and have been in the middle of a ton of "it doesn't work" conversations.  So, digging in and knowing why is pretty much my nature, and I really don't like ambiguity.  Phrases like "the software just sucks", "something magic happens", or "it just isn't right" don't sit well with me.

So here is a quick dive into a recent attempt to connect from Python to an Oracle database on my MacBook Pro running OS X Sierra.


To start with, this is for those of you doing Python development that accesses an Oracle database, using OS X as your development platform.  This may be different from your destination platform (Linux, etc.).

Setup

Ok so let's get started, what do you need? Should be three simple things:
  1. Python - a "good" version is pre-installed on the Mac, so that is what I started with
  2. Oracle driver - Oracle Instant Client for Mac on Oracle TechNet (I'm using 12c)
  3. cx_Oracle - Python extension for using Oracle Database (I downloaded the source and built it locally.  You will need Xcode installed to do this)
Ok, so again a few simple steps, install the Oracle client as outlined on Oracle TechNet.  Note I used the $HOME/instantclient_12_1 folder.  If you follow all the instructions you will also have a $HOME/lib directory with most of the same files installed.  This second directory is for non-Oracle software to find the driver, or at least that is the theory (based on a lot of other forum postings and a few blogs).

Then I built the cx_Oracle extension:
python setup.py build
sudo python setup.py install
Note: second line has to be run with sudo to allow the install to put the library (egg file) into a system folder.

Finally a simple test and we should be all set.  Right?
python -c "import cx_Oracle; print cx_Oracle.version"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "build/bdist.macosx-10.12-intel/egg/cx_Oracle.py", line 7, in <module>
  File "build/bdist.macosx-10.12-intel/egg/cx_Oracle.py", line 6, in __bootstrap__
cx_Oracle.DatabaseError: DPI-1047: Oracle Client library cannot be loaded: dlopen(libclntsh.dylib, 1): image not found. See https://oracle.github.io/odpi/doc/installation.html for help
Poof, or maybe I should say "Boom".  Well that didn't work.
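One way to tell "the library is broken" apart from "the library can't be found" is to ask the dynamic loader for it directly with Python's ctypes.  First try the full path, then the bare name, which is how the error message above shows it being requested.  This minimal diagnostic sketch assumes the Instant Client lives in $HOME/instantclient_12_1 as in this post.  It is written so it also runs under the system Python 2.7.

```python
import ctypes
import os


def can_load(library):
    """Try to dlopen() library via ctypes; return (loaded, message)."""
    try:
        ctypes.CDLL(library)
    except OSError as exc:  # ctypes raises OSError when dlopen() fails
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    full_path = os.path.join(os.path.expanduser("~"), "instantclient_12_1", "libclntsh.dylib")
    print(can_load(full_path))          # does the file itself load?
    print(can_load("libclntsh.dylib"))  # can the loader find it by name alone?
```

If the first call succeeds and the second fails, the client library itself is fine and the problem is purely how it is being located.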

Wild goose chase

Ok, so jump into Google and start searching for answers.  This is where things go south pretty fast.  There are a number of references to this issue, but nobody is really pointing in the right direction, or at minimum all the answers are very dated.

The normal answer is "you need to set an environment variable" so the correct libraries can be found.  From a legacy perspective, these would be LD_LIBRARY_PATH and DYLD_LIBRARY_PATH, except these no longer work on a modern OS like Sierra, which again is not well documented.  You can spend a lot of time digging, but the short answer is that Python does not see these variables even if you set them.
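A quick way to see what the interpreter actually receives is to print those variables from inside Python.  This is a minimal sketch.  Save it under any name (show_loader_env.py is just a placeholder), export the variable in your shell, and run the script with the same python you use for cx_Oracle.  If a variable you exported prints as None, that interpreter never received it.  It is written so it also runs under the system Python 2.7.

```python
import os

# The two variables the older forum answers suggest setting.
LOADER_VARS = ("LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH")


def visible_loader_vars(environ=None):
    """Return {name: value or None} for each loader variable this process can see."""
    if environ is None:
        environ = os.environ
    return dict((name, environ.get(name)) for name in LOADER_VARS)


if __name__ == "__main__":
    seen = visible_loader_vars()
    for name in LOADER_VARS:
        print("%s=%r" % (name, seen[name]))
```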

This then leads down another rabbit hole.  Mac OS X has System Integrity Protection (SIP), which is intended to help make sure applications do not do inappropriate things.  I'm not going to get into a lot of detail, but in short applications have to be configured at creation (link) time as to what is allowed to be called or pulled in (libraries).  I couldn't find any official Apple documents on this, but in general this does back up the above comment that you can't just set an environment variable and your program will load a somewhat random binary library.

Ok, so then you get pulled into another set of solutions.  Basically, they all say that if you put your libraries under /usr/local/lib then SIP will allow them to be loaded.  There is another set of postings that will suggest $HOME/lib is a safe zone also for SIP.  Again, I couldn't find any Apple document that stated this, nor did anyone give a lot of details other than "it worked for me".  Well it didn't work for me.

Time for the next rabbit hole, which is basically a number of postings that say, "built in Python on OS X sucks".  Well isn't that interesting.  Ok, well that might be a statement of opinion, but it doesn't provide any details.  Again, just not in my nature.

You will find postings that suggest you disable SIP.  I personally don't think this is a good idea.  After a lot of years of IT experience, I've found it's best to work with security, not around it.

Finding the answer

Ok, so here is the solution I found.  I'm sure this is not the only solution, but it did work well for me.

Make sure you have your Oracle instant client installed in a good location.  I really don't think the exact location matters, use what works for you as long as you're consistent.  I stuck with the Oracle directions and used $HOME/instantclient_12_1 for this case.

I then went back to the cx_Oracle source and did the build again with one minor change:
python setup.py build
install_name_tool -add_rpath $HOME/instantclient_12_1 ./build/lib.macosx-10.12-intel-2.7/cx_Oracle.so
sudo python setup.py install
Ok, so what did I just do?  Well, I updated the cx_Oracle.so header to include a new path (an LC_RPATH entry) for locating libraries at run time.  You can read more about RPATH in the dyld and install_name_tool man pages.  This could also be done at link time, but that is inside the setup.py process, and I didn't want to dig into that.
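The build directory name (lib.macosx-10.12-intel-2.7 here) depends on your OS and Python versions.  If you repeat this across machines, it can help to find the .so with a glob rather than hard-coding the path.  This is a minimal sketch that assumes you run it from the cx_Oracle source directory after python setup.py build.  It only prints the install_name_tool commands unless you pass dry_run=False, and the function names are hypothetical.  Adding an rpath that is already present will likely make install_name_tool fail, so run it once per fresh build.

```python
import glob
import os
import subprocess


def add_rpath_commands(build_dir, rpath):
    """One install_name_tool command per cx_Oracle*.so found under build_dir/lib.*."""
    pattern = os.path.join(build_dir, "lib.*", "cx_Oracle*.so")
    return [["install_name_tool", "-add_rpath", rpath, so]
            for so in sorted(glob.glob(pattern))]


def add_rpath(build_dir, rpath, dry_run=True):
    """Print each command; also run it when dry_run is False (stops on the first error)."""
    commands = add_rpath_commands(build_dir, rpath)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
    return commands


# Usage, from the cx_Oracle source directory:
#   add_rpath("build", os.path.expanduser("~/instantclient_12_1"), dry_run=False)
```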

You should now have a working cx_Oracle driver, and you can use the Apple provided Python.

Further digging / background

With these changes, when the install is run, the Python egg is created.  This second time it includes my slightly modified library file.  I can verify this in two ways, both using the otool command.

First we can check the .so file that is created during the build process (run this while in the cx_Oracle source directory).
otool -l ./build/lib.macosx-10.12-intel-2.7/cx_Oracle.so |grep -A 4 -B 1 RPATH
It should return something like this:
Load command 12
          cmd LC_RPATH
      cmdsize 48
         path /Users/ggordham/instantclient_12_1 (offset 12)
The load command number might be different from 12 for your install, or in future or past versions.  It is just the index of that load command in the header.
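If you check more than one file, grepping with -A and -B gets tedious.  This is a minimal sketch that runs otool -l and pulls out just the LC_RPATH paths.  It assumes the output layout shown above, where a cmd line is followed later by a path line, and like otool itself it only works on macOS.  It is written to run under the system Python 2.7 as well.

```python
import subprocess


def parse_rpaths(otool_output):
    """Return the path of every LC_RPATH load command in `otool -l` text."""
    rpaths = []
    in_rpath = False
    for line in otool_output.splitlines():
        fields = line.split()
        if fields[:1] == ["cmd"]:
            # A new load command starts; remember whether it is an LC_RPATH.
            in_rpath = fields[1:2] == ["LC_RPATH"]
        elif in_rpath and fields[:1] == ["path"]:
            # "path /Users/me/instantclient_12_1 (offset 12)" -> "/Users/me/instantclient_12_1"
            value = line.split("path", 1)[1].rsplit("(offset", 1)[0]
            rpaths.append(value.strip())
            in_rpath = False
    return rpaths


def rpaths_of(binary):
    """Run `otool -l` on binary (macOS only) and return its LC_RPATH entries."""
    output = subprocess.check_output(["otool", "-l", binary])
    return parse_rpaths(output.decode("utf-8", "replace"))
```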

The second way to check is after you import cx_Oracle in Python.  When you do that, the egg file is opened and the .so file is extracted to a temporary directory in your home directory.
So, let's do a quick test on cx_Oracle first (Note, be sure to change out of the source directory for cx_Oracle before trying this):
python -c "import cx_Oracle; print cx_Oracle.version"
6.0rc1
Now we have a temporary copy of the .so file here:
$HOME/.python-eggs/cx_Oracle-6.0rc1-py2.7-macosx-10.12-intel.egg-tmp/cx_Oracle.so
So, we can do the same test on the "run time" version of the library
otool -l $HOME/.python-eggs/cx_Oracle-6.0rc1-py2.7-macosx-10.12-intel.egg-tmp/cx_Oracle.so | grep -A 4 -B 1 RPATH
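If you have installed more than one cx_Oracle build, there can be several extracted copies sitting in that cache.  This minimal sketch lists them so you can run the same otool check on each.  It assumes the default ~/.python-eggs location, and it uses the PYTHON_EGG_CACHE environment variable instead when that is set, since setuptools honors it.  The function name is hypothetical.

```python
import glob
import os


def egg_tmp_copies(cache_dir=None):
    """List extracted cx_Oracle .so files in the egg cache, sorted by path."""
    if cache_dir is None:
        cache_dir = os.environ.get("PYTHON_EGG_CACHE") or os.path.join(
            os.path.expanduser("~"), ".python-eggs")
    pattern = os.path.join(cache_dir, "cx_Oracle-*.egg-tmp", "cx_Oracle*.so")
    return sorted(glob.glob(pattern))


# Usage: for so in egg_tmp_copies(): print(so)
```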

That's it, hope this helps.  I've already let the cx_Oracle developers know of this information.  Not sure if it will end up changing anything.  At a minimum, the documents should show what works and what doesn't.

Gary

Wednesday, March 15, 2017

Oracle Database - inside looking out #C17LV

It's March of 2017 and the years continue to click by at an increasing rate.
Oracle just made available the second major release of its 12c database product.  Yes, it was available in the Oracle cloud last fall, but with 420,000 customers I'm sure there are still a few of us waiting for the "run it on your own hardware" release.

This is all very timely; learning new Oracle technology has been a lifelong journey for me.  My first Oracle database was version 6, which required me to sit with a consultant for a few weeks to learn the details of how to manage it and how to find what I needed in the dozen or so printed manuals.

Well, it's 26 years later, and things sure do change, but they also stay the same.  With Google, My Oracle Support, and countless blogs, we still search for truth.  Most of the time this means talking to a trusted, and hopefully knowledgeable, expert.  That discussion might be virtual, but boy is a lot lost.  I believe the statistic thrown around is something like 93% of communication is non-verbal.

For the second year in a row I'm the conference chair for IOUG at COLLABORATE 17.  This is the number one technical conference for people in the Oracle technology business, and has been for the past 17 years (having changed names just a little).  The IOUG again brings a lot of DBA centric sessions and unique opportunities:

  • Sunday pre-conference workshops on Oracle 12c new features, Cloud DBA, and SQL tuning.
  • Cloud workshop during the week with Oracle - learn how to get your Oracle system in the cloud
  • OakTable world - a mini conference in a conference put on by OakTable Oracle scientists 
  • Hands on labs throughout the week - Oracle 12c database upgrade, Apache Hadoop, Oracle Database 12c in-memory, Oracle Database 12c multitenant
  • In person networking opportunities - meet fellow professionals working on the same technical challenges
  • 220+ technical sessions, quick tips, and hands on labs about Oracle Database, Development, Engineered Systems, OEM, and more.
This year I'll be presenting four times.
  • Session 333 - Oracle and NLS - a detailed look at how data is stored in Oracle databases, and why you should always be using international character sets correctly.
    Add to your schedule: Thursday, Apr 06, 2017 (09:45 AM - 10:45 AM) Palm D 
  • Session 352 - DBA 201: Database Listener registration - a quick tip detailing how the Oracle Listener becomes aware of databases, especially important for multiple networked systems like Exadata.
    Add to your schedule: Monday, Apr 03, 2017 (12:00 PM - 12:30 PM) Palm C
  • Session 196 - Cloud DBA transformation, futuristic TomorrowLand, or desolate wasteland - I'm co-presenting with another great speaker, Jim Czuprynski, where we will be musing on the future of DBAs in the cloud based on 50+ years of combined experience.  Be sure to check out his blog.
    Add to your schedule: Tuesday, Apr 04, 2017 (04:15 PM - 05:15 PM) Banyan C
  • Session 10183 - Essential Skills for the EBS DBA: past and present, the future is Cloudy - Again I'm co-presenting, this time with Jeffrey Weiss about what skills will make you a better E-Business Suite Applications DBA, along with the OAUG group.  This type of cross over could only happen at COLLABORATE.
    Add to your schedule: Tuesday, Apr 04, 2017 (11:00 AM) Breakers I

Note: all times are PDT local to Las Vegas.
For those that can't make it out to Las Vegas to be in person, IOUG offers a virtual conference pass. This allows remote attendees to participate in over 30 of the technical sessions. 

Jim and I will have some special buttons for those who find us at the conference.

The future is in your hands, make sure you have the tools, and network of people to help you get there.  That's my view, from the inside of the database looking out.

If you see me in Vegas, say HI! #C17LV
Keep in touch through twitter @ggordham, blogging, or LinkedIn

Gary