Monday, March 23, 2015

Exalogic Virtual Control Stack Backup and Oracle Enterprise Manager (OEM)

I think our group is an early adopter of the Oracle Exalogic Elastic Cloud, specifically the virtual edition.  I don't believe we are the only ones running this product, not by a long shot, but I'm not sure many large corporations are using it yet for critical application workloads.  That said, it has been working well for us.

Today I'd like to talk about one specific item for the Exalogic.  In a virtual configuration there is a set of virtual servers (or vServers, as Oracle calls them) that runs the management interface for the virtualization.  Oracle calls this the Control Stack.  The Control Stack is made up of three vServers that host an install of Oracle Virtual Machine Manager (OVMM) and Oracle Enterprise Manager Operations Center (EMOC).  These systems are critical to operations: they are used to monitor all your hardware and vServers, to create and destroy vServers, and most importantly to start or stop any vServer.

As you might guess from this description, you need to back up your control stack, as it contains all the information about your virtual infrastructure.  If you were to lose your control stack, you would effectively lose all your vServers.  The data that the vServers contain is stored somewhere else, but all of the information about your vServers (the names, IP addresses, which drives are attached to which servers, etc.) is in the control stack.  Without this, you would have a hard time finding anything.

Oracle provides a utility for backups in Exalogic virtual called ExaBR.  It is a pretty full-featured, if young, tool.  One of its options is to back up the control stack.  For the backup to work, the control stack has to be shut down.  This does not impact your running system: all the running vServers stay running, and only your ability to do management is affected for the duration of the backup.  In our experience this backup is also very fast, maybe a minute or two at most.  There are three main commands to the backup:
exabr stop control-stack -r /backupdir 
exabr backup control-stack -r /backupdir
exabr start control-stack -r /backupdir
Pretty straightforward.  Now, as the title of this blog shows, we have a second piece of software to talk about: Oracle Enterprise Manager Cloud Control (OEMCC).  We (like a lot of customers) use OEMCC to monitor and manage our Oracle landscape, and Exalogic is no different.  In a normal Exalogic Virtual deployment that uses OEMCC, you install an OEM Agent on the first server of the Control Stack (generally called the admin server), which hosts the OVMM and EMOC web servers.  This agent monitors all of the Exalogic components.

Now comes the rub, as they say.  When you start your control stack backup, the exabr stop control-stack command shuts down this server, and therefore the agent for OEMCC.  This then generates alerts for your entire Exalogic cloud.  As you might guess, in a large cloud you will get hundreds of alerts.  The answer, of course, is to create a blackout in OEMCC.  Here is our method of doing this.  We added two lines to the above code:
ssh oracle@scand01adm01 /u01/app/EMbase/core/12.1.0.3.0/bin/emctl start blackout \"ExaBR stop control stack for backup\" -nodelevel 
exabr stop control-stack -r /backupdir
exabr backup control-stack -r /backupdir
exabr start control-stack -r /backupdir
ssh oracle@scand01adm01 /u01/app/EMbase/core/12.1.0.3.0/bin/emctl stop blackout \"ExaBR stop control stack for backup\" -nodelevel
This is all scripted so that whenever we call a backup we get both the blackout and the backup.  We had to set up SSH keys for the user that runs the OEMCC agent (probably the oracle user in most installations) so that the SSH call works from the compute node where exabr runs.
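
For anyone who wants a starting point, here is a minimal sketch of how such a wrapper could look.  It is not our production script.  The exabr and emctl commands are copied from the lines above; the function names, the environment variable overrides, the RUN dry-run hook, and the "--run" argument are all assumptions made up for illustration.  The one design choice worth noting is that it still tries to restart the control stack and end the blackout when the backup step fails.

```shell
#!/usr/bin/env bash
# Sketch only: wrap the ExaBR control stack backup in an OEMCC blackout.
# Host, paths, and blackout name are taken from the post; everything else
# (variable names, RUN hook, --run argument) is hypothetical.

RUN=${RUN:-}    # hypothetical hook: RUN=echo prints commands instead of running them
AGENT_HOST=${AGENT_HOST:-oracle@scand01adm01}
EMCTL=${EMCTL:-/u01/app/EMbase/core/12.1.0.3.0/bin/emctl}
BACKUP_DIR=${BACKUP_DIR:-/backupdir}
BLACKOUT='ExaBR stop control stack for backup'

# $1 is "start" or "stop"; runs emctl on the agent host over SSH.
blackout() {
    $RUN ssh "$AGENT_HOST" "$EMCTL $1 blackout \"$BLACKOUT\" -nodelevel"
}

backup_control_stack() {
    local rc=0
    # Without a blackout the backup would flood OEMCC with alerts, so give up early.
    blackout start || return 1
    # Stop, then back up; remember the first failure instead of exiting.
    $RUN exabr stop control-stack -r "$BACKUP_DIR" &&
        $RUN exabr backup control-stack -r "$BACKUP_DIR" || rc=$?
    # Always try to bring the control stack back and end the blackout.
    $RUN exabr start control-stack -r "$BACKUP_DIR" || rc=$?
    blackout stop || rc=$?
    return "$rc"
}

# Only act when explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    backup_control_stack
fi
```

Saved under a name of your choosing, running it with RUN=echo and the --run argument prints the five commands in order without touching the control stack, which is a cheap way to check the SSH quoting before scheduling it.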

This has saved us a lot of headaches in the daily operations of Exalogic, and because the blackout in OEMCC is done at the node level, it blacks out all the vServers and related infrastructure for that agent.  Luckily the backups generally only run for one or two minutes at most, so the gap in monitoring does not put us at very high risk right now.

