Agents in Oracle Clusterware

Oracle Clusterware manages applications when they are registered as resources with Oracle Clusterware. Oracle Clusterware has access to application-specific primitives that have the ability to start, stop, and monitor a specific resource. Oracle Clusterware runs all resource-specific commands through an entity called an agent.

An agent is a process that contains the agent framework and user code to manage resources. The agent framework is a library that enables you to plug in your application-specific code to manage customized applications. You program all of the actual application management functions, such as starting, stopping and checking the health of an application, into the agent. These functions are referred to as entry points.

The agent framework is responsible for invoking these entry point functions on behalf of Oracle Clusterware. Agent developers can use these entry points to plug in the required functionality for a specific resource regarding how to start, stop, and monitor a resource. Agents are capable of managing multiple resources.

Agent developers can set the following entry points as callbacks to their code:

  • ABORT: If any of the other entry points hang, the agent framework calls the ABORT entry point to stop the ongoing action. If the agent developer does not supply a stop function, then the agent framework exits the agent program.

  • ACTION: The ACTION entry point is Invoked when a custom action is invoked using the clscrs_request_action API of the crsctl request action command.

  • CHECK: The CHECK (monitor) entry point acts to monitor the health of a resource. The agent framework periodically calls this entry point. If it notices any state change during this action, then the agent framework notifies Oracle Clusterware about the change in the state of the specific resource.

  • CLEAN: The CLEAN entry point acts whenever there is a need to clean up a resource. It is a non-graceful operation that is invoked when users must forcefully terminate a resource. This command cleans up the resource-specific environment so that the resource can be restarted.

  • DELETE: The DELETE entry point is invoked on every node where a resource can run when the resource is unregistered.

  • MODIFY: The MODIFY entry point is invoked on every node where a resource can run when the resource profile is modified.

  • START: The START entry point acts to bring a resource online. The agent framework calls this entry point whenever it receives the start command from Oracle Clusterware.

  • STOP: The STOP entry points acts to gracefully bring down a resource. The agent framework calls this entry point whenever it receives the stop command from Oracle Clusterware.

START, STOP, CHECK, and CLEAN are mandatory entry points and the agent developer must provide these entry points when building an agent. Agent developers have several options to implement these entry points, including using C, C++, or scripts. It is also possible to develop agents that use both C or C++ and script-type entry points. When initializing the agent framework, if any of the mandatory entry points are not provided, then the agent framework invokes a script pointed to by the ACTION_SCRIPT resource attribute.

See Also:

"ACTION_SCRIPT" for information about this resource attribute

At any given time, the agent framework invokes only one entry point per application. If that entry point hangs, then the agent framework calls the ABORT entry point to end the current operation. The agent framework periodically invokes the CHECK entry point to determine the state of the resource. This entry point must return one of the following states as the resource state:

  • CLSAGFW_ONLINE: The CHECK entry point returns ONLINE if the resource was brought up successfully and is currently in a functioning state. The agent framework continues to monitor the resource when it is in this state. This state has a numeric value of 0 for the scriptagent.

  • CLSAGFW_UNPLANNED_OFFLINE and CLSAGFW_PLANNED_OFFLINE: The OFFLINE state indicates that the resource is not currently running. These two states have numeric values of 1 and 2, respectively, for the scriptagent.

    Two distinct categories exist to describe an resource's offline state: planned and unplanned.

    When the state of the resource transitions to OFFLINE through Oracle Clusterware, then it is assumed that the intent for this resource is to be offline (TARGET=OFFLINE), regardless of which value is returned from the CHECK entry point. However, when an agent detects that the state of a resource has changed independent of Oracle Clusterware (such as somebody stopping the resource through a non-Oracle interface), then the intent must be carried over from the agent to the Cluster Ready Services daemon (CRSD). The intent then becomes the determining factor for the following:

    • Whether to keep or to change the value of the resource's TARGET resource attribute. PLANNED_OFFLINE indicates that the TARGET resource attribute must be changed to OFFLINE only if the resource was running before. If the resource was not running (STATE=OFFLINE, TARGET=OFFLINE) and a request comes in to start it, then the value of the TARGET resource attribute changes to ONLINE. The start request then goes to the agent and the agent reports back to Oracle Clusterware a PLANNED_OFFLINE resource state, and the value of the TARGET resource attribute remains ONLINE. UNPLANNED_OFFLINE does not change the TARGET attribute.

    • Whether to leave the resource's state as UNPLANNED_OFFLINE or attempt to recover the resource by restarting it locally or failing it over to a another server in the cluster. The PLANNED_OFFLINE state makes CRSD leave the resource as is, whereas the UNPLANNED_OFFLINE state prompts resource recovery.

  • CLSAGFW_UNKNOWN: The CHECK entry point returns UNKNOWN if the current state of the resource cannot be determined. In response to this state, Oracle Clusterware does not attempt to failover or to restart the resource. The agent framework continues to monitor the resource if the previous state of the resource was either ONLINE or PARTIAL. This state has a numeric value of 3 for the scriptagent.

  • CLSAGFW_PARTIAL: The CHECK entry point returns PARTIAL when it knows that a resource is partially ONLINE and some of its services are available. Oracle Clusterware considers this state as partially ONLINE and does not attempt to failover or to restart the resource. The agent framework continues to monitor the resource in this state. This state has a numeric value of 4 for the scriptagent.

  • CLSAGFW_FAILED: The CHECK entry point returns FAILED whenever it detects that a resource is not in a functioning state and some of its components have failed and some clean up is required to restart the resource. In response to this state, Oracle Clusterware calls the CLEAN action to clean up the resource. After the CLEAN action finishes, the state of the resource is expected to be OFFLINE. Next, depending on the policy of the resource, Oracle Clusterware may attempt to failover or restart the resource. Under no circumstances does the agent framework monitor failed resources. This state has a numeric value of 5 for the scriptagent.

The agent framework implicitly monitors resources in the states listed in Table 9-1 at regular intervals, as specified by the CHECK_INTERVAL or OFFLINE_CHECK_INTERVAL resource attributes.

See Also:

"CHECK_INTERVAL" and "OFFLINE_CHECK_INTERVAL" for more information about these resource attributes

Table 9-1 Agent Framework Monitoring Characteristics

State Condition Frequency

ONLINE

Always

CHECK_INTERVAL

PARTIAL

Always

CHECK_INTERVAL

OFFLINE

Only if the value of the OFFLINE_CHECK_INTERVAL resource attribute is greater than 0.

OFFLINE_CHECK_INTERVAL

UNKNOWN

Only monitored if the resource was previously being monitored because of any one of the previously mentioned conditions.

If the state becomes UNKNOWN after being ONLINE, then the value of CHECK_INTERVAL is used. Otherwise, there is no monitoring.

Whenever an agent starts, the state of all the resources it monitors is set to UNKNOWN. After receiving an initial probe request from Oracle Clusterware, the agent framework executes the CHECK entry point for all of the resources to determine their current states.

Once the CHECK action successfully completes for a resource, the state of the resource transitions to one of the previously mentioned states. The agent framework then starts resources based on commands issued from Oracle Clusterware. After the completion of every action, the agent framework invokes the CHECK action to determine the current resource state. If the resource is in one of the monitored states listed in Table 9-1, then the agent framework periodically executes the CHECK entry point to check for changes in resource state.

By default, the agent framework does not monitor resources that are offline. However, if the value of the OFFLINE_CHECK_INTERVAL attribute is greater than 0, then the agent framework monitors offline resources.