Owners Manual

OMNM 6.5.2 User Guide 725
FAQs about Monitoring Mediation Servers
After making a UDP-based JGroups discovery request and receiving a response from an application
server in the cluster, each mediation server makes an RMI (TCP) call to an application server every
30 seconds. This RMI call results in a “call on cluster” on the application server cluster, using
JGroups (UDP by default), to call the agentHeartbeat method of the OWMedServerTrackerMBean
on each application server in the cluster. The primary application server updates the timestamp for
the medserver in question, and the others ignore the call. Every five seconds, the primary
application server checks whether any mediation server has not called in the last 52
seconds. If one has not, it attempts to verify the down status by pinging the suspected mediation
server. Then it issues an RMI call on that mediation server. It considers the mediation server down
if the ping or the final RMI call fails. This avoids false mediation server down notifications when
a network cable is pulled from an application server.
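The monitoring sequence described above can be sketched in a few lines. This is a simplified model, not the product's code: the class name MedServerTracker, the method names, and the ping and rmi_call probe functions are all hypothetical stand-ins, and only the 5-second and 52-second values come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

HEARTBEAT_TIMEOUT_S = 52  # silence threshold stated in the text
CHECK_INTERVAL_S = 5      # how often the primary runs check(); the caller schedules it


@dataclass
class MedServerTracker:
    """Hypothetical model of the primary application server's bookkeeping."""
    ping: Callable[[str], bool]      # assumed probe: True if the host answers a ping
    rmi_call: Callable[[str], bool]  # assumed probe: True if an RMI call succeeds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def agent_heartbeat(self, server: str, now: float) -> None:
        # Record the time of the latest call from this mediation server.
        self.last_seen[server] = now

    def check(self, now: float) -> List[str]:
        """Return the mediation servers considered down at time `now`."""
        down = []
        for server, seen in self.last_seen.items():
            if now - seen < HEARTBEAT_TIMEOUT_S:
                continue  # heard from recently, nothing to verify
            # Verify the suspected outage: ping first, then an RMI call.
            # The server is down if either probe fails.
            if not (self.ping(server) and self.rmi_call(server)):
                down.append(server)
        return down
```

In this model a silent mediation server is reported down only when a probe fails, so a server that stopped calling but still answers both the ping and the RMI call is not flagged.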
Does the application server wait 15 seconds after receiving the mediation server's response? Or
does it monitor the mediation server every 15 seconds regardless of the mediation server's
response?
The receipt of the mediation server's RMI call is on a different thread than the monitoring
code. The monitoring code should run every 5 seconds, regardless of the frequency of
mediation server calls. However, an investigation of the scheduling mechanism used (the JBoss
scheduler, http://community.jboss.org/wiki/scheduler) suggests that other tasks using
this scheduler could affect the schedule because of a change in the JDK timer
implementation after JDK 1.4.
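To illustrate how another task could disturb a nominal 5-second schedule, here is a toy simulation of a single worker thread shared by several periodic tasks. It is not the JBoss scheduler's implementation: the simulate function, the task names, and the assumption that each run is rescheduled relative to its actual start time are all illustrative.

```python
import heapq


def simulate(tasks, horizon):
    """Simulate periodic tasks sharing one worker thread.

    tasks: list of (name, period_s, duration_s) tuples (hypothetical values).
    Returns a dict mapping each task name to its actual start times
    before `horizon` seconds.
    """
    # Each queue entry: (due time, tie-breaker index, name, period, duration)
    queue = [(0.0, i, name, period, duration)
             for i, (name, period, duration) in enumerate(tasks)]
    heapq.heapify(queue)
    clock = 0.0  # time at which the single worker becomes free
    starts = {name: [] for name, _, _ in tasks}
    while queue:
        due, i, name, period, duration = heapq.heappop(queue)
        start = max(clock, due)  # a busy worker delays the task
        if start >= horizon:
            continue  # past the simulated window; do not reschedule
        starts[name].append(start)
        clock = start + duration
        # Assumption: the next run is scheduled relative to the actual start.
        heapq.heappush(queue, (start + period, i, name, period, duration))
    return starts


# A 5-second monitor alone, then sharing the worker with a 12-second task.
alone = simulate([("monitor", 5, 0)], horizon=30)
shared = simulate([("monitor", 5, 0), ("slow", 60, 12)], horizon=30)
```

In the shared run, the monitor's second execution cannot start until the slow task releases the worker, so every later execution shifts as well.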
What kind of functionality (JMS?) does the application server use to send and receive
OpenManage Network Manager messages?
The application server does not actively monitor the mediation servers unless it fails to get a
call from one for 52 seconds. If it does try to verify a downed mediation server, it uses an RMI
call.
The RMI calls use TCP sockets. Communication may use multiple ports: 1103/1123 (UDP - JGroups
Discovery), 4445/4446 (TCP - RMI Object), 1098/1099 (TCP - JNDI) or 3100/3200 (TCP -
HAJNDI), and 8093 (UIL2).
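When troubleshooting connectivity, it can help to keep the ports listed above in one structure and probe the TCP ones from the mediation server's host. The sketch below uses only the Python standard library. The PORTS table and the function names are illustrative, and the UIL2 protocol is left unset because the text does not state it.

```python
import socket

# Port list taken from the text: service -> (protocol, ports).
PORTS = {
    "JGroups Discovery": ("udp", (1103, 1123)),
    "RMI Object": ("tcp", (4445, 4446)),
    "JNDI": ("tcp", (1098, 1099)),
    "HAJNDI": ("tcp", (3100, 3200)),
    "UIL2": (None, (8093,)),  # protocol not stated in the text
}


def tcp_ports():
    """Return every port the text labels as TCP, in listed order."""
    return [port
            for proto, ports in PORTS.values() if proto == "tcp"
            for port in ports]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `[p for p in tcp_ports() if not tcp_port_open("appserver.example.com", p)]` would list the TCP ports that refuse connections, where the host name is a placeholder. A UDP port cannot be checked this way, because UDP has no connection handshake.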
What kind of problem or bug would cause the application server to falsely detect a mediation
server as down? For example, would failing to allocate memory cause the application server to
think a mediation server is down (dead)?
An out of memory error on an application server could result in a false detection of a downed
medserver.
If such memory depletion occurs as described in the previous answer, would a record
appear in the log? If it doesn't appear in the log, would it possibly appear if the log level is
changed?
An out of memory error usually appears in the log without modifying logging configuration,
since it is logged at ERROR level.
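A quick way to confirm whether an out of memory error was logged is to search the application server log for the Java exception name. The sketch below assumes the error appears as the standard java.lang.OutOfMemoryError text. The function name and the sample log lines are hypothetical, and the real log format may differ.

```python
import re

# Standard Java class name for an out of memory error.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError")


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines that mention an OOM error."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if OOM_PATTERN.search(line)]


# Hypothetical log excerpt, for illustration only.
sample = [
    "10:00:01 INFO  [Server] started\n",
    "10:05:42 ERROR [Worker] java.lang.OutOfMemoryError: Java heap space\n",
    "10:05:43 WARN  [Tracker] heartbeat delayed\n",
]
hits = find_oom_lines(sample)
```

To scan a real file, pass an open file object instead of the sample list, since iterating a file yields its lines.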
The log shows that a mediation server was detached from the cluster configuration. What
kind of logic decides when a server is detached from the cluster? For instance, would the
application servers detach a mediation server if they detect it as down?
JBoss (JGroups) has a somewhat complex mechanism for detecting a slow server in a cluster,
which can result in a server being “shunned.” This logic remains, even though we have never
observed the shunning of a server resulting in a workable cluster. This is the only mechanism
which automates removing servers from the cluster. The configuration for this service is