Pythian Blog: Technical Track

UDM is watching you, UDMs

Over time, monitoring several thousand targets of different versions on different operating systems accumulates additional checks and user-defined metrics (UDMs) for specific requirements. With a dozen super administrator accounts in play, the Oracle Enterprise Manager environment needs its own monitoring so that checks customized and configured for certain targets are not lost. How could this monitoring be better organized?

One approach I find useful is to create user-defined metrics that monitor other user-defined metrics. Knowing the list of UDMs that should be configured for each target, I created UDMs that gather information about those metrics and send alerts if there are any discrepancies.

[code]
select count(*)
from (
  with sql_metrics as (
    /* expected UDMs: target name, UDM name, optional note */
    select 'OEMP' target_name, 'prod_targets_no_metrics' metric_name, '' note from dual union all
    select 'OEMP', 'target_removed', '' from dual union all
    select 'OEMP', 'test_targets_no_metrics', '' from dual union all
    select 'OEMP', 'UDM_count_autofiles', '' from dual union all
    select 'OEMP', 'UDM_asm_dg', '' from dual union all
    select 'PROD.WORLD', 'UDM_apply_status', '' from dual union all
    select 'TEST.WORLD', 'UDM_apply_status', '' from dual union all
    select 'STDBY.WORLD', 'UDM_standby_lag', '' from dual union all
    select 'PROD.WORLD', 'changes_lag', '' from dual union all
    select 'REP.WORLD', 'changes_lag', '' from dual union all
    select 'TEST.WORLD', 'changes_lag', '' from dual union all
    select 'TNP', 'standby_lag', '' from dual union all
    select 'E.WORLD', 'UDM_alertlog', '' from dual
  ),
  sql_current as (
    /* UDM columns currently present in the repository */
    select c.target_type, c.target_name, c.metric_label, c.column_label,
           max(c.collection_timestamp) last_date, count(*) cnt
    from mgmt$metric_current c
    where c.metric_label like 'User-Defined%Metric%'
    group by c.target_type, c.target_name, c.metric_label, c.column_label
  )
  select target_name, metric_name,
         decode(nvl(cnt, 0), 0, 'Error: Metric does not exist',
                'Error: Metric exists but coll time older than 3 days') msg,
         note
  from (
    /* outer join keeps expected UDMs that have no current row */
    select m.target_name, m.metric_name, c.cnt, c.last_date, m.note
    from sql_metrics m, sql_current c
    where m.target_name = c.target_name(+)
      and m.metric_name = c.column_label(+)
  )
  where last_date < sysdate - 3.4
     or last_date is null
)
[/code]

But what about the UDM check itself? What if it is removed as well? For that purpose, I created a script that is scheduled on the OMS host and periodically runs emcli collect_metric. If the collection does not complete successfully, the on-call DBA is alerted.

[code]
/home/oracle/working/ag/report_udm_not_there.sh OEMP UDM_existence >/dev/null 2>&1

report_udm_not_there.sh:

#!/bin/bash
# Usage: report_udm_not_there.sh <target_name> <collection_name>
CMD_PATH=/home/oracle/working/ag
export JAVA_HOME=/e00/oracle/middleware/oms11g/jdk
export PATH=$JAVA_HOME/bin:$PATH
UDM=$2
DB=$1
LOG=$CMD_PATH/$(basename "$0" .sh)_${DB}_${UDM}.log

# Force a collection; send stdout and stderr to the same log file
"$CMD_PATH/emcli" collect_metric -target_type=oracle_database -target_name="$DB" -collection="$UDM" >"$LOG" 2>&1

# Alert unless the success message appears exactly once in the log
if [ "$(grep -c "was collected at repository successfully" "$LOG")" -ne 1 ]
then
  mailx -s "issues with $UDM at $DB" admin@site.com < "$LOG"
fi
[/code]

Another way to monitor the disassociation of UDMs from targets is to use the Flashback feature, which lets you compare the current contents of mgmt$metric_current with past data (say, from 15 minutes ago). However, this can be too general.

Happy OEM monitoring!
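As a postscript on the Flashback idea mentioned above, here is a minimal sketch of what such a comparison might look like. It is not the query I run in production. It assumes that a flashback query against mgmt$metric_current works in your repository (this depends on the view's base tables, undo retention, and the privileges of the querying user), and the 15-minute window is only the example interval from the text.

[code]
/* UDM columns that were present 15 minutes ago but are missing now */
select target_name, metric_label, column_label
from mgmt$metric_current
     as of timestamp (systimestamp - interval '15' minute)
where metric_label like 'User-Defined%Metric%'
minus
select target_name, metric_label, column_label
from mgmt$metric_current
where metric_label like 'User-Defined%Metric%'
[/code]

Any rows returned are UDMs that disappeared within the window. Because the query reports every UDM that vanished, including ones removed on purpose, it shows why this approach can be too general compared with checking against an explicit list.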
