Pythian Blog: Technical Track

Mining the AWR to Identify Performance Trends

Sometimes it's useful to check how performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it not always easy to retrieve it, especially if you want to see how the performance changes over time. I've been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:
  • How does the performance of a particular SQL change over time?
  • How do wait times of a particular wait event change over time?
  • How does a particular statistic change over time?
Please note, the scripts provided here require diagnostic pack licenses and it's your task to make sure you have them before running the scripts.

SQL performance

I use script awr_sqlid_perf_trend.sql to check how performance of the SQL changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:
  1. SQL ID
  2. Days to report. It will summarize all AWR snapshots starting with "trunc(sysdate)-{days to report}+1", so if you pass "1", it will summarize all snapshots from today, if "2" - than it's yesterday and today are included.
  3. Interval in hours. "24" will provide one row for each day, "6" will give 4 rows a day.
Nothing shows it better than an example. Below you see how I'm checking execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in last 2 weeks and reporting the average values for 2-day intervals. Then I'm taking a little closer look at the last 4 days for the same SQL by summarizing data over 6hour intervals. Note, the time column shows the beginning of the interval. [code light="true" language="text" title="SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 14 48"] TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- ------------------- 16.10.2013 00:00:00 351 195.571 74.995 .097 .000 .000 .000 134417.570 21319182.291 293731.556 304434.305 18.10.2013 00:00:00 364 91.225 47.474 1.687 .000 .000 .002 141140.228 20364053.544 270107.745 273343.709 20.10.2013 00:00:00 542 20.686 9.378 .004 .000 .000 .000 146436.875 4597922.220 3.168 .000 22.10.2013 00:00:00 531 25.060 12.086 .161 .000 .000 .000 146476.605 6026729.224 23999.684 23998.859 24.10.2013 00:00:00 542 51.611 40.675 1.880 .000 .000 .000 146814.220 21620264.039 287994.862 287994.701 26.10.2013 00:00:00 534 39.949 26.688 1.050 .000 .000 .000 147099.275 14081016.607 159704.463 159704.418 28.10.2013 00:00:00 245 37.837 29.384 1.150 .000 .000 .000 147135.216 15505533.959 179244.437 179244.367 7 rows selected. [/code]   [code light="true" language="text" title="SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 4 6"] TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- ------------------- 26.10.2013 00:00:00 72 19.209 9.439 .000 .000 .000 .000 147076.000 4623816.597 .111 .000 26.10.2013 06:00:00 72 15.391 9.401 .000 .000 .000 .000 147086.403 4624153.819 .000 .000 26.10.2013 12:00:00 72 14.022 9.351 .000 .000 .000 .000 147099.000 4624579.639 .000 .000 26.10.2013 18:00:00 55 48.174 35.723 1.575 .000 .000 .000 147099.000 19192781.582 243584.055 243584.055 27.10.2013 00:00:00 72 76.723 43.350 2.116 .000 .000 .000 147099.000 23258326.875 314445.111 314445.111 27.10.2013 06:00:00 72 64.921 43.914 2.084 .000 .000 .000 147107.542 23258506.028 315673.000 315673.000 27.10.2013 12:00:00 72 52.567 43.383 2.041 .000 .000 .000 147116.000 23258739.403 315673.000 315673.000 27.10.2013 18:00:00 47 25.522 18.095 .523 .000 .000 .000 147117.532 9382873.851 80597.702 80597.362 28.10.2013 00:00:00 65 17.645 9.384 .000 .000 .000 .000 147120.000 4625354.262 .000 .000 28.10.2013 06:00:00 19 17.571 9.451 .000 .000 .000 .000 147122.421 4625411.263 .000 .000 28.10.2013 12:00:00 6 14.083 9.645 .000 .000 .000 .000 147208.167 4629315.167 .000 .000 28.10.2013 18:00:00 48 42.173 35.208 1.509 .000 .000 .000 147236.375 18606643.833 229433.750 229433.750 29.10.2013 00:00:00 72 53.015 43.517 2.022 .000 .000 .000 147245.125 23265547.847 314507.319 314507.083 29.10.2013 06:00:00 30 52.181 43.638 1.932 .000 .000 .000 147250.300 23265839.767 303949.000 303949.000 29.10.2013 12:00:00 5 59.576 43.836 1.177 .000 .000 .000 144049.800 23267109.200 227814.000 227814.000 15 rows selected. [/code] I've checked this SQL because the users reported inconsistent performance. It can also be observed in the outputs above. Take a look! The number of rows processed during each execution of the SQL doesn't change - it's always around 147K, but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on the information above it looks like there could be two different execution plans involved and bind variable peeking could be causing one or the other plan to become the active plan. Additionally you can use the same script to check how execution statistics for the same SQL change over time. Does the elapsed time increase? Do the number of processed rows or number of buffer gets per execution change?

Wait event performance

Script awr_wait_trend.sql can be used to show the changes in wait counts and wait durations for a particular event over time. Similarly to the previous script it also requires 3 parameters, only instead of SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT. I typically use this script in two situations:
  • To check if a particular wait event performs worse when an overall performance problem is reported (usually I'm looking at IO events)
  • Illustrate how the implemented change improved the situation.
The example below shows how the performance of log file parallel write event changed over 3 weeks. On october 19th we moved the redo logs to dedicated high performance LUNs. Before that the 2 members of each redo log group were located on a saturated LUN together with all the data files. [code light="true" language="text" title="SQL> @awr_wait_trend.sql "log file parallel write" 21 24"] TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS ------------------- ---------------------------- --------------- -------------- -------------- 09.10.2013 00:00:00 log file parallel write 4006177 31667.591 7.905 10.10.2013 00:00:00 log file parallel write 3625342 28296.640 7.805 11.10.2013 00:00:00 log file parallel write 3483249 31032.324 8.909 12.10.2013 00:00:00 log file parallel write 3293462 33351.490 10.127 13.10.2013 00:00:00 log file parallel write 2871091 36413.925 12.683 14.10.2013 00:00:00 log file parallel write 3763916 30262.718 8.040 15.10.2013 00:00:00 log file parallel write 3018760 28262.172 9.362 16.10.2013 00:00:00 log file parallel write 3303205 31062.276 9.404 17.10.2013 00:00:00 log file parallel write 3012105 31831.491 10.568 18.10.2013 00:00:00 log file parallel write 2692697 26981.966 10.020 19.10.2013 00:00:00 log file parallel write 1038399 512.950 .494 20.10.2013 00:00:00 log file parallel write 959443 427.554 .446 21.10.2013 00:00:00 log file parallel write 1520444 606.580 .399 22.10.2013 00:00:00 log file parallel write 1618490 655.873 .405 23.10.2013 00:00:00 log file parallel write 1889845 751.216 .398 24.10.2013 00:00:00 log file parallel write 1957384 760.656 .389 25.10.2013 00:00:00 log file parallel write 2204260 853.691 .387 26.10.2013 00:00:00 log file parallel write 2205783 856.731 .388 27.10.2013 00:00:00 log file parallel write 2033199 785.785 .386 28.10.2013 00:00:00 log file parallel write 2439092 923.368 .379 29.10.2013 00:00:00 log file parallel write 2233614 840.628 .376 21 rows selected. [/code] Visualizing the data from output like that is easy too! Creating a graph has never been easier

System Statistics

The last script from this set is awr_stat_trend.sql. It does the same thing with the system statistics collected in DBA_HIST_SYSSTAT as previous scripts did to the performance of SQLs and wait events. The parameters are similar again - the name of the system statistic, days to report and the interval. I usually use the query to check how the redo size or the number of physical reads change over time, but there's huge number of statistics available (638 different statistics in 11.2.0.3) and that's why I'm sure you'll find your own reasons to use this script. [code light="true" language="text" title="SQL> @awr_stat_trend.sql "redo size" 3 4"] TIME STAT_NAME VALUE ------------------- ------------------------- ----------------- 27.10.2013 00:00:00 redo size 1739466208 27.10.2013 04:00:00 redo size 2809857936 27.10.2013 08:00:00 redo size 648511376 27.10.2013 12:00:00 redo size 533287888 27.10.2013 16:00:00 redo size 704832684 27.10.2013 20:00:00 redo size 819854908 28.10.2013 00:00:00 redo size 2226799060 28.10.2013 04:00:00 redo size 3875182764 28.10.2013 08:00:00 redo size 1968024072 28.10.2013 12:00:00 redo size 1125339352 28.10.2013 16:00:00 redo size 1067175300 28.10.2013 20:00:00 redo size 936404908 29.10.2013 00:00:00 redo size 1758952428 29.10.2013 04:00:00 redo size 3949193948 29.10.2013 08:00:00 redo size 1715444632 29.10.2013 12:00:00 redo size 1008385144 29.10.2013 16:00:00 redo size 544946804 17 rows selected. [/code] redo size
AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too! P.S. You might have noticed the scripts are published on GitHub, let me know if you find any issues using them and perhaps one day you'll find new versions for the script.

Update (4-Nov-2013)

I've added the instance numbers to the outputs in all three scripts. This is how it looks now: [code light="true" language="text" title="SQL> @awr_sqlid_perf_trend.sql 0fxqsqwhkn03b 14 48"] INST TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC ----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- ------------------- 1 28.10.2013 00:00:00 840 .611 .014 .595 .007 .000 .000 1.000 1085.583 128.724 .000 30.10.2013 00:00:00 1466 .491 .011 .479 .005 .000 .000 1.000 976.001 88.744 .000 01.11.2013 00:00:00 542 .798 .023 .760 .025 .000 .000 1.000 896.978 114.196 .000 03.11.2013 00:00:00 544 .750 .021 .719 .017 .000 .000 1.000 1098.213 134.941 .000 2 28.10.2013 00:00:00 1638 .498 .017 .474 .013 .000 .000 1.001 953.514 96.287 .000 30.10.2013 00:00:00 1014 .745 .022 .712 .019 .000 .000 1.000 1034.249 131.057 .000 01.11.2013 00:00:00 1904 .633 .011 .624 .002 .000 .000 1.000 1045.668 104.568 .000 03.11.2013 00:00:00 810 .602 .017 .581 .010 .000 .000 1.000 929.778 108.998 .000 8 rows selected. [/code]

No Comments Yet

Let us know what you think

Subscribe by email