Pythian Blog: Technical Track

REGEXP_REPLACE: Auto-Translate SQL Literals to Bind Vars

Most DBAs are probably aware of the cursor_sharing feature in the Oracle optimizer that replaces all literals in the where clause of a SQL statement by bind variables. This allows Oracle to calculate a Signature from the resulting SQL string that can be used for various tasks, like sharing "like" statement cursors and identifying corresponding sql_profiles for statements. I've come to appreciate the bind variables even more when it comes to get some aggregate run-time statistics for statements. We can group by the Signature for this purpose, and that covers all statements, not just those that were submitted with bind vars. This Signature is very handy - it is also stored in various internal tables. (See Maris Elsins' blog post here for more details about the SQL signature.) Despite the convenience that the Signature provides, there are several situations where we would like to have the ability to convert an SQL string that contains literals into a statement that only contains bind variables. (Read the whole post, you will see what I mean.) I searched for an Oracle-supplied way to do this and could not find one. So, I decided to see if I could get the SQL strings transformed in a way that resembles the result of cursor_sharing. Using the regexp_replace function, I could get far enough to give me the result I wanted, that is, all string literals (strings within two single quotes) and numeric literals replaced by a generic bind variable name (:STR and :NUM). That was enough for my needs. I did not venture to try to get a different variable name for each literal. It would not have provided me any additional benefit, and I have no clue if it's feasible at all using a single "non-procedural" expression. Here is an example of a statement before and after translation:
Before: select 'username:'||substr(USERNAME,+1, 3), -1234.56 NEG_NUM from dba_users where username like 'SYS%'
 After : select ':STR'||substr(USERNAME,+:NUM, :NUM), -:NUM NEG_NUM from dba_users where username like ':STR'
Now I can get aggregated stats for any "similar" queries. I just need to supply the appropriate SQLl string source to the expression. :) The basic regular expressions that I use are the following : 1- To remove duplicate white space and all carriage returns and linefeeds (this flattens the query into one line):
 regexp_replace(sql_text, '[[:space:]]+',' ' )
2- To transform all numeric literals into ":NUM". The premise is that all numeric literals must be preceded by either one of "(=+, -" Otherwise, it is considered to be an identifier or a string literal:
 regexp_replace('([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM')
3- To transform all string literals into "':STR'". For example, the string "select 'abc' from dual" would be translated into "select ':STR' from dual" :
 regexp_replace('''(.)*?''',''':STR''' )
The final Oracle function call looks like this (three embedded regexp_replace):
 regexp_replace(
  regexp_replace(
  regexp_replace(sql_text, '[[:space:]]+',' ' ),
  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
  '''(.)*?''',''':STR''' )
Here's an example using faked queries and stats:
- create a holding table for faked queries and stats
 create table billette_sqltext
 as
 select sql_id,
  sorts,
  executions,
  disk_reads,
  buffer_gets,
  USER_IO_WAIT_TIME,
  ROWS_PROCESSED,
  sql_fulltext
 from v$sqlarea
 where rownum < 1
 ;
 
 - Generate 500 faked queries and stats
 begin
  for x in 1..5 loop
  insert into billette_sqltext
  select sql_id, sorts, executions, disk_reads, buffer_gets, USER_IO_WAIT_TIME, ROWS_PROCESSED, statement
  from
  (select sql_id, sorts, executions, disk_reads, buffer_gets, USER_IO_WAIT_TIME, ROWS_PROCESSED, rownum rnum
  from v$sqlarea where sql_id not in (select sql_id from billette_sqltext) and rownum<101) sa,
  (select 'select '''||table_name||''' from '||owner||'.'||table_name||' where a = '||rownum statement, rownum rnum
  from dba_tables where owner = 'SYS' and rownum < 101) st
  where sa.rnum = st.rnum
  ;
  end loop;
 commit;
 end;
 /
 
 -- Get some aggregate stats for similar queries
 col normalized_statement for a100
 select sum(sorts) sorts,
  sum(executions) sum_executions,
  sum(disk_reads) sum_disk_reads,
  sum(buffer_gets) sum_buffer_gets,
  round(sum(user_io_wait_time)/1000000) sum_user_io_wait_time_s,
  sum(rows_processed) sum_rows_processed,
  regexp_replace(
  regexp_replace(
  regexp_replace(to_char(sql_fulltext), '[[:space:]]+',' ' ),
  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
  '''(.)*?''',''':STR''' ) normalized_statement
  from billette_sqltext
  where sql_fulltext like '%SYS.APPLY%'
  group by regexp_replace(
  regexp_replace(
  regexp_replace(to_char(sql_fulltext), '[[:space:]]+',' ' ),
  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
  '''(.)*?''',''':STR''' )
 order by normalized_statement, sum_disk_reads Desc
 ;
 
  SORTS SUM_EXECUTIONS SUM_DISK_READS SUM_BUFFER_GETS SUM_USER_IO_WAIT_TIME_S SUM_ROWS_PROCESSED NORMALIZED_STATEMENT
 ---------- -------------- -------------- --------------- ----------------------- ------------------ ----------------------------------------------------------------------------------------------------
  3 12 1 3847 0 24 select ':STR' from SYS.APPLY$_CHANGE_HANDLERS where a = :NUM
  155 2 475018 1373125 231 24 select ':STR' from SYS.APPLY$_CONF_HDLR_COLUMNS where a = :NUM
  42 19 19 1217 0 19 select ':STR' from SYS.APPLY$_CONSTRAINT_COLUMNS where a = :NUM
  33 4 1 1547 0 4 select ':STR' from SYS.APPLY$_DEST_OBJ where a = :NUM
  350 4 306 22066 0 61 select ':STR' from SYS.APPLY$_DEST_OBJ_CMAP where a = :NUM
  3 5 48 3124 0 4 select ':STR' from SYS.APPLY$_DEST_OBJ_OPS where a = :NUM
  95 4 167 4593 0 29 select ':STR' from SYS.APPLY$_ERROR where a = :NUM
  94 5 74 2325 0 5 select ':STR' from SYS.APPLY$_ERROR_HANDLER where a = :NUM
  93 4 6930 356589 5 10 select ':STR' from SYS.APPLY$_ERROR_TXN where a = :NUM
  222 6 106 41380 0 10 select ':STR' from SYS.APPLY$_SOURCE_OBJ where a = :NUM
  63 3 169 2710 0 3 select ':STR' from SYS.APPLY$_SOURCE_SCHEMA where a = :NUM
  144 4 13 10185 0 23 select ':STR' from SYS.APPLY$_VIRTUAL_OBJ_CONS where a = :NUM
 
 12 rows selected.
Here is a real-life example: I have used this regular expression transformation to compare the "logical" OBIEE queries from its report tracking table. I could generate some interesting run-time statistics by statement version. (Note that OBIEE actually runs something other in the db than these logical queries. Also, the one report I looked at below actually runs 9 different versions of a query, depending on the parameters supplied by the user and that could be useful to identify which of these executions runs slower or is more popular etc.).
 set linesize 120 pagesize 1000
  select /*+parallel(8)*/
  count(*) cnt,
  sum(case when total_time_sec < 8 then 1 else 0 end) num_lt_8s,
  sum(case when total_time_sec >= 8 then 1 else 0 end) num_ge_8s,
  round(avg(total_time_sec),2) avg_time,
  min(total_time_sec) min_time,
  max(total_time_sec) max_time,
  min(start_dt) first_run,
  max(start_dt) last_run,
  regexp_replace(
  regexp_replace(
  regexp_replace(query_text, '[[:space:]]+',' ' ),
  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
  '''(.)*?''',''':STR''' ) normalized_statement
  from SIEBLE_U.S_NQ_ACCT
  where saw_src_path = '/Path/to/the/report'
  group by
  regexp_replace(
  regexp_replace(
  regexp_replace(query_text, '[[:space:]]+',' ' ),
  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
  '''(.)*?''',''':STR''' )
  order by count(*) desc
  ;
 
  CNT NUM_LT_8S NUM_GE_8S AVG_TIME MIN_TIME MAX_TIME FIRST_RUN LAST_RUN
  ---------- ---------- ---------- ---------- ---------- ---------- ------------------ ------------------
  NORMALIZED_STATEMENT
  ------------------------------------------------------------------------------------------------------------------------
  535187 449674 85513 11.04 0 5142 11-OCT-11 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" = :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw_4, saw_5, saw_6, saw_7, saw_8, saw_9, s
  aw_10, saw_11, saw_12, saw_13
 
  84210 18901 65309 35.34 0 4243 28-FEB-12 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" = :NUM) AND (Advertiser."Advertiser ID" = :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw
  _4, saw_5, saw_6, saw_7, saw_8, saw_9, saw_10, saw_11, saw_12, saw_13
 
  72378 53443 18935 32.87 0 11967 24-NOV-11 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" <> :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw_4, saw_5, saw_6, saw_7, saw_8, saw_9,
  saw_10, saw_11, saw_12, saw_13
 
  ...
 
  9 rows selected.
In Summary: Aggregations, similar to the above, can be done with any SQL source table, including the AWR and v$ views. It can be used on substrings of the statements if the whole statement is over 4000 characters (regexp_replace is bound to the varchar2 limit) or if you want to aggregate queries that are too different from each other. In other words, it's a generic string parser and it's up to you to supply the strings that suits your needs, and that makes it very flexible. :) Some caveats: - It may replace more strings than you want. For example, a column named "col-1" would be translated to "col-:NUM" because 1 is preceded by a minus sign (dash). You can prevent that by removing the minus from the regular expression - It may miss some numeric values if it is preceded by a character I didn't add in the list. You can add the character to the list if you find such case. Always test the result to ensure it doesn't break the translation. Regexp can be very picky and unpredictable... - 4000 characters is all a regexp_replace can handle. Any queries larger than that will need to be truncated or transformed to fit in 4000 characters. Hope you will find this regexp_replace expression as useful as I did. Cheers, Marc

No Comments Yet

Let us know what you think

Subscribe by email