Pythian Blog: Technical Track

Oracle parallel query hints - part 3: PX_JOIN_FILTER

In part one of this series on Oracle Parallel Query Hints, I looked at PQ_DISTRIBUTE, and part two was about PQ_Replicate. Now, for the third part in the series, I will take a look at PX_JOIN>FILTER. The PX_JOIN_FILTER is a very interesting hint. It has been around since Oracle 10.2 and it is still not a documented hint. That's more than 10 years ago. The hint's behaviour is mostly well understood. This hint became much more interesting with the introduction of Oracle Exadata as there are some very cool efficiencies. It's a mystery to me why this is not in the official documentation. For starters it used to be the only hint that talks about Parallel eXecution (PX) as opposed to Parallel Query (PQ). I think it has to do with the fact that it was considered an execution optimisation as opposed to a different execution plan. Purpose: The "JOIN FILTER" is a feature used during data re-shuffling between the parallel query processes. It allows each parallel query process to "pre-filter" the data before sending it over using a BLOOM filter. It can only happen with the HASH distribution, however the Exadata functionality also works with the BROADCAST feature by leveraging storage indexes. The topic of BLOOM filters, their performance aspect and etc is quite complex, and in this reference we only cover the usage guide. I suggest reviewing the slides of Christian Antognini on the topic here: Bloom Filters presentation Picking up from the example in Part 1 where I specifically disabled join filters: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) */ * from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ; ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | |* 7 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 9 | PX SEND HASH | :TQ10001 | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 10 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | | 11 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] And here's the plan with the JOIN FILTER enabled [sql] select /*+PARALLEL(8) PX_JOIN_FILTER(T2) */ * from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ; ----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | PCWP | | | 4 | JOIN FILTER CREATE | :BF0000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 6 | PX SEND HASH | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 7 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | |* 8 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 9 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 10 | PX SEND HASH | :TQ10001 | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 11 | JOIN FILTER USE | :BF0000 | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 12 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 13 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ----------------------------------------------------------------------------------------------------------------------------- [/sql] So here's what happens in the plan above with the JOIN FILTER : - PQ Set A reads TLARGE in chunks, applies fixed predicates (mod5_Id=1) hashes the rows on the join key and sends the data to PQ Set B - During this work, PQ Set A creates a BLOOM filter (:BF0000) on the data AFTER the fixed filter predicates. The BLOOM filter is exchanged between PQ Set A processes - PQ Set A starts reading rows from TLARGE_TWO applies all fixed filter predicates (none in this case) and then checks the BLOOM FILTER for "presence" before sending the rows to PQ Set B. This optimisation can reduce the rows exchanged between PQ processes quite dramatically. It all depends on the filter predicates used. This reduction of data exchange has different effect in different configurations - Single server (no Oracle RAC) - result is reduction in CPU usage and as a result reduction in elapsed time - Oracle RAC - if the PQ Query ran on multiple RAC nodes - this optimisation reduces the network traffic as well as CPU time - which results in a more significant reduction in elapsed time - Oracle Exadata with RAC - same effect as in Oracle RAC case, however in the case of a "SMART SCAN" (or storage scan) the BLOOM filter is pushed down (sent) to the storage nodes - and the filtering happens before any data reaches the compute nodes, reducing processing time significantly as now a much larger number of CPUs can participate in the filtering. And finally, the BLOOM filters can interact with Oracle's storage indexes (aka min/max index) where entire blocks can be skipped if data being looked for is outside of the min/max range. This last piece can have a dramatic speed-up effect as it reduces disk read IO. This optimisation is far less known and not promoted at all. A couple of notes: 1. This optimisation usually doesn't happen if you don't have any filter predicates on the hashed table. It's a cost based optimisation. 2. Do not confuse this with BLOOM FILTER partition elimination See example bellow where both JOIN FILTER and PART JOIN FILTER are used (two different features). The PART JOIN FILTER also relies on BLOOM filters, however it is used to eliminate reading certain partitions altogether, but that will be covered in a future blog. It's an amazing feature for large delta loads. [sql] select /*+PARALLEL(8) PX_JOIN_FILTER(t2) */ * from tlarge t1 join tlarge_p t2 on t2.id=t1.id where t1.mod5_id=1 ; -------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 16M| 644 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 20000 | 16M| 644 (0)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 20000 | 16M| 644 (0)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | JOIN FILTER CREATE | :BF0001 | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PART JOIN FILTER CREATE | :BF0000 | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,01 | PCWP | | | 6 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,01 | PCWP | | | 7 | PX SEND PARTITION (KEY) | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,00 | P->P | PART (KEY) | | 8 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,00 | PCWC | | |* 9 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 10 | JOIN FILTER USE | :BF0001 | 88447 | 28M| 325 (0)| 00:00:01 | | | Q1,01 | PCWP | | | 11 | PX PARTITION HASH JOIN-FILTER| | 88447 | 28M| 325 (0)| 00:00:01 |:BF0000|:BF0000| Q1,01 | PCWC | | |* 12 | TABLE ACCESS STORAGE FULL | TLARGE_P | 88447 | 28M| 325 (0)| 00:00:01 |:BF0000|:BF0000| Q1,01 | PCWP | | -------------------------------------------------------------------------------------------------------------------------------------------- [/sql] Conclusion PX_JOIN_FILTER is one of those hints that can have a very positive effect on the performance of a query. It is supposed to happen automatically, but when it doesn't - there's a hint to use. In the past I have designed ETL specifically around those features. In those cases I hint for a very specific execution plan as it needs to match the design intent.

No Comments Yet

Let us know what you think

Subscribe by email