Pythian Blog: Technical Track

Oracle parallel query hints - part 1: PQ_DISTRIBUTE

One of the most powerful features of the Oracle database is its ability to leverage multiple cores spread across many RAC servers to execute a single query. Oracle's optimizer tries its best to execute queries in the most efficient manner, but often the best distribution of data is not obvious from the statistics alone - and some hints are needed to help out. The most important hint is PQ_DISTRIBUTE, but it's also the most tricky to use as it depends on so many other hints. The Oracle documentation offers the bare minimum on usage, and there are no "Oracle Support" articles that help. For something so complex and important, it's a rather big miss from Oracle. There should be examples, use cases etc. But then again .. Microsoft's SQL Server is not any better in describing control nobs for parallel query executions. So we have to rely on the community for those. Let's start with basics (from the docs): There are two distinct use cases, that are covered by the same HINT. I am sure there's a reason for that, but it would've been much clearer for everyone if there were two different hints. Use case #1 ("distribution" case): To control the reshuffling of data during parallel DML and direct path loads. Use case #2 ("outer_distribution, inner_distribution"): To control the reshuffling of data for table joins. Those two are very distinct cases, even though they both involve re-shuffling data between parallel query processes. I will cover Case #2 in this blog, as it's easier to demonstrate and test. The DML cases will be in a future blog in this series. First the documentation is very ambiguous on what is considered "outer" and what is "inner" distributions. But basically the "outer" table is the one mentioned in the PQ_DISTRIBUTE hint. The "inner" table the "current result set". In this case - 1 query joining 2 tables - both are identical in content, so neither is "obvious" for broadcast, so the default is hash distribution for the join, which unfortunately results in a BUFFERED join. This means that the result set will be "staged" before it is returned to the application, causing significant extra work to be done. [sql] select /*+PARALLEL(8) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 105M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 100K| 105M| 638 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 100K| 105M| 638 (0)| 00:00:01 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | | 7 | TABLE ACCESS STORAGE FULL| TLARGE | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 9 | PX SEND HASH | :TQ10001 | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 10 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | | 11 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] Even if we add a filter condition to one of the tables, that will result in significant reduction of rows (the mod5_id=1 reduces the data by 5x), Oracle still uses the HASH HASH distribution The NO_PX_JOIN_FILTER hint will be explained in a future blog of the series. Here it only helps to simplify the examples. [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | |* 7 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 9 | PX SEND HASH | :TQ10001 | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 10 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | | 11 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] And now switching the distribution method to BROADCAST Note1 the NO_PX_JOIN_FILTER is again to simplify the example and will be discussed further in the series. Note2 the NO_PQ_REPLICATE hint is used to to demonstrate what is happening behind the scenes, and is explained at the end of the blog. : [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T2) PQ_DISTRIBUTE(T2 BROADCAST NONE)*/* from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ; ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 5 | PX SEND BROADCAST | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 6 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | |* 7 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | | 9 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] Notice the BUFFERED hash join is gone, and now we have a broadcast operator. This plan is much more efficient, but consumes a bit more memory to execute. Each of the 8 parallel processes would have a copy of its own TLARGE table, but AFTER filtering on mod5_id. You can see this is estimated at 10 MB * 8 processes = 80 MB total. Back on topic - the PQ_DISTRIBUTE hint here says following. When you join the T2 table, take the current result set and broadcast it to all processes that are about to read T2. If you want to switch and have the TLARGE_TWO table to be broadcasted and have a similar plan, you need a lot more hints. The query looks like this: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T1) PQ_DISTRIBUTE(T1 BROADCAST NONE) LEADING(t2 t1)*/* from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ; ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 5 | PX SEND BROADCAST | :TQ10000 | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 6 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | | 7 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 9 | TABLE ACCESS STORAGE FULL | TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] Explanation: 1. You need to change the join order - so that T1 is joined SECOND. If it is first, then "T1" is never "joined" 2. You need to change the PQ_DISTRIBUTE so that it has instructions on what to do when joining T1 to the result set. Now interestingly, you can do the exact same thing in a slightly different fashion. See query bellow: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T2) PQ_DISTRIBUTE(T2 NONE BROADCAST ) LEADING(t1 t2) SWAP_JOIN_INPUTS(t2)*/* from tlarge t1 join tlarge_two t2 on t2.id=t1.id where t1.mod5_id=1 ; ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 21M| 638 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 20000 | 21M| 638 (0)| 00:00:01 | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 5 | PX SEND BROADCAST | :TQ10000 | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 6 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | | 7 | TABLE ACCESS STORAGE FULL| TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 8 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 9 | TABLE ACCESS STORAGE FULL | TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- [/sql] You will notice that the plan is identical. However the way we reach is very different and very important to understand. This one goes like this: 1. Join order remains the same - T2 is joined to solo result set of T1 2. The PQ_DISTRIBUTE hint now has "NONE BROADCAST" which says: When you join T2 do nothing with the current result set, but broadcast T2 to every process 3. SWAP_JOIN_INPUTS - instructs the optimizer to reverse the order of which table is hashed into memory, and which table is then scanned while probing the in memory table. What's important to understand is that when we have 2 tables, the end result is the same with these different approaches, because "the current result set" is just the other table. But with 3 or more tables "the current result set" can be 2 tables joined. Let's illustrate with a query: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T2) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id where t1.mod5_id=1 ; ----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 15M| 671 (1)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10003 | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | PCWP | | |* 4 | HASH JOIN | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,03 | PCWP | | | 5 | PX RECEIVE | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,03 | PCWP | | | 6 | PX SEND HASH | :TQ10000 | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 7 | PX BLOCK ITERATOR | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWC | | | 8 | TABLE ACCESS STORAGE FULL| TSMALL | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWP | | | 9 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,03 | PCWP | | | 10 | PX SEND HASH | :TQ10001 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 11 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 12 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 13 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | PCWP | | | 14 | PX SEND HASH | :TQ10002 | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | P->P | HASH | | 15 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWC | | | 16 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | ----------------------------------------------------------------------------------------------------------------------------- [/sql] With the default, Oracle chooses HASH HASH for each join. We have only 1 hash join because the join conditions are on the same column, but if there were slightly different - there would be two BUFFERED hash joins and 2 re-shuffles. I've fabricated this by just adding a "+0" to the join condition: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T2) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id+0 where t1.mod5_id=1 ; ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 15M| 671 (1)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10004 | 10000 | 15M| 671 (1)| 00:00:01 | Q1,04 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 10000 | 15M| 671 (1)| 00:00:01 | Q1,04 | PCWP | | | 4 | PX RECEIVE | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,04 | PCWP | | | 5 | PX SEND HASH | :TQ10002 | 10000 | 10M| 352 (1)| 00:00:01 | Q1,02 | P->P | HASH | |* 6 | HASH JOIN BUFFERED | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,02 | PCWP | | | 7 | PX RECEIVE | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,02 | PCWP | | | 8 | PX SEND HASH | :TQ10000 | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 9 | PX BLOCK ITERATOR | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWC | | | 10 | TABLE ACCESS STORAGE FULL| TSMALL | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWP | | | 11 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | | 12 | PX SEND HASH | :TQ10001 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | P->P | HASH | | 13 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 14 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 15 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,04 | PCWP | | | 16 | PX SEND HASH | :TQ10003 | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | P->P | HASH | | 17 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | PCWC | | | 18 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------- [/sql] The extra re-shuffling is at step 5 - where the PX SEND HASH is after the HASH JOIN BUFFERED (going up the tree). The reason why there is no extra re-shuffling in the original case, is because "the current result set" is already distributed to each processing server by the join key, so only "new" tables being joined need to be re-distributed. Let's continue to explore the original 3 table case Let's try to introduce BROADCAST there. The JOIN order is: TS, T1, T2 The logical thing to do is to say formulate PQ_DISTRIBUTE as follow: When you Join T1, please broadcast it. Unfortunately this results in the following: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T1) NO_PQ_REPLICATE(T2) PQ_DISTRIBUTE(T1 NONE BROADCAST) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id where t1.mod5_id=1 ; ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 15M| 671 (1)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10003 | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | PCWP | | | 4 | PX RECEIVE | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,03 | PCWP | | | 5 | PX SEND HASH | :TQ10001 | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | P->P | HASH | |* 6 | HASH JOIN BUFFERED | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | PCWP | | | 7 | PX BLOCK ITERATOR | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,01 | PCWC | | | 8 | TABLE ACCESS STORAGE FULL | TSMALL | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,01 | PCWP | | | 9 | PX RECEIVE | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 10 | PX SEND BROADCAST | :TQ10000 | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 11 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWC | | |* 12 | TABLE ACCESS STORAGE FULL| TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,00 | PCWP | | | 13 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | PCWP | | | 14 | PX SEND HASH | :TQ10002 | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | P->P | HASH | | 15 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWC | | | 16 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------- [/sql] This is a rather strange plan, and what it actually does in steps 6-11 (which are the TS and T1 joins steps) is very inefficient. It's worth explaining in order to better understand parallel execution. The join order is TS, T1, T2 1. The query starts with reading TS and preparing it to join to T1. A set of PQ processes read the table in chunks (Step 7: PX BLOCK ITERATOR) and build an in-memory HASH table by the join key 2. A different set of PQ processes read the T1 table in chunks (Step 11: PX BLOCK ITERATOR) and broadcast the rows to each of the other set of slaves 3. As slaves Set 1 receive records, they probe them against the in-memory table and stage them (Step 6: HASH JOIN BUFFERED) etc. You can already see this is not the outcome we wanted. In the previous two table examples, we actually used PQ_DISTRIBUTE(T1 BROADCAST NONE) - with much better results. This is conceptually wrong as the hint actually says "take the current result set and broadcast it to all processes", but it works for two table joins. So lets try that: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T1) NO_PQ_REPLICATE(T2) PQ_DISTRIBUTE(T1 BROADCAST NONE ) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id where t1.mod5_id=1 ; ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 15M| 671 (1)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10003 | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | P->S | QC (RAND) | |* 3 | HASH JOIN BUFFERED | | 10000 | 15M| 671 (1)| 00:00:01 | Q1,03 | PCWP | | | 4 | PX RECEIVE | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,03 | PCWP | | | 5 | PX SEND HASH | :TQ10001 | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | P->P | HASH | |* 6 | HASH JOIN | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | PCWP | | | 7 | PX RECEIVE | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,01 | PCWP | | | 8 | PX SEND BROADCAST | :TQ10000 | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 9 | PX BLOCK ITERATOR | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWC | | | 10 | TABLE ACCESS STORAGE FULL| TSMALL | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWP | | | 11 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 12 | TABLE ACCESS STORAGE FULL | TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 13 | PX RECEIVE | | 100K| 52M| 319 (0)| 00:00:01 | Q1,03 | PCWP | | | 14 | PX SEND HASH | :TQ10002 | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | P->P | HASH | | 15 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWC | | | 16 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------- [/sql] Instant success, we have the plan we want for the joining of TS and T1. Now let's apply the same approach to T2 (TLARGE_TWO) [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T1) NO_PQ_REPLICATE(T2) PQ_DISTRIBUTE(T1 BROADCAST NONE ) PQ_DISTRIBUTE(T2 BROADCAST NONE ) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id where t1.mod5_id=1 ; ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 15M| 671 (1)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 10000 | 15M| 671 (1)| 00:00:01 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN | | 10000 | 15M| 671 (1)| 00:00:01 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX SEND BROADCAST | :TQ10001 | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | P->P | BROADCAST | |* 6 | HASH JOIN | | 10000 | 10M| 352 (1)| 00:00:01 | Q1,01 | PCWP | | | 7 | PX RECEIVE | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,01 | PCWP | | | 8 | PX SEND BROADCAST | :TQ10000 | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST | | 9 | PX BLOCK ITERATOR | | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWC | | | 10 | TABLE ACCESS STORAGE FULL| TSMALL | 10000 | 5390K| 32 (0)| 00:00:01 | Q1,00 | PCWP | | | 11 | PX BLOCK ITERATOR | | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWC | | |* 12 | TABLE ACCESS STORAGE FULL | TLARGE | 20000 | 10M| 319 (0)| 00:00:01 | Q1,01 | PCWP | | | 13 | PX BLOCK ITERATOR | | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWC | | | 14 | TABLE ACCESS STORAGE FULL | TLARGE_TWO | 100K| 52M| 319 (0)| 00:00:01 | Q1,02 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------- [/sql] It appears we have excellent results - no HASH JOIN BUFFERED - and we have the BROADCAST - but unfortunately it's far from perfect. In this case the resulting join between TS (TSMALL) and T1 (TLARGE) is broadcasted to every process (Step 5: PX SEND BROADCAST). If the join results in many records - then a lot of data will be broadcasted. And with each additional table - if the number of records grow, the amount of data broadcasted just increases. Now if we use the OTHER method - with the SWAP_JOIN_INPUTS and the PROPER use of PQ_DISTRIBUTE hint, the query looks like this: [sql] select /*+PARALLEL(8) NO_PX_JOIN_FILTER(T2) NO_PQ_REPLICATE(T1) NO_PQ_REPLICATE(T2) LEADING(TS, T1, T2) PQ_DISTRIBUTE(T1 NONE BROADCAST ) SWAP_JOIN_INPUTS(T1) PQ_DISTRIBUTE(T2 NONE BROADCAST ) SWAP_JOIN_INPUTS(T2) */* from tlarge t1 join tlarge_two t2 on t2.id=t1.id join tsmall ts on ts.id = t1.id where t1.mod5_id=1 ; ----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------

No Comments Yet

Let us know what you think

Subscribe by email