Question : Are Expression statistics really worth the trouble ?
Answer : Yes they provide the optimizer with accurate information about selectivity / cardinality
When a function, an expression is applied to a column in the where clause, the optimizer doesn' t know the right selectivity / cardinality.
When we gather stats for the expression the optimizer knows the right selectivity / cardinality
Here is an example
SQL> set autotrace traceonly
SQL> alter session set events='10053 trace name context forever,level 1';
Session altered.
SQL>
SQL> select * from T_ORDER_OR where CR_FK = 'EUR' and extract(year from OR_ORDERDATE) = '2012';
250564 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
3547894390
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2506 | 589K| 6914 (1)| 00:01:23 |
|* 1 | TABLE ACCESS FULL| T_ORDER_OR | 2506 | 589K| 6914 (1)| 00:01:23 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CR_FK"='EUR' AND EXTRACT(YEAR FROM
INTERNAL_FUNCTION("OR_ORDERDATE"))=2012)
Statistics
----------------------------------------------------------
24 recursive calls
5 db block gets
269789 consistent gets
25145 physical reads
1636 redo size
166582238 bytes sent via SQL*Net to client
71160699 bytes received via SQL*Net from client
501130 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
250564 rows processed
#################################################################################################
Access path analysis for T_ORDER_OR
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T_ORDER_OR[T_ORDER_OR]
Column (#9):
NewDensity:0.124779, OldDensity:0.000000 BktCnt:1000000, PopBktCnt:1000000, PopValCnt:4, NDV:4
Column (#9): CR_FK(
AvgLen: 4 NDV: 4 Nulls: 0 Density: 0.124779
Histogram: Freq #Bkts: 4 UncompBkts: 1000000 EndPtVals: 4Table: T_ORDER_OR Alias: T_ORDER_OR
Card: Original: 1000000.000000 Rounded: 2506 Computed: 2505.64 Non Adjusted: 2505.64
Access Path: TableScan
Cost: 6913.94 Resp: 6913.94 Degree: 0
Cost_io: 6880.00 Cost_cpu: 678673412
Resp_io: 6880.00 Resp_cpu: 678673412
Access Path: index (RangeScan)
Index: OR_IDX4
resc_io: 251490.00 resc_cpu: 2056568786
ix_sel: 0.250564 ix_sel_with_filters: 0.250564
Cost: 251592.85 Resp: 251592.85 Degree: 1
Best:: AccessPath: TableScan
Cost: 6913.94 Degree: 1 Resp: 6913.94 Card: 2505.64 Bytes: 0
#################################################################################################
There are 2 filters
CR_FK = 'EUR' since there are only 4 currencies for which there are orders the NDV for conumn 9 is 4
extract ( year or_orderdate = '2012' ) : The optimizer does not have an idea about the right selectivity / cardinality, the optimizer assumes here a selectivity iof 0.01 ( 1% )
But execution lists that this assumed selectivity is totally wrong, since all the orders were ordered in 2012
SQL> select dbms_stats.create_extended_stats(null,'T_ORDER_OR','(extract(year from OR_ORDERDATE))') from dual;
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'T_ORDER_OR','(EXTRACT(YEARFROMOR_ORDERDATE))')
--------------------------------------------------------------------------------------------------------------
SYS_STUBQ5SF4KAJNMA00YIWL3XX9L
SQL> begin
2 dbms_stats.gather_table_stats(user,'T_ORDER_OR',estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE AUTO',no_invalidate => false);
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly
SQL> alter session set events='10053 trace name context forever,level 1';
Session altered.
SQL>
SQL> select * from T_ORDER_OR where CR_FK = 'EUR' and extract(year from OR_ORDERDATE) = '2012';
250564 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value:
3547894390
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250K| 58M| 6914 (1)| 00:01:23 |
|* 1 | TABLE ACCESS FULL| T_ORDER_OR | 250K| 58M| 6914 (1)| 00:01:23 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CR_FK"='EUR' AND EXTRACT(YEAR FROM
INTERNAL_FUNCTION("OR_ORDERDATE"))=2012)
Statistics
----------------------------------------------------------
31 recursive calls
7 db block gets
269791 consistent gets
25146 physical reads
1588 redo size
166582238 bytes sent via SQL*Net to client
71160699 bytes received via SQL*Net from client
501130 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
250564 rows processed
#################################################################################################
Access path analysis for T_ORDER_OR
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T_ORDER_OR[T_ORDER_OR]
Column (#9):
NewDensity:0.124779, OldDensity:0.000000 BktCnt:1000000, PopBktCnt:1000000, PopValCnt:4, NDV:4
Column (#9): CR_FK(
AvgLen: 4 NDV: 4 Nulls: 0 Density: 0.124779
Histogram: Freq #Bkts: 4 UncompBkts: 1000000 EndPtVals: 4
Column (#10): SYS_STUBQ5SF4KAJNMA00YIWL3XX9L(
AvgLen: 4 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2012 Max: 2012
Column (#6): OR_TOTALAMOUNT(
AvgLen: 6 NDV: 952029 Nulls: 0 Density: 0.000001 Min: 50 Max: 100000
ColGroup (#1, Index) OR_IDX4
Col#: 6 9 CorStregth: 3.86
ColGroup Usage:: PredCnt: 2 Matches Full: Partial:
***** Virtual column Adjustment ******
Column name SYS_STUBQ5SF4KAJNMA00YIWL3XX9L
cost_cpu 550.00
cost_io 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
***** End virtual column Adjustment ******
Table: T_ORDER_OR Alias: T_ORDER_OR
Card: Original: 1000000.000000 Rounded: 250564 Computed: 250564.00 Non Adjusted: 250564.00
Access Path: TableScan
Cost: 6913.94 Resp: 6913.94 Degree: 0
Cost_io: 6880.00 Cost_cpu: 678673412
Resp_io: 6880.00 Resp_cpu: 678673412
Access Path: index (RangeScan)
Index: OR_IDX4
resc_io: 251490.00 resc_cpu: 2056568786
ix_sel: 0.250564 ix_sel_with_filters: 0.250564
Cost: 251592.85 Resp: 251592.85 Degree: 1
Best:: AccessPath: TableScan
Cost: 6913.94 Degree: 1 Resp: 6913.94 Card: 250564.00 Bytes: 0
***************************************
#################################################################################################
Because the optimizer assumes a selective cardinality for the t_order_or row source, t_order_or , t_order_or is the driving table and joined using a nested loop join with t_delivery_dl using index DL_IDX1
SQL> begin
2 dbms_stats.drop_extended_stats(null,'T_ORDER_OR','(extract(year from OR_ORDERDATE))');
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> select * from T_ORDER_OR,
2 T_DELIVERY_DL
3 where OR_FK = OR_PK
4 and CR_FK = 'EUR' and extract(year from OR_ORDERDATE) = '2012'
5 and su_fk = 1;
2635 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 319855565
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 12272 | 6985 (1)| 00:01:24 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 26 | 12272 | 6985 (1)| 00:01:24 |
|* 3 | TABLE ACCESS FULL | T_ORDER_OR | 26 | 6266 | 6907 (1)| 00:01:23 |
|* 4 | INDEX RANGE SCAN | DL_IDX1 | 1 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T_DELIVERY_DL | 1 | 231 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("SU_FK"=1 AND "CR_FK"='EUR' AND EXTRACT(YEAR FROM
INTERNAL_FUNCTION("OR_ORDERDATE"))=2012)
4 - access("OR_FK"="OR_PK")
Statistics
----------------------------------------------------------
31 recursive calls
7 db block gets
33369 consistent gets
29411 physical reads
1732 redo size
3060326 bytes sent via SQL*Net to client
1468218 bytes received via SQL*Net from client
7907 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2635 rows processed >> estimated cardinality of 26 rows is NOT close to the real number of rows returned
Because the optimizer knows that access to the t_order_or row source is not selectif at all, t_order_or , t_order_or is the hash table table and joined using a hash join with t_delivery_dl
SQL> select dbms_stats.create_extended_stats(null,'T_ORDER_OR','(extract(year from OR_ORDERDATE))') from dual;
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(user,'T_ORDER_OR',estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE AUTO',no_invalidate => false);
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> select * from T_ORDER_OR,
2 T_DELIVERY_DL
3 where OR_FK = OR_PK
4 and CR_FK = 'EUR' and extract(year from OR_ORDERDATE) = '2012'
5 and su_fk = 1;
2635 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1384926585
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2584 | 1201K| 13538 (1)| 00:02:43 |
|* 1 | HASH JOIN | | 2584 | 1201K| 13538 (1)| 00:02:43 |
|* 2 | TABLE ACCESS FULL| T_ORDER_OR | 2580 | 617K| 6907 (1)| 00:01:23 |
| 3 | TABLE ACCESS FULL| T_DELIVERY_DL | 1001K| 220M| 6625 (1)| 00:01:20 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OR_FK"="OR_PK")
2 - filter("SU_FK"=1 AND "CR_FK"='EUR' AND EXTRACT(YEAR FROM
INTERNAL_FUNCTION("OR_ORDERDATE"))=2012)
Statistics
----------------------------------------------------------
38 recursive calls
6 db block gets
51991 consistent gets
49010 physical reads
1588 redo size
3060326 bytes sent via SQL*Net to client
1468218 bytes received via SQL*Net from client
7907 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2635 rows processed >> estimated cardinality of 2584 is close to the real number of rows returned