SQL Tuning

An introduction to SQL Tuning : Index Clustering Factor

User Rating:  / 2
PoorBest 
Parent Category: Articles
Created on Friday, 09 March 2012 12:40
Last Updated on Tuesday, 10 December 2019 14:44
Published on Friday, 09 March 2012 12:40
Written by Guy Lambregts
Hits: 4272

An introduction to SQL Tuning : Index Clustering Factor

 

The choice between full table scan and index access does not only depend on the amount of distinct index keys but also on the index clustering factor.


As always we use the SQL Tuning sample schema we set up.

We have 1.000.000 order in the T_ORDER_OR table
We have 99 different suppliers for which there are orders in the T_ORDER_OR table
We have 4 different currencies used in the T_ORDER_OR table

Question Q1 : We only would like the orders for a given supplier and a given currency, this is less than 1 percent of the data.
Will an index be used ?  We have an index on the supplier column, so we suppose the index will be used, don' t we ?



SQL> connect tuning/TUNING
Connected.
SQL> explain plan for select * from t_order_or, t_supplier_su
where su_pk = su_fk
and su_name like '%COS%'
and cr_fk = 'EUR'; 

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 3229324847

--------------------------------------------------------------------------------------
| Id  | Operation          |    Name       | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               | 12864 |  5577K|  6912   (1)|   00:01:23 |
|*  1 |  HASH JOIN         |               | 12864 |  5577K|  6912   (1)|   00:01:23 |
|*  2 |   TABLE ACCESS FULL| T_SUPPLIER_SU |     5 |  1015 |     3   (0)|   00:00:01 |
|*  3 |   TABLE ACCESS FULL| T_ORDER_OR    |   254K|    58M|  6907   (1)|   00:01:23 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("SU_PK"="SU_FK")
2 - filter("SU_NAME" LIKE '%COS%' AND "SU_NAME" IS NOT NULL)
3 - filter("CR_FK"='EUR')

17 rows selected..


The answer is :
No the index (OR_IDX1) is not used. The choise between full table scan ( T_ORDER_OR ) and index access (OR_IDX1) does not only depend on the amount of distinct index keys (99) but also on the index clustering factor. Since orders were randomly inserted they were not added ordered by supplier. If the orders would have been ordered by supplier then the index would have been used, ie the index would have had a significant better clustering factor.
The choice between full table scan and index access does not only depend on the amount of distinct index keys but also on the index clustering factor.

There we go.


SQL> create table T_ORDER_SORTED_OR as select * from T_ORDER_OR order by su_fk;     

Table created.

SQL> create index SORTED_OR_IDX1 on T_ORDER_SORTED_OR (su_fk) tablespace users;

Index created.

-- stats on the new table
SQL> begin
2  dbms_stats.gather_table_stats('TUNING','T_ORDER_SORTED_OR',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE AUTO');
3  end;
4  /

PL/SQL procedure successfully completed.

-- stats gathered in the same manner on the original table
SQL> begin
2  dbms_stats.gather_table_stats('TUNING','T_ORDER_OR',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE AUTO');
3  end;
4  /

PL/SQL procedure successfully completed.

SQL> select index_name,table_name,blevel,leaf_blocks,distinct_keys,AVG_LEAF_BLOCKS_PER_KEY,AVG_DATA_BLOCKS_PER_KEY,CLUSTERING_FACTOR,NUM_ROWS from user_indexes where table_name in ('T_ORDER_OR','T_ORDER_SORTED_OR') and index_type != 'LOB' order by 1;

INDEX_NAME             TABLE_NAME         BLEVEL LEAF_BLOCKS    DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR     NUM_ROWS
-------------------- -------------------- ------ ----------- ---------------- ----------------------- ----------------------- ----------------- ----------------
OR_IDX1                T_ORDER_OR            2    1954               99               19                      8341         825761      1000000
OR_IDX2                T_ORDER_OR            2    2653              432                6                        58          25418      1000000
OR_PK                  T_ORDER_OR            2    1875          1000000                1                         1          25000      1000000
SORTED_OR_IDX1         T_ORDER_SORTED_OR     2    1954               99               19                       252          25000      1000000


When we compare the index stats between T_ORDER_OR.OR_IDX1 and T_ORDER_SORTED.SORTED_OR_IDX1 then we notice

1.the number of leaf blocks is the same,
2.the number of distict keys is the same
3.the number of average leaf blocks per key is the same
4.there is a huge difference between the clustering factor and average data blocks per key

The bad clustering factor for the T_ORDER_OR.OR_IDX1 index means that per index key ( per supplier ) we need to visit a lot of of data blocks to get all the table entries for that key ( randomly added data )
The good clustering factor for the T_ORDER_SORTED.SORTED_OR_IDX1 index means that per index key ( per supplier ) we do NOT need to visit a lot of of data blocks to get all the table entries for that key ( sorted ordered by index key )



SQL_SESSION1> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL_SESSION1>  explain plan for select * from t_order_or, t_supplier_su
where su_pk = su_fk
and su_name like '%COS%'
and cr_fk = 'EUR'; 

Explained.

SQL_SESSION1>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 3229324847

--------------------------------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               | 12957 |  5618K|  6912   (1)|   00:01:23 |
|*  1 |  HASH JOIN         |               | 12957 |  5618K|  6912   (1)|   00:01:23 |
|*  2 |   TABLE ACCESS FULL| T_SUPPLIER_SU |     5 |  1015 |     3   (0)|   00:00:01 |
|*  3 |   TABLE ACCESS FULL| T_ORDER_OR    |   256K|    58M|  6907   (1)|   00:01:23 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("SU_PK"="SU_FK")
2 - filter("SU_NAME" LIKE '%COS%' AND "SU_NAME" IS NOT NULL)
3 - filter("CR_FK"='EUR')

17 rows selected.




SQL_SESSION1> disconnect

SQL_SESSION2> connect tuning/TUNING
Connected.
SQL_SESSION2> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL_SESSION2> explain plan for select * from t_order_sorted_or, t_supplier_su
where su_pk = su_fk
and su_name like '%COS%'
and cr_fk = 'EUR'; 

Explained.

SQL_SESSION2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 1060742342

---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    | 12626 |  5474K|  1357   (1)| 00:00:17 |
|   1 |  NESTED LOOPS                |                    |       |       |            |          |
|   2 |   NESTED LOOPS               |                    | 12626 |  5474K|  1357   (1)| 00:00:17 |
|*  3 |    TABLE ACCESS FULL         | T_SUPPLIER_SU      |     5 |  1015 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | SORTED_OR_IDX1     | 10101 |       |    20   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T_ORDER_SORTED_OR  |  2525 |   594K|   273   (0)| 00:00:04 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("SU_NAME" LIKE '%COS%' AND "SU_NAME" IS NOT NULL)
4 - access("SU_PK"="SU_FK")
5 - filter("CR_FK"='EUR')

19 rows selected.



SESSION1 : 10053 CBO Trace

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Table Stats::
Table: T_ORDER_OR  Alias: T_ORDER_OR
#Rows: 1000000  #Blks:  25397  AvgRowLen:  241.00  ChainCnt:  0.00
Column (#2): SU_FK(
AvgLen: 3 NDV: 99 Nulls: 0 Density: 0.003460 Min: 1 Max: 99
Histogram: Freq  #Bkts: 99  UncompBkts: 5492  EndPtVals: 99
Index Stats::
Index: OR_IDX1  Col#: 2
LVLS: 2  #LB: 1954  #DK: 99  LB/K: 19.00  DB/K: 8341.00  CLUF: 825761.00
Index: OR_IDX2  Col#: 3
LVLS: 2  #LB: 2653  #DK: 432  LB/K: 6.00  DB/K: 58.00  CLUF: 25418.00
Index: OR_PK  Col#: 1
LVLS: 2  #LB: 1875  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 25000.00
Index: SYS_IL0000079756C00008$$  Col#:    (NOT ANALYZED)
LVLS: 1  #LB: 25  #DK: 100  LB/K: 1.00  DB/K: 1.00  CLUF: 800.00
Access path analysis for T_ORDER_OR
***************************************



OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T_SUPPLIER_SU[T_SUPPLIER_SU]#0  T_ORDER_SORTED_OR[T_ORDER_SORTED_OR]#1

***************
Now joining: T_ORDER_SORTED_OR[T_ORDER_SORTED_OR]#1
***************
NL Join
Outer table: Card: 5.00  Cost: 3.00  Resp: 3.00  Degree: 1  Bytes: 203
Access path analysis for T_ORDER_OR
Inner table: T_ORDER_OR  Alias: T_ORDER_OR
Access Path: TableScan
NL Join:  Cost: 34532.24  Resp: 34532.24  Degree: 1
Cost_io: 34397.00  Cost_cpu: 2704378866
Resp_io: 34397.00  Resp_cpu: 2704378866
Access Path: index (AllEqJoinGuess)
Index: OR_IDX1
resc_io: 8362.00  resc_cpu: 65207402
ix_sel: 0.010101  ix_sel_with_filters: 0.010101
NL Join : Cost: 41412.05  Resp: 41412.05  Degree: 1
Cost_io: 41395.90  Cost_cpu: 322947630
Resp_io: 41395.90  Resp_cpu: 322947630

Best NL cost: 34532.24
resc: 34532.24  resc_io: 34397.00  resc_cpu: 2704378866
resp: 34532.24  resp_io: 34397.00  resc_cpu: 2704378866
Join Card:  12957.322681 = outer (5.000000) * inner (256554.989075) * sel (0.010101)
Join Card - Rounded: 12957 Computed: 12957.32
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 3.00  card 5.00  bytes: 203  deg: 1  resp: 3.00
Inner table:  T_ORDER_OR  Alias: T_ORDER_OR
resc: 6907.05  card: 256554.99  bytes: 241  deg: 1  resp: 6907.05
using dmeth: 2  #groups: 1
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 1 Row size:     234 Total Rows:              5
Initial runs:   1 Merge passes:  0 IO Cost / pass:          0
Total IO sort cost: 0      Total CPU sort cost: 19996699
Total Temp space used: 0
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 8670 Row size:     276 Total Rows:         256555
Initial runs:   2 Merge passes:  1 IO Cost / pass:       4698
Total IO sort cost: 13368      Total CPU sort cost: 441205750
Total Temp space used: 140125000
SM join: Resc: 20301.12  Resp: 20301.12  [multiMatchCost=0.00]
SM Join
SM cost: 20301.12
resc: 20301.12 resc_io: 20251.00 resc_cpu: 1002128468
resp: 20301.12 resp_io: 20251.00 resp_cpu: 1002128468
SM Join (with index on outer)
Access Path: index (FullScan)
Index: SU_PK
resc_io: 4.00  resc_cpu: 77586
ix_sel: 1.000000  ix_sel_with_filters: 1.000000
Cost: 4.00  Resp: 4.00  Degree: 1
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 4.00  card 5.00  bytes: 203  deg: 1  resp: 4.00
Inner table:  T_ORDER_OR  Alias: T_ORDER_OR
resc: 6907.05  card: 256554.99  bytes: 241  deg: 1  resp: 6907.05
using dmeth: 2  #groups: 1
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 8670 Row size:     276 Total Rows:         256555
Initial runs:   2 Merge passes:  1 IO Cost / pass:       4698
Total IO sort cost: 13368      Total CPU sort cost: 441205750
Total Temp space used: 140125000
SM join: Resc: 20301.12  Resp: 20301.12  [multiMatchCost=0.00]
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 3.00  card 5.00  bytes: 203  deg: 1  resp: 3.00
Inner table:  T_ORDER_OR  Alias: T_ORDER_OR
resc: 6907.05  card: 256554.99  bytes: 241  deg: 1  resp: 6907.05
using dmeth: 2  #groups: 1
Cost per ptn: 1.78  #ptns: 1
hash_area: 124 (max=10650) buildfrag: 1  probefrag: 7924  ppasses: 1
Hash join: Resc: 6911.83  Resp: 6911.83  [multiMatchCost=0.00]
HA Join
HA cost: 6911.83
resc: 6911.83 resc_io: 6883.00 resc_cpu: 576580357
resp: 6911.83 resp_io: 6883.00 resp_cpu: 576580357
Best:: JoinMethod: Hash
Cost: 6911.83  Degree: 1  Resp: 6911.83  Card: 12957.32 Bytes: 444


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


SESSION2 : 10053 CBO Trace

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Table Stats::
Table: T_ORDER_SORTED_OR  Alias: T_ORDER_SORTED_OR
#Rows: 1000000  #Blks:  25203  AvgRowLen:  241.00  ChainCnt:  0.00
Column (#2): SU_FK(
AvgLen: 3 NDV: 99 Nulls: 0 Density: 0.010101 Min: 1 Max: 99
Index Stats::
Index: SORTED_OR_IDX1  Col#: 2
LVLS: 2  #LB: 1954  #DK: 99  LB/K: 19.00  DB/K: 252.00  CLUF: 25000.00
Index: SYS_IL0000080083C00008$$  Col#:    (NOT ANALYZED)
LVLS: 1  #LB: 25  #DK: 100  LB/K: 1.00  DB/K: 1.00  CLUF: 800.00
Access path analysis for T_ORDER_SORTED_OR


***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T_SUPPLIER_SU[T_SUPPLIER_SU]#0  T_ORDER_SORTED_OR[T_ORDER_SORTED_OR]#1

***************
Now joining: T_ORDER_SORTED_OR[T_ORDER_SORTED_OR]#1
***************
NL Join
Outer table: Card: 5.00  Cost: 3.00  Resp: 3.00  Degree: 1  Bytes: 203
Access path analysis for T_ORDER_SORTED_OR
Inner table: T_ORDER_SORTED_OR  Alias: T_ORDER_SORTED_OR
Access Path: TableScan
NL Join:  Cost: 34268.90  Resp: 34268.90  Degree: 1
Cost_io: 34134.00  Cost_cpu: 2697471069
Resp_io: 34134.00  Resp_cpu: 2697471069
Access Path: index (AllEqJoinGuess)
Index: SORTED_OR_IDX1
resc_io: 273.00  resc_cpu: 7602074
ix_sel: 0.010101  ix_sel_with_filters: 0.010101
NL Join : Cost: 1357.24  Resp: 1357.24  Degree: 1
Cost_io: 1355.35  Cost_cpu: 37801256
Resp_io: 1355.35  Resp_cpu: 37801256

Best NL cost: 1357.24
resc: 1357.24  resc_io: 1355.35  resc_cpu: 37801256
resp: 1357.24  resp_io: 1355.35  resc_cpu: 37801256
Join Card:  12626.262626 = outer (5.000000) * inner (250000.000000) * sel (0.010101)
Join Card - Rounded: 12626 Computed: 12626.26
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 3.00  card 5.00  bytes: 203  deg: 1  resp: 3.00
Inner table:  T_ORDER_SORTED_OR  Alias: T_ORDER_SORTED_OR
resc: 6854.98  card: 250000.00  bytes: 241  deg: 1  resp: 6854.98
using dmeth: 2  #groups: 1
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 1 Row size:     234 Total Rows:              5
Initial runs:   1 Merge passes:  0 IO Cost / pass:          0
Total IO sort cost: 0      Total CPU sort cost: 19996699
Total Temp space used: 0
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 8448 Row size:     276 Total Rows:         250000
Initial runs:   2 Merge passes:  1 IO Cost / pass:       4576
Total IO sort cost: 13024      Total CPU sort cost: 430011387
Total Temp space used: 136553000
SM join: Resc: 19904.49  Resp: 19904.49  [multiMatchCost=0.00]
SM Join
SM cost: 19904.49
resc: 19904.49 resc_io: 19855.00 resc_cpu: 989552546
resp: 19904.49 resp_io: 19855.00 resp_cpu: 989552546
SM Join (with index on outer)
Access Path: index (FullScan)
Index: SU_PK
resc_io: 4.00  resc_cpu: 77586
ix_sel: 1.000000  ix_sel_with_filters: 1.000000
Cost: 4.00  Resp: 4.00  Degree: 1
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 4.00  card 5.00  bytes: 203  deg: 1  resp: 4.00
Inner table:  T_ORDER_SORTED_OR  Alias: T_ORDER_SORTED_OR
resc: 6854.98  card: 250000.00  bytes: 241  deg: 1  resp: 6854.98
using dmeth: 2  #groups: 1
SORT ressource         Sort statistics
Sort width:         247 Area size:      217088 Max Area size:    43620352
Degree:               1
Blocks to Sort: 8448 Row size:     276 Total Rows:         250000
Initial runs:   2 Merge passes:  1 IO Cost / pass:       4576
Total IO sort cost: 13024      Total CPU sort cost: 430011387
Total Temp space used: 136553000
SM join: Resc: 19904.49  Resp: 19904.49  [multiMatchCost=0.00]
Outer table:  T_SUPPLIER_SU  Alias: T_SUPPLIER_SU
resc: 3.00  card 5.00  bytes: 203  deg: 1  resp: 3.00
Inner table:  T_ORDER_SORTED_OR  Alias: T_ORDER_SORTED_OR
resc: 6854.98  card: 250000.00  bytes: 241  deg: 1  resp: 6854.98
using dmeth: 2  #groups: 1
Cost per ptn: 1.75  #ptns: 1
hash_area: 124 (max=10650) buildfrag: 1  probefrag: 7721  ppasses: 1
Hash join: Resc: 6859.73  Resp: 6859.73  [multiMatchCost=0.00]
HA Join
HA cost: 6859.73
resc: 6859.73 resc_io: 6831.00 resc_cpu: 574543297
resp: 6859.73 resp_io: 6831.00 resp_cpu: 574543297
Best:: JoinMethod: NestedLoop
Cost: 1357.24  Degree: 1  Resp: 1357.24  Card: 12626.26 Bytes: 444