#### 3.2 Cost based optimizer (CBO)

CBO follows Expense calculation methodology. All execution plans are tagged with a cost, the one with the lowest cost will be chosen. The higher the cost the more resources will be used by the execution plan, the lower the cost, the more efficient the query is.

CBO uses all available information-statistics and histograms stored in the dictionary, user provided hints and supplied parameter settings to arrive at the cost. CBO generates all possible permutations of access methods and then chooses what fits best. The number of permutations depends on the number of tables present in the query and can sometimes be around 80,000 permutations or even more! Please refer to the parameter section in part 2 of this series for setting related parameters.

CBO may also perform operations such as query transformation, view merging, OR transformation, push join predicates, etc. that would change the original statement and alter existing or add new predicates, all with the aim of deriving new access plans that could be better than the existing ones. Note that transformation does not affect the data that is returned, only the execution path. Please refer to the parameter section in part 2 of this series for information related to this.

**3.2.1 Statistics**

Statistics provide critical input in order for CBO to work properly; these are generated for data storing objects and include information such as the number of rows in a table, distinct values in a column, number of leaf blocks in an index, etc. The more accurate the statistics, the more efficient the results provided by Optimizer. Please refer to the Generating statistics section in part 3 of this series for how this information is generated and how best we can maintain it.

Statistics may be exact or estimated. Statistics generated with a COMPUTE clause analyzes all of the data in the object. This gives the optimizer accurate information to work on and arrive at a good execution plan.

Statistics generated with an ESTIMATE clause analyzes data in the object to the extent of sample size mentioned. Sample size may be specified as number of rows or percentage of rows that should be randomly analyzed to generate the statistics. Optionally block sampling may also be specified. This saves on time if there are many huge tables in the system. The guarantee of good execution plans will depend on how close the estimated value is to the exact values. You can try out your setup at different sample sizes to arrive at an appropriate figure or have different estimation levels for different types of tables, but the idea is to get as close to accuracy as feasible.

Statistics are stored in a data dictionary in tables owned by SYS user. The following views display the statistics collected for tables, columns and indexes.

*For Tables*

**DBA_TABLES**-
**NUM_ROWS**- Number of rows. -
**BLOCKS**- Number of used blocks. -
**EMPTY_BLOCKS**- Number of empty blocks that have never been used. -
**AVG_SPACE**- Average free space (in bytes) in blocks allocated to the table. All empty and free blocks are considered for this. -
**CHAIN_CNT**- Number of chained or migrated rows. -
**AVG_ROW_LEN**- Average row length in bytes. -
**LAST_ANALYZED**- Date when the table was last analyzed. -
**SAMPLE_SIZE**- Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE. -
**GLOBAL_STATS**- For partitioned tables, YES - statistics collected as a whole, NO - statistics are estimated from statistics. -
**USER_STATS**- Set to YES if user has explicitly set the statistics for the table.

Statistics for individual partitions of a table can be seen from DBA_TAB_PARTITIONS. Cluster statistics is available from DBA_CLUSTERS. Object table statistics are present in DBA_OBJECT_TABLES.

*For Columns*

**DBA_TAB_COLUMNS**-
**NUM_DISTINCT**- Number of distinct values. -
**LOW_VALUE**- Lowest value -
**HIGH_VALUE**- Highest value -
**DENSITY**- Density of the column -
**NUM_NULLS**- Number of records with null value for the concerned column. -
**NUM_BUCKETS**- Number of buckets in histograms. Refer Histograms section. -
**SAMPLE_SIZE**- Sample size provided for ESTIMATE statistics. Equal to total rows if COMPUTE. -
**LAST_ANALYZED**- Date when the table was last analyzed.

DBA_TAB_COL_STATISTICS shows similar data. Partitioned Table column statistics can be viewed from DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS.

*For Indexes*

**DBA_INDEXES**-
**BLEVEL**- Depth of the index, from root to leaf. -
**LEAF_BLOCKS**- Number of leaf blocks. -
**DISTINCT KEYS**- Number of distinct keys. -
**AVG_LEAF_BLOCKS_PER_KEY**- Average number of leaf blocks in which each distinct key appears, should be 1 for unique indexes. -
**AVG_DATA_BLOCKS_PER_KEY**- Average number of blocks in the table that are pointed to by a distinct key. -
**CLUSTERING_FACTOR**- A count that determines the ordering of the index. Index is ordered if count is closer to the number of blocks, i.e., entries in single leaf tend to point to rows in same blocks in the table. Index is randomly ordered if closer to the number of rows, i.e., entries in single leaf are pointing to rows spread across multiple blocks. -
**NUM_ROWS**- Number of rows indexed. -
**SAMPLE_SIZE**- Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE.. -
**LAST_ANALYZED**- Date when the table was last analyzed. -
**GLOBAL_STATS**- For partitioned indexes, YES - statistics collected as a whole, NO - statistics are estimated from statistics. -
**USER_STATS**- Set to YES if user has explicitly set the statistics for the index. -
**PCT_DIRECT_ACCESS**- For secondary indexes on IOTs, percentage of rows with valid guess.

Statistics for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS.

Dictionary tables related to Histogram information are discussed later.