pg_dbms_stats 1.3.1


  1. Name
  2. Summary
  3. Feature Description
  4. Installation
  5. Uninstallation
  6. Usage Example
  7. Restrictions
  8. Architecture
  9. Operating Environment
  10. Related Item
  11. Appendix A. Object List

Name

pg_dbms_stats -- Statistical information management and indirectly influence the execution plan.

Summary

PostgreSQL aggregates values sampled from the table or index by the ANALYZE command and maintains statistical information. Query optimizer calculate the cost of a query using this statistical information and choose lowest cost execution plans.

pg_dbms_stats is help to Query optimizer generate best execution plan. It provide more relevant statistic information to planner, by interspersed with a planner process, to replace the dummy statistics previously created statistics planner's reference. This is useful when you want to reduce the risk when execution plan suddenly changed during operation and the performance of the system is reduced.

Features

pg_dbms_stats has following seven methods to manipulate statistical information used by the Planner. All functions are operated as SQL functions other than Export. For more information on each function please refer object list.

Backup

Summary
In order to reproduce current execution plan in the future, backup current planner statistics.
How to Use
Run backup_<object unit>_stats() SQL function
Detail

Statistical information to be included in the backup can be specified for the database(currently connected), schema, table or particular column of table. For example if you want to save statistic for all table and their columns in schema, specify schema name in function. For large backup specify database or schema.

Current backup information can be found in the dbms_stats.backup_history table. See dbms_stat.backup_history table for more information.

Restore

Summary
Restore earlier backup statistics plan.
How to Use
Run restore_stats() or restore_<object unit>_stats() SQL function.
Detail

Restore the Statistical information of object selected at backup, without affecting other object's statistic. Backupped statistical information can be restore by two way, as follow :

Backup ID
Restore all statistical information contained in the backup by passing a backup ID to restore_stats() function. It is simpler for the regular basis backup operations by specifying database or schema. if you have pg_dbms_stats more than one database, backup ID is unique to each database.
Object + Timestamp
Use restore_<object unit>_stats() function to restore statistic information of object either of the database, schema, table, and column at specified point in time. It's needed backup statistic with older Timestamp and wider range than restoring.
This method is useful when we don't know that which proper execution plan had been selected.
Restore point is fixed at the time of backup, so no need to explicitly locked after restore.

Purge

Summary
In the ability to delete all backup statistics that are no longer needed. You can remove backup using backup ID.
How to Use
Run purge_stats() SQL function
Detail

It can delete backup before any time specified.

Lock

Summary
While selecting statistical information prevent to change by ANALYZE process, acquire lock on it.
How to use
Call lock_<object unit>_stats() SQL function.
Detail

Specify database (currently connected), schemas, table or column to get lock on its statistical information.

Unlock

Summary
In order to restore PostgreSQL execution plan criteria to original state, unlock the acquired lock earlier.
How to Use
Call unlock_<object unit>_stats() SQL function.
Detail

Release lock on pg_class and pg_statistic specify database (currently connected), schemas, table or column to get lock on its statistical information. Also unlocking can be possible by specifying different lock unit.

Export

Summary
Export current statistics in the external file.
How to Use
Create a COPY statement with reference to the sample SQL file(export__stats-.sql.sample), and execute.
Detail

It uses COPY command to export, it need to specify directory where PostgreSQL user have permission to export file. Depending on application you can export statistical information two way as follow:

PostgreSQL statistics
pg_dbms_stats copies statistic information from PostgreSQL store in pg_class and pg_statistic table. Do not install pg_dbms_stats intended to copy statistic from one environment to another. It used to performance tuning and analysis.
Currently valid statistics
Statistics planner looking via pg_dbms_stats currently. It can be edit and turned back in production static information, also can be included in OS file backup.

Sample file - has been installed in the extension sub-directory, display this directory using "pg_config --docdir".

Import

Summary
Restore the statistic information from external file created by Export.
How to use
Call import_<object unit>_stats() SQL function.
Detail

Statistical information can be imported for database(currently connected), schema, table or particular column of table Please place the file you want to import in the directory readable by PostgreSQL running user.

Installation

pg_dbms_stats installation is similar to the standard extension(EXTENSION).

Build and install

Build pg_dbms_stats from source code use make command then to install, log on user who (most case super-user) has appropriate make install permission. Note that correct pg_config should set in environment variable PATH.

$ cd pg_dbms_stats
$ make
$ su
# make install

Register with database

To register with database connect the to database using superuser and Run CREATE EXTENSION pg_dbms_stats command.

Note that dbnameis name of targeted database.

$ psql -d dbname -c "CREATE EXTENSION pg_dbms_stats"

De-registration is done by DROP EXTENSION, dbms_stats schema still exist, drop if necessary.

Load pg_dbms_stats

If you want to enable pg_dbms_stat for particular session please load library as show below. If you want to enable to all session set shared_preload_libraries GUC to 'pg_dbms_stats' in postgresql.conf and restart server.

postgres=# LOAD 'pg_dbms_stats';
LOAD
postgres=#

Note: When you see error similar as below, please follow Register with database. Do not forgot to create pg_dbms_stats extension on targeted database as mention above.

test=# SELECT * FROM test;
ERROR:  schema "dbms_stats" does not exist
LINE 1: SELECT relpages, reltuples, curpages  FROM dbms_stats.relati...
                                                   ^
QUERY:  SELECT relpages, reltuples, curpages  FROM dbms_stats.relation_stats_locked WHERE relid = $1
test=#

Disabling pg_dbms_stats

If you want to disable after having loaded the pg_dbms_stats, please set pg_dbms_stats.use_locked_stats to off, as in the following example

test=# SET pg_dbms_stats.use_locked_stats TO off;
SET
test=# SELECT * FROM test; -- Create a plan by normal standards
...
test=# SET pg_dbms_stats.use_locked_stats TO on;
SET
test=# SELECT * FROM test; -- Create a plan with dummy statistics
...

If you want to disable to all session, set pg_dbms_stats.use_locked_stats to off and restart server. In PostgreSQL 9.1 set pg_dbms_stats.use_locked_stats in postgresql.conf please refer custom_variable_classes.

# postgresql.conf
shared_preload_libraries = 'pg_dbms_stats'
custom_variable_classes = 'pg_dbms_stats'	# Do not use for PG9.2
pg_dbms_stats.use_locked_stats = off
custom_varible_casses are deprecated in PostgreSQL 9.2, you can use pg_dbms_stats.use_locked_stats same as standard GUC parameters.

Uninstallation

To uninstall pg_dbms_stats, follow steps as shown. dbname is targeted dabase name.

  1. log in to OS super user and run make uninstall in pg_dbms_stats source directory.
  2. $ cd pg_dbms_stats
    $ su
    # make uninstall
  3. Please execute DROP EXTENSION command by connecting a super user to a database that has been registered pg_dbms_stats
  4. $ psql -d dbname -c "DROP EXTENSION pg_dbms_stats"
  5. If you do not need (the one you saved in backup function and lock function) pg_dbms_stats statistics, please delete the dbms_stats schema.
  6. $ psql -d dbname -c "DROP SCHEMA dbms_stats"

Usage Example

Operation of pg_dbms_stats is divided in three category as "Backup", "Lock" and "Export". If you cannot determine which practices are appropriate, we recommend start with backup operation.

Backup Statistical Information

In daily service operation, if problem arise backup_xxx() and restore_xxx() functions to backup and restore the statistical information. As long as there is no problem please get backup by specifying database.

To restore there are two method, one in which specify the backup ID that represent the backup and other by specifying timestamp which indicated statistical information at particular point in time. If you want to restore using backup ID, it should match with valid back, there for use same backup ID created at backup generation. When you restore with timestamp restore using backup data using date and time specified for each object. Please restore the database unit by specifying the date and time you want to restore, if there are any issue then try to restore using backup ID.

-- Take a daily backup, and then run ANALYZE.
test=# SELECT dbms_stats.backup_database_stats('comment');
 backup_database_stats
-----------------------
                     1
(1 row)

test=# ANALYZE;
ANALYZE
test=#
-- Restore a day before backup and lock the statistics.
test=# SELECT dbms_stats.restore_database_stats(now() - '1 day');

Note : If you specify the time stamp to restore, the newest statistics before the specified time stamp are restores for each column or table that included in the target object.

The following demonstrates restoring the backup data using timestamp. The time column provide the timestamp information from backup statistics for each table.

test=# SELECT b.id, b.time, r.relname
	 FROM dbms_stats.relation_stats_backup r
	 JOIN dbms_stats.backup_history b ON (r.id=b.id)
	ORDER BY id;
 id |          time          |     relname
----+------------------------+-----------------
  4 | 2012-01-01 00:00:00+09 | public.droptest
  5 | 2012-01-02 12:00:00+09 | public.test
(5 rows)

test=# SELECT dbms_stats.restore_database_stats('2012-01-03 00:00:00+09');
 restore_database_stats
------------------------
 test
 droptest
(2 rows)

test=#

Lock Statistical Information Operation

If you want to acquire lock on object when server start, use lock_xxx() SQL function.

test=# SELECT dbms_stats.lock_database_stats();
 lock_database_stats
---------------------
 droptest
 test
(2 rows)

test=#

Export Statistical Information

If you want to reproduced execution plan of one database to the another, on this targeted database import statistical information using import_xxx() function from exported file, use export_xxx_stat-.sql_sample file to export statistical information as backup using COPY command. Sample file - has been installed in the extension sub-directory, display this directory using "pg_config --docdir".

$ cd pg_dbms_stats
$ psql -d test -f export_effective_stats-9.1.sql
BEGIN
COMMIT
$ psql -d test2 -c "SELECT dbms_stats.import_database_stats('$PWD/export_stats.dmp')"
 import_database_stats
-----------------------

(1 row)

$

Note: Export create export_stats.dmp file in current working directory. If file with the same name already exist, it will overwrite this file. If you want to avoid this overwrite please rename existing file or move to other directory.
Please note that, while importing absolute path of export_stats.dmp required else it will raise error as shown follow:

test=# select dbms_stats.import_database_stats('export_stats.dmp');
ERROR:  relative path not allowed for dbms_stats_export to file
test=#

If you export statistical inforamtion in binary formate and try to import on destination database which is uses other PostgreSQL version, then it will not work, please refer more on COPY command.

Restrictions

While using pg_dbms_stats you should consider some constraints as follow:

Work required prior to use
Before using lock and backup function of pg_dbms_stats, execute ANALYZE command to update statistical information. If you perform lock and backup on statistic which old or does not exist, it will not generate any error, but execution plan can not be controlled.
Timing of backup
There's no function to make ANALYZE or rewriting statistics as trrigers to take backup. You can perform a backup on regular basis using job management tool like cron, run backup. Disable ANALYZE of autovacuum process and run the statistic backup job.
Purpose of tools
This tool is to reproduce the past statistic information, not for new execution plan to try performance tuning.
Impact of GUC parameters
As mention earlier this tool is used to reproduce the past statistic information using saved backup. If GUC parameter affect the execution plan, if those parameter are changed then execution plan too. Please read more on PostgreSQL query planning GUC Parameters.
Conflict with hook functions
pg_dbms_stats uses hook function as follow: get_relation_info_hook can be used together with pg_dbms_stats while loading, but do not use an external tool that uses the remaining three hook function.
Dump-Restore considerations
pg_dbms_stats uses the anyarray type to hold column level statistics information of user-defined tables. It can cause error during restore, since the data type information of column value is lost when dump a text representation. pg_dbms_stats that contains object in export and restore, following steps as shown below:
  1. 「COPY <with dbms_stats schema table name > TO '<file name>' FORMAT binary;、Epg_dbms_stats dump statistical information to '<file_name>' file in binary formate.
  2. 「pgdump --exclude-schema 'dbms_stats' <database name> > <Dump file name>、EDump database except dbms_stats schema.
  3. 「pg_restore -d <targeted database name> <Dump file name>、ERestore dump data to targeted database.
  4. Install pg_dbms_stats.
  5. 「COPY <with dbms_stats schema table name> FROM '<file name>' FORMAT binary;、EIt will restore table of dbms_stats schema from binary backup file.
Object deletion considerations
When you removing object such as column or table from statistic, use unlock and purge function of pg_dbms_stats before. Remove target object before cause you to delete directly by DELETE query from the statisitics pg_dbms_stats kept. Furthermore, in addition while running the restore or import statistics pg_dbms_stats creates its own locking.

Architecture

Overall Structure

pg_dbms_stats govern to hold and provide statistical information to generate execution plan, instead of statistic generated by ANALYZE. To use statistical information can be done by either of following ways.

pg_dbms_stats manages its own statistics in some table/file groups.

Currently valid statistics
Statistics planner looking via pg_dbms_stats currently. It can be edit and turned back in production static information, also can be included in OS file backup.
Backup statistic
Statistical information can be backup any point of time and hold multiple version of statistics. Backup history is maintained in the history table. In restore feature uses to replaces with valid statistics.
Statistic that have been exported
This can be used to export Statistical information to save on OS file system. By changing name of file can be hold the multiple version of Statistics. Same import feature used to load valid statistical information.

pg_dbms_stats provide its own function, it does not change original PostgreSQL statistics(pg_catalog.pg_class and pg_catalog.pg_statistic).

pg_dbms_stats user interface

Each feature of the pg_dbms_stats is implemented in the SQL statement to updating the PostgreSQL statistics from their own statistics table contents described in the previous section. However data inconsistencies occur due to direct execution of SQL statement to delete operation by user. Avoid such situation using these operation through interface provided by pg_dbms_stats is highly recommended.

Disabling ANALYSZE used by autovacuum

pg_dbms_stats doesn't disable or prohibit ANALYZE. Though ANALYZE cause pg_statistic change, pg_dbms_stats can hide those. If you think to repeat the ineffective ANALYZE as a waste of resources, please disable the ANALYZE fork by autovacuum. You can disable ANALYZE by setting thresholds as shown below. Even if you don't disable it, doesn't cause any trouble

For Database cluster

# postgresql.conf
autovacuum_analyze_threshold = 100000
autovacuum_analyze_scale_factor = 100

For Individual Table

test=# ALTER TABLE tbl SET (
  autovacuum_analyze_threshold = 100000,
  autovacuum_analyze_scale_factor = 100);

Statistic Lock

While generation execution plan, the query optimizer calculates the cost on the basis of the state of the actual data or statistics below. pg_dbms_stats can lock all these information. During lock it continued referencing changed file size when it actually changed.

Operating Requirement.

PostgreSQL Virsion
PostgreSQL 9.1, 9.2
OS
RHEL 6.1

Related Item

psql, vacuumdb COPY