pg_dbms_stats -- Statistical information management and indirectly influence the execution plan.
PostgreSQL aggregates values sampled from the table or index by the ANALYZE command and maintains statistical information. Query optimizer calculate the cost of a query using this statistical information and choose lowest cost execution plans.
pg_dbms_stats is help to Query optimizer generate best execution plan. It provide more relevant statistic information to planner, by interspersed with a planner process, to replace the dummy statistics previously created statistics planner's reference. This is useful when you want to reduce the risk when execution plan suddenly changed during operation and the performance of the system is reduced.
pg_dbms_stats has following seven methods to manipulate statistical information used by the Planner. All functions are operated as SQL functions other than Export. For more information on each function please refer object list.
Statistical information to be included in the backup can be specified for the database(currently connected), schema, table or particular column of table. For example if you want to save statistic for all table and their columns in schema, specify schema name in function. For large backup specify database or schema.
Current backup information can be found in the dbms_stats.backup_history table. See dbms_stat.backup_history table for more information.
Restore the Statistical information of object selected at backup, without affecting other object's statistic. Backupped statistical information can be restore by two way, as follow :
It can delete backup before any time specified.
Specify database (currently connected), schemas, table or column to get lock on its statistical information.
Release lock on pg_class and pg_statistic specify database (currently connected), schemas, table or column to get lock on its statistical information. Also unlocking can be possible by specifying different lock unit.
It uses COPY command to export, it need to specify directory where PostgreSQL user have permission to export file. Depending on application you can export statistical information two way as follow:
Sample file - has been installed in the extension sub-directory, display this directory using "pg_config --docdir".
Statistical information can be imported for database(currently connected), schema, table or particular column of table Please place the file you want to import in the directory readable by PostgreSQL running user.
pg_dbms_stats installation is similar to the standard extension(EXTENSION).
Build pg_dbms_stats from source code use make command then to install, log on user who (most case super-user) has appropriate make install permission. Note that correct pg_config should set in environment variable PATH.
$ cd pg_dbms_stats $ make $ su # make install
To register with database connect the to database using superuser and Run CREATE EXTENSION pg_dbms_stats command.
Note that dbnameis name of targeted database.
$ psql -d dbname -c "CREATE EXTENSION pg_dbms_stats"
De-registration is done by DROP EXTENSION, dbms_stats schema still exist, drop if necessary.
If you want to enable pg_dbms_stat for particular session please load library as show below. If you want to enable to all session set shared_preload_libraries GUC to 'pg_dbms_stats' in postgresql.conf and restart server.
postgres=# LOAD 'pg_dbms_stats'; LOAD postgres=#
Note: When you see error similar as below, please follow Register with database. Do not forgot to create pg_dbms_stats extension on targeted database as mention above.
test=# SELECT * FROM test; ERROR: schema "dbms_stats" does not exist LINE 1: SELECT relpages, reltuples, curpages FROM dbms_stats.relati... ^ QUERY: SELECT relpages, reltuples, curpages FROM dbms_stats.relation_stats_locked WHERE relid = $1 test=#
If you want to disable after having loaded the pg_dbms_stats, please set pg_dbms_stats.use_locked_stats to off, as in the following example
test=# SET pg_dbms_stats.use_locked_stats TO off; SET test=# SELECT * FROM test; -- Create a plan by normal standards ... test=# SET pg_dbms_stats.use_locked_stats TO on; SET test=# SELECT * FROM test; -- Create a plan with dummy statistics ...
If you want to disable to all session, set pg_dbms_stats.use_locked_stats to off and restart server. In PostgreSQL 9.1 set pg_dbms_stats.use_locked_stats in postgresql.conf please refer custom_variable_classes.
# postgresql.conf shared_preload_libraries = 'pg_dbms_stats' custom_variable_classes = 'pg_dbms_stats' # Do not use for PG9.2 pg_dbms_stats.use_locked_stats = offcustom_varible_casses are deprecated in PostgreSQL 9.2, you can use pg_dbms_stats.use_locked_stats same as standard GUC parameters.
To uninstall pg_dbms_stats, follow steps as shown. dbname is targeted dabase name.
$ cd pg_dbms_stats $ su # make uninstall
$ psql -d dbname -c "DROP EXTENSION pg_dbms_stats"
$ psql -d dbname -c "DROP SCHEMA dbms_stats"
Operation of pg_dbms_stats is divided in three category as "Backup", "Lock" and "Export". If you cannot determine which practices are appropriate, we recommend start with backup operation.
In daily service operation, if problem arise backup_xxx() and restore_xxx() functions to backup and restore the statistical information. As long as there is no problem please get backup by specifying database.
To restore there are two method, one in which specify the backup ID that represent the backup and other by specifying timestamp which indicated statistical information at particular point in time. If you want to restore using backup ID, it should match with valid back, there for use same backup ID created at backup generation. When you restore with timestamp restore using backup data using date and time specified for each object. Please restore the database unit by specifying the date and time you want to restore, if there are any issue then try to restore using backup ID.
-- Take a daily backup, and then run ANALYZE. test=# SELECT dbms_stats.backup_database_stats('comment'); backup_database_stats ----------------------- 1 (1 row) test=# ANALYZE; ANALYZE test=#
-- Restore a day before backup and lock the statistics. test=# SELECT dbms_stats.restore_database_stats(now() - '1 day');
Note : If you specify the time stamp to restore, the newest statistics before the specified time stamp are restores for each column or table that included in the target object.
The following demonstrates restoring the backup data using timestamp. The time column provide the timestamp information from backup statistics for each table.
test=# SELECT b.id, b.time, r.relname FROM dbms_stats.relation_stats_backup r JOIN dbms_stats.backup_history b ON (r.id=b.id) ORDER BY id; id | time | relname ----+------------------------+----------------- 4 | 2012-01-01 00:00:00+09 | public.droptest 5 | 2012-01-02 12:00:00+09 | public.test (5 rows) test=# SELECT dbms_stats.restore_database_stats('2012-01-03 00:00:00+09'); restore_database_stats ------------------------ test droptest (2 rows) test=#
If you want to acquire lock on object when server start, use lock_xxx() SQL function.
test=# SELECT dbms_stats.lock_database_stats(); lock_database_stats --------------------- droptest test (2 rows) test=#
If you want to reproduced execution plan of one database to the another, on this targeted database import statistical information using import_xxx() function from exported file, use export_xxx_stat-.sql_sample file to export statistical information as backup using COPY command. Sample file - has been installed in the extension sub-directory, display this directory using "pg_config --docdir".
$ cd pg_dbms_stats $ psql -d test -f export_effective_stats-9.1.sql BEGIN COMMIT $ psql -d test2 -c "SELECT dbms_stats.import_database_stats('$PWD/export_stats.dmp')" import_database_stats ----------------------- (1 row) $
Note:
Export create export_stats.dmp file in current working directory. If file with the same name already exist, it will overwrite this file.
If you want to avoid this overwrite please rename existing file or move to other directory.
Please note that, while importing absolute path of export_stats.dmp required else it will raise error as shown follow:
test=# select dbms_stats.import_database_stats('export_stats.dmp'); ERROR: relative path not allowed for dbms_stats_export to file test=#
If you export statistical inforamtion in binary formate and try to import on destination database which is uses other PostgreSQL version, then it will not work, please refer more on COPY command.
While using pg_dbms_stats you should consider some constraints as follow:
pg_dbms_stats govern to hold and provide statistical information to generate execution plan, instead of statistic generated by ANALYZE. To use statistical information can be done by either of following ways.
pg_dbms_stats manages its own statistics in some table/file groups.
pg_dbms_stats provide its own function, it does not change original PostgreSQL statistics(pg_catalog.pg_class and pg_catalog.pg_statistic).
Each feature of the pg_dbms_stats is implemented in the SQL statement to updating the PostgreSQL statistics from their own statistics table contents described in the previous section. However data inconsistencies occur due to direct execution of SQL statement to delete operation by user. Avoid such situation using these operation through interface provided by pg_dbms_stats is highly recommended.
pg_dbms_stats doesn't disable or prohibit ANALYZE. Though ANALYZE cause pg_statistic change, pg_dbms_stats can hide those. If you think to repeat the ineffective ANALYZE as a waste of resources, please disable the ANALYZE fork by autovacuum. You can disable ANALYZE by setting thresholds as shown below. Even if you don't disable it, doesn't cause any trouble
For Database cluster
# postgresql.conf autovacuum_analyze_threshold = 100000 autovacuum_analyze_scale_factor = 100
For Individual Table
test=# ALTER TABLE tbl SET ( autovacuum_analyze_threshold = 100000, autovacuum_analyze_scale_factor = 100);
While generation execution plan, the query optimizer calculates the cost on the basis of the state of the actual data or statistics below. pg_dbms_stats can lock all these information. During lock it continued referencing changed file size when it actually changed.