DiffStats and ExplainFull (DSEF) ================================= *Detailed SQL reports for third party help & support* . DiffStats and ExplainFull can generate detailed reports which are useful for troubleshooting performance of a SQL statement, and especially for working with third parties who are helping in the process. It reduces the amount of back-and-forth requests for information by capturing a great deal of commonly useful data about the performance of a SQL statement. The report is textual and can be reviewed before passing it along to a third party. For all tables involved in the query, DSEF will look at every column in the table, and for any column which has them, the report will include the beginning and end of histograms and MCVs. This typically includes the lowest and highest literal values in those columns. Query text and explain plan output can also contain literal data values. These are all used by the query planner, and they are needed - for example - in order to determine if out-of-range statistics [1] are impacting planning. Nevertheless, in some cases it may be necessary to manually redact literals before passing along the report to a third party. The extension consists of a number of functions which are installed into the database. These functions fall into two broad categories: 1. A function that is a wrapper around "EXPLAIN ANALYZE" - besides ensuring that all diagnostics options are used, it also dumps additional information like server version and full planner statistics for all functions and tables referenced by the SQL. 2. A set of functions to capture and report all possible statistics tracked by the database during a test SQL statement execution [1]: for more information about out-of-range statistics, see https://pganalyze.com/blog/5mins-postgres-out-of-range-planner-statistics Installation ------------- The quickest way to install DSEF is to create its functions with this command: curl -O https://raw.githubusercontent.com/ardentperf/dsef/main/sql/dsef.sql psql '97string0' $$); To get even more information (including wait times on Aurora), we wrap our explain analyze inside `ds_start()` and `ds_report()`. This will take a snapshot of system counters and produce a report about which counters changed during query execution. Note that as of 2024, most open source PostgreSQL counters operate at the database level rather than the session level. These can be very useful on a test system that's otherwise idle, besides the debug session running this command. On a busier system, it can still be helpful to have an idea what the overall system profile is during execution. Aurora wait events are captured and reported at both the system and the session level. There is sometimes a short delay in PostgreSQL after query completion and before stat updates become visible, so it helps to include a small `pg_sleep()` before getting the report if you are copy/pasting everything at once rather than keying by hand. NOTE: the `ds_start()` function will attempt to automatically install and use the `pg_proctab` extension, if it is available and the user has privileges. If this causes an error, then the error is ignored and `pg_proctab` statistics are not included in the report. select ds_start(); select * from explain_analyze_full('select * from customers c,customers2 c2 where c.id=c2.id limit 1000000'); select pg_sleep(0.05); select * from ds_report(); More than 10 years ago, Tom Kyte published a short snippet of code called RunStats. Simple yet brilliant - it allowed a SQL statement to be executed twice, compared stats between the two executions, and printed a report. We can do the same thing on PostgreSQL by inserting a call to `ds_capture()` between our two SQL statements. select ds_start(); select * from explain_analyze_full($$ select * from customers c,customers2 c2 where c.id=c2.id limit 1000 $$); select * from ds_capture(); set enable_indexscan=off; select * from explain_analyze_full($$ select * from customers c,customers2 c2 where c.id=c2.id limit 1000 $$); select pg_sleep(0.05); select * from ds_report_diff(); In PostgreSQL, we need to be careful about running DML inside of EXPLAIN ANALYZE because the SQL is actually executed to gather stats. This means that we typically want to wrap the command inside a transaction, and then ROLLBACK. select ds_start(); BEGIN; select * from explain_analyze_full($$ update my_test set t=99 where i=100000 $$); ROLLBACK; select pg_sleep(0.05); select * from ds_report();