an Open-Source SQL miner on PostgreSQL WAL log
XLogMiner is used for parsing the SQL statement out from the PostgreSQL WAL (write ahead logs) logs, and it can also generate the corresponding "undo SQL".
You need configure the WAL level to "logical", and setup the table to "full" mode. For example, the blow statement is setting the table "t1" to "full" mode.
alter table t1 replica identity FULL;
The project was developed based on PostgreSQL 9.4.x, with only basic verification on PostgreSQL 9.6. Please let us know if you find any issues during your use.
- Copy the xlogminer directory into the "../contrib/" directory of the PostgreSQL source code location
- Enter "contrib/xlogminer" directory
- Execute commands
make && make install
create extension xlogminer;
-- Add WAL file or directory
select xlogminer_xlogfile_add('/opt/test/wal');
-- Note: the parameter can be file name or directory name.
-- remove WAL file or directory
select xlogminer_xlogfile_remove('/opt/test/wal');
-- Note: the parameter can be file name or directory name.
-- List WAL files
select xlogminer_xlogfile_list();
select xlogminer_start(’START_TIMSTAMP’,’STOP_TIMESTAMP’,’START_XID’,’STOP_XID’);
-- Run below sql to parse all the WAL logs
select xlogminer_start('null','null',0,0);
- START_TIMESTAMP:Specify the start time condition of the records in the return results, the xlogminer will start parsing from this time value. If this value is NULL, the earlist records will be displayed from the WAL lists. If this time value is not included in the xlog lists, a.k.a all the records are ealier then this value, the NULL will be returned.
- STOP_TIMESTAMP:Specify the ending time condition of the records in the results, the xlogminer will stop parsing when the result is later than this time. If this parameter is NULL, then all the records after START_TIMESTAMP will parsed and displyed in the WAL logs.
- START_XID:Similiar with START_TIMESTAMP, specify the starting XID value
- STOP_XID:Similiar with STOP_TIMESTAMP,specify the ending XID value
select * from xlogminer_contents;
This function is used to free the memory and stop the WAL parsing. No parameters available.
select xlogminer_stop();
create extension xlogminer;
select xlogminer_build_dictionary('/opt/proc/store_dictionary');
-- Note: the parameter can be file name or directory name.
create extension xlogminer;
select xlogminer_load_dictionary('/opt/test/store_dictionary');
-- Note: the parameter can be file name or directory name.
💡 the parameter can be file name or directory name.
-- Add WAL files
select xlogminer_xlogfile_add('/opt/test/wal');
-- Note: the parameter can be file name or directory name.
-- Remove WAL files
select xlogminer_xlogfile_remove('/opt/test/wal');
-- Note:the parameter can be file name or directory name.
-- list WAL files
select xlogminer_xlogfile_list();
-- Note:the parameter can be file name or directory name.
select xlogminer_start(’START_TIMSTAMP’,’STOP_TIMESTAMP’,’START_XID’,’STOP_XID’)
- START_TIMESTAMP:Specify the start time condition of the records in the return results, the xlogminer will start parsing from this time value. If this value is NULL, the earlist records will be displayed from the WAL lists. If this time value is not included in the xlog lists, a.k.a all the records are ealier then this value, the NULL will be returned.
- STOP_TIMESTAMP:Specify the ending time condition of the records in the results, the xlogminer will stop parsing when the result is later than this time. If this parameter is NULL, then all the records after START_TIMESTAMP will parsed and displyed in the WAL logs.
- START_XID:Similiar with START_TIMESTAMP, specify the starting XID value
- STOP_XID:Similiar with STOP_TIMESTAMP,specify the ending XID value
select * from xlogminer_contents;
This function is used to free the memory and stop the WAL parsing. No parameters available.
select xlogminer_stop();
create xxx as select * from xlogminer_contents;
- Only DML statements will be parsed in this version, DDL statement not supported.
- The DML statemes would NOT be parsed out when the below DDL related operations were executed: Deleting/Truncating table, table space modification and column type modification etcs.
- The parsing result is depending on the latest database dictionary. For example, after user1 created table t1, the table owner was modified to user2, then all the parsing results related to table t1 will be marked with user2.
- If WAL logs are missed in a time stage, the SQL statements executed in that time stage would NOT be parsed out.
- The "ctid" attribute is the value of the change "at that time". If there are "ctid" changes due to vacuum or other operations, this value will be inaccurate. We need use this value to determine the corresponding undo tuples when the rows of data are duplicate, it does not mean that you can execute such undo statements directly.
- If the table is not set to full mode, then the "update" and "delete" statement will not be resolved. (Of course, this affects the use of this software, the next version will make improvements to this problem)
- If the database log level is not set to logical, there will be unpredictable lost of the SQL statements
- If the DDL statement "drop" was executed, all related column value will be decoded as "encode('AD976BC56F',hex)" before this DDL execution.
Please contact us with [email protected] if you have any comments or find any bugs, thanks!