環境:既有運作的一個 PostgreSQL instance,其資料夾位在/home/aaa/pgsql/9.5/data/,沒有 tablespace,資料庫的 superuser 為 aaa。
WAL archived 目錄亦放置於同一台主機上/home/aaa/pgsql/wals 裡面。
以下備份操作都在同一主機上進行備份與還原。
1. 在 postgresql.conf 編輯:
[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/postgresql.conf
wal_level = archive # hot_standby、logical 也可以
archive_mode = on
archive_command = 'cp %p /home/aaa/pgsql/wals/%f'
max_wal_senders = 1 # 允許用 pg_basebackup 連入資料庫
2. 在 pg_hba.conf 編輯,允許待會全備份的連線
[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/pg_hba.conf
host replication aaa 127.0.0.1/32 trust
3. 重新啟動資料庫
[aaa@localhost ~]$ pg_ctl -D /home/aaa/pgsql/9.5/data/ restart
4. 使用 pg_basebackup 指令進行資料庫全備份,弄成壓縮檔
[aaa@localhost ~]$ mkdir /home/aaa/pgsql/total/ [aaa@localhost ~]$ pg_basebackup -p 5432 -Ft -z \ > -D /home/aaa/pgsql/total/ NOTICE: pg_stop_backup complete, all required WAL segments have been archived
5. 資料庫繼續進行服務:插入一些資料進去,停頓三十秒,並紀錄時間,如此重複三次。
[aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \ > dummy1 AS SELECT * FROM pg_class;COMMIT; \ > SELECT pg_sleep(30);SELECT now();" now ------------------------------- 2016-05-10 16:10:01.719448+08 (1 row) [aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \ > dummy2 AS SELECT * FROM pg_class;COMMIT; \ > SELECT pg_sleep(30);SELECT now();" now ------------------------------- 2016-05-10 16:11:57.732774+08 (1 row) [aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \ > dummy3 AS SELECT * FROM pg_class;COMMIT; \ > SELECT pg_sleep(30);SELECT now();" now ------------------------------- 2016-05-10 16:14:14.526662+08 (1 row)
6. 模擬資料庫發生異常。使用 pg_ctl 的 immediate mode 會造成關檔不完全,用此進行模擬:
[aaa@localhost ~]$ psql -p 5432 postgres\ > -c "BEGIN;SELECT \ > pg_sleep(3);CREATE TABLE \ > dummy4 AS SELECT * FROM \ > pg_class; SELECT \ > pg_sleep(30);SELECT now(); \ > Commit;" WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost |
[aaa@localhost ~]$ # 左方過個四~五秒再執行強至關閉指令 [aaa@localhost ~]$ [aaa@localhost ~]$ pg_ctl -D \ > /home/aaa/pgsql/9.5/data/ \ > -m immediate stop waiting for server to shut down.... done server stopped |
目前資料庫的 instance 已經有部份損毀。利用既有的內容完成回復作業
7. 先將突然中止的 /home/aaa/pgsql/9.5/data/ 重新命名,留作備用;接著解壓縮全備份
[aaa@localhost ~]$ mv /home/aaa/pgsql/9.5/data/ \ > /home/aaa/pgsql/9.5/data.crashed [aaa@localhost ~]$ mkdir /home/aaa/pgsql/9.5/data/ [aaa@localhost ~]$ chmod 700 /home/aaa/pgsql/9.5/data/ [aaa@localhost ~]$ tar zxf /home/aaa/pgsql/total/base.tar.gz \ > -C /home/aaa/pgsql/9.5/data/
8. 移除目前 /home/aaa/pgsql/9.5/data/ 中被一併打包的 pg_xlog/ 內容以及運作標記的 lock:
[aaa@localhost ~]$ rm /home/aaa/pgsql/9.5/data/postmaster.* [aaa@localhost ~]$ rm /home/aaa/pgsql/9.5/data/pg_xlog/*
9. 將/home/aaa/pgsql/9.5/data.crashed/ 裡面的 pg_xlog/ 內容放到等著回復的/home/aaa/pgsql/9.5/data/pg_xlog/裡面
[aaa@localhost ~]$ cp -rp /home/aaa/pgsql/9.5/data.crashed/pg_xlog/* \ /home/aaa/pgsql/9.5/data/pg_xlog/
10. 設置 recovery.conf,指定回復到先前 dummy2 表格建立後,dummy3 建立之前
[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/recovery.conf
restore_command = 'cp /home/cchsu/pgsql/wala95/%f %p'
recovery_target_time = '2016-05-10 16:11:59'
11. 啟動資料庫,並同時觀察狀況
[aaa@localhost ~]$ pg_ctl -D \ > /home/cchsu/pgsql/95stdby2/ \ > start server starting bash-4.3$ LOG: redirecting log output to logging collector process HINT: Future log output will appear in directory "pg_log". |
[aaa@localhost ~]$ tail -f \ > /home/aaa/pgsql/9.5/data/ \ > pg_log/postgresql-2016-05- \ 10_170312.log LOG: starting point-in-time recovery to 2016-05-10 16:11:59+08 LOG: restored log file "00000004.history" from archive LOG: restored log file "000000040000000000000028" from archive LOG: redo starts at 0/28000060 LOG: consistent recovery state reached at 0/28000130 LOG: database system is ready to accept read only connections LOG: restored log file "000000040000000000000029" from archive LOG: recovery stopping before commit of transaction 1449, time 2016-05-10 16:12:31.941565+08 LOG: recovery has paused HINT: Execute pg_xlog_replay_resume() to continue. |
12. 進入資料庫看
postgres=# select tablename from pg_tables postgres-# where schemaname = 'public' postgres-# order by tablename; tablename ------------------- dummy1 dummy2 (2 rows) postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
由此確定資料庫僅回復到所指定的位置了
13. 確定回復目前狀態,正常啟用
postgres=# SELECT pg_xlog_replay_resume(); pg_xlog_replay_resume ----------------------- (1 row) postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row)
參考資料:
PostgreSQL Point-in-time Recovery (Incremental Backup)
Point In Time Recovery From Backup using PostgreSQL Continuous Archving - Zetetic
PostgreSQL Point-in-time Recovery (using WAL) | Madilator
Point in time recovery: PITR using pg_basebackup with PostgreSQL 9.2
The Scale-Out Blog: Simple HA with PostgreSQL Point-In-Time Recovery
The Internals of PostgreSQL : Chapter 9 Write Ahead Log (WAL)
select * from depesz; » Blog Archive » Write Ahead Log + Understanding postgresql.conf: checkpoint_segments, checkpoint_timeout, checkpoint_warning
Korry Douglas-PostgreSQL - The Comprehensive Guide