渡鴉之丘: Point-in-Time Recovery 操作筆記

環境：既有運作的一個 PostgreSQL instance，其資料夾位在/home/aaa/pgsql/9.5/data/，沒有 tablespace，資料庫的 superuser 為 aaa。

WAL archived 目錄亦放置於同一台主機上/home/aaa/pgsql/wals 裡面。

以下備份操作都在同一主機上進行備份與還原。

1. 在 postgresql.conf 編輯：

[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/postgresql.conf

wal_level = archive    # hot_standby、logical 也可以
archive_mode = on
archive_command = 'cp %p /home/aaa/pgsql/wals/%f'
max_wal_senders = 1    # 允許用 pg_basebackup 連入資料庫

2. 在 pg_hba.conf 編輯，允許待會全備份的連線

[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/pg_hba.conf

host    replication     aaa             127.0.0.1/32            trust

3. 重新啟動資料庫

[aaa@localhost ~]$ pg_ctl -D /home/aaa/pgsql/9.5/data/ restart

4. 使用 pg_basebackup 指令進行資料庫全備份，弄成壓縮檔

[aaa@localhost ~]$ mkdir /home/aaa/pgsql/total/
[aaa@localhost ~]$ pg_basebackup -p 5432 -Ft -z \
> -D /home/aaa/pgsql/total/
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

5. 資料庫繼續進行服務：插入一些資料進去，停頓三十秒，並紀錄時間，如此重複三次。

[aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \
> dummy1 AS SELECT * FROM pg_class;COMMIT; \
> SELECT pg_sleep(30);SELECT now();"
             now            
-------------------------------
2016-05-10 16:10:01.719448+08
(1 row)

[aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \
> dummy2 AS SELECT * FROM pg_class;COMMIT; \
> SELECT pg_sleep(30);SELECT now();"
             now            
-------------------------------
2016-05-10 16:11:57.732774+08
(1 row)

[aaa@localhost ~]$ psql -p 5432 -d postgres -c "BEGIN;CREATE TABLE \
> dummy3 AS SELECT * FROM pg_class;COMMIT; \
> SELECT pg_sleep(30);SELECT now();"
             now            
-------------------------------
2016-05-10 16:14:14.526662+08
(1 row)

6. 模擬資料庫發生異常。使用 pg_ctl 的 immediate mode 會造成關檔不完全，用此進行模擬：

[aaa@localhost ~]$ psql -p 5432 postgres\ > -c "BEGIN;SELECT \
> pg_sleep(3);CREATE TABLE \
> dummy4 AS SELECT * FROM \
> pg_class; SELECT \
> pg_sleep(30);SELECT now(); \
> Commit;"
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction
 and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
connection to server was lost

[aaa@localhost ~]$ # 左方過個四～五秒再執行強至關閉指令
[aaa@localhost ~]$ 
[aaa@localhost ~]$ pg_ctl -D \
> /home/aaa/pgsql/9.5/data/ \
> -m immediate stop
waiting for server to shut down.... done
server stopped

目前資料庫的 instance 已經有部份損毀。利用既有的內容完成回復作業

7. 先將突然中止的 /home/aaa/pgsql/9.5/data/ 重新命名，留作備用；接著解壓縮全備份

[aaa@localhost ~]$ mv /home/aaa/pgsql/9.5/data/ \
> /home/aaa/pgsql/9.5/data.crashed
[aaa@localhost ~]$ mkdir /home/aaa/pgsql/9.5/data/
[aaa@localhost ~]$ chmod 700 /home/aaa/pgsql/9.5/data/
[aaa@localhost ~]$ tar zxf /home/aaa/pgsql/total/base.tar.gz \
> -C /home/aaa/pgsql/9.5/data/

8. 移除目前 /home/aaa/pgsql/9.5/data/ 中被一併打包的 pg_xlog/ 內容以及運作標記的 lock：

[aaa@localhost ~]$ rm /home/aaa/pgsql/9.5/data/postmaster.*
[aaa@localhost ~]$ rm /home/aaa/pgsql/9.5/data/pg_xlog/*

9. 將/home/aaa/pgsql/9.5/data.crashed/ 裡面的 pg_xlog/ 內容放到等著回復的/home/aaa/pgsql/9.5/data/pg_xlog/裡面

[aaa@localhost ~]$ cp -rp /home/aaa/pgsql/9.5/data.crashed/pg_xlog/* \
/home/aaa/pgsql/9.5/data/pg_xlog/

10. 設置 recovery.conf，指定回復到先前 dummy2 表格建立後，dummy3 建立之前

[aaa@localhost ~]$ vim /home/aaa/pgsql/9.5/data/recovery.conf

restore_command = 'cp /home/cchsu/pgsql/wala95/%f %p'
recovery_target_time = '2016-05-10 16:11:59'

11. 啟動資料庫，並同時觀察狀況

[aaa@localhost ~]$ pg_ctl -D \
> /home/cchsu/pgsql/95stdby2/ \
> start
server starting
bash-4.3$ LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "pg_log".

[aaa@localhost ~]$ tail -f \
> /home/aaa/pgsql/9.5/data/ \ > pg_log/postgresql-2016-05- \
10_170312.log
LOG:  starting point-in-time recovery to 2016-05-10 16:11:59+08
LOG:  restored log file "00000004.history" from archive
LOG:  restored log file "000000040000000000000028" from archive
LOG:  redo starts at 0/28000060
LOG:  consistent recovery state reached at 0/28000130
LOG:  database system is ready to accept read only connections
LOG:  restored log file "000000040000000000000029" from archive
LOG:  recovery stopping before commit of transaction 1449, time 2016-05-10 16:12:31.941565+08
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

12. 進入資料庫看

postgres=# select tablename from pg_tables
postgres-#  where schemaname = 'public'
postgres-#  order by tablename;
     tablename   
-------------------
 dummy1
 dummy2
(2 rows)
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

由此確定資料庫僅回復到所指定的位置了

13. 確定回復目前狀態，正常啟用

postgres=# SELECT pg_xlog_replay_resume();
 pg_xlog_replay_resume
-----------------------
(1 row)

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

參考資料：

How to perform PostgreSQL point-in-time recovery

PostgreSQL Point-in-time Recovery (Incremental Backup)

PostgreSQL Point-in-time Recovery (Incremental Backup)

Point In Time Recovery From Backup using PostgreSQL Continuous Archving - Zetetic

PostgreSQL Point-in-time Recovery (using WAL) | Madilator

Point in time recovery: PITR using pg_basebackup with PostgreSQL 9.2

The Scale-Out Blog: Simple HA with PostgreSQL Point-In-Time Recovery

The Internals of PostgreSQL : Chapter 9 Write Ahead Log (WAL)

select * from depesz; » Blog Archive » Write Ahead Log + Understanding postgresql.conf: checkpoint_segments, checkpoint_timeout, checkpoint_warning

Korry Douglas-PostgreSQL - The Comprehensive Guide

渡鴉之丘

MathJax

MathJax-2

MathJax-3

Google Code Prettify

置頂入手筆記

EnterproseDB Quickstart — 快速入門筆記

2016年5月10日星期二

Point-in-Time Recovery 操作筆記

沒有留言:

張貼留言