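I've recently been testing Barman backups against an EDB database. When something looked off, I thought I had dug up another bug.
So I put in some effort and used the Python debugger to trace how one of Barman's steps works.
After carefully cross-checking everything, it turns out the real culprit was my own careless eyes~
This note records that little journey.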
While setting up Barman, you create a non-superuser backup account.
The exact setup is described in the manual, Barman Manual: Preliminary steps - PostgreSQL connection.
Here are a few ways to check its privileges:
[enterprisedb@edb16p ~]$ psql -U bkpuser
Null display is "(NULL)".
Timing is on.
psql (16.2.0)
Type "help" for help.
[[local]] edb=> \du bkpuser
       List of roles
 Role name |   Attributes
-----------+-----------------
 bkpuser   | Replication    +
           | Profile default
[[local]] edb=>
[[local]] edb=> \drg
                     List of role grants
 Role name |      Member of       |   Options    |   Grantor
-----------+----------------------+--------------+--------------
 bkpuser   | pg_checkpoint        | INHERIT, SET | enterprisedb
 bkpuser   | pg_monitor           | INHERIT, SET | enterprisedb
 bkpuser   | pg_read_all_settings | INHERIT, SET | enterprisedb
 bkpuser   | pg_read_all_stats    | INHERIT, SET | enterprisedb
 dbzuser   | pg_read_all_data     | INHERIT, SET | enterprisedb
 efm       | pg_read_all_settings | INHERIT, SET | enterprisedb
 efm       | pg_read_all_stats    | INHERIT, SET | enterprisedb
(7 rows)
[[local]] edb=> select proname,proacl from pg_proc where 'bkpuser=X' = any(proacl);
WARNING: defaulting grantor to user ID 10
LINE 1: select proname,proacl from pg_proc where 'bkpuser=X' = any(p...
                                                 ^
         proname         |                        proacl
-------------------------+------------------------------------------------------
 pg_backup_start         | {enterprisedb=X/enterprisedb,bkpuser=X/enterprisedb}
 pg_backup_stop          | {enterprisedb=X/enterprisedb,bkpuser=X/enterprisedb}
 pg_switch_wal           | {enterprisedb=X/enterprisedb,bkpuser=X/enterprisedb}
 pg_create_restore_point | {enterprisedb=X/enterprisedb,bkpuser=X/enterprisedb}
(4 rows)
Time: 2.915 ms
[[local]] edb=>
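The proacl column above is an array of aclitem values written as `grantee=privileges/grantor`, where `X` means EXECUTE and an empty grantee means PUBLIC. The WARNING shows up because the bare literal `'bkpuser=X'` is cast to aclitem without a `/grantor` part, so a default grantor is filled in. As a minimal sketch, the same check can be done in Python on the text psql prints; `parse_acl_array` and `can_execute` are hypothetical helpers (not part of Barman or psycopg2) that assume unquoted role names and ignore role inheritance and NULL (default) ACLs. Like the SQL it mirrors, it only describes the pg_proc of the database you are connected to.

```python
from typing import List, NamedTuple


class AclItem(NamedTuple):
    grantee: str     # "" means PUBLIC
    privileges: str  # e.g. "X" = EXECUTE; a "*" after a letter = WITH GRANT OPTION
    grantor: str


def parse_acl_array(text: str) -> List[AclItem]:
    """Split psql's array output, e.g. '{a=X/b,c=X/b}', into AclItem tuples.

    Simplified: assumes role names never need double quotes.
    """
    items = []
    for raw in text.strip().strip("{}").split(","):
        if not raw:
            continue  # "{}" (empty ACL) yields a single empty string
        grantee, _, rest = raw.partition("=")
        privileges, _, grantor = rest.partition("/")
        items.append(AclItem(grantee, privileges, grantor))
    return items


def can_execute(acl: List[AclItem], role: str) -> bool:
    """True if `role` itself or PUBLIC holds EXECUTE ('X') in this ACL."""
    return any(item.grantee in (role, "") and "X" in item.privileges for item in acl)


if __name__ == "__main__":
    acl = parse_acl_array("{enterprisedb=X/enterprisedb,bkpuser=X/enterprisedb}")
    print(can_execute(acl, "bkpuser"))  # True
    print(can_execute(acl, "efm"))      # False
```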
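With the config file in place, the next step is to run the check.
After sorting out the various things I had forgotten here and there (pg_hba.conf, credentials, parameters, and Barman's own config file), one check still would not pass.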
[barman@edb-pem-server ~]$ barman check bkp_testenv
Server bkp_testenv:
	PostgreSQL: OK
	no access to backup functions: FAILED (privileges for PostgreSQL backup functions are required (see documentation))
	PostgreSQL streaming: OK
	wal_level: OK
	PostgreSQL server is standby: OK
	Primary server is not a standby: OK
	Primary and standby have same system ID: OK
	has monitoring privileges (WAL streaming): OK
	PostgreSQL streaming (WAL streaming): OK
	wal_level (WAL streaming): OK
	systemid coherence (WAL streaming): OK (no system Id stored on disk)
	replication slot (WAL streaming): OK
	directories: OK
	retention policy settings: OK
	backup maximum age: OK (no last_backup_maximum_age provided)
	backup minimum size: OK (0 B)
	wal maximum age: OK (no last_wal_maximum_age provided)
	wal size: OK (0 B)
	compression settings: OK
	failed backups: OK (there are 0 failed backups)
	minimum redundancy requirements: OK (have 0 backups, expected at least 0)
	pg_basebackup: OK
	pg_basebackup compatible: OK
	pg_basebackup supports tablespaces mapping: OK
	systemid coherence: OK (no system Id stored on disk)
	pg_receivexlog: OK
	pg_receivexlog compatible: OK
	receive-wal running: OK
	archive_mode: OK
	archive_command: OK
	continuous archiving: OK
	archiver errors: OK
[barman@edb-pem-server ~]$
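With a check list this long, it helps to pull out only the failing lines. A small sketch, assuming the `name: OK` / `name: FAILED (detail)` line layout seen in the output above; `failed_checks` is a hypothetical helper, not something Barman ships.

```python
import re
from typing import List, Tuple

# Each check line looks like "<name>: OK" or "<name>: FAILED (<detail>)".
# The detail is matched greedily so nested parentheses such as
# "(see documentation)" stay inside it.
CHECK_LINE = re.compile(
    r"^\s*(?P<name>.+?): (?P<status>OK|FAILED)(?: \((?P<detail>.*)\))?\s*$"
)


def failed_checks(output: str) -> List[Tuple[str, str]]:
    """Return (check name, detail) for every FAILED line in `barman check` output."""
    failures = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line)
        if match and match.group("status") == "FAILED":
            failures.append((match.group("name"), match.group("detail") or ""))
    return failures
```

Header lines such as `Server bkp_testenv:` do not match the pattern and are skipped.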
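Of course, making the Barman user a superuser would make this error go away, but that is not how it will be used in practice.
I had to find the real cause.
After checking over and over, I even searched the Barman source code for keywords and pinpointed a few locations (I'm testing Barman 3.10.0, so that is the version I used as the reference).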
Searching directly for the error message "no access to backup functions" leads to barman/server.py#L766, where a check reads the key "has_backup_privileges" from a dict.
Running grep -rl over the source for files containing the key "has_backup_privileges" shows that the value is set at barman/postgres.py#L976, and the method that computes it is PostgreSQLConnection.has_backup_privileges(self) at barman/postgres.py#L563 in the same file.
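Looking more closely, it builds an SQL query that checks the current user's role memberships and EXECUTE privileges on the backup functions.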
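PostgreSQL 15 renamed the low-level backup functions (pg_start_backup/pg_stop_backup became pg_backup_start/pg_backup_stop), and PostgreSQL 10 renamed pg_switch_xlog to pg_switch_wal, so a check like this has to choose function signatures by server version. The sketch below is a simplified, hypothetical illustration of that idea, not Barman's actual code: the 15+ signatures match the query Barman generated in this test, the 10 to 14 signatures are my reading of the PostgreSQL 14 docs (verify against your version), and the role-membership part of the real query is left out.

```python
from typing import List


def backup_function_signatures(server_version_num: int) -> List[str]:
    """Signatures whose EXECUTE privilege a non-superuser backup role needs.

    Simplified sketch: only PostgreSQL 10 and later are covered.
    """
    if server_version_num >= 150000:
        # PostgreSQL 15+: pg_backup_start / pg_backup_stop
        start_stop = ["pg_backup_start(text,bool)", "pg_backup_stop(bool)"]
    elif server_version_num >= 100000:
        # PostgreSQL 10-14: older names; pg_stop_backup is overloaded
        start_stop = [
            "pg_start_backup(text,bool,bool)",
            "pg_stop_backup()",
            "pg_stop_backup(bool,bool)",
        ]
    else:
        raise ValueError("versions before PostgreSQL 10 are not covered by this sketch")
    return start_stop + ["pg_switch_wal()", "pg_create_restore_point(text)"]


def privilege_check_sql(server_version_num: int) -> str:
    """Boolean query in the spirit of Barman's check (role checks omitted)."""
    checks = [
        f"has_function_privilege(CURRENT_USER, '{sig}', 'EXECUTE')"
        for sig in backup_function_signatures(server_version_num)
    ]
    return (
        "SELECT usesuper OR (" + " AND ".join(checks) + ") "
        "FROM pg_user WHERE usename = CURRENT_USER"
    )
```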
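Since psql had shown me, beyond any doubt, that the privileges were in place, I naively assumed the problem lay in how the Python driver psycopg2 handed back the value from fetchone() (psycopg2's cursor.fetchone() returns a tuple, while the check here is on a Python True/False).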
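For reference, here is the distinction I was worried about. With psycopg2, `fetchone()` returns one row as a tuple (or `None` when no rows are left). A non-empty tuple is always truthy, so the boolean column has to be unpacked with `[0]`, which the Barman code does. The sketch uses a hypothetical `FakeCursor` stand-in so it runs without a database; the `remote_status` dict only imitates the lookup at server.py#L766.

```python
from typing import Any, List, Optional, Tuple

Row = Tuple[Any, ...]


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor holding a fixed result set."""

    def __init__(self, rows: List[Row]):
        self._rows = list(rows)

    def fetchone(self) -> Optional[Row]:
        # Like a real cursor: return the next row, or None once rows run out.
        return self._rows.pop(0) if self._rows else None


def has_backup_privileges(cur: FakeCursor) -> bool:
    # Unpack the single boolean column; the row tuple itself would be truthy.
    return cur.fetchone()[0]


row = (False,)
print(bool(row))  # True: a non-empty tuple is truthy even if it holds False
remote_status = {"has_backup_privileges": has_backup_privileges(FakeCursor([row]))}
if remote_status.get("has_backup_privileges"):
    print("backup functions: OK")
else:
    print("no access to backup functions: FAILED")
```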
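So I dug in and started Barman under the debugger directly with python3 -m pdb.
First I set a breakpoint on the line that reads the value out of the dict and checked its truth value. Surprisingly, it really was False.
(Note: Barman's own check timeout kills the command partway through, which is why the session below bails out by itself with "check timeout: FAILED". pdb then restarts the program, so just continue and run it again.)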
[barman@edb-pem-server ~]$ python3 -m pdb /usr/bin/barman check bkp_testenv
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) break barman/server.py:766
Breakpoint 1 at /usr/lib/python3.6/site-packages/barman/server.py:766
(Pdb) where
  /usr/lib64/python3.6/bdb.py(434)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) continue
Server bkp_testenv:
	PostgreSQL: OK
> /usr/lib/python3.6/site-packages/barman/server.py(766)check_postgres()
-> if remote_status.get("has_backup_privileges"):
(Pdb) 	check timeout: FAILED (barman check command timed out)
The program exited via sys.exit(). Exit status: 1
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) c
Server bkp_testenv:
	PostgreSQL: OK
> /usr/lib/python3.6/site-packages/barman/server.py(766)check_postgres()
-> if remote_status.get("has_backup_privileges"):
(Pdb) where
  /usr/lib64/python3.6/bdb.py(434)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /usr/bin/barman(11)<module>()
-> load_entry_point('barman==3.10.0', 'console_scripts', 'barman')()
  /usr/lib/python3.6/site-packages/barman/cli.py(2390)main()
-> args.func(args)
  /usr/lib/python3.6/site-packages/barman/cli.py(1225)check()
-> server.check()
  /usr/lib/python3.6/site-packages/barman/server.py(600)check()
-> self.check_postgres(check_strategy)
> /usr/lib/python3.6/site-packages/barman/server.py(766)check_postgres()
-> if remote_status.get("has_backup_privileges"):
(Pdb) p remote_status.get("has_backup_privileges")
False
(Pdb) 	check timeout: FAILED (barman check command timed out)
The program exited via sys.exit(). Exit status: 1
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) where
  /usr/lib64/python3.6/bdb.py(434)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) exit()
[barman@edb-pem-server ~]$
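Just as I was thinking I should be popping a few more lutein pills for my eyes, I refused to give up and kept following the value back to where the source function returns it.
Below, besides grabbing the return value, I also pulled out the SQL itself so I could run it in psql, since the query is generated differently depending on the database version.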
[barman@edb-pem-server ~]$ python3 -m pdb /usr/bin/barman check bkp_testenv
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) break barman/postgres.py:633
Breakpoint 1 at /usr/lib/python3.6/site-packages/barman/postgres.py:633
(Pdb) continue
Server bkp_testenv:
> /usr/lib/python3.6/site-packages/barman/postgres.py(633)has_backup_privileges()
-> return cur.fetchone()[0]
(Pdb) p cur.fetchone()[0]
False
(Pdb) p backup_check_query
"\n SELECT\n usesuper\n OR\n (\n (\n pg_has_role(CURRENT_USER, 'pg_monitor', 'MEMBER')\n OR\n (\n pg_has_role(CURRENT_USER, 'pg_read_all_settings', 'MEMBER')\n AND pg_has_role(CURRENT_USER, 'pg_read_all_stats', 'MEMBER')\n )\n )\n AND\n (\n has_function_privilege(CURRENT_USER, 'pg_backup_start(text,bool)', 'EXECUTE')\n )\n AND\n (\n has_function_privilege(CURRENT_USER, 'pg_backup_stop(bool)', 'EXECUTE')\n )\n AND has_function_privilege(\n CURRENT_USER, 'pg_switch_wal()', 'EXECUTE')\n AND has_function_privilege(\n CURRENT_USER, 'pg_create_restore_point(text)', 'EXECUTE')\n )\n FROM\n pg_user\n WHERE\n usename = CURRENT_USER\n "
(Pdb) 	check timeout: FAILED (barman check command timed out)
The program exited via sys.exit(). Exit status: 1
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb)
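A side note on the session above: at the breakpoint, line 633 has not run yet, so `p cur.fetchone()[0]` consumes the only result row. Had the check timeout not ended the run first, the real `return cur.fetchone()[0]` would most likely have received `None` and raised a `TypeError`. Evaluating expressions in pdb is not free of side effects. A minimal demonstration, using a hypothetical one-row iterator in place of the psycopg2 cursor:

```python
from typing import Any, Optional, Tuple

# Hypothetical stand-in for a cursor whose query returned exactly one row.
_rows = iter([(False,)])


def fetchone() -> Optional[Tuple[Any, ...]]:
    """DB-API style: the next row, or None once the result set is exhausted."""
    return next(_rows, None)


# What "(Pdb) p cur.fetchone()[0]" did at the breakpoint: it used up the only row.
peeked = fetchone()[0]

# What "return cur.fetchone()[0]" on line 633 would have done next.
try:
    fetchone()[0]
    second_call_failed = False
except TypeError as exc:  # 'NoneType' object is not subscriptable
    second_call_failed = True
    print("second fetchone() failed:", exc)

print(peeked, second_call_failed)  # False True
```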
In psql, I stripped the \n characters out of that SQL and ran it...
But... why does it come back true...?????????
[enterprisedb@edb16p ~]$ psql -U bkpuser
Null display is "(NULL)".
Timing is on.
psql (16.2.0)
Type "help" for help.
[[local]] edb=> SELECT
[[local]] edb-> usesuper
[[local]] edb-> OR
[[local]] edb-> (
[[local]] edb(> (
[[local]] edb(> pg_has_role(CURRENT_USER, 'pg_monitor', 'MEMBER')
[[local]] edb(> OR
[[local]] edb(> (
[[local]] edb(> pg_has_role(CURRENT_USER, 'pg_read_all_settings', 'MEMBER')
[[local]] edb(> AND pg_has_role(CURRENT_USER, 'pg_read_all_stats', 'MEMBER')
[[local]] edb(> )
[[local]] edb(> )
[[local]] edb(> AND
[[local]] edb(> (
[[local]] edb(> has_function_privilege(CURRENT_USER, 'pg_backup_start(text,bool)', 'EXECUTE')
[[local]] edb(> )
[[local]] edb(> AND
[[local]] edb(> (
[[local]] edb(> has_function_privilege(CURRENT_USER, 'pg_backup_stop(bool)', 'EXECUTE')
[[local]] edb(> )
[[local]] edb(> AND has_function_privilege(
[[local]] edb(> CURRENT_USER, 'pg_switch_wal()', 'EXECUTE')
[[local]] edb(> AND has_function_privilege(
[[local]] edb(> CURRENT_USER, 'pg_create_restore_point(text)', 'EXECUTE')
[[local]] edb(> )
[[local]] edb-> FROM
[[local]] edb-> pg_user
[[local]] edb-> WHERE
[[local]] edb-> usename = CURRENT_USER
[[local]] edb-> ;
 ?column?
----------
 t
(1 row)
Time: 3.404 ms
[[local]] edb=>
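pdb's `p` prints the repr of the string, which is why the `\n` escapes had to be removed by hand. Running `print(backup_check_query)` at the pdb prompt instead prints real line breaks that can be pasted into psql as they are. Alternatively, collapsing all whitespace gives a one-line query. A small sketch, where `one_line` is a hypothetical helper and `query` is a shortened stand-in for the real string; collapsing whitespace is only safe here because the query has no string literals whose spacing matters.

```python
def one_line(sql: str) -> str:
    """Collapse all runs of whitespace (including newlines) into single spaces."""
    return " ".join(sql.split())


# Shortened stand-in for the repr'd query captured in pdb.
query = (
    "\n        SELECT\n            usesuper\n        FROM\n"
    "            pg_user\n        WHERE\n            usename = CURRENT_USER\n        "
)
print(one_line(query))  # SELECT usesuper FROM pg_user WHERE usename = CURRENT_USER
```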
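In the end... the only thing left was to check where my connection was actually going.
My test setup has a primary/standby pair, and I was testing it together with Barman's model feature.
On top of that, I was worried it might be connecting to the machine next door.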
[barman@edb-pem-server ~]$ python3 -m pdb /usr/bin/barman check bkp_testenv
> /usr/bin/barman(3)<module>()
-> __requires__ = 'barman==3.10.0'
(Pdb) break barman/postgres.py:633
Breakpoint 1 at /usr/lib/python3.6/site-packages/barman/postgres.py:633
(Pdb) continue
Server bkp_testenv:
> /usr/lib/python3.6/site-packages/barman/postgres.py(633)has_backup_privileges()
-> return cur.fetchone()[0]
(Pdb) self._conn.get_dsn_parameters()
{'user': 'bkpuser', 'passfile': '/var/lib/barman/.pgpass', 'channel_binding': 'prefer', 'dbname': 'postgres', 'host': 'edb16s', 'port': '5444', 'options': '', 'sslmode': 'prefer', 'sslcompression': '0', 'sslcertmode': 'allow', 'sslsni': '1', 'ssl_min_protocol_version': 'TLSv1.2', 'gssencmode': 'prefer', 'krbsrvname': 'postgres', 'gssdelegation': '0', 'target_session_attrs': 'any', 'load_balance_hosts': 'disable'}
(Pdb)
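`get_dsn_parameters()` is a psycopg2 connection method that returns the connection parameters actually in effect (without the password), which makes it a quick way to confirm which host and database a connection really reached. A sketch of turning that dict into a one-line target and comparing it with what you expected; `describe_target`, `mismatches`, and the expected values are hypothetical.

```python
from typing import Dict, List


def describe_target(dsn: Dict[str, str]) -> str:
    """Render user@host:port/dbname from a get_dsn_parameters()-style dict."""
    return "{user}@{host}:{port}/{dbname}".format(**dsn)


def mismatches(dsn: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """List every expected key whose effective value differs."""
    return [
        f"{key}: expected {want!r}, connected with {dsn.get(key)!r}"
        for key, want in expected.items()
        if dsn.get(key) != want
    ]


# Subset of the dict printed in the pdb session above.
dsn = {"user": "bkpuser", "dbname": "postgres", "host": "edb16s", "port": "5444"}
print(describe_target(dsn))  # bkpuser@edb16s:5444/postgres
# Hypothetical expectation: the database where the GRANTs were run.
for problem in mismatches(dsn, {"host": "edb16s", "dbname": "edb"}):
    print(problem)  # dbname: expected 'edb', connected with 'postgres'
```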
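And there it was... right in the output above... the dbname Barman connects with (set in conninfo; see the man5 page of the manual for the full list of parameters) was apparently postgres.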
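The same mistake can be caught without running anything, by reading conninfo straight out of the Barman server configuration. A rough sketch: `SAMPLE_CONF` is a hypothetical config shaped like this test (host, port, user, and dbname taken from the DSN above), and `shlex` only approximates libpq's keyword/value syntax (no spaces around `=`, no URI-style strings). When dbname is omitted, libpq defaults it to the user name, which the sketch mirrors while ignoring PGDATABASE and other environment defaults.

```python
import configparser
import shlex
from typing import Dict

# Hypothetical Barman server config shaped like the one in this test.
SAMPLE_CONF = """
[bkp_testenv]
conninfo = host=edb16s port=5444 user=bkpuser dbname=postgres
"""


def parse_conninfo(conninfo: str) -> Dict[str, str]:
    """Approximate libpq keyword=value parsing (single-quoted values allowed)."""
    return dict(token.split("=", 1) for token in shlex.split(conninfo))


def conninfo_dbname(conf_text: str, server: str) -> str:
    """Return the database a server's conninfo will connect to."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    params = parse_conninfo(parser[server]["conninfo"])
    # libpq falls back to the user name when dbname is not given.
    return params.get("dbname", params.get("user", ""))


print(conninfo_dbname(SAMPLE_CONF, "bkp_testenv"))  # postgres
```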
But I... had apparently run the GRANTs in edb...
One quick check... and yes, that was exactly it~
[enterprisedb@edb16s ~]$ psql -U bkpuser
Null display is "(NULL)".
Timing is on.
psql (16.2.0)
Type "help" for help.
[[local]] edb=> SELECT usesuper OR ( ( pg_has_role(CURRENT_USER, 'pg_monitor', 'MEMBER') OR ( pg_has_role(CURRENT_USER, 'pg_read_all_settings', 'MEMBER') AND pg_has_role(CURRENT_USER, 'pg_read_all_stats', 'MEMBER') ) ) AND ( has_function_privilege(CURRENT_USER, 'pg_backup_start(text,bool)', 'EXECUTE') ) AND ( has_function_privilege(CURRENT_USER, 'pg_backup_stop(bool)', 'EXECUTE') ) AND has_function_privilege( CURRENT_USER, 'pg_switch_wal()', 'EXECUTE') AND has_function_privilege( CURRENT_USER, 'pg_create_restore_point(text)', 'EXECUTE') ) FROM pg_user WHERE usename = CURRENT_USER;
 ?column?
----------
 t
(1 row)
Time: 3.125 ms
[[local]] edb=> \c postgres
You are now connected to database "postgres" as user "bkpuser".
[[local]] postgres=> SELECT usesuper OR ( ( pg_has_role(CURRENT_USER, 'pg_monitor', 'MEMBER') OR ( pg_has_role(CURRENT_USER, 'pg_read_all_settings', 'MEMBER') AND pg_has_role(CURRENT_USER, 'pg_read_all_stats', 'MEMBER') ) ) AND ( has_function_privilege(CURRENT_USER, 'pg_backup_start(text,bool)', 'EXECUTE') ) AND ( has_function_privilege(CURRENT_USER, 'pg_backup_stop(bool)', 'EXECUTE') ) AND has_function_privilege( CURRENT_USER, 'pg_switch_wal()', 'EXECUTE') AND has_function_privilege( CURRENT_USER, 'pg_create_restore_point(text)', 'EXECUTE') ) FROM pg_user WHERE usename = CURRENT_USER;
 ?column?
----------
 f
(1 row)
Time: 2.960 ms
[[local]] postgres=>
So, the cause: the function GRANTs had been run in a different database from the one in Barman's connection string, so Barman's connection simply could not see them.
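As an aside, role (group) memberships are cluster-wide, so they apply in every database; only the EXECUTE grants on functions are tied to the database they were run in (my excuse for feeling a bit better about it).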
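To avoid repeating this, the function GRANTs can be run while connected to the very database named in Barman's conninfo. The sketch below just builds such a psql script; the function signatures and role list mirror what this post granted on PostgreSQL 16, while `grant_script` itself is hypothetical and assumes plain, unquoted identifiers.

```python
from typing import List

# EXECUTE grants on these are stored in each database's own pg_proc catalog.
BACKUP_FUNCTIONS = [
    "pg_backup_start(text, boolean)",
    "pg_backup_stop(boolean)",
    "pg_switch_wal()",
    "pg_create_restore_point(text)",
]
# Role memberships are cluster-wide, so granting them once is enough.
BACKUP_ROLES = ["pg_checkpoint", "pg_monitor", "pg_read_all_settings", "pg_read_all_stats"]


def grant_script(role: str, dbname: str) -> List[str]:
    """psql commands that grant the backup privileges inside `dbname`."""
    lines = [f"\\c {dbname}"]  # switch to the database Barman connects to
    lines += [f"GRANT EXECUTE ON FUNCTION {fn} TO {role};" for fn in BACKUP_FUNCTIONS]
    lines += [f"GRANT {name} TO {role};" for name in BACKUP_ROLES]
    return lines


print("\n".join(grant_script("bkpuser", "postgres")))
```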
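Lessons for myself:
Even with the manual right there, you still need to test more and be prepared.
Barman's non-superuser support works as documented.
python -m pdb lets you debug a script directly, without hand-inserting that awkward import pdb; pdb.set_trace() line into the code.
Time to start on lutein or carotene, maybe?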
References
How to step through Python code to help debug issues? - Stack Overflow
what is the difference between "next" and "until" in pdb - Stack Overflow
pdb — The Python Debugger — Python 3.12.3 documentation
Python Debugger (pdb): Navigating through multi-module code using pdb - Stack Overflow
Python's many command-line utilities - Python Morsels