MathJax

MathJax-2

MathJax-3

Google Code Prettify

置頂入手筆記

EnterproseDB Quickstart — 快速入門筆記

由於考慮採用 EnterpriseDB 或是直接用 PostgreSQL 的人,通常需要一些入手的資料。這邊紀錄便提供相關快速上手的簡單筆記 ~ 這篇筆記以 資料庫安裝完畢後的快速使用 為目標,基本紀錄登入使用的範例:

2019年9月1日 星期日

在 Postgres 用 PGroonga 作正體中文全文檢索簡單測試筆記

PGSQL 內建有不錯的全文檢索功能,而且支援多種語言。但是,內建的全文檢索功能沒有直接支援東亞的 CJK 語言。
中文的部份的全文檢索,目前看到的資訊,只有中國的社群開發的簡體中文相關外掛,但這不夠適合台灣的正體中文使用;這主要是兩國文化不同,適用中國的字典檔在切詞上不完全適用於台灣的關係。
不過,還有其他的方案,主要是由日本社群所採用的外掛,分別有 pg_bigm 以及 PGroonga 也支援 CJK 的全文檢索。
最近剛好看到,PGroonga 針對 PGDG YUM Repo 的版本有提供 RHEL 現成的套件(不好意思,我很懶惰,只想用現成的。。。),在台灣 PGroonga 這個套件有點少人提到,因此這邊就嘗試看看,看看可不可以用來作為正體中文的全文檢索工具。

先紀錄目前收集到 PGSQL vs CJK 的狀況
 - 日本的用戶用的是日本團隊開發的 pg_bigm,或是現在要測的 pgroonga 外掛
 - 韓國的用戶好像有一個標準的字典檔 textsearch_ko
 - 中國的用戶用 zhparserSCWS也有 pg_scws 這樣的外掛)或 pg_jieba 結巴,但裡面沒有正常的正體中文字典可以用
 - 台灣的用戶。。。目前沒有直接相關的資源。。。目前看到的,應該是拿中國的結巴模組,作簡繁轉換以及詞庫的調整改良而成,有看到 Jseg 以及 jieba-zh_TW: 結巴中文斷詞台灣繁體版本這兩款。但整合 PGSQL 的部份目前(2019/9)還沒有看到
 - 可以用的字典檔是中研院的專利,中研院中文斷詞系統,不過沒有碰過,不清楚狀況~~但~~在 2019 年八月已經釋出了下一版,而且是開源的函式庫 ckiplab/ckiptagger: CKIP Neural Chinese Word Segmentation, POS Tagging, and NER。有機會來試試
 - 自己做字典檔。。。不過也還不清楚怎麼進行(例如,如何用爬蟲收集關鍵字,以及 PostgreSQL 的字典檔格式等等問題),有機會再了解看看

PGroonga 是基於 Groonga 這個專案的 PostgreSQL 套件;Groonga 也支援其他的資料庫,例如 MySQL。詳細的相關介紹可以看這邊

接著就開始來試試~首先,在安裝好 PGDG 提供的原生 PGSQL 11 的主機上(其實是 LXC Container 裡面的服務~),安裝 Groonga 的 Repo(輸出內容略)
[root@pg1 ~]# rpm -qa | grep postgresql
postgresql11-server-11.1-1PGDG.rhel7.x86_64
postgresql11-libs-11.1-1PGDG.rhel7.x86_64
postgresql11-11.1-1PGDG.rhel7.x86_64
postgresql11-contrib-11.1-1PGDG.rhel7.x86_64
[root@pg1 ~]# 
[root@pg1 ~]# yum install -y https://packages.groonga.org/centos/groonga-release-latest.noarch.rpm

然後裡面就有 PGroonga 了
[root@pg1 ~]# yum search pgroonga
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                                 |  15 kB  00:00:00     
 * base: centos4.zswap.net
 * epel: mirror.grid.uchicago.edu
 * extras: centos4.zswap.net
 * updates: centos.mirrors.hoobly.com
epel                                                                 | 5.4 kB  00:00:00     
groonga-centos                                                       | 2.9 kB  00:00:00     
(1/4): epel/x86_64/group_gz                                          |  88 kB  00:00:00     
(2/4): epel/x86_64/primary_db                                        | 6.8 MB  00:00:00     
(3/4): epel/x86_64/updateinfo                                        | 1.0 MB  00:00:00     
(4/4): groonga-centos/7/x86_64/primary_db                            | 612 kB  00:00:01     
================================== N/S matched: pgroonga ===================================
postgresql-pgroonga-debuginfo.x86_64 : Debug information for package postgresql-pgroonga
postgresql10-pgroonga-debuginfo.x86_64 : Debug information for package postgresql10-pgroonga
postgresql11-pgroonga-debuginfo.x86_64 : Debug information for package postgresql11-pgroonga
postgresql94-pgroonga-debuginfo.x86_64 : Debug information for package postgresql94-pgroonga
postgresql95-pgroonga-debuginfo.x86_64 : Debug information for package postgresql95-pgroonga
postgresql96-pgroonga-debuginfo.x86_64 : Debug information for package postgresql96-pgroonga
postgresql-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga
postgresql10-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga
postgresql11-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga
postgresql94-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga
postgresql95-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga
postgresql96-pgroonga.x86_64 : Fast full-text search plugin for PostgreSQL based on Groonga

  Name and summary matches only, use "search all" for everything.
[root@pg1 ~]# 

由於我這邊裝的是 PGSQL 11,因此要裝相應套件 postgresql11-pgroonga
[root@pg1 ~]# yum install -y postgresql11-pgroonga
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos4.zswap.net
 * epel: mirror.steadfastnet.com
 * extras: centos4.zswap.net
 * updates: centos.mirrors.hoobly.com
Resolving Dependencies
--> Running transaction check
---> Package postgresql11-pgroonga.x86_64 0:2.2.1-1.el7 will be installed
--> Processing Dependency: groonga-libs >= 9.0.4 for package: postgresql11-pgroonga-2.2.1-1.el7.x86_64
--> Processing Dependency: msgpack for package: postgresql11-pgroonga-2.2.1-1.el7.x86_64
--> Processing Dependency: libmsgpackc.so.2()(64bit) for package: postgresql11-pgroonga-2.2.1-1.el7.x86_64
--> Processing Dependency: libgroonga.so.0()(64bit) for package: postgresql11-pgroonga-2.2.1-1.el7.x86_64
--> Running transaction check
---> Package groonga-libs.x86_64 0:9.0.7-1.el7 will be installed
--> Processing Dependency: libzstd for package: groonga-libs-9.0.7-1.el7.x86_64
--> Processing Dependency: libzstd.so.1()(64bit) for package: groonga-libs-9.0.7-1.el7.x86_64
---> Package msgpack.x86_64 0:1.4.1-1.el7 will be installed
--> Running transaction check
---> Package libzstd.x86_64 0:1.4.2-1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================
 Package                      Arch          Version             Repository             Size
============================================================================================
Installing:
 postgresql11-pgroonga        x86_64        2.2.1-1.el7         groonga-centos        420 k
Installing for dependencies:
 groonga-libs                 x86_64        9.0.7-1.el7         groonga-centos        2.0 M
 libzstd                      x86_64        1.4.2-1.el7         epel                  252 k
 msgpack                      x86_64        1.4.1-1.el7         epel                   23 k

Transaction Summary
============================================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 2.7 M
Installed size: 16 M
Downloading packages:
warning: /var/cache/yum/x86_64/7/epel/packages/libzstd-1.4.2-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
Public key for libzstd-1.4.2-1.el7.x86_64.rpm is not installed
(1/4): libzstd-1.4.2-1.el7.x86_64.rpm                                | 252 kB  00:00:00     
(2/4): msgpack-1.4.1-1.el7.x86_64.rpm                                |  23 kB  00:00:00     
warning: /var/cache/yum/x86_64/7/groonga-centos/packages/postgresql11-pgroonga-2.2.1-1.el7.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 45499429: NOKEY
Public key for postgresql11-pgroonga-2.2.1-1.el7.x86_64.rpm is not installed
(3/4): postgresql11-pgroonga-2.2.1-1.el7.x86_64.rpm                  | 420 kB  00:00:01     
(4/4): groonga-libs-9.0.7-1.el7.x86_64.rpm                           | 2.0 MB  00:00:02     
--------------------------------------------------------------------------------------------
Total                                                       1.3 MB/s | 2.7 MB  00:00:02     
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-groonga
Importing GPG key 0x45499429:
 Userid     : "groonga Key (groonga Official Signing Key) "
 Fingerprint: c97e 4649 a205 1d0c ea1a 73f9 72a7 496b 4549 9429
 Package    : groonga-release-1.5.1-1.noarch (installed)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-groonga
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-groonga-RSA4096
Importing GPG key 0x34839225:
 Userid     : "Groonga Key (Groonga Official Signing Key) "
 Fingerprint: 2701 f317 cfcc cb97 5cad e9c2 624c f774 3483 9225
 Package    : groonga-release-1.5.1-1.noarch (installed)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-groonga-RSA4096
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
Importing GPG key 0x352C64E5:
 Userid     : "Fedora EPEL (7) "
 Fingerprint: 91e9 7d7c 4a5e 96f1 7f3e 888f 6a2f aea2 352c 64e5
 Package    : epel-release-7-11.noarch (@extras)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : msgpack-1.4.1-1.el7.x86_64                                               1/4 
  Installing : libzstd-1.4.2-1.el7.x86_64                                               2/4 
  Installing : groonga-libs-9.0.7-1.el7.x86_64                                          3/4 
  Installing : postgresql11-pgroonga-2.2.1-1.el7.x86_64                                 4/4 
  Verifying  : groonga-libs-9.0.7-1.el7.x86_64                                          1/4 
  Verifying  : libzstd-1.4.2-1.el7.x86_64                                               2/4 
  Verifying  : postgresql11-pgroonga-2.2.1-1.el7.x86_64                                 3/4 
  Verifying  : msgpack-1.4.1-1.el7.x86_64                                               4/4 

Installed:
  postgresql11-pgroonga.x86_64 0:2.2.1-1.el7                                                

Dependency Installed:
  groonga-libs.x86_64 0:9.0.7-1.el7               libzstd.x86_64 0:1.4.2-1.el7              
  msgpack.x86_64 0:1.4.1-1.el7                   

Complete!
[root@pg1 ~]# 

然後就可以確認套件了
postgres=# select name from pg_available_extensions where name like 'p%';
        name        
--------------------
 plpgsql
 pg_trgm
 pgstattuple
 postgres_fdw
 pg_visibility
 pgcrypto
 pgrowlocks
 pageinspect
 pg_buffercache
 pg_stat_statements
 pg_freespacemap
 pg_prewarm
 pgroonga_database
 pgroonga
(14 rows)

postgres=# 

接著就啟用外掛,向其他的 PGSQL 外掛一樣~
postgres=# create extension pgroonga;
ERROR:  could not load library "/usr/pgsql-11/lib/pgroonga.so": /usr/pgsql-11/lib/pgroonga.so: undefined symbol: getmissingattr
postgres=# 

ㄟ~怎麼失敗惹。。。。搜尋一下網路,其實可以發現,是因為我太久沒更新 PostgreSQL 的小版本的關係 could not load library "/usr/pgsql-11/lib/timescaledb.so": /usr/pgsql-11/lib/timescaledb.so: undefined symbol: getmissingattr · Issue #1161 · timescale/timescaledb

因此補一下更新,再回頭啟用套件就成功了~
postgres=# \q
-bash-4.2$ exit
logout
[root@pg1 ~]# yum update postgresql11
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos4.zswap.net
 * epel: mirror.grid.uchicago.edu
 * extras: centos4.zswap.net
 * updates: centos.mirrors.hoobly.com
Resolving Dependencies
--> Running transaction check
---> Package postgresql11.x86_64 0:11.1-1PGDG.rhel7 will be updated
--> Processing Dependency: postgresql11(x86-64) = 11.1-1PGDG.rhel7 for package: postgresql11-contrib-11.1-1PGDG.rhel7.x86_64
--> Processing Dependency: postgresql11(x86-64) = 11.1-1PGDG.rhel7 for package: postgresql11-server-11.1-1PGDG.rhel7.x86_64
---> Package postgresql11.x86_64 0:11.5-1PGDG.rhel7 will be an update
--> Processing Dependency: postgresql11-libs(x86-64) = 11.5-1PGDG.rhel7 for package: postgresql11-11.5-1PGDG.rhel7.x86_64
--> Running transaction check
---> Package postgresql11-contrib.x86_64 0:11.1-1PGDG.rhel7 will be updated
---> Package postgresql11-contrib.x86_64 0:11.5-1PGDG.rhel7 will be an update
---> Package postgresql11-libs.x86_64 0:11.1-1PGDG.rhel7 will be updated
---> Package postgresql11-libs.x86_64 0:11.5-1PGDG.rhel7 will be an update
---> Package postgresql11-server.x86_64 0:11.1-1PGDG.rhel7 will be updated
---> Package postgresql11-server.x86_64 0:11.5-1PGDG.rhel7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================
 Package                      Arch           Version                   Repository      Size
============================================================================================
Updating:
 postgresql11                 x86_64         11.5-1PGDG.rhel7          pgdg11         1.6 M
Updating for dependencies:
 postgresql11-contrib         x86_64         11.5-1PGDG.rhel7          pgdg11         617 k
 postgresql11-libs            x86_64         11.5-1PGDG.rhel7          pgdg11         361 k
 postgresql11-server          x86_64         11.5-1PGDG.rhel7          pgdg11         4.7 M

Transaction Summary
============================================================================================
Upgrade  1 Package (+3 Dependent packages)

Total download size: 7.3 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/4): postgresql11-contrib-11.5-1PGDG.rhel7.x86_64.rpm              | 617 kB  00:00:00     
(2/4): postgresql11-11.5-1PGDG.rhel7.x86_64.rpm                      | 1.6 MB  00:00:00     
(3/4): postgresql11-libs-11.5-1PGDG.rhel7.x86_64.rpm                 | 361 kB  00:00:00     
(4/4): postgresql11-server-11.5-1PGDG.rhel7.x86_64.rpm               | 4.7 MB  00:00:00     
--------------------------------------------------------------------------------------------
Total                                                       8.6 MB/s | 7.3 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : postgresql11-libs-11.5-1PGDG.rhel7.x86_64                                1/8 
  Updating   : postgresql11-11.5-1PGDG.rhel7.x86_64                                     2/8 
  Updating   : postgresql11-server-11.5-1PGDG.rhel7.x86_64                              3/8 
  Updating   : postgresql11-contrib-11.5-1PGDG.rhel7.x86_64                             4/8 
  Cleanup    : postgresql11-contrib-11.1-1PGDG.rhel7.x86_64                             5/8 
  Cleanup    : postgresql11-server-11.1-1PGDG.rhel7.x86_64                              6/8 
  Cleanup    : postgresql11-11.1-1PGDG.rhel7.x86_64                                     7/8 
  Cleanup    : postgresql11-libs-11.1-1PGDG.rhel7.x86_64                                8/8 
  Verifying  : postgresql11-server-11.5-1PGDG.rhel7.x86_64                              1/8 
  Verifying  : postgresql11-contrib-11.5-1PGDG.rhel7.x86_64                             2/8 
  Verifying  : postgresql11-libs-11.5-1PGDG.rhel7.x86_64                                3/8 
  Verifying  : postgresql11-11.5-1PGDG.rhel7.x86_64                                     4/8 
  Verifying  : postgresql11-contrib-11.1-1PGDG.rhel7.x86_64                             5/8 
  Verifying  : postgresql11-server-11.1-1PGDG.rhel7.x86_64                              6/8 
  Verifying  : postgresql11-libs-11.1-1PGDG.rhel7.x86_64                                7/8 
  Verifying  : postgresql11-11.1-1PGDG.rhel7.x86_64                                     8/8 

Updated:
  postgresql11.x86_64 0:11.5-1PGDG.rhel7                                                    

Dependency Updated:
  postgresql11-contrib.x86_64 0:11.5-1PGDG.rhel7                                            
  postgresql11-libs.x86_64 0:11.5-1PGDG.rhel7                                               
  postgresql11-server.x86_64 0:11.5-1PGDG.rhel7                                             

Complete!
[root@pg1 ~]# service postgresql-11 restart
Redirecting to /bin/systemctl restart postgresql-11.service
[root@pg1 ~]# su - postgres
Last login: Fri Aug 30 15:35:20 UTC 2019 on lxc/console
-bash-4.2$ psql
psql (11.5)
Type "help" for help.

postgres=# create extension pgroonga;
CREATE EXTENSION
postgres=# 

接著就是裝一點測試資料了~
postgres=# CREATE TABLE memos (
 id integer,
 content text );
CREATE TABLE
postgres=# insert into memos values 
(1, '我家門前有小河,'),
(2, '後面有山坡'),
(3, '山坡上面野花多,'),
(4, '野花紅似火'),
(5, '小河裡有白鵝,'),
(6, '鵝兒戲綠波'),
(7, '戲弄綠波鵝兒快樂,'),
(8, '昂首唱清歌');
INSERT 0 8
postgres=# 

要查詢的話,可以參考手冊教學 Tutorial | PGroonga,裡面有列舉可以用的運算子,基本上是新增 &@ 這個比較運算,跟內建的 Full Text Search 運算子不太一樣。這個運算子在啟用外掛才能使用,因此沒有裝 PGroonga 的資料庫就不會有~~
postgres=# SELECT * FROM memos WHERE content &@ '山坡';
 id |         content          
----+--------------------------
  2 | 後面有山坡
  3 | 山坡上面野花多,
(2 rows)

postgres=# 

不過除了直接查詢之外,也可以直接以 pgroonga 的功能,建立 index:這一點跟內建的全文檢索操作,其實是不太一樣的。(內建的全文檢索是針對字串產生 tsvector,裡面會依據字典去切割出字串與出現次數的紀錄)
postgres=# CREATE INDEX ON memos USING pgroonga (content);
CREATE INDEX
postgres=# 
postgres=# explain SELECT * FROM memos WHERE content &@ '山坡';
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Index Scan using memos_content_idx on memos  (cost=0.00..4.04 rows=1 width=36)
   Index Cond: (content &@ '山坡'::text)
(2 rows)

postgres=# 

簡易測試就到這邊~以後有機會再仔細的找真實一點的資料來用用看。

參考資料
只是把查到的都紀錄起來而已








沒有留言:

張貼留言