1) UUID 以英數字表示時,是一個 36 個字元的無序編號(還是可以用 hex code 方式排序,但沒什麼特殊意義)。若採用 bit 的方式儲存,則是一個 16 byte 長的資料。光從資料長度,就可以看到讀取上節約的差距。
因此可以的話,盡量選擇對應的 datatype,而非用字串代替。
2) 使用 UUID 只會有等號比對,也不太可能有其他比對方式(例如大於小於或是 LIKE 比對等),因此搭配 hash index 是不錯的選項。
當然如果得要求 primary key 在 PGSQL 還是沒得選,只能用預設的 btree index。
3) UUID 有好幾代,通常大家知道的就是隨機號碼(第四代)。
不過 UUID 其實也有 hash 計算的版本(第三、五代),所以可以當作 md5 hash 用。
用 uuid 版本的 md5(嚴格說來應該不完全是 md5,反正就是 v3/v5 的 hash),或是直接把 md5 轉成 uuid。
以下試一下 v3 的使用,注意第一個參數有規則,不能亂塞,只能用指定的內建函數(原理是什麼倒沒仔細去了解。。就名稱而言叫做 UUID namespace)。以下示範相同內容產生的 UUID v3 不會變動
edb=# create table testhash(ky uuid,str text); CREATE TABLE edb=# create index on testhash using hash(ky); CREATE INDEX edb=# create extension uuid-ossp; ERROR: syntax error at or near "-" LINE 1: create extension uuid-ossp; ^ edb=# create extension "uuid-ossp"; CREATE EXTENSION edb=# insert into testhash values (uuid_generate_v3 ('xx','http://www.postgresql.org'),'http://www.postgresql.org'); ERROR: invalid input syntax for type uuid: "xx" LINE 1: insert into testhash values (uuid_generate_v3 ('xx','http://... ^ edb=# edb=# insert into testhash values (uuid_generate_v3 (uuid_nil(),'http://www.postgresql.org'),'http://www.postgresql.org'); INSERT 0 1 edb=# insert into testhash values (uuid_generate_v3 (uuid_nil(),'http://www.postgresql.org'),'http://www.postgresql.org'); INSERT 0 1 edb=# select * from testhash ; ky | str --------------------------------------+--------------------------- 15615b44-e5e8-3fea-b573-696f6a2b061e | http://www.postgresql.org 15615b44-e5e8-3fea-b573-696f6a2b061e | http://www.postgresql.org (2 rows) edb=#
不過以上的結果,會和 md5 計算值直接以 uuid 格式儲存的結果不一樣(廢話~)。
edb=# insert into testhash values (md5 ('http://www.postgresql.org')::uuid,'http://www.postgresql.org'); INSERT 0 1 edb=# select * from testhash ; ky | str --------------------------------------+--------------------------- 15615b44-e5e8-3fea-b573-696f6a2b061e | http://www.postgresql.org 15615b44-e5e8-3fea-b573-696f6a2b061e | http://www.postgresql.org cf03e637-1c82-4c5f-ba23-1c9cc83d629d | http://www.postgresql.org (3 rows) edb=#
另外也有機會在程式碼端預先處理 UUID 產生:基本上就是找些對應的 library 處理。
這些方式都可以產生一些可重現的內容。
4) 搭配 hash index 就有機會處理資料比對加速:資料可以拆解,用 join 形式呈現:當然這個 JOIN 通常不太自然,這就看大家程式規劃跟開發技能了。
一個很簡化的構想 — 使用 UUID 同時配合真實資料的比對 — 可以考慮 covering index 配合真實資料來加速:建立真實資料的 index 並帶 UUID 作為 include 內容。
以下模擬,data_id 是通常的查詢條件,ky 是對應的 uuid,相關表格之間以 ky 串起來。
create table table_mst(data_id int, ky uuid); create table table_dtl(ky uuid, dsc text); create unique index on table_mst using btree(ky); create unique index on table_mst using btree(data_id) include (ky); ALTER TABLE table_dtl ADD CONSTRAINT fk_dtl_mst_ky FOREIGN KEY (ky) REFERENCES table_mst(ky); create index on table_dtl using hash(ky); insert into table_mst select g, uuid_generate_v4() from generate_series(1,10000) g; insert into table_dtl select ky, 'detail data '||data_id from table_mst; -- 若是單一筆,則可以用 insert returning 取得 ky 值以往下套用,並套在相同 Tx 內。 analyze table_mst; analyze table_dtl;
edb=# explain select d.dsc from table_mst m join table_dtl d on m.ky = d.ky and m.data_id = 4; QUERY PLAN -------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.29..12.33 rows=1 width=16) -> Index Only Scan using table_mst_data_id_ky_idx on table_mst m (cost=0.29..4.30 rows=1 width=16) Index Cond: (data_id = 4) -> Index Scan using table_dtl_ky_idx on table_dtl d (cost=0.00..8.02 rows=1 width=32) Index Cond: (ky = m.ky) (5 rows) edb=# explain analyze select d.dsc from table_mst m join table_dtl d on m.ky = d.ky and m.data_id = 4; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.29..12.33 rows=1 width=16) (actual time=0.043..0.046 rows=1 loops=1) -> Index Only Scan using table_mst_data_id_ky_idx on table_mst m (cost=0.29..4.30 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: (data_id = 4) Heap Fetches: 0 -> Index Scan using table_dtl_ky_idx on table_dtl d (cost=0.00..8.02 rows=1 width=32) (actual time=0.025..0.026 rows=1 loops=1) Index Cond: (ky = m.ky) Planning Time: 0.253 ms Execution Time: 0.086 ms (8 rows) edb=#
5) UUID 除了用 hex 跟連字號表示的格式,還有以 base64 的格式,在 PGSQL 可以作轉換。
uuid 轉 base64:因為 UUID 還沒支援直接轉成 bytea,有看到一個還沒紀錄到手冊的函數 uuid_send() 搭配著用以下方法完全來自這邊 postgresql - How to short UUIDs with Postgres? - Database Administrators Stack Exchange 的回答,箇中奧秘還不太瞭~
edb=# select encode(uuid_send(uuid_generate_v3 (uuid_nil(), 'http://www.postgresql.org')),'base64'); encode -------------------------- FWFbROXoP+q1c2lvaisGHg== (1 row) edb=#base64 轉 uuid:使用相同範例
edb=# select substring(decode('FWFbROXoP+q1c2lvaisGHg==', 'base64')::text from 3)::uuid;
substring
--------------------------------------
15615b44-e5e8-3fea-b573-696f6a2b061e
(1 row)
edb=# -- 記得跟上面 3) 內容比對一下
目前想到&找到的大概是這些~
參考資料
PostgreSQL: Documentation: 72.1. Overview
PostgreSQL: Documentation: F.49. uuid-ossp
How should I index a UUID in Postgres? - Database Administrators Stack Exchange
Generating v5 UUID. What is name and namespace? - Stack Overflow
Re-Introducing Hash Indexes in PostgreSQL | Haki Benita
UUID 原理與實作分析 - 該如何挑選適合的 UUID 版本 | Yuanchieh's Blog
Is there any way to generate the same UUID from a String - Stack Overflow
postgresql - What is the optimal data type for an MD5 field? - Database Administrators Stack Exchange
Are type 3 UUID's collision probablity the same as MD5's - Stack Overflow
UUID GUID Oversimplified – Are They Really Unique? - Scaleyourapp
PostgreSQL UUID type performance - Stack Overflow
UUID, serial or identity columns for PostgreSQL auto-generated primary keys? - Cybertec Blog
Any significant performance disadvantage to using uuid as primary key? : r/PostgreSQL
UUID Benchmark - begriffs/uuid-btree-vs-hash.md · GitHub Gist
PostgreSQL and Databases in general: Hash indexes are faster than Btree indexes? - PostgreSQL and Databases in general
Look It Up: Practical PostgreSQL Indexing
UUID as Primary Key - Ingram Chen
UUID or GUID as Primary Keys? Be Careful! | by Tom Harrison
Why Covering Indexes in Postgres Are Incredibly Helpful - PostgreSQL Blog | Crunchy Data
best index type for uuid in postgres 11 - Database Administrators Stack Exchange
Salts and UUIDs for your PostgreSQL database | by Panos Zafeiropoulos | Nov, 2022 | Dev Genius
Partition Table of PostgreSQL Hash and Partition ID Calculation - Alibaba Cloud Community
How to short UUIDs with Postgres? - Database Administrators Stack Exchange
What format does Elasticsearch use for its ids? - Stack Overflow
Postgresql base64 encode - Stack Overflow
Convert PostgreSQL bytea-stored serialized-java-UUID to postgresql-UUID - Stack Overflow
Generating a UUID in Postgres for Insert statement? - Stack Overflow
Generating URL-Safe Short IDs in Postgresql | &yet blog
Generate unique base64 in database - postgresql
Converting byte arrays to UUIDs in Postgres - defn.io
參考資料
PostgreSQL: Documentation: 72.1. Overview
PostgreSQL: Documentation: F.49. uuid-ossp
How should I index a UUID in Postgres? - Database Administrators Stack Exchange
Generating v5 UUID. What is name and namespace? - Stack Overflow
Re-Introducing Hash Indexes in PostgreSQL | Haki Benita
UUID 原理與實作分析 - 該如何挑選適合的 UUID 版本 | Yuanchieh's Blog
Is there any way to generate the same UUID from a String - Stack Overflow
postgresql - What is the optimal data type for an MD5 field? - Database Administrators Stack Exchange
Are type 3 UUID's collision probablity the same as MD5's - Stack Overflow
UUID GUID Oversimplified – Are They Really Unique? - Scaleyourapp
PostgreSQL UUID type performance - Stack Overflow
UUID, serial or identity columns for PostgreSQL auto-generated primary keys? - Cybertec Blog
Any significant performance disadvantage to using uuid as primary key? : r/PostgreSQL
UUID Benchmark - begriffs/uuid-btree-vs-hash.md · GitHub Gist
PostgreSQL and Databases in general: Hash indexes are faster than Btree indexes? - PostgreSQL and Databases in general
Look It Up: Practical PostgreSQL Indexing
UUID as Primary Key - Ingram Chen
UUID or GUID as Primary Keys? Be Careful! | by Tom Harrison
Why Covering Indexes in Postgres Are Incredibly Helpful - PostgreSQL Blog | Crunchy Data
best index type for uuid in postgres 11 - Database Administrators Stack Exchange
Salts and UUIDs for your PostgreSQL database | by Panos Zafeiropoulos | Nov, 2022 | Dev Genius
Partition Table of PostgreSQL Hash and Partition ID Calculation - Alibaba Cloud Community
How to short UUIDs with Postgres? - Database Administrators Stack Exchange
What format does Elasticsearch use for its ids? - Stack Overflow
Postgresql base64 encode - Stack Overflow
Convert PostgreSQL bytea-stored serialized-java-UUID to postgresql-UUID - Stack Overflow
Generating a UUID in Postgres for Insert statement? - Stack Overflow
Generating URL-Safe Short IDs in Postgresql | &yet blog
Generate unique base64 in database - postgresql
Converting byte arrays to UUIDs in Postgres - defn.io
沒有留言:
張貼留言