在 Postgres 11 的 Materialized View,進行刷新(REFRESH MATERILAIZED VIEW)會把資料整組清掉,重新查詢並把結果拉回來快取。
現在,要透過 Foreign Data Wrapper,搭配 Materialized View,從 Greenplum(基於 PostgreSQL 的延伸專案)把查詢拉到 Postgres 裡面。
首先,在 GPDB 上面建立一個表格一下:這裡用一份有一點意思的資料好了:
bash-4.2# su - gpadmin [gpadmin@mdw ~]$ source /usr/local/greenplum-db/greenplum_path.sh [gpadmin@mdw ~]$ psql << EOF CREATE TABLE yellow_tripdata ( VendorID integer, --1 or 2 tpep_pickup_datetime timestamp, --don't contain timezone info tpep_dropoff_datetime timestamp, --don't contain timezone info passenger_count integer, trip_distance numeric, RatecodeID integer, --1~6 store_and_fwd_flag boolean, PULocationID integer, DOLocationID integer, payment_type integer, --1~6 fare_amount numeric, extra numeric, mta_tax numeric, tip_amount numeric, tolls_amount numeric, improvement_surcharge numeric, total_amount numeric ) WITH (appendonly=true, compresstype=zlib, compresslevel=5) DISTRIBUTED BY (tpep_pickup_datetime, tpep_dropoff_datetime); EOF [gpadmin@mdw ~]$ for datamonth in {01..12} do # Download the data curl https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2018-${datamonth}.csv --output yellow_tripdata_2018-${datamonth}.csv # Remove the second empty line sed -i 1,2d yellow_tripdata_2018-${datamonth}.csv # Replace newline character CL-RF with RF only sed 's/\r//' -i yellow_tripdata_2018-${datamonth}.csv # Import the data psql -c "COPY yellow_tripdata FROM '$HOME/yellow_tripdata_2018-${datamonth}.csv' CSV HEADER;" # Remove the csv file to save the space rm -f yellow_tripdata_2018-${datamonth}.csv done 。。。中間輸出訊息略過。。。 [gpadmin@mdw ~]$ psql -c "select pg_size_pretty(pg_relation_size('yellow_tripdata'))" pg_size_pretty ---------------- 3241 MB (1 row) [gpadmin@mdw ~]$ psql -c 'analyze yellow_tripdata' [gpadmin@mdw ~]$ psql -c "select count(1) from yellow_tripdata where (tpep_dropoff_datetime - tpep_pickup_datetime ) > '1 day';" count ------- 65 (1 row) [gpadmin@mdw ~]$
以上產生了 的資料。
接著,先在 PGSQL 上面建立外部表,指到 GPDB。這邊方便起見,直接用 gpadmin 帳號登入:
postgres=# CREATE EXTENSION postgres_fdw; postgres=# CREATE SERVER gp_db FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'mdw', port '5432', dbname 'postgres'); postgres=# CREATE USER MAPPING FOR PUBLIC SERVER gp_db OPTIONS (user 'gpadmin', password '123'); postgres=# CREATE FOREIGN TABLE yellow_tripdata ( VendorID integer, --1 or 2 tpep_pickup_datetime timestamp, --don't contain timezone info tpep_dropoff_datetime timestamp, --don't contain timezone info passenger_count integer, trip_distance numeric, RatecodeID integer, --1~6 store_and_fwd_flag boolean, PULocationID integer, DOLocationID integer, payment_type integer, --1~6 fare_amount numeric, extra numeric, mta_tax numeric, tip_amount numeric, tolls_amount numeric, improvement_surcharge numeric, total_amount numeric ) SERVER gp_db OPTIONS (schema_name 'public', table_name 'yellow_tripdata'); postgres=#
測通一下。注意這邊要設定 Isolation Level
postgres=# BEGIN; BEGIN postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SET postgres=# SELECT * FROM yellow_tripdata LIMIT 2; vendorid | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount ----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+-------------- 1 | 01-JAN-18 00:44:55 | 01-JAN-18 01:03:05 | 1 | 2.70 | 1 | f | 239 | 140 | 2 | 14 | 0.5 | 0.5 | 0 | 0 | 0.3 | 15.3 1 | 01-JAN-18 00:20:22 | 01-JAN-18 00:52:51 | 1 | 10.20 | 1 | f | 140 | 257 | 2 | 33.5 | 0.5 | 0.5 | 0 | 0 | 0.3 | 34.8 (2 rows) postgres=# COMMIT; COMMIT postgres=#
測通之後,就直接建立 Materialized View 了。由於 GPDB 對 postgres_fdw 僅接受 SERIALIZABLE 的模式,因此這邊以一個 Transaction 進行
postgres=# BEGIN; BEGIN postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SET postgres=# CREATE MATERIALIZED VIEW mat_view (tpep_pickup_datetime, tpep_dropoff_datetime, total_amount) AS SELECT tpep_pickup_datetime, tpep_dropoff_datetime, total_amount FROM yellow_tripdata WHERE (tpep_dropoff_datetime - tpep_pickup_datetime) > '1 day'; SELECT 65 postgres=# COMMIT; COMMIT postgres=#
查看看~
postgres=# SELECT* FROM mat_view LIMIT 3; tpep_pickup_datetime | tpep_dropoff_datetime | total_amount ----------------------+-----------------------+-------------- 11-JAN-18 16:04:51 | 15-JAN-18 09:46:55 | 2694.8 19-JAN-18 09:44:15 | 22-JAN-18 14:39:19 | 2310.3 26-JAN-18 17:55:18 | 29-JAN-18 10:17:31 | 0 (3 rows) postgres=#
最後刷新看看~不過這邊來源端沒有異動,刷新後也是一樣內容,但至少檢驗是否有作用
postgres=# BEGIN; BEGIN postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SET postgres=# REFRESH MATERIALIZED VIEW mat_view; REFRESH MATERIALIZED VIEW postgres=# COMMIT; COMMIT postgres=#
完成~
參考資料
沒有留言:
張貼留言