MathJax

MathJax-2

MathJax-3

Google Code Prettify

置頂入手筆記

EnterproseDB Quickstart — 快速入門筆記

由於考慮採用 EnterpriseDB 或是直接用 PostgreSQL 的人,通常需要一些入手的資料。這邊紀錄便提供相關快速上手的簡單筆記 ~ 這篇筆記以 資料庫安裝完畢後的快速使用 為目標,基本紀錄登入使用的範例:

2019年3月24日 星期日

從 Postgres 11.2 對 Greenplum 5.17.0 作 Materialized View

好久以前有紀錄從 Postgres 開 postgres_fdw 連線到 Greenplum 的筆記,前一陣子又看到 Materialized Views and Foreign Data Wrappers | EnterpriseDB 這篇文章,拿來試著對 Greenplum 試試看~

在 Postgres 11 的 Materialized View,進行刷新(REFRESH MATERILAIZED VIEW)會把資料整組清掉,重新查詢並把結果拉回來快取。

現在,要透過 Foreign Data Wrapper,搭配 Materialized View,從 Greenplum(基於 PostgreSQL 的延伸專案)把查詢拉到 Postgres 裡面。

首先,在 GPDB 上面建立一個表格一下:這裡用一份有一點意思的資料好了:

這邊用紐約市的計程車資料資料位置),雖然有人整理好一組 Script 方便匯入,這邊簡便起見,用一份我這邊稍微處理過的方式來匯入。
這邊只有處理其中一份表格:
bash-4.2# su - gpadmin
[gpadmin@mdw ~]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@mdw ~]$ psql << EOF
CREATE TABLE yellow_tripdata (
 VendorID               integer,   --1 or 2
 tpep_pickup_datetime   timestamp, --don't contain timezone info
 tpep_dropoff_datetime  timestamp, --don't contain timezone info
 passenger_count        integer,
 trip_distance          numeric,
 RatecodeID             integer,   --1~6
 store_and_fwd_flag     boolean,
 PULocationID           integer,
 DOLocationID           integer,
 payment_type           integer,   --1~6
 fare_amount            numeric,
 extra                  numeric,
 mta_tax                numeric,
 tip_amount             numeric,
 tolls_amount           numeric,
 improvement_surcharge  numeric,
 total_amount           numeric
) WITH (appendonly=true, compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (tpep_pickup_datetime, tpep_dropoff_datetime);
EOF
[gpadmin@mdw ~]$ for datamonth in {01..12}
do
# Download the data
curl https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2018-${datamonth}.csv --output yellow_tripdata_2018-${datamonth}.csv
# Remove the second empty line
sed -i 1,2d yellow_tripdata_2018-${datamonth}.csv
# Replace newline character CL-RF with RF only
sed 's/\r//' -i yellow_tripdata_2018-${datamonth}.csv
# Import the data
psql -c "COPY yellow_tripdata FROM '$HOME/yellow_tripdata_2018-${datamonth}.csv' CSV HEADER;"
# Remove the csv file to save the space
rm -f yellow_tripdata_2018-${datamonth}.csv
done
。。。中間輸出訊息略過。。。
[gpadmin@mdw ~]$ psql -c "select pg_size_pretty(pg_relation_size('yellow_tripdata'))"
 pg_size_pretty 
----------------
 3241 MB
(1 row)

[gpadmin@mdw ~]$ psql -c 'analyze yellow_tripdata'
[gpadmin@mdw ~]$ psql -c "select count(1) from yellow_tripdata where (tpep_dropoff_datetime - tpep_pickup_datetime ) > '1 day';"
 count 
-------
    65
(1 row)

[gpadmin@mdw ~]$ 

以上產生了 的資料。

接著,先在 PGSQL 上面建立外部表,指到 GPDB。這邊方便起見,直接用 gpadmin 帳號登入:
postgres=# CREATE EXTENSION postgres_fdw;
postgres=# CREATE SERVER gp_db FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'mdw', port '5432', dbname 'postgres');
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER gp_db 
OPTIONS (user 'gpadmin', password '123');
postgres=# CREATE FOREIGN TABLE yellow_tripdata (
 VendorID               integer,   --1 or 2
 tpep_pickup_datetime   timestamp, --don't contain timezone info
 tpep_dropoff_datetime  timestamp, --don't contain timezone info
 passenger_count        integer,
 trip_distance          numeric,
 RatecodeID             integer,   --1~6
 store_and_fwd_flag     boolean,
 PULocationID           integer,
 DOLocationID           integer,
 payment_type           integer,   --1~6
 fare_amount            numeric,
 extra                  numeric,
 mta_tax                numeric,
 tip_amount             numeric,
 tolls_amount           numeric,
 improvement_surcharge  numeric,
 total_amount           numeric
) SERVER gp_db
OPTIONS (schema_name 'public', table_name 'yellow_tripdata');
postgres=# 

測通一下。注意這邊要設定 Isolation Level
postgres=# BEGIN;
BEGIN
postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET
postgres=# SELECT * FROM yellow_tripdata LIMIT 2;
 vendorid | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | ratecodeid | store_and_fwd_flag | pulocationid | dolocationid | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount 
----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------
        1 | 01-JAN-18 00:44:55   | 01-JAN-18 01:03:05    |               1 |          2.70 |          1 | f                  |          239 |          140 |            2 |          14 |   0.5 |     0.5 |          0 |            0 |                   0.3 |         15.3
        1 | 01-JAN-18 00:20:22   | 01-JAN-18 00:52:51    |               1 |         10.20 |          1 | f                  |          140 |          257 |            2 |        33.5 |   0.5 |     0.5 |          0 |            0 |                   0.3 |         34.8
(2 rows)

postgres=# COMMIT;
COMMIT
postgres=# 

測通之後,就直接建立 Materialized View 了。由於 GPDB 對 postgres_fdw 僅接受 SERIALIZABLE 的模式,因此這邊以一個 Transaction 進行
postgres=# BEGIN;
BEGIN
postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET
postgres=# CREATE MATERIALIZED VIEW mat_view (tpep_pickup_datetime, tpep_dropoff_datetime, total_amount) AS
        SELECT tpep_pickup_datetime, tpep_dropoff_datetime, total_amount 
        FROM yellow_tripdata 
        WHERE (tpep_dropoff_datetime - tpep_pickup_datetime) > '1 day';
SELECT 65
postgres=# COMMIT;
COMMIT
postgres=# 

查看看~
postgres=# SELECT* FROM mat_view LIMIT 3;
 tpep_pickup_datetime | tpep_dropoff_datetime | total_amount 
----------------------+-----------------------+--------------
 11-JAN-18 16:04:51   | 15-JAN-18 09:46:55    |       2694.8
 19-JAN-18 09:44:15   | 22-JAN-18 14:39:19    |       2310.3
 26-JAN-18 17:55:18   | 29-JAN-18 10:17:31    |            0
(3 rows)

postgres=# 

最後刷新看看~不過這邊來源端沒有異動,刷新後也是一樣內容,但至少檢驗是否有作用
postgres=# BEGIN;
BEGIN
postgres=# SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET
postgres=# REFRESH MATERIALIZED VIEW mat_view;
REFRESH MATERIALIZED VIEW
postgres=# COMMIT;
COMMIT
postgres=# 


完成~


參考資料

沒有留言:

張貼留言