MathJax

MathJax-2

MathJax-3

Google Code Prettify

置頂入手筆記

EnterproseDB Quickstart — 快速入門筆記

由於考慮採用 EnterpriseDB 或是直接用 PostgreSQL 的人,通常需要一些入手的資料。這邊紀錄便提供相關快速上手的簡單筆記 ~ 這篇筆記以 資料庫安裝完畢後的快速使用 為目標,基本紀錄登入使用的範例:

2017年7月25日 星期二

在 PostgreSQL 產生累積次數分配圖表的 SQL 語法

一般敘述性統計,常見到兩種圖,次數分佈圖和累計次數分佈圖。繼上次的次數分佈的 SQL 筆記之後,這次換成累計次數分佈(Cumulative Frequency Distribution Chart)了。

要達到目的,只要基於先前筆記的次數分配表的 SQL,增加加總和求比例進行計算就能得到了~

下面範例其實是從 Oracle 借來的指令(來源在這裡),這指令中的語法,原生的 PostgreSQL 都有支援!
with freq_dist as
(select intvl, count(*) freq
  from (select width_bucket(sal,
                (select min(sal) from emp),
                (select max(sal)+1 from emp), 10) intvl
          from emp)
group by intvl
order by intvl),
--
rel_freq_dist as
(select intvl, freq::float/(select count(*) from emp) rel_freq
  from freq_dist)
--
select intvl,
      sum(rel_freq) over (order by intvl
      rows between unbounded preceding and current row) cum_freq
 from rel_freq_dist;
 intvl |     cum_freq      
-------+-------------------
     1 | 0.214285714285714
     2 | 0.571428571428571
     4 | 0.642857142857143
     5 | 0.714285714285714
     6 | 0.928571428571428
    10 |                 1
(6 rows)

另外,調整上面範例,使用 generate_series() 也可以達成一樣的功能,以下做成一個 psql Script
\set tblname  emp
\set data_col sal
\set begin_val 0
\set end_val   10000
\set specify_interval 700
\set barscaling 30

WITH intvl_series AS(
    SELECT numrange( coord
           , lead(coord) OVER (ORDER BY coord)
           ) AS bucket
    FROM generate_series(:begin_val, :end_val, 
                         :specify_interval) AS coord
)
   , freq_dist AS ( 
   SELECT bucket, count(:data_col) AS freq 
   FROM :tblname, intvl_series 
   WHERE :data_col <@ bucket 
   GROUP BY bucket 
   ORDER BY bucket 
) 
   , rel_freq_dist AS ( 
   SELECT bucket AS intvl 
        , freq::float/(SELECT sum(freq) 
                       FROM freq_dist) rel_freq 
   FROM freq_dist 
) 
   , cum_freq_dist AS ( 
   SELECT intvl, 
          sum(rel_freq) over (order by intvl 
                              rows between unbounded preceding and current row) cum_freq 
   from rel_freq_dist 
)
 SELECT intvl, cum_freq, 
        repeat('■', 
                ( cum_freq::float 
                  / max(cum_freq) OVER() 
                  * :barscaling 
                )::int 
        ) AS bar 
   FROM cum_freq_dist;
    intvl    |      cum_freq      |              bar               
-------------+--------------------+--------------------------------
 [700,1400)  | 0.0714285714285714 | ■■
 [1400,2100) |  0.428571428571429 | ■■■■■■■■■■■■■
 [2100,2800) |  0.571428571428571 | ■■■■■■■■■■■■■■■■■
 [3500,4200) |  0.642857142857143 | ■■■■■■■■■■■■■■■■■■■
 [4200,4900) |  0.928571428571428 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 [7700,8400) |                  1 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
(6 rows)

和前一篇筆記一樣,一樣可惜的是,沒有資料的區間沒辦法用零表示. . .

最後,附上一個不畫圖的版本,在前一篇也如法炮製即可~
\set tblname  emp
\set data_col sal
\set begin_val 0
\set end_val   10000
\set specify_interval 700

WITH intvl_series AS(
    SELECT numrange( coord
           , lead(coord) OVER (ORDER BY coord)
           ) AS bucket
    FROM generate_series(:begin_val, :end_val, 
                         :specify_interval) AS coord
)
   , freq_dist AS ( 
   SELECT bucket, count(:data_col) AS freq 
   FROM :tblname, intvl_series 
   WHERE :data_col <@ bucket 
   GROUP BY bucket 
   ORDER BY bucket 
) 
   , rel_freq_dist AS ( 
   SELECT bucket AS intvl 
        , freq::float/(SELECT sum(freq) 
                       FROM freq_dist) rel_freq 
   FROM freq_dist 
)  
   SELECT intvl, 
          sum(rel_freq) over (order by intvl 
                              rows between unbounded preceding and current row) cum_freq 
   from rel_freq_dist 
;
    intvl    |      cum_freq      
-------------+--------------------
 [700,1400)  | 0.0714285714285714
 [1400,2100) |  0.428571428571429
 [2100,2800) |  0.571428571428571
 [3500,4200) |  0.642857142857143
 [4200,4900) |  0.928571428571428
 [7700,8400) |                  1
(6 rows)


參考資料

其他相關資訊

沒有留言:

張貼留言