要達到目的,只要基於先前筆記的次數分配表的 SQL,增加加總和求比例進行計算就能得到了~
下面範例其實是從 Oracle 借來的指令(來源在這裡),這指令中的語法,原生的 PostgreSQL 都有支援!
with freq_dist as (select intvl, count(*) freq from (select width_bucket(sal, (select min(sal) from emp), (select max(sal)+1 from emp), 10) intvl from emp) group by intvl order by intvl), -- rel_freq_dist as (select intvl, freq::float/(select count(*) from emp) rel_freq from freq_dist) -- select intvl, sum(rel_freq) over (order by intvl rows between unbounded preceding and current row) cum_freq from rel_freq_dist;
intvl | cum_freq -------+------------------- 1 | 0.214285714285714 2 | 0.571428571428571 4 | 0.642857142857143 5 | 0.714285714285714 6 | 0.928571428571428 10 | 1 (6 rows)
另外,調整上面範例,使用 generate_series() 也可以達成一樣的功能,以下做成一個 psql Script
\set tblname emp \set data_col sal \set begin_val 0 \set end_val 10000 \set specify_interval 700 \set barscaling 30 WITH intvl_series AS( SELECT numrange( coord , lead(coord) OVER (ORDER BY coord) ) AS bucket FROM generate_series(:begin_val, :end_val, :specify_interval) AS coord ) , freq_dist AS ( SELECT bucket, count(:data_col) AS freq FROM :tblname, intvl_series WHERE :data_col <@ bucket GROUP BY bucket ORDER BY bucket ) , rel_freq_dist AS ( SELECT bucket AS intvl , freq::float/(SELECT sum(freq) FROM freq_dist) rel_freq FROM freq_dist ) , cum_freq_dist AS ( SELECT intvl, sum(rel_freq) over (order by intvl rows between unbounded preceding and current row) cum_freq from rel_freq_dist ) SELECT intvl, cum_freq, repeat('■', ( cum_freq::float / max(cum_freq) OVER() * :barscaling )::int ) AS bar FROM cum_freq_dist;
intvl | cum_freq | bar -------------+--------------------+-------------------------------- [700,1400) | 0.0714285714285714 | ■■ [1400,2100) | 0.428571428571429 | ■■■■■■■■■■■■■ [2100,2800) | 0.571428571428571 | ■■■■■■■■■■■■■■■■■ [3500,4200) | 0.642857142857143 | ■■■■■■■■■■■■■■■■■■■ [4200,4900) | 0.928571428571428 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ [7700,8400) | 1 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ (6 rows)
和前一篇筆記一樣,一樣可惜的是,沒有資料的區間沒辦法用零表示. . .
最後,附上一個不畫圖的版本,在前一篇也如法炮製即可~
\set tblname emp \set data_col sal \set begin_val 0 \set end_val 10000 \set specify_interval 700 WITH intvl_series AS( SELECT numrange( coord , lead(coord) OVER (ORDER BY coord) ) AS bucket FROM generate_series(:begin_val, :end_val, :specify_interval) AS coord ) , freq_dist AS ( SELECT bucket, count(:data_col) AS freq FROM :tblname, intvl_series WHERE :data_col <@ bucket GROUP BY bucket ORDER BY bucket ) , rel_freq_dist AS ( SELECT bucket AS intvl , freq::float/(SELECT sum(freq) FROM freq_dist) rel_freq FROM freq_dist ) SELECT intvl, sum(rel_freq) over (order by intvl rows between unbounded preceding and current row) cum_freq from rel_freq_dist ;
intvl | cum_freq -------------+-------------------- [700,1400) | 0.0714285714285714 [1400,2100) | 0.428571428571429 [2100,2800) | 0.571428571428571 [3500,4200) | 0.642857142857143 [4200,4900) | 0.928571428571428 [7700,8400) | 1 (6 rows)
參考資料
其他相關資訊
沒有留言:
張貼留言