postgresql generic example


Example

Preparing data:

create table wf_example(i int, t text,ts timestamptz,b boolean);
insert into wf_example select 1,'a','1970.01.01',true;
insert into wf_example select 1,'a','1970.01.01',false;
insert into wf_example select 1,'b','1970.01.01',false;
insert into wf_example select 2,'b','1970.01.01',false;
insert into wf_example select 3,'b','1970.01.01',false;
insert into wf_example select 4,'b','1970.02.01',false;
insert into wf_example select 5,'b','1970.03.01',false;
insert into wf_example select 2,'c','1970.03.01',true;

Running:

select *
  , dense_rank() over (order by i) dist_by_i 
  , lag(t) over () prev_t 
  , nth_value(i, 6) over () nth
  , count(true) over (partition by i) num_by_i 
  , count(true) over () num_all
  , ntile(3) over() ntile
from wf_example
;

Result:

 i | t |           ts           | b | dist_by_i | prev_t | nth | num_by_i | num_all | ntile
---+---+------------------------+---+-----------+--------+-----+----------+---------+-------
 1 | a | 1970-01-01 00:00:00+01 | f |         1 |        |   3 |        3 |       8 |     1
 1 | a | 1970-01-01 00:00:00+01 | t |         1 | a      |   3 |        3 |       8 |     1
 1 | b | 1970-01-01 00:00:00+01 | f |         1 | a      |   3 |        3 |       8 |     1
 2 | c | 1970-03-01 00:00:00+01 | t |         2 | b      |   3 |        2 |       8 |     2
 2 | b | 1970-01-01 00:00:00+01 | f |         2 | c      |   3 |        2 |       8 |     2
 3 | b | 1970-01-01 00:00:00+01 | f |         3 | b      |   3 |        1 |       8 |     2
 4 | b | 1970-02-01 00:00:00+01 | f |         4 | b      |   3 |        1 |       8 |     3
 5 | b | 1970-03-01 00:00:00+01 | f |         5 | b      |   3 |        1 |       8 |     3
(8 rows)

Explanation:

dist_by_i: dense_rank() over (order by i) is like a row_number per distinct values. Can be used for the number of distinct values of i (count(DISTINCT i) wold not work). Just use the maximum value.

prev_t: lag(t) over () is a previous value of t over the whole window. mind that it is null for the first row.

nth: nth_value(i, 6) over () is the value of sixth rows column i over the whole window

num_by_i: count(true) over (partition by i) is an amount of rows for each value of i

num_all: count(true) over () is an amount of rows over a whole window

ntile: ntile(3) over() splits the whole window to 3 (as much as possible) equal in quantity parts