Thursday, 23 February 2017

Calculating Cumulative totals/Running totals in Hive

Hi All,

Many of us might have come across a requirement to implement running totals. Below are a few of the implementations.

Consider a bank maintains account and balance details as below.










Running total should give the sum of balance from beginning to current month.

SELECT acc_no, acc_type,Yr_Mnth, bal, SUM(bal) OVER (PARTITION BY (acc_no,acc_type) ORDER BY Yr_Mnth)  as running_total FROM bnk_bal;










If rolling has to happen to a fixed number of rows, we can use RANGE BETWEEN.
ex: Below query will give the running total for every 3 months.

SELECT acc_no, acc_type,Yr_Mnth, bal, SUM(bal) OVER (PARTITION BY (acc_no,acc_type) ORDER BY Yr_Mnth RANGE BETWEEN PRECEDING 2 ROWS AND CURRENT ROW)  as running_total FROM bnk_bal;

The Mindset Behind Reliable Data Systems I’ve been in data engineering long enough to see the stack change many times over. Tools come and g...