Thursday 23 February 2017

Calculating Cumulative totals/Running totals in Hive

Hi All,

Many of us might have come across a requirement to implement running totals. Below are a few of the implementations.

Consider a bank maintains account and balance details as below.










Running total should give the sum of balance from beginning to current month.

SELECT acc_no, acc_type,Yr_Mnth, bal, SUM(bal) OVER (PARTITION BY (acc_no,acc_type) ORDER BY Yr_Mnth)  as running_total FROM bnk_bal;










If rolling has to happen to a fixed number of rows, we can use RANGE BETWEEN.
ex: Below query will give the running total for every 3 months.

SELECT acc_no, acc_type,Yr_Mnth, bal, SUM(bal) OVER (PARTITION BY (acc_no,acc_type) ORDER BY Yr_Mnth RANGE BETWEEN PRECEDING 2 ROWS AND CURRENT ROW)  as running_total FROM bnk_bal;

 A good reference for Shell scripting  https://linuxcommand.org/lc3_writing_shell_scripts.php