operation | txn_qty | running_qty | txn_amt | cost_of_purchase | sell_ratio | NET_COST |
---|---|---|---|---|---|---|
buy | 250 | 250 | 5000 | 5000 | 0 | 0 |
sell | 100 | 150 | 3000 | 0 | 0.4 | 0 |
buy | 150 | 300 | 1500 | 1500 | 0 | 0 |
sell | 225 | 75 | 4000 | 0 | 0.75 | 0 |
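Above is a mock transaction table. My end goal is to derive the average cost at each row, and the most accurate way to get it is NET_COST / running_qty.
To calculate NET_COST, I need a conditional running sum: on a buy, add cost_of_purchase to the previous NET_COST; on a sell, subtract the previous NET_COST multiplied by sell_ratio.
Below is the expected output for NET_COST, which can then be used to derive AVERAGE_COST: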
operation | txn_qty | running_qty | txn_amt | cost_of_purchase | sell_ratio | NET_COST | AVERAGE_COST |
---|---|---|---|---|---|---|---|
buy | 250 | 250 | 5000 | 5000 | 0 | 5000 | 20 |
sell | 100 | 150 | 3000 | 0 | 0.4 | 3000 | 20 |
buy | 150 | 300 | 1500 | 1500 | 0 | 4500 | 15 |
sell | 225 | 75 | 4000 | 0 | 0.75 | 1125 | 15 |
For further clarity, here is what happens in each NET_COST cell:
NET_COST |
---|
0 + 5000 |
5000 - (5000 * 0.4) |
3000 + 1500 |
4500 - (4500 * 0.75) |
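
The per-cell arithmetic above can be checked procedurally before translating it to SQL. This is a minimal Python sketch, not part of the original question: the function name `running_net_cost` and the tuple layout `(operation, running_qty, cost_of_purchase, sell_ratio)` are assumptions made for illustration.

```python
def running_net_cost(rows):
    """Apply the conditional running sum row by row.

    rows: iterable of (operation, running_qty, cost_of_purchase, sell_ratio),
          already sorted in transaction order (hypothetical layout).
    Returns a list of (net_cost, average_cost) tuples, one per row.
    """
    net_cost = 0.0
    results = []
    for operation, running_qty, cost_of_purchase, sell_ratio in rows:
        if operation == "buy":
            # A buy adds its purchase cost to the previous net cost.
            net_cost += cost_of_purchase
        else:
            # A sell removes sell_ratio of the previous net cost.
            net_cost -= net_cost * sell_ratio
        # Average cost is undefined once the position is fully sold.
        average_cost = net_cost / running_qty if running_qty else None
        results.append((net_cost, average_cost))
    return results


# The four transactions from the table above.
rows = [
    ("buy", 250, 5000, 0),
    ("sell", 150, 0, 0.4),
    ("buy", 300, 1500, 0),
    ("sell", 75, 0, 0.75),
]
result = running_net_cost(rows)
```

Each row depends on the previous row's result, which is why a plain windowed `SUM` is not enough and the SQL has to carry state from row to row.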
Can this be achieved in SQL (Impala / Hive)?
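If they support recursive queries:

    -- Assumes data has an id column numbering the rows 1, 2, 3, ... in transaction order.
    with cte(id, operation, cost_of_purchase, sell_ratio, net_cost) as (
      -- Anchor: the first transaction starts the running net cost.
      select d.id, d.operation, d.cost_of_purchase, d.sell_ratio,
             case d.operation when 'buy' then d.cost_of_purchase
                              else 0 end
      from data d
      where d.id = 1
      union all
      -- Recursive step: derive each row's net cost from the previous row's.
      select d.id, d.operation, d.cost_of_purchase, d.sell_ratio,
             case d.operation when 'buy' then c.net_cost + d.cost_of_purchase
                              else c.net_cost - c.net_cost * d.sell_ratio
             end
      from cte c
      join data d on c.id + 1 = d.id
    )
    select * from cte;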
If they support the MODEL clause:

    -- Assumes the same consecutive id column as the ordering dimension.
    select * from data
    model
      dimension by (id)
      measures (operation as operation, cost_of_purchase as cost_of_purchase,
                sell_ratio as sell_ratio, 0 as net_cost)
      rules
      (
        -- Evaluate cells in id order so each one sees the previous row's result.
        net_cost[any] order by id = case operation[cv()] when 'buy'
            then nvl(net_cost[cv()-1], 0) + cost_of_purchase[cv()]
            else nvl(net_cost[cv()-1], 0) - nvl(net_cost[cv()-1], 0) * sell_ratio[cv()]
          end
      );