hadoop - RANK OVER function in Hive
I'm trying to run a query in Hive that returns the top 10 URLs that appear most often in the adimpression table.
SELECT ranked_mytable.url,
       ranked_mytable.cnt
FROM
    ( SELECT iq.url,
             iq.cnt,
             rank() OVER (PARTITION BY iq.url ORDER BY iq.cnt DESC) rnk
      FROM
          ( SELECT url,
                   count(*) cnt
            FROM store.adimpression ai
            INNER JOIN zuppa.adgroupcreativesubscription agcs
                ON agcs.id = ai.adgroupcreativesubscriptionid
            INNER JOIN zuppa.adgroup ag
                ON ag.id = agcs.adgroupid
            WHERE ai.datehour >= '2014-05-15 00:00:00'
              AND ag.siteid = 1240
            GROUP BY url ) iq ) ranked_mytable
WHERE ranked_mytable.rnk <= 10
ORDER BY ranked_mytable.url,
         ranked_mytable.rnk DESC
;
Unfortunately, I get an error message stating:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference 'rnk'
I've tried to debug it, and everything up to the ranked_mytable sub-query runs smoothly. I've also tried commenting out the WHERE ranked_mytable.rnk <= 10 clause, but the error message keeps appearing.
Hive is unable to order by a column that is not in the "output" of the SELECT statement. To fix it, include that column in the selected columns:
SELECT ranked_mytable.url,
       ranked_mytable.cnt,
       ranked_mytable.rnk
FROM
    ( SELECT iq.url,
             iq.cnt,
             rank() OVER (PARTITION BY iq.url ORDER BY iq.cnt DESC) rnk
      FROM
          ( SELECT url,
                   count(*) cnt
            FROM store.adimpression ai
            INNER JOIN zuppa.adgroupcreativesubscription agcs
                ON agcs.id = ai.adgroupcreativesubscriptionid
            INNER JOIN zuppa.adgroup ag
                ON ag.id = agcs.adgroupid
            WHERE ai.datehour >= '2014-05-15 00:00:00'
              AND ag.siteid = 1240
            GROUP BY url ) iq ) ranked_mytable
WHERE ranked_mytable.rnk <= 10
ORDER BY ranked_mytable.url,
         ranked_mytable.rnk DESC
;
If you don't want the 'rnk' column in your final output, I expect you could wrap the whole thing in another inner query and select out just the 'url' and 'cnt' fields.
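A rough sketch of that wrapping, in case it helps. It reuses the query from the answer as the inner query and selects only 'url' and 'cnt' on the outside. The alias final_output is a made-up name for illustration, and I have not run this against the tables in the question, so treat it as a starting point rather than a tested solution.

```sql
-- Sketch only: final_output is a hypothetical alias, not from the original post.
SELECT final_output.url,
       final_output.cnt
FROM
    ( -- Same query as in the answer: rnk stays in the select list here
      -- so that the ORDER BY clause can reference it.
      SELECT ranked_mytable.url,
             ranked_mytable.cnt,
             ranked_mytable.rnk
      FROM
          ( SELECT iq.url,
                   iq.cnt,
                   rank() OVER (PARTITION BY iq.url ORDER BY iq.cnt DESC) rnk
            FROM
                ( SELECT url,
                         count(*) cnt
                  FROM store.adimpression ai
                  INNER JOIN zuppa.adgroupcreativesubscription agcs
                      ON agcs.id = ai.adgroupcreativesubscriptionid
                  INNER JOIN zuppa.adgroup ag
                      ON ag.id = agcs.adgroupid
                  WHERE ai.datehour >= '2014-05-15 00:00:00'
                    AND ag.siteid = 1240
                  GROUP BY url ) iq ) ranked_mytable
      WHERE ranked_mytable.rnk <= 10
      ORDER BY ranked_mytable.url,
               ranked_mytable.rnk DESC ) final_output
;
```

One caveat: SQL engines generally do not promise that an ORDER BY inside a sub-query survives into the outer result, so the row order of this version may depend on your Hive version and settings. If the order matters, check the output before relying on it.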