hive - How do I get the latest record? -
this table:
create table test ( id string, name string, age string, modified string)
and data:
id name age modifed 1 10 2011-11-11 11:11:11 1 11 2012-11-11 12:00:00 2 b 20 2012-12-10 10:11:12 2 b 20 2012-12-10 10:11:12 2 b 20 2012-12-12 10:11:12 2 b 20 2012-12-15 10:11:12
i want latest record (include every columns id,name,age,modified) group id,as data above,the correct result is:
1 11 2012-11-11 12:00:00 2 b 20 2012-12-15 10:11:12
i using below query in hive, working fine in sql http://sqlfiddle.com/#!2/bfbd5/42 not working fine in hive
select * test (id, modified) in(select id, max(modified) test group id)
i using 0.13 version of hive.
hive allows 1 column in in subquery. try left semijoin:
select * test left semi join (select id, max(modified) modified test) b on (a.modified = b.modified , a.id=b.id);
it sure seems right answer using straight forward query though. select max of 2 columns , sure group columns don't have aggregate functions.
select id , name , max(age) age , max(modified) modified test group id, name;
Comments
Post a Comment