hive - How do I get the latest record? -


this table:

create table test ( id string, name string, age string, modified string) 

and data:

id    name   age  modifed 1          10   2011-11-11 11:11:11 1          11   2012-11-11 12:00:00 2     b      20   2012-12-10 10:11:12 2     b      20   2012-12-10 10:11:12 2     b      20   2012-12-12 10:11:12 2     b      20   2012-12-15 10:11:12 

i want latest record (include every columns id,name,age,modified) group id,as data above,the correct result is:

1          11   2012-11-11 12:00:00 2     b      20   2012-12-15 10:11:12 

i using below query in hive, working fine in sql http://sqlfiddle.com/#!2/bfbd5/42 not working fine in hive

select * test (id, modified) in(select id, max(modified) test group id) 

i using 0.13 version of hive.

hive allows 1 column in in subquery. try left semijoin:

select * test  left semi join  (select id, max(modified) modified test) b  on (a.modified = b.modified , a.id=b.id); 

it sure seems right answer using straight forward query though. select max of 2 columns , sure group columns don't have aggregate functions.

select id , name , max(age) age , max(modified) modified test group id, name; 

Comments

Popular posts from this blog

javascript - Karma not able to start PhantomJS on Windows - Error: spawn UNKNOWN -

Nuget pack csproj using nuspec -

c# - Display ASPX Popup control in RowDeleteing Event (ASPX Gridview) -