SPARK SQL Equivalent of Qualify + Row_number statements -
does know best way apache spark sql achieve same results standard sql qualify() + rnk or row_number statements?
for example:
- i have spark dataframe called statement_data 12 monthly records each 100 unique account_numbers, therefore 1200 records in total
- each monthly record has field called "statement_date" can used determining recent record
i want final result new spark dataframe 3 recent records (as determined statement_date descending) each of 100 unique account_numbers, therefore 300 final records in total.
in standard teradata sql, can following:
select * statement_data qualify row_number () over(partition acct_id order statement_date desc) <= 3
apache spark sql not have standalone qualify function i'm aware of, maybe i'm screwing syntax or can't find documentation qualify exists.
it fine if need in 2 steps long 2 steps are:
- a select query or alternative method assign rank/row numbering each account_number's records
- a select query i'm selecting records rank <= 3 (i.e. choose 1st, 2nd, , 3rd recent records).
edit 1 - 7/23 2:09pm: initial solution provided zero323 not working me in spark 1.4.1 spark sql 1.4.1 dependency installed.
edit 2 - 7/23 3:24pm: turns out error related using sql context objects query instead of hive context. able run below solution correctly after adding following code create , use hive context:
final javasparkcontext sc2; final hivecontext hc2; dataframe df; hc2 = testhive$.module$; sc2 = new javasparkcontext(hc2.sparkcontext()); .... // initial spark/sql contexts set dataframes sparkconf conf = new sparkconf().setappname("statement test"); ... dataframe stmtsummary = hc2.sql("select * (select acct_id, stmt_end_dt, stmt_curr_bal, row_number() on (partition acct_id order stmt_curr_bal desc) rank_num stmt_data) tmp rank_num <= 3");
there no qualify
(it useful check parser source) can use subquery this:
select * ( select *, row_number() on ( partition acct_id order statement_date desc ) rank df ) tmp rank <= 3
Comments
Post a Comment