r - Non-uniform `shift`ing: leads/lags with multiple within-period&ID observations -


the use of shift well-documented way lead , lag values of variables, i'm struggling extending logic situation have multiple observations within each period or ids.

example:

dt<-data.table(id=rep(1:2,c(6,3)),                pd=rep(rep(1:3,2),c(1:3,1,1,1)),                firm=c(rep(c("01","02"),c(4,2)),                       c("04","05","06"))) 

the basic approach wrong:

dt[,paste0("firm_",c("lag","lead")):=  .(shift(firm),shift(firm,type="lead")),by=id] > dt    id pd firm firm_lag firm_lead 1:  1  1   01       na        01 2:  1  2   01       01        01 3:  1  2   01       01        01 4:  1  3   01       01        02 #all lags should 01 id 1 in pd 3 5:  1  3   02       01        02 #all leads should na id 1 in pd 3 6:  1  3   02       02        na 7:  2  1   04       na        05 8:  2  2   05       04        06 9:  2  3   06       05        na 

my current workaround self-merge:

dt[setkey(unique(dt)[,.(pd,shift(firm),shift(firm,type="lead")),by=id],id,pd),    c("lag","lead"):=.(i.v2,i.v3)] > dt    id pd firm lag lead 1:  1  1   01  na   01 2:  1  2   01  01   01 3:  1  2   01  01   01 4:  1  3   01  01   na 5:  1  3   02  01   na 6:  1  3   02  01   na 7:  2  1   04  na   05 8:  2  2   05  04   06 9:  2  3   06  05   na 

this correct, quite messy. there easier way accomplish this?

here's 1 way without merging:

dt[,c("lag","lead"):={   r <- rleid(pd)   x <- .sd[,firm[1],by=pd]$v1   lapply(c("lag","lead"), function(y) x[shift(r,type=y)] ) }, by=id] 

as far performance goes, there faster ways .sd line. try ave(firm,pd,fun=function(x)x[1]), example.


Comments

Popular posts from this blog

python - Healpy: From Data to Healpix map -

c - Bitwise operation with (signed) enum value -

xslt - Unnest parent nodes by child node -